06 October 2008

First you find the haystack

I have talked a bit about recommender systems before. These are the subject of many mathematical models, and you can find plenty of very dense articles about them if you want to know everything. I'm going to give a very high-level, oversimplified version here.

There are two basic types of recommender systems: collaborative filtering ("This is what people like you prefer.") and content-based filtering ("If you like that, you should try this."). There is no pure recommender system, but generally Last.fm is a collaborative filtering system and Pandora is a content-based filtering system. Pandora relies on the large Music Genome Project database to identify music similar to music you tell it you like. Last.fm uses a large database of playing histories to identify people who listen to the same bands in similar ways as you, then recommends music from their play histories. On the gripping hand, Amazon.com mixes the Last.fm approach with a standard package of web shopping preference identification and recommendations, because they also track what you look at but don't listen to (and how long you look at each one, and how you follow a trail of links around, etc.).
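To make the two flavors concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the listener names, the artist and track labels, the trait names, and the choice of cosine similarity as the measure of "like you" or "like that". It is not how Last.fm or Pandora actually compute anything, just the general shape of each idea.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def collaborative_recommend(user, histories, top_n=3):
    """'This is what people like you prefer': score artists you have not
    played by other listeners' play counts, weighted by listener similarity."""
    mine = histories[user]
    scores = {}
    for other, theirs in histories.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # nothing in common, so this listener tells us nothing
        for artist, plays in theirs.items():
            if artist not in mine:
                scores[artist] = scores.get(artist, 0.0) + sim * plays
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def content_recommend(seed, traits, top_n=3):
    """'If you like that, you should try this': rank items by how closely
    their hand-assigned traits match the seed item's traits."""
    others = [item for item in traits if item != seed]
    others.sort(key=lambda item: cosine(traits[seed], traits[item]), reverse=True)
    return others[:top_n]

# Hypothetical play histories: listener -> {artist: play count}.
histories = {
    "me": {"artist_a": 10, "artist_b": 5},
    "alice": {"artist_a": 8, "artist_b": 4, "artist_c": 6},
    "bob": {"artist_d": 9, "artist_e": 3},
}

# Hypothetical trait table: track -> {trait: strength}.
traits = {
    "track_x": {"acoustic": 1.0, "female_vocal": 1.0},
    "track_y": {"acoustic": 1.0, "female_vocal": 0.8, "strings": 0.2},
    "track_z": {"distorted_guitar": 1.0, "fast_tempo": 1.0},
}

print(collaborative_recommend("me", histories))
print(content_recommend("track_x", traits, top_n=1))
```

The collaborative function never looks at what the music sounds like, and the content-based function never looks at what anyone else plays. That is the whole difference between the two approaches.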

In practice, any system in use is a hybrid. Pandora passes you individual tracks from bands that tend to have similar tracks to the ones you name, then refines that with your own feedback. They also offer you access to the stations other people have developed, so you can use a band name to find someone who has similar tastes to you, and then see what else they listen to. Last.fm will play you a station of recommendations based on similar users, but they also have user track tagging that you can use to get recommendations. That tagging is both content-based and collaborative. PayPlay.fm uses your Last.fm profile and their own content-based and collaborative data to make recommendations (and, in my experience, does it better than any of the others). And PandoraFM and similar tools bridge the Pandora and Last.fm systems, hybridizing the hybrids. Amazon is probably telling someone "You bought a dish drainer and gardening tools, so we recommend you try Hoobastank and Jack Johnson." I have no idea how the YouTube related videos feature works (I'd guess it matches substrings in titles, with some other content-based stuff on top of that, hybridized with collaborative filtering on user favorites and channels), but it is busy doing something that leads you all around the videoverse.
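One simple way to picture a hybrid is as a weighted blend of a content-based score and a collaborative score, adjusted by your own thumbs up and thumbs down. The sketch below assumes exactly that. The function name, the 50/50 weight, the size of the thumbs-up boost, and the track labels are all hypothetical, and none of the services above has said this is what they do.

```python
def hybrid_rank(content_scores, collab_scores, feedback=None, weight=0.5, boost=0.25):
    """Blend two score tables into one ranking.

    weight   -- share given to the content-based score (the rest goes to
                the collaborative score)
    feedback -- optional {item: +1 or -1}; -1 removes the item from the
                ranking, +1 adds a fixed boost
    """
    feedback = feedback or {}
    blended = {}
    for item in set(content_scores) | set(collab_scores):
        vote = feedback.get(item, 0)
        if vote < 0:
            continue  # thumbs down: never suggest it
        score = (weight * content_scores.get(item, 0.0)
                 + (1 - weight) * collab_scores.get(item, 0.0))
        if vote > 0:
            score += boost  # thumbs up: nudge it toward the top
        blended[item] = score
    # Highest score first; ties broken by name so the order is repeatable.
    return sorted(blended, key=lambda item: (-blended[item], item))

# Hypothetical scores in the range 0..1 from each kind of recommender.
content_scores = {"t1": 0.9, "t2": 0.4, "t3": 0.1}
collab_scores = {"t2": 0.8, "t3": 0.7, "t4": 0.6}

print(hybrid_rank(content_scores, collab_scores))
print(hybrid_rank(content_scores, collab_scores, feedback={"t1": -1}))
```

A track that only one of the two recommenders knows about still gets ranked, which is one reason hybrids tend to be less brittle than either approach alone.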

That all looks like quite a tangle of possibilities, but there is a layer of preference/recommendation above all that. To use any of these systems, you have to choose to use it. Right at the start, you have already applied a big filter to who "people like me" are. SmartPunk users are probably not quite the same crowd as Soundflavor users. YouTube music hunters on safari are different from Pandora fanatic hackers. And how did you find out about the system you chose, anyway? Your face-to-face friends are involved in that filtering layer, as well.

What is sad, of course, is that I am such a geek that I have not only used all these recommender systems (and more), I have worried out how they work under the hood (at this high level). In the future, look for more on these systems and on music in social networking sites. I am also thinking about bellwethers, so you will probably be reading about them, too. And the nature and impact and money trail of the super ginormous databases that sit under all these systems.
