Should We Care About Content? Recommending by Proxy with Big Metadata

Data Science
Location: Room 1-6 Level: Intermediate

Classic recommender systems are built around proxy information: many users’ opinions of items are used to predict the opinions of other users on those items (or vice versa). This approach is entirely domain agnostic. As this core approach has spread across media domains (e.g., books, film, music), many efforts have been made to use content-based information in addition to, or instead of, user opinions and other socio-cultural information. Despite these efforts, content-based data has at best not helped and at worst significantly reduced the performance of a recommender.
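To make the proxy idea concrete, here is a minimal sketch of user-based collaborative filtering. It is an illustration only, not the method of any system discussed in the talk: the toy ratings, the choice of cosine similarity, and the names `cosine` and `predict` are all assumptions made for this example. Note that the code only ever sees user and item identifiers, which is what makes the approach domain agnostic.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the talk.
ratings = {
    "ann": {"a": 5.0, "b": 4.0, "c": 1.0},
    "bob": {"a": 4.0, "b": 5.0, "c": 2.0, "d": 5.0},
    "cat": {"a": 1.0, "b": 2.0, "c": 5.0, "d": 1.0},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else None


# ann has not rated "d"; her taste is closer to bob's than to cat's,
# so the prediction is pulled towards bob's rating of "d".
print(predict("ann", "d", ratings))
```

Nothing about the items themselves (genre, audio, text) enters the calculation. That is also the source of the blind spot: an item nobody has rated yet cannot be predicted at all, and `predict` returns `None` for it.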

In this talk we’ll examine the efficacy of content-based versus socio-cultural and opinion data in recommender systems, focusing on large (i.e., Big) datasets available for public use. We’ll start with an overview of recommender systems and methods, looking in particular at the blind spots various proxy data sources can have and what can be done about them. We’ll then look into the use and failure (or success!) of content-based analysis in two recommender system contests: the classic Netflix Prize and the ongoing Million Song Dataset Challenge (http://www.kaggle.com/c/msdchallenge), which will complete on 9 August 2012. By the end of the talk, attendees should be able to tell apart the various types of input analysis for recommender systems, and to understand when content-based methods are suitable and when opinions and socio-cultural data are more appropriate. They might discover some new music as well.

Benjamin Fields

Goldsmiths, University of London / Fun & Plausible Solutions

Ben leads Musicmetric’s data science team in an attempt to wrangle some sanity into the Internet’s vast supply of horribly formed music data. He has a PhD from the Intelligent Sound and Music Systems group in the Computing Department at Goldsmiths, University of London. His work there focused on merging social and acoustic similarity spaces to drive playlist creation and related user-facing systems. He is an expert on metadata, structured data, the semantic web, and recommendation systems. In his spare time, he co-chairs the annual international Workshop On Music Recommendation And Discovery, has given an Ignite London talk about beer styles, occasionally DJs, is an accredited beer judge, and homebrews beer. He thinks bios in the third person are weird but figures that’s how they’re meant to be written.
