Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

How knowledge graphs can help dramatically improve recommendations

Aurélien Géron (Kiwisoft)
11:15–11:55 Thursday, 25 May 2017
Level: Beginner
Average rating: 5.00 out of 5 (4 ratings)

Who is this presentation for?

  • Engineers, product managers, team leaders, CTOs, UX, analysts, and anyone interested in a better understanding of their content, better recommendations, and more structured UX

Prerequisite knowledge

  • A high-level understanding of collaborative filtering

What you'll learn

  • Understand what knowledge graphs can do for you
  • Explore the main knowledge graphs available and their pros and cons
  • Discover the problems you may encounter and learn how to overcome them


Good recommendations turn occasional users into daily users. When users don’t have to search for good content, they engage more. Collaborative filtering is one of the most successful tools for recommending the right content to the right audience, yet it requires plenty of signal to work efficiently. New users with little history get completely unpersonalized recommendations, and fresh content gets recommended to no one (or worse, to the wrong audience). This is known as the cold-start problem.

So what’s the solution? If a brand-new user watches a freshly uploaded biking video, you would probably want to recommend more biking content, and possibly suggest related topics such as fitness. In short, instead of direct content-to-content recommendations, you would go for content-to-topic-to-content.
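
The content-to-topic-to-content idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the video IDs, the topic names, and the `RELATED_TOPICS` table are all hypothetical, and a real system would score and rank candidates rather than return them in a fixed order.

```python
# Hypothetical catalog: each video is tagged with one or more topics.
VIDEO_TOPICS = {
    "bike_trail_01": {"biking"},
    "bike_repair_02": {"biking"},
    "hiit_workout_03": {"fitness"},
    "pasta_recipe_04": {"cooking"},
}

# Hypothetical topic-to-topic links (for example, derived from a knowledge graph).
RELATED_TOPICS = {
    "biking": {"fitness"},
    "fitness": {"biking"},
}


def recommend(watched_video, video_topics=VIDEO_TOPICS, related=RELATED_TOPICS):
    """Recommend videos through topics: content -> topic -> content."""
    topics = video_topics.get(watched_video, set())
    same_topic, related_topic = [], []
    for video, tags in sorted(video_topics.items()):
        if video == watched_video:
            continue
        if tags & topics:
            # Shares at least one topic with the watched video.
            same_topic.append(video)
        elif any(tags & related.get(t, set()) for t in topics):
            # Tagged with a topic related to one of the watched topics.
            related_topic.append(video)
    # Same-topic videos come first, related-topic videos second.
    return same_topic + related_topic


print(recommend("bike_trail_01"))
```

Note that no viewing history is needed here: a single watched video is enough to produce suggestions, which is what makes this approach attractive for cold-start cases.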

The first step is to build a system that automatically figures out what your content is about. One option would be to just parse the text (i.e., the video title and description) and extract unusually frequent tuples of words (using tf–idf). However, plain text can be ambiguous (does “football” refer to American football or to soccer?), and it can be redundant (“sky diving” is the same as “parachuting”) and messy (“Mikael Jackson” is clearly “Michael Jackson” misspelled).
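
As a rough sketch of the plain-text option, the snippet below ranks word pairs (bigrams) in one title by tf-idf against a tiny corpus. The corpus, the function names, and the unsmoothed `log(N / df)` weighting are assumptions chosen for brevity; production systems typically use a library implementation with smoothing and normalization.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its consecutive word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def top_bigrams(docs, doc_index, k=3):
    """Rank the bigrams of one document by tf-idf against the whole corpus."""
    doc_bigrams = [bigrams(d) for d in docs]
    # Document frequency: in how many documents does each bigram appear?
    df = Counter(bg for bgs in doc_bigrams for bg in set(bgs))
    tf = Counter(doc_bigrams[doc_index])
    n_docs = len(docs)
    scores = {
        bg: (count / len(doc_bigrams[doc_index])) * math.log(n_docs / df[bg])
        for bg, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda bg: (-scores[bg], bg))[:k]


# Made-up video titles used only for this example.
docs = [
    "Mountain biking tips: mountain biking for beginners",
    "Cooking tips for beginners",
    "Road cycling tips for beginners",
]
print(top_bigrams(docs, 0, k=1))
```

The pair "for beginners" appears in every title, so its idf is zero and it never surfaces, while "mountain biking" stands out. The output is still just a string, though, which is where the ambiguity, redundancy, and misspelling problems described above come in.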

Aurélien Géron shares a much better option: leveraging the power of knowledge graphs such as Wikidata, DBpedia, and Google’s KG (initially based on Freebase, which has since been shut down in favor of Wikidata). Each node in a graph represents a unique, unambiguous topic, and these topics are connected into a gigantic machine-queryable graph. This structure can be exploited to provide meaningful, consistent, browsable, and personalized content (e.g., list the most famous professional soccer players born in the user’s city). Many signals can be used to identify the topics of a piece of content, from text (title, description, comments, anchors, search queries, etc.) to user behavior (e.g., topics explored during the same session) to audiovisual content analysis (using deep learning), and beyond.
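
To make the idea of unique, queryable topics concrete, here is a toy in-memory graph stored as triples. Everything in it is an assumption for illustration: the IDs (`T1`, `P1`, `C1`), the predicate names, and the helper functions are invented and do not correspond to real Wikidata or DBpedia identifiers or APIs.

```python
# Toy knowledge graph stored as (subject, predicate, object) triples.
# All identifiers are made up; they are not real Wikidata or DBpedia IDs.
TRIPLES = [
    ("T1", "label", "association football"),
    ("T1", "alias", "soccer"),
    ("T1", "alias", "football"),
    ("T2", "label", "American football"),
    ("T2", "alias", "football"),
    ("T3", "label", "parachuting"),
    ("T3", "alias", "skydiving"),
    ("T3", "alias", "sky diving"),
    ("P1", "label", "Player One"),
    ("P1", "plays", "T1"),
    ("P1", "born_in", "C1"),
    ("P2", "label", "Player Two"),
    ("P2", "plays", "T2"),
    ("P2", "born_in", "C1"),
    ("C1", "label", "Example City"),
]


def subjects(predicate, obj):
    """Return every node that has the given predicate pointing at obj."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def label(node_id):
    """Return the human-readable label of a node."""
    return next(o for s, p, o in TRIPLES if s == node_id and p == "label")


def candidates(text):
    """Return every topic ID whose label or alias matches the text."""
    needle = text.lower()
    return sorted(
        {s for s, p, o in TRIPLES if p in ("label", "alias") and o.lower() == needle}
    )


def players_born_in(sport_id, city_id):
    """Graph query: who plays this sport and was born in this city?"""
    return sorted(subjects("plays", sport_id) & subjects("born_in", city_id))


print(candidates("sky diving"), candidates("parachuting"))  # same single topic
print(candidates("football"))  # two candidates: needs disambiguation
print([label(p) for p in players_born_in("T1", "C1")])
```

Redundant phrasings collapse onto one node, while an ambiguous word such as "football" yields several candidate nodes that other signals (description, user behavior, and so on) would then have to disambiguate.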

No tool is perfect, and knowledge graphs are no exception. In particular, although they are great at making good recommendations for new users and serving fresh content to the right audience, they are not ideal for new topics since it takes time for a new topic to be added to a knowledge graph. One solution is to use a mixed vocabulary including both KG topics and plain text.
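
One possible reading of a mixed vocabulary is sketched below: phrases that match a known topic are replaced by a topic ID, and everything else is kept as a plain-text token so that brand-new topics still leave a trace. The `KG_TOPICS` table, the `kg:` and `text:` prefixes, and the longest-match strategy are assumptions made for this example, not a description of any particular system.

```python
import re

# Hypothetical mapping from known surface forms to knowledge-graph topic IDs.
KG_TOPICS = {
    "sky diving": "kg:parachuting",
    "skydiving": "kg:parachuting",
    "parachuting": "kg:parachuting",
    "michael jackson": "kg:michael_jackson",
}
MAX_PHRASE_LEN = max(len(k.split()) for k in KG_TOPICS)


def mixed_vocabulary(text):
    """Tag text with KG topics where possible, plain-text tokens otherwise."""
    words = re.findall(r"\w+", text.lower())
    features, i = [], 0
    while i < len(words):
        # Prefer the longest phrase starting here that matches a known topic.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in KG_TOPICS:
                features.append(KG_TOPICS[phrase])
                i += n
                break
        else:
            # No topic matched: keep the raw word so new topics are not lost.
            features.append("text:" + words[i])
            i += 1
    return features


print(mixed_vocabulary("Sky diving with a fidget spinner"))
```

Here "fidget spinner" stands in for a topic the graph does not know yet: it survives as plain-text tokens and can still drive recommendations until a proper node exists.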

Beyond recommendations, a better understanding of what your content is actually about can help you drive your content strategy (e.g., do users engage with cooking videos?), enable context-aware ad targeting (e.g., display a makeup ad on beauty tips content), improve search (e.g., people searching for “Paris” could get results only about the city, plus a disambiguation box asking whether they meant “Paris Hilton” or the band “Paris”), structure the user experience (e.g., if the content is about a movie, show the main actors’ bios), help detect spam (e.g., why is the content unrelated to its title?), and much more.

Aurélien Géron


Aurélien Géron is a machine learning consultant at Kiwisoft and author of the best-selling O’Reilly book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. Previously, he led YouTube’s video classification team, was a founder and CTO of Wifirst, and was a consultant in a variety of domains: finance (JPMorgan and Société Générale), defense (Canada’s DOD), and healthcare (blood transfusion). He has also published a few technical books (on C++, WiFi, and internet architectures), and he is a lecturer at Dauphine University in Paris. He lives in Singapore with his wife and three children.