Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Recommending 1+ billion items to 100+ million users in real time: Harnessing the structure of the user-to-object graph to extract ranking signals at scale

Jure Leskovec (Pinterest)
11:50am–12:30pm Wednesday, March 15, 2017
Secondary topics: Data Platform, ecommerce, Hardcore Data Science, Media
Average rating: 4.80 (10 ratings)

Who is this presentation for?

  • Data scientists, engineers, machine learning engineers, and product managers

What you'll learn

  • Explore Pinterest's modern recommendation engine, Pixie

Description

Pinterest is the web’s visual catalogue of ideas, with over a billion pins, each with its own user, board, link, description, and image or video. Figuring out what to recommend to a specific user is crucial to helping users discover things they love.

On Pinterest, every pin must be saved to a board (a collection of pins), and each user might have several boards, for instance one each for travel, recipes, and men’s tennis shoes. Recommendations for pins, boards, or users can therefore be derived from a graph in which these candidates are nodes. Thanks to recent drops in RAM prices, machines with terabytes of RAM, such as the AWS X1 instance, are now readily available. These machines can hold an entire graph connecting billions of pin, board, and user nodes in main memory on a single machine, but running them in a virtualized environment poses unique performance challenges that Pinterest had to overcome.
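To make the single-machine, in-memory idea concrete, here is a minimal Python sketch of one compact way to lay out such a graph: compressed sparse row (CSR) arrays, which store each node's neighbors contiguously in flat integer arrays. This is an illustration under assumptions, not Pixie's actual data structure: it keeps only pins and boards (no user nodes), assumes dense integer node IDs, and the class `BipartiteCSR` and its methods are hypothetical names.

```python
from array import array


class BipartiteCSR:
    """Hypothetical pin-board graph stored in compressed sparse row (CSR) form.

    Pins are numbered 0..num_pins-1 and boards 0..num_boards-1. Each side keeps
    an offsets array and a flat neighbors array, so the neighbors of node i are
    neighbors[offsets[i]:offsets[i + 1]].
    """

    def __init__(self, num_pins, num_boards, edges):
        # edges is an iterable of (pin, board) pairs; each edge is stored once per side.
        edges = list(edges)
        self.pin_offsets, self.pin_neighbors = self._build(num_pins, edges)
        self.board_offsets, self.board_neighbors = self._build(
            num_boards, [(b, p) for p, b in edges]
        )

    @staticmethod
    def _build(num_nodes, pairs):
        # Count how many neighbors each source node has.
        degree = [0] * num_nodes
        for src, _ in pairs:
            degree[src] += 1
        # Prefix sums give each node's start position in the flat neighbor array.
        offsets = array("q", [0] * (num_nodes + 1))
        for i in range(num_nodes):
            offsets[i + 1] = offsets[i] + degree[i]
        # Fill neighbor slots, advancing a per-node write cursor.
        neighbors = array("q", [0] * len(pairs))
        cursor = list(offsets[:num_nodes])
        for src, dst in pairs:
            neighbors[cursor[src]] = dst
            cursor[src] += 1
        return offsets, neighbors

    def boards_of(self, pin):
        """Boards that contain the given pin."""
        return self.pin_neighbors[self.pin_offsets[pin]:self.pin_offsets[pin + 1]]

    def pins_of(self, board):
        """Pins saved to the given board."""
        return self.board_neighbors[self.board_offsets[board]:self.board_offsets[board + 1]]
```

With 8-byte integers, this layout costs about 16 bytes per pin-board edge (each edge appears once in each side's neighbor array) plus 8 bytes per node for offsets, which is the kind of arithmetic that decides whether a graph fits in a single machine's RAM.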

Jure Leskovec explains how Pinterest built its modern recommendation engine, Pixie, and the lessons learned along the way. Pixie is a graph-based, real-time recommendation system that lets a single machine serve 500 queries per second with a p99 latency of 100 ms, where each query involves 300,000 edge traversals and returns up to 2,000 recommendations. Pinterest’s previous recommendation systems were all Hadoop-based and ran daily to compute similarity between nodes from their common neighbors. By switching to a real-time, random-walk-based system, Pinterest obtained a more fine-grained and flexible estimate of each recommendation’s relative importance to the query nodes. The system accepts multiple query nodes, along with metaparameters that influence how the random walk traverses the graph. Many Pinterest teams were able to iterate quickly and find the best parameters for their own use cases, ranging from recommendation emails to the Pinterest home feed.
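The random-walk idea can be illustrated with a small Python sketch of random walk with restarts on a pin-board bipartite graph: walks start at each query pin, hop from pin to board to pin, occasionally jump back to their starting pin, and the pins visited most often become recommendations. This is a simplified, hypothetical version, not Pixie's production algorithm: the function name, the parameters `total_steps`, `restart_prob`, and `top_k` (standing in for the metaparameters mentioned above), and the rule of simply summing visit counts across query pins are all assumptions.

```python
import random
from collections import Counter


def random_walk_recommend(pin_to_boards, board_to_pins, query_pins,
                          total_steps=10_000, restart_prob=0.5, top_k=10, seed=0):
    """Hypothetical random-walk-with-restart scoring on a pin-board graph.

    query_pins maps each query pin to a weight; the step budget is split across
    query pins in proportion to those weights. Visit counts from all walks are
    summed (an assumed combination rule), and the most-visited pins are returned.
    """
    rng = random.Random(seed)
    total_weight = sum(query_pins.values())
    visits = Counter()
    for query, weight in query_pins.items():
        steps = int(total_steps * weight / total_weight)
        current = query
        for _ in range(steps):
            boards = pin_to_boards.get(current)
            if not boards:
                # Dead end: restart from the query pin.
                current = query
                continue
            # One step: pin -> random board -> random pin on that board.
            board = rng.choice(boards)
            current = rng.choice(board_to_pins[board])
            visits[current] += 1
            # With probability restart_prob, jump back to the query pin.
            if rng.random() < restart_prob:
                current = query
    # Never recommend the query pins themselves.
    for query in query_pins:
        visits.pop(query, None)
    return [pin for pin, _ in visits.most_common(top_k)]


pin_to_boards = {"a": ["X"], "b": ["X"], "c": ["X"], "d": ["Y"], "e": ["Y"]}
board_to_pins = {"X": ["a", "b", "c"], "Y": ["d", "e"]}
print(random_walk_recommend(pin_to_boards, board_to_pins, {"a": 1.0, "d": 1.0}))
```

In this sketch, per-query cost grows with the step budget rather than with the size of the whole graph, which is what makes a latency bound possible. For scale, the figures quoted above (500 queries per second, 300,000 edge traversals each) work out to about 150 million edge traversals per second on a single machine.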

Jure covers Pixie’s data and algorithmic basis, its implementation, and its impact on Pinterest’s users. He also walks through the full process: how Pinterest identified the requirements for the system; how Pixie was designed, built, and deployed; which algorithms it uses; and how this new capability is reflected in the Pinterest product, has enabled new product features, and has improved the product overall.


Jure Leskovec

Pinterest

Jure Leskovec is chief scientist at Pinterest and associate professor of computer science at Stanford University. Jure’s research focuses on computation over massive data and has applications in computer science, social sciences, economics, marketing, and healthcare. This research has won several awards, including a Lagrange Prize, a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, and numerous best paper awards. Jure holds a bachelor’s degree in computer science from the University of Ljubljana, Slovenia, and a PhD in machine learning from Carnegie Mellon University, and he completed postdoctoral training at Cornell University.
