Pinterest is the web’s visual catalogue of ideas with over a billion pins, each with its own user, board, link, description, and image or video. Figuring out what to recommend to a specific user is crucial to helping users discover things they love.
On Pinterest, every pin must be saved to a board (a collection of pins). Each user might have several boards, for instance, one each for travel, recipes, and men’s tennis shoes. Recommendations for pins, boards, or users can be derived from a graph in which these candidates are represented by nodes. Because of recent drops in RAM cost, terabyte-scale RAM machines, such as the AWS X1 instance, are now readily available. These giant-RAM machines can fit entire graphs connecting billions of pin, board, and user nodes into main memory on a single machine, but in a virtualized environment they pose some unique performance challenges that Pinterest had to overcome.
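To make the idea of holding a pin-board graph in main memory concrete, here is a minimal sketch of a compact adjacency layout (compressed sparse row, or CSR). It is an illustration only, not a description of Pixie's actual storage format: the integer node IDs, the function names `build_csr` and `neighbors_of`, and the choice of array types are all assumptions made for this example.

```python
from array import array


def build_csr(num_nodes, edges):
    """Pack an undirected edge list into two flat arrays (CSR layout).

    Assumption for this sketch: pins and boards are already mapped to
    integer IDs in the range [0, num_nodes).
    """
    # Count how many neighbors each node has.
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # offsets[n] .. offsets[n + 1] delimits node n's slice of `neighbors`.
    offsets = array("q", [0]) * (num_nodes + 1)
    for node in range(num_nodes):
        offsets[node + 1] = offsets[node] + degree[node]

    # One flat array holds every adjacency entry (two per undirected edge).
    neighbors = array("I", [0]) * offsets[num_nodes]
    cursor = list(offsets[:num_nodes])  # next free slot for each node
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def neighbors_of(offsets, neighbors, node):
    """Return the neighbor IDs of `node` as a list."""
    return list(neighbors[offsets[node]:offsets[node + 1]])
```

For example, with pins numbered 0 and 1 and boards numbered 2 and 3, `build_csr(4, [(0, 2), (1, 2), (1, 3)])` stores six adjacency entries in a single flat array. Flat arrays avoid per-object overhead, which is one reason layouts like this are commonly used for large in-memory graphs.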
Jure Leskovec explains how Pinterest built its modern recommendation engine, Pixie, and the lessons learned along the way. Pixie is a graph-based, real-time recommendation system that enables a single machine to serve 500 queries per second with a p99 latency of 100 ms, where each query involves 300,000 edge traversals and returns up to 2,000 recommendations. Pinterest’s previous recommendation systems were all Hadoop-based and ran daily to determine similarity between nodes based on common neighbors. By switching to a real-time, random walk-based system, Pinterest obtained a more fine-grained, flexible estimate of a recommendation’s relative importance to the query nodes. The system can take in multiple query nodes and metaparameters that influence the random walk traversal. Many Pinterest teams were able to iterate quickly and find ideal parameters for their unique use cases, ranging from recommendation emails to the Pinterest home feed.
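The random walk idea can be sketched in a few lines. The following is a generic random walk with restart over a pin-board graph, written for illustration under stated assumptions; it is not Pixie's implementation. The function names, the `restart_prob` and `total_steps` parameters, the proportional split of the step budget across query pins, and the fixed seed are all hypothetical choices for this example.

```python
import random
from collections import Counter, defaultdict


def build_graph(saves):
    """Build pin->boards and board->pins adjacency lists from (pin, board) pairs."""
    pin_to_boards = defaultdict(list)
    board_to_pins = defaultdict(list)
    for pin, board in saves:
        pin_to_boards[pin].append(board)
        board_to_pins[board].append(pin)
    return pin_to_boards, board_to_pins


def random_walk_recommend(pin_to_boards, board_to_pins, query_weights,
                          total_steps=10_000, restart_prob=0.5,
                          top_k=5, seed=0):
    """Rank pins by how often short random walks from the query pins visit them.

    query_weights maps each query pin to a relative weight; the step budget
    is split across query pins in proportion to those weights (an assumption
    of this sketch).
    """
    rng = random.Random(seed)
    visits = Counter()
    total_weight = sum(query_weights.values())

    for query_pin, weight in query_weights.items():
        budget = int(total_steps * weight / total_weight)
        current = query_pin
        for _ in range(budget):
            boards = pin_to_boards.get(current)
            if not boards:
                # Dead end: jump back to the query pin.
                current = query_pin
                continue
            # One step = pin -> random board -> random pin on that board.
            board = rng.choice(boards)
            current = rng.choice(board_to_pins[board])
            visits[current] += 1
            # Restart occasionally so walks stay close to the query pin.
            if rng.random() < restart_prob:
                current = query_pin

    # Do not recommend the query pins themselves.
    for query_pin in query_weights:
        visits.pop(query_pin, None)
    return [pin for pin, _ in visits.most_common(top_k)]
```

Called with several weighted query pins, for example `random_walk_recommend(p2b, b2p, {"a": 2.0, "x": 1.0})`, the sketch spends more of its step budget near the more heavily weighted pin. Tuning values such as the restart probability or the step budget is the kind of parameter iteration the abstract refers to, although the parameters Pixie actually exposes may differ.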
Jure covers Pixie’s data and algorithmic basis, its implementation, and its impact on Pinterest’s users. He shares how Pinterest identified the requirements for the system, how it was designed and built, what algorithms it uses, how it was deployed, how the new capability is reflected in the Pinterest product, and how it has improved the product and enabled the release of new features.
Jure Leskovec is chief scientist at Pinterest and associate professor of computer science at Stanford University. Jure’s research focuses on computation over massive data and has applications in computer science, social sciences, economics, marketing, and healthcare. This research has won several awards, including a Lagrange Prize, a Microsoft Research Faculty Fellowship, an Alfred P. Sloan Fellowship, and numerous best paper awards. Jure holds a bachelor’s degree in computer science from the University of Ljubljana, Slovenia, and a PhD in machine learning from Carnegie Mellon University, and he undertook postdoctoral training at Cornell University.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.