Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Making recommendations using graphs and Spark

Harry Powell (Barclays), Raffael Strassnig (Barclays)
16:35–17:15 Wednesday, 24 May 2017
Secondary topics:  Ecommerce, Financial services
Level: Intermediate
Average rating: 4.00 (6 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and graph-processing and Spark enthusiasts

Prerequisite knowledge

  • A basic understanding of high school-level probability, Scala and Spark, and computational complexity

What you'll learn

  • Learn how to think in new ways about conventional datasets and how to build recommendations using graphs
  • Explore strategies for distributed computation of complex calculations, including a new similarity metric—the expected degrees of separation (EDS)

Description

Harry Powell and Raffael Strassnig demonstrate how to model unobserved customer preferences over businesses by thinking about transactional data as a bipartite graph and then computing a new similarity metric—the expected degrees of separation (EDS)—to characterize the full graph.

EDS is hard to compute on large datasets because of the large number of possible paths between nodes. Harry and Raffael explore different strategies to evaluate EDS in a distributed way in Scala and Spark and propose an estimation approach that is consistent, unbiased, and scalable. They then present results for businesses in Bristol, UK, compare the properties of EDS with familiar graph-based metrics such as PageRank and shortest path, and discuss applications of the technology to other use cases. Harry and Raffael conclude by sharing a simple recommender.
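
As a rough illustration of the bipartite-graph framing described above, the sketch below builds a customer-business graph from a handful of made-up transactions and measures the shortest-path distance between two businesses, one of the familiar metrics the talk compares EDS against. It is not the speakers' method: the abstract does not define EDS, so the sketch does not attempt to compute it. It also uses plain single-machine Python rather than Scala and Spark. The transaction data and the function names `build_bipartite_graph` and `shortest_path_length` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy transactions: (customer, business) pairs.
transactions = [
    ("alice", "cafe"),
    ("alice", "bookshop"),
    ("bob", "bookshop"),
    ("bob", "bakery"),
    ("carol", "bakery"),
    ("carol", "florist"),
]


def build_bipartite_graph(pairs):
    """Return an adjacency map; customers and businesses are separate node sets."""
    adjacency = defaultdict(set)
    for customer, business in pairs:
        # Tag each node with its side so a customer and a business
        # that share a name cannot collide.
        c, b = ("customer", customer), ("business", business)
        adjacency[c].add(b)
        adjacency[b].add(c)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_bipartite_graph(transactions)
cafe_to_florist = shortest_path_length(
    graph, ("business", "cafe"), ("business", "florist")
)
```

Because every edge joins a customer to a business, the distance between two businesses is always even. Shortest path considers only the single best route, whereas the abstract attributes the difficulty of computing EDS to the large number of possible paths between nodes.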

Harry Powell

Barclays

Harry Powell is director and head of advanced data analytics at Barclays.

Raffael Strassnig

Barclays

Raffael Strassnig is vice president and data scientist at Barclays, where he pushes the boundaries of predictive systems. Previously, Raffael worked on problems in dynamic advertising at Amazon and real-time analytics at Microsoft. In his free time, he enjoys solving maths riddles, programming in Scala, and cooking. He studied software engineering at the University of Technology in Graz, mathematics at the University of Vienna, and computational intelligence at the University of Technology in Vienna.

Comments

Michał Kucharczyk | BI & RISK MANAGEMENT SPECIALIST
26/05/2017 9:21 BST

Hello Harry and Raffael, do you plan to share the slides?