Harry Powell and Raffael Strassnig demonstrate how to model unobserved customer preferences over businesses by thinking about transactional data as a bipartite graph and then computing a new similarity metric—the expected degrees of separation (EDS)—to characterize the full graph.
EDS is hard to compute on large datasets because of the large number of possible paths between nodes. Harry and Raffael explore different strategies to evaluate EDS in a distributed way in Scala and Spark and propose an estimation approach that is consistent, unbiased, and scalable. They then present results for businesses in Bristol, UK, compare the properties of EDS with familiar graph-based metrics such as PageRank and shortest path, and discuss applications of the technology to other use cases. Harry and Raffael conclude by sharing a simple recommender.
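To make the setup concrete, here is a minimal single-machine sketch in Python (not the speakers' Scala and Spark code) of transactions viewed as a customer-business bipartite graph. It computes the familiar shortest-path separation between two businesses and, purely as an illustration of an expectation-based distance, the average length of sampled random walks between them. The abstract does not define EDS, so the random-walk average is an assumed stand-in rather than the speakers' metric, and the transaction data and all function names are hypothetical.

```python
import random
from collections import defaultdict, deque

# Hypothetical (customer, business) transaction pairs; not data from the talk.
transactions = [
    ("alice", "bakery"), ("alice", "cafe"),
    ("bob", "cafe"), ("bob", "bookshop"),
    ("carol", "bookshop"), ("carol", "cinema"),
]


def build_bipartite(pairs):
    """Return adjacency sets for an undirected customer-business graph."""
    graph = defaultdict(set)
    for customer, business in pairs:
        c, b = ("customer", customer), ("business", business)
        graph[c].add(b)
        graph[b].add(c)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_random_walk_hops(graph, source, target, walks=2000,
                          max_hops=10_000, seed=0):
    """Average hops of uniform random walks from source until they hit target.

    Assumed stand-in for an expectation-based separation; NOT the EDS
    definition from the talk. Walks that exceed max_hops are discarded.
    """
    rng = random.Random(seed)
    total, finished = 0, 0
    for _ in range(walks):
        node, hops = source, 0
        while node != target and hops < max_hops:
            # Sort neighbours so the seeded walk is reproducible.
            node = rng.choice(sorted(graph[node]))
            hops += 1
        if node == target:
            total += hops
            finished += 1
    return total / finished if finished else None


graph = build_bipartite(transactions)
src, dst = ("business", "bakery"), ("business", "cinema")

shortest = shortest_path_length(graph, src, dst)
average = mean_random_walk_hops(graph, src, dst)
print("shortest path (hops):", shortest)
print("mean random-walk length (hops):", average)
```

Every sampled walk is at least as long as the shortest path, so the averaged figure reflects how many alternative routes exist rather than only the best one. Enumerating or sampling those routes is also where the cost grows on large graphs, which is the motivation the abstract gives for a distributed estimator.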
Harry Powell is director and head of advanced data analytics at Barclays.
Raffael Strassnig is vice president and data scientist at Barclays, where he pushes the boundaries of predictive systems. Previously, Raffael worked on problems in dynamic advertising at Amazon and real-time analytics at Microsoft. In his free time, he enjoys solving maths riddles, programming in Scala, and cooking. He studied software engineering at the University of Technology in Graz, mathematics at the University of Vienna, and computational intelligence at the University of Technology in Vienna.
©2017, O’Reilly UK Ltd