Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Writing distributed graph algorithms

Andrew Ray (Sam’s Club Technology)
1:50pm–2:30pm Wednesday, March 7, 2018
Secondary topics: Graphs and Time-series
Average rating: 3.00 (3 ratings)

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • A basic understanding of graph concepts

What you'll learn

  • Learn how to write distributed graph algorithms
  • Explore Pregel, PowerGraph, and GraphX


Distributed graph algorithms are essential for analyzing large-scale connected data. One such algorithm, Google’s PageRank, changed internet search forever. Operating at scale requires efficient implementations of these algorithms in distributed systems. Andrew Ray offers a brief introduction to the distributed graph algorithm abstractions provided by Pregel, PowerGraph, and GraphX, drawing on real-world examples, and provides historical context for the evolution from one abstraction to the next.

Topics include:

  • How the Pregel abstraction solves Google’s PageRank problem at scale
  • How PowerGraph overcomes some of the weaknesses of Pregel
  • How GraphX combines the best parts of Pregel and PowerGraph in an easier-to-use package
  • Three key examples: Connected Components, Single Source Shortest Path, and PageRank
  • Practical GraphX tips and tricks
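
As a rough illustration of the vertex-centric style behind these topics, here is a minimal single-machine sketch of a Pregel-like superstep loop in Python, applied to two of the examples named above (Single Source Shortest Path and Connected Components). It is not taken from the session and is not the GraphX API: the function name `pregel` and the parameters `vprog`, `send_msg`, and `merge_msg` are illustrative assumptions that only loosely mirror the shape of GraphX's Pregel operator, and the sample graphs are made up. A real system would partition vertices and edges across machines instead of looping over them in one process.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_iterations=50):
    """Simulate Pregel-style supersteps on one machine (illustrative only).

    vertices: dict mapping vertex id -> initial value
    edges:    list of (src, dst, attr) tuples
    """
    # Superstep 0: every vertex processes the initial message.
    values = {v: vprog(v, val, initial_msg) for v, val in vertices.items()}
    active = set(values)
    for _ in range(max_iterations):
        inbox = {}
        for src, dst, attr in edges:
            # Only edges touching a vertex updated last round may send.
            if src not in active and dst not in active:
                continue
            for target, msg in send_msg(src, values[src], dst, values[dst], attr):
                # Combine messages bound for the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        new_values = dict(values)
        for v, msg in inbox.items():
            new_values[v] = vprog(v, values[v], msg)
        values = new_values
        active = set(inbox)
    return values


# Single Source Shortest Path from vertex 1 on a made-up directed graph.
def sssp_send(src, src_dist, dst, dst_dist, weight):
    # Offer the destination a shorter path if one exists through src.
    if src_dist + weight < dst_dist:
        yield dst, src_dist + weight


weighted_edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 5.0), (3, 4, 1.0)]
start = {1: 0.0, 2: math.inf, 3: math.inf, 4: math.inf, 5: math.inf}
sssp = pregel(start, weighted_edges, math.inf,
              vprog=lambda vid, dist, msg: min(dist, msg),
              send_msg=sssp_send, merge_msg=min)


# Connected Components: propagate the smallest vertex id in both directions.
def cc_send(src, src_label, dst, dst_label, _attr):
    if src_label < dst_label:
        yield dst, src_label
    elif dst_label < src_label:
        yield src, dst_label


plain_edges = [(1, 2, None), (2, 3, None), (4, 5, None)]
components = pregel({v: v for v in range(1, 6)}, plain_edges, math.inf,
                    vprog=lambda vid, label, msg: min(label, msg),
                    send_msg=cc_send, merge_msg=min)

print(sssp)        # {1: 0.0, 2: 1.0, 3: 3.0, 4: 4.0, 5: inf}
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Both algorithms reuse the same loop and differ only in the three small functions passed in, which is the main appeal of this kind of abstraction.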

Andrew Ray

Sam’s Club Technology

Andrew Ray is a senior technical expert at Sam’s Club Technology. He is passionate about big data and has extensive experience working with Apache Spark and Hadoop. Previously, at Walmart, Andrew built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and distributed graph algorithms, and he led the adoption of Spark from proof of concept to production. He is an active contributor to the Apache Spark project, including Spark SQL and GraphX. Andrew holds a PhD in mathematics from the University of Nebraska, where he worked on extremal graph theory.