Distributed graph algorithms are essential for analyzing large-scale connected data. One such algorithm, Google’s PageRank, changed internet search forever. Operating at scale requires efficient implementations of these algorithms on distributed systems. Andrew Ray offers a brief introduction to the distributed graph algorithm abstractions provided by Pregel, PowerGraph, and GraphX, drawing on real-world examples, and provides historical context for the evolution from one abstraction to the next.
Andrew Ray is a senior technical expert at Sam’s Club Technology. He is passionate about big data and has extensive experience working with Apache Spark and Hadoop. Previously, at Walmart, Andrew built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and distributed graph algorithms, and he led the adoption of Spark from proof of concept to production. He is an active contributor to the Apache Spark project, including Spark SQL and GraphX. Andrew holds a PhD in mathematics from the University of Nebraska, where he worked on extremal graph theory.
©2018, O'Reilly Media, Inc.