Graph Analysis with One Trillion Edges on Apache Giraph

Avery Ching (Facebook)
Hadoop and Beyond
Ballroom CD
Average rating: 4.25 (4 ratings)

Graph analytics has applications beyond large web-scale organizations. Many computing problems can be efficiently expressed and processed as graphs, leading to useful insights that drive product and business decisions.

While you can express graph algorithms as SQL queries in Hive or as Hadoop MapReduce programs, an API designed specifically for graph processing makes many iterative graph computations (such as PageRank, connected components, label propagation, and graph-based clustering) simpler and easier to express. Apache Giraph provides such a native graph processing API, runs on existing Hadoop infrastructure, and can directly access HDFS and/or Hive tables.
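To illustrate why a graph-native API is a better fit for iterative computations than SQL or raw MapReduce, here is a minimal, self-contained Java sketch of the vertex-centric ("think like a vertex") model that Giraph implements. Note this is not the actual Giraph API (which is built around classes such as BasicComputation and a per-vertex compute method over incoming messages); it is only a single-machine illustration of the same bulk synchronous parallel idea, using PageRank as the example algorithm.

```java
import java.util.*;

// Sketch of the vertex-centric BSP model: in each superstep, every vertex
// sends its rank share as "messages" along its out-edges, then (after a
// barrier) updates its own value from the messages it received.
public class PageRankSketch {

    // adj: vertex id -> list of out-neighbor ids (no dangling vertices here,
    // so total rank mass is conserved at 1.0 across supersteps)
    public static Map<Integer, Double> pageRank(Map<Integer, List<Integer>> adj,
                                                int supersteps, double damping) {
        int n = adj.size();
        Map<Integer, Double> rank = new HashMap<>();
        for (Integer v : adj.keySet()) rank.put(v, 1.0 / n);

        for (int step = 0; step < supersteps; step++) {
            // messages sent along out-edges during this superstep
            Map<Integer, Double> incoming = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> e : adj.entrySet()) {
                List<Integer> outEdges = e.getValue();
                if (outEdges.isEmpty()) continue;
                double share = rank.get(e.getKey()) / outEdges.size();
                for (Integer target : outEdges)
                    incoming.merge(target, share, Double::sum);
            }
            // barrier: every vertex updates its value from received messages
            for (Integer v : adj.keySet())
                rank.put(v, (1 - damping) / n
                        + damping * incoming.getOrDefault(v, 0.0));
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2, 3));
        adj.put(2, Arrays.asList(3));
        adj.put(3, Arrays.asList(1));
        System.out.println(pageRank(adj, 30, 0.85));
    }
}
```

The whole algorithm is the per-vertex update plus a barrier; in Giraph the framework distributes the vertices across workers and handles message delivery, so the user writes only the compute step.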

This talk describes our efforts at Facebook to scale Apache Giraph to very large graphs of up to one trillion edges and how we run Apache Giraph in production. We will also talk about several algorithms that we have implemented and their use cases.

Avery Ching

Software engineer, Facebook

Avery has a PhD from Northwestern University in the area of parallel computing. He worked at Yahoo! Search for four years on the web map analytics platform, large-scale ad hoc serving infrastructure, and cluster management. For the past year and a half, he has been working at Facebook on big data computational frameworks, including Corona (scalable MapReduce) and Giraph (scalable graph processing).