Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Gaffer: A very scalable, open source graph database

11:15–11:55 Thursday, 25 May 2017
Hadoop platform and applications
Location: Capital Suite 13
Level: Beginner

Prerequisite knowledge

  • A working knowledge of the Hadoop ecosystem, Apache Accumulo, and property graphs

What you'll learn

  • Explore Gaffer's history, architecture, data model, features, and functionality
  • See what's in store in the near future


Gaffer is a scalable open source graph database that runs on either Accumulo or HBase. A Parquet implementation is in progress, and Gaffer can be extended to other technologies as well. Its data ingest and query services integrate easily with Hadoop and Spark. Gaffer can ingest and store data streamed in at very high rates or bulk-loaded in large batches while providing fast, flexible query access.

Gaffer allows rich properties, such as data sketches, to be stored on entities and edges in the graph. Its built-in aggregation framework lets users specify complex logic that tells Gaffer how to evolve and aggregate these properties as new data is added. For example, each edge could have a count property that is maintained by a “sum” function. When new instances of an existing edge are added, the count is updated automatically, with no need to query for the existing edge first.
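
The aggregation idea can be illustrated with a small sketch. This is not Gaffer's API: the class `AggregatingEdgeStore`, its methods, and the `last_seen` property are hypothetical names invented for illustration, and an in-memory dictionary stands in for the underlying store. It only shows the behaviour described above, where each property has its own merge function and the caller adds edges without reading them back first.

```python
import operator
from typing import Any, Callable, Dict, Tuple

EdgeKey = Tuple[str, str]          # (source vertex, destination vertex)
Properties = Dict[str, Any]
Aggregator = Callable[[Any, Any], Any]


class AggregatingEdgeStore:
    """Toy in-memory edge store (hypothetical, not part of Gaffer)."""

    def __init__(self, aggregators: Dict[str, Aggregator]) -> None:
        # One binary merge function per property name, e.g. "count" -> sum.
        self._aggregators = aggregators
        self._edges: Dict[EdgeKey, Properties] = {}

    def add_edge(self, source: str, dest: str, properties: Properties) -> None:
        """Add an edge; if it already exists, merge property by property."""
        key = (source, dest)
        existing = self._edges.get(key)
        if existing is None:
            self._edges[key] = dict(properties)
            return
        for name, value in properties.items():
            if name in existing:
                # Raises KeyError if no aggregator was registered for `name`.
                existing[name] = self._aggregators[name](existing[name], value)
            else:
                existing[name] = value

    def get_edge(self, source: str, dest: str) -> Properties:
        """Return a copy of the aggregated properties for one edge."""
        return dict(self._edges[(source, dest)])


# "count" is maintained by a sum function; "last_seen" keeps the maximum.
store = AggregatingEdgeStore({"count": operator.add, "last_seen": max})
store.add_edge("A", "B", {"count": 1, "last_seen": 10})
store.add_edge("A", "B", {"count": 1, "last_seen": 25})
store.add_edge("A", "B", {"count": 3, "last_seen": 7})
print(store.get_edge("A", "B"))  # {'count': 5, 'last_seen': 25}
```

In this sketch the merge happens eagerly inside `add_edge`; where a real store performs the merge is an implementation detail not covered by the abstract. Any associative, commutative merge function (sum, max, set union, or a sketch merge) would fit the same pattern.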

This session explores Gaffer’s history, architecture, data model, features, and functionality and outlines some future goals for the project.