Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

GearPump: Real time DAG processing at scale

Sean Zhong (Previously Intel)
1:30pm–2:10pm Wednesday, 12/02/2015
IoT and Real-time
Location: 324 Level: Intermediate
Average rating: 4.00 (1 rating)
Slides: 1 PPTX

Prerequisite Knowledge

Familiarity with similar frameworks such as Flink, Heron, or DataTorrent.


Today’s big data streaming frameworks are in high demand: they must process immense amounts of data from a rapidly growing set of disparate sources. Beyond fault tolerance, persistence, and scalability, a streaming engine needs to provide a programming model in which computations can be easily expressed and deployed in a distributed manner. GearPump meets this objective and differentiates itself from other streaming engines by making the actor model the primary abstraction throughout the framework.

What we set out to build is a simple and powerful streaming framework. Scala and Akka offer a higher level of abstraction, which lets the framework focus on the application and keeps the engine lightweight. Main points:

  • A message-driven architecture is the right choice for real-time streaming. Real-time applications need to respond to messages immediately. Akka provides the facilities and building blocks for a message-driven architecture; it simplifies design and coding and lets the code evolve more quickly.
  • The actor is the right level of concurrency. It gives us better performance when scaling up and scaling out. Other big data engines parallelize with processes, threads, or self-managed thread pools; the Akka actor is much more lightweight. Akka also includes many concurrency-related optimizations: actor execution can be swapped in and out efficiently, balancing fairness and performance.
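
The two points above can be illustrated with a minimal sketch of the general actor pattern: each actor owns a mailbox, handles one message at a time, and is scheduled onto a small shared thread pool only while it has work. This is not Akka's or GearPump's implementation, and it is written in Python with the standard library only rather than in Scala. The names `Actor`, `Summer`, `run_demo`, and the `batch` parameter are hypothetical and chosen for illustration.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """Hypothetical minimal actor: a mailbox plus a handler, run on a shared pool."""

    def __init__(self, pool, batch=8):
        self._pool = pool
        self._batch = batch            # max messages handled per turn (fairness)
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False        # True while a turn is queued or running

    def tell(self, message):
        """Enqueue a message; schedule the actor if it is currently idle."""
        with self._lock:
            self._mailbox.append(message)
            if self._scheduled:
                return
            self._scheduled = True
        self._pool.submit(self._run)

    def _run(self):
        """One turn: handle up to `batch` messages, then give the thread back."""
        for _ in range(self._batch):
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                message = self._mailbox.popleft()
            self.receive(message)
        with self._lock:
            if not self._mailbox:
                self._scheduled = False
                return
        # Messages remain: queue another turn so other actors can run first.
        self._pool.submit(self._run)

    def receive(self, message):
        raise NotImplementedError


class Summer(Actor):
    """Adds up the integers it receives and signals once all have arrived."""

    def __init__(self, pool, expected):
        super().__init__(pool)
        self.total = 0
        self.seen = 0
        self._expected = expected
        self.done = threading.Event()

    def receive(self, message):
        # Only one turn runs per actor at a time, so no lock is needed here.
        self.total += message
        self.seen += 1
        if self.seen == self._expected:
            self.done.set()


def run_demo(num_actors=1000, messages_per_actor=20, threads=4):
    """Drive many actors with a handful of threads and return their totals."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        actors = [Summer(pool, messages_per_actor) for _ in range(num_actors)]
        for value in range(messages_per_actor):
            for actor in actors:
                actor.tell(value)
        for actor in actors:
            actor.done.wait()
        return [actor.total for actor in actors]


if __name__ == "__main__":
    totals = run_demo()
    print(len(totals), "actors finished; first total:", totals[0])
```

In this sketch an idle actor costs only a small object and an empty queue, so a thousand actors can share four threads. The `batch` limit is one simple way to trade fairness against throughput: a busy actor hands its thread back after a bounded number of messages instead of starving the others.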

Sean Zhong

Previously Intel

Sean Zhong was a cloud architect in Intel’s Big Data engineering group. Sean’s expertise is in streaming, and he is the creator of Apache Gearpump as well as a PMC member of Apache Storm. Besides streaming, Sean participates in many other Apache projects, including Hadoop NativeTask and HBase media object storage.