Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Stream storage with Apache BookKeeper

Sijie Guo (StreamNative)
4:20pm–5:00pm Wednesday, March 7, 2018
Secondary topics:  Graphs and Time-series
Average rating: 3.67 (3 ratings)

Who is this presentation for?

  • Software engineers, engineering management, CIOs, and technology leaders

Prerequisite knowledge

  • A basic understanding of stream processing, storage techniques, and real-time processing

What you'll learn

  • Why storage is an important component of stream processing and how Apache BookKeeper satisfies the needs of stream storage


Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for real-time workloads. It has been widely adopted by enterprises like Twitter, Yahoo, and Salesforce for several use cases:

  • Building high-availability or replication facilities for distributed systems (such as the HDFS NameNode and Twitter’s Manhattan key-value store)
  • Providing replication between machines in a single cluster and across clusters (multi-data-center replication)
  • Serving as a store for publish/subscribe (pub/sub) messaging systems, such as Twitter’s EventBus and Apache Pulsar (incubating)
  • Storing immutable objects for streaming jobs (e.g., snapshots of checkpointed data)

Apache DistributedLog, which has graduated from the Apache Incubator to become a subproject of Apache BookKeeper, extends BookKeeper from a low-level ledger API to a simpler, high-level stream API, making BookKeeper easier to use. Sijie Guo walks you through the Stream Storage API in Apache BookKeeper, explains what makes Apache BookKeeper unique as a real-time storage system, and demonstrates how it is used in an end-to-end real-time solution.

Sijie Guo


Sijie Guo is the founder and CEO of StreamNative, a data infrastructure startup offering a cloud-native event streaming platform based on Apache Pulsar for enterprises. He’s also the vice president of Apache BookKeeper and a PMC member of Apache Pulsar. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo.