Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Stream storage with Apache BookKeeper

Sijie Guo (Streamlio)
4:20pm–5:00pm Wednesday, March 7, 2018
Secondary topics: Graphs and Time-series

Who is this presentation for?

  • Software engineers, engineering management, CIOs, and technology leaders

Prerequisite knowledge

  • A basic understanding of stream processing, storage techniques, and real-time processing

What you'll learn

  • Learn why storage is an important component of stream processing and how Apache BookKeeper satisfies the needs of stream storage

Description

Apache BookKeeper, a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads, has been widely adopted by enterprises like Twitter, Yahoo, and Salesforce. It serves several use cases:

  • Building high-availability or replication facilities for distributed systems (the HDFS NameNode and Twitter’s Manhattan key-value store)
  • Providing replication between machines in a single cluster and across clusters (multi-data-center replication)
  • Serving as a store for publish/subscribe (pub/sub) messaging systems, such as Twitter’s EventBus and Apache Pulsar (incubating)
  • Storing immutable objects for streaming jobs (e.g., snapshots of checkpointed data)

Apache DistributedLog, which has graduated from the Apache Incubator to become a subproject of Apache BookKeeper, extends BookKeeper from a low-level ledger API to a simpler, high-level stream API, making BookKeeper easier to use. Sijie Guo walks you through the Stream Storage API in Apache BookKeeper, explains what sets Apache BookKeeper apart as a real-time storage system, and demonstrates how it is used in an end-to-end real-time solution.
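To make the ledger-versus-stream distinction concrete, here is a minimal in-memory sketch. It is a toy model and not the BookKeeper or DistributedLog API: the `Ledger` and `Stream` classes, their methods, and the `max_entries_per_ledger` parameter are all hypothetical names chosen for illustration. It assumes that a ledger is an append-only sequence of entries that is sealed on close, and that a stream can be presented as a chain of such ledgers behind a single append call.

```python
class Ledger:
    """Toy append-only ledger: entries get sequential ids, closed ledgers reject writes."""

    def __init__(self, ledger_id):
        self.ledger_id = ledger_id
        self.entries = []
        self.closed = False

    def add_entry(self, data):
        if self.closed:
            raise RuntimeError("ledger is closed")
        self.entries.append(data)
        return len(self.entries) - 1  # id of the entry just written

    def close(self):
        self.closed = True  # sealed: no further appends


class Stream:
    """Toy stream: an unbounded log that rolls over to a new ledger when one fills up."""

    def __init__(self, max_entries_per_ledger=2):
        self.max_entries = max_entries_per_ledger
        self.ledgers = [Ledger(0)]

    def append(self, data):
        current = self.ledgers[-1]
        if len(current.entries) >= self.max_entries:
            # The caller never sees this step: seal the full ledger, open the next.
            current.close()
            current = Ledger(current.ledger_id + 1)
            self.ledgers.append(current)
        entry_id = current.add_entry(data)
        return (current.ledger_id, entry_id)

    def read_all(self):
        # Readers see one continuous sequence across all ledgers.
        for ledger in self.ledgers:
            yield from ledger.entries


stream = Stream(max_entries_per_ledger=2)
positions = [stream.append(f"event-{i}".encode()) for i in range(5)]
records = list(stream.read_all())
```

In this sketch the writer only calls `append`, while ledger creation, sealing, and rollover happen inside the stream; that is the kind of bookkeeping a higher-level stream API takes off the caller's hands.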

Sijie Guo

Streamlio

Sijie Guo is the cofounder of Streamlio, a company focused on building a next-generation real-time data stack. Previously, he was the tech lead for the Messaging Group at Twitter, where he cocreated Apache DistributedLog, and worked on push notification infrastructure at Yahoo. He is the PMC chair of Apache BookKeeper.