Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for real-time workloads. Enterprises such as Twitter, Yahoo, and Salesforce have widely adopted it for several use cases: building high-availability or replication facilities for distributed systems (for example, the HDFS NameNode and Twitter’s Manhattan key-value store); replicating data between machines in a single cluster and across clusters (multi-data-center replication); serving as the store for publish/subscribe (pub/sub) messaging systems such as Twitter’s EventBus and Apache Pulsar (incubating); and storing immutable objects for streaming jobs (e.g., snapshots of checkpointed data).
Apache DistributedLog, which has graduated from the Apache Incubator to become a subproject of Apache BookKeeper, extends BookKeeper from a low-level ledger API to a simpler, high-level stream API that makes BookKeeper easier to use. Sijie Guo walks you through the Stream Storage API in Apache BookKeeper, explains what makes Apache BookKeeper unique as a real-time storage system, and demonstrates how it is used in an end-to-end real-time solution.
Sijie Guo is the cofounder of Streamlio, a company focused on building a next-generation real-time data stack. Previously, he was the tech lead for the Messaging Group at Twitter, where he cocreated Apache DistributedLog, and worked on push notification infrastructure at Yahoo. He is the PMC chair of Apache BookKeeper.
©2018, O'Reilly Media, Inc.