Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Building DistributedLog, a high-performance replicated log service

Sijie Guo (StreamNative)
2:40pm–3:20pm Thursday, 03/31/2016
Data Innovations

Location: 210 C/G
Average rating: ***..
(3.50, 2 ratings)

Prerequisite knowledge

Attendees should have a basic knowledge of distributed-systems concepts (concurrency, fault tolerance, etc.).

Description

Systems like databases and messaging systems require durability. One common way to implement durability while keeping performance high is to use a log to persist updates to system state. The log is used to reconstruct the system state in the event of a crash. Moreover, logs are very powerful data structures for addressing challenging distributed-systems problems.

DistributedLog is a replicated log service that is built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter’s publish-subscribe system and has been used widely elsewhere at Twitter in applications from the transactional database system to the search ingestion pipeline and the real-time data analytics platform.

Sijie Guo offers an overview of DistributedLog, detailing why Twitter built DistributedLog, the technical decisions and challenges behind building DistributedLog, and how Twitter uses it to support different workloads with different characteristics from a strongly consistent distributed database to a real-time data analytics pipeline. Sijie also discusses how Twitter runs the same software stack in multiple data centers to achieve global consistency.

Photo of Sijie Guo

Sijie Guo

StreamNative

Sijie Guo is the founder and CEO of StreamNative. StreamNative is a data infrastructure startup offering cloud native event streaming platform based on Apache Pulsar for enterprises. Previously, he was the tech lead for the Messaging Group at Twitter, and worked on push notification infrastructure at Yahoo. He is also the VP of Apache BookKeeper and PMC Member of Apache Pulsar.