Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Building DistributedLog, a high-performance replicated log service

Sijie Guo (Twitter)
2:40pm–3:20pm Thursday, 03/31/2016
Data Innovations

Location: 210 C/G

Prerequisite knowledge

Attendees should have a basic knowledge of distributed-systems concepts (concurrency, fault tolerance, etc.).

Description

Systems like databases and messaging systems require durability. One common way to implement durability while keeping performance high is to use a log to persist updates to system state. The log is used to reconstruct the system state in the event of a crash. Moreover, logs are very powerful data structures for addressing challenging distributed-systems problems.
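
As a rough illustration of the log-based durability idea described above, the following minimal Python sketch appends every update to a local file before applying it in memory, then rebuilds the state by replaying that file after a simulated crash. The class name `LoggedStore`, the JSON record layout, and the `demo` helper are hypothetical names chosen for this sketch. It uses a single local file, not DistributedLog or BookKeeper, and it does not handle torn writes or replication.

```python
import json
import os
import tempfile


class LoggedStore:
    """Hypothetical key-value store that logs each update before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        # Rebuild in-memory state from the log before accepting new writes.
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Re-apply every logged update in order to reconstruct the state."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                self.state[record["key"]] = record["value"]

    def put(self, key, value):
        """Persist the update to the log first, then apply it in memory."""
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the record to stable storage
        self.state[key] = value

    def close(self):
        self._log.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "updates.log")

    store = LoggedStore(path)
    store.put("a", 1)
    store.put("b", 2)
    store.put("a", 3)
    store.close()  # simulated crash: the in-memory state is discarded

    # A fresh instance recovers the state purely from the log.
    recovered = LoggedStore(path)
    result = dict(recovered.state)
    recovered.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Because the log is append-only and replayed in order, the last write to each key wins on recovery. A replicated log service applies the same principle but stores the log on multiple machines, so it survives the loss of any single node.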

DistributedLog is a replicated log service built on top of Apache BookKeeper. It provides infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter’s publish-subscribe system and is used widely elsewhere at Twitter, in applications ranging from the transactional database system to the search ingestion pipeline and the real-time data analytics platform.

Sijie Guo offers an overview of DistributedLog, detailing why Twitter built it, the technical decisions and challenges behind it, and how Twitter uses it to support workloads with very different characteristics, from a strongly consistent distributed database to a real-time data analytics pipeline. Sijie also discusses how Twitter runs the same software stack in multiple data centers to achieve global consistency.


Sijie Guo

Twitter

Sijie Guo is a staff software engineer at Twitter, where he is the tech lead of the Message team. He is also the founder of Apache DistributedLog (incubating) and the PMC chair of Apache BookKeeper.