Systems like databases and messaging systems require durability. One common way to implement durability while keeping performance high is to use a log to persist updates to system state. The log is used to reconstruct the system state in the event of a crash. Moreover, logs are very powerful data structures for addressing challenging distributed-systems problems.
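The log-then-recover idea described above can be illustrated with a minimal sketch. It is not DistributedLog or BookKeeper code: the `LoggedKVStore` class, its JSON-lines record format, and the `demo` helper are hypothetical names chosen for illustration, and the sketch assumes a single writer on a local file. Each update is appended and synced to the log before it is applied in memory, and the state is rebuilt by replaying the log on startup.

```python
import json
import os
import tempfile


class LoggedKVStore:
    """Hypothetical key-value store that persists every update to an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()  # rebuild in-memory state from whatever the log holds
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Reconstruct state by applying logged updates in their original order."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                self.state[record["key"]] = record["value"]

    def put(self, key, value):
        """Append the update to the log and sync it before applying it in memory."""
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # ask the OS to push the record to stable storage
        self.state[key] = value

    def close(self):
        self._log.close()


def demo():
    """Write a few updates, discard the in-memory state, and recover from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "updates.log")
        store = LoggedKVStore(path)
        store.put("a", 1)
        store.put("b", 2)
        store.put("a", 3)
        store.close()  # stands in for a crash: the in-memory dict is thrown away

        recovered = LoggedKVStore(path)  # replay happens in the constructor
        result = dict(recovered.state)
        recovered.close()
        return result
```

Calling `demo()` returns the state rebuilt purely from the log, with the later write to `"a"` winning because records are replayed in append order. The sketch leaves out concerns a real system has to handle, such as a partially written record at the tail of the log, log truncation, and replication.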
DistributedLog is a replicated log service that is built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter’s publish-subscribe system and has been used widely elsewhere at Twitter in applications from the transactional database system to the search ingestion pipeline and the real-time data analytics platform.
Sijie Guo offers an overview of DistributedLog, detailing why Twitter built it, the technical decisions and challenges behind it, and how Twitter uses it to support workloads with different characteristics, from a strongly consistent distributed database to a real-time data analytics pipeline. Sijie also discusses how Twitter runs the same software stack in multiple data centers to achieve global consistency.
Sijie Guo is a staff software engineer at Twitter, where he is the tech lead of the Message team. He is also the founder of Apache DistributedLog (incubating) and the PMC chair of Apache BookKeeper.