Twitter generates billions of events per day, and analyzing them in real time is a massive challenge. Maosong Fu offers an overview of the end-to-end real-time stack Twitter designed to meet this challenge, which consists of DistributedLog (a distributed, replicated messaging system) and Heron (a streaming system for real-time computation).
DistributedLog is the foundation of Twitter’s publish-subscribe system. It is a replicated log service built on top of Apache BookKeeper that provides infinite, ordered, append-only streams for building robust real-time systems. Heron is Twitter’s next-generation streaming system, built from the ground up to address the company’s scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a range of applications, such as the search ingestion pipeline, ad analytics, and image classification.
Maosong describes Heron and DistributedLog in detail, covering use cases and sharing the experiences and challenges of operating real-time systems at scale.
Maosong Fu is the technical lead for Heron and real-time analytics at Twitter and the author of a few publications in the distributed systems area. Maosong holds a master’s degree from Carnegie Mellon University and a bachelor’s degree from Huazhong University of Science and Technology.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.