Today Metamarkets processes over 300 billion events per day, representing more than 100 TB flowing through a single pipeline built entirely on open source technologies including Druid, Kafka, and Samza. Growing to such a scale presents engineering challenges on many levels, not just in design but also in operations, especially when downtime is not an option.
Xavier Léauté explores how Metamarkets used Kafka and Samza to build a multitenant pipeline that performs streaming joins and transformations of varying degrees of complexity, then pushes data into Druid to make it available for immediate, interactive analysis at a rate of several hundred concurrent queries per second. Xavier describes how his team overcame the challenges of scaling this stack: with data volume growing by an order of magnitude in the span of a few months, every system involved started to show its limits. He explains how Metamarkets uses extensive metric collection to manage both performance and costs and how it handles very heterogeneous processing workloads while keeping operational complexity down.
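To make the streaming-join idea concrete, here is a minimal, hypothetical sketch in Python of a keyed stream-stream join. The actual pipeline described in the talk runs Samza tasks over Kafka topics; this illustration omits windowing and fault-tolerant state, which a production job would require, and all names here are invented for the example.

```python
# Illustrative sketch of a keyed streaming join (inner join of two event
# streams). This is NOT the Metamarkets/Samza implementation; a real job
# would bound these buffers with a time window and persist them to
# durable local state.
from collections import defaultdict


class StreamingJoiner:
    """Buffers events from two streams by key and emits a joined record
    for every left/right pair that shares a key."""

    def __init__(self):
        # Per-key buffers for each input stream (unbounded here for
        # simplicity).
        self.left = defaultdict(list)
        self.right = defaultdict(list)
        self.output = []

    def on_left(self, key, event):
        # Buffer the event, then join it against everything already
        # seen on the other stream for this key.
        self.left[key].append(event)
        for other in self.right[key]:
            self.output.append((key, event, other))

    def on_right(self, key, event):
        self.right[key].append(event)
        for other in self.left[key]:
            self.output.append((key, other, event))


# Example: joining an ad impression with its matching click by user key.
joiner = StreamingJoiner()
joiner.on_left("user1", {"impression": "ad42"})
joiner.on_right("user1", {"click": "ad42"})
# joiner.output now holds one joined record for "user1"
```

Events can arrive on either stream in any order; each side simply joins against whatever the other side has buffered so far, which is the core mechanic behind windowed joins in systems like Samza.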
Xavier Léauté is a software engineer at Confluent as well as a founding Druid committer and PMC member. Prior to his current role, he headed the backend engineering team at Metamarkets.
©2016, O'Reilly Media, Inc.