Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid

Xavier Léauté (Confluent)
2:05pm–2:45pm Thursday, 09/29/2016
Data innovations
Location: 1 E 07/1 E 08
Level: Intermediate

Prerequisite knowledge

  • A familiarity with Kafka and OLAP
  • An understanding of stream-processing basics

What you'll learn

  • Understand how Metamarkets' system performs in the real world and what it took to scale in practice

Description

Today Metamarkets processes over 300 billion events per day, representing over 100 TB flowing through a single pipeline built entirely on open source technologies, including Druid, Kafka, and Samza. Growing to such a scale presents engineering challenges on many levels, not just in design but also in operations, especially when downtime is not an option.

Xavier Léauté explores how Metamarkets used Kafka and Samza to build a multitenant pipeline that performs streaming joins and transformations of varying degrees of complexity and then pushes data into Druid, where it is available for immediate, interactive analysis at a rate of several hundred concurrent queries per second. Xavier describes how his team overcame the challenges of scaling this stack: with data volumes growing by an order of magnitude in the span of a few months, every system involved began to show its limits. He explains how Metamarkets uses extensive metrics collection to manage both performance and cost, and how it handles highly heterogeneous processing workloads while keeping operational complexity down.
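For context on the Samza layer, the sketch below shows the shape of a stream task such a pipeline might run: it consumes events from a Kafka topic, applies a simple transformation (a stand-in for the streaming joins described above), and emits the result to a topic that a Druid real-time ingestion task can read. This is an illustration only; the class, topic, and field names are hypothetical and not Metamarkets' actual code.

    import java.util.Map;

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    // Minimal Samza task sketch (hypothetical names throughout):
    // read raw events, enrich them, and forward them to a Kafka topic
    // consumed by Druid real-time ingestion.
    public class EnrichEventsTask implements StreamTask {
      // Output stream: the "kafka" system and a hypothetical topic name.
      private static final SystemStream OUTPUT =
          new SystemStream("kafka", "enriched-events");

      @Override
      @SuppressWarnings("unchecked")
      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        // Assumes a JSON serde is configured, so messages arrive as maps.
        Map<String, Object> event = (Map<String, Object>) envelope.getMessage();

        // Drop malformed events; Druid ingestion needs a timestamp.
        if (event.get("timestamp") == null) {
          return;
        }

        // Trivial enrichment standing in for a streaming join against
        // lookup data.
        event.put("processed_by", System.getenv("HOSTNAME"));

        // Hand the enriched event to Kafka for downstream Druid ingestion.
        collector.send(new OutgoingMessageEnvelope(OUTPUT, event));
      }
    }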

Xavier Léauté

Confluent

Xavier Léauté is a software engineer at Confluent as well as a founding Druid committer and PMC member. Prior to his current role, he headed the backend engineering team at Metamarkets.