Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid

Xavier Léauté (Confluent)
12:05–12:45 Friday, 3/06/2016
Data innovations
Location: Capital Suite 14 Level: Intermediate
Average rating: ****.
(4.50, 6 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of stream processing, OLAP, and Kafka.

Description

Today, Metamarkets processes over 300 billion events per day—over 100 TB going through a single pipeline built entirely on open source technologies such as Druid, Kafka, and Samza. Working at such a scale presents engineering challenges on many levels, not just in terms of design but also in terms of operations, especially when downtime is not an option.

Xavier Léauté explores how Metamarkets used Kafka and Samza to build a multitenant pipeline to perform streaming joins and transformations of varying degrees of complexity and then push data into Druid to make it available for immediate, interactive analysis at a rate of several hundreds of concurrent queries per second. But as data grew an order of magnitude in the span of a few months, all systems involved started to show their limits. Xavier describes the challenges around scaling this stack and explains how the team overcame them, using extensive metric collection to manage both performance and costs and how they handle very heterogeneous processing workloads while keeping down operational complexity.

Photo of Xavier Léauté

Xavier Léauté

Confluent

Xavier Léauté is a software engineer at Confluent as well as a founding Druid committer and PMC member. Prior to his current role he headed the backend engineering team at Metamarkets.