Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka

Ryan Pridgeon (Confluent), Dustin Cote (Confluent)
1:50pm–2:30pm Wednesday, March 15, 2017
Data engineering and architecture
Location: LL20 C
Level: Intermediate
Secondary topics: Streaming

Who is this presentation for?

  • Data architects, operations people, SREs, developers, and DevOps engineers

Prerequisite knowledge

  • A basic understanding of Apache Kafka
  • Familiarity with JMX and Linux

What you'll learn

  • Learn strategies for keeping an Apache Kafka cluster up and running efficiently


The number of enterprise-scale Apache Kafka deployments has grown substantially since Kafka was first developed in 2010. This rapid growth has brought a wide variety of use cases and deployment strategies that go beyond what Kafka's creators imagined when they built the technology. As the scope and reach of streaming data platforms based on Apache Kafka have grown, so has the need to understand how to monitor and troubleshoot them.

Dustin Cote and Ryan Pridgeon share their experience supporting Apache Kafka at enterprise scale and explore monitoring and troubleshooting techniques to help you avoid common pitfalls as your Kafka deployments grow.

Topics include:

  • Effective use of JMX for Kafka
  • Tools for preventing small problems from becoming big ones
  • Efficient architectures proven in the wild
  • Finding and storing the right information when it all goes wrong

Ryan Pridgeon


Ryan Pridgeon is a customer operations engineer at Confluent. Ryan has a deep-rooted passion for tinkering that knows no bounds. Be it automotive, software, or carpentry, if it has pieces, Ryan wants to take it apart. He's still working on putting things back together, though.

Dustin Cote


Dustin Cote is a customer operations engineer at Confluent. Over his career, Dustin has worked in a variety of roles, from Java developer to operations engineer. His most recent focus is distributed systems in the big data ecosystem, with Apache Kafka as his software of choice.