Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Putting Kafka into overdrive

Gwen Shapira (Confluent), Todd Palino (LinkedIn)
17:25–18:05 Thursday, 2/06/2016
IoT & real-time
Location: Capital Suite 12
Level: Intermediate
Average rating: 4.67 (9 ratings)

Prerequisite knowledge

Attendees should have an understanding of how publish/subscribe messaging systems work, as well as basic knowledge of Apache Kafka. While a deep understanding of how Kafka works is not required, the more advanced the attendee is, the more immediately applicable the content will be.

Description

Apache Kafka lies at the heart of the largest data pipelines, handling trillions of messages and petabytes of data every day. Gwen Shapira and Todd Palino explain the right approach for getting the most out of Kafka, exploring how to monitor, optimize, and troubleshoot performance of your data pipelines from producer to consumer and from development to production.

Gwen and Todd outline some of the common problems that Kafka developers and administrators encounter when they take Apache Kafka from a proof of concept to production usage. Too often, these systems are overprovisioned and underutilized, yet they still have trouble meeting reasonable performance agreements.

Topics include:

  • What latencies and throughputs you can expect from Kafka
  • How to select hardware and size components
  • What you should be monitoring
  • Design patterns and antipatterns for client applications
  • How to diagnose performance bottlenecks
  • Which configurations to examine and which ones to avoid

Gwen Shapira

Confluent

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Todd Palino

LinkedIn

Todd Palino is a site reliability engineer at LinkedIn tasked with keeping ZooKeeper, Kafka, and Samza deployments fed and watered. His days are spent, in part, developing monitoring systems and tools to make that job a breeze. Previously, Todd was a systems engineer at Verisign, where he developed service-management automation for DNS, networking, and hardware management and managed hardware and software standards across the company.