Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Introduction to Apache Kafka (Half Day)

Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent), Joseph Adler (Facebook), Ian Wrigley (StreamSets)
9:00am–12:30pm Tuesday, 03/29/2016
IoT and Real-time

Location: LL21 A
Tags: real-time
Average rating: 3.90 (21 ratings)

Prerequisite knowledge

For this tutorial, we'll be using a virtual machine (VM) with Kafka installed on it. You must install VirtualBox 5.0.10 or above BEFORE you arrive onsite. Your computer must have a 64-bit processor and at least 3 GB of free RAM. Some computers ship with VT-x disabled in the BIOS by default; if the VM fails to boot, enable that setting.

Please download:

After you have downloaded the VM, you need to decompress it. Once the VM is decompressed, import the VM into VirtualBox.

Participants should have a basic familiarity with Java or Python and be comfortable with Linux-based tools.

Materials or downloads needed in advance

We will distribute a VM image with all of the tools you need to complete this tutorial. You'll need a laptop with VirtualBox or VMware. (We'll provide more specific hardware recommendations prior to the conference, but will make sure things run well on most modern laptops.)


Learn about stream data and Apache Kafka from several core Kafka contributors. During this three-hour tutorial, Ewen Cheslack-Postava, Joseph Adler, Jesse Anderson, and Ian Wrigley explain what Kafka is, demonstrate how it works, and explore using Kafka to build modern data applications. Ewen, Joseph, Jesse, and Ian also discuss key architectural concepts and developer APIs as they guide participants through hands-on labs to build an application that can publish data to Kafka and subscribe to receive data from Kafka.

This tutorial is ideal for application developers, extraction-transformation-load (ETL) developers, or data scientists who need to interact with Kafka clusters as a source of, or destination for, stream data.

Topics include:
  • Introduction to Kafka, its capabilities, and major components
  • Types of data appropriate for Kafka
  • Producers, consumers, and brokers and their roles in a Kafka cluster
  • Developer APIs in various languages for publishing to and subscribing to Kafka topics
  • Common patterns for application development with Kafka
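The producer, consumer, and broker roles listed above can be pictured with a toy in-memory model. This is only an illustrative sketch of Kafka's core abstraction (a topic as an append-only log, with each consumer tracking its own read offset); the class and method names here are invented for illustration and are not Kafka's actual client API.

```python
from collections import defaultdict

class ToyBroker:
    """Toy stand-in for a Kafka broker: each topic is an append-only log."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of records

    def publish(self, topic, record):
        """Append a record to the topic's log and return its offset."""
        log = self.topics[topic]
        log.append(record)
        return len(log) - 1

    def fetch(self, topic, offset):
        """Return every record at or after `offset` in the topic's log."""
        return self.topics[topic][offset:]

broker = ToyBroker()

# A "producer" appends records to a topic.
broker.publish("clicks", {"user": "alice", "page": "/home"})
broker.publish("clicks", {"user": "bob", "page": "/docs"})

# A "consumer" keeps its own offset, so multiple consumers can read the
# same topic independently -- this decoupling is what makes pub/sub work.
consumer_offset = 0
records = broker.fetch("clicks", consumer_offset)
consumer_offset += len(records)
print(len(records))  # prints 2
```

In real Kafka, topics are additionally split into partitions and replicated across brokers, but the append-only-log-plus-consumer-offset idea above is the essence of the producer/consumer pattern the tutorial covers.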

Jesse Anderson

Big Data Institute

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data technologies, including Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students, at companies ranging from startups to Fortune 100 firms, the skills to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at


Ewen Cheslack-Postava


Ewen Cheslack-Postava is an engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. Ewen received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query giving significantly improved visual fidelity and described a system for efficiently processing these queries at scale.


Joseph Adler


Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, Verisign, and LinkedIn. Currently, he is director of product management and data science at Confluent. He holds several patents in computer security and cryptography and is the author of Baseball Hacks and R in a Nutshell. He graduated from MIT with a BSc and MEng in computer science and electrical engineering.


Ian Wrigley


Ian Wrigley is a Technical Director at StreamSets, the company behind the industry’s first data operations platform. Over his 25-year career, Ian has taught tens of thousands of students subjects ranging from C programming to Hadoop development and administration.
