Presented by O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

An introduction to Apache Kafka

Ian Wrigley (StreamSets)
1:30pm–5:00pm Tuesday, 09/27/2016
IoT & real-time
Location: 1B 03/04
Level: Beginner
Tags: real-time
Average rating: 5.00 (7 ratings)

Prerequisite knowledge

  • Familiarity with Linux

Materials or downloads needed in advance

  • The Introduction to Kafka tutorial includes some hands-on exercises. If you want to follow along, you’ll need to bring a laptop with at least 4 GB of RAM and VirtualBox installed. Once you have installed VirtualBox, please download the virtual machine and the Exercise Manual.
  • Note that your laptop must be able to run a 64-bit guest virtual machine. The easiest way to test this is to download the VM, launch it (double-click the .vbox file), and make sure it starts. If it does not start properly, check your machine’s BIOS settings and ensure that hardware virtualization (Intel VT-x, or AMD-V on AMD processors) is enabled.

What you'll learn

  • Understand what Kafka is, how it works, and why it's so well suited to handling real-time streaming data

Description

    Ian Wrigley demonstrates how to use Apache Kafka to collect, manage, and process streaming data for both big data projects and general-purpose enterprise data integration; no prior knowledge of Kafka is required. Ian covers system architecture and use cases and walks you through hands-on exercises in which you publish data to, and subscribe to data from, Kafka and explore Kafka’s Java and REST APIs. Ian also introduces other parts of the broader Kafka ecosystem, such as Kafka Connect and Kafka Streams.

    This tutorial is ideal for application developers, ETL (extract, transform, load) developers, or data scientists who need to interact with Kafka clusters as a source of, or destination for, stream data.

    Topics include:

    • Introduction to what Kafka is, its capabilities, and major components
    • Types of data appropriate for Kafka
    • Producers, consumers, and brokers and their roles in a Kafka cluster
    • Developer APIs in various languages for publishing to and subscribing to Kafka topics
    • Common patterns for application development with Kafka
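    The roles of topics, producers, and consumers listed above can be illustrated with a toy model. The sketch below is a hypothetical, in-memory Python model, not Kafka's Java or REST API: the class names `ToyTopic`, `ToyProducer`, and `ToyConsumer` are invented for illustration, and keys are hashed with `zlib.crc32` rather than the murmur2 hash that Kafka's default partitioner uses. It shows three core ideas: a topic is split into append-only partitions, records with the same key always go to the same partition, and each consumer tracks its own read offset per partition.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical stand-in for a Kafka topic: a fixed set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Records with the same key always land in the same partition.
        # Assumption: crc32 is used only to keep the sketch dependency-free;
        # Kafka's default partitioner hashes keys with murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)


class ToyProducer:
    """Hypothetical producer: appends records to the end of a partition log."""

    def __init__(self, topic):
        self.topic = topic

    def send(self, key, value):
        """Append a record and return (partition, offset) for it."""
        p = self.topic.partition_for(key)
        log = self.topic.partitions[p]
        log.append((key, value))
        return p, len(log) - 1


class ToyConsumer:
    """Hypothetical consumer: reads forward from its own per-partition offsets."""

    def __init__(self, topic):
        self.topic = topic
        # The consumer, not the log, remembers how far it has read.
        self.offsets = defaultdict(int)

    def poll(self):
        """Return records appended since the last poll, as (partition, offset, key, value)."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            for offset in range(self.offsets[p], len(log)):
                key, value = log[offset]
                records.append((p, offset, key, value))
            self.offsets[p] = len(log)
        return records


if __name__ == "__main__":
    topic = ToyTopic("sensor-readings", num_partitions=3)
    producer = ToyProducer(topic)
    consumer = ToyConsumer(topic)

    producer.send("sensor-1", "21.5")
    producer.send("sensor-2", "19.0")
    producer.send("sensor-1", "21.7")

    for record in consumer.poll():
        print(record)
    print(consumer.poll())  # prints [] because nothing new was appended
```

    Because reading does not remove records from the log, a second `ToyConsumer` attached to the same topic would start again at offset 0 and see every record; this separation of storage from read position is what lets many independent consumers share one stream.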

    Ian Wrigley


    Ian Wrigley is a Technical Director at StreamSets, the company behind the industry’s first data operations platform. Over his 25-year career, Ian has taught tens of thousands of students subjects ranging from C programming to Hadoop development and administration.
