Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Building real-time data pipelines with Apache Kafka

Ian Wrigley (StreamSets)
1:30pm–5:00pm Tuesday, March 14, 2017
Stream processing and analytics
Location: 210 A/E
Level: Intermediate
Secondary topics: Streaming
Average rating: 4.83 (6 ratings)

Who is this presentation for?

  • Developers, data scientists, and anyone who wants to learn more about setting up and running Apache Kafka to build real-time data pipelines

Prerequisite knowledge

  • Basic knowledge of Apache Kafka

Materials or downloads needed in advance

  • A laptop with at least 4 GB of RAM, with VirtualBox and the tutorial VM installed (You'll be provided a link to a VirtualBox virtual machine before the event.)

What you'll learn

  • How to configure Kafka Connect to move data between external systems and Apache Kafka, and how to write a real-time stream processing application using the Kafka Streams DSL
  • Understand how easy it is to scale Connect and Streams as your data volume increases


Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you’ll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.

You’ll see how easy it is to use Connect to ingest and export data (no code required) and how the Kafka Streams domain-specific language (DSL) means that developers can concentrate on business logic without worrying about the low-level plumbing of streaming data processing. And because Streams is a Java library, developers can build real-time applications without needing a separate cluster to run an external stream processing framework.
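
As a rough illustration of the "ingest, then enrich" flow described above, the following plain-Python sketch models the enrichment step: change events from two database tables arrive interleaved, the latest customer row is kept per key, and each order is joined against it. This is a conceptual model only and does not use Kafka Connect or the Kafka Streams API. The topic names (db-customers, db-orders), the record fields, and the function enrich_orders are all hypothetical and are not taken from the tutorial materials.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Each event is (topic, key, value). Topic names and fields are hypothetical.
Event = Tuple[str, str, dict]


def enrich_orders(events: Iterable[Event]) -> Iterator[dict]:
    """Join each order event with the latest customer row seen for its key."""
    customers: Dict[str, dict] = {}  # table view: latest row per customer id
    for topic, key, value in events:
        if topic == "db-customers":
            customers[key] = value  # upsert: a newer row replaces the older one
        elif topic == "db-orders":
            customer = customers.get(key)
            if customer is not None:  # inner join: skip orders with no known customer
                yield {**value, "customer_name": customer["name"]}


# Events in arrival order, as an ingest step might deliver them.
events = [
    ("db-customers", "c1", {"name": "Ada"}),
    ("db-orders", "c1", {"order_id": 1, "amount": 30}),
    ("db-orders", "c2", {"order_id": 2, "amount": 15}),  # no customer row yet: dropped
    ("db-customers", "c1", {"name": "Ada L."}),
    ("db-orders", "c1", {"order_id": 3, "amount": 12}),
]

enriched = list(enrich_orders(events))
for record in enriched:
    print(record)
```

In a real pipeline the lookup table and the join would be handled by the stream processing library rather than by hand-written dictionaries; the sketch only shows the shape of the logic a developer would express.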

Ian Wrigley


Ian Wrigley is a Technical Director at StreamSets, the company behind the industry’s first data operations platform. Over his 25-year career, Ian has taught tens of thousands of students subjects ranging from C programming to Hadoop development and administration.

Comments on this page are now closed.


Ian Wrigley
03/14/2017 2:13am PDT

Hey Bill. Sorry — miscommunication on my part, I think. I’ll give you a link to the Exercise Manual when we start the tutorial.

03/14/2017 12:54am PDT

Hi Ian,

I was able to download, unzip, and start up the CentOS 6.8 VM on VirtualBox.

In your email, you mentioned that there is an Exercise Manual but I don’t see that in the “training” folder.

Did I overlook something?

Thanks, – Bill

Ian Wrigley
03/13/2017 12:00am PDT


If you didn’t receive the email regarding the Virtual Machine, you can download it from

Shilpa Shukla | DATA ENGINEER
03/12/2017 11:42pm PDT

Hi, I am registered for this tutorial but have not received any link to the VirtualBox virtual machine.