Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Building real-time data pipelines with Apache Kafka

Ian Wrigley (StreamSets)
9:00am–12:30pm Tuesday, September 26, 2017
Secondary topics: Streaming
Average rating: 4.50 (4 ratings)

Who is this presentation for?

  • Developers, data scientists, and anyone who wants to learn more about setting up and running Apache Kafka to build real-time data pipelines

Prerequisite knowledge

  • Basic knowledge of Apache Kafka (useful but not required)

Materials or downloads needed in advance

  • A laptop with at least 4 GB of RAM, with VirtualBox and the VM installed (you'll be provided a link to the VirtualBox virtual machine before the event)

What you'll learn

  • Learn how to configure Kafka Connect to move data between external systems and Apache Kafka, and how to write a real-time stream processing application using the Kafka Streams DSL
  • Understand how easy it is to scale Connect and Streams as your data volume increases


Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you’ll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.

You’ll see how easy it is to use Connect to ingest and export data (no code required) and how the Kafka Streams domain-specific language (DSL) means that developers can concentrate on business logic without worrying about the low-level plumbing of streaming data processing. And because Streams is a Java library, developers can build real-time applications without needing a separate cluster to run an external stream processing framework.

Ian Wrigley


Ian Wrigley is a Technical Director at StreamSets, the company behind the industry’s first data operations platform. Over his 25-year career, Ian has taught tens of thousands of students subjects ranging from C programming to Hadoop development and administration.

Comments on this page are now closed.


Ratnakar Lingechetty | IT TECHNICAL LEAD
09/26/2017 7:50am EDT

Hi Ian, Where can I download the presentation slides from? Thanks!

Ian Wrigley
09/23/2017 11:04am EDT

OK, folks. Again, I’m very sorry for the confusion. The VM is now updated and available to download from the URL you received in the email from O’Reilly. Tested and confirmed to work on both Windows and Mac (the previous version was fine for Macs, but not for Windows boxes).

Ian Wrigley
09/23/2017 8:01am EDT

Many apologies — it looks like there was an issue when zipping the file on a Mac. I’m uploading a new version now, and I’ll comment again as soon as it’s in place.

09/23/2017 7:58am EDT

Is anyone having a problem unzipping the VM? I’ve tried using both 7-Zip and WinZip and am receiving CRC errors. I’ve downloaded it a few times, and the size and content have been the same each time, so I don’t think it’s a transmission issue.

09/23/2017 5:33am EDT

About the virtual machine download: is the file corrupted? I’ve tried downloading it several times and always get a CRC error when extracting the zip file.