Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Real-time Systems with Spark Streaming and Kafka (Day 2)

Jesse Anderson (Big Data Institute)
Secondary topics: Architecture, Cloud, Streaming

Real-time big data frameworks are enabling brand-new use cases, while the cloud is letting us do things cheaper and faster than ever. Together, they’re making it easier to create production real-time systems. But to handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data?

Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Focusing on Apache Kafka and Apache Spark, Jesse also demonstrates how to ingest data, process it, analyze it, and display it in real time with a dashboard.

Prerequisites: To gain the most from the workshop, participants should have a working knowledge of HDFS and Spark. Detailed API-level knowledge of Spark Streaming is not required, but participants will need to know the Spark batch APIs. The workshop includes extensive programming activities.

Jesse Anderson

Big Data Institute

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data, including cutting-edge technologies like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students, at companies ranging from startups to the Fortune 100, the skills they need to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at