Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

In-Person Training
Real-time systems with Spark Streaming and Kafka

Jesse Anderson (Big Data Institute)
Monday, September 25 & Tuesday, September 26, 9:00am - 5:00pm
Secondary topics: Architecture, Cloud, Streaming

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.

To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.

What you'll learn, and how you can apply it

  • Learn how to ingest data, process it, analyze it, and display it in real time on a dashboard using Apache Kafka and Apache Spark

Prerequisites:

  • A working knowledge of HDFS and Spark (i.e., Spark batch APIs)

Real-time big data frameworks are enabling brand-new use cases, while the cloud is letting us do things cheaper and faster than ever. Together, they’re making it easier to create production real-time systems. But to handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data?

Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Focusing on Apache Kafka and Apache Spark, Jesse also demonstrates how to ingest data, process it, analyze it, and display it in real time with a dashboard.

For the final exercise, you’ll take data that has been ingested with Kafka, process it with Spark Streaming, and visualize it on a web page with D3. This video gives a little more information about the final exercise so you can see the skills you’ll take away from the class.

"Deep dive in Spark streaming, lots of content, not for the faint of heart." – Patrick Paul

"This class provided a lot of insight into real-time data engineering and integration into the cloud. It introduced technologies that I was unaware of. Jesse Anderson was very knowledgeable and presented the material very clearly and addressed all questions." – Richard Chabanne

About your instructor

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.

Twitter: @jessetanderson

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments

Jesse Anderson | MANAGING DIRECTOR
06/22/2017 6:40am EDT

@Ofer, here is a link to the syllabus. If you want an even more detailed one, I can give it to you.

This will be an advanced class. I’m expecting you to already know about Big Data and Spark batch.

I created a video showing the real-time dashboard we create at the end of the class. It uses Kafka, Spark Streaming, and D3.js to show real-time aggregations and analysis.

Ofer Cohen | R&D TEAM LEADER
06/22/2017 6:04am EDT

Is there any reference to a syllabus for this training? I’m looking for more advanced training since I have some experience writing real-time streaming applications with Kafka.