Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

In-Person Training
Real-time systems with Spark Streaming and Kafka

Jesse Anderson (Big Data Institute)

Monday, March 5 & Tuesday, March 6, 9:00am - 5:00pm

Big data and data science in the cloud, Data engineering and architecture
Location: 114

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.

To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.

What you'll learn, and how you can apply it

Learn how to ingest data, process it, analyze it, and display it in real time in a dashboard with Apache Kafka and Apache Spark

Prerequisites:

A working knowledge of HDFS and Spark (i.e., Spark batch APIs)

Real-time big data frameworks are enabling brand-new use cases, while the cloud is letting us do things cheaper and faster than ever. Together, they’re making it easier to create production real-time systems. But to handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data?

Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Focusing on Apache Kafka and Apache Spark, Jesse also demonstrates how to ingest data, process it, analyze it, and display it in real time in a dashboard.

For the final exercise, you’ll take data that has been ingested with Kafka, process it with Spark Streaming, and visualize it on a web page with D3. This video gives a little more information about the final exercise so you can see the skills you’ll take away from the class.

About your instructor

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He’s taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He’s widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments on this page are now closed.

Comments

Manjusha Bolishetty | BUSINESS INTELLIGENCE ANALYST

03/05/2018 4:03am PST

Someone had a question during the session about Apache Kafka versus Apache Pulsar. I found this useful:
https://streaml.io/blog/pulsar-streaming-queuing/

Jesse Anderson | MANAGING DIRECTOR

01/29/2018 5:37pm PST

Looks like this session will be in Java?
Yes, see this post for why

I am not new to Java, but I do not use it day to day and have not used it regularly in the past either. I have some understanding. I am not sure if I will need some prep before taking this session.

You should make sure you have an intermediate level of Java knowledge. Big Data isn’t going to stress your knowledge of Java syntax.

Will we be doing the exercise of ingesting data, processing it, analyzing it, and displaying it in real time in a dashboard with Apache Kafka and Apache Spark during the session from scratch?

Yes, the video you watched shows the final exercise where we are doing everything from scratch. The VM has everything set up so you’re not worrying about installing and configuring services.

Manjusha Bolishetty | BUSINESS INTELLIGENCE ANALYST

01/29/2018 6:56am PST

Sorry, I had some more questions.

I am not new to Java, but I do not use it day to day and have not used it regularly in the past either. I have some understanding. I am not sure if I will need some prep before taking this session.
Will we be doing the exercise of ingesting data, processing it, analyzing it, and displaying it in real time in a dashboard with Apache Kafka and Apache Spark during the session from scratch?

Manjusha Bolishetty | BUSINESS INTELLIGENCE ANALYST

01/29/2018 6:34am PST

Hi Jesse, I had a quick glance at the video linked in the description, and it looks like this session will be in Java?

Jesse Anderson | MANAGING DIRECTOR

01/23/2018 1:14am PST

I’ll be distributing the materials the day of the class. There will be some pre-class preparations. Be on the lookout for that email. You will need your laptop for the class.

Carlos Perales | HADOOP DEVELOPER

01/22/2018 5:39am PST

Hi
What about the material?
Can I bring my laptop?

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
