Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Processing Fast Data with Apache Spark: The Tale of Two APIs

Gerard Maas (Lightbend)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9
Level: Intermediate

Who is this presentation for?

Software engineers, data engineers, and enterprise architects

Prerequisite knowledge

Familiarity with the subject: some knowledge of Apache Spark and a general interest in streaming applications.

What you'll learn

Attendees will leave the session with an overall understanding of the capabilities of the two Spark streaming APIs and their key differences. They will learn how to choose the right API for an application and how to architect and develop streaming pipelines that use one or both APIs to fulfill their requirements.


Fast Data architectures answer the enterprise's increasing need to process and analyze continuous streams of data, which accelerates decision making and enables faster responses to changing market conditions. Apache Spark is a popular framework for data analytics. Its capabilities in the streaming domain are represented by two APIs: the low-level Spark Streaming and the more declarative Structured Streaming, which builds upon the recent advances in Spark SQL query optimization and code generation.

After a quick introduction to both APIs, we will discuss their virtues, capabilities and key differences:

- How to get started: ease of development
- How to deal with time: at both the processing and event levels
- How to deal with state: local state, distributed state, and its relation to time
- How to migrate: functional coding strategies
- How to do ML: machine learning capabilities

Using practical examples from actual applications, we will provide guidance on how to choose one or even combine both APIs to implement functional and resilient streaming pipelines.

Gerard Maas


Gerard Maas contributes to Lightbend Fast Data Platform as a Senior SW Engineer, where he focuses on the integration of stream processing technologies. Previously, he held leading roles at several startups and large enterprises, building data science governance, cloud-native IoT platforms, and scalable APIs.
He enjoys giving tech talks, contributing to small and large open source projects, tinkering with drones and building personal IoT projects.
Gerard is the co-author of ‘Learning Spark Streaming’, a book from O’Reilly Media.
