Presented By O’Reilly and Cloudera

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Processing fast data with Apache Spark: A tale of two APIs

Gerard Maas (Lightbend)

11:15–11:55 Wednesday, 23 May 2018

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9
Level: Intermediate

Who is this presentation for?

Software engineers, data engineers, and enterprise architects

Prerequisite knowledge

Familiarity with Apache Spark and streaming applications

What you'll learn

Explore the capabilities of Spark's APIs for streaming and their key differences
Learn how to make the right choice for an application and how to architect and develop streaming pipelines that use one or both APIs to fulfill their requirements

Description

Fast data architectures provide an answer to enterprises’ increasing need to process and analyze continuous streams of data, which helps accelerate decision making and enables faster responses to changing characteristics of their market. Apache Spark is a popular framework for data analytics. Its capabilities in the streaming domain are represented by two APIs: the low-level Spark Streaming and the more declarative Structured Streaming, which builds upon the recent advances in Spark SQL query optimization and code generation.

Gerard Maas offers a critical overview of the differences between these APIs, covering the developer experience, the handling of time and state, and machine learning capabilities, and shares practical guidance on picking one or combining both to implement resilient streaming pipelines.

Topics include:

How to get started (ease of development)
How to deal with time (both at the processing and event level)
How to deal with state (locally, distributed, and its relation to time)
How to migrate (functional coding strategies)
How to do ML (machine learning capabilities)
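
To make the distinction between processing time and event time concrete, here is a minimal plain-Python sketch. It is not taken from the talk and uses no Spark API. The events, the 10-second window size, and the tumbling_windows helper are all hypothetical, chosen only to show how one late-arriving event lands in a different window depending on which notion of time is used.

```python
from collections import defaultdict

# Hypothetical events: (event_time_s, arrival_time_s, value).
# The second event was produced at t=4 but arrived late, at t=12.
EVENTS = [
    (1, 2, "a"),
    (4, 12, "b"),
    (11, 13, "c"),
]

WINDOW_SECONDS = 10  # assumed tumbling-window size


def tumbling_windows(events, time_of):
    """Group event values into tumbling windows keyed by window start time."""
    windows = defaultdict(list)
    for event in events:
        start = (time_of(event) // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[start].append(event[2])
    return dict(windows)


# Event time: windows reflect when each event happened.
by_event_time = tumbling_windows(EVENTS, time_of=lambda e: e[0])
# Processing time: windows reflect when each event reached the system.
by_processing_time = tumbling_windows(EVENTS, time_of=lambda e: e[1])

print(by_event_time)       # {0: ['a', 'b'], 10: ['c']}
print(by_processing_time)  # {0: ['a'], 10: ['b', 'c']}
```

In this sketch the late event "b" is counted in the first window under event time but slips into the second window under processing time, which is the kind of difference a streaming pipeline has to account for.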

Gerard Maas

Lightbend

Gerard Maas is a senior software engineer at Lightbend, where he contributes to the Fast Data Platform and focuses on the integration of stream processing technologies. Previously, he held leading roles at several startups and large enterprises, building data science governance, cloud-native IoT platforms, and scalable APIs. He is the coauthor of Stream Processing with Apache Spark from O’Reilly. Gerard is a frequent speaker and contributes to small and large open source projects. In his free time, he tinkers with drones and builds personal IoT projects.
