Presented By O’Reilly and Cloudera

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Streaming big data in the cloud: What to consider and why

Bill Chambers (Databricks), Michael Armbrust (Databricks)

11:50am–12:30pm Wednesday, March 7, 2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: 230 A

Secondary topics: Graphs and Time-series

Who is this presentation for?

Software engineers, data engineers, and streaming data engineers

Prerequisite knowledge

A basic understanding of Spark, streaming and stream processing concepts, and big data

What you'll learn

Explore considerations for leveraging Apache Spark's Structured Streaming processing engine

Description

Running streaming workloads successfully is a challenge regardless of whether you’re deploying on-premises or in the cloud. While buying a managed service is an option, it’s usually quite expensive. Therefore, many companies opt for open source streaming engines like Apache Spark’s Structured Streaming.

Apache Spark’s Structured Streaming consolidates all big data processing under a unified API. Built on the foundation of the Spark SQL engine, not only does Structured Streaming allow developers to express the same queries for batch as for streaming, but it also allows for different execution strategies for streaming processing, including microbatching for high throughput or continuous processing for low latency.
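The idea of one query with several execution strategies can be illustrated with a minimal sketch. This is plain Python, not Spark code and not the Structured Streaming API; the record shape and the names `count_by_device`, `run_batch`, `run_microbatch`, and `run_continuous` are hypothetical, chosen only for illustration. It assumes a simple counting query and shows that batch, microbatch, and record-at-a-time execution can share the same query logic and converge on the same answer.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (device_id, reading).
Event = Tuple[str, int]

# Hypothetical sample input used for illustration only.
EVENTS: List[Event] = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]


def count_by_device(events: Iterable[Event]) -> Counter:
    """The 'query': count events per device. Written once, reused below."""
    return Counter(device for device, _ in events)


def run_batch(events: List[Event]) -> Counter:
    """Batch execution: evaluate the query once over the full, bounded input."""
    return count_by_device(events)


def run_microbatch(stream: Iterable[Event], batch_size: int) -> Iterator[Counter]:
    """Microbatch execution: apply the same query to small chunks and fold
    each chunk's result into running state, emitting a snapshot per chunk."""
    state: Counter = Counter()
    chunk: List[Event] = []
    for event in stream:
        chunk.append(event)
        if len(chunk) == batch_size:
            state.update(count_by_device(chunk))
            yield Counter(state)  # copy, so later updates do not alter it
            chunk = []
    if chunk:  # flush a final, partially filled chunk
        state.update(count_by_device(chunk))
        yield Counter(state)


def run_continuous(stream: Iterable[Event]) -> Iterator[Counter]:
    """Record-at-a-time execution: update state on every event, trading
    per-record overhead for the earliest possible result."""
    state: Counter = Counter()
    for event in stream:
        state.update(count_by_device([event]))
        yield Counter(state)


if __name__ == "__main__":
    print(run_batch(EVENTS))
    print(list(run_microbatch(EVENTS, batch_size=2))[-1])
    print(list(run_continuous(EVENTS))[-1])
```

In this toy version, larger chunks amortize per-invocation overhead (throughput), while smaller chunks or per-record updates surface results sooner (latency). A real engine also has to handle fault tolerance, late data, and output semantics, none of which this sketch attempts.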

William Chambers and Michael Armbrust discuss the motivation and basics of Apache Spark’s Structured Streaming processing engine and share lessons they’ve learned running hundreds of Structured Streaming workloads in the cloud. Along the way, William and Michael dive deep into the internals of the Structured Streaming engine and explain why it’s suitable for a variety of use cases.

Topics include:

How to successfully create business value with streaming
What makes a successful streaming use case and what doesn’t
A decision framework for choosing a streaming engine and architecture
The key advantages of streaming in the cloud (both storage and compute)
How to leverage cloud storage like S3 and Azure Blob Store for streaming workloads
How to successfully monitor and maintain your streaming applications
Future development

Bill Chambers

Databricks

William Chambers is a product manager at Databricks, where he works on Structured Streaming and data science products. He is lead author of Spark: The Definitive Guide, coauthored with Matei Zaharia. Bill also created SparkTutorials.net as a way to teach Apache Spark basics. Bill holds a master’s degree in information management and systems from UC Berkeley’s School of Information. During his time at school, Bill also created the Data Analysis in Python with pandas course for Udemy and was cocreator of and first instructor for Python for Data Science, part of UC Berkeley’s Masters of Data Science program.

Michael Armbrust

Databricks

Michael Armbrust is the lead developer of the Spark SQL and Structured Streaming projects at Databricks. Michael’s interests broadly include distributed systems, large-scale structured storage, and query optimization. Michael holds a PhD from UC Berkeley, where his thesis focused on building systems that allow developers to rapidly build scalable interactive applications and specifically defined the notion of scale independence.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com