Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Spark Streaming - The State of the Union, and Beyond

Tathagata Das (Databricks)
4:00pm–4:40pm Thursday, 02/19/2015
Spark in Action
Location: 210 C/G
Average rating: 4.17 (6 ratings)

Spark Streaming extends the core Apache Spark API to perform large-scale stream processing, which is revolutionizing the way Big “Streaming” Data applications are being written. It is being rapidly adopted by companies across various business verticals, for uses such as ad and social network monitoring, real-time analysis of machine data, and fraud and anomaly detection. These companies are adopting Spark Streaming mainly because:

  • Its simple, declarative, batch-like API makes large-scale stream processing accessible to non-scientists.
  • Its unified API and single processing engine (i.e. the Spark core engine) allow a single cluster and a single set of operational processes to cover the full spectrum of use cases: batch, interactive, and stream processing.
  • Its stronger, exactly-once semantics make it easier to express and debug complex business logic.

In this talk, I am going to elaborate on such adoption stories, highlighting interesting use cases of Spark Streaming in the wild. I am also going to talk about (and perhaps demonstrate) exciting new developments in Spark Streaming, such as the brand new Python API and “streaming” machine learning algorithms for simultaneous learning and prediction.

Tathagata Das
Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming and is currently employed at Databricks. Earlier, he spent time at the AMPLab at UC Berkeley, researching datacenter frameworks and networks with professors Scott Shenker and Ion Stoica.