Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Taking Spark Streaming to the next level with DataFrames

Tathagata Das (Databricks)
11:50am–12:30pm Thursday, 03/31/2016
Spark & Beyond

Location: 210 A/E
Tags: real-time
Average rating: 4.54 (13 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of streaming architecture.


Tathagata Das introduces Streaming DataFrames, the next evolution of Spark Streaming. Started in 2012, Spark Streaming was one of the first projects to unify batch and streaming. The design of Streaming DataFrames draws on users’ experiences deploying streaming systems to add a dimension missing from many existing streaming systems: interactive analysis. Streaming DataFrames also provide enhanced support for out-of-order (delayed) data, zero-latency decision making, and integration with existing enterprise data warehouses. Tathagata explores common patterns for streaming applications and discusses how Streaming DataFrames can simplify many real-world use cases.

Tathagata Das


Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming, which he started while a PhD student in the UC Berkeley AMPLab, and he currently works at Databricks. Prior to Databricks, Tathagata worked at the AMPLab, conducting research on data-center frameworks and networks with Scott Shenker and Ion Stoica.