Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Streaming sessions

9:00am - 5:00pm Monday, September 25 & Tuesday, September 26
SOLD OUT
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am - 5:00pm Monday, September 25 & Tuesday, September 26
SOLD OUT
Joseph Kambourakis (Databricks)
Joseph Kambourakis walks you through using Apache Spark to perform exploratory data analysis (EDA), developing machine learning pipelines, and using the APIs and algorithms available in the Spark MLlib DataFrames API.
9:00am - 12:30pm Tuesday, September 26, 2017
Ian Wrigley (StreamSets)
Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you'll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.
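The enrichment step in this pipeline amounts to joining each incoming event against the latest state of a database table. The plain-Python sketch below is only a rough illustration of that idea under stated assumptions. It does not use the Kafka Connect or Kafka Streams APIs. The `customer` and `order` events, their fields, and the `enrich` function are all hypothetical. Customer change events keep an in-memory table up to date, and each order is joined against whatever customer row is current when the order arrives.

```python
def enrich(events):
    """Join order events with the latest customer row seen so far.

    Hypothetical event shapes (not from the session):
      ("customer", (customer_id, name))        -> a change captured from the database
      ("order", (order_id, customer_id, amt))  -> an event to enrich
    """
    customers = {}  # table materialized from the customer change events
    enriched = []
    for kind, payload in events:
        if kind == "customer":
            customer_id, name = payload
            customers[customer_id] = name  # upsert: the latest row wins
        elif kind == "order":
            order_id, customer_id, amount = payload
            # Look up the customer as of this moment; None if not yet seen.
            enriched.append((order_id, customers.get(customer_id), amount))
    return enriched


events = [
    ("customer", (1, "Ada")),
    ("order", (100, 1, 25.0)),
    ("customer", (1, "Ada L.")),  # the row changes in the source database
    ("order", (101, 1, 40.0)),
    ("order", (102, 2, 10.0)),    # no matching customer yet
]
print(enrich(events))
```

Because the join uses the table state at arrival time, order 100 sees the old customer name and order 101 sees the updated one. That time-dependent behavior is what separates a streaming join from a one-off batch join.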
1:30pm - 5:00pm Tuesday, September 26, 2017
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Avrilia Floratou (Microsoft), Ashvin Agrawal (Microsoft), Arun Kejariwal (MZ), Sijie Guo (Streamlio)
Karthik Ramasamy, Sanjeev Kulkarni, Avrilia Floratou, Ashvin Agrawal, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
11:20am - 12:00pm Wednesday, September 27, 2017
Data Engineering & Architecture, Stream processing and analytics
Location: 1E 07/08 Level: Intermediate
Dean Wampler (Lightbend)
Stream processing is now popular, and streaming architectures must be more reliable and scalable than ever before, more like microservice architectures, in fact. Dean Wampler defines "stream" in terms of the characteristics such systems need, using specific tools like Kafka, Spark, Flink, and Akka as examples, and argues that big data and microservice architectures are converging.
1:15pm - 1:55pm Wednesday, September 27, 2017
Michael Freedman (TimescaleDB | Princeton)
Michael Freedman offers an overview of TimescaleDB, a new open source, scale-out database that is designed for time series workloads yet engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
2:05pm - 2:45pm Wednesday, September 27, 2017
Data Engineering & Architecture, Stream processing and analytics
Location: 1E 07/08 Level: Intermediate
Dustin Cote (Confluent)
Dustin Cote shares his experience troubleshooting Apache Kafka in production environments and explains how to avoid pitfalls like message loss or performance degradation in your environment.
2:05pm - 2:45pm Wednesday, September 27, 2017
Data Engineering & Architecture, Real-time applications
Location: 1A 23/24 Level: Intermediate
Todd Lipcon (Cloudera)
To date, mutable big data storage has primarily been the domain of nonrelational (NoSQL) systems such as Apache HBase. However, demand for real-time analytic architectures has led big data back to a familiar friend: relationally structured data storage systems. Todd Lipcon explores the advantages of relational storage and reviews new developments, including Google Cloud Spanner and Apache Kudu.
4:35pm - 5:15pm Wednesday, September 27, 2017
Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
Companies are increasingly interested in processing and analyzing live-streaming data. The Hadoop ecosystem includes platforms and software library frameworks to support this work, but these components require correct architecture, performance tuning, and customization. Stephen Devine and Kalah Brown explain how they used Spark, Flume, and Kafka to build a live-streaming data pipeline.
4:35pm - 5:15pm Wednesday, September 27, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Paul Curtis (MapR Technologies)
A microservices architecture benefits from the agility of containers for convenient, predictable deployment of applications, while persistent, performant message streaming makes both work better. Paul Curtis explores these infrastructure components and discusses the design of highly scalable real-world systems that take advantage of this powerful triad.
4:35pm - 5:15pm Wednesday, September 27, 2017
Data Engineering & Architecture, Stream processing and analytics
Location: 1E 07/08 Level: Intermediate
Fabian Hueske (data Artisans)
Although SQL is the most widely used language for data analysis, open source stream processors have been slow to adopt it. One reason is that SQL's semantics and syntax were not designed with streaming data in mind. Fabian Hueske explores Apache Flink's two relational APIs for streaming analytics (standard SQL and the LINQ-style Table API), discussing their semantics and showcasing their usage.
5:25pm - 6:05pm Wednesday, September 27, 2017
Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University)
Modern enterprises are data driven and want to move at light speed. To achieve real-time performance, financial applications use streaming infrastructures for low latency and high throughput. Twitter Heron is an open source streaming engine with a latency of around 14 ms. Karthik Ramasamy and Supun Kamburugamuve explain how they ported Heron to InfiniBand to achieve latencies as low as 7 ms.
11:20am - 12:00pm Thursday, September 28, 2017
Reuven Lax (Google)
Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. Reuven Lax offers an overview of Beam basic concepts and demonstrates that portability in action.
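The idea that makes out-of-order processing work is grouping records by event time, meaning when they happened, instead of by arrival time. The plain-Python sketch below shows fixed event-time windows in their simplest form. It is not the Beam SDK, and the window size and the sample events are hypothetical. Real Beam pipelines also rely on watermarks and triggers to decide when a window's result is emitted, and this sketch leaves both out.

```python
from collections import defaultdict


def window_start(event_time, size):
    """Start of the fixed window (of `size` seconds) containing event_time."""
    return event_time - (event_time % size)


def count_per_window(events, size=60):
    """Count events per fixed event-time window.

    `events` is a list of (event_time_seconds, value) in *arrival* order,
    which may differ from event-time order.
    """
    counts = defaultdict(int)
    for event_time, _value in events:
        # Assign by when the event happened, not by when it arrived.
        counts[window_start(event_time, size)] += 1
    return dict(sorted(counts.items()))


# Hypothetical events arriving out of order (t=30 arrives after t=70).
events = [(5, "a"), (70, "b"), (30, "c"), (125, "d"), (61, "e")]
print(count_per_window(events))
```

The late event at t=30 still lands in the [0, 60) window. A streaming engine has to track how complete each window is, because it cannot see the whole input up front the way this batch loop can.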
11:20am - 12:00pm Thursday, September 28, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 12/14 Level: Intermediate
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
11:20am - 12:00pm Thursday, September 28, 2017
Gwen Shapira (Confluent)
Gwen Shapira explains how the three realities of modern programming—the explosion of data and data systems, building business processes as microservices instead of monolithic applications, and the rise of the public cloud—affect how developers and companies operate today and why companies across all industries are turning to streaming data and Apache Kafka for mission-critical applications.
11:20am - 12:00pm Thursday, September 28, 2017
Michael Crutcher (Cloudera), Ryan Lippert (Cloudera)
A long time ago in a data center far, far away, we deployed complex lambda architectures as the backbone of our IoT solutions. Though hard, they enabled collection of real-time sensor data and slightly delayed analytics. Michael Crutcher and Ryan Lippert explain why Apache Kudu, a relational storage layer for fast analytics on fast data, is the key to unlocking the value in IoT data.
11:20am - 12:00pm Thursday, September 28, 2017
Dean Wampler (Lightbend), Jun Rao (Confluent), Karthik Ramasamy (Streamlio), Pramod Immaneni (DataTorrent)
In a series of three 11-minute presentations, key members of the Apache Kafka, Heron, and Apache Apex projects discuss their respective implementations of exactly-once delivery and semantics.
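A common building block for exactly-once semantics is combining at-least-once delivery (retry until acknowledged) with idempotent processing, so that a redelivered message has no additional effect. The sketch below shows that general idea by deduplicating on a message ID. It is a simplified illustration under stated assumptions and does not describe how Kafka, Heron, or Apex actually implement exactly-once. The message IDs, amounts, and the `apply_effectively_once` function are hypothetical.

```python
def apply_effectively_once(deliveries):
    """Sum message amounts, applying each message ID's effect only once.

    `deliveries` is a list of (message_id, amount) as delivered with
    at-least-once semantics, so retries can produce duplicates.
    """
    seen = set()  # IDs whose effect has already been applied
    total = 0
    for message_id, amount in deliveries:
        if message_id in seen:
            continue  # duplicate from a retry: skip so it is not counted twice
        seen.add(message_id)
        total += amount
    return total


# Hypothetical deliveries: m2 and m1 are each redelivered once.
deliveries = [("m1", 10), ("m2", 5), ("m2", 5), ("m3", 1), ("m1", 10)]
print(apply_effectively_once(deliveries))
```

In a real system, the `seen` state and the output must be persisted atomically, or a crash between the two can still cause loss or duplication. Making that coordination efficient is where the engineering differences between systems lie.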
1:15pm - 1:55pm Thursday, September 28, 2017
Data Engineering & Architecture, Real-time applications
Location: 1E 09 Level: Beginner
Matteo Merli (Streamlio), Sijie Guo (Streamlio)
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
1:15pm - 1:55pm Thursday, September 28, 2017
Data Engineering & Architecture, Stream processing and analytics
Location: 1E 07/08 Level: Intermediate
Tyler Akidau (Google)
What does it mean to execute streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing? And how does all of this relate to the programmatic frameworks we’re all familiar with? Tyler Akidau answers these questions and more as he walks you through key concepts underpinning data processing in general.
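One common way to frame the question "are streams and tables the same thing?" is the stream-table duality. Applying a changelog stream of updates yields a table, and recording how a table changes over time yields a stream. The plain-Python sketch below shows both directions. The function names are hypothetical, and using `None` as a deletion marker is an assumption of this sketch, not a feature of any particular system. It illustrates the concept and does not claim to be the session's own formulation.

```python
def stream_to_table(changelog):
    """Fold a changelog of (key, value) updates into a table (dict).

    A value of None marks a deletion in this sketch.
    """
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value  # the latest update for a key wins
    return table


def table_to_stream(snapshots):
    """Derive a changelog from successive snapshots of a table."""
    changes = []
    previous = {}
    for snapshot in snapshots:
        # Keys that disappeared become deletions.
        for key in sorted(previous.keys() - snapshot.keys()):
            changes.append((key, None))
        # New or changed keys become updates.
        for key, value in snapshot.items():
            if previous.get(key) != value:
                changes.append((key, value))
        previous = snapshot
    return changes


snapshots = [{"a": 1}, {"a": 1, "b": 2}, {"a": 3}]
changelog = table_to_stream(snapshots)
print(changelog)
print(stream_to_table(changelog))
```

Replaying the derived changelog reconstructs the final snapshot, which is the sense in which the two views hold the same information.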
2:55pm - 3:35pm Thursday, September 28, 2017
Stream processing and analytics
Location: 1E 15/16 Level: Intermediate
Sahaana Suri (Stanford University)
Sahaana Suri offers an overview of MacroBase, a new analytics engine from Stanford designed to prioritize the scarcest resource in large-scale, fast-moving data streams: human attention. MacroBase allows reconfigurable, real-time root-cause analyses that have already diagnosed issues in production streams in mobile, data center, and industrial applications.
2:55pm - 3:35pm Thursday, September 28, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 12/14 Level: Intermediate
Josh Patterson (Skymind), Kirit Basu (StreamSets)
Enterprises building data lakes often have to deal with very large volumes of image data that they have collected over the years. Josh Patterson and Kirit Basu explain how some of the most sophisticated big data deployments are using convolutional neural nets to automatically classify images and add rich context about the content of the image, in real time, while ingesting data at scale.
2:55pm - 3:35pm Thursday, September 28, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1E 07/08 Level: Intermediate
Tim Berglund (Confluent)
Tim Berglund offers a thorough introduction to the Streams API, an important recent addition to Kafka that lets us build sophisticated stream processing systems that are as scalable and fault tolerant as Kafka itself—and also happen to align quite well with the microservices sensibilities that are so common in contemporary architectural thinking.
4:35pm - 5:15pm Thursday, September 28, 2017
Services such as YouTube, Netflix, and Spotify popularized streaming in different industry segments, but these services do not center around live data—best exemplified by sensor data—which will be increasingly important in the future. Arun Kejariwal, Francois Orsini, and Dhruv Choudhary demonstrate how to leverage Satori to collect, discover, and react to live data feeds at ultralow latencies.