Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Streaming systems & real-time applications sessions

9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Tim Berglund (Confluent)
Average rating: 4.33 (3 ratings)
Tim Berglund leads this solid introduction to Apache Kafka as a streaming data platform. You'll cover the internal architecture, APIs, and platform components like Kafka Connect and Kafka Streams, then finish with an exercise processing streaming data using KSQL, the new SQL-like declarative stream processing language for Kafka.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 12/13 Level: Intermediate
Secondary topics: Data Platforms
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: 3.12 (8 ratings)
Arun Kejariwal and Karthik Ramasamy lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline, covering messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. They also share case studies from IoT, gaming, and healthcare, as well as their experience operating these systems at internet scale.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 23/24 Level: Intermediate
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Average rating: 3.67 (3 ratings)
Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose those alternatives instead. You'll also discover a few ML model serving ideas along the way.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Gerard Maas (Lightbend)
Average rating: 5.00 (1 rating)
Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences with regard to key aspects of a streaming application: API usability, dealing with time, dealing with state, machine learning capabilities, and more. You'll learn when to pick one over the other or combine both to implement resilient streaming pipelines.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Fabian Hueske (Ververica)
Average rating: 5.00 (1 rating)
Fabian Hueske discusses why SQL is a great approach to unify batch and stream processing. He gives an update on Apache Flink's SQL support and shares some interesting use cases from large-scale production deployments. Finally, Fabian presents Flink's new query service that enables users and applications to submit streaming and batch SQL queries and retrieve low-latency updated results.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Beginner
Karthik Ramasamy (Streamlio), Andrew Jorgensen (Google)
Average rating: 4.00 (1 rating)
Streaming systems like Apache Heron are being used for an increasingly broad array of applications. Karthik Ramasamy and Andrew Jorgensen offer an overview of Fabric Answers, a Google Fabric service built on Apache Heron that gives mobile developers real-time insights for improving their product experience.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Bill Chambers (Databricks)
Average rating: 3.00 (1 rating)
Streaming big data is a rapidly growing field, but it currently demands a lot of operational expertise and involves considerable complexity. Bill Chambers shares a decision-making framework for determining the best tools and technologies for successfully deploying and maintaining streaming data pipelines to solve business problems and offers an overview of Apache Spark’s Structured Streaming processing engine.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Intermediate
Brian Wu (AppNexus)
Average rating: 5.00 (1 rating)
Automating the success of digital ad campaigns is complicated and comes with the risk of wasting the advertiser's budget or a trader's margin and time. Brian Wu describes the evolution of Inventory Discovery, a streaming control system of eligibility, prioritization, and real-time evaluation that helps digital advertisers hit their performance goals with AppNexus.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: Expo Hall Level: Intermediate
Secondary topics: Blockchain and decentralization, Data Platforms
Dan Harple (Context Labs)
Dan Harple explains how distributed systems both shape and are shaped by the operational, financial, and social impact requirements of a wide range of enterprises, and how trust in these distributed systems is being challenged, elevated, and resolved by engineers and architects today.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 07/08 Level: Beginner
Secondary topics: Data Integration and Data Pipelines
Nishith Agarwal (Uber), Balaji Varadarajan (Uber), Vinoth Chandar (Apache Hudi)
Uber has a real need to provide faster, fresher data to its data consumers and products, which are running hundreds of thousands of analytical queries every day. Nishith Agarwal, Balaji Varadarajan, and Vinoth Chandar share the design, architecture, and use cases of the second generation of Hudi, an analytical storage engine designed to serve such needs and beyond.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Beginner
Secondary topics: Temporal data and time-series analytics, Transportation and Logistics
Thomas Weise (Lyft), Mark Grover (Lyft)
Average rating: 2.50 (2 ratings)
Thomas Weise and Mark Grover explain how Lyft uses its streaming platform to detect and respond to anomalous events, using data science tools for machine learning and a process that allows for fast and predictable deployment.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 07/08 Level: Intermediate
Jun Rao (Confluent)
Average rating: 4.00 (1 rating)
The controller is the brain of Apache Kafka and is responsible for maintaining the consistency of the replicas. Jun Rao outlines the main data flow in the controller, then describes recent improvements that allow it to handle certain edge cases correctly and increase its performance, which in turn allows for more partitions in a Kafka cluster.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 14 Level: Non-technical
Dean Wampler (Anyscale)
Streaming data systems, so-called "fast data," promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler shares what you need to know to exploit fast data successfully.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 09 Level: Intermediate
Secondary topics: Data Integration and Data Pipelines, Data Platforms, Financial Services
Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal)
Average rating: 4.00 (3 ratings)
PayPal is one of the biggest Kafka users in the industry; it manages and maintains over 40 production Kafka clusters in three geodistributed data centers and supports 400 billion Kafka messages a day. Kevin Lu, Maulin Vasavada, and Na Yang explore the management and monitoring PayPal applies to Kafka, from client-perceived statistics to configuration management, failover, and data loss auditing.