Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Streaming systems & real-time applications sessions

9:00am–12:30pm Tuesday, 09/11/2018

Stream processing with Kafka and KSQL

Location: 1E 07/08 Level: Intermediate

Tim Berglund (Confluent)

Average rating: 4.33 (3 ratings)

Tim Berglund leads this solid introduction to Apache Kafka as a streaming data platform. You'll cover the internal architecture, APIs, and platform components like Kafka Connect and Kafka Streams, then finish with an exercise processing streaming data using KSQL, the new SQL-like declarative stream processing language for Kafka.

9:00am–12:30pm Tuesday, 09/11/2018

Designing modern streaming data applications

Location: 1E 12/13 Level: Intermediate

Secondary topics: Data Platforms

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)

Average rating: 3.12 (8 ratings)

Arun Kejariwal and Karthik Ramasamy lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline, covering messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. They also share case studies from IoT, gaming, and healthcare, along with their experience operating these systems at internet scale.

1:30pm–5:00pm Tuesday, 09/11/2018

Hands-on Kafka streaming microservices with Akka Streams and Kafka Streams

Location: 1A 23/24 Level: Intermediate

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

Average rating: 3.67 (3 ratings)

Dean Wampler and Boris Lublinsky walk you through building streaming apps as microservices using Akka Streams and Kafka Streams. Dean and Boris discuss the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you'll know when to choose them instead. You'll also discover a few ML model serving ideas along the way.

11:20am–12:00pm Wednesday, 09/12/2018

Processing fast data with Apache Spark: A tale of two APIs

Location: 1E 07/08 Level: Intermediate

Gerard Maas (Lightbend)

Average rating: 5.00 (1 rating)

Apache Spark has two streaming APIs: Spark Streaming and Structured Streaming. Gerard Maas offers a critical overview of their differences with regard to key aspects of a streaming application: API usability, dealing with time, dealing with state, machine learning capabilities, and more. You'll learn when to pick one over the other or combine both to implement resilient streaming pipelines.

1:15pm–1:55pm Wednesday, 09/12/2018

Why and how to leverage the power and simplicity of SQL on Apache Flink

Location: 1E 07/08 Level: Intermediate

Fabian Hueske (Ververica)

Average rating: 5.00 (1 rating)

Fabian Hueske discusses why SQL is a great approach to unify batch and stream processing. He gives an update on Apache Flink's SQL support and shares some interesting use cases from large-scale production deployments. Finally, Fabian presents Flink's new query service that enables users and applications to submit streaming and batch SQL queries and retrieve low-latency updated results.

2:05pm–2:45pm Wednesday, 09/12/2018

Building Fabric Answers using Apache Heron

Location: 1E 07/08 Level: Beginner

Karthik Ramasamy (Streamlio), Andrew Jorgensen (Google)

Average rating: 4.00 (1 rating)

Streaming systems like Apache Heron are being used for an increasingly broad array of applications. Karthik Ramasamy and Andrew Jorgensen offer an overview of Fabric Answers, which provides real-time insights to mobile developers to improve their product experience at Google Fabric using Apache Heron.

2:55pm–3:35pm Wednesday, 09/12/2018

Streaming big data in the cloud: What to consider and why

Location: 1E 07/08 Level: Intermediate

Bill Chambers (Databricks)

Average rating: 3.00 (1 rating)

Streaming big data is a rapidly growing field but currently involves a lot of operational complexity and expertise. Bill Chambers shares a decision-making framework for determining the best tools and technologies for successfully deploying and maintaining streaming data pipelines to solve business problems and offers an overview of Apache Spark’s Structured Streaming processing engine.

4:35pm–5:15pm Wednesday, 09/12/2018

AppNexus's stream-based control system for automated buying of digital ads

Location: 1E 07/08 Level: Intermediate

Brian Wu (AppNexus)

Average rating: 5.00 (1 rating)

Automating the success of digital ad campaigns is complicated and comes with the risk of wasting the advertiser's budget or a trader's margin and time. Brian Wu describes the evolution of Inventory Discovery, a streaming control system of eligibility, prioritization, and real-time evaluation that helps digital advertisers hit their performance goals with AppNexus.

4:35pm–5:15pm Wednesday, 09/12/2018

Architectural principles for building trusted, real-time, distributed IoT systems

Location: Expo Hall Level: Intermediate

Secondary topics: Blockchain and decentralization, Data Platforms

Dan Harple (Context Labs)

Dan Harple explains how distributed systems are being influenced by and are influencing operational, financial, and social impact requirements of a wide range of enterprises and how trust in these distributed systems is being challenged, elevated, and resolved by engineers and architects today.

5:25pm–6:05pm Wednesday, 09/12/2018

Hudi: Unifying storage and serving for batch and near-real-time analytics

Location: 1E 07/08 Level: Beginner

Secondary topics: Data Integration and Data Pipelines

Nishith Agarwal (Uber), Balaji Varadarajan (Uber), Vinoth Chandar (Apache Hudi)

Uber has a real need to provide faster, fresher data to its data consumers and products, which are running hundreds of thousands of analytical queries every day. Nishith Agarwal, Balaji Varadarajan, and Vinoth Chandar share the design, architecture, and use cases of the second generation of Hudi, an analytical storage engine designed to serve such needs and beyond.

11:20am–12:00pm Thursday, 09/13/2018

Near-real-time anomaly detection at Lyft

Location: 1E 07/08 Level: Beginner

Secondary topics: Temporal data and time-series analytics, Transportation and Logistics

Thomas Weise (Lyft), Mark Grover (Lyft)

Average rating: 2.50 (2 ratings)

Thomas Weise and Mark Grover explain how Lyft uses its streaming platform to detect and respond to anomalous events, using data science tools for machine learning and a process that allows for fast and predictable deployment.

1:10pm–1:50pm Thursday, 09/13/2018

A deep dive into Kafka controller

Location: 1E 07/08 Level: Intermediate

Jun Rao (Confluent)

Average rating: 4.00 (1 rating)

The controller is the brain of Apache Kafka and is responsible for maintaining the consistency of the replicas. Jun Rao outlines the main data flow in the controller, then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.

2:00pm–2:40pm Thursday, 09/13/2018

Executive Briefing: What you need to know about fast data

Location: 1E 14 Level: Non-technical

Dean Wampler (Anyscale)

Streaming data systems, so-called "fast data," promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler shares what you need to know to exploit fast data successfully.

3:30pm–4:10pm Thursday, 09/13/2018

Kafka at PayPal: Enabling 400 billion messages a day

Location: 1E 09 Level: Intermediate

Secondary topics: Data Integration and Data Pipelines, Data Platforms, Financial Services

Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal)

Average rating: 4.00 (3 ratings)

PayPal is one of the biggest Kafka users in the industry; it manages and maintains over 40 production Kafka clusters in three geodistributed data centers and supports 400 billion Kafka messages a day. Kevin Lu, Maulin Vasavada, and Na Yang explore the management and monitoring PayPal applies to Kafka, from client-perceived statistics to configuration management, failover, and data loss auditing.
