San FranciscoLondonNew York

Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Schedule: Streaming and realtime analytics sessions

9:00 - 17:00 Monday, 29 April & Tuesday, 30 April

Professional Kafka development

Data Engineering and Architecture
Location: London Suite 2

Jesse Anderson (Big Data Institute)

Average rating:

(5.00, 1 rating)

Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL. Read more.

9:00–12:30 Tuesday, 30 April 2019

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Data Engineering and Architecture
Location: Capital Suite 11

Robin Moffatt (Confluent)

Average rating:

(5.00, 5 ratings)

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You'll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL. Read more.

13:30–17:00 Tuesday, 30 April 2019

Architecture and algorithms for end-to-end streaming data processing

Data Engineering and Architecture, Streaming and IoT
Location: S11 A

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Average rating:

(3.00, 10 ratings)

Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage—for real-time data and algorithms to extract insights (e.g., heavy hitters and quantiles) from data streams. Read more.

13:30–17:00 Tuesday, 30 April 2019

Hands-on machine learning with Kafka-based streaming pipelines

Streaming and IoT
Location: Capital Suite 10

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Average rating:

(4.20, 5 ratings)

Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques. Read more.

11:15–11:55 Wednesday, 1 May 2019

Stream, stream, stream: Different streaming methods with Spark and Kafka

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Itai Yaffe (Nielsen)

Average rating:

(4.45, 11 ratings)

NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way. Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals. Read more.

12:05–12:45 Wednesday, 1 May 2019

Report card on streaming microservices

Data Engineering and Architecture, Expo Hall, Streaming and IoT
Location: Expo Hall 2 (Capital Hall N24)

Ted Dunning (MapR, now part of HPE)

Average rating:

(4.67, 6 ratings)

As a community, we have been pushing streaming architectures, particularly microservices, for several years now. But what are the results in the field? Ted Dunning shares several (anonymized) case histories, describing the good, the bad, and the ugly. In particular, Ted covers how several teams who were new to big data fared by skipping MapReduce and jumping straight into streaming. Read more.

14:05–14:45 Wednesday, 1 May 2019

Nielsen presents: Fun with Kafka, Spark, and offset management

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Simona Meriam (Nielsen)

Average rating:

(4.57, 7 ratings)

Simona Meriam explains how Nielsen Marketing Cloud (NMC) used to manage its Kafka consumer offsets against Spark-Kafka 0.8 consumer and why the company decided to upgrade from Spark-Kafka 0.8 to 0.10 consumer. Simona reviews the problems encountered during the upgrade and details the process that led to the solution. Read more.

14:55–15:35 Wednesday, 1 May 2019

Processing 10M samples a second to drive smart maintenance in complex IIoT systems

Data Engineering and Architecture, Expo Hall, Streaming and IoT
Location: Expo Hall 2 (Capital Hall N24)

Geir Engdahl (Cognite), Daniel Bergqvist (Google)

Average rating:

(4.00, 2 ratings)

Geir Engdahl and Daniel Bergqvist explain how Cognite is developing IIoT smart maintenance systems that can process 10M samples a second from thousands of sensors. You'll explore an architecture designed for high performance, robust streaming sensor data ingest, and cost-effective storage of large volumes of time series data as well as best practices learned along the way. Read more.

17:25–18:05 Wednesday, 1 May 2019

Mastering streaming and pipelines: Designing and supporting the nervous system of your company

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Ted Malaska (Capital One)

Average rating:

(4.12, 8 ratings)

The world of data is all about building the best path to support time and quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. Ted Malaska takes you on a journey to investigate strategies and designs that can change the way your company looks and approaches data. Read more.

17:25–18:05 Wednesday, 1 May 2019

Infinite retention using storage offloading with Apache Pulsar

Data Engineering and Architecture
Location: Capital Suite 4

Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

This talk discusses how Apache Pulsar provides infinite retention of events in topics. We will discuss how the segment oriented architecture allows unlimited topic growth, how you can keep costs down by using tiered storage and how you can run ad-hoc queries on the topic using SQL. Read more.

11:15–11:55 Thursday, 2 May 2019

Streaming at Lyft

Data Engineering and Architecture, Expo Hall, Streaming and IoT
Location: Expo Hall 2 (Capital Hall N24)

Thomas Weise (Lyft)

Average rating:

(4.50, 14 ratings)

Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Lyft's systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. Thomas Weise offers an overview of the streaming platform that powers these use cases. Read more.

12:05–12:45 Thursday, 2 May 2019

Schema on read and the new logging way

Data Engineering and Architecture
Location: S11 A

David Josephsen (Sparkpost)

Average rating:

(3.50, 2 ratings)

David Josephsen tells the story of how Sparkpost's reliability engineering team abandoned ELK for a DIY schema-on-read logging infrastructure. Join in to learn the architectural details, trials, and tribulations from the company's Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane. Read more.

14:05–14:45 Thursday, 2 May 2019

Simplicity at scale: How Cloudflare’s analyses some of the world’s largest DDoS attacks

Data Engineering and Architecture
Location: Capital Suite 10/11

Tom Walwyn (Cloudflare)

Average rating:

(4.00, 1 rating)

Cloudflare powers nearly 10 percent of all Internet requests worldwide, absorbing some of the largest DDoS attacks. Learn how we use ClickHouse and SQL to simplify our data pipelines on a global scale while experiencing over 10 million events per second. Read more.

14:55–15:35 Thursday, 2 May 2019

Performant time series data management and analytics with PostgreSQL

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Michael Freedman (TimescaleDB | Princeton University)

Average rating:

(4.75, 4 ratings)

Time series databases require ingesting high volumes of structured data, answering complex, performant queries for recent and historical time intervals, and performing specialized time-centric analysis and data management. Michael Freedman explains how to avoid these operational problems by reengineering Postgres to serve as a general data platform, including high-volume time series workloads. Read more.

16:35–17:15 Thursday, 2 May 2019

Executive Briefing: What it takes to use machine learning in fast data pipelines

Executive Briefing and best practices, Strata Business Summit
Location: Capital Suite 13

Dean Wampler (Anyscale)

Average rating:

(5.00, 4 ratings)

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed and covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations. Read more.

Presented by

Global Sponsors

Zettabyte Sponsor

Exabyte Sponsor

Impact Sponsors

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com