Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Streaming and realtime analytics sessions

9:00–17:00 Monday, 29 April & Tuesday, 30 April
Data Engineering and Architecture
Location: Capital Suite 16
Jesse Anderson (Big Data Institute)
This two-day course takes participants through an in-depth look at Apache Kafka. We show how Kafka works and how to create real-time systems with it, including how to write consumers and producers. Then we look at Kafka's ecosystem and how each component is used, covering Kafka Streams, Kafka Connect, and KSQL.
9:00–12:30 Tuesday, 30 April 2019
Data Engineering and Architecture
Location: Capital Suite 11
Robin Moffatt (Confluent)
In this workshop you will learn the architectural reasoning behind Apache Kafka and the benefits of real-time integration, and then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.
13:30–17:00 Tuesday, 30 April 2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial we lead the audience through the landscape of state-of-the-art systems for each stage of an end-to-end real-time data processing pipeline (messaging, compute, and storage), as well as algorithms for extracting insights, such as heavy hitters and quantiles, from data streams.
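To make the heavy-hitters problem concrete, here is a minimal sketch of the classic Misra–Gries summary, one well-known way to find frequent items in a single pass over a stream; the abstract does not say which algorithms the presenters cover, so this is an illustrative example only:

```python
def misra_gries(stream, k):
    """One-pass frequent-items sketch keeping at most k-1 counters.

    Any item occurring more than n/k times in a stream of n items is
    guaranteed to survive in the returned table (false positives are
    possible, false negatives are not).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Table is full: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Example: in a stream of 15 items, "a" occurs 6 times (> 15/3),
# so it must appear among the at most k-1 = 2 surviving counters.
survivors = misra_gries(["a"] * 6 + ["b"] * 4 + list("cdefg"), k=3)
```

A second counting pass over the stream is typically used to filter out the false positives the decrement step can leave behind.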
13:30–17:00 Tuesday, 30 April 2019
Streaming and IoT
Location: Capital Suite 2/3
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines, including how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
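One core idea the tutorial names, scoring a live stream while a retrained model is periodically swapped in, can be sketched as follows; the `ModelServer` class and the callable models are hypothetical stand-ins, not the presenters' actual code:

```python
import threading

class ModelServer:
    """Sketch: score streaming events with the current model while a
    retraining job can atomically swap in a new model at any time,
    without pausing the scoring path."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def update_model(self, new_model):
        # Called by the periodic retraining job.
        with self._lock:
            self._model = new_model

    def score(self, event):
        # Take a consistent reference, then score outside any long hold.
        with self._lock:
            model = self._model
        return model(event)

server = ModelServer(lambda x: x * 2)   # initial (toy) model
server.score(3)                         # scores with the current model
server.update_model(lambda x: x + 1)    # retrained model swapped in
```

In a real pipeline the events would arrive from Kafka and the models would be serialized TensorFlow or SparkML artifacts; the lock-guarded swap is the part this sketch illustrates.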
11:15–11:55 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Itai Yaffe (Nielsen)
At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores, and to do it in a scalable yet cost-efficient manner. In this talk, we will discuss how we continuously transform our data infrastructure to support these goals.
12:05–12:45 Wednesday, 1 May 2019
Ted Dunning (MapR)
As a community, we have been pushing streaming architectures, particularly microservices, for several years now. But what are the results in the field? I will present several (anonymized) case histories, covering the good, the bad, and the ugly. In particular, I will describe how several teams who were new to big data fared by skipping map-reduce and jumping straight into streaming.
14:05–14:45 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Simona Meriam (Nielsen)
Ingesting billions of events per day into our big data stores must be done in a scalable, cost-efficient, and consistent way. When working with Spark and Kafka, the way you manage your consumer offsets has major implications for data consistency. We will go in depth on the solution we ended up implementing and discuss the working process and the dos and don'ts that led to its final design.
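The abstract does not reveal Nielsen's implementation, but the consistency concern it names is commonly addressed by committing offsets only after data has landed in the store (at-least-once delivery). A minimal sketch of that pattern, using hypothetical in-memory stand-ins for the data store and the offset store:

```python
class OffsetAwareWriter:
    """Sketch of commit-after-write: persist a micro-batch first, then
    record the consumed offsets. A crash between the two steps replays
    the batch on restart (duplicates) rather than losing data."""

    def __init__(self, store, offset_store):
        self.store = store                # stand-in for the big data store
        self.offset_store = offset_store  # stand-in for committed offsets

    def process_batch(self, partition, records):
        # records: list of (offset, payload) pairs from one partition.
        self.store.extend(payload for _, payload in records)
        # Only after the write succeeds, advance the committed offset
        # to one past the last record consumed.
        last_offset = records[-1][0]
        self.offset_store[partition] = last_offset + 1

store, offsets = [], {}
writer = OffsetAwareWriter(store, offsets)
writer.process_batch(0, [(0, "event-a"), (1, "event-b")])
```

Committing in the opposite order (offsets before the write) would instead risk silent data loss on failure, which is the trade-off such designs must weigh.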
14:55–15:35 Wednesday, 1 May 2019
Geir Endahl (Cognite), Daniel Bergqvist (Google)
Learn how Cognite is developing IIoT smart maintenance systems that can process 10M samples/second from thousands of sensors. We'll review an architecture designed for high-performance, robust streaming sensor data ingest and cost-effective storage of large volumes of time series data, best practices for aggregation and fast queries, and achieving high performance with machine learning.
17:25–18:05 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Ted Malaska (Capital One)
In the world of data, it is all about building the best path to support time/quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. This talk will take us on a journey through different patterns and solutions that can work at the largest of companies.
11:15–11:55 Thursday, 2 May 2019
Thomas Weise (Lyft)
Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Our systems need to track and react to event streams in real time, to update locations, compute routes and estimates, balance prices, and more. The streaming platform at Lyft powers these use cases with development frameworks and a deployment stack based on Apache Flink and Beam.
12:05–12:45 Thursday, 2 May 2019
David Josephsen (Sparkpost)
This is the story of how Sparkpost Reliability Engineering abandoned ELK for a DIY Schema-On-Read logging infrastructure. We share architectural details and tribulations from our Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane.
14:55–15:35 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Erik Nordström (Timescale)
Requirements of time-series databases include ingesting high volumes of structured data; answering complex, performant queries over both recent and historical time intervals; and performing specialized time-centric analysis and data management. I explain how one can avoid these operational problems by re-engineering Postgres to serve as a general data platform, including for high-volume time-series workloads.
16:35–17:15 Thursday, 2 May 2019
Dean Wampler (Lightbend)
Your team is building machine learning capabilities. I'll discuss how you can integrate these capabilities into streaming data pipelines so you can leverage the results quickly and update them as needed. There are big challenges. How do you build long-running services that are very reliable and scalable? How do you combine a spectrum of very different tools, from data science to operations?