Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Schedule: Stream processing and analytics sessions

13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 4
Level: Intermediate
Tim Berglund (Confluent)
Average rating: 3.50 (2 ratings)
Tim Berglund demonstrates how to use Kafka Connect and Kafka Streams to build real-world, real-time streaming data pipelines. Kafka Connect ingests data from a relational database into Kafka topics as the data is generated, and Kafka Streams then processes and enriches that data in real time before writing it out for further analysis.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 8
Level: Advanced
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Average rating: 5.00 (6 ratings)
Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, and Mark Grover explain how to architect a modern, real-time big data platform that combines recent open source components such as Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
10:25–10:40 Wednesday, 24 May 2017
Location: Auditorium
Secondary topics: AI
M. C. Srivas (Uber)
Average rating: 3.95 (19 ratings)
M. C. Srivas covers the technologies underpinning Uber's big data architecture and explores some of the real-time problems Uber must solve to make ride sharing as smooth and ubiquitous as running water, explaining how these problems relate to real-time big data analytics.
12:05–12:45 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Level: Beginner
Michael Noll (Confluent)
Average rating: 4.00 (11 ratings)
Michael Noll explains how Apache Kafka can radically simplify data processing architectures. Instead of building clusters or similar special-purpose infrastructure, you serve your real-time processing needs by building normal applications, while still benefiting from properties typically associated exclusively with cluster technologies.
14:05–14:45 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Level: Intermediate
Fabian Hueske (data Artisans)
Average rating: 3.50 (2 ratings)
Although SQL is the most widely used language for data analysis, open source stream processors have been slow to adopt it. One reason is that SQL's semantics and syntax were not designed with streaming data in mind. Fabian Hueske explores Apache Flink's two relational APIs for streaming analytics, standard SQL and the LINQ-style Table API, discussing their semantics and showcasing their usage.
14:55–15:35 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Level: Intermediate
Tristan Stevens (Cloudera)
Average rating: 3.67 (3 ratings)
Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest over 1 million events per second. Tristan Stevens discusses the architecture, deployment, and performance-tuning techniques that enable the system to perform at IoT scale on modest hardware and at very low cost.
16:35–17:15 Wednesday, 24 May 2017
Location: Hall S21/23 (A)
Level: Intermediate
Dean Wampler (Lightbend)
Average rating: 4.57 (7 ratings)
"Stream" is a buzzword for several things that share the idea of timely handling of never-ending data. Big data architectures are evolving to be stream oriented, and microservice architectures are inherently message driven. Dean Wampler defines "stream" in terms of the characteristics of such systems, using specific tools as examples, and argues that big data and microservice architectures are converging.
16:35–17:15 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Level: Intermediate
Ben Stopford (Confluent), Ismael Juma (Confluent)
Dynamic data rebalancing is a complex process. Ben Stopford and Ismael Juma explain how to rebalance data and use replication quotas in the latest version of Apache Kafka.
17:25–18:05 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Secondary topics: AI, IoT, Logistics, Streaming
Level: Beginner
Dr.-Ing. Michael Nolting (Volkswagen Commercial Vehicles)
Average rating: 1.67 (6 ratings)
It is nearly impossible to sample enough training data up front to prevent autonomous driving accidents on the road, as Tesla’s Autopilot has sadly shown. Michael Nolting explains that overcoming this problem requires a system that detects dangerous runtime situations in real time, a process much like website monitoring.
17:25–18:05 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Level: Beginner
Sanjeev Kulkarni (Streamlio), Maosong Fu (Twitter)
Twitter processes billions of events per day at the instant the data is generated. To achieve real-time performance, Twitter employs Heron, an open source streaming engine tailored for large-scale environments. Sanjeev Kulkarni and Maosong Fu share several optimizations implemented in Heron to improve throughput by 5x and reduce latency by 50–60%.
11:15–11:55 Thursday, 25 May 2017
Location: Capital Suite 8/9
Level: Beginner
Tyler Akidau (Google)
Average rating: 3.80 (5 ratings)
The world of big data involves an ever-changing field of players. Much as SQL is a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. Tyler Akidau explains how this vision has been realized and discusses the challenges that lie ahead.
14:05–14:45 Thursday, 25 May 2017
Location: Capital Suite 8/9
Level: Advanced
Aljoscha Krettek (data Artisans)
Average rating: 3.50 (2 ratings)
Apache Beam's new State API brings scalability and consistency to fine-grained stateful processing while remaining portable to any Beam runner. Aljoscha Krettek introduces the new state and timer features in Beam and shows how to use them to express common real-world use cases in a backend-agnostic manner.
14:05–14:45 Thursday, 25 May 2017
Location: Capital Suite 10/11
Level: Advanced
Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Average rating: 4.00 (4 ratings)
Any nontrivial streaming app requires that you consider a number of important topics, but questions like how to manage offsets or state often go unanswered. Mark Grover and Ted Malaska share practices that no one talks about when you start writing a streaming app but that you'll inevitably need to learn along the way.
14:55–15:35 Thursday, 25 May 2017
Location: Capital Suite 12
Level: Beginner
Matthias Niehoff (codecentric AG)
Average rating: 4.00 (4 ratings)
Matthias Niehoff shares lessons learned from working with Spark, Cassandra, and the Spark-Cassandra connector, along with best practices drawn from his work on multiple big and fast data projects and the challenges he encountered along the way.