Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Real-time conference sessions

13:30–17:00 Wednesday, 1/06/2016
Patrick McFadin (DataStax)
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.
9:30–10:00 Wednesday, 1/06/2016
Ira Cohen (Anodot)
Time series and event data form the basis for real-time insights about the performance of businesses such as ecommerce, the IoT, and web services, but gaining these insights involves designing a learning system that scales to millions and billions of data streams. Ira Cohen outlines such a system that performs real-time machine learning and analytics on streams at massive scale.
14:55–15:35 Thursday, 2/06/2016
Ted Dunning (MapR)
Telecom operators need to find operational anomalies in their networks very quickly. Spark plus a streaming architecture can solve these problems very nicely. Ted Dunning presents a practical architecture as well as some detailed algorithms for detecting anomalies in event streams. These algorithms are simple and quite general and can be applied across a wide variety of situations.
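The session's algorithms are not published in this program, but the general idea of flagging rate anomalies in an event stream can be sketched in a few lines. The following is our own minimal illustration (a fixed-baseline threshold detector, not Ted Dunning's method): learn the mean and standard deviation of per-interval event counts from a training prefix, then flag any later interval that deviates by more than k standard deviations.

```python
# Minimal sketch of rate-anomaly detection over per-interval event counts.
# This is an illustrative assumption, not the algorithm from the session.
import statistics

def detect_bursts(counts, train=20, k=4.0):
    """Flag intervals whose event count deviates from the training baseline
    by more than k standard deviations. Returns a list of (index, count)."""
    baseline = counts[:train]
    mean = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline) or 1.0   # avoid zero sd on flat data
    return [(i, c) for i, c in enumerate(counts[train:], start=train)
            if abs(c - mean) > k * sd]

# Steady traffic around 100 events/interval, then a burst at index 12:
counts = [99, 101, 100, 98, 102, 100, 99, 101, 100, 100, 100, 101, 500, 99]
detect_bursts(counts, train=10)   # → [(12, 500)]
```

In a streaming deployment the baseline would be updated continuously (for example with an exponentially weighted mean) rather than frozen after a training prefix.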
9:00–17:00 Tuesday, 31/05/2016
Tim Berglund (Confluent), Tanya Gallagher (DataStax)
O’Reilly Media and DataStax have partnered to create a 2-day developer course for Apache Cassandra. Get trained as a Cassandra developer at Strata + Hadoop World in London, be recognized for your NoSQL expertise, and benefit from the skyrocketing demand for Cassandra developers.
9:00–17:00 Wednesday, 1/06/2016
Tim Berglund (Confluent), Tanya Gallagher (DataStax)
O’Reilly Media and DataStax have partnered to create a 2-day developer course for Apache Cassandra. Get trained as a Cassandra developer at Strata + Hadoop World in London, be recognized for your NoSQL expertise, and benefit from the skyrocketing demand for Cassandra developers.
14:55–15:35 Friday, 3/06/2016
Apache Eagle is an open source monitoring solution to instantly identify access to sensitive data, recognize malicious activities, and take action. Arun Karthick Manoharan, Edward Zhang, and Chaitali Gupta explain how Eagle helps secure a Hadoop cluster using policy-based and machine-learning user-profile-based detection and alerting.
17:25–18:05 Thursday, 2/06/2016
Tyler Akidau (Google), Kenneth Knowles (Google), Slava Chernyak (Google)
Apache Beam/Google Cloud Dataflow engineers Tyler Akidau, Kenneth Knowles, and Slava Chernyak will be on hand to answer a wide range of detailed questions about stream processing. Even if you don’t have a specific question, join in to hear what others are asking.
16:35–17:15 Friday, 3/06/2016
Alasdair Allan (Babilim Light Industries)
Privacy is no longer "a social norm," but this may not survive as the Internet of Things grows. Big data is all very well when it is harvested in the background. But it's a very different matter altogether when your things tattle on you behind your back. Alasdair Allan explains how the rush to connect devices to the Internet has led to sloppy privacy and security and why that can't continue.
13:30–14:00 Wednesday, 1/06/2016
Anomaly detection is a hot topic in data and can be applied to various fields. Anomaly detection faces challenges common to all big data projects but also deals with higher uncertainty and more difficult measurements, all while operating in real time. Alessandra Staglianò explains how those challenges translate to the real world and how to overcome them with the latest data science tools.
14:55–15:35 Friday, 3/06/2016
Stephan Ewen (data Artisans), Kostas Tzoumas (data Artisans)
Data stream processing is emerging as a new paradigm for the data infrastructure. Streaming promises to unify and simplify many existing applications while simultaneously enabling new applications on both real-time and historical data. Stephan Ewen and Kostas Tzoumas introduce the data streaming paradigm and show how to build a set of simple but representative applications using Apache Flink.
9:00–12:30 Wednesday, 1/06/2016
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Gwen Shapira (Confluent), Ted Malaska (Capital One)
Jonathan Seidman, Mark Grover, Gwen Shapira, and Ted Malaska walk attendees through an end-to-end case study of building a fraud detection system, providing a concrete example of how to architect and implement real-time systems.
11:15–11:55 Thursday, 2/06/2016
Todd Lipcon (Cloudera)
Todd Lipcon investigates the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. He then offers an overview of Kudu, a new addition to the open source Hadoop ecosystem that fills this gap, complementing HDFS and HBase to provide fast scans and fast random access from a single API.
14:55–15:35 Thursday, 2/06/2016
Gopal GopalKrishnan (OSIsoft), Hoa Tram (OSIsoft)
For decades, industrial manufacturing has dealt with large volumes of sensor data and handled a variety of data from the various manufacturing operations management (MOM) systems in production, quality, maintenance, and inventory. Gopal GopalKrishnan and Hoa Tram offer lessons learned from applying big data ecosystem tools to oil and gas, energy, utilities, metals, and mining use cases.
14:05–14:45 Friday, 3/06/2016
Neha Narkhede (Confluent)
Neha Narkhede offers an overview of Kafka Streams, a new stream processing library natively integrated with Apache Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such, it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka.
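The Kafka Streams DSL itself is a Java API; as a language-neutral illustration of the kind of transformation such a DSL expresses, here is the canonical word-count topology sketched in plain Python (our analogy, not the actual Kafka Streams API):

```python
# Conceptual analogue of a stream word-count: consume records, split into
# words, group, and count. In Kafka Streams this runs continuously and the
# counts form an ever-updating changelog; here we fold a finite batch.
from collections import Counter

def word_count(records):
    """records: iterable of text lines, as if consumed from a Kafka topic."""
    counts = Counter()
    for line in records:
        counts.update(line.lower().split())
    return dict(counts)

word_count(["hello kafka", "hello streams"])
# → {'hello': 2, 'kafka': 1, 'streams': 1}
```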
16:35–17:15 Friday, 3/06/2016
Ignacio Manuel Mulas Viela (Ericsson), Nicolas Seyvet (Ericsson AB)
ICT systems are growing in size and complexity. Monitoring and orchestration mechanisms need to evolve and provide richer capabilities to help handle them. Ignacio Manuel Mulas Viela and Nicolas Seyvet analyze a stream of telemetry/logs in real time by following the Kappa architecture paradigm, using machine-learning algorithms to spot unexpected behaviors from an in-production cloud system.
17:25–18:05 Thursday, 2/06/2016
Jim Scott (MapR Technologies)
Application messaging isn’t new. Solutions like message queues have been around for a long time, but newer solutions like Kafka have emerged as high-performance, high-scalability alternatives that integrate well with Hadoop. Should distributed messaging systems like Kafka be considered replacements for legacy technologies? Jim Scott answers that question by delving into architectural trade-offs.
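The architectural difference at the heart of that question can be shown in miniature. The toy classes below (our illustration, not material from the talk) contrast a classic queue, where consuming a message removes it, with a Kafka-style log, where the broker keeps an append-only sequence and each consumer group tracks its own offset:

```python
# Toy contrast: destructive queue vs. append-only log with per-group offsets.
from collections import deque

class Queue:
    def __init__(self):
        self._q = deque()
    def publish(self, msg):
        self._q.append(msg)
    def consume(self):
        return self._q.popleft() if self._q else None   # destructive read

class Log:
    def __init__(self):
        self._log = []        # append-only; nothing is ever removed
        self._offsets = {}    # one read position per consumer group
    def publish(self, msg):
        self._log.append(msg)
    def consume(self, group):
        pos = self._offsets.get(group, 0)
        if pos >= len(self._log):
            return None
        self._offsets[group] = pos + 1
        return self._log[pos]

log = Log()
for m in ("a", "b", "c"):
    log.publish(m)
# Independent consumer groups each see the full stream, and a group can be
# "rewound" simply by resetting its offset:
first = [log.consume("analytics") for _ in range(3)]   # ['a', 'b', 'c']
second = [log.consume("audit") for _ in range(3)]      # ['a', 'b', 'c']
```

Replayability and fan-out to many independent consumers are what make the log model attractive alongside, rather than strictly instead of, traditional queues.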
11:15–11:55 Friday, 3/06/2016
Flavio Junqueira (Dell EMC)
Exactly-once semantics is a highly desirable property for streaming analytics. Ideally, all applications process events once and never twice, but making such guarantees in general either induces significant overhead or introduces other inconveniences, such as stalling. Flavio Junqueira explores what's possible and reasonable for streaming analytics to achieve when targeting exactly-once semantics.
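One widely used pattern in this space (our sketch, not necessarily the session's conclusion) is to settle for exactly-once *results* rather than exactly-once *delivery*: let the transport redeliver under at-least-once semantics, tag each event with a producer sequence number, and make the sink idempotent by discarding duplicates it has already applied.

```python
# Idempotent sink: duplicates from at-least-once redelivery are dropped,
# so the sink's state reflects each event exactly once.
class IdempotentCounter:
    def __init__(self):
        self.total = 0
        self._applied = set()        # sequence numbers already counted

    def apply(self, seq, value):
        if seq in self._applied:     # duplicate redelivery: ignore
            return
        self._applied.add(seq)
        self.total += value

sink = IdempotentCounter()
# seq 2 arrives twice, but is only counted once:
for seq, value in [(1, 10), (2, 5), (2, 5), (3, 1)]:
    sink.apply(seq, value)
sink.total   # → 16
```

The trade-off the abstract alludes to shows up here as state: the sink must remember which sequence numbers it has seen, which in practice means bounding or compacting that set.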
11:15–11:55 Friday, 3/06/2016
Gwen Shapira (Confluent), Todd Palino (LinkedIn)
Join Gwen Shapira, Todd Palino, and other Apache Kafka experts for a fast-paced conversation on Apache Kafka use cases, troubleshooting Apache Kafka, using Kafka in stream architectures, and when to avoid Kafka.
14:05–14:45 Thursday, 2/06/2016
Eric Kramer (Dataiku)
Dataiku and Bioserenity have built a system for an at-home, real-time EEG and, in the process, created an open source stack for handling the data from connected devices. Eric Kramer offers an overview of the tools Dataiku and Bioserenity use to handle large amounts of time series data and explains how they created a real-time web app that processes petabytes of data generated by connected devices.
11:15–11:55 Thursday, 2/06/2016
Frank Saeuberlich (Teradata), Eliano Marques (Think Big Analytics)
The IoT combined with big data analytics enables organizations to track new patterns and signals and to integrate data that was previously not only difficult but also prohibitively expensive to bring together. Frank Saeuberlich and Eliano Marques explain why data management, data integration, and multigenre analytics are foundational to driving business value from IoT initiatives.
12:05–12:45 Thursday, 2/06/2016
David Talby (Pacific AI), Claudiu Branzan (Accenture AI)
David Talby and Claudiu Branzan offer a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records. Infrastructure components include Kafka, Spark Streaming, Spark, Titan, and Elasticsearch; data science components include custom UIMA annotators, curated taxonomies, machine-learned dynamic ontologies, and real-time inferencing.
12:05–12:45 Thursday, 2/06/2016
Vida Ha (Databricks), Prakash Chockalingam (Databricks)
So you’ve successfully tackled big data. Now let Vida Ha and Prakash Chockalingam help you take it real time and conquer fast data. Vida and Prakash cover the most common use cases for streaming, important streaming design patterns, and the best practices for implementing them to achieve maximum throughput and performance of your system using Spark Streaming.
12:05–12:45 Friday, 3/06/2016
Xavier Léauté (Confluent)
Xavier Léauté shares his experience and the challenges of scaling Metamarkets's real-time processing to over 3 million events per second. Built entirely on open source, the stack performs streaming joins using Kafka and Samza and feeds into Druid, serving 1 million interactive queries per day.
11:15–11:55 Friday, 3/06/2016
Tyler Akidau (Google)
Tyler Akidau offers a whirlwind tour of the conceptual building blocks of massive-scale data processing systems over the last decade, comparing and contrasting systems at Google with popular open source systems in use today.
11:15–11:55 Friday, 3/06/2016
Steven Noels (NGDATA)
Steven Noels explains how to prime the Hadoop ecosystem for real-time data analysis and actionability, examining ways to evolve from batch processing to real-time stream-based processing.
11:15–11:55 Friday, 3/06/2016
Tathagata Das (Databricks)
Tathagata Das explains how Spark 2.x develops the next evolution of Spark Streaming by extending DataFrames and Datasets in Spark to handle streaming data. Streaming Datasets provides a single programming abstraction for batch and streaming data and also brings support for event-time-based processing, out-of-order data, sessionization, and tight integration with nonstreaming data sources.
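The sessionization idea mentioned above is easy to state independently of any engine: group a user's event timestamps into sessions whenever the gap between consecutive events exceeds a timeout. The sketch below is a back-of-the-envelope Python illustration of that concept, not the Spark Streaming API (which expresses it declaratively over streaming Datasets):

```python
# Event-time sessionization by inactivity gap.
def sessionize(timestamps, gap=30):
    """Split event-time stamps into sessions separated by more than `gap`."""
    sessions, current = [], []
    for t in sorted(timestamps):     # sorting tolerates out-of-order arrival
        if current and t - current[-1] > gap:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Out-of-order input still yields correct event-time sessions:
sessionize([10, 5, 20, 100, 110, 300])
# → [[5, 10, 20], [100, 110], [300]]
```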
14:55–15:35 Friday, 3/06/2016
Emil Andreas Siemes (Hortonworks), Stephan Anné (Hortonworks)
The Internet of Things and big data analytics are currently two of the hottest topics in IT. But how do you get started using them? Emil Andreas Siemes and Stephan Anné demonstrate how to use Apache NiFi to ingest, transform, and route sensor data into Hadoop and how to do further predictive analytics.
12:05–12:45 Friday, 3/06/2016
Kenneth Knowles (Google)
Drawing on important real-world use cases, Kenneth Knowles delves into the details of the language- and runner-independent semantics developed for triggers in Apache Beam, demonstrating how the semantics support the use cases as well as all of the above variability in streaming systems. Kenneth then describes some of the particular implementations of those semantics in Google Cloud Dataflow.
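To make the notion of a trigger concrete, here is a deliberately simplified illustration of our own (not Beam's semantics or implementation): an element-count trigger fires early, speculative panes for a window's aggregate, and a final pane fires when the window closes, in accumulating mode.

```python
# Simplified trigger illustration: early panes every `every` elements,
# final pane at window close, accumulating across panes.
def triggered_panes(values, every=2):
    """Aggregate one window's values; return [('early'|'final', sum_so_far)]."""
    panes, acc = [], 0
    for i, v in enumerate(values, start=1):
        acc += v
        if i % every == 0:
            panes.append(("early", acc))
    panes.append(("final", acc))
    return panes

triggered_panes([1, 2, 3, 4, 5])
# → [('early', 3), ('early', 10), ('final', 15)]
```

Beam's actual trigger model is far richer (event-time, processing-time, and composite triggers; discarding vs. accumulating modes), which is exactly the variability the session addresses.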
14:05–14:45 Thursday, 2/06/2016
Fergal Toomey (Corvil), Pierre Lacave (Corvil Ltd.)
Fergal Toomey and Pierre Lacave demonstrate how to effectively use Spark and Hadoop to reliably analyze data in high-speed trading environments across multiple machines in real time.
16:35–17:15 Thursday, 2/06/2016
Slava Chernyak (Google)
Watermarks are a system for measuring progress and completeness in out-of-order stream processing systems and are used to emit correct results in a timely way. Given the trend toward out-of-order processing in current streaming systems, understanding watermarks is an increasingly important skill. Slava Chernyak explains watermarks and demonstrates how to apply them using real-world cases.
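As a simplified model of the idea (our assumption, not the session's implementation): a heuristic watermark can trail the newest event time seen by a fixed allowed lateness, and a window's result is emitted once the watermark passes the window's end. The sketch below counts out-of-order events into fixed windows this way:

```python
# Heuristic watermark: assume events arrive at most `max_delay` behind the
# newest event time seen; emit a window once the watermark passes its end.
from collections import defaultdict

def windowed_counts(events, window=10, max_delay=5):
    """events: iterable of event-time stamps, possibly out of order.
    Yields (window_start, count) when the watermark passes the window end."""
    counts = defaultdict(int)
    watermark = float("-inf")
    for t in events:
        counts[t - t % window] += 1              # assign to its window
        watermark = max(watermark, t - max_delay)
        for start in sorted(counts):             # emit completed windows
            if start + window <= watermark:
                yield start, counts.pop(start)
    for start in sorted(counts):                 # flush at end of stream
        yield start, counts.pop(start)

# Event 3 arrives after event 12 but still lands in the correct window,
# because the watermark held that window open long enough:
list(windowed_counts([1, 4, 12, 3, 18, 25]))
# → [(0, 3), (10, 2), (20, 1)]
```

Tuning `max_delay` is the classic watermark trade-off: too small and genuinely late data is dropped or triggers corrections; too large and results are needlessly delayed.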
11:15–11:55 Thursday, 2/06/2016
Charles Givre (Deutsche Bank)
In the last few years, auto makers and others have introduced devices to connect cars to the Internet and gather data about the vehicles’ activity, and auto insurers and local governments are just starting to require these devices. Charles Givre gives an overview of the security risks as well as the potential privacy invasions associated with this unique type of data collection.
12:05–12:45 Thursday, 2/06/2016
Gwen Shapira (Confluent), Jeff Holoman (Cloudera)
Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? Gwen Shapira and Jeff Holoman explain how developers and operation teams can work together to build a bulletproof data pipeline with Kafka and pinpoint all the places where data can be lost if you're not careful.
12:05–12:45 Friday, 3/06/2016
Thomas Beer (Continental), Felix Werkmeister (Continental)
Experience tells us a decision is only as good as the information it is based on. The same is true for driving. The better a vehicle knows its surroundings, the better it can support the driver. Information makes vehicles safer, more efficient, and more comfortable. Thomas Beer and Felix Werkmeister explain how Continental exploits big data technologies for building information-driven vehicles.