Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

IoT and Real-time conference sessions

Tuesday, March 29

9:00am–12:30pm Tuesday, 03/29/2016
Location: LL21 A
Tags: real-time
Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent), Joseph Adler (Confluent), Ian Wrigley (StreamSets)
Average rating: ***..
(3.90, 21 ratings)
Ewen Cheslack-Postava, Joseph Adler, Jesse Anderson, and Ian Wrigley show how to use Apache Kafka to collect, manage, and process stream data for big data projects and general-purpose enterprise data-integration needs alike. Once your data is captured in real time and available as real-time subscriptions, you can start to compute new datasets in real time from these original feeds.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: LL21 A
Tags: real-time
Patrick McFadin (DataStax)
Average rating: ****.
(4.07, 14 ratings)
Patrick McFadin gives a comprehensive overview of the powerful Team Apache: Apache Kafka, Spark, and Cassandra. Patrick demonstrates data models, covers deployment considerations, and explains code for different requirements.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: 210 A/E
Tags: real-time
Joseph Adler (Confluent), Ewen Cheslack-Postava (Confluent), Ian Wrigley (StreamSets)
Average rating: ***..
(3.06, 32 ratings)
Joseph Adler, Ewen Cheslack-Postava, and Ian Wrigley demonstrate the features of Apache Kafka that make it easy to build fast, secure, and reliable data pipelines and explain how to use Copycat, Kafka Streams, and Kafka Security as they coach you through building a working enterprise data pipeline.

Wednesday, March 30

11:00am–11:40am Wednesday, 03/30/2016
Location: 210 C/G
Tags: real-time
Jay Kreps (Confluent)
Average rating: ****.
(4.17, 24 ratings)
The world is moving to real-time data, and much of that data flows through Apache Kafka. Jay Kreps explores how Kafka forms the basis for our modern stream-processing architecture. He covers some of the pros and cons of different frameworks and approaches and discusses the recent APIs Kafka has added to allow direct stream processing of Kafka data.
11:50am–12:30pm Wednesday, 03/30/2016
Location: 210 C/G
Tags: real-time
Ted Dunning (MapR Technologies)
Average rating: ***..
(3.78, 9 ratings)
Application messaging isn’t new: solutions include IBM MQ, RabbitMQ, and ActiveMQ. Apache Kafka is a high-performance, high-scalability alternative that integrates well with Hadoop. Can modern distributed messaging systems like Kafka be considered a replacement for legacy systems, or are they purely complementary? Ted Dunning outlines Kafka's architectural benefits and tradeoffs to find the answer.
1:50pm–2:30pm Wednesday, 03/30/2016
Location: 210 C/G
Moty Fania (Intel)
Average rating: ***..
(3.20, 5 ratings)
Moty Fania shares Intel’s IT experience implementing an on-premises big data IoT platform for internal use cases. This unique platform was built on top of several open source technologies and enables highly scalable stream analytics with a stack of algorithms such as multisensor change detection, anomaly detection, and more.
2:40pm–3:20pm Wednesday, 03/30/2016
Location: 210 C/G
Tags: iot
Brandon Rohrer (Microsoft)
Average rating: ****.
(4.00, 11 ratings)
Modern houses and robots have a lot in common. Both have a lot of sensors and have to make a lot of decisions. However, unlike houses, robots adapt and perform helpful tasks. Brandon Rohrer details an algorithm specifically designed to help houses, buildings, roads, and stores learn to actively help the people who use them.

Thursday, March 31

11:00am–11:40am Thursday, 03/31/2016
Location: 210 C/G
Tags: real-time
Ted Malaska (Blizzard Entertainment), Jeff Holoman (Cloudera)
Average rating: ****.
(4.50, 10 ratings)
Ted Malaska and Jeff Holoman explain how to go from zero to full-on time series and mutable-profile systems in 40 minutes. Ted and Jeff cover code examples of ingestion from Kafka and Spark Streaming and access through SQL, Spark, and Spark SQL to explore the underlying theories and design patterns that will be common for most solutions with Kudu.
11:50am–12:30pm Thursday, 03/31/2016
Location: 210 D/H
Tags: real-time
Joey Echeverria (Rocana)
Average rating: *****
(5.00, 2 ratings)
Real-time analysis starts with transforming raw data into structured records. Typically this is done with bespoke business logic written for each use case. Joey Echeverria presents a configuration-based, reusable library for data transformation that can be embedded in real-time stream-processing systems and demonstrates its real-world use cases with Apache Kafka and Apache Hadoop.
1:50pm–2:30pm Thursday, 03/31/2016
Location: 210 C/G
Sean Murphy (PingThings)
Average rating: ***..
(3.20, 5 ratings)
Sean Murphy demonstrates how and why the power grid and other legacy industrials built on traditional engineering will be transformed from deterministic machines described by mathematical equations to probabilistic systems requiring streaming data and analytics. Sean demonstrates how to take an agile approach to the scientific method with big data and fuse the two approaches.
1:50pm–2:30pm Thursday, 03/31/2016
Location: 230 C
Tags: real-time, iot
Karthik Ramasamy (Twitter)
Average rating: *****
(5.00, 1 rating)
Heron, Twitter's streaming system, has been in production for nearly two years and is used by several teams for diverse use cases. Karthik Ramasamy discusses Twitter's operating experiences and shares the challenges of running Heron at scale as well as the approaches that Twitter took to solve them.
2:40pm–3:20pm Thursday, 03/31/2016
Location: 210 D/H
Tags: real-time
Kostas Tzoumas (data Artisans)
Average rating: ****.
(4.41, 17 ratings)
Apache Flink is a full-featured streaming framework with high throughput, millisecond latency, strong consistency, support for out-of-order streams, and support for batch as a special case of streaming. Kostas Tzoumas gives an overview of Flink and its streaming-first philosophy, as well as the project roadmap and vision: fully unifying the worlds of “batch” and “streaming” analytics.
4:20pm–5:00pm Thursday, 03/31/2016
Location: 210 C/G
Tags: real-time
Tony Ng (eBay, Inc.)
Average rating: ****.
(4.11, 9 ratings)
Enterprises are increasingly demanding real-time analytics and insights. Tony Ng offers an overview of Pulsar, an open source real-time streaming system used at eBay, which can scale to millions of events per second with 4GL SQL-like language support. Tony explains how Pulsar integrates Kafka, Kylin, and Druid to provide flexibility and scalability in event and metrics consumption.
4:20pm–5:00pm Thursday, 03/31/2016
Location: 210 D/H
Tags: real-time
Jim Scott (MapR Technologies)
Average rating: ****.
(4.67, 3 ratings)
The Zeta Architecture is an enterprise architecture to move beyond the data lake. The most logical way to scale applications across tiers is to put a messaging platform in between the tiers, which makes it far simpler to scale communication between applications. Jim Scott covers the benefits of this model and offers an example of data-center monitoring.