Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Schedule: IoT & real-time sessions

9:00am–12:30pm Tuesday, 09/27/2016
Location: 1B 03/04 Level: Beginner
Tags: real-time
Tyler Akidau (Google), Jesse Anderson (Big Data Institute)
Average rating: 4.50 (6 ratings)
Come learn the basics of stream processing via a guided walkthrough of Apache Beam (incubating), the most sophisticated and portable stream processing model on the planet. Tyler Akidau and Jesse Anderson cover the basics of robust stream processing (windowing, watermarks, and triggers) with the option to execute exercises on top of the runner of your choice: Flink, Spark, or Google Cloud Dataflow.
9:00am–12:30pm Tuesday, 09/27/2016
Location: 1 E 06 Level: Intermediate
Tags: real-time
Patrick McFadin (DataStax)
Average rating: 5.00 (1 rating)
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, while users expect an always-on experience, leaving little room for error. Patrick McFadin explores how successful companies do this every day with powerful data pipelines built with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka.
1:30pm–5:00pm Tuesday, 09/27/2016
Location: 1B 03/04 Level: Beginner
Tags: real-time
Ian Wrigley (StreamSets)
Average rating: 5.00 (7 ratings)
Ian Wrigley demonstrates how to leverage the capabilities of Apache Kafka to collect, manage, and process stream data for both big data projects and general-purpose enterprise data integration. Ian covers system architecture, use cases, and how to write applications that publish data to, and subscribe to data from, Kafka. No prior knowledge of Kafka is required.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Sridhar Alla (Comcast), Kiran Muglurmath (Comcast)
Average rating: 4.17 (6 ratings)
Sridhar Alla and Kiran Muglurmath explain how real-time analytics on Comcast Xfinity set-top boxes (STBs) help drive several customer-facing and internal data-science-oriented applications. They also describe how Comcast uses Kudu to fill the gaps in batch and real-time storage and computation needs, allowing Comcast to process the high-speed data without the elaborate solutions needed until now.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Beginner
Tags: real-time
Neha Narkhede (Confluent)
Average rating: 4.92 (12 ratings)
Neha Narkhede explains how Apache Kafka serves as a foundation for streaming data applications that consume and process real-time data streams. Neha introduces Kafka Connect, a system for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library, and describes the lessons companies like LinkedIn learned building massive streaming data architectures.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Slava Chernyak (Google)
Average rating: 4.60 (10 ratings)
Watermarks are a system for measuring progress and completeness in out-of-order streaming systems and are used to emit correct results in a timely manner. Given the trend toward out-of-order processing in existing streaming systems, watermarks are an increasingly important tool when designing streaming pipelines. Slava Chernyak explains watermarks and explores real-world applications.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Advanced
Kenneth Knowles (Google)
Average rating: 5.00 (2 ratings)
Triggers specify when a stage of computation should emit output. With a small language of primitive conditions, triggers provide the flexibility to tailor a streaming pipeline to a variety of use cases and data sources. Kenneth Knowles delves into the details of language- and runner-independent semantics for triggers in Apache Beam and explores real-world implementations in Google Cloud Dataflow.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Tags: real-time, iot
Ira Cohen (Anodot)
Average rating: 4.00 (5 ratings)
Time series and event data form the basis for real-time insights about the performance of businesses such as ecommerce, the IoT, and web services, but gaining these insights involves designing a learning system that scales to millions and billions of data streams. Ira Cohen outlines a system that performs real-time machine learning and analytics on streams at massive scale.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Tags: iot
Ted Dunning (MapR Technologies)
Average rating: 4.00 (2 ratings)
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their fair share. Ted Dunning presents a demo of how data streaming can be applied to the analytics problems posed by modern motorsports. Although he won't be bringing Formula 1 cars to the talk, Ted demonstrates a physics-based simulator to analyze realistic data from simulated cars.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: River Pavilion Level: Beginner
Tony Ng (eBay, Inc.)
Average rating: 4.00 (1 rating)
Enterprises are increasingly demanding real-time analytics and insights. Tony Ng offers an overview of Pulsar, an open source real-time streaming system used at eBay. Tony explains how Pulsar integrates Kafka, Kylin, and Druid to provide flexibility and scalability in event and metrics consumption.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1 E 12/1 E 13 Level: Beginner
Jim Scott (MapR Technologies)
Average rating: 3.80 (5 ratings)
Jim Scott outlines the core tenets of a message-driven architecture and explains its importance in real-time big data-enabled distributed systems within the realm of finance.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Tags: real-time
Ewen Cheslack-Postava (Confluent)
Average rating: 3.33 (3 ratings)
You may have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center. But what if one data center is not enough? Ewen Cheslack-Postava explores resilient multi-data-center architecture with Apache Kafka, sharing best practices for data replication and mirroring as well as disaster scenarios and failure handling.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 3D 12
Karthik Ramasamy (Twitter)
Average rating: 3.20 (5 ratings)
Twitter generates billions of events per day. Analyzing these events in real time presents a massive challenge. Karthik Ramasamy offers an overview of the end-to-end real-time stack Twitter designed to meet this challenge, consisting of DistributedLog (the distributed and replicated messaging system) and Heron (the streaming system for real-time computation).
1:15pm–1:55pm Thursday, 09/29/2016
Location: 3D 08 Level: Non-technical
Brian Kahn (Climate Central), Edward Wisniewski (Radish Lab)
Average rating: 4.50 (2 ratings)
Radish Lab teamed up with science news nonprofit Climate Central to transform temperature data from 1,001 US cities into a compelling, simple interactive that received more than 1 million views within three days of launch. Alana Range and Brian Kahn offer an overview of the process of creating a viral, interactive data visualization with teams that regularly produce powerful data stories.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Venkatesh Sivasubramanian (GE Digital), Luis Ramos (GE Digital)
Average rating: 3.50 (2 ratings)
Opportunities in the industrial world are expected to outpace consumer business cases. Time series data is growing exponentially as new machines get connected. Venkatesh Sivasubramanian and Luis Ramos explain how GE makes it faster and easier for systems to access (using a common layer) and perform analytics on a massive volume of time series data by combining Apache Apex, Spark, and Kudu.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 3D 08 Level: Intermediate
Tags: real-time
Kostas Tzoumas (data Artisans)
Average rating: 4.00 (2 ratings)
Apache Flink has seen incredible growth during the last year, both in development and usage, driven by the fundamental shift from batch to stream processing. Kostas Tzoumas demonstrates how Apache Flink enables real-time decisions, makes infrastructure less complex, and enables extremely efficient, accurate, and fault-tolerant streaming applications.
2:55pm–3:35pm Thursday, 09/29/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Yaron Haviv (iguaz.io)
Average rating: 2.00 (1 rating)
Yaron Haviv explains how to design real-time IoT and FSI applications, leveraging Spark with advanced data frame acceleration. Yaron then presents a detailed, practical use case, diving deep into the architectural paradigm shift that makes the powerful processing of millions of events both efficient and simple to program.
4:35pm–5:15pm Thursday, 09/29/2016
Location: 1 E 12/1 E 13 Level: Intermediate
Moty Fania (Intel)
Moty Fania shares Intel’s IT experience implementing an on-premises IoT platform for internal use cases. The platform was designed as a multitenant platform with built-in analytical capabilities and based on open source big data technologies and containers. Moty highlights the lessons learned from this journey with a thorough review of the platform’s architecture.
4:35pm–5:15pm Thursday, 09/29/2016
Location: 3D 08 Level: Beginner
Roy Ben-Alta (Amazon Web Services)
Average rating: 5.00 (1 rating)
Roy Ben-Alta explores the Amazon Kinesis platform in detail and discusses best practices for scaling your core streaming data ingestion pipeline as well as real-world customer use cases and design pattern integration with Amazon Elasticsearch, AWS Lambda, and Apache Spark.