Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Schedule: Real-time applications sessions

2:40pm–3:20pm Wednesday, March 15, 2017
Location: LL20 A Level: Intermediate
Secondary topics:  Architecture, Media, Streaming
Kartik Paramasivam (LinkedIn)
Average rating: *****
(5.00, 2 ratings)
LinkedIn has one of the largest Kafka installations in the world, ingesting more than a trillion messages per day. Apache Samza-based stream processing applications process this deluge of data. Kartik Paramasivam discusses key improvements and architectural patterns that LinkedIn has adopted in its data systems in order to process millions of requests per second while keeping costs under control.
2:40pm–3:20pm Wednesday, March 15, 2017
Location: LL20 D Level: Intermediate
Secondary topics:  Healthcare
Joseph Blue (MapR), Carol McDonald (MapR Technologies)
Average rating: ****.
(4.50, 2 ratings)
Joseph Blue and Carol McDonald walk you through a reference application that processes ECG data encoded using HL7 with a modern anomaly detector, demonstrating how combining visualization and alerting enables healthcare professionals to improve outcomes and reduce costs, and sharing lessons learned from their experience dealing with real data in real medical situations.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: LL20 A Level: Intermediate
Secondary topics:  Architecture, Media, Platform
Sridhar Alla (BlueWhale), Shekhar Agrawal (Comcast)
Average rating: *****
(5.00, 2 ratings)
Sridhar Alla and Shekhar Agrawal explain how Comcast built the largest Kudu cluster in the world (scaling to petabytes of storage) and explore the new kinds of analytics being performed there, including real-time processing of 1 trillion events and joining multiple reference datasets on demand.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: LL20 D Level: Beginner
Secondary topics:  Architecture, IoT, Manufacturing, Platform, Streaming
Kishore Reddipalli (GE)
Average rating: ***..
(3.00, 1 rating)
Kishore Reddipalli explores how to stream data at large scale from the edge to the cloud to the client, detect anomalies, analyze machine data both in motion and at rest in industrial settings, and optimize industrial operations by providing real-time insights and recommendations using big data technologies.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: 210 C/G Level: Intermediate
Secondary topics:  Deep learning, Streaming
Shivnath Babu (Duke University | Unravel Data Systems)
Average rating: ***..
(3.33, 3 ratings)
Shivnath Babu offers an introduction to using deep learning to solve complex problems in IT operations analytics. Shivnath focuses on how deep learning can derive operations insights automatically for the complex big data application stack composed of systems such as Hadoop, Spark, Cassandra, Elasticsearch, and Impala, using examples of open source tools for deep learning.
5:10pm–5:50pm Wednesday, March 15, 2017
Location: LL20 C
Secondary topics:  Media, Streaming
Sijie Guo (StreamNative)
Average rating: **...
(2.00, 2 ratings)
Apache DistributedLog (incubating) is a low-latency, high-throughput replicated log service. Sijie Guo shares how Twitter has used DistributedLog as the real-time data foundation in production for years, supporting services like distributed databases, pub-sub messaging, and real-time stream computing and delivering more than 1.5 trillion events (17 PB) per day.
5:10pm–5:50pm Wednesday, March 15, 2017
Location: LL20 D Level: Intermediate
Secondary topics:  IoT, Streaming
Michael Freedman (TimescaleDB)
Average rating: *****
(5.00, 3 ratings)
IoT applications often need more complex queries than those supported by traditional time series databases. Michael Freedman outlines a new distributed time series database for such workloads, supporting efficient queries, including complex predicates across many metrics, while scaling out to support IoT ingest rates.
11:00am–11:40am Thursday, March 16, 2017
Location: LL20 A Level: Intermediate
Secondary topics:  Data Platform
Tony Xing (Microsoft)
Average rating: ***..
(3.00, 2 ratings)
Tony Xing offers an overview of Microsoft's common anomaly detection platform, an API service built internally to give product teams the flexibility to plug in any anomaly detection algorithm that fits their own signal types.
2:40pm–3:20pm Thursday, March 16, 2017
Location: LL20 D Level: Beginner
Manny Puentes (Rebel AI)
Average rating: ***..
(3.00, 2 ratings)
In 2016, digital advertising overtook TV in spend, requiring companies to cut through the noise to reach their audience. Manny Puentes explains how Rebel AI decides which ads to serve across devices and how it delivers multidimensional reporting in milliseconds.
2:40pm–3:20pm Thursday, March 16, 2017
Location: 210 A/E Level: Advanced
Secondary topics:  Financial services, Hardcore Data Science, IoT, Streaming
Jeffrey Yau (Silicon Valley Data Science)
Average rating: ***..
(3.20, 5 ratings)
Thanks to frameworks such as Spark's GraphX and GraphFrames, graph-based techniques are increasingly applicable to anomaly, outlier, and event detection in time series. Jeffrey Yau offers an overview of applying graph-based techniques in fraud detection, IoT processing, and financial data and outlines the benefits of graphs relative to other techniques.