Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Media conference sessions

11:00am–11:40am Wednesday, 03/30/2016
Chris Sanden (Netflix), Christopher Colburn (Netflix)
Chris Sanden and Christopher Colburn outline a shared infrastructure for doing anomaly detection. Chris and Christopher explain how their solution addresses both real-time and batch use cases and offer a framework for performance evaluation.
11:00am–11:40am Wednesday, 03/30/2016
Eric Tschetter (Yahoo)
Yahoo uses Druid to provide visibility into the actions of its billions of users and developed a new type of sketch, the Theta Sketch, to enable this analysis. Eric Tschetter discusses how Yahoo combines Druid and Theta Sketches to understand that activity at the user level.
4:20pm–5:00pm Wednesday, 03/30/2016
Jonathan King (Ericsson)
Jonathan King outlines ethical best practices for big data and explores the difficult questions emerging from missteps that have caused public outcry, as well as the legal, ethical, and regulatory frameworks that are just beginning to take shape around big data.
11:00am–11:40am Wednesday, 03/30/2016
Ram Shankar Siva Kumar (Microsoft (Azure Security Data Science)), Cody Rioux (Netflix (Real-time Analytics))
In the era of large-volume security applications, false positives, as Gartner says, can make the difference between building an "indicator machine" and an "answering machine." Ram Shankar Siva Kumar and Cody Rioux explore how to suppress false positives in security monitoring systems through use cases from Microsoft and Netflix.
11:50am–12:30pm Thursday, 03/31/2016
Sumeet Singh (Yahoo), Mridul Jain (Yahoo)
Building a real-time monitoring service that handles millions of custom events per second while satisfying complex rules, varied throughput requirements, and numerous dimensions simultaneously is a complex endeavor. Sumeet Singh and Mridul Jain explain how Yahoo approached these challenges with Apache Storm Trident, Kafka, HBase, and OpenTSDB and discuss the lessons learned along the way.
1:50pm–2:30pm Thursday, 03/31/2016
Roopa Tangirala (Netflix)
Roopa Tangirala details Netflix's migration from Oracle to Cassandra, covering the problems encountered, what worked and what didn't, and lessons learned along the way.
11:00am–11:40am Thursday, 03/31/2016
Daniel Weeks (Netflix)
Netflix is exploring new avenues for data processing where traditional approaches fail to scale. Daniel Weeks explains how Netflix has enhanced its 25+ petabyte warehouse by combining Parquet's features with Presto and Spark to boost both ETL and interactive queries. Daniel explores how these approaches offer new ways to look at the relationship between storage and compute.
2:40pm–3:20pm Thursday, 03/31/2016
Christopher Berry (Canadian Broadcasting Corporation)
The Canadian Broadcasting Corporation broadcasts a lot of digital content. And Canadians create a huge amount of data about that content. So how does a public broadcaster, of all entities, broadcast its data exhaust? Christopher Berry details the CBC's early experiments with importing a variant of the lean startup into a 79-year-old institution.