Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Schedule: Hadoop platform and applications sessions

A deep dive into the dominant big data stack, with practical lessons, integration tricks, and a glimpse of the road ahead.

9:00am–5:00pm Tuesday, March 14, 2017
Location: LL20 A
Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Paul Barth (Podium Data), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy (Zurich Insurance Group), Martin Lidl (Deloitte), Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (Aizonic), Robin Li (Tapjoy), Yohan Chin (Tapjoy), Jim Harrold (NationBuilder), Lana Novikova (Heartbeat AI Technologies)
In a series of 12 half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits and drawbacks of their solutions. If you want practical insights about applied data, look no further.
1:30pm–5:00pm Tuesday, March 14, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture
Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Mark Grover (Lyft), Gwen Shapira (Confluent)
Average rating: 4.17 (6 ratings)
Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
11:00am–11:40am Wednesday, March 15, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture
Todd Lipcon (Cloudera)
Average rating: 4.75 (4 ratings)
Todd Lipcon offers a very brief refresher on the goals and feature set of the Kudu storage engine and covers the development that has taken place over the last year, including new features such as improved support for time series workloads, performance improvements, Spark integration, and highly available replicated masters.
11:50am–12:30pm Wednesday, March 15, 2017
Location: LL21 E/F Level: Beginner
Sean Suchter (Pepperdata), Shekhar Gupta (Pepperdata)
Sean Suchter and Shekhar Gupta describe the use of very fine-grained performance data from many Hadoop clusters to build a model predicting excessive swapping events.
1:50pm–2:30pm Wednesday, March 15, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture
Daniel Templeton (Cloudera)
Average rating: 4.00 (4 ratings)
Docker makes it easy to bundle an application with its dependencies and provide full isolation, and YARN now supports Docker as an execution engine for submitted applications. Daniel Templeton explains how YARN's Docker support works, why you'd want to use it, and when you shouldn't.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture, Cloud
Dwai Lahiri (Cloudera)
Average rating: 4.50 (2 ratings)
Dwai Lahiri explains how to leverage private cloud infrastructure to successfully build Hadoop clusters and outlines dos, don'ts, and gotchas for running Hadoop on private clouds.
11:00am–11:40am Thursday, March 16, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture, Streaming
Todd Lipcon (Cloudera), Marcel Kornacker (Cloudera)
Average rating: 4.00 (1 rating)
Todd Lipcon and Marcel Kornacker offer an introduction to using Impala and Kudu to power your real-time data-centric applications for use cases like time series analysis (fraud detection, stream market data), machine data analytics, and online reporting.
11:50am–12:30pm Thursday, March 16, 2017
Location: LL21 E/F Level: Intermediate
Secondary topics: Architecture
Yang Li (Kyligence)
Average rating: 5.00 (3 ratings)
Apache Kylin, which started as a big data OLAP engine, is reaching v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, Spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.