Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Schedule: Hadoop use cases sessions

1:30pm–5:00pm Tuesday, 09/27/2016
Location: Hall 1C Level: Intermediate
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Capital One)
Average rating: 4.08 (13 ratings)
Jonathan Seidman, Gwen Shapira, Mark Grover, and Ted Malaska demonstrate how to architect a modern, real-time big data platform and explain how to leverage components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics such as real-time ETL, change data capture, and machine learning.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 3D 08 Level: Intermediate
Navdeep Alam (IMS Health)
Average rating: 4.50 (12 ratings)
The need to find efficiencies in healthcare is becoming paramount as our society and the global population continue to grow and live longer. Navdeep Alam shares his experience and reviews current and emerging technologies in the marketplace that handle working with unbounded, de-identified patient datasets in the billions of rows in an efficient and scalable way.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 3D 08 Level: Beginner
Jun Liu (Intel), Zhaojuan Bian (Intel)
Average rating: 2.00 (2 ratings)
Many challenges exist in designing an SQL-on-Hadoop cluster for production in a multiuser environment with heterogeneous and concurrent query workloads. Jun Liu and Zhaojuan Bian draw on their personal experience to address these challenges, explaining how to determine the right size of your cluster with different combinations of hardware and software resources using a simulation-based approach.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 3D 08 Level: Beginner
Marcel Kornacker (Cloudera), Todd Lipcon (Cloudera)
Average rating: 4.50 (8 ratings)
Todd Lipcon and Marcel Kornacker explain how to simplify Hadoop-based data-centric applications with the CRUD (create, read, update, and delete) and interactive analytic functionality of Apache Impala (incubating) and Apache Kudu (incubating).
2:55pm–3:35pm Wednesday, 09/28/2016
Location: Hall 1B Level: Beginner
Praveen Murugesan (Uber Technologies Inc)
Average rating: 3.67 (12 ratings)
Praveen Murugesan explains how Uber leverages Hadoop and Spark as the cornerstones of its data infrastructure. Praveen details Uber's current data architecture, outlines some of the unique data-processing challenges the company has faced, and describes its approach to solving key issues in order to continue powering Uber's real-time marketplace.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 3D 08 Level: Beginner
Kaushik Deka (Novantas), Phil Jarymiszyn (Novantas)
Kaushik Deka and Phil Jarymiszyn discuss the benefits of a Spark-based feature store, a library of reusable features that allows data scientists to solve business problems across the enterprise. Kaushik and Phil outline three challenges they faced—semantic data integration within a data lake, high-performance feature engineering, and metadata governance—and explain how they overcame them.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 3D 08 Level: Intermediate
Jasjeet Thind (Zillow)
Average rating: 3.75 (8 ratings)
Zillow pioneered giving consumers unprecedented access to information about the housing market. Long gone are the days when you needed an agent to get comparables and prior sale and listing data, and with more data, data science has enabled more use cases. Jasjeet Thind explains how Zillow uses Spark and machine learning to transform real estate.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 3D 08 Level: Non-technical
Bas Geerdink (Aizonic)
Average rating: 4.67 (3 ratings)
Bas Geerdink offers an overview of the evolution of the Hadoop ecosystem at ING. Since 2013, ING has invested heavily in a central data lake and data management practice. Bas shares historical lessons and best practices for enterprises that are incorporating Hadoop into their infrastructure landscape.