Presented by O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Schedule: Hadoop internals & development sessions

9:00–12:30 Wednesday, 1/06/2016
Location: Capital Suite 13 | Level: Intermediate
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Gwen Shapira (Confluent), Ted Malaska (Capital One)
Average rating: 3.50 (6 ratings)
Jonathan Seidman, Mark Grover, Gwen Shapira, and Ted Malaska walk attendees through an end-to-end case study of building a fraud detection system, providing a concrete example of how to architect and implement real-time systems.
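(The abstract includes no code, but as a rough illustration of the kind of real-time pipeline such a case study involves, here is a minimal sketch of a rule-based fraud check over a stream of transactions, using the Kafka Java consumer. The topic name, message format, and threshold rule are hypothetical, not details from the session.)

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class FraudCheckSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "fraud-check");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Hypothetical topic: one message per transaction,
                // key = account ID, value = amount. A real system would
                // use a richer schema (e.g., Avro).
                consumer.subscribe(Collections.singletonList("transactions"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    for (ConsumerRecord<String, String> record : records) {
                        double amount = Double.parseDouble(record.value());
                        // Toy rule; production systems combine rules with
                        // models and profiles of historical behavior.
                        if (amount > 10_000.0) {
                            System.out.printf("ALERT: account %s, amount %.2f%n",
                                record.key(), amount);
                        }
                    }
                }
            }
        }
    }

A real architecture adds ingestion, enrichment against stored profiles, and a serving layer; the loop above only shows the streaming-evaluation step.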
11:15–11:55 Thursday, 2/06/2016
Location: Capital Suite 10/11
Doug Cutting (Cloudera), Ben Lorica (O'Reilly Media), Tom White (Cloudera)
Average rating: 2.43 (14 ratings)
Ben Lorica hosts a conversation with Hadoop cofounder Doug Cutting and Tom White, an early user of and committer to Apache Hadoop.
11:15–11:55 Thursday, 2/06/2016
Location: Capital Suite 15/16 | Level: Intermediate
Tags: real-time
Todd Lipcon (Cloudera)
Average rating: 4.42 (12 ratings)
Todd Lipcon investigates the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. He then offers an overview of Kudu, a new addition to the open source Hadoop ecosystem that fills the gap between those two access patterns, complementing HDFS and HBase with a single API that delivers both fast scans and fast random access.
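(As a rough illustration of "fast scans and fast random access from a single API," here is a minimal sketch using the Kudu Java client: a single-row insert followed by a projected, filtered scan of the same table. The master address, table name, and schema are assumptions for the example, not code from the talk; recent Kudu releases ship this API in the org.apache.kudu.client package.)

    import java.util.Arrays;
    import org.apache.kudu.client.Insert;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;
    import org.apache.kudu.client.KuduPredicate;
    import org.apache.kudu.client.KuduScanner;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.RowResult;

    public class KuduSketch {
        public static void main(String[] args) throws KuduException {
            KuduClient client =
                new KuduClient.KuduClientBuilder("kudu-master:7051").build();
            try {
                // Assumed table: metrics(host STRING, ts INT64, value DOUBLE).
                KuduTable table = client.openTable("metrics");

                // Random access: write a single row.
                KuduSession session = client.newSession();
                Insert insert = table.newInsert();
                insert.getRow().addString("host", "web01");
                insert.getRow().addLong("ts", System.currentTimeMillis());
                insert.getRow().addDouble("value", 0.42);
                session.apply(insert);
                session.close();

                // Fast scan: same client, with column projection and a predicate.
                KuduPredicate hostPred = KuduPredicate.newComparisonPredicate(
                    table.getSchema().getColumn("host"),
                    KuduPredicate.ComparisonOp.EQUAL, "web01");
                KuduScanner scanner = client.newScannerBuilder(table)
                    .setProjectedColumnNames(Arrays.asList("ts", "value"))
                    .addPredicate(hostPred)
                    .build();
                while (scanner.hasMoreRows()) {
                    for (RowResult row : scanner.nextRows()) {
                        System.out.println(row.getLong("ts") + "\t"
                            + row.getDouble("value"));
                    }
                }
            } finally {
                client.shutdown();
            }
        }
    }

The point of the sketch is that both operations go through one client and one table, rather than an HBase path for random writes and a separate HDFS file path for analytic scans.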
12:05–12:45 Thursday, 2/06/2016
Location: Capital Suite 15/16 | Level: Intermediate
Ruhollah Farchtchi (Zoomdata)
Average rating: 3.78 (9 ratings)
Ruhollah Farchtchi explores best practices for building systems that support ad hoc queries over real-time data. He also offers an overview of Kudu, a new storage layer for Hadoop designed specifically for use cases that require fast analytics on rapidly changing data, with a simultaneous combination of sequential and random reads and writes.
14:05–14:45 Thursday, 2/06/2016
Location: Capital Suite 13 | Level: Advanced
Bikas Saha (Hortonworks Inc)
Average rating: 3.00 (6 ratings)
Hadoop is used to run large-scale jobs across hundreds of machines. Given that complexity, it's no wonder that jobs running slower than expected remain a perennial source of grief for developers. Bikas Saha draws on his experience debugging and analyzing Hadoop jobs to describe approaches and tools that can solve this difficult problem.
14:55–15:35 Thursday, 2/06/2016
Location: Capital Suite 15/16 | Level: Non-technical
Carl Steinbach (LinkedIn)
Average rating: 3.71 (7 ratings)
Carl Steinbach offers an overview of Dali, LinkedIn's collection of libraries, services, and development tools that are united by the common goal of providing a dataset API for Hadoop.
17:25–18:05 Thursday, 2/06/2016
Location: Capital Suite 15/16 | Level: Intermediate
Chad Metcalf (Docker), Seshadri Mahalingam (Trifacta)
Average rating: 3.67 (6 ratings)
Developers of big data applications face a unique challenge: testing their software against a diverse ecosystem of data platforms that can be complex and resource-intensive to deploy. Chad Metcalf and Seshadri Mahalingam explain why Docker offers a simpler model for such systems, encapsulating complex dependencies and making deployment onto servers dynamic and lightweight.