Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Production Ready Hadoop conference sessions

How is big data finding its way into big organizations? What are enterprises doing to embrace data science, machine learning, and the decision-making power of analytics? How are CIOs making deployment decisions in the face of new technology? In this track, we look at how enterprises are moving from legacy data stores to big data, and at the best practices (and the roadblocks) along the way to becoming a data-driven organization.

Tuesday, September 29

9:00am–9:15am Tuesday, 09/29/2015
Location: 1 E14/1 E15 Level: Intermediate
Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera)
Average rating: 3.73 (15 ratings)
In this full-day tutorial, attendees will get an overview of every phase of successfully managing Apache Hadoop clusters, with an emphasis on production systems: installation, configuration management, service monitoring, troubleshooting, and support integration.
1:30pm–5:00pm Tuesday, 09/29/2015
Location: 3D 02/11 Level: Intermediate
Tom White (Cloudera), Ryan Blue (Cloudera)
Average rating: 4.40 (5 ratings)
In the second (afternoon) half of the Architecture Day tutorial, attendees will build a data application from the ground up. As part of the tutorial, we will demonstrate how Kite codifies the best practices from the Hadoop Architecture Day morning session.

Wednesday, September 30

11:20am–12:00pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Jairam Ranganathan (Cloudera)
Average rating: 3.77 (13 ratings)
Apache Hadoop was designed when cloud models were in their infancy. Despite this, Hadoop has proven remarkably adept at adapting its architecture to the cloud as production workloads move to cloud environments. This talk will cover several topics on adapting Hadoop to the cloud.
1:15pm–1:55pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Jonathan Hsieh (Cloudera), Dima Spivak (StreamSets)
Average rating: 3.79 (14 ratings)
With the number of production Apache HBase clusters increasing, there is greater demand for running multiple applications on single clusters, for data reliability and availability, and for developers to better test their applications. We’ll lay out how these new demands can be addressed using multi-tenant, multi-cluster, or multi-container deployments, including the use of Docker.
2:05pm–2:45pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Siwei Zhu (Scribd), Kevin Perko (Scribd)
Average rating: 3.17 (12 ratings)
With the explosion of open-source big data technologies, companies can now build a powerful data warehouse. But as they reach scale, they’ll find that patching together numerous projects requires building their own tools to manage the data pipeline. In this presentation, we will talk about the tools you’ll likely need to build in-house to make your data infrastructure manageable.
2:55pm–3:35pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Michael Segel (Segel & Associates)
Average rating: 3.62 (8 ratings)
Today's Hadoop cluster has multiple single points of failure. This talk focuses on identifying these weaknesses and on how to mitigate them.
4:35pm–5:15pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Prat Moghe (Cazena)
Average rating: 4.50 (2 ratings)
Hadoop’s ability to handle large amounts of varied data has been a driving force behind the explosion of big data. Many organizations’ ambitions to become more data-driven, however, are held back by a shortage of resources as well as the time and expense needed to purchase and set up hardware and software infrastructure. The cloud offers a natural alternative to overcome these barriers.
5:25pm–6:05pm Wednesday, 09/30/2015
Location: 3D 05/08 Level: Intermediate
Ted Dunning (MapR)
Average rating: 3.70 (10 ratings)
I will deconstruct a real-world database schema into the corresponding NoSQL design. Along the way, we will see how the number of tables drops by nearly 5x and the design becomes easier to understand by a similar degree. Despite these radical changes, the resulting denormalized and nested data can still be queried with SQL by using Apache Drill. These methods are practical and easy to apply.