Presented by O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Hadoop Use Cases conference sessions

Real-world case studies of the Hadoop ecosystem in action, from disruptive startups to industry giants.

Wednesday, September 30

11:20am–12:00pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Greg Rahn (Cloudera)
Average rating: 3.80 (5 ratings)
The flexibility and simplicity of JSON have made it one of the most common formats for data. Data engines need to be able to load, process, and query JSON and nested data types quickly and efficiently. There are multiple approaches to processing JSON data, each with trade-offs. In this session we'll compare and contrast the approaches taken by systems such as Hive, Drill, BigQuery, and others.
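The trade-off this abstract alludes to can be sketched in a few lines of Python; this is an illustration of ours, not material from the talk, and the record fields are hypothetical. Parsing JSON at query time keeps loading cheap but pays the parse cost on every read, while flattening nested records at load time pays that cost once.

    import json

    # A hypothetical nested event record, stored as a raw JSON string.
    raw = '{"user": {"id": 7, "plan": "pro"}, "events": [{"t": 1}, {"t": 2}]}'

    # Approach 1: schema-on-read -- keep the raw string, parse on every query.
    def query_plan(raw_record):
        return json.loads(raw_record)["user"]["plan"]  # parse cost per read

    # Approach 2: flatten at load time -- parse once, then queries are
    # cheap lookups over flat columns.
    def flatten(raw_record):
        rec = json.loads(raw_record)
        return {
            "user_id": rec["user"]["id"],
            "user_plan": rec["user"]["plan"],
            "event_count": len(rec["events"]),
        }

    print(query_plan(raw))  # pro
    print(flatten(raw))     # {'user_id': 7, 'user_plan': 'pro', 'event_count': 2}

Hive typically exposes the first approach through JSON functions and SerDes, while Drill and BigQuery read nested types natively; either way, the cost moves between load time and query time rather than disappearing.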
11:20am–12:00pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Non-technical
Melissa Santos (Big Cartel)
Average rating: 3.14 (14 ratings)
Over the last year, my team has gone from being a Hadoop Infrastructure team that constantly fixed problems and cleaned up messes to declaring itself a Data Platform team: investigating new tools, teaching coworkers about big data, and consulting with other teams on how to meet their data needs.
1:15pm–1:55pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Non-technical
Tags: health
Aaron Kimball (Zymergen, Inc.)
Average rating: 3.82 (11 ratings)
Zymergen has industrialized the process of genome engineering to build microbes that produce chemicals at scale. High-throughput microbe development is driven by integrating machine learning and open-source software for complex data storage, search, and bioinformatics. See how we built this futuristic vision for synthetic biology, and learn how NoSQL can power massive-scale experimentation.
2:05pm–2:45pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Jaipaul Agonus (FINRA)
Average rating: 3.71 (14 ratings)
This presentation is a real-world case study of moving a large portfolio of batch analytical programs, which process 30 billion or more transactions every day, from a proprietary MPP database appliance architecture to the Hadoop ecosystem in the cloud, leveraging Hive, Amazon EMR, and S3.
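As a rough sketch of the architecture described, and not FINRA's actual code, a batch Hive script stored in S3 can be submitted to a running EMR cluster with boto3; the cluster ID, bucket, and script path below are placeholders.

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    # Submit a Hive script stored in S3 as a step on a running cluster.
    # The cluster ID and S3 paths are hypothetical placeholders.
    response = emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",
        Steps=[{
            "Name": "nightly-batch-analytics",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script",
                         "--args", "-f",
                         "s3://example-bucket/jobs/transactions.hql"],
            },
        }],
    )
    print(response["StepIds"])

With the Hive tables defined as external tables over S3, the cluster itself holds no state, which is what makes a move off a fixed appliance practical.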
2:55pm–3:35pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Arvind Prabhakar (StreamSets)
Average rating: 3.67 (9 ratings)
Modern data infrastructures operate on vast volumes of continuously produced data generated by independent channels. Enterprises such as consumer banks that have many such channels are starting to implement a single view of customers that can power all customer touchpoints. In this session we present an architectural approach for implementing such a solution using a customer event hub.
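An event hub of this kind is commonly built on a distributed log such as Kafka. The sketch below is our illustration, not the speaker's implementation; it uses the kafka-python library, with made-up broker, topic, and field names, and keys every event by customer ID so each customer's activity stays ordered within one partition.

    import json
    from kafka import KafkaProducer

    # One hypothetical topic that all customer-facing channels publish into.
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_event(customer_id, channel, payload):
        # Keying by customer ID keeps a customer's events in one partition,
        # so any consumer building a single view sees them in order.
        producer.send("customer-events", key=customer_id,
                      value={"channel": channel, **payload})

    publish_event("cust-42", "web", {"action": "login"})
    publish_event("cust-42", "branch", {"action": "deposit", "amount": 100})
    producer.flush()

Each touchpoint then consumes the same topic to materialize its own view of the customer.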
4:35pm–5:15pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Amar Arsikere (infoworks.io)
Average rating: 2.56 (16 ratings)
Enterprise data warehouses have become a large cost center. As data volumes grow, enterprises want to move their warehouses onto Hadoop, but it is not an easy task. How do you solve this problem? The speakers have designed and deployed large-scale data warehouses on Hadoop. In this talk, they will examine the technical underpinnings of their solution with a real-world example.
5:25pm–6:05pm Wednesday, 09/30/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Alan Choi (Cloudera)
Average rating: 3.00 (16 ratings)
Many workloads are being migrated from data warehouses to Hadoop, but without a good methodology, the migration process can be challenging. In this talk, we'll discuss such a methodology in detail: from cluster sizing, to query tuning, to production readiness.
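Cluster sizing, the first step named, usually starts as a back-of-the-envelope calculation; the numbers below are illustrative and not from the talk.

    # Back-of-the-envelope cluster sizing with made-up example figures.
    raw_data_tb = 200          # data to migrate, in TB
    replication = 3            # HDFS replication factor
    temp_overhead = 1.25       # scratch space for shuffles and staging
    disk_per_node_tb = 24      # usable disk per worker node

    required_tb = raw_data_tb * replication * temp_overhead
    nodes = -(-required_tb // disk_per_node_tb)  # ceiling division

    print(f"~{required_tb:.0f} TB of capacity -> at least {nodes:.0f} nodes")

Real sizing also has to account for the CPU and memory demands of the query engine, but disk capacity times replication is the usual starting point.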

Thursday, October 1

11:20am–12:00pm Thursday, 10/01/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Haden Land (Lockheed Martin IS&GS), Jason Loveland (Lockheed Martin)
Average rating: 4.75 (4 ratings)
Lockheed Martin builds manned and unmanned space systems, which must be tested for all possible conditions, even unforeseen situations. We present a learning test system, built on big data technologies, that supports testing of the Orion Multi-Purpose Crew Vehicle being designed for long-duration, human-rated deep space exploration.
1:15pm–1:55pm Thursday, 10/01/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Sriranjan Manjunath (Saavn Inc), Rahul Saxena (Saavn)
Average rating: 2.00 (1 rating)
Saavn is the leading music streaming service in the South Asian market. This talk will focus on how we are leveraging data to adapt to the very specific demands of that market. We will demonstrate how Hadoop, Kafka, and Storm came together to help us solve some of these challenges.
2:05pm–2:45pm Thursday, 10/01/2015
Location: 1 E12 / 1 E13 Level: Non-technical
Raymond Collins (TE Connectivity), Scott Sokoloff (Orderup)
Average rating: 3.00 (2 ratings)
Scott and Ray will discuss a real-life use case from a large manufacturing company, where data was produced in remote factories faster than it could be sent over the internet. This session is an interactive discussion of how to resolve the problem of "big data, small internet."
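One standard answer to "big data, small internet" is to aggregate and compress at the edge so that only summaries cross the wire; the sketch below is a generic illustration of that idea under our own assumptions, not the speakers' system.

    import gzip
    import json
    from collections import defaultdict

    def summarize(readings):
        """Collapse raw per-reading sensor data into per-sensor rollups."""
        rollup = defaultdict(lambda: {"count": 0, "total": 0.0})
        for sensor_id, value in readings:
            rollup[sensor_id]["count"] += 1
            rollup[sensor_id]["total"] += value
        return dict(rollup)

    # Made-up factory-floor readings; in reality these accumulate far
    # faster than a thin uplink could ship them raw.
    readings = [("press-1", 20.1), ("press-1", 20.3), ("oven-2", 340.0)] * 10000

    payload = json.dumps(summarize(readings)).encode("utf-8")
    print(f"shipping {len(gzip.compress(payload))} bytes for {len(readings)} raw rows")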
2:55pm–3:35pm Thursday, 10/01/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Jonathan Gray (Cask)
Average rating: 4.40 (5 ratings)
Hadoop has evolved into a rich collection of technologies that enable a broad range of use cases. However, technology innovation has outpaced the skills of most developers. The open-source Cask Data Application Platform (CDAP) project was initiated to close this developer gap. In this session, we will show how three different organizations utilized CDAP to deliver solutions on Hadoop.
4:35pm–5:15pm Thursday, 10/01/2015
Location: 1 E12 / 1 E13 Level: Intermediate
Rosaria Silipo (KNIME.com AG)
Average rating: 1.00 (2 ratings)
In this project, we re-engineered a few barely usable legacy solutions and made them viable again by exploiting the speed and performance of execution on the Hadoop platform.