Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Hadoop Platform conference sessions

Tuesday, December 1

9:00am–12:30pm Tuesday, 12/01/2015
Location: 321-322 Level: Intermediate
Gwen Shapira (Confluent), Ted Malaska (Capital One), Mark Grover (Lyft), Jonathan Seidman (Cloudera)
Average rating: 4.16 (19 ratings)
Looking for a deeper understanding of how to architect real-time data processing solutions? This tutorial will provide this understanding using a real-world example of a fraud detection system. We’ll use this example to discuss considerations for building such a system, how you’d integrate various technologies, and why those choices make sense for the use case in question.

Wednesday, December 2

11:00am–11:40am Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Tags: featured
Todd Lipcon (Cloudera)
Average rating: 4.33 (6 ratings)
This session will investigate the trade-offs between real-time transactional access and fast analytic performance in Hadoop from the perspective of storage engine internals. We will discuss recent advances, evaluate benchmark results from current generation Hadoop technologies, and propose potential ways ahead for the Hadoop ecosystem to conquer its newest set of challenges.
11:50am–12:30pm Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Jun Liu (Intel), Zhaojuan Bian (Intel)
Average rating: 3.86 (7 ratings)
Based on previous experience, there are many challenges in designing an Impala cluster for production, such as table schema design, data placement, file format selection, hardware selection, and software stack parameter tuning. We will walk through a real-world case study in the banking and financial services sector to illustrate how we use our simulator-based approach to design an Impala cluster.
1:30pm–2:10pm Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Heesun Won (ETRI), Minh Chau Nguyen (ETRI)
Average rating: 3.50 (4 ratings)
This session will address how one single Hadoop cluster can be built across many geographically distributed data centers to provide multitenant analytics services. We extend the overall architecture of Hadoop so that multiple tenants can securely access, share, and analyze data in their own isolated executing environments.
2:20pm–3:00pm Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Jairam Ranganathan (Cloudera)
Average rating: 3.50 (2 ratings)
Apache Hadoop was designed when cloud models were in their infancy. Despite this, Hadoop has proven remarkably adept at adapting its architecture to work well in the cloud as production workloads migrate to cloud environments. This talk will cover several topics on adapting Hadoop to the cloud.
4:00pm–4:40pm Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Guy Harrison (Dell Software)
When people think of big data processing, they think of Apache Hadoop, but that doesn't mean traditional databases don't play a role. In most cases users will still draw from data stored in relational database systems. Apache Sqoop can be used to unlock that data and transfer it to Hadoop, enabling users with information stored in existing SQL tables to use new analytic tools.
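As a rough sketch of the kind of transfer this session describes, a Sqoop import of a single SQL table into HDFS looks like the following (the connection string, table name, and target directory are placeholders, not details from the session):

```shell
# Hypothetical example: pull the "customers" table from a MySQL
# database into HDFS as delimited text files. Requires a working
# Sqoop installation and a reachable Hadoop cluster.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/customers \
  --num-mappers 4
```

Once imported, the files under `/data/customers` can be queried with Hive, Impala, or other Hadoop-side analytic tools.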
4:50pm–5:30pm Wednesday, 12/02/2015
Location: 334-335 Level: Intermediate
Jim Scott (NVIDIA)
Average rating: 4.00 (4 ratings)
Application developers have long created complex schemas to store data with many minor relationships in an RDBMS. This talk will show how to convert an existing music database with a complicated schema to HBase for transactional workloads, plus how to use Drill against HBase for real-time queries. HBase column families will also be discussed.
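To make the column-family idea concrete, here is a minimal sketch of what such a denormalized HBase table might look like from the HBase shell (the table name, column families, and row-key scheme are illustrative assumptions, not the schema from the talk):

```shell
# Hypothetical schema sketch: one "music" table with two column
# families, and a composite row key (artist|album) that collapses
# what would be a join in an RDBMS. Requires a running HBase instance.
hbase shell <<'EOF'
create 'music', 'meta', 'plays'
put 'music', 'artist42|album7', 'meta:title', 'Example Album'
put 'music', 'artist42|album7', 'plays:count', '128'
get 'music', 'artist42|album7'
EOF
```

Grouping rarely-read metadata and frequently-updated counters into separate column families lets HBase store and flush them independently, which is one of the design trade-offs the session covers.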