Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Hadoop Internals & Development conference sessions

A deep dive into the dominant big data stack, with practical lessons, integration tricks, and a glimpse of the road ahead.

Tuesday, September 29

9:00am–12:30pm Tuesday, 09/29/2015
Location: 3D 02/11 Level: Intermediate
Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Mark Grover (Lyft)
Average rating: 3.72 (29 ratings)
Looking for a deeper understanding of how to architect real-time data processing solutions? Then this tutorial is for you. In Part 1 of "Architecture Day," we will build a fraud-detection system and use it as an example to discuss considerations for building such a system, how you’d integrate the various technologies, and why those choices make sense for the use case in question.

Wednesday, September 30

1:15pm–1:55pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Lenni Kuff (Facebook), Nong Li (Cloudera), Stephen Romanoff (Capital One)
Average rating: 4.05 (21 ratings)
Hadoop is supremely flexible, but that flexibility brings integration challenges. In this talk, we introduce a new service that eliminates the need for individual components to support file formats, handle security, perform auditing, and implement sophisticated I/O scheduling and other common processing that sits at the bottom of any computation.
2:05pm–2:45pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Todd Lipcon (Cloudera)
Average rating: 3.44 (18 ratings)
This session will investigate the trade-offs between real-time transactional access and fast analytic performance in Hadoop, from the perspective of storage engine internals. We will discuss recent advances, evaluate benchmark results from current generation Hadoop technologies, and propose potential ways ahead for the Hadoop ecosystem to conquer its newest set of challenges.
2:55pm–3:35pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Advanced
Zhe Zhang (LinkedIn), Weihua Jiang (Intel)
Average rating: 4.29 (7 ratings)
In this session, attendees will learn how erasure coding (HDFS-7285) can greatly reduce the storage overhead of HDFS without sacrificing data reliability.
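A rough illustration of the savings involved, assuming the Reed-Solomon (6,3) layout discussed in HDFS-7285 (the specific session content may differ):

```python
# Storage overhead: 3x replication vs. Reed-Solomon (6,3) erasure coding.
# Illustrative arithmetic only; RS(6,3) stripes 6 data blocks with 3 parity
# blocks, so any 3 of the 9 blocks can be lost without losing data.

def replication_overhead(replicas: int) -> float:
    """Extra storage as a fraction of raw data (3 replicas -> 2.0, i.e. 200%)."""
    return float(replicas - 1)

def ec_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage for an erasure-coded stripe (RS(6,3) -> 0.5, i.e. 50%)."""
    return parity_units / data_units

print(replication_overhead(3))  # 2.0 -> 200% overhead, tolerates 2 lost copies
print(ec_overhead(6, 3))        # 0.5 -> 50% overhead, tolerates 3 lost blocks
```

The comparison shows why erasure coding is attractive for cold data: a quarter of the total footprint of 3x replication's extra copies, with comparable fault tolerance.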
4:35pm–5:15pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Advanced
Alan Gates (Hortonworks)
Average rating: 3.64 (11 ratings)
Hadoop makes it possible to keep all data together for shared use and analysis. People use Apache HBase for fast updates and low-latency data access, and Apache Hive for analytics. To improve sharing of this data, users need to be able to access their transactional and analytic data through one tool. This talk will cover work in the Hive, HBase, and Phoenix communities to deliver on this promise.
5:25pm–6:05pm Wednesday, 09/30/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Monte Zweben (Splice Machine Inc.), John Leach (Splice Machine)
Average rating: 4.00 (8 ratings)
Even after 25 years, the TPC-C benchmark still sets the standard for online transaction processing (OLTP) database benchmarking. It has traditionally been the arena for RDBMSs like Oracle Database, IBM DB2, and Microsoft SQL Server to do battle. Now, for the first time, a Hadoop database has successfully completed TPC-C benchmarks. Can it change the equation for OLTP workload price/performance?

Thursday, October 1

11:20am–12:00pm Thursday, 10/01/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Henry Robinson (Cloudera), Zuo Wang (Wanda), Arthur Peng (Intel)
Average rating: 3.71 (7 ratings)
Columnar data formats such as Apache Parquet promise much in terms of performance, but they need help from modern CPUs to fully realize those benefits. In this talk we'll show how the combination of the newest SIMD instruction sets and an open-source columnar file format can provide an enormous performance advantage. Our example system will be Impala, Parquet, and Intel's AVX2 instruction set.
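The core idea — that a columnar layout puts adjacent values of one column next to each other, so a single SIMD instruction can evaluate a predicate over a whole batch — can be sketched in plain Python (purely illustrative; Impala's scanner is C++ using AVX2 intrinsics, and the names below are not from the talk):

```python
# Sketch: evaluating SUM(price) WHERE qty > 10 row-at-a-time vs. column-at-a-time.
# Python stands in for the batch style that SIMD exploits on contiguous columns.

rows = [(3, 1.0), (12, 2.0), (20, 3.0)]  # row layout: (qty, price) tuples

# Row-at-a-time: one value touched per step; values of a column are
# scattered across rows, which defeats vectorized execution.
row_sum = sum(price for qty, price in rows if qty > 10)

# Column-at-a-time: each column is a contiguous array, so a vectorized
# engine can apply the predicate to many qty values per instruction,
# then sum only the selected price values.
qty = [3, 12, 20]
price = [1.0, 2.0, 3.0]
mask = [q > 10 for q in qty]  # predicate evaluated over the whole column
col_sum = sum(p for p, keep in zip(price, mask) if keep)

print(row_sum, col_sum)  # both 5.0 -- same answer, different memory access pattern
```

The two computations are equivalent; the point is the memory layout, which is what lets hardware SIMD units (such as AVX2's 256-bit registers) process several column values per instruction.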
1:15pm–1:55pm Thursday, 10/01/2015
Location: 1 E16 / 1 E17 Level: Advanced
Thomas Phelan (HPE BlueData)
Average rating: 4.25 (12 ratings)
This session will delve into the multiple meanings of "virtualized HDFS." It will investigate abstracting the HDFS protocol so that any storage device can deliver data to a Hadoop application in a performance-critical environment, and will include a discussion and assessment of the work in this area done by projects such as Tachyon and MemHDFS.
2:05pm–2:45pm Thursday, 10/01/2015
Location: 1 E16 / 1 E17 Level: Intermediate
Ravi Prakash (Altiscale)
Average rating: 3.83 (6 ratings)
The HDFS file browser now has improved accessibility and is easier to use. Hadoop 2.4.0 introduced a new UI for file browsing built on WebHDFS. This feature set has since been expanded to include write operations and file uploads, authentication issues have been addressed, and the file browser can now be configured with HttpFS. We'll present a demonstration and an overview of possible configuration requirements.