Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Henry Robinson
Software Engineer, Cloudera

Twitter: @HenryR

Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.

Sessions

11:20am–12:00pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17
Level: Intermediate
Henry Robinson (Cloudera), Zuo Wang (Wanda), Arthur Peng (Intel)
Average rating: 3.71 (7 ratings)
Columnar data formats such as Apache Parquet promise much in terms of performance, but they need help from modern CPUs to fully realize those benefits. In this talk we'll show how combining the newest SIMD instruction sets with an open-source columnar file format can provide an enormous performance advantage. Our example system will be Impala, Parquet, and Intel's AVX2 instruction set.