Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks

Chao Sun (Cloudera), Alex Leblang (Cloudera)
2:40pm–3:20pm Wednesday, 03/30/2016
Security

Location: LL21 B
Average rating: 3.40 (5 ratings)

One of the key values of the Hadoop ecosystem is its flexibility. A myriad of components make up this ecosystem, allowing Hadoop to tackle otherwise intractable problems. However, having so many components imposes a significant integration, implementation, and usability burden. Features that ought to work in every component often require sizable per-component effort to ensure correctness across the stack.

Chao Sun and Alex Leblang explore RecordService, a new solution that provides an API to read data from Hadoop storage managers and return it as canonical records. This eliminates the need for each component to support individual file formats, handle security, perform auditing, and implement sophisticated IO scheduling and the other common processing that underlies any computation.

Chao and Alex discuss the architecture of the service and the integration work done for MapReduce and Spark. Many existing applications on those frameworks can take advantage of the service with little or no modification. Chao and Alex demonstrate how this provides fine-grained (column-level and row-level) security through Sentry integration and improves performance for existing MapReduce and Spark applications by up to 5×. They conclude by explaining how this architecture can enable significant future improvements to the Hadoop ecosystem.
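
As a rough illustration of what a single access path with column-level and row-level security might look like, here is a minimal Python sketch. It is not RecordService's actual API and does not use Sentry. The names Policy and read_records and the sample data are hypothetical, chosen only to show the idea that every framework reads through one function that enforces the same rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Set

Record = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical per-user policy: visible columns plus a row predicate."""
    allowed_columns: Set[str]
    row_filter: Callable[[Record], bool]


def read_records(rows: Iterable[Record],
                 requested: List[str],
                 policy: Policy) -> Iterator[Record]:
    """Return only permitted rows, projected onto the requested columns."""
    # Column-level check: reject the whole request if any column is off limits.
    denied = [c for c in requested if c not in policy.allowed_columns]
    if denied:
        raise PermissionError(f"no access to columns: {denied}")
    # Row-level check: drop rows the policy hides, then project the columns.
    return ({c: row[c] for c in requested}
            for row in rows if policy.row_filter(row))


# Made-up sample data standing in for records decoded from any file format.
rows = [
    {"name": "a", "region": "us", "salary": 10},
    {"name": "b", "region": "eu", "salary": 20},
]
policy = Policy(allowed_columns={"name", "region"},
                row_filter=lambda r: r["region"] == "us")
visible = list(read_records(rows, ["name", "region"], policy))
```

Because callers only ever see the records this function returns, a MapReduce job and a Spark job reading through it would be subject to the same policy without implementing it themselves.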

Chao Sun

Cloudera

Chao Sun is currently a software engineer at Cloudera working on the RecordService project. Before that, Chao worked on the Hive on Spark project. He holds a PhD in computer science from the University of Wisconsin-Milwaukee, where he focused on type systems and programming languages.

Alex Leblang

Cloudera

Alex Leblang is an engineer at Cloudera on the RecordService team. Previously, Alex was an Apache Impala (incubating) engineer and interned at Vertica. He holds a bachelor’s degree from Brown University with concentrations in computer science and Latin American studies.