Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks

Alex Leblang (Cloudera)
11:15–11:55 Friday, 3 June 2016
Location: Capital Suite 15/16
Level: Intermediate
Average rating: 3.33 (3 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of Hadoop security concepts.


One of the key values of the Hadoop ecosystem is its flexibility. A myriad of components make up this ecosystem, allowing Hadoop to tackle otherwise intractable problems. However, having so many components imposes a significant integration, implementation, and usability burden. Features that ought to work in every component often require sizable per-component effort to ensure correctness across the stack.

Alex Leblang introduces RecordService, a new service that addresses this problem. It provides an API that reads data from Hadoop storage managers and returns it as canonical records. Components that read through it no longer need to support individual file formats, handle security, perform auditing, or implement sophisticated I/O scheduling and the other common processing that sits at the bottom of any computation.

Alex discusses the architecture of the service and the integration work done for MapReduce and Spark. Many existing applications on those frameworks can take advantage of the service with little to no modification. Alex demonstrates how RecordService provides fine-grained (column-level and row-level) security through Sentry integration and improves the performance of existing MapReduce and Spark applications by up to 5x. The talk ends by exploring how this architecture can enable significant future improvements to the Hadoop ecosystem.
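The idea of a single access path can be illustrated with a small sketch. This is not the RecordService API: every name below (`scan`, `Policy`, `READERS`, the sample data) is hypothetical and chosen only for illustration. It assumes a toy setup in which two file formats are decoded into the same canonical record shape, and a column-level and row-level policy is enforced once in the shared read path rather than in each caller.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

# A "canonical record" in this sketch is just a dict of column name -> string.
Record = Dict[str, str]


def read_csv(text: str) -> Iterator[Record]:
    """Decode CSV text (with a header row) into canonical records."""
    yield from csv.DictReader(io.StringIO(text))


def read_jsonl(text: str) -> Iterator[Record]:
    """Decode JSON-lines text into canonical records (values as strings)."""
    for line in text.splitlines():
        if line.strip():
            yield {key: str(value) for key, value in json.loads(line).items()}


# Hypothetical registry: callers name a format, they never parse it themselves.
READERS = {"csv": read_csv, "jsonl": read_jsonl}


@dataclass
class Policy:
    """Hypothetical access policy: visible columns plus a row predicate."""
    columns: List[str]
    row_filter: Callable[[Record], bool]


def scan(fmt: str, text: str, policy: Policy) -> List[Record]:
    """Single read path: decode, filter rows, then project allowed columns."""
    reader = READERS[fmt]
    return [
        {column: record[column] for column in policy.columns}
        for record in reader(text)
        if policy.row_filter(record)  # evaluated on the full record
    ]


CSV_DATA = "name,region,salary\nana,EU,100\nbo,US,200\n"
JSONL_DATA = (
    '{"name": "ana", "region": "EU", "salary": 100}\n'
    '{"name": "bo", "region": "US", "salary": 200}\n'
)

# This caller may see only EU rows, and never the salary column.
eu_analyst = Policy(
    columns=["name", "region"],
    row_filter=lambda record: record["region"] == "EU",
)

csv_rows = scan("csv", CSV_DATA, eu_analyst)
jsonl_rows = scan("jsonl", JSONL_DATA, eu_analyst)
print(csv_rows == jsonl_rows)
```

Because the policy lives in `scan`, a caller gets the same restricted view whichever format backs the data, which is the property the abstract attributes to a shared access service.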

Alex Leblang


Alex Leblang is an engineer at Cloudera on the RecordService team. Previously, Alex was an Apache Impala (incubating) engineer and interned at Vertica. He holds a bachelor’s degree from Brown University with concentrations in computer science and Latin American studies.