Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Scaling out to 10 clusters, 1,000 users, and 10,000 flows: The Dali experience at LinkedIn

Carl Steinbach (LinkedIn)
14:55–15:35 Thursday, 2/06/2016
Hadoop internals & development
Location: Capital Suite 15/16
Level: Non-technical
Average rating: 3.71 (7 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of Hadoop.


Over the past couple of years, we’ve seen firsthand that Hadoop does an admirable job of scaling out to thousands of nodes and many petabytes of data. Hadoop is less successful, however, at scaling along other dimensions: the number of users, the many frameworks and languages those users employ in their daily tasks, and the tens of thousands of data applications they write and have to maintain.

To solve these problems, LinkedIn built Dali, a collection of libraries, services, and development tools united by the common goal of providing a dataset API for Hadoop. Carl Steinbach offers an overview of the project’s components, discusses recent successes, and concludes with a detailed look at Dali Views, a new addition to the project that makes it easier to share logic and to surface and manage the contracts that exist between data producers and data consumers.

Carl Steinbach


Carl Steinbach is a senior staff software engineer at LinkedIn, where he leads the Grid Platform team. Before joining LinkedIn, Carl was an early employee at Cloudera. He is an ASF member and former PMC chair of the Apache Hive Project.