Over the past couple of years, we’ve seen firsthand that Hadoop does an admirable job of scaling out to thousands of nodes and many petabytes of data. Hadoop is less successful at scaling along other dimensions: the number of users, the myriad frameworks and languages those users employ in their daily tasks, and the tens of thousands of data applications they write and have to maintain.
To solve these problems, LinkedIn built Dali, a collection of libraries, services, and development tools united by the common goal of providing a dataset API for Hadoop. Carl Steinbach offers an overview of the project’s components, discusses recent successes, and concludes with a detailed look at Dali Views, a new addition to the project that makes it easier to share logic and to surface and manage the contracts between data producers and data consumers.
Carl Steinbach is a senior staff software engineer at LinkedIn, where he leads the Grid Platform team. Before joining LinkedIn, Carl was an early employee at Cloudera. He is an ASF member and former PMC chair of the Apache Hive Project.