Hive tables are an integral part of the big data ecosystem, but the simple directory-based design that made them ubiquitous is increasingly problematic. Netflix uses tables backed by S3 that, like other object stores, don’t fit this directory-based model: listings are much slower, renames are not atomic, and results are eventually consistent. Even tables in HDFS are problematic at scale, and reliable query behavior requires readers to acquire locks and wait.
Owen O’Malley and Ryan Blue offer an overview of Iceberg, a new open source project that defines a new table layout addresses the challenges of current Hive tables, with properties specifically designed for cloud object stores, such as S3. Iceberg is an Apache-licensed open source project. It specifies the portable table format and standardizes many important features, including:
Owen O’Malley is a cofounder and technical fellow at Cloudera, formerly Hortonworks. Cloudera’s software includes Hadoop and the large ecosystem of big data tools that enterprises need for their data analytics. Owen has been working on Hadoop since the beginning of 2006 at Yahoo, was the first committer added to the project, and used Hadoop to set the Gray sort benchmark in 2008 and 2009. Previously, he was the architect of MapReduce, Security, and now Hive. He’s driving the development of the ORC file format and adding ACID transactions to Hive.
Ryan Blue is an engineer on Netflix’s big data platform team. Previously, Ryan was responsible for the Avro and Parquet file formats at Cloudera. He is the author of the Analytic Data Storage in Hadoop series of screencasts from O’Reilly.
For exhibition and sponsorship opportunities, email strataconf@oreilly.com
For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com
View a complete list of Strata Data Conference contacts
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com