Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Adding Insert, Update, and Delete to Hive

Alan Gates (Hortonworks)
2:20pm–3:00pm Friday, 02/20/2015
Hadoop Platform
Location: 210 C/G
Average rating: 4.75 (4 ratings)

Apache Hive provides a convenient SQL query engine and table abstraction for data stored in Hadoop. Hive uses Hadoop to provide highly scalable bandwidth to the data, but until recently it did not support updates, deletes, or transaction isolation. This has prevented many desirable use cases, such as updating dimension tables or cleaning up data. We have implemented the standard SQL commands insert, update, and delete, allowing users to insert new records as they become available, update changing dimension tables, repair incorrect data, and remove individual records. This also allows very low-latency ingestion of streaming data from tools like Storm and Flume. Additionally, we have added ACID-compliant snapshot isolation between queries, so that each query sees a consistent view of the transactions committed as of the time it was launched. This talk will cover the intended use cases, the architectural challenges of implementing updates and deletes in a write-once file system, the performance of the solution, and details of the changes to the file storage formats and transaction management system.
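
To make the snapshot isolation idea concrete, here is a minimal toy sketch in Python. It is not Hive's implementation, and every name in it (ToyTable, Delta, begin, commit, snapshot, read) is hypothetical. It only assumes what the abstract states: storage is write-once, so each transaction appends an immutable batch of changes rather than editing existing data, and a reader sees only the transactions that were committed when it started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Immutable batch of changes written by one transaction (hypothetical)."""
    txn_id: int
    upserts: dict          # key -> row, covers both inserts and updates
    deletes: frozenset     # keys removed by this transaction


class ToyTable:
    """Append-only table: changes are never applied in place (toy model)."""

    def __init__(self):
        self._deltas = []        # write-once log of committed change batches
        self._committed = set()  # ids of committed transactions
        self._next_txn = 1

    def begin(self):
        """Hand out a new, monotonically increasing transaction id."""
        txn_id = self._next_txn
        self._next_txn += 1
        return txn_id

    def commit(self, txn_id, upserts=None, deletes=()):
        """Append the transaction's changes as a new delta and mark it committed."""
        self._deltas.append(Delta(txn_id, dict(upserts or {}), frozenset(deletes)))
        self._committed.add(txn_id)

    def snapshot(self):
        """Capture the set of transactions committed right now."""
        return frozenset(self._committed)

    def read(self, snapshot):
        """Merge, in transaction order, only the deltas visible to the snapshot."""
        rows = {}
        for delta in sorted(self._deltas, key=lambda d: d.txn_id):
            if delta.txn_id not in snapshot:
                continue  # committed after the reader started: invisible
            rows.update(delta.upserts)
            for key in delta.deletes:
                rows.pop(key, None)
        return rows


table = ToyTable()
table.commit(table.begin(), upserts={1: "alice", 2: "bob"})
early_reader = table.snapshot()  # a query launched at this point
table.commit(table.begin(), upserts={2: "robert"}, deletes=[1])
print(table.read(early_reader))      # view as of the earlier snapshot
print(table.read(table.snapshot()))  # view including the later transaction
```

In this sketch the earlier reader keeps its original view even after a later transaction updates one row and deletes another, because nothing is overwritten and visibility is decided at read time. A real system would also need compaction of accumulated deltas, handling of aborted transactions, and locking, all of which are left out here.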

Alan Gates

Hortonworks

Alan is a co-founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a book from O’Reilly Press.

Comments

Alan Gates
02/26/2015 2:25am PST

Slides are at http://www.slideshare.net/alanfgates/hive-acidupdatesstratasjcfeb2015

Jo Ramos
02/26/2015 2:01am PST

Hi Alan, great session. Are the slides available?