Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Transaction processing with Apache Hive, HBase, and Phoenix

Alan Gates (Hortonworks)
4:35pm–5:15pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17
Level: Advanced
Average rating: 3.64 (11 ratings)

One of the promises of Hadoop is that a group of users can store all their data together for shared use and analysis. A pattern frequently seen today is to use Apache HBase for fast updates and low-latency data, with Apache Phoenix on top for SQL, while Apache Hive, the de facto SQL standard for data warehousing on Hadoop, handles both batch and interactive analytic queries.

Separating data into tools by intended use duplicates any data that is needed in both situations, and users spend resources moving that data back and forth between tools. Users must also learn multiple SQL dialects, remember which data is where, and understand when to use which tool.
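As a rough illustration of the "multiple SQL dialects" point, the sketch below builds the same single-row write for each engine. It relies only on two well-known differences: Phoenix writes rows with UPSERT, and Hive (0.14 and later) accepts INSERT INTO TABLE ... VALUES. The host names in the JDBC URLs, the `events` table, and its columns are hypothetical placeholders, not anything from the talk.

```python
# Hypothetical sketch: the same logical write expressed for two engines.
# Hosts, table name, and column names are illustrative placeholders.

# Each tool also has its own JDBC connection point today.
JDBC_URLS = {
    "phoenix": "jdbc:phoenix:zk-host:2181",            # ZooKeeper quorum (placeholder)
    "hive": "jdbc:hive2://hive-host:10000/default",    # HiveServer2 (placeholder)
}


def write_statement(engine, table, columns):
    """Return a parameterized single-row write in the given engine's dialect."""
    placeholders = ", ".join("?" for _ in columns)
    if engine == "phoenix":
        # Phoenix has no INSERT; UPSERT inserts the row or overwrites it by key.
        return f"UPSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    if engine == "hive":
        # Hive 0.14+ form; values are given in table column order.
        return f"INSERT INTO TABLE {table} VALUES ({placeholders})"
    raise ValueError(f"unknown engine: {engine}")


if __name__ == "__main__":
    for engine in ("phoenix", "hive"):
        print(JDBC_URLS[engine])
        print(write_statement(engine, "events", ["id", "status"]))
```

Even in this small case, an application has to know which engine holds the table before it can pick a connection URL and a statement form.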

The Hive, HBase, and Apache Phoenix communities are working to significantly improve the integration of these tools so that users can have one dialect of SQL, one ODBC/JDBC connection point, and one set of tables to store their data in, regardless of whether it is intended for transactional or analytic use. This work takes advantage of changes in Hive to incorporate HBase as a storage layer for Hive tables, and of Phoenix operators to execute queries against data stored in HBase. This talk will cover this work and how it relates to other work happening in the Hive, HBase, and Phoenix communities.

Alan Gates

Hortonworks

Alan Gates is a co-founder of Hortonworks and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in mathematics from Oregon State University and an MA in theology from Fuller Theological Seminary. He is also the author of Programming Pig from O’Reilly Press.