One of the promises of Hadoop is that a group of users can store all their data together, for shared use and analysis. A pattern frequently seen today is Apache HBase for fast updates and low-latency data, with Apache Phoenix on top for SQL, while Apache Hive is the de facto SQL standard for data warehousing on Hadoop, used for both batch and interactive analytic queries.
Separating data into different tools based on its intended use leads to duplication whenever the same data is needed for both purposes. Users spend resources moving data back and forth between tools. They must also learn multiple SQL dialects, remember which data is where, and understand when to use which tool.
Work is under way in the Hive, HBase, and Phoenix communities to significantly improve the integration of these tools so that users can have one dialect of SQL, one ODBC/JDBC connection point, and one set of tables to store their data in, regardless of whether it is intended for transactional or analytic use. This work takes advantage of changes in Hive to incorporate HBase as a storage layer for Hive tables, and of Phoenix operators to execute queries against data stored in HBase. This talk will cover this work and how it relates to other work happening in the Hive, HBase, and Phoenix communities.
Alan Gates is a co-founder at Hortonworks, and an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in mathematics from Oregon State University and an MA in theology from Fuller Theological Seminary. He is also the author of Programming Pig from O’Reilly Press.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.