The analytics and data warehousing industries are in the midst of a major period of transformation and upheaval. Since the publication nearly a decade ago of Google’s seminal MapReduce and GFS papers, we have witnessed the appearance of Apache Hadoop, followed closely by the arrival of batch-oriented SQL systems like Apache Hive, and the scramble by established SQL vendors to implement Hadoop connectors.
This talk addresses the recent emergence of a new generation of analytic databases inspired by Google Dremel. These databases have been designed with the goal of running real-time SQL natively on Hadoop in a manner that fully exploits the flexibility and performance of the underlying platform. Characterized by features including schema-on-read, support for semi-structured data, and pluggable storage engines, and defined by systems like Citus Data’s CitusDB and Cloudera’s Impala, these databases share important architectural details that distinguish them from the previous generation of analytic databases.
In this talk, we will discuss the unavoidable cost and performance limitations of the connector-based approach employed by many established vendors and explain the long-term significance of Apache Hive’s data model, along with its influence on next-generation SQL-on-Hadoop databases. We will then unravel the novel architectural features common to next-generation analytic database systems like CitusDB and Impala that make real-time SQL-on-Hadoop feasible. Finally, we will review several important database lessons learned over the previous decades that remain relevant today.
This session is sponsored by Citus Data.
Carl Steinbach is a software engineer at Citus Data, as well as a committer and PMC member on the Apache Hive project. Previously Carl worked at Cloudera where he led the Hive team, at NetApp where he developed storage encryption products, and at Oracle where he was a member of the Server Technologies group. Carl holds B.S. and M.Eng. degrees in Computer Science from MIT.