Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

NoSQL no more: SQL on Druid with Apache Calcite

Gian Merlino (Imply)
4:20pm–5:00pm Wednesday, March 7, 2018
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data engineers and developers

Prerequisite knowledge

  • Familiarity with databases and SQL

What you'll learn

  • Explore Druid SQL and how Druid and Calcite are integrated
  • Understand concepts involved in SQL and relational algebra
  • Learn how to build a SQL data access layer

Description

Druid is an analytics-focused, distributed, scale-out data store. Existing Druid clusters have scaled to petabytes of data and trillions of events, ingesting millions of events every second. Until version 0.10, Druid could only be queried in a JSON-based language that many users found unfamiliar.

Enter Apache Calcite. It includes an industry-standard SQL parser, validator, and JDBC driver, as well as a cost-based relational optimizer. Calcite bills itself as “the foundation for your next high-performance database” and is used by Hive, Drill, and a variety of other projects. Druid uses Calcite to power Druid SQL, a standards-based query API that vaults Druid out of the NoSQL world and into the SQL world.
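To make the contrast concrete, here is a minimal sketch of one question ("how many rows per day?") written both as a Druid native JSON query and as a Druid SQL request. The broker address `localhost:8082`, the datasource name `wikipedia`, and the interval are assumptions for illustration only. The code just builds the request bodies and target URLs; it does not contact a cluster.

```python
import json

# Assumed broker address and datasource name (hypothetical; adjust for a real cluster).
BROKER = "http://localhost:8082"
DATASOURCE = "wikipedia"

# Native JSON timeseries query: count rows per day over a one-day interval.
native_query = {
    "queryType": "timeseries",
    "dataSource": DATASOURCE,
    "granularity": "day",
    "intervals": ["2016-06-27/2016-06-28"],
    "aggregations": [{"type": "count", "name": "events"}],
}

# The same question in Druid SQL, sent as {"query": ...} to the SQL endpoint.
sql_query = {
    "query": (
        'SELECT FLOOR(__time TO DAY) AS "day", COUNT(*) AS events '
        f"FROM {DATASOURCE} "
        "WHERE __time >= TIMESTAMP '2016-06-27 00:00:00' "
        "AND __time < TIMESTAMP '2016-06-28 00:00:00' "
        "GROUP BY 1"
    )
}

# Native queries and SQL queries are POSTed to different broker endpoints.
native_url = f"{BROKER}/druid/v2/"
sql_url = f"{BROKER}/druid/v2/sql/"

if __name__ == "__main__":
    print("POST", native_url)
    print(json.dumps(native_query, indent=2))
    print("POST", sql_url)
    print(json.dumps(sql_query, indent=2))
```

The SQL version states the question declaratively; the SQL layer is then responsible for planning it into a native query, which may well resemble the timeseries query above, though the exact query it generates is not guaranteed to match this sketch.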

Gian Merlino offers an overview of Druid SQL, explains how Druid and Calcite are integrated, and shows why you should stop worrying and learn to love relational algebra in your own projects.
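The relational-algebra idea behind this kind of integration can be illustrated with a toy sketch. It is not Calcite's API or Druid's actual planner: the operator classes `Scan`, `Filter`, and `Aggregate` and the helpers `evaluate` and `to_native` are hypothetical. The query `SELECT page, COUNT(*) AS cnt FROM edits WHERE country = 'US' GROUP BY page` becomes a tree of operators. One function interprets that tree over in-memory rows, and another recognizes the tree's shape and rewrites it into a single Druid-style groupBy query.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union

# Toy relational operators (hypothetical names, not Calcite's RelNode classes).
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    input: "Rel"
    column: str
    value: str  # equality predicate: column = value

@dataclass
class Aggregate:
    input: "Rel"
    group_by: str  # single grouping column; COUNT(*) is the only aggregate

Rel = Union[Scan, Filter, Aggregate]

def evaluate(rel, tables):
    """Interpret the operator tree over in-memory rows (lists of dicts)."""
    if isinstance(rel, Scan):
        return list(tables[rel.table])
    if isinstance(rel, Filter):
        return [r for r in evaluate(rel.input, tables) if r[rel.column] == rel.value]
    if isinstance(rel, Aggregate):
        counts = Counter(r[rel.group_by] for r in evaluate(rel.input, tables))
        return [{rel.group_by: k, "cnt": v} for k, v in sorted(counts.items())]
    raise TypeError(f"unknown operator: {rel!r}")

def to_native(rel):
    """Rewrite the Aggregate(Filter(Scan)) shape into one Druid-style groupBy query."""
    if not (isinstance(rel, Aggregate) and isinstance(rel.input, Filter)
            and isinstance(rel.input.input, Scan)):
        raise ValueError("toy translator only handles Aggregate(Filter(Scan))")
    flt = rel.input
    return {
        "queryType": "groupBy",
        "dataSource": flt.input.table,
        "granularity": "all",
        "intervals": ["1000-01-01/3000-01-01"],  # placeholder meaning "all time"
        "dimensions": [rel.group_by],
        "filter": {"type": "selector", "dimension": flt.column, "value": flt.value},
        "aggregations": [{"type": "count", "name": "cnt"}],
    }

# SELECT page, COUNT(*) AS cnt FROM edits WHERE country = 'US' GROUP BY page
plan = Aggregate(Filter(Scan("edits"), "country", "US"), "page")

# Hypothetical sample data.
tables = {"edits": [
    {"page": "A", "country": "US"},
    {"page": "B", "country": "US"},
    {"page": "A", "country": "US"},
    {"page": "A", "country": "FR"},
]}

if __name__ == "__main__":
    print(evaluate(plan, tables))
    print(to_native(plan))
```

The design point is that both functions consume the same tree: once a query is expressed as relational operators, you can execute it directly or pattern-match it into a more efficient native form. The real Druid SQL planner does this through Calcite and is far more general than this sketch.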


Gian Merlino

Imply

Gian Merlino is CTO and cofounder of Imply and is one of the original committers of the Druid project. Previously, he worked at Metamarkets and Yahoo. Gian holds a BS in computer science from the California Institute of Technology.