Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

An introduction to Druid

Fangjin Yang (Imply)
4:35pm–5:15pm Thursday, 09/29/2016
Data innovations
Location: 1 E 07/1 E 08
Level: Intermediate
Tags: real-time
Average rating: 5.00 (4 ratings)

Prerequisite knowledge

  • A basic understanding of the data infrastructure space
  • A general knowledge of distributed systems

What you'll learn

  • Understand limitations of existing data warehouses
  • Learn about alternative architectures and why Druid is a good fit to ingest streams and power data applications

Description

    Cluster computing frameworks such as Hadoop or Spark are tremendously beneficial for processing data and deriving insights from it. However, long query latencies make these frameworks suboptimal choices for powering interactive applications. Organizations frequently rely on dedicated query layers such as relational databases and key-value stores for lower query latencies, but these technologies suffer from many drawbacks for analytic use cases.

    User-facing applications are replacing traditional reporting interfaces as the preferred means for organizations to derive value from their datasets. To provide an interactive user experience, user interactions with analytic applications must complete on the order of milliseconds. Organizations often struggle to select a proper serving layer that meets these needs; many choose a serving layer for its general popularity without understanding its possible architectural limitations.

    Druid is an analytics data store designed for analytic (OLAP) queries on event data. It draws inspiration from Google’s Dremel, Google’s PowerDrill, and search infrastructure, and many large technology companies are switching to Druid for analytics. Fangjin Yang discusses using Druid for analytics and explains why the architecture is well suited to power analytic dashboards.

    Fangjin Yang


    Fangjin Yang is a coauthor of the open source Druid project and a cofounder of Imply, a data analytics startup based in San Francisco. Previously, Fangjin held senior engineering positions at Metamarkets and Cisco Systems. Fangjin has a BASc in electrical engineering and an MASc in computer engineering from the University of Waterloo, Canada.

    Comments on this page are now closed.


    André Morrow
    10/04/2016 1:38pm EDT

    All Strata + Hadoop World 2016 slide presentations have been posted if they were made available to us by the speakers.

    09/29/2016 12:53pm EDT

    Can you share the slides?