Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Web analytics at scale with Druid at Naver

Jason Heo (Naver), Dooyong Kim (Navercorp)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics: Data Platforms, Media, Advertising, Entertainment

Who is this presentation for?

  • Data engineers

Prerequisite knowledge

  • Familiarity with analytics systems

What you'll learn

  • Learn best practices for building analytic systems with Druid

Description

Naver is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver’s web analytics system, built with Druid. Jason and Dooyong outline the architecture, share techniques for speeding it up, explain how they implemented the Spark Druid Connector, demonstrate how to use it, and explain how they extended Druid to solve the challenges their team faced.

Topics include:

  • What is Druid and why should you use it?
  • The architecture
  • Implementing and using Spark Druid Connector
  • Extending Druid’s queries
  • How Kafka’s indexing service works
  • Approximate TopN Query for speedup
  • Split-apply-combine for multidimensional queries
  • How to improve Plywood Druid Requester
  • How to run Druid on CDH

Jason Heo


Jason Heo is a senior software engineer at Naver, where he develops analytics systems and graph databases for internal use. Previously, he worked at a number of startups. Jason helped MySQL become widely used in Korea and wrote a book on MySQL. Nowadays, he mainly uses Spark, Elasticsearch, Kudu, and Druid to build analytic systems.

Dooyong Kim


Dooyong Kim is a software engineer at Naver, where he has been working on building a Spark- and Druid-based OLAP platform. Previously, he was a search engineer at ecommerce search platform Coupang, where he implemented several Apache Solr search infrastructure-related projects and researched a Spark and Solr integrated indexing mechanism. Dooyong is currently interested in MPP and advanced file formats for big data processing.

Jason Heo
24/05/2018 1:26 BST

Slides are available here –