Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Web Analytics at Scale with Druid @naver.com

Jason Heo (Navercorp), Dooyong Kim (Navercorp)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: S11B
Level: Intermediate

Who is this presentation for?

Data engineers

Prerequisite knowledge

Basic knowledge of how to build an analytics system

What you'll learn

Best practices for building an analytics system with Druid

Description

Real-time multidimensional analysis over big data is the dream of every data engineer, but GroupBy operations are so expensive that real-time response is hard to achieve. Our team tested several big data platforms for building a real-time MOLAP system and finally chose Druid, a high-performance, column-oriented, distributed data store. Even though Druid is a great OLAP system, we ran into difficulties building an analytics system with it: it is not easy to operate, its documentation and other materials explaining how it works are insufficient, and because it is not widely used, there are few use cases to reference. In this talk, we introduce the problems that arose and how we solved them.

Our presentation will cover the following:

  • Why Druid (experimental results of Elasticsearch, Kudu, and Druid)
  • Our architecture
  • Spark on Druid
  • Extending Druid’s query
  • How Kafka Indexing Service works
  • Approximation for speed-up (topN Query and Sampling)
  • Split-Apply-Combine for multi-dimensional query
  • How to improve Plywood Druid Requestor
  • How to run Druid on CDH

Jason Heo

Navercorp

Jason is a senior software engineer at Naver. He joined Naver in 2007 and has developed analytics systems and a graph database for internal use. Before joining Naver, he worked at several startups for eight years. In the late 1990s and early 2000s, he helped MySQL become widely used in Korea and wrote a book on MySQL. Nowadays he mainly uses Spark, Elasticsearch, Kudu, and Druid to build analytics systems.

Dooyong Kim

Navercorp

Dooyong Kim is a software engineer at Naver working on a Spark- and Druid-based OLAP platform. Prior to Naver, Kim worked as a search engineer on Coupang's e-commerce search platform, completing several projects related to Apache Solr search infrastructure and researching an integrated Spark and Solr indexing mechanism. These days Kim is interested in MPP and advanced file formats for big data processing.
