Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Web analytics at scale with Druid at Naver

Jason Heo (Naver), Dooyong Kim (Navercorp)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: S11B Level: Intermediate
Secondary topics: Data Platforms, Media, Advertising, Entertainment

Who is this presentation for?

  • Data engineers

Prerequisite knowledge

  • Familiarity with analytics systems

What you'll learn

  • Best practices for building analytic systems with Druid

Description

naver.com is the largest search engine in Korea, holding 70% of the Korean search market. The speakers’ team handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver’s web analytics system, built with Druid. They outline the architecture, share speedup techniques, explain how they implemented the Spark Druid Connector and how to use it, and detail how they extended Druid to solve the challenges their team faced.

Topics include:

  • What is Druid and why Druid
  • The architecture
  • Implementing and using Spark Druid Connector
  • Extending Druid’s queries
  • How the Kafka indexing service works
  • Approximate TopN Query for speedup
  • Split-apply-combine for multidimensional queries
  • How to improve Plywood Druid Requester
  • How to run Druid on CDH

Jason Heo

Naver

Jason Heo is a senior software engineer at Naver, where he develops analytics systems and graph databases for internal use. Previously, he worked at a number of startups. Jason helped MySQL become widely used in Korea and wrote a book on MySQL. Nowadays, he mainly uses Spark, Elasticsearch, Kudu, and Druid to build analytic systems.

Dooyong Kim

Navercorp

Dooyong Kim is a software engineer at Naver, where he has been building a Spark- and Druid-based OLAP platform. Previously, he was a search engineer at ecommerce search platform Coupang, where he implemented several projects related to Apache Solr search infrastructure and researched an indexing mechanism that integrates Spark and Solr. Dooyong is currently interested in MPP and advanced file formats for big data processing.
