Presented By O’Reilly and Cloudera

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Web analytics at scale with Druid at Naver

Jason Heo (Naver), Dooyong Kim (Navercorp)

11:15–11:55 Wednesday, 23 May 2018

Data engineering and architecture
Location: S11B
Level: Intermediate

Secondary topics: Data Platforms, Media, Advertising, Entertainment

Who is this presentation for?

Data engineers

Prerequisite knowledge

Familiarity with analytics systems

What you'll learn

Learn best practices for building analytic systems with Druid

Description

Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver’s web analytics system, built with Druid. Jason and Dooyong outline the architecture, share techniques for speeding up queries, explain how they implemented Spark Druid Connector and demonstrate how to use it, and describe how they extended Druid to solve the challenges their team faced.

Topics include:

What is Druid and why should you use it?
The architecture
Implementing and using Spark Druid Connector
Extending Druid’s queries
How Druid’s Kafka indexing service works
Approximate TopN Query for speedup
Split-apply-combine for multidimensional queries
How to improve Plywood Druid Requester
How to run Druid on CDH

Jason Heo

Naver

Jason Heo is a senior software engineer at Naver, where he develops analytics systems and graph databases for internal use. Previously, he worked at a number of startups. Jason helped MySQL become widely used in Korea and wrote a book on MySQL. Nowadays, he mainly uses Spark, Elasticsearch, Kudu, and Druid to build analytic systems.

Dooyong Kim

Navercorp

Dooyong Kim is a software engineer at Naver, where he has been working on building a Spark- and Druid-based OLAP platform. Previously, he was a search engineer at ecommerce search platform Coupang, where he implemented several Apache Solr search infrastructure-related projects and researched a Spark and Solr integrated indexing mechanism. Dooyong is currently interested in MPP and advanced file formats for big data processing.

Comments on this page are now closed.

Comments

Jason Heo | SENIOR SOFTWARE ENGINEER

24/05/2018 1:26 BST

The slides are available here: https://www.slideshare.net/JasonJungsuHEO/web-analytics-at-scale-with-druid-at-navercom
