Real-time multidimensional analysis over big data is the dream of every data engineer. But the GroupBy operation is so expensive that real-time response is hard to achieve. My team tested several big data platforms for building a real-time MOLAP system, and we finally chose Druid (a high-performance, column-oriented, distributed data store). Although Druid is a great OLAP system, we ran into difficulties building an analytics system with it: it is not easy to operate, its documentation and other materials explaining how it works are insufficient, and because it is not widely used, there are few use cases to reference. In this talk, we will introduce the problems that arose and how we solved them.
Our presentation will cover the following:
Jason is a senior software engineer at Naver. He joined Naver in 2007 and has developed analytics systems and a graph database for internal use. Before joining Naver, he worked at several startups for eight years. In the late 1990s and early 2000s, he helped popularize MySQL in Korea and wrote a MySQL book. These days he mainly uses Spark, Elasticsearch, Kudu, and Druid to build analytics systems.
Dooyong Kim is a software engineer at Naver, where he has been building a Spark- and Druid-based OLAP platform. Prior to Naver, Kim worked as a search engineer on Coupang's e-commerce search platform, where he carried out several projects related to Apache Solr search infrastructure and researched an integrated Spark and Solr indexing mechanism. Kim is currently interested in MPP and advanced file formats for big data processing.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com