Presented By O’Reilly and Cloudera

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Hive as a service

Szehon Ho (Criteo), Pawel Szostek (Criteo)

11:50am–12:30pm Thursday, March 8, 2018

Big data and data science in the cloud, Data engineering and architecture, Media, entertainment, and advertising
Location: LL21 E/F

Who is this presentation for?

Developers and those in operations and business

Prerequisite knowledge

A basic understanding of Hive and Mesos

What you'll learn

Explore the evolution of Criteo's Hive platform

Description

Hive is the main data transformation tool at Criteo, and hundreds of analysts and thousands of automated jobs run Hive queries every day. Szehon Ho and Pawel Szostek discuss the evolution of Criteo’s Hive platform from an error-prone add-on installed on some spare machines to a best-in-class installation capable of self-healing and automatically scaling to handle its growing load.

The resulting platform runs on Mesos, which has allowed Criteo to scale on demand and use resources more efficiently, iterate on development much faster than on bare metal, and roll out new versions without downtime for its users. It has also allowed the company to eliminate the last single point of failure (SPOF) in its Hive stack. Szehon and Pawel detail Criteo’s data architecture and explain how the company solved challenges in security, monitoring, scheduling, and load balancing across multiple layers. They also discuss the gains this work has delivered.

Szehon Ho

Criteo

Szehon Ho is a staff software engineer on the analytics data storage team at Criteo, where he works on Criteo’s Hive platform. Previously, he was a software engineer on the Hive team at Cloudera. He was a committer and PMC member in the Apache Hive open source community, working on features like Hive on Spark and Hive monitoring and metrics, among others.

Pawel Szostek

Criteo

Pawel Szostek is a senior software engineer on Criteo’s analytics data storage team, where he works on various projects, including implementing an improved HyperLogLog algorithm. Previously, he was a researcher at CERN in Geneva.
