Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Operationalizing Presto in the cloud: Lessons and mistakes

Feng Cheng (Grab), Yanyu Qu (Grab)
4:15pm–4:55pm Wednesday, December 6, 2017
Big data and the cloud, Data engineering and architecture
Location: 310/311
Level: Beginner

Who is this presentation for?

  • Data engineers, analysts, and scientists

Prerequisite knowledge

  • A basic understanding of ride-hailing platforms, distributed computing, SQL on Hadoop, Spark, and stream processing

What you'll learn

  • How to architect a distributed data processing platform in the cloud using Presto, manage Presto at scale, tune its performance, and manage its users

Description

Grab uses Presto to support operational reporting (batch and near real-time), ad hoc analyses, and its data pipeline. Currently, Grab has 5+ clusters with 100+ instances in production on AWS and serves up to 30K queries per day while supporting more than 200 internal data users. Feng Cheng and Yanyu Qu explain how Grab operationalizes Presto in the cloud and share lessons learned along the way.

Topics include:

  • Managing Presto from dev to production
  • Presto performance tuning
  • Managing Presto users

Feng Cheng

Grab

Feng Cheng is a data engineer at Grab, where he works on the big data platform, distributed computing, stream processing, and data science. Previously, he was a data scientist at the Lazada Group, working on Lazada’s tracker, customer segmentation and recommendation systems, and fraud detection.

Yanyu Qu

Grab

Yanyu Qu is a data engineer on Grab’s data engineering team, where he works on Spark and Presto’s data gateway. Previously, he worked at FunPlus, App Annie, IBM, and Teradata.