Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Tutorials & Conference
Beijing, China

Driving Southeast Asia forward with big data

This will be presented in English.

Feng Cheng (Grab), Edwin Law (Grab)
11:15–11:55 Friday, 2017-07-14
Data engineering and architecture, Presented in English
Location: Grand Hall B
Level: Non-technical
Average rating: 2.00 (2 ratings)

Prerequisite Knowledge

A basic understanding of ride-hailing platforms, distributed computing, SQL on Hadoop, Spark, and stream processing

What you'll learn

Understand how Grab improved the performance, reliability, and availability of its data infrastructure; migrated from Redshift to Presto, cutting query running time from 30 minutes to 5 minutes at only 20% of the cost; and built a real-time big data platform with Spark Streaming and key-value storage

Description



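In this session, Cheng Feng describes some of the challenges Grab faced in scaling its back-office applications and how the company met that demand. He also shares the history of the architecture's evolution from Redshift to EMR + S3. In the early days, Redshift was a simple and cost-effective solution for analyzing our data. But with the explosive growth of our data volume in recent years, it became expensive and slow, so we decided to make a major change to the architecture. We use AWS EMR + S3 as our data warehouse. This architecture lets us separate the compute layer from the data storage layer. It also lets multiple clusters share the same data on S3, and clusters can be long-running or, for flexibility, merely ephemeral. Our users typically write Spark or Presto jobs for ETL and data analysis.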



Grab is sitting at the junction of the digital and physical worlds. Its vision is to drive Southeast Asia forward and transform the way people travel and pay across the region. With more than 700,000 drivers and 36 million app downloads, the Grab app has become a platform with one of the highest usage and transaction rates for the 620 million people in SEA—and is growing every day—giving the company an incredible opportunity to perfect the way it uses data to make lives easier across SEA.

In general, Grab aims to create and sustain a data-driven culture, using data to solve the toughest problems. The Data Engineering team is responsible for building a reliable data analytics platform and plays a big role in helping different teams gain product and consumer insights from a multipetabyte-scale data warehouse. Its work ranges from supporting ad hoc queries (booking, log, etc.) to analyzing user experience and training machine-learning models.

Feng Cheng and Edwin Law explain Grab’s data architecture and offer a history of its data platform migration and stream-processing apps. Feng and Edwin describe some of the challenges the company has faced in getting its back-office applications to scale and what it has done to meet demand. They also trace the evolution of its architecture from Redshift to EMR + S3. In the early stage, Redshift was a simple and cost-effective solution for analyzing all of Grab’s data. But as data volumes grew exponentially over the last year and data processing became more complicated, the company decided to make a big change to the architecture, adopting AWS (EMR + S3) for its data warehouse. This architecture offers many advantages, including separating the compute and storage layers and allowing multiple clusters to share the same data on S3 for data analytics.

Topics include:

  • Data infrastructure at Grab
  • Redshift versus data lakes
  • Presto: Background and context
  • Presto on EMR
  • User case studies using Spark Streaming at Grab

Feng Cheng


Cheng Feng is a data engineer at Grab, where he works on the big data platform, distributed computing, stream processing, and data science. Previously, he was a data scientist at the Lazada Group, working on Lazada’s tracker, customer segmentation and recommendation systems, and fraud detection.

Edwin Law


Edwin Law was the third person and first engineer on the Data team at Grab (formerly MyTeksi and GrabTaxi), which encompasses data engineering, data science, and data analytics. As engineering manager, Edwin leads the Data Engineering and Database Operations teams, which together have almost 15 members.
