Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Lyft's analytics pipeline: From Redshift to Apache Hive and Presto

Shenghu Yang (Lyft)
4:20pm–5:00pm Thursday, March 8, 2018

Who is this presentation for?

  • Data engineers, analysts, and data scientists

Prerequisite knowledge

  • A basic understanding of big data and business analytics

What you'll learn

  • Explore the evolution of Lyft's data pipeline, from AWS Redshift clusters to Apache Hive and Presto

Description

Lyft’s business has grown over 100x in the past four years. Shenghu Yang explains how Lyft’s data pipeline has evolved to serve its ever-growing analytics use cases, migrating from the world’s largest AWS Redshift clusters to Apache Hive and Presto to overcome hard limits on scalability and concurrency.

Topics include:

  • How Lyft’s data pipeline evolved
  • A flexible architecture that shares storage and a metastore but separates computation
  • How Hive replaces Redshift ETL
  • How Presto complements Hive for ad hoc queries
  • Lyft’s self-service tools
  • How Lyft educates end users about its data systems

Shenghu Yang

Lyft

Shenghu Yang is an engineering manager at Lyft, where he was a founding member of the company’s data platform team and now runs the data tools team. Previously, Shenghu worked at Oracle and @WalmartLabs on cloud computing and digital marketing-related engineering work. He holds an MS from Carnegie Mellon University.