Mar 15–18, 2020

Reducing data lag from 24+ hours to 5 mins at Lyft scale

Mark Grover (Lyft), Dev Tagare (Lyft)
11:50am–12:30pm Tuesday, March 17, 2020
Location: LL21B
Secondary topics: Streaming and IoT

Who is this presentation for?

Data engineers, data architects, developers




It used to take more than 24 hours from the time someone took a ride for that ride to show up in Lyft’s analytical stores. Recently, the company overhauled its data architecture, cutting the lag between an event happening in real life and Lyft being able to use it in decision making to a few minutes.

Mark Grover and Dev Tagare offer you a glimpse at the end-to-end data architecture Lyft uses to reduce data lag appearing in its analytical systems from 24+ hours to under 5 minutes. You’ll learn the what and why of tech choices, monitoring, and best practices. They outline the use cases Lyft has enabled, especially in ML model performance and evaluation.

Prerequisite knowledge

  • A basic understanding of streaming and data architectures

What you'll learn

  • Learn how to build a next-generation data architecture that lowers the lag to insight and decision making

Mark Grover


Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.

Dev Tagare


Dev Tagare is an engineering manager at Lyft. He has hands-on experience building end-to-end data platforms for high-velocity, large-volume data use cases. Previously, Dev spent 10 years leading engineering functions for companies including Oracle and Twitter, with a focus on areas including open source; big data; low-latency, high-scalability design; data structures; design patterns; and real-time analytics.
