Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Near-real-time anomaly detection at Lyft

Thomas Weise (Lyft), Mark Grover (Lyft)
11:20am–12:00pm Thursday, 09/13/2018
Secondary topics:  Temporal data and time-series analytics, Transportation and Logistics
Average rating: 2.50 (2 ratings)

Who is this presentation for?

  • Data engineers, data scientists, architects, and technical decision makers

Prerequisite knowledge

  • Basic familiarity with big data processing use cases

What you'll learn

  • Explore Lyft’s streaming platform and see how Lyft uses it to perform anomaly detection
  • Understand how data science and data engineering processes can be brought together for faster outcomes


Consumer-facing real-time processing poses a number of challenges when it comes to protecting against fraudulent transactions and other risks. The streaming platform at Lyft addresses this with an architecture that combines a data science-friendly programming environment with a deployment stack built for the reliability, scalability, and other SLA requirements of a mission-critical stream processing system.

Thomas Weise and Mark Grover explain how Lyft uses its streaming platform to detect and respond to anomalous events. Reacting to such events with traditional development methodologies is challenging, especially where low-latency SLAs for instant user feedback are critical. Enabling data science tools for machine learning, together with a process that allows fast and predictable deployment, is of growing importance.

Topics include:

  • A deep dive into Lyft’s streaming platform, covering use cases, system architecture, and key requirements that drive technology choices
  • Examples of risk and fraud analysis on real-time transaction streams, including credit card and location data, based on machine learning models and historical data
  • A data scientist-friendly development environment with the Python ecosystem and tools that allow users to focus on business logic
  • The Apache Beam portability framework as a bridge to distributed execution on a JVM-based target streaming engine, without code rewrites
  • A data engineering process for continuous integration and deployment with a focus on reliability and operability
  • Apache Flink-based streaming execution for scalability, high availability, and low-latency processing
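
The abstract does not say which detection method Lyft uses. As a rough illustration of the kind of Python business logic a data scientist might write before handing it to a streaming engine, here is a minimal sketch that assumes a per-key rolling z-score over recent transaction amounts. The class name, function name, window size, threshold, and event shape are all hypothetical, and the code runs in plain Python rather than on Beam or Flink.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags values that deviate strongly from a sliding window of recent values.

    Hypothetical illustration only; not the method described in the talk.
    """

    def __init__(self, window_size=20, threshold=3.0, min_samples=5):
        self.window = deque(maxlen=window_size)  # most recent observations
        self.threshold = threshold               # z-score cutoff
        self.min_samples = min_samples           # warm-up before flagging

    def observe(self, value):
        """Return True if `value` is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.window) >= self.min_samples:
            mean = sum(self.window) / len(self.window)
            variance = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = sqrt(variance)
            if std > 0:
                is_anomaly = abs(value - mean) / std > self.threshold
            else:
                # All previous values were identical; any change is flagged.
                is_anomaly = value != mean
        self.window.append(value)
        return is_anomaly


def detect(events, **detector_kwargs):
    """Yield (key, amount) pairs flagged as anomalous.

    `events` is an iterable of (key, amount) tuples, e.g. (user_id, charge).
    Each key gets its own detector, mimicking keyed state in a stream processor.
    """
    detectors = {}
    for key, amount in events:
        if key not in detectors:
            detectors[key] = RollingZScoreDetector(**detector_kwargs)
        if detectors[key].observe(amount):
            yield key, amount
```

Calling list(detect(events)) on a small list of (user_id, amount) tuples returns the flagged pairs. In a real deployment the per-key state would live in the streaming engine rather than in a Python dictionary.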

Thomas Weise


Thomas Weise is a software engineer for the streaming platform at Lyft. He’s also a PMC member for the Apache Apex and Apache Beam projects and has contributed to several more projects within the ASF ecosystem. Thomas is a frequent speaker at international big data conferences and the author of Learning Apache Apex.


Mark Grover


Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.

Comments on this page are now closed.


Mark Grover
09/13/2018 5:40am EDT

Hi all,
We are super excited to see you all real soon! You won’t want to miss this.

The slides are posted at