Building a self-service platform for continuous, real-time feature generation.
Who is this presentation for?
Data engineers, data architects, developers
At Lyft, all our systems, including client applications, generate many millions of events per second. These events are ingested by the event ingestion pipeline, streamed through Kinesis and Kafka, and also made available in persistent stores such as Hive for offline consumption.
This data can be used to generate features for ML models as well as for any other form of real-time decision making. Our research scientists and data scientists come up with algorithms to derive features from data. The challenge lies in doing this quickly, correctly, effectively, and reliably at scale. To address it, we have built a self-service platform using Flink, Beam, and Kubernetes that can be used to write, prototype, and deploy stateful computations on high-throughput streaming data.
With this platform we have tried to abstract away the challenges of provisioning, data discovery, bootstrapping, skew, late-arriving and unordered events, downtime, and so on, so that our experts can focus on what they do best without having to worry about managing and scaling a distributed system.
Computations can be expressed in SQL and Python and prototyped in an interactive interface, making it easy for even someone with no programming background to hit the ground running on day one.
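To give a feel for the kind of stateful computation involved, here is a minimal plain-Python sketch of an event-time feature: a per-key count over tumbling windows that tolerates unordered events and drops events that arrive too late. It is not the platform's API or the Flink or Beam API; the class name `WindowedCounter`, its fields, and the key `driver_1` are all hypothetical, and the watermark is simplified to the largest event time seen so far.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WindowedCounter:
    """Counts events per (key, tumbling window) using event time.

    Hypothetical illustration only; not an API of the platform described here.
    """
    window_size: int       # window length, in seconds
    allowed_lateness: int  # how far behind the watermark an event may still arrive
    watermark: int = 0     # simplified watermark: largest event time seen so far
    counts: dict = field(default_factory=lambda: defaultdict(int))
    dropped: int = 0       # number of events discarded as too late

    def process(self, key, event_time):
        # Advance the watermark; out-of-order events never move it backwards.
        self.watermark = max(self.watermark, event_time)
        # Discard events older than the watermark minus the allowed lateness.
        if event_time < self.watermark - self.allowed_lateness:
            self.dropped += 1
            return
        # Assign the event to its tumbling window by event time, not arrival order.
        window_start = event_time - event_time % self.window_size
        self.counts[(key, window_start)] += 1

    def closed_windows(self):
        # A window is final once the watermark has passed its end plus the lateness.
        limit = self.window_size + self.allowed_lateness
        return {
            (key, start): count
            for (key, start), count in self.counts.items()
            if start + limit <= self.watermark
        }
```

Feeding events in arrival order with `process(key, event_time)` and then reading `closed_windows()` yields only the counts that can no longer change. A real streaming engine additionally handles state checkpointing, skew, and distributed execution, which is what a platform like the one described takes off the user's hands.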
In this talk I will cover the challenges of building such a system, common pitfalls, and lessons learned, as well as wins!
Prerequisite knowledge
Rudimentary knowledge of machine learning and its applications
What you'll learn
Sherin is a software engineer at Lyft. In a career spanning eight years, she has worked on most parts of the tech stack, but enjoys the challenges in data science and machine learning the most. Most recently she has focused on building products that facilitate advances in artificial intelligence and machine learning through streaming.
She is passionate about getting more people, especially women, interested in this field and has been sharing her work with the community through tech talks and panel discussions. Most recently she gave a talk about machine learning infrastructure and streaming at Beam Summit as well as at Flink Forward in Berlin.
In her free time she loves to read and paint. She is also the president of the Russian Hill book club based in San Francisco and loves to organize events for her local library.