Accelerating your organization: Making data optimal for machine learning
Who is this presentation for?
- Data engineers, data architects, developers
As a leading global software company, SurveyMonkey created the online survey category and transformed the way people give feedback. The people-powered data collected over the past two decades (50+ billion questions answered on the platform, 2.4 million survey respondents per day, etc.) is a gold mine for machine learning (ML). In early 2018, SurveyMonkey started a journey to expand its ML capabilities and empower the rest of the company to leverage ML. Almost two years into this journey, the company has developed intricate data interfaces and workflows that speed up ML model development by 5x and democratize model development throughout the organization on its hybrid cloud infrastructure.
Shubhankar Jain, Aliaksandr Padvitselski, and Manohar Angani use SurveyMonkey as a case study to explain how it met the needs of its data scientists, data engineers, and product owners by developing a high-level but intuitive data interface and an ML feature store that presents high-quality, fresh data. This ML feature store acts as the source of truth for model features and as the input for all of SurveyMonkey’s ML model pipelines. The team leveraged a deep understanding of both the data structures specific to SurveyMonkey and the types of ML models the organization had developed. You’ll hear the story of solving this complex engineering challenge and its exciting results. Join in to learn how you can optimize your ML model development cycle and achieve similar value in your organization.
Prerequisite knowledge
- A basic understanding of the ML development pipeline and of data and ETL pipelines
What you'll learn
- Learn why the traditional data warehouse or data lake doesn't serve the ML workflow
- Discover the ideal data requirements of an ML workflow for your organization and your ML models, the building blocks and considerations when updating your ML data workflow, and how to transform your workflow within your existing infrastructure
Shubhankar Jain (he/him) is a machine learning engineer at SurveyMonkey, where he develops and implements machine learning systems for its products and teams. He’s really excited to bring his expertise in and passion for data and AI systems to the rest of the industry. In his free time, he likes hiking with his dog and accelerating his hearing loss at live music shows.
Aliaksandr Padvitselski (he/him) is a machine learning engineer at SurveyMonkey, where he works on building the machine learning platform and helping to integrate machine learning systems into SurveyMonkey’s products. He has worked on a variety of projects related to data business and personalization at SurveyMonkey. Previously, he mostly worked in the finance industry, contributing to backend services and building a data warehouse for BI systems.
Manohar Angani is a machine learning engineer at SurveyMonkey, where he works on productionizing models and integrating them with SurveyMonkey products. Previously, he worked in different groups within the company, such as growth. In his free time, he likes to hang out with his family and explore the Bay Area.