Data science and machine learning are critical enabling factors for data-driven organizations, and engineering organizations face sharply rising expectations to develop and scale machine learning capabilities. A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. This includes data discovery, feature engineering, iterative model development, model training, and model scoring (batch and online). Managing artifacts, their associations, and their deployment across the various platform components is vital.
While a number of mature technologies support each phase of this lifecycle, few solutions tie these components together into a cohesive machine learning platform. To support the lifecycle of a model, you must be able to manage the various ML-related artifacts and their associations and automate deployment. A lifecycle management service built for this purpose should handle storage, versioning, visualization (including associations), and deployment of artifacts. The platform should support model development in different programming languages, and language and package versions should be configurable per model. Having the custom environment follow the model through the lifecycle is important to guarantee that the model always runs in the same environment. Thus, the environment should be externalized, associated with the model, and deployed together with it. Other considerations include the connections among the various artifacts and platforms: the data and datasets (source data and feature data, training datasets, and scoring result sets), the code (notebook code, model code, deployment code, etc.), model-specific environments, and platforms (development and training platforms, batch and online scoring platforms).
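The idea of versioned artifacts that carry their associations with them can be illustrated with a minimal sketch. It is not the service described in the talk: the class names (Artifact, LifecycleRegistry), the method names, and the example payloads are all hypothetical, and an in-memory dictionary stands in for real storage. It only shows how a model version might be linked to its environment so that both are retrieved together at deployment time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (kind, name, version) uniquely identifies an artifact in this sketch.
Key = Tuple[str, str, int]


@dataclass
class Artifact:
    """One versioned artifact (hypothetical structure for illustration)."""
    kind: str      # e.g. "model", "environment", "dataset", "code"
    name: str
    version: int
    payload: dict = field(default_factory=dict)

    @property
    def key(self) -> Key:
        return (self.kind, self.name, self.version)


class LifecycleRegistry:
    """In-memory stand-in for a lifecycle management service (assumption)."""

    def __init__(self) -> None:
        self._artifacts: Dict[Key, Artifact] = {}
        self._links: Dict[Key, List[Key]] = {}

    def register(self, kind: str, name: str, payload: dict) -> Artifact:
        # Versions increase by one per (kind, name) pair, starting at 1.
        latest = max(
            (v for (k, n, v) in self._artifacts if k == kind and n == name),
            default=0,
        )
        artifact = Artifact(kind, name, latest + 1, payload)
        self._artifacts[artifact.key] = artifact
        return artifact

    def associate(self, source: Artifact, target: Artifact) -> None:
        # Record a directed association, e.g. model -> environment.
        self._links.setdefault(source.key, []).append(target.key)

    def deployment_bundle(self, model: Artifact) -> List[Artifact]:
        # The model plus every artifact directly associated with it.
        linked = [self._artifacts[k] for k in self._links.get(model.key, [])]
        return [model] + linked


registry = LifecycleRegistry()
env = registry.register(
    "environment",
    "churn-env",
    {"language": "python", "packages": {"example-package": "1.0"}},
)
model_v1 = registry.register("model", "churn", {"uri": "models/churn/1"})
registry.associate(model_v1, env)
model_v2 = registry.register("model", "churn", {"uri": "models/churn/2"})
registry.associate(model_v2, env)

# Deploying version 2 pulls in the environment it was associated with.
bundle = registry.deployment_bundle(model_v2)
```

Keeping the environment as its own versioned artifact, rather than a field on the model, lets several model versions share one environment while still allowing each model to pin a different one.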
Hope Wang explains how her team at Intuit is managing the machine learning lifecycle, how different components associate and interact with each other, and how to execute in a production environment. Hope then shares an example of how an integrated process was developed for data engineers and data scientists to manage the entire lifecycle of a model from ideation through development, training, and ultimately, scoring.
Hope Wang is a software engineer in Intuit’s Small Business Data and Analytics Group. Hope is a self-taught, self-motivated, fully powered hacker who is passionate about innovation. She holds a master’s degree in biomedical engineering from the University of Southern California.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com