Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Continuous delivery and machine learning

14:05–14:45 Thursday, 24 May 2018
Data engineering and architecture
Location: Capital Suite 7
Level: Beginner
Secondary topics: Managing and Deploying Machine Learning
Average rating: 3.00 (5 ratings)

Who is this presentation for?

  • DevOps engineers, data scientists, and data engineers

Prerequisite knowledge

  • A basic understanding of machine learning

What you'll learn

  • Explore automated machine learning and discover how it's useful for model management

Description

Guillaume Salou shares OVH’s approach to continuous deployment of machine learning models, which involved building a full automated machine learning stack. Automated machine learning allows the company to rebuild models efficiently and keep them up to date with fresh data delivered by its data convergence tool.

In most organizations, DevOps engineers and data scientists sit on separate teams. OVH has merged them so that data scientists benefit from DevOps practices like continuous delivery and DevOps engineers benefit from data scientists’ knowledge. Continuous delivery of models is harder than building and deploying an application. First, raw data must be transformed into features, which are then preprocessed. Only then can you train and build a model. To achieve the best results, you must test different types of models, each configured through variables called hyperparameters. OVH monitors the performance of the candidate models and chooses the best one.
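
The steps above (preprocess features, train several model types over a range of hyperparameters, keep the best performer) can be sketched with scikit-learn, one of the libraries named in this session. This is only an illustrative sketch, not OVH’s actual pipeline: the synthetic data, the two candidate model types, the hyperparameter values, the ROC AUC metric, and the helper name select_best_model are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, cv=3):
    """Cross-validate several model types and hyperparameters (hypothetical helper)."""
    pipeline = Pipeline([
        ("scale", StandardScaler()),                   # preprocessing step
        ("model", LogisticRegression(max_iter=1000)),  # placeholder, replaced by the grid
    ])
    # Each dict is one model type together with the hyperparameter values to try.
    param_grid = [
        {"model": [LogisticRegression(max_iter=1000)],
         "model__C": [0.1, 1.0, 10.0]},
        {"model": [RandomForestClassifier(random_state=0)],
         "model__n_estimators": [25, 50]},
    ]
    # Score every candidate by cross-validated ROC AUC, then refit the best one.
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc")
    search.fit(X, y)
    return search


# Synthetic, imbalanced data standing in for engineered features (assumption).
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

search = select_best_model(X_train, y_train)
held_out_auc = search.score(X_test, y_test)  # ROC AUC on data unseen during selection
best_model_name = type(search.best_estimator_.named_steps["model"]).__name__
print(best_model_name, search.best_params_, round(held_out_auc, 3))
```

Keeping the scaler and the model in one Pipeline means the preprocessing is refit inside every cross-validation fold, so the comparison between candidates is not skewed by information leaking from the validation data. In a scheduled rebuild, a function like this could be rerun on each fresh batch of data.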

Guillaume discusses OVH’s first shared project: fraud detection for public cloud instances. For this project, the production model had to be kept up to date continually and automatically, fed by fresh data. Guillaume outlines the architecture for the project, built on open source software like Jupyter, CDS, Warp 10, openscoring, PMML, and scikit-learn. The approach is pragmatic and driven by a metrics data platform. For now, this is essentially an autoML solution, rebuilt daily in batch. The solution is efficient but insufficient. Guillaume explains how OVH is laying the groundwork for a streaming, fully automated machine learning platform that will allow data scientists to work within Stage-Gate innovation processes and move to production efficiently.

Guillaume Salou

OVH

Guillaume Salou is the machine learning services team leader at OVH, where he focuses on extracting high value from specific data science applications and making it available to all. Previously, he worked on data lakes.