Fueling innovative software
July 15-18, 2019
Portland, OR

Building machine learning inference pipelines at scale

Julien Simon (AWS)
11:00am-11:40am Thursday, July 18, 2019
Secondary topics: AI Enhanced
Average rating: *****
(5.00, 2 ratings)

Who is this presentation for?

  • Developers, data engineers, and data scientists




Real-life ML workloads typically require more than training and predicting: data often needs to be preprocessed and postprocessed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm but a sequence of algorithms that work together to deliver predictions from raw data.
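
As a rough illustration of what "a sequence of algorithms" means at inference time, here is a minimal sketch in plain Python. It is not taken from the talk: the step names (parse, scale, predict, to_label), the scaling bounds, and the toy "model" are all hypothetical stand-ins, chosen only to show raw data flowing through preprocessing, prediction, and postprocessing as one deployable unit.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def make_pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


def parse(raw: str) -> List[float]:
    """Preprocessing: turn a comma-separated record into floats."""
    return [float(field) for field in raw.split(",")]


def scale(features: List[float]) -> List[float]:
    """Preprocessing: min-max scale with bounds assumed to come from training."""
    lo, hi = 0.0, 10.0  # hypothetical bounds, for illustration only
    return [(x - lo) / (hi - lo) for x in features]


def predict(features: List[float]) -> float:
    """Toy stand-in for a trained model: the mean of the scaled features."""
    return sum(features) / len(features)


def to_label(score: float) -> str:
    """Postprocessing: map the numeric score to a readable label."""
    return "positive" if score >= 0.5 else "negative"


# The whole chain is deployed and invoked as one unit on raw input.
inference_pipeline = make_pipeline(parse, scale, predict, to_label)

if __name__ == "__main__":
    print(inference_pipeline("8,9,7"))
    print(inference_pipeline("1,2,3"))
```

Libraries such as Spark MLlib and scikit-learn provide their own pipeline abstractions for this pattern; the sketch above only mirrors the general idea and does not use either API.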

Julien Simon outlines how to use Apache Spark MLlib to build ML pipelines and discusses scaling options when datasets grow huge. As the cloud is a popular way to scale, he dives into how to implement inference pipelines on AWS using Apache Spark and scikit-learn, as well as ML algorithms implemented by Amazon.

Prerequisite knowledge

  • A basic knowledge of ML, Spark, and Python

What you'll learn

  • Learn to build complex ML workflows using open source libraries, and deploy and scale them for large datasets

Julien Simon


Julien Simon is a technical evangelist at AWS. Previously, Julien spent 10 years as a CTO and vice president of engineering at a number of top-tier web startups. He’s particularly interested in all things architecture, deployment, performance, scalability, and data. Julien frequently speaks at conferences and technical workshops, where he helps developers and enterprises bring their ideas to life thanks to the Amazon Web Services infrastructure.