Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Building a large-scale machine learning application using Amazon SageMaker and Spark

David Arpin (Amazon Web Services)
9:00am–12:30pm Tuesday, 09/11/2018
Data science and machine learning
Location: 1A 10
Level: Intermediate
Average rating: 2.80 (10 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, and data engineers

Prerequisite knowledge

  • Familiarity with Spark and machine learning
  • A working knowledge of AWS (useful but not required)

Materials or downloads needed in advance

  • A laptop
  • A GitHub account

What you'll learn

  • Learn how to integrate SageMaker's machine learning capabilities into existing Spark processing pipelines


Machine learning’s popularity has grown tremendously in recent years. The drive to integrate machine learning into every solution has never been more pronounced, but the path from investigation to model development to implementation in production can be difficult. Amazon SageMaker, AWS’s new machine learning platform, seeks to make this process easier.

Machine learning starts with data, and Spark is one of the most popular and flexible solutions for handling large datasets for ETL, ad hoc analysis, and advanced machine learning. However, using Spark for production machine learning use cases can create inconsistencies in algorithm scale, conflicts over cluster resources, and prediction latencies. Offloading training to Amazon SageMaker’s highly scalable algorithms and distributed, managed training environment, and deploying with SageMaker’s real-time production endpoints, makes implementing machine learning in production easier and more reliable.

David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker.

David Arpin

Amazon Web Services

David Arpin is a data scientist at Amazon Web Services.

Comments on this page are now closed.


09/13/2018 7:56pm EDT

Hi – are the sample notebooks or slides going to be shared for this presentation? Thanks