Presented By O’Reilly and Cloudera


Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Building a large-scale machine learning application using Amazon SageMaker and Spark

David Arpin (Amazon Web Services)

9:00am–12:30pm Tuesday, 09/11/2018

Data science and machine learning
Location: 1A 10
Level: Intermediate

Average rating: 2.80 (10 ratings)

Who is this presentation for?

Data scientists, machine learning engineers, and data engineers

Prerequisite knowledge

Familiarity with Spark and machine learning
A working knowledge of AWS (useful but not required)

Materials or downloads needed in advance

A laptop
A GitHub account

What you'll learn

Learn how to integrate SageMaker's machine learning capabilities into existing Spark processing pipelines

Description

Machine learning’s popularity has grown tremendously in recent years. The drive to integrate machine learning into every solution has never been more pronounced, but the path from investigation to model development to implementation in production can be difficult. Amazon SageMaker, AWS’s new machine learning platform, seeks to make this process easier.

Machine learning starts with data, and Spark is one of the most popular and flexible solutions for handling large datasets for ETL, ad hoc analysis, and advanced machine learning. However, using Spark for production machine learning use cases can create inconsistencies in algorithm scale, conflicts over cluster resources, and prediction latencies. By offloading training to Amazon SageMaker’s highly scalable algorithms and distributed, managed training environment, and by deploying with SageMaker’s real-time production endpoints, you can make machine learning in production easier and more reliable.

David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker.

David Arpin

Amazon Web Services

David Arpin is a data scientist at Amazon Web Services.

Comments

Alex Taub | DEVELOPER

09/13/2018 7:56pm EDT

Hi – are the sample notebooks or slides going to be shared for this presentation? Thanks
