Mar 15–18, 2020

Using Ray to Scale Python, Data Processing, and Machine Learning

Robert Nishihara (University of California, Berkeley), Ion Stoica (University of California, Berkeley), Philipp Moritz (University of California, Berkeley)
1:30pm–5:00pm Monday, March 16, 2020
Location: LL20D

Who is this presentation for?

Data engineers, data architects, developers

Level

Intermediate

Description

Surprisingly, there is no simple way to scale Python applications and analytics seamlessly from a laptop to the cloud. Someone developing an ML or data processing application on a laptop may parallelize it across 4 or 8 cores but will then hit a wall. Scaling up to the cloud often requires rewriting and rearchitecting the application using complex low-level tools like Kubernetes and gRPC or high-level but domain-specific tools like Spark and Horovod.

Ray is an open source framework for parallel and distributed computing that makes it easy to program at any scale (from your laptop to the datacenter) by providing easy-to-use, general-purpose, and high-performance primitives.

This tutorial will cover the following:
- How to use Ray to scale existing Python, ML, and data processing applications from your laptop to the cloud without rewriting them
- How to use Ray’s fault tolerance to program at scale for 1/10th the cost
- How to use Ray’s out-of-the-box tools for hyperparameter search and experiment management to accelerate data science and machine learning
- How to train models and serve predictions at any scale using Ray’s libraries for training and serving

Prerequisite knowledge

Knowledge of programming in Python is required. A very basic understanding of machine learning concepts would be helpful but is not required.

Materials or downloads needed in advance

Internet access. Turn off corporate VPNs, which may prevent using a Jupyter notebook.

What you'll learn

- How to use Ray to scale existing Python, ML, and data processing applications from your laptop to the cloud without rewriting them
- How to use Ray's fault tolerance to program at scale for 1/10th the cost
- How to use Ray's out-of-the-box tools for hyperparameter search and experiment management to accelerate data science and machine learning
- How to train models and serve predictions at any scale using Ray's libraries for training and serving

Robert Nishihara

University of California, Berkeley

Robert Nishihara is a PhD student in the RISELab at the University of California, Berkeley, where he works with Michael Jordan. He works on machine learning, optimization, and artificial intelligence.

Ion Stoica

University of California, Berkeley

Ion Stoica is a professor in the electrical engineering and computer sciences (EECS) department at the University of California, Berkeley, where he does research on cloud computing and networked computer systems. Previously, he worked on dynamic packet state, the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, and large-scale systems, including Apache Spark, Apache Mesos, and Alluxio. He cofounded Databricks, a startup to commercialize Apache Spark, and Conviva, a startup to commercialize technologies for large-scale video distribution. Ion is an ACM Fellow and has received numerous awards, including inclusion in the SIGOPS Hall of Fame (2015), the SIGCOMM Test of Time Award (2011), and the ACM Doctoral Dissertation Award (2001).

Philipp Moritz

University of California, Berkeley

Philipp Moritz is a PhD candidate in the electrical engineering and computer sciences (EECS) department at the University of California, Berkeley, with broad interests in artificial intelligence, machine learning, and distributed systems. He’s a member of the Statistical AI Lab and the RISELab.
