Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Persistent storage for machine learning in KubeFlow

Skyler Thomas (MapR), Terry He (MapR Technologies)

5:10pm–5:50pm Wednesday, March 27, 2019

Data Engineering & Architecture
Location: 2008

Secondary topics: AI and Data technologies in the cloud, Model lifecycle management, Storage

Average rating: 4.75 (4 ratings)

Who is this presentation for?

CDOs, data scientists, data engineers, and machine learning engineers

Level

Intermediate

Prerequisite knowledge

Familiarity with container and ML technology (useful but not required)

What you'll learn

Learn how to use persistent storage to support parallelized ML frameworks with differing compute footprints

Description

The Kubernetes and Hadoop ecosystems are conglomerates of occasionally integrated and interrelated tools intended for use by data scientists and data engineers. The advantage conferred by Kubernetes has been the ability to deploy prebuilt offerings from container registries, allowing tools to be easily downloaded (pulled) and deployed on systems, without the traditional install pain around compiling from source that was frequently present in Hadoop ecosystem projects.

This approach is sufficient for simple deployments of single containers running isolated processes. But in most cases, users want to scale workflows up and down, using multiple containers to run parallel processes. Doing this requires templatized offerings and an easy way to deploy them. The most common ways to manage this in Kubernetes are Helm charts, Operators, and ksonnets, which package the manifests (typically YAML) that describe a deployment template so that it’s reproducible and can be used to generate interconnected pods of containers on demand.
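
As a rough illustration of what such a deployment template boils down to, the sketch below renders a Kubernetes Deployment manifest from a few parameters, so the same description can be reused at different scales. It is not how Helm, Operators, or ksonnet are implemented; the function name, the app name "trainer", and the image "example.com/trainer:latest" are hypothetical placeholders.

```python
import json


def render_deployment(name, image, replicas):
    """Return a Kubernetes Deployment manifest as a plain dict.

    Hypothetical helper for illustration only; real templating tools
    offer far more than simple parameter substitution.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # The only value that changes when scaling up or down.
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Same template, two scales. The name and image are placeholders.
small = render_deployment("trainer", "example.com/trainer:latest", replicas=2)
large = render_deployment("trainer", "example.com/trainer:latest", replicas=8)

# Kubernetes accepts manifests in JSON as well as YAML.
print(json.dumps(small, indent=2))
```

Only the replica count differs between the two rendered manifests, which is the property that makes a template reproducible across deployments.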

Kubeflow makes all of this functionality a bit more user-friendly by providing some of the commonly used machine learning projects as prebuilt, templatized offerings (ksonnets) that are pretested to integrate together in one Kubernetes namespace. The initial list is based on a common TensorFlow deployment pattern and has been opened up to support other engines and modules. This is revolutionary for managing the complexity of parallelizing compute engines and scaling workflows up and down. But for it to work as intended, it needs a complementary storage layer that can serve data and models to the compute workflows and even save state.

Skyler Thomas and Terry He explore the problems of state and storage and explain how distributed persistent storage can logically extend the compute flexibility provided by Kubeflow. You’ll learn the benefits of using Kubeflow and the considerations for mounting persistent storage to a Kubeflow tenant to provide a unified security model and secure data access that doesn’t require any data movement.
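
To make the idea of mounting shared persistent storage concrete, here is a minimal sketch that builds a PersistentVolumeClaim and several worker pods that all mount it, so every worker reads the same data in place. It is an illustration under stated assumptions, not the speakers' setup: the namespace "kubeflow", the storage class "shared-fs", the image, and the helper names are all hypothetical, and the ReadWriteMany access mode only works if the underlying storage backend supports it.

```python
def render_claim(name, namespace, size, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteMany lets many pods mount the volume at once;
            # assumed to be supported by the storage backend.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def render_worker_pod(name, namespace, image, claim_name, mount_path):
    """Return a Pod manifest that mounts the claim at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": "worker",
                "image": image,
                "volumeMounts": [
                    {"name": "shared-data", "mountPath": mount_path}
                ],
            }],
            "volumes": [{
                "name": "shared-data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


# All names and values below are placeholders.
claim = render_claim("training-data", "kubeflow", "100Gi", "shared-fs")
workers = [
    render_worker_pod(f"worker-{i}", "kubeflow",
                      "example.com/trainer:latest",
                      claim["metadata"]["name"], "/data")
    for i in range(3)
]
```

Because every pod references the same claim, scaling the number of workers changes the compute footprint without copying or moving the data.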

Skyler Thomas

MapR

Skyler Thomas is an engineer at MapR, where he is designing Kubernetes-based infrastructure to deliver machine learning and big data applications at scale. Previously, Skyler was chief architect for WebSphere user experience at IBM, where he worked with more than a hundred customers to deliver extreme-scaled applications in the healthcare, financial services, and retail industries.

Terry He

MapR Technologies

Terry He is a senior director of engineering at MapR, where he manages MapR’s Hadoop and ecosystem engineering teams and leads the company’s AI/ML initiatives.
