Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

The next step in the evolution of data science with RAPIDS

3:50pm–4:30pm Thursday, March 28, 2019
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, and developers

Level

Intermediate

Prerequisite knowledge

  • A basic understanding of Python and distributed computation (e.g., PySpark)

What you'll learn

  • Explore methods and APIs to migrate traditional machine learning workflows from CPU to GPU environments
  • Learn how to create GPU-accelerated graph representations and analytics
  • Understand use cases for GPU-accelerated ML and graph applications

Description

GPUs and GPU platforms have been responsible for the dramatic advancement of deep learning and other neural net methods in the past several years. At the same time, traditional machine learning workloads, which comprise the majority of business use cases, continue to be written in Python with heavy reliance on either single-threaded tools (e.g., pandas and scikit-learn) or large, multi-CPU distributed solutions (e.g., Spark and PySpark).

RAPIDS is the next big step in data science, combining the ease of use of common APIs with the power and scalability of GPUs. Bartley Richardson and Joshua Patterson offer an overview of RAPIDS and explore cuDF, cuGraph, and cuML, a trio of RAPIDS tools that enable data scientists to work with data in a familiar interface and apply graph analytics and traditional machine learning techniques.

  • RAPIDS cuDF makes it possible to move the vast majority of machine learning workloads from a CPU environment to GPUs. This yields a substantial speedup, particularly on large datasets, enabling rapid, interactive work that was previously cumbersome to code or very slow to execute.
  • The RAPIDS cuML library operates directly on cuDF data frames to apply traditional machine learning analytics (e.g., PCA, DBSCAN, k-means, k-NN, and tSVD) at scale and with GPU acceleration.
  • Many data science problems can be approached using a graph or network view, and much like traditional machine learning workloads, this work has been either local (e.g., Gephi, Cytoscape, and NetworkX) or distributed on CPU platforms (e.g., GraphX). The cuGraph GPU-accelerated graph library allows both graph representations and graph-based analytics to achieve similar speedups on a GPU platform with minimal code changes. By keeping all of these tasks on the GPU and minimizing redundant I/O, data scientists can model their data quickly and frequently, which allows more experimentation and more effective model generation. Further, keeping everything in compatible formats allows quick movement between feature extraction, graph representation, graph analytics, enrichment back to the original data, and visualization of results.
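
The workflow named in the last bullet (graph representation, graph analytic, enrichment back to the original data) can be sketched on the CPU in plain Python. This is only an illustration of the steps, not the RAPIDS API: the record fields, the function names, and the choice of in-degree as the analytic are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical network-flow records; the field names are illustrative only.
flows = [
    {"src": "a", "dst": "b", "bytes": 120},
    {"src": "a", "dst": "c", "bytes": 300},
    {"src": "b", "dst": "c", "bytes": 50},
    {"src": "d", "dst": "c", "bytes": 75},
]


def build_edges(records):
    """Graph representation: reduce each record to a (source, destination) edge."""
    return [(r["src"], r["dst"]) for r in records]


def in_degree(edges):
    """Graph analytic: count the incoming edges of each node."""
    return Counter(dst for _, dst in edges)


def enrich(records, scores):
    """Enrichment: attach the destination's score to a copy of each record."""
    return [{**r, "dst_in_degree": scores[r["dst"]]} for r in records]


edges = build_edges(flows)
scores = in_degree(edges)
enriched = enrich(flows, scores)

for row in enriched:
    print(row)
```

In the setting the description outlines, the records would sit in a cuDF data frame and the analytic would come from cuGraph, so none of these steps would need to copy data back to the CPU. The exact library calls are deliberately not shown here.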

Bartley Richardson

NVIDIA

Bartley Richardson is a senior data scientist on the AI infrastructure team at NVIDIA. Bartley’s focus at NVIDIA is the research and application of GPU-accelerated methods that can help solve today’s information security and cybersecurity challenges. Previously, Bartley was a technical lead and performer on multiple DARPA research projects, where he applied data science and machine learning algorithms at scale to solve large cybersecurity problems. He was also the principal investigator of an internet of things research project focused on applying machine and deep learning techniques to large amounts of IoT data to provide intelligence value relating to form, function, and pattern of life. His primary research areas involve NLP and sequence-based methods applied to cyber network datasets as well as cross-domain applications of machine and deep learning solutions to tackle the growing number of cybersecurity threats. He loves using data and visualizations to tell stories and help make complex concepts more relatable. Bartley holds a PhD in computer science and engineering from the University of Cincinnati with a focus on loosely structured and unstructured query optimization and a BS in computer engineering with a focus on software design and AI.


Joshua Patterson

NVIDIA

Joshua Patterson is a director of AI infrastructure at NVIDIA, leading engineering for RAPIDS.AI. Previously, Josh was a White House Presidential Innovation Fellow and worked with leading experts across the public sector, the private sector, and academia to build a next-generation cyberdefense platform. His current passions are graph analytics, machine learning, and large-scale system design. Josh loves storytelling with data and creating interactive data visualizations. He holds a BA in economics from the University of North Carolina at Chapel Hill and an MA in economics from the University of South Carolina Moore School of Business.