Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

GoAi and PyGDF: GPU-accelerated data science with Jupyter notebooks

Joshua Patterson (NVIDIA), Keith Kraus (NVIDIA), Leo Meyerovich (Graphistry)
5:00pm–5:40pm Friday, August 24, 2018

Who is this presentation for?

  • Developers, data scientists, and architects

Prerequisite knowledge

  • A working knowledge of Python, the Jupyter Notebook, and data science concepts and practices
  • Familiarity with machine learning (useful but not required)

What you'll learn

  • Explore the GPU Open Analytics Initiative (GoAi)
  • Learn how to build Jupyter notebooks with GPU-accelerated data processing and visualizations

Description

An open data science ecosystem is important for advancing computing and collaboration among data scientists and developers. On one side, the GPU Open Analytics Initiative (GoAi) is bringing transparent GPU acceleration to analysts' everyday tasks and making GPU-accelerated Python a first-class citizen for developers through libraries such as the Python GPU DataFrame (PyGDF). At the same time, the prototyping, collaboration, and documentation capabilities of the Jupyter Notebook and JupyterLab have made Project Jupyter the go-to platform for data science and development.

Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks. They offer an overview of GoAi, explain why data scientists and developers are joining together around it, and detail how to leverage it today from your existing notebooks and codebases. Along the way, you'll learn how open standards such as Apache Arrow are enabling clean integrations and interoperability between data science and GPU tools. This includes a look at how Graphistry brought end-to-end GPU acceleration of visual graph analytics to Jupyter workflows and how the same pattern is repeating throughout the data processing and visualization community. The result is that analysts and developers can focus on solving problems rather than on gluing different libraries and technologies together. They conclude by sharing future plans for GPU data science and new ways you can contribute to improving data science as a whole.

Joshua Patterson

NVIDIA

Joshua Patterson is the Director of AI Infrastructure at NVIDIA, leading development of RAPIDS. Previously, Josh worked with leading experts across the public and private sectors and academia to build a next-generation cyberdefense platform. He was also a White House Presidential Innovation Fellow. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data and creating interactive data visualizations. He holds a BA in economics from the University of North Carolina at Chapel Hill and an MA in economics from the University of South Carolina’s Moore School of Business.

Keith Kraus

NVIDIA

Keith Kraus is a Washington, DC-based senior engineer on the AI infrastructure team at NVIDIA, where he builds GPU-accelerated solutions around data engineering, analytics, and visualization. Previously, Keith did extensive data engineering, systems engineering, and data visualization work in the cybersecurity domain, focused on building a GPU-accelerated big data solution for advanced threat detection and cyberthreat-hunting capabilities. Keith holds a BEng in computer engineering and an MEng in networked information systems from Stevens Institute of Technology.

Leo Meyerovich

Graphistry

Leo Meyerovich cofounded Graphistry, Inc. to help enterprise and federal teams easily scale visual investigations of their event and graph data. Graphistry's original approach of connecting GPUs in browsers to GPUs in datacenters builds upon the founding team's work at UC Berkeley on the first parallel web browser and the Superconductor language. Leo is most cited for his work in language-based security and policy verification. His earlier research received awards for Flapjax, the first reactive web language; for parallelizing the web browser; and for work on the sociological foundations of programming languages.