An open data science ecosystem is important for advancing computing and for collaboration among data scientists and developers. On one side, the GPU Open Analytics Initiative (GoAi) is bringing transparent GPU acceleration to everyday tasks for analysts and making GPU-accelerated Python a first-class citizen for developers through libraries such as the Python GPU DataFrame (PyGDF). On the other, the prototyping, collaboration, and documentation capabilities of the Jupyter Notebook and JupyterLab have made Project Jupyter the go-to platform for data science and development.
Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks. They offer an overview of GoAi, explain why data scientists and developers are joining together around it, and detail how to use it today from your existing notebooks and codebases. Along the way, you’ll learn how open standards such as Apache Arrow are enabling clean integrations and interoperability between data science and GPU tools. This includes a look at how Graphistry brought end-to-end GPU acceleration of visual graph analytics to Jupyter workflows and how the same pattern is repeating throughout the data processing and visualization community. The result is that analysts and developers can focus on solving problems rather than on gluing different libraries and technologies together. They conclude by sharing future plans for GPU data science and new ways you can contribute to improving data science as a whole.
Joshua Patterson is the Director of AI Infrastructure at NVIDIA, leading development of RAPIDS. Previously, Josh worked with leading experts across the public and private sectors and academia to build a next-generation cyberdefense platform. He was also a White House Presidential Innovation Fellow. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data and creating interactive data visualizations. He holds a BA in economics from the University of North Carolina at Chapel Hill and an MA in economics from the University of South Carolina’s Moore School of Business.
Keith Kraus is a Washington, DC-based senior engineer on the AI infrastructure team at NVIDIA, where he builds GPU-accelerated solutions around data engineering, analytics, and visualization. Previously, Keith did extensive data engineering, systems engineering, and data visualization work in the cybersecurity domain, focused on building a GPU-accelerated big data solution for advanced threat detection and cyberthreat-hunting capabilities. Keith holds a BEng in computer engineering and an MEng in networked information systems from Stevens Institute of Technology.
Leo Meyerovich cofounded Graphistry, Inc. to help enterprise and federal teams easily scale visual investigations of their event and graph data. Graphistry’s original approach of connecting GPUs in browsers to GPUs in datacenters builds upon the founding team’s work at UC Berkeley on the first parallel web browser and the Superconductor language. Leo is most cited for his work in language-based security and policy verification. His earlier research received awards for the first reactive web language, Flapjax; for parallelizing the web browser; and for the sociological foundations of programming languages.
©2018, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.