Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Scaling collaborative data science with Globus and Jupyter

Ian Foster (Argonne National Laboratory | University of Chicago)
11:05am–11:45am Thursday, August 23, 2018

Who is this presentation for?

  • Developers (particularly those building data portals and distributed, collaborative solutions)

Prerequisite knowledge

  • Familiarity with Jupyter notebooks
  • A basic understanding of security mechanisms such as OAuth 2 and OpenID Connect (useful but not required)

What you'll learn

  • How to integrate advanced data management capabilities into your notebooks and use robust authentication and authorization mechanisms to access and share large, distributed data

Description

Jupyter is rapidly becoming the platform of choice for interactive data science in academic and commercial labs alike. While existing data interfaces are sufficient for modest datasets, users often struggle when they need to deal with the large and increasingly distributed data generated by modern science. The Globus team at the University of Chicago develops and operates software as a service for data management that is used by over 75,000 researchers worldwide. The Globus platform provides high-speed, reliable file transfer, sharing, and data publication as well as a federated identity infrastructure that facilitates collaboration across diverse security domains and organizational boundaries, with all services accessible via browser, command line, and REST APIs.

Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems (including GPFS, Lustre, Amazon S3, and Google Drive), and make data securely available to business partners and research collaborators. Ian demonstrates the existing integration and shares plans for expanding the joint solution to utilize JupyterLab and other Globus capabilities that further advance data-driven collaboration at scale.

Ian Foster

Argonne National Laboratory | University of Chicago

Ian Foster is a senior scientist, distinguished fellow, and director of the Data Science and Learning Division at Argonne National Laboratory as well as the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago and a fellow of the Institute for Molecular Engineering. A computer scientist whose work at the intersection of computing and the sciences has produced both practical technologies that have seen wide adoption and concepts and methods that have proven influential in research and education, Ian is also chief troublemaker at Globus.

His research interests span a range of topics in parallel, distributed, and data-intensive computing. A unifying theme is a desire to use the power of rapid communication to accelerate discovery, whether by linking people with remote computers and data, accelerating complex computational processes, or enabling distributed virtual teams. Ian pursues use-inspired basic research, meaning that he employs challenging practical problems to motivate and focus work on hard problems in computer science. Over the years, these practical problems have come from such fields as environmental science, economics, high-energy physics, biomedicine, and engineering. He often builds sophisticated artifacts (i.e., software and distributed systems) in order to apply, evaluate, and disseminate new concepts and methods. Ian’s work frequently involves large teams of disciplinary scholars, computer scientists, and software engineers.

Ian has received multiple awards for his work, including the IEEE TCSC Award for Excellence in Scalable Computing (2014), the Inaugural ACM HPDC Lifetime Achievement Award (2012), and the IEEE Tsutomu Kanai Award (2011).