Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Democratizing access to open data by providing open computational infrastructure

Yuvi Panda (Data Science Education Program (UC Berkeley))
5:00pm–5:40pm Friday, August 25, 2017
JupyterHub deployments
Location: Nassau
Level: Non-technical

Who is this presentation for?

  • Open data enthusiasts

What you'll learn

  • Understand why open data by itself is not enough: you need open computational infrastructure as well


“But the plans were on display . . .”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a torch.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice, didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying Beware of the Leopard.”

—Douglas Adams, The Hitchhiker’s Guide to the Galaxy

Merely releasing open data is not enough. To reach its full revolutionary potential, data needs to be equally usable in powerful ways by people of various backgrounds, not just those with traditional computer science skills. Currently, working with many datasets assumes that you have access to computational infrastructure, along with the knowledge and time to set up and learn a programming environment and tools like Git and GitHub to publish and share your work. This excludes those who are interested in the data itself but cannot work around all the accidental complexity that surrounds code (installing things, dependency management, Git, sharing code with others, sharing results, etc.).

Yuvi Panda offers an overview of a volunteer-led open knowledge movement that makes all of its data available openly and explores the free, open, and public computational infrastructure recently set up (using a JupyterHub deployment) for people to play with its data and build things on it. This infrastructure has several additional features:

  • All notebooks are public to everyone at all times, as soon as they are created. These can be shared easily with a permanent linkable URL as well.
  • All the various kinds of provided data (including big dumps and live queryable databases) are made easily accessible from inside the notebook environment—so there’s no need to install new libraries or download anything.
  • Users can import other people’s notebooks, thus easily building on others’ explorations and tools.
  • This infrastructure is openly available to anyone who has an account and has not been banned for abuse. No gatekeeping exists, and the cost of playing around with it is very low.

These features have made this computational environment usable by a wide variety of people who traditionally do not consider themselves programmers and hence would previously not even have attempted to make use of this data. Combined with the wide availability of Python tutorials, this has been amazingly useful, particularly to sections of our community that work in non-English or non-European languages. Yuvi showcases some of the wonderful things that people who consider themselves “just users” have programmed thanks to the democratizing effect of this piece of infrastructure. He also makes an aspirational case for why open computational infrastructure like this is just as important as open data, and for why institutions that provide open data should also provide open computational infrastructure that lets people play with their data.

Yuvi Panda

Data Science Education Program (UC Berkeley)

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.