“But the plans were on display . . .”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a torch.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice, didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying Beware of the Leopard.”
—Douglas Adams, The Hitchhiker’s Guide to the Galaxy
Merely releasing open data is not enough. To reach its full revolutionary potential, data needs to be usable in powerful ways by people of all backgrounds, not just those with traditional computer science skills. Today, working with many datasets assumes you have access to computational infrastructure, plus the knowledge and time to set up and learn a programming environment and tools such as Git and GitHub for publishing and sharing your work. This excludes people who are interested in the data itself but cannot work around the accidental complexity that surrounds code (installing things, dependency management, Git, sharing code with others, sharing results, etc.).
Yuvi Panda offers an overview of a volunteer-led open knowledge movement that makes all of its data openly available, and explores the free, open, and public computational infrastructure (a JupyterHub deployment) recently set up so that people can play with and build things on that data. This infrastructure has several additional features:
These features have made this computational environment usable by a wide variety of people who do not traditionally consider themselves programmers and who previously would not even have attempted to use this data. Combined with the wide availability of Python tutorials, this has been amazingly useful, particularly to sections of our community that work in non-English or non-European languages. Yuvi showcases some of the wonderful things that people who consider themselves “just users” have programmed thanks to the democratizing effect of this infrastructure. He also makes an aspirational case that open computational infrastructure like this is just as important as open data, and that institutions providing open data should also provide open computational infrastructure so people can play with that data.
Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code. To that end, he builds tools (Quarry, PAWS, etc.) that sidestep the historical accidents constituting the “command-line tax” people have to pay before they can do productive things with computing. He’s a core member of the JupyterHub team and also works on mybinder.org. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.
©2017, O'Reilly Media, Inc.