Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Exit the data cathedral; enter the data bazaar

Who is this presentation for?

  • Data scientists and analysts, CTOs, engineers, researchers, and data archivers

Prerequisite knowledge

  • Familiarity with the common pain points of finding, transferring, manipulating (“munging”), storing, and publishing datasets, and with open source software principles (useful but not required)

What you'll learn

  • How the principles of open source software can, and should, be applied to open data; how content addressing works; how the distributed web can benefit users’ work with datasets; how versioning relates to datasets; and the power of immutability

Description

I believed that the most important software. . .needed to be built like cathedrals, carefully crafted by individual wizards or small bands of mages working in splendid isolation, with no beta released before its time. Linus Torvalds’s style of development—release early and often, delegate everything you can, be open to the point of promiscuity—came as a surprise. No quiet, reverent cathedral-building here—rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches. . .out of which a coherent and stable system could seemingly emerge only by a succession of miracles.
Eric S. Raymond, "The Cathedral & the Bazaar"

Today’s Balkanized “data cathedrals” force us to extract, transform, and load data before we can use it, leaving us without a way to use data we don’t control. Join in to learn why this approach should be replaced by the “data bazaar,” in which we freely compose and build upon each other’s data much the way we do with software today, using Jupyter as a key tool. You’ll explore how the cathedral/bazaar metaphor maps to open data as it exists today and learn why we currently live in a world dominated by data cathedrals. You’ll then dive into the concept of a data bazaar, discovering what characteristics it should have, what would be possible if one existed, and what’s holding it back from becoming a reality (the web itself). Content addressing and the distributed web provide the missing tools needed to construct a data bazaar, allowing us to share and depend on each other’s data in ways that were not previously possible. You’ll see how this data bazaar integrates with the Jupyter Notebook and the possibilities this creates, and find out how to participate in a growing data commons from the comfort of your existing tools.
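
For readers new to content addressing, the core idea can be sketched in a few lines of Python: a dataset is identified by a hash of its bytes rather than by where it is stored, so any change to the data produces a new address and each stored version stays immutable. This is only an illustrative sketch, not the system presented in the session. The `ContentStore` class, the `content_address` helper, the choice of SHA-256, and the sample CSV bytes are all assumptions made for the example.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest of `data`, used here as its address.

    Assumption: real distributed-web systems may use other hash functions
    and identifier formats; SHA-256 hex is chosen only for illustration.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes, not chosen by the caller.
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can re-hash the bytes to check they match the address,
        # regardless of who served them.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()

# Two versions of a made-up dataset: editing one value yields a new address.
v1 = store.put(b"id,value\n1,10\n")
v2 = store.put(b"id,value\n1,11\n")

# Storing identical bytes again returns the same address (deduplication).
v1_again = store.put(b"id,value\n1,10\n")

print(v1 != v2)        # different content, different address
print(v1 == v1_again)  # same content, same address
```

Because an address never points at different bytes, a notebook that records the address of the dataset it used can always name the exact version it depended on, which is the link between content addressing, versioning, and immutability mentioned above.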