Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Defactoring Pace of Change: Reviewing computational research in the digital humanities

Matt Burton (University of Pittsburgh)
2:40pm–3:20pm Friday, August 25, 2017
Reproducible research and open science
Location: Nassau Level: Non-technical
Average rating: ***** (5.00, 2 ratings)

Who is this presentation for?

  • Academics and anyone interested in the transformative potential of Jupyter for publishing

What you'll learn

  • Discover the digital humanities and the role Jupyter plays
  • Learn a technique for peer-reviewing computational research in the humanities and the broader implications for computational narratives on scholarly publishing


The Jupyter Notebook is an extremely popular tool in academia for teaching, exploratory analysis, and sharing code. While notebooks are best known among computational scientists, a very different academic community, the digital humanities, has embraced them as well. Matt Burton offers an overview of the digital humanities community, discusses a novel use of Jupyter notebooks to analyze computational research, and reflects upon Jupyter’s relationship to scholarly publishing and the production of knowledge.

The digital humanities is a growing community of scholars from humanities disciplines like English and history, whose research, teaching, and publications are infused with digital technology. Some digital humanists leverage computational and data-intensive methods to gain new understandings and perspectives on digitized historical records, such as the books stored in HathiTrust. Not only do new computational methods expand our understanding of literature and history; they also expand the very basis of how we know what we know.

The formal processes of scholarly publication, especially in the humanities, struggle to accommodate the increasingly multimodal outputs of computation- and data-intensive research. As the outputs of academic research become imbricated with code, data, and interpretive prose, how are peers, especially in the humanities, supposed to review computationally inflected work? Matt introduces defactoring, a technique that leverages the expressiveness of Jupyter notebooks to computationally interrogate and peer-review the code that is part of digital humanities publications.

Literary historians Ted Underwood and Jordan Sellers use machine learning to analyze large corpora of historical texts. One of Underwood and Sellers’s projects, How Quickly Do Literary Standards Change?, uses logistic regression to draw out differences between reviewed and unreviewed poetry volumes published between 1820 and 1919. While Underwood and Sellers’s final analysis has been formally published in an academic journal, the authors graciously conducted their research openly (rare among humanities scholars), posting article preprints on figshare and, more importantly, sharing their code and data on GitHub. However, sharing data and code is only a first step. We need a technique for critically engaging with that code and rigorously reviewing its role in the publication.
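To make the method concrete: logistic regression learns weights over features (here, word frequencies) that separate two classes of texts. The sketch below is purely illustrative, not Underwood and Sellers’s actual pipeline; the `train_logistic` function, the three-feature toy matrix, and the labels are all invented for demonstration.

```python
# Illustrative sketch only (not the Pace of Change code): a tiny logistic
# regression fit by gradient descent, classifying hypothetical "volumes"
# described by made-up word-frequency features.
import numpy as np

def train_logistic(X, y, lr=0.1, steps=2000):
    """Fit weights w and intercept b by gradient descent on the log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of log-loss in w
        b -= lr * np.mean(p - y)                # gradient of log-loss in b
    return w, b

# Hypothetical feature matrix: six volumes (rows) x three word features,
# with labels 1 = reviewed, 0 = unreviewed.
X = np.array([[3., 0., 1.], [2., 1., 0.], [4., 0., 2.],
              [0., 3., 0.], [1., 4., 1.], [0., 2., 1.]])
y = np.array([1, 1, 1, 0, 0, 0])

w, b = train_logistic(X, y)
probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
preds = (probs > 0.5).astype(int)
print(preds)  # on this separable toy data, the fit recovers the labels
```

The actual study works with thousands of feature dimensions and cross-validation, but the core mechanic, a linear decision boundary over word frequencies, is the same.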

Defactoring Pace of Change weaves a computational narrative that simultaneously annotates the code with expository prose and executes Underwood and Sellers’s original analysis. At the core of this effort is the Jupyter Notebook, which affords the blending of code, text, and data into a single documentary form.


Matt Burton

University of Pittsburgh

Matt Burton is a visiting assistant professor at the School of Computing and Information at the University of Pittsburgh. His research interests include infrastructure studies, data science, and scholarly communication. Matt holds a PhD in information from the University of Michigan. His dissertation, Blogs as Infrastructure for Scholarly Communication, explored digital humanities blogging and the sociotechnical dynamics of web-centric publishing.