The Jupyter Notebook is an extremely popular tool in academia for teaching, exploratory analysis, and sharing code. While notebooks are increasingly popular among computational scientists, a very different academic community, the digital humanities, has embraced them as well. Matt Burton offers an overview of the digital humanities community, discusses a novel use of Jupyter notebooks to analyze computational research, and reflects on Jupyter's relationship to scholarly publishing and the production of knowledge.
The digital humanities is a growing community of scholars from humanities disciplines like English and history, whose research, teaching, and publications are infused with digital technology. Some digital humanists leverage computational and data-intensive methods to gain new understandings and perspectives on digitized historical records, such as the books stored in the HathiTrust. Not only do new computational methods expand our understanding of literature and history; they also expand the very basis of how we know what we know.
The formal processes of scholarly publication, especially in the humanities, struggle to accommodate the increasingly multimodal outputs that computational and data-intensive research produces. As the outputs of academic research become imbricated with code, data, and interpretive prose, how are peers, especially in the humanities, supposed to review computationally inflected work? Matt introduces defactoring, a technique that leverages the expressivity of Jupyter notebooks to computationally interrogate and peer review the code that is part of digital humanities publications.
Literary historians Ted Underwood and Jordan Sellers use machine learning to analyze large corpora of historical texts. One of Underwood and Sellers’s projects, How Quickly Do Literary Standards Change?, uses logistic regression to draw out differences between reviewed and unreviewed poetry volumes published between 1820 and 1919. While Underwood and Sellers’s final analysis has been formally published in an academic journal, the authors graciously conducted their research openly (rare among humanities scholars), posting article preprints on figshare and—more importantly—sharing their code and data on GitHub. However, sharing data and code is only a first step. We need a technique for critically engaging the code and rigorously reviewing its role in publication.
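To give a flavor of the kind of model involved, here is a minimal sketch of a logistic regression over word-count features. This is an illustration only, not Underwood and Sellers's actual pipeline (their code and data are on GitHub); the texts and labels below are invented stand-ins for reviewed and unreviewed poetry volumes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for poetry volumes; the real corpus spans 1820-1919
# and contains many hundreds of texts.
reviewed = [
    "o'er the silver moon the nightingale sings her sorrow",
    "beauty and grief entwined in solemn measured verse",
]
unreviewed = [
    "the cat sat on the mat all the long day",
    "simple rhymes for simple times and simple places",
]
texts = reviewed + unreviewed
labels = [1, 1, 0, 0]  # 1 = reviewed, 0 = unreviewed

# Bag-of-words features: raw term counts per volume.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit the classifier; its coefficients indicate which words lean
# toward the "reviewed" class.
model = LogisticRegression()
model.fit(X, labels)

# Estimated probability that an unseen text belongs to the reviewed class.
prob = model.predict_proba(vectorizer.transform(["the moon sings"]))[0, 1]
```

The appeal of this approach for literary history is that the fitted coefficients are themselves interpretable evidence: they indicate which vocabulary distinguishes the two classes of texts, and how sharply.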
Defactoring Pace of Change weaves a computational narrative that simultaneously annotates the code with expository prose and executes Underwood and Sellers’s original analysis. At the core of this effort is the Jupyter Notebook, which affords the blending of code, text, and data into a single documentary form.
Matt Burton is a visiting assistant professor at the School of Computing and Information at the University of Pittsburgh. His research interests include infrastructure studies, data science, and scholarly communication. Matt holds a PhD in information from the University of Michigan. His dissertation, Blogs as Infrastructure for Scholarly Communication, explored digital humanities blogging and the sociotechnical dynamics of web-centric publishing.
©2017, O'Reilly Media, Inc.