Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

JupyterHub for domain-focused integrated learning modules

Mariah Rogers (UC Berkeley Division of Data Sciences), Julian Kudszus (UC Berkeley Division of Data Sciences)
11:55am–12:35pm Thursday, August 23, 2018
JupyterHub deployments, Training and education, Usage and application
Location: Beekman/Sutton North
Level: Non-technical

Who is this presentation for?

  • Educators and system administrators

What you'll learn

  • Explore the Data Science Modules program at UC Berkeley, which uses Jupyter notebooks to give students hands-on experience with a dataset relevant to their course


The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

In an effort to empower more students with data analysis skills and tools, the modules program seeks out existing courses in the course catalogue and offers instructors a day or two of relief by developing a short curriculum, spanning one to three class periods, that engages directly with their course material. Is the class discussing social inequality? The program develops notebooks investigating the socioeconomic status index or mapping and correlating student-collected data against demographic census data. Reading medieval manuscripts in a literature course? How about some text analysis of Sir Gawain and the Green Knight? Learning about phonological properties of different world languages? Plot these properties on a world map and look for correlations. Discussing political rhetorical strategies? What about trying sentiment analysis on a corpus of political campaign speeches? It’s easy to imagine countless scenarios where a course might benefit from a one- or two-day data-driven perspective.

Until recently, the main obstacle to this dream has been the startup costs of computing for students without a technical background. In addition to the learning curve associated with programming, even the process of installing Python and its dependencies for a particular analysis could easily take an entire class period. JupyterHub has an opportunity to fundamentally change traditional pedagogy beyond CS and data science courses. We’ve already seen its utility for full courses, workshops, and tutorials, but Berkeley has begun to realize its potential for seamless integration into the traditional classroom. With no startup cost, students need only click a link to be dropped right into the user-friendly Jupyter Notebook. After a short, targeted introduction to Python that covers only the concepts relevant to the task at hand, students learn more programming concepts by applying them directly to something they actually care about. More importantly, they are introduced to new perspectives on the phenomenon they are discussing in class. Researchers have long advocated for teaching concepts in a stimulating and relevant environment. JupyterHub allows us to get there in under five minutes.

Beyond the impressive numbers of students and courses served, the program is particularly proud of its success within the social sciences, arts, and humanities. Most students in these courses have no experience with programming and minimal (if any) experience with data. Moreover, the modules program directly addresses concerns of historically marginalized groups, particularly as they pertain to data science. For example, in a course studying stigma and prejudice, the modules program empowers students to statistically uncover implicit bias in our society. Modules have become a low-stakes opportunity for students to discover data-driven, inferential thinking by trying to answer a question that interests them.

Mariah Rogers

UC Berkeley Division of Data Sciences

Mariah Rogers is program coordinator for the Division of Data Sciences at UC Berkeley, where she led the effort to build up the Data Scholars program, which provides specialized academic support for students from underrepresented and nontraditional backgrounds. Mariah has been working with faculty on campus to develop the academic advising program for the new data science major (announced late Spring 2018) and has also been comanaging the Data Science Modules program to facilitate the introduction of data science concepts in existing courses across the UC Berkeley campus. Mariah holds a degree in computer science from UC Berkeley.

Julian Kudszus

UC Berkeley Division of Data Sciences

Julian Kudszus was a curriculum developer and program coordinator for the Data Science Modules initiative at UC Berkeley, which brings data science lessons to thousands of the university’s students across a wide range of domains through the use of JupyterHub and Jupyter notebooks. He received the Outstanding Teaching and Leadership award for his work at the Division of Data Sciences. Julian holds a bachelor’s degree in computer science from UC Berkeley.