The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.
In an effort to empower more students with data analysis skills and tools, the modules program seeks out existing courses in the course catalogue and offers instructors a day or two of relief by developing a short curriculum, spanning one to three class periods, that engages directly with their course material. Is the class discussing social inequality? The program develops notebooks investigating the socioeconomic status index or mapping and correlating student-collected data against demographic census data. Reading medieval manuscripts in a literature course? How about some text analysis of Sir Gawain and the Green Knight? Learning about phonological properties of different world languages? Plot these properties on a world map and look for correlations. Discussing political rhetorical strategies? What about trying sentiment analysis on a corpus of political campaign speeches? It’s easy to imagine countless scenarios where a course might benefit from a one- or two-day data-driven perspective.
Until recently, the main obstacle to this dream has been the startup costs of computing for students without a technical background. In addition to the learning curve associated with programming, even the process of installing Python and its dependencies for a particular analysis could easily take an entire class period. JupyterHub has an opportunity to fundamentally change traditional pedagogy beyond CS and data science courses. We’ve already seen its utility for full courses, workshops, and tutorials, but Berkeley has begun to realize its potential for seamless integration into the traditional classroom. With no startup cost, students need only click a link to be dropped right into the user-friendly Jupyter Notebook. After a short, targeted introduction to Python that covers only the concepts relevant to the task at hand, students learn more programming concepts by applying them directly to something they actually care about. More importantly, they are introduced to new perspectives on the phenomenon they are discussing in class. Researchers have long advocated for teaching concepts in a stimulating and relevant environment. JupyterHub allows us to get there in under five minutes.
Beyond the impressive numbers of students and courses served, the program is particularly proud of its success within the social sciences, arts, and humanities. Most students in these courses have no experience with programming and minimal (if any) experience with data. Moreover, the modules program directly addresses concerns of historically marginalized groups, particularly as they pertain to data science. For example, in a course studying stigma and prejudice, the modules program empowers students to statistically uncover implicit bias in our society. Modules have become a low-stakes opportunity for students to discover data-driven, inferential thinking by trying to answer a question that interests them.
Mariah Rogers is program coordinator for the Division of Data Sciences at UC Berkeley, where she led the effort to build up the Data Scholars program that provides specialized academic support for students from underrepresented and nontraditional backgrounds. Mariah has been working with faculty on campus to build up the academic advising program for the new data science major (announced late Spring 2018) and has also been comanaging the Data Science Modules program to facilitate the introduction of data science concepts in existing courses across the UC Berkeley campus. Mariah holds a degree in computer science from UC Berkeley.
Ronald (Ronnie) Walker is a senior at UC Berkeley, where he is studying economics. Ronnie has served as an undergraduate student instructor, connector course teaching assistant, and modules team lead within the university’s Data Science Education Program. As team lead, he worked with faculty in Linguistics, Information Science, Education, Cognitive Science, Legal Studies, Near Eastern Studies, and Economics to build short modules for their courses. Most recently, he has been busy helping departments integrate existing full courses with data science approaches.
Julian Kudszus was a curriculum developer and program coordinator for the Data Science Modules initiative at UC Berkeley, which brings data science lessons to thousands of the university’s students across a wide range of domains through the use of JupyterHub and Jupyter notebooks. He received the Outstanding Teaching and Leadership award for his work at the Division of Data Sciences. Julian holds a bachelor’s degree in computer science from UC Berkeley.
©2018, O'Reilly Media, Inc.