Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Data science at UC Berkeley: 2,000 undergraduates, 50 majors, no command line

Gunjan Baid (UC Berkeley), Vinitra Swamy (UC Berkeley)
5:00pm–5:40pm Thursday, August 24, 2017
Usage and application
Location: Sutton Center/Sutton South
Level: Non-technical

Who is this presentation for?

  • Educators, data scientists, and anyone interested in seeing how Jupyter notebooks are applied in a classroom

What you'll learn

  • Explore how UC Berkeley's Data Science program uses Jupyter
  • Learn how you can apply these lessons in your own organization


Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley’s Data Science program, which has no math, computing, or statistics prerequisites and is designed to be accessible to students of all backgrounds. At the introductory level, the program consists of a fundamentals course that introduces students to concepts in computer programming and statistics, along with a diverse set of connector courses that let students apply data science to their area of interest, such as geography, immunotherapy, or cognitive science.

Using Jupyter notebooks, students get hands-on experience working with data without the burden of setting up and maintaining a development environment. The program has developed a tool that allows students to obtain the notebooks and datasets for an assignment with one click. Autograding, user authentication, and submission are all done through Jupyter notebooks, which lets instructors focus on real-world issues, such as racial profiling and California water usage, instead of the technical details of the computing infrastructure. The numbers show the reach of this approach: over 2,000 students across 50 majors have taken the fundamentals course and the connector courses in the past four semesters.

Gunjan and Vinitra explain the program in more detail and expand upon the pedagogical challenges faced in scaling Jupyter notebooks for use in large courses. They conclude by discussing how the program’s vision can be applied more generally for teaching data science using Jupyter at other universities and institutions.

Gunjan Baid

UC Berkeley

Gunjan Baid is a student at the University of California, Berkeley. She completed her bachelor’s degree in computer science and biochemistry and is now pursuing a master’s degree in computer science with a research focus on computational biology. Gunjan is associated with the undergraduate Data Science education program, where, as a student instructor, she worked with Jupyter notebooks in the classroom. She now provides technical support for the program’s JupyterHub infrastructure.

Vinitra Swamy

UC Berkeley

Vinitra Swamy graduated two years early with a bachelor’s degree in computer science from the University of California, Berkeley, and is now working toward a master’s degree in computer science. Her research interests include data science, cloud computing environments, and natural language processing. Vinitra is head student instructor for Berkeley’s new Foundations of Data Science course, helping shape curriculum and educating thousands of students from diverse backgrounds. Her efforts in data science education were recently recognized with a Berkeley EECS award of excellence in teaching and leadership. Vinitra also leads a Jupyter development student research team within the Data Science Education program and assists with the technical deployment and use of JupyterHub infrastructure campus-wide.