Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Data analysis and machine learning in Jupyter

Andreas Mueller (Columbia University)
9:00am–12:30pm Wednesday, August 23, 2017
Usage and application
Location: Concourse A
Level: Beginner

Who is this presentation for?

  • Data scientists and business analysts

Prerequisite knowledge

  • Familiarity with pandas and Matplotlib (useful but not required)

Materials or downloads needed in advance

  • A machine with Python 2.7 or Python 3.4+ (recommended) and the Anaconda distribution (recommended) installed. Please run the provided "check_env.ipynb" notebook to confirm that your environment meets the requirements for the tutorial.
  • The course materials, downloaded from the course GitHub repository, which also provides detailed instructions for setting up your environment. Make sure you have the newest version of the materials before attending the tutorial.
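
The contents of "check_env.ipynb" are not reproduced on this page. As a rough illustration only, an environment check of this kind might look like the sketch below. The package list is an assumption based on the libraries named in the session description, the function names are hypothetical, and the sketch assumes it is run under Python 3.

```python
import importlib.util
import sys

# Assumed package list, taken from the libraries named in the session
# description; the real check_env.ipynb may test for different packages.
REQUIRED_PACKAGES = ["pandas", "seaborn", "matplotlib", "sklearn"]


def python_version_ok(version_info=sys.version_info):
    """Return True for Python 2.7 or Python 3.4+, the versions listed above."""
    major, minor = version_info[0], version_info[1]
    return (major == 2 and minor == 7) or (major == 3 and minor >= 4)


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES) or "none")
```

Running the provided notebook remains the authoritative check; this sketch only shows the general idea of verifying the interpreter version and the presence of the required packages.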

What you'll learn

  • How to use Jupyter effectively with visualization and analysis tools


Andreas Müller walks you through a variety of real-world datasets using Jupyter notebooks together with the data analysis packages pandas, seaborn, Matplotlib, and scikit-learn. You’ll perform an initial assessment of the data, handle different data types, visualize and preprocess the data, and build predictive models in domains such as health care and housing. The goal of this tutorial is to make you comfortable using Jupyter for interactive data analysis and exploration, in particular by taking advantage of the immediate feedback that Jupyter provides.

Andreas Mueller

Columbia University

Andreas Müller is a lecturer at the Data Science Institute at Columbia University and the author of Introduction to Machine Learning with Python (O’Reilly), which describes a practical approach to machine learning with Python and scikit-learn. His mission is to create open tools that lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms. Andreas is one of the core developers of the scikit-learn machine learning library and has been co-maintaining it for several years. He is also a Software Carpentry instructor. Previously, he worked at the NYU Center for Data Science on open source and open science and as a machine learning scientist at Amazon.

Comments on this page are now closed.


Michael Caudy
08/14/2017 4:11pm EDT

The link to the GitHub repo is broken