Brought to you by NumFOCUS Foundation and O’Reilly Media

The official Jupyter Conference

Aug 21-22, 2018: Training

Aug 22-24, 2018: Tutorials & Conference

New York, NY

Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks

Matt Brems (General Assembly)

3:30pm–5:00pm Wednesday, August 22, 2018

Reproducible research and open science, Training and education, Usage and application
Location: Murray Hill B
Level: Beginner

Who is this presentation for?

Academics, practitioners, aspiring data scientists, and casual enthusiasts

Prerequisite knowledge

Familiarity with linear regression, logistic regression, standard deviation and variance, confidence intervals, and histograms and scatterplots
A working knowledge of Jupyter notebooks and Python programming (statsmodels, NumPy, etc.)

Materials or downloads needed in advance

A laptop with Jupyter installed (recommended but not required)
The course GitHub repository downloaded (link TBD)

What you'll learn

Understand how to visualize and handle missing data
Learn the types of missing data, how to identify them, and how to attempt to fix each
Learn how to implement reweighting and imputation methods in Jupyter notebooks

Description

If you work with data, you’ve almost certainly encountered missing data. The most common approaches are to ignore whatever is missing or to drop it, but either choice can bias your estimates and shrink the sample you have left to analyze.
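To make the risk of dropping rows concrete, here is a minimal sketch that uses only the Python standard library. The income figures, the 60,000 threshold, and the 70% non-response rate are all invented for illustration and are not taken from the session materials. The sketch assumes that high values are more likely to go unreported, then compares the mean of the full data with the mean of the rows that survive deletion.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: 1,000 incomes drawn from a normal distribution.
incomes = [random.gauss(50_000, 10_000) for _ in range(1000)]

# Assumed missingness mechanism (an illustration, not a claim about real data):
# incomes above 60,000 go unreported 70% of the time.
observed = []
for income in incomes:
    hidden = income > 60_000 and random.random() < 0.7
    observed.append(None if hidden else income)

# "Drop anything that's missing": keep only the rows with a reported value.
complete_cases = [value for value in observed if value is not None]

true_mean = statistics.mean(incomes)
dropped_mean = statistics.mean(complete_cases)

print(f"rows kept after dropping: {len(complete_cases)} of {len(incomes)}")
print(f"mean of all incomes:      {true_mean:,.0f}")
print(f"mean after dropping:      {dropped_mean:,.0f}")
```

Because every hidden value lies above the overall average, the mean of the remaining rows comes out lower than the mean of the full data. If values went missing purely at random, dropping them would cost sample size but would not shift the mean in a systematic direction.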

Matt Brems identifies the three types of missing data, explains how much damage dropping or ignoring missing data can do, and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data. Matt focuses on the following techniques: no imputation; deductive imputation; mean, median, and mode imputation; regression imputation; stochastic imputation; and multiply stochastic imputation. You’ll come away with a solid, intuitive understanding of how to handle missing data, practical tips for implementing these techniques, and recommendations for integrating them into your own or your company’s workflow.
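Several of the techniques named above can be sketched in a few lines of standard-library Python. The following is an illustrative sketch, not the code used in the session: the function names (`impute_constant`, `fit_line`, `impute_regression`) and the toy data are hypothetical. It shows mean, median, and mode imputation, regression imputation with a single predictor, and stochastic imputation, which adds random noise to each regression prediction. The final loop repeats the stochastic step to produce several completed datasets. That is the basic idea behind multiple imputation, though a full treatment would also account for uncertainty in the fitted coefficients and pool the variances, which this sketch does not do.

```python
import random
import statistics


def impute_constant(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    summarize = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in values]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns the residual SD."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = (sum(r ** 2 for r in residuals) / (len(xs) - 2)) ** 0.5
    return intercept, slope, resid_sd


def impute_regression(xs, ys, stochastic=False, rng=None):
    """Fill None entries of ys using a line fitted on the complete (x, y) pairs.

    With stochastic=True, Gaussian noise scaled by the residual SD is added
    to each prediction so imputed points do not all sit exactly on the line.
    """
    if rng is None:
        rng = random.Random(0)
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, resid_sd = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = intercept + slope * x
            if stochastic:
                y += rng.gauss(0, resid_sd)
        filled.append(y)
    return filled


# Toy data (hypothetical): y is missing for two of the six rows.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 11.8]

mean_filled = impute_constant(ys, "mean")
reg_filled = impute_regression(xs, ys)

# Repeat stochastic imputation to get several completed datasets,
# then average the estimate of interest (here, the mean of y) across them.
estimates = []
for seed in range(5):
    completed = impute_regression(xs, ys, stochastic=True, rng=random.Random(seed))
    estimates.append(statistics.mean(completed))
pooled_mean = statistics.mean(estimates)
```

Mean, median, and mode imputation put the same value in every gap, which understates the spread of the variable. Regression imputation uses a related column to make a better guess for each row, and the stochastic variant restores some of the variability that a deterministic prediction removes.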

Matt Brems

General Assembly

Matt currently leads instruction for General Assembly’s Data Science Immersive in Washington, DC, where he helps bridge the gap between theoretical statistics and real-world insights. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. A recovering politico, Matt was a data scientist for a political consulting firm through the 2016 election. He holds a master’s degree in statistics from the Ohio State University. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, or cuddling with his pug.
