If you work with data, you’ve almost certainly encountered missing values. The most common approaches are to ignore or drop anything that’s missing, but doing so can bias your estimates and lead to misleading conclusions.
Matt Brems identifies the three types of missing data, explains how harmful dropping or ignoring missing data can be, and teaches you how to handle missing data the right way, using Jupyter notebooks to properly reweight or impute your data. Matt focuses on the following techniques: no imputation; deductive imputation; mean, median, and mode imputation; regression imputation; stochastic imputation; and multiply stochastic imputation. You’ll come away with a solid, intuitive understanding of how to handle missing data, practical tips for implementing these techniques, and recommendations for integrating them into your own or your company’s workflow.
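To make a few of the techniques named above concrete, here is a minimal Python sketch of mean imputation, regression imputation, and stochastic regression imputation, plus a simplified version of repeating the stochastic step and pooling the results. It is not taken from the session materials: the function names, the single-predictor linear model, the normally distributed noise, and the toy data are all assumptions made for illustration, and a full multiple-imputation procedure would also account for uncertainty in the fitted coefficients.

```python
import random
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = (sum(r ** 2 for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def regression_impute(xs, ys, stochastic=False, rng=None):
    """Fill missing ys from a line fit on the complete (x, y) pairs.

    With stochastic=True, normally distributed noise scaled by the
    residual standard deviation is added to each imputed value
    (an illustrative assumption, not the only possible noise model).
    """
    rng = rng or random.Random(0)
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x
            if stochastic:
                y += rng.gauss(0.0, sd)
        completed.append(y)
    return completed


def pooled_mean(xs, ys, m=20, seed=0):
    """Run stochastic imputation m times and average the m estimated means."""
    rng = random.Random(seed)
    estimates = [
        statistics.mean(regression_impute(xs, ys, stochastic=True, rng=rng))
        for _ in range(m)
    ]
    return statistics.mean(estimates)


# Hypothetical toy data: y is missing for two of the six rows.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 12.0]

print(mean_impute(ys))
print(regression_impute(xs, ys))
print(regression_impute(xs, ys, stochastic=True))
print(pooled_mean(xs, ys))
```

Mean imputation gives every missing row the same value, which shrinks the variance of the column. Regression imputation uses the relationship with `xs` but places every imputed point exactly on the fitted line. The stochastic variant adds noise back so the imputed values vary about as much as the observed ones do around that line.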
Matt currently leads instruction for General Assembly’s Data Science Immersive in Washington, DC, where he helps bridge the gap between theoretical statistics and real-world insights. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. A recovering politico, Matt was a data scientist for a political consulting firm through the 2016 election. He holds a master’s degree in statistics from the Ohio State University. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, or cuddling with his pug.