Handling data gaps in time series using imputation
Who is this presentation for?
- Early career data scientists handling time series signals from sensors or other real-world sources
Level
Description
Time series forecasting is everywhere. It tells you tomorrow’s temperature, your company’s stock price on Friday, and your blood glucose level before bed. Often these forecasts depend on sensors or measurements made out in the real, messy world. Those sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing data in your signals.
Alfred Whitehead and Clare Jeon explore a number of methods for handling data gaps and advise you on which to consider and when. You’ll see how to perform tests to determine which method suits your problem the best. And all of this is illustrated with real data from a continuous blood glucose monitor.
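One common way to run such a test is to hide values you actually observed, impute them, and score each method against the hidden truth. The sketch below is only an illustration of that idea, not the speakers' code: the sine-wave trace is synthetic stand-in data, and the three fill strategies, the 20% masking rate, and the RMSE score are all assumptions chosen for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a complete sensor trace (NOT real glucose data).
t = np.arange(200)
truth = pd.Series(6.0 + np.sin(2 * np.pi * t / 50))

# Hide 20% of the interior points to simulate gaps whose true values we know.
hidden = rng.choice(np.arange(1, 199), size=40, replace=False)
masked = truth.copy()
masked.iloc[hidden] = np.nan

# Candidate imputation methods (an illustrative subset).
candidates = {
    "mean": lambda s: s.fillna(s.mean()),
    "locf": lambda s: s.ffill(),
    "linear": lambda s: s.interpolate(method="linear"),
}

def rmse(filled):
    """Root-mean-square error measured on the hidden points only."""
    err = filled.iloc[hidden] - truth.iloc[hidden]
    return float(np.sqrt((err ** 2).mean()))

scores = {name: rmse(fill(masked)) for name, fill in candidates.items()}

# Print methods from best (lowest error) to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

Random single-point masking is the simplest choice. If your real gaps come in long runs, masking contiguous blocks of similar length would give a more honest comparison.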
The methods they cover include random assignment, average-based imputation, last observed carried forward, linear interpolation, spline interpolation, moving average, Kalman smoothing with structural model, Kalman smoothing with auto-ARIMA model, Stineman interpolation, k-nearest neighbors, and seasonality with Prophet.
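Several of the simpler methods in this list are one-liners in pandas. The sketch below shows four of them side by side on a small, made-up series. The readings, the 5-minute spacing, and the window size of 5 are hypothetical, and this is not taken from the speakers' materials.

```python
import numpy as np
import pandas as pd

# Hypothetical readings at 5-minute intervals; NaN marks a gap.
index = pd.date_range("2019-01-01 00:00", periods=8, freq="5min")
signal = pd.Series(
    [5.0, 5.2, np.nan, np.nan, 6.0, 6.4, np.nan, 7.0], index=index
)

# Centered moving average of the observed neighbors (NaNs are skipped).
neighborhood_mean = signal.rolling(window=5, center=True, min_periods=1).mean()

imputed = pd.DataFrame({
    "observed": signal,
    # Average-based imputation: fill every gap with the overall mean.
    "mean": signal.fillna(signal.mean()),
    # Last observed carried forward (LOCF).
    "locf": signal.ffill(),
    # Linear interpolation between the neighbors on each side of a gap.
    "linear": signal.interpolate(method="linear"),
    # Moving average: fill each gap with the mean of nearby observed values.
    "moving_average": signal.fillna(neighborhood_mean),
})

print(imputed)
```

Note that `method="linear"` treats the samples as equally spaced. For irregularly sampled data, `method="time"` weights by the timestamps instead.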
Prerequisite knowledge
- A basic understanding of statistics and of how to load and manipulate data in R or Python with pandas
What you'll learn
- Understand the variety of methods available to impute missing data and gain a sense of how to apply them effectively
Alfred Whitehead
Klick
Alfred Whitehead is the senior vice president of data science at Klick, where he’s responsible for the delivery of data science solutions and oversees a team of data scientists and AI researchers. He brings over 15 years of experience in data science, software development, and high-performance computing to the Klick team, combining his scientific background with an appreciation of the craft of code writing. Previously, he was an information security officer, technology vice president, and acting chief technology officer. He holds two master’s degrees in physical sciences, including thesis work in computational astrophysics, and is also a certified information systems security professional (CISSP).
Clare Jeon
Klick
Clare Jeon is a data scientist at Klick, where she focuses on identifying digital biomarkers for diagnosis, risk assessment of diseases, and prevention of health problems. She also explores the applications of machine learning to optimize clinic performance. Previously, she was involved in working on the systems biology of cancer and the development of the computational pipeline to identify key genomic and clinical signatures for cancer treatment. She holds a PhD in bioinformatics and systems biology.
Comments
Thanks for attending everyone! Our slides and code can be found here: https://github.com/KlickInc/datasci-strata-talk-missing-data