Handling data gaps in time series using imputation
Who is this presentation for?
Early-career data scientists handling time-series signals coming from sensors or other real-world sources.
Time series forecasting is everywhere. What will tomorrow’s temperature be? How about my company’s stock price on Friday? My blood glucose levels tonight before bed? Often these forecasts depend on sensors or measurements made out in the real, messy world. Those sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing data in our signals.
In this talk, we’ll show a number of methods for handling data gaps and give advice on which to consider and when. We’ll also show how to run tests that determine which method best suits your problem. All of this will be illustrated with real data from a continuous blood glucose monitor.
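One common way to run such a test is to hide values that were actually observed, impute them with each candidate method, and score the result against the hidden truth. The sketch below illustrates that idea under stated assumptions: the signal is synthetic (not the glucose monitor data from the talk), the names `truth`, `masked`, and `candidates` are hypothetical, and mean absolute error is just one reasonable choice of score.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical complete signal: one day at 5-minute intervals,
# a smooth cycle plus measurement noise (synthetic, for illustration only).
t = np.arange(288)
truth = pd.Series(110 + 25 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, size=t.size))

# Hide roughly 10% of the interior points so every gap has observed
# neighbours on both sides.
hidden = rng.choice(np.arange(1, t.size - 1), size=28, replace=False)
masked = truth.copy()
masked.iloc[hidden] = np.nan

# Candidate imputation methods to compare.
candidates = {
    "mean": masked.fillna(masked.mean()),
    "locf": masked.ffill(),
    "linear": masked.interpolate(method="linear"),
}


def mae(filled: pd.Series) -> float:
    """Mean absolute error measured only at the points that were hidden."""
    return float((filled.iloc[hidden] - truth.iloc[hidden]).abs().mean())


scores = {name: mae(filled) for name, filled in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: MAE = {score:.2f}")
```

Hiding isolated points at random tends to flatter interpolation. If your real gaps are long contiguous outages, it is probably more informative to hide contiguous blocks of similar length instead.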
Methods covered will include:
- Random assignment
- Average-based imputation
- Last observed carried forward
- Linear interpolation
- Spline interpolation
- Moving average
- Kalman smoothing with structural model
- Kalman smoothing with auto-ARIMA model
- Stineman interpolation
- k-Nearest Neighbours
- Seasonality with Prophet
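Several of the simpler methods in this list can be tried in a few lines of pandas. The following is a minimal sketch, not the speakers' implementation: the readings are made-up values at 5-minute intervals, the variable names are hypothetical, and the moving-average variant shown (a centered window over the observed values) is only one of several possible definitions.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings at 5-minute intervals; NaN marks a gap.
index = pd.date_range("2024-01-01 08:00", periods=8, freq="5min")
readings = pd.Series(
    [100.0, 104.0, np.nan, np.nan, 116.0, 120.0, np.nan, 112.0],
    index=index,
)

# Average-based imputation: fill every gap with the mean of the observed values.
mean_filled = readings.fillna(readings.mean())

# Last observation carried forward (LOCF): repeat the most recent observed value.
locf_filled = readings.ffill()

# Linear interpolation: draw a straight line between the observed neighbours
# (treats the samples as equally spaced, which they are here).
linear_filled = readings.interpolate(method="linear")

# Moving average: fill each gap with the mean of the observed values
# inside a centered 5-sample window.
window_mean = readings.rolling(window=5, center=True, min_periods=1).mean()
moving_avg_filled = readings.fillna(window_mean)

comparison = pd.DataFrame(
    {
        "raw": readings,
        "mean": mean_filled,
        "locf": locf_filled,
        "linear": linear_filled,
        "moving_avg": moving_avg_filled,
    }
)
print(comparison.round(1))
```

The side-by-side table makes the trade-offs visible: mean imputation ignores the local trend, LOCF produces flat steps, and linear interpolation follows the trend but cannot capture curvature inside a long gap.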
Prerequisite knowledge
Basic statistics, and the ability to load and manipulate data in R or in Python with pandas.
What you'll learn
Alf is responsible for the delivery of data science solutions at Klick Health, where he oversees a team of data scientists and AI researchers. He brings over 15 years of experience in data science, software development, and high-performance computing to the Klick team, combining his scientific background with an appreciation of the craft of code-writing. He has previously served as an information security officer, technology VP, and acting Chief Technology Officer. He holds two master's degrees in the physical sciences, including thesis work in computational astrophysics, and is also a Certified Information Systems Security Professional (CISSP).
Clare is a data scientist at Klick Health, where she focuses on identifying digital biomarkers for diagnosis, disease risk assessment, and the prevention of health problems. She is also exploring applications of machine learning to optimize clinic performance. She previously worked on the systems biology of cancer and on the development of a computational pipeline to identify key genomic and clinical signatures for cancer treatment. She holds a Ph.D. in bioinformatics and systems biology.
View a complete list of Strata Data Conference contacts