Linear Interpolation Silently Destroys Seasonality
Most tutorials tell you to call df.interpolate() and move on. That default linear method just cost me a 12% accuracy drop on a demand forecasting model — because linear interpolation flattens seasonal peaks into mush.
The data looked fine. No NaN warnings, shapes matched, the pipeline ran. But the downstream LSTM kept underperforming on validation. After hours of staring at loss curves, I plotted the actual vs. interpolated values for a single sensor. The weekend dips? Gone. The Monday morning spikes? Smoothed into gentle slopes. Linear interpolation had turned my sawtooth pattern into a lazy sine wave.
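You can reproduce the effect on synthetic data. This sketch (not the author's sensor data — the sawtooth shape, gap location, and values are all illustrative) builds a weekly sawtooth with a sharp Monday reset, knocks out the hours around that reset, and fills the gap with pandas' default linear interpolation:

```python
import numpy as np
import pandas as pd

# Illustrative synthetic series: a weekly sawtooth sampled hourly,
# peaking at Monday 00:00 and decaying through the week.
hours = pd.date_range("2024-01-01", periods=24 * 7 * 4, freq="h")
hour_of_week = (hours.dayofweek * 24 + hours.hour).to_numpy()
signal = 20 + 10 * (1 - hour_of_week / (24 * 7))

s = pd.Series(signal, index=hours)

# Knock out a gap spanning the weekly reset (Sunday night -> Monday morning).
gap = s.copy()
gap.iloc[160:180] = np.nan

filled = gap.interpolate()  # default method="linear"

# The true series jumps back up to 30 at the week boundary; linear
# interpolation draws a straight ramp through the gap instead, so the
# Monday peak comes out attenuated.
print(float(s.iloc[168]))       # true Monday peak: 30.0
print(float(filled.iloc[168]))  # interpolated value is well below it
```

Plot `s` against `filled` around the gap and the "lazy sine wave" effect is obvious: the straight segment cuts the corner off every peak it crosses.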
This post compares five interpolation methods on real-world sensor data with 15% missing values. I'll show you exactly what each method does to your data distribution — and which one preserved the statistical properties I actually needed.
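For reference, pandas exposes all of these through the same `interpolate` call by switching the `method` argument (the non-linear methods require SciPy). The method list below is illustrative of the API, not necessarily the exact five the benchmark uses, and the toy series is made up:

```python
import numpy as np
import pandas as pd

# Toy series with interior gaps (values are illustrative only).
s = pd.Series(
    [1.0, np.nan, 3.0, np.nan, 4.0, np.nan, 2.0, 5.0],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)

candidates = {
    "linear": s.interpolate(),                  # default: straight lines
    "time": s.interpolate(method="time"),       # weights by timestamp spacing
    "nearest": s.interpolate(method="nearest"), # copies closest observation
    "polynomial": s.interpolate(method="polynomial", order=2),
    "spline": s.interpolate(method="spline", order=3),
}

for name, filled in candidates.items():
    print(name, filled.to_numpy().round(2))
```

Which one "wins" depends entirely on what statistical property you need to preserve — which is what the comparison below is about.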
The Dataset: Industrial Temperature Sensors with Gaps