I'm so glad to be back here to continue this journey.
Like my KD people will say, 2 days. I had a poor Internet connection these past two days, which made it difficult for me to show up, so I took the opportunity to rest 🙈
In today's episode of improving my Data Science skills, I learned how to use the pandas merge_asof() function to merge ordered or time-series data.
It is similar to a merge_ordered() left join, but instead of requiring equal keys, it matches each row on the nearest key value (by default, the nearest one at or before it).
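To see the difference side by side, here is a small sketch. The column names (`t`, `price`, `event`) and all the values are made up just for illustration.

```python
import pandas as pd

# Toy data, invented for illustration. Both frames are sorted by the key "t".
left = pd.DataFrame({"t": [1, 5, 10], "price": [100.0, 102.0, 101.0]})
right = pd.DataFrame({"t": [1, 7, 15], "event": ["a", "b", "c"]})

# Ordered left join: a row only gets an event when the keys are exactly equal.
exact = pd.merge_ordered(left, right, on="t", how="left")

# As-of join: each left row takes the last right row whose key is <= its own key.
asof = pd.merge_asof(left, right, on="t")

print(exact)
print(asof)
```

In the ordered left join, only the row with `t = 1` finds an event and the other two are left empty. In the as-of join, every row finds one: `t = 5` falls back to event `a`, and `t = 10` picks up event `b`.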
Oya oya I know that grammar is too much. Let me break it down😌
Imagine you're watching a movie in order, scene by scene.
You shouldn't be allowed to see future scenes before they happen, right?
That’s exactly how time-series data works.
When training models, you must only give the model information that was available at that time, not data from the future. So merge_asof() helps us merge time-series data without “seeing the future”.
By default, it matches each row with the most recent event at or before that row's time, so our models stay realistic and don’t cheat.
Let me give a trading example: imagine you have stock prices for Jan 1, Jan 5, and Jan 10, and you have GDP/news/events for Jan 1, Jan 7, and Jan 15. A normal merge won’t work well because most of the dates don’t line up. But merge_asof() finds the closest event at or before each date. So the stock price on Jan 10 gets matched with the data from Jan 7, not Jan 15, because the market didn’t know the Jan 15 info yet.
This prevents ‘future leakage’. Your model can only use information the market actually had at the time. No cheating. No unrealistic predictions.
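Here is a minimal sketch of that trading example in pandas. The year (2024), the prices, and the event labels are all assumptions I made up so the code can run; only the dates come from the example above.

```python
import pandas as pd

# Stock prices on Jan 1, Jan 5 and Jan 10 (year and prices are made up).
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
    "price": [100.0, 102.5, 101.0],
})

# GDP/news/events on Jan 1, Jan 7 and Jan 15 (labels are placeholders).
events = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-07", "2024-01-15"]),
    "event": ["event_jan01", "event_jan07", "event_jan15"],
})
# Keep a copy of the event date so we can see which event was matched.
events["event_date"] = events["date"]

# Both frames must be sorted by the key. direction="backward" is the default:
# each price row gets the latest event dated on or before the price date.
merged = pd.merge_asof(prices, events, on="date", direction="backward")

print(merged)
```

The Jan 1 and Jan 5 prices both pick up the Jan 1 event, the Jan 10 price picks up the Jan 7 event, and the Jan 15 event is never used. If an event published on the same day should not count as "already known", passing `allow_exact_matches=False` makes the match strictly earlier.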
-SP