Now that you know a few things about correlations, hooman thinks this is a good time for you to learn something more complicated. Hooman has picked a file in which he has recorded how many cats bought potato chips on each day of February, and he wants you to find out whether there is anything interesting in it. The data looks like this:
What did you find out by looking at these numbers? The first thing you notice is that the numbers are recorded for every day of the month, from February 1 to February 28, at one-day intervals with no days missing. Suddenly you have a feeling: could this be a time series...?
Yes, this is indeed a time series dataset. The values are ordered in time at regular intervals, and they seem kinda related to time. How? Hooman now plots the data on a graph, and you both immediately spot some patterns that might be related to time.
Hooman tells you that he has found three interesting insights in this graph that might be significant. He points out that:
1. Every Thursday fewer cats buy potato chips for some unknown reason, and on Friday cats seem to buy more. This pattern occurred in every week of February. It looks like a weekly cycle of ups and downs in chip sales is going on.
2. Since the pattern repeats every week, you can compare all four Fridays: sales on the second Friday are higher than on the first Friday, sales on the third Friday are higher than on the second, and so on. A similar rise shows up on Thursdays, Saturdays and most of the other days. So although sales go up and down from day to day, overall they are kinda increasing.
3. A few days do not match the weekly pattern. For example, chip sales on the last Sunday of February look quite different from the other Sundays, for some reason.
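A quick way to check hooman's three observations against the raw numbers is to line up the same day of every week. The sketch below reuses the 28 daily values from the decomposition code further down and splits them into four consecutive 7-day weeks. It assumes, since the original figure with weekday labels is not reproduced here, that February 1 is a Saturday, which makes the seventh value of each week a Friday and the sixth a Thursday; the variable names are illustrative.

```python
# Daily chip sales for February 1-28 (same numbers as in the decomposition example)
sales = [10, 12, 9, 12, 6, 5, 16, 12, 15, 11, 13, 15, 5, 18,
         11, 17, 12, 14, 8, 8, 21, 13, 5, 13, 15, 11, 12, 23]

# Assumption: February 1 is a Saturday, so day 0 of each week is a Saturday
weekday_names = ["Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]

# Split the month into 4 consecutive 7-day weeks
weeks = [sales[i:i + 7] for i in range(0, len(sales), 7)]

# zip(*weeks) transposes the weeks, grouping the same weekday together
by_weekday = {name: list(values) for name, values in zip(weekday_names, zip(*weeks))}

for name, values in by_weekday.items():
    print(f"{name}: {values}")
```

Under that assumption, Friday is the busiest day of every week and climbs 16, 18, 21, 23 (point 2), Thursdays sit at 5, 5, 8, 12 (point 1), and the last Sunday drops to 5 after 12, 15 and 17 (point 3). The second Wednesday's 15 also stands out next to 6, 8 and 11.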
This makes you curious. Are these patterns normal? How are you going to explain them? This is too complex to handle! Hooman now asks you: what if you could find the exact pattern of the ups and downs of the sales? You realize that if you knew such an ‘ideal’ pattern, you could pick out the odd days hooman mentioned in the third point. Yes, you’re right. Hooman calls the ‘ideal’ pattern of these cyclic ups and downs SEASONALITY. The gradual increase you found in spite of the ups and downs (in point 2) is called the TREND (hooman says a trend can also be a decrease). And whatever is left of your data after taking out both the trend and the seasonality is called the RESIDUALS.
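Under the additive model (the `model='additive'` option in the decomposition code further down), the three components simply add up to the observed sales. Writing $y_t$ for the sales on day $t$, $T_t$ for the trend, $S_t$ for the seasonality and $R_t$ for the residual (symbols chosen here for illustration), with a weekly cycle of 7 days:

```latex
y_t = T_t + S_t + R_t, \qquad S_{t+7} = S_t, \qquad R_t = y_t - T_t - S_t
```

The usual alternative is the multiplicative model, $y_t = T_t \times S_t \times R_t$, which fits better when the seasonal swings grow along with the level of the series.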
Now what would the seasonality of your data look like? Hooman shows you the pattern of ups and downs in the sales data:
Now you can clearly see the ideal pattern. But wait… you thought you had found an overall increasing trend earlier, yet this curve keeps going up and down around the same level. Where did the trend go?
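The trend has not disappeared; the seasonal curve simply leaves it out, and it is estimated separately. A common estimate is a centered moving average whose window covers exactly one cycle, so each window contains one of every weekday and the weekly ups and downs cancel out; as far as we know, statsmodels' `seasonal_decompose` also starts this way. The plain-Python sketch below assumes a 7-day window; `centered_moving_average` is a hypothetical helper written for this example, not a library function.

```python
def centered_moving_average(values, window):
    """Mean of each `window`-long slice, stored at the slice's middle day.

    The first and last window // 2 days get None, because a full
    window does not fit around them.
    """
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    trend = [None] * len(values)
    for center in range(half, len(values) - half):
        chunk = values[center - half:center + half + 1]
        trend[center] = sum(chunk) / window
    return trend


# Daily chip sales for February 1-28
sales = [10, 12, 9, 12, 6, 5, 16, 12, 15, 11, 13, 15, 5, 18,
         11, 17, 12, 14, 8, 8, 21, 13, 5, 13, 15, 11, 12, 23]

# A 7-day window covers one full week, so the weekly cycle averages out
trend = centered_moving_average(sales, window=7)
print([round(t, 2) for t in trend if t is not None])
```

For the February numbers, the smoothed values rise from 10.0 in the first week to about 13.1 at the end of the month, which is the gradual increase from point 2. The first and last three days have no trend value because a full week does not fit around them.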
Oh wow! Hooman has finally shown you the general direction of change in your sales record. Here you can see that some of the points in your sales data are far from both the trend and the seasonality. What is left over after taking both the trend and the seasonality out of your data are the residuals. Hooman now shows you a comparison:
So there are a number of points that are quite far away from the ideal values. Now, why does your data have a pattern? Or stray from it? There must be an explanation, right? You start thinking about what usually happens on Thursday or Friday. Well, on Purrsday, I mean Thursday, you are usually tired. Being a cat is not an easy job. As hooman is working from home because of this stupid pandemic, you too are working very hard with him on weekdays, keeping his lap warm. So you do not go outside, and you keep eating chips from your stock. You refill your stock on Friday because you usually have fun with your hooman friend on the weekend, Saturday and Sunday, and eat chips frequently. So… it makes sense, doesn’t it? Your behaviour, and that of most of your furry friends, matches this pattern.
Then why does the data sometimes stray from the pattern? You now want to have a closer look at the residuals to remember what actually happened.
Now you remember! On the second Wednesday there was a football match, and you all stocked up on chips to eat while watching. On the last Sunday of the month there was a huge thunderstorm, and since you cats do not like to get wet, almost every one of you stayed home. It matches purrfectly!
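You can also let the numbers point at these unusual days. The sketch below follows the naive additive recipe that, as we understand it, statsmodels' `seasonal_decompose` uses: remove the trend with a centered 7-day moving average, average the detrended values for each day of the week to get the seasonal component (shifted so it sums to zero over a week), and keep whatever is left as the residual. Mapping positions to weekdays again assumes February 1 is a Saturday; all names are illustrative.

```python
# Daily chip sales for February 1-28
sales = [10, 12, 9, 12, 6, 5, 16, 12, 15, 11, 13, 15, 5, 18,
         11, 17, 12, 14, 8, 8, 21, 13, 5, 13, 15, 11, 12, 23]
PERIOD = 7  # one week
half = PERIOD // 2

# 1. Trend: centered 7-day moving average (None where a full window does not fit)
trend = [None] * len(sales)
for c in range(half, len(sales) - half):
    trend[c] = sum(sales[c - half:c + half + 1]) / PERIOD

# 2. Seasonality: average the detrended values for each day of the week,
#    then shift the 7 averages so they sum to zero
detrended = [y - t if t is not None else None for y, t in zip(sales, trend)]
season_avg = []
for pos in range(PERIOD):
    vals = [d for d in detrended[pos::PERIOD] if d is not None]
    season_avg.append(sum(vals) / len(vals))
offset = sum(season_avg) / PERIOD
seasonal = [season_avg[i % PERIOD] - offset for i in range(len(sales))]

# 3. Residual: the part that neither the trend nor the seasonality explains
residual = [d - s if d is not None else None for d, s in zip(detrended, seasonal)]

# Rank days by how far they stray from trend + seasonality
ranked = sorted((i for i, r in enumerate(residual) if r is not None),
                key=lambda i: abs(residual[i]), reverse=True)
for i in ranked[:3]:
    print(f"February {i + 1}: residual {residual[i]:+.2f}")
```

With these numbers, February 23 (the stormy last Sunday, under the Saturday assumption) tops the list with a residual of about -7.5, and February 12 (the second Wednesday, football night) follows at about +4.7. February 16, the third Sunday, ties with it: because the storm drags down the average Sunday, an ordinary Sunday now looks unusually busy. Residuals only flag the unusual days; explaining them is still the cat's job.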
Now that you have learnt about all these troublesome components, you start wondering why you would even need to know them. Hooman says that a time series containing all these components is hard to analyze, because its behaviour, such as its average level, keeps changing over time. He calls this a NON-STATIONARY time series. Once the trend and the seasonality are separated out, what remains is much closer to a STATIONARY time series, whose mean and variance stay roughly constant over time. With the components separated, the data becomes much easier to analyze.
Now the big question. How do you do this separation? Hooman prefers Python libraries in this case too. He shows you an example:
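import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Daily chip sales from February 1 to February 28
data = [10,12,9,12,6,5,16,12,15,11,13,15,5,18,11,17,12,14,8,8,21,13,5,13,15,11,12,23]
# Weekly cycle, so period=7 (older statsmodels releases called this argument `freq`)
result = seasonal_decompose(data, model='additive', period=7)
result.plot()  # observed data, trend, seasonal and residual panels
plt.show()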
The output of this code looks like this:
Now you know how to break your data into its components when you have a time series. You will be able to forecast your demand using these components. Hooman wants you to check out the maths behind these libraries as homework and promises to explain it to you the next day.