ARIMA Modelling: Theory

Introduction

ARIMA models, short for AutoRegressive Integrated Moving Average, are a great fit for forecasting time series data with a complex nature. Unlike basic regression models, which may struggle to capture that complexity with just a few variables (usually due to autocorrelation, trend and seasonality), ARIMA handles autocorrelation, trend and seasonality through a built-in differencing step (the "Integrated" part of ARIMA); this removes the need for specifying explanatory variables and extra transformations. Examples of applications include finance, weather prediction, and even anticipating website traffic.

Define the model

Let's build the ARIMA model from scratch. As discussed, the ARIMA model is made of three parts: an autoregressive part, an integrated part and a moving average part.

For example, given some time series

$$X = \{X_1, X_2, \ldots\}$$

The autoregressive (AR) component says that an observation $X_t$ at a certain point in time $t$ can be described as a linear combination of its lagged observations (i.e. prior time points):

$$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \ldots$$

where $\alpha_i$ are the parameters or coefficients of such a regression.
(Note: if you find the terms linear combination and linear regression confusing, a linear regression is a model that outputs predictions based on a linear combination of input features.)
Using the lag operator ($L^p X_t = X_{t-p}$), you can also express the above as:
$$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \ldots + \alpha_p X_{t-p}$$

$$X_t = \alpha_1 L X_t + \alpha_2 L^2 X_t + \ldots + \alpha_p L^p X_t$$

$$X_t = \left( \sum_{i=1}^{p} \alpha_i L^i \right) X_t \hspace{1cm} (1)$$
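A quick aside: in code, the lag operator is just a shift of the series. Here is a minimal sketch with pandas (the toy series is illustrative):

```python
import pandas as pd

# A toy series X_1, ..., X_5
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# L^p X_t = X_{t-p}: applying the lag operator p times
# shifts the series by p positions.
lag1 = x.shift(1)  # L X_t
lag2 = x.shift(2)  # L^2 X_t

print(pd.DataFrame({"X_t": x, "L X_t": lag1, "L^2 X_t": lag2}))
```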

Optimising the AR part of the ARIMA model means optimising the number of previous terms, that is, the number of time lags included in the linear combination. We call $p$ the order of the autoregressive model.
For example, when we say the AR part has an order of two, we symbolise it as $AR(2)$ and express it as:

$$X_t = \alpha_1 L X_t + \alpha_2 L^2 X_t$$
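To make this concrete, here is a minimal sketch that simulates an $AR(2)$ process with numpy, with a white-noise error term $\epsilon_t$ added at each step (the coefficients 0.5 and -0.3 are illustrative, chosen so the process stays stationary):

```python
import numpy as np

rng = np.random.default_rng(42)

alpha1, alpha2 = 0.5, -0.3  # illustrative AR(2) coefficients
n = 200

eps = rng.normal(size=n)  # white-noise error terms
x = np.zeros(n)

# X_t = alpha_1 * X_{t-1} + alpha_2 * X_{t-2} + eps_t
for t in range(2, n):
    x[t] = alpha1 * x[t - 1] + alpha2 * x[t - 2] + eps[t]
```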

Now, alternatively, $X_t$ can also be expressed as a combination of previous error terms. This is described in the Moving Average (MA) part of the ARIMA model:

$$X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}$$

$$X_t = \epsilon_t + \theta_1 L \epsilon_t + \theta_2 L^2 \epsilon_t + \ldots + \theta_q L^q \epsilon_t$$

$$X_t = \epsilon_t + \left( \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t \hspace{1cm} (2)$$

where:
$\theta_i$ are the parameters/coefficients of the linear combination of the error terms,
$\epsilon_t$ is the error term at time $t$,
$q$ is the order of this moving average.
Note that the term moving average can be misleading here: in this context it simply refers to the moving window of previous errors; nothing is actually averaged.
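As with the AR part, a short numpy sketch makes the $MA(2)$ case concrete (the coefficients 0.4 and 0.2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

theta1, theta2 = 0.4, 0.2  # illustrative MA(2) coefficients
n = 200

eps = rng.normal(size=n)  # white-noise error terms
x = np.zeros(n)

# X_t = eps_t + theta_1 * eps_{t-1} + theta_2 * eps_{t-2}
for t in range(2, n):
    x[t] = eps[t] + theta1 * eps[t - 1] + theta2 * eps[t - 2]
```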

Since both (1) and (2) describe $X_t$, we can combine them into a single equation:

$$X_t = \left( \sum_{i=1}^{p} \alpha_i L^i \right) X_t + \epsilon_t + \left( \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t$$

This is often rearranged as:

$$X_t - \left( \sum_{i=1}^{p} \alpha_i L^i \right) X_t = \epsilon_t + \left( \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t$$

$$\left( 1 - \sum_{i=1}^{p} \alpha_i L^i \right) X_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t$$

We say that, given time series data $X_t$, where $t$ is an integer index and the $X_t$ are real numbers, an $ARMA(p, q)$ model is given by:

$$\left( 1 - \sum_{i=1}^{p} \alpha_i L^i \right) X_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t$$

The "Integrated" part from the introduction is the final ingredient: differencing the series $d$ times before applying the model, which the lag operator expresses as $(1 - L)^d$. This gives the full $ARIMA(p, d, q)$ model:

$$\left( 1 - \sum_{i=1}^{p} \alpha_i L^i \right) (1 - L)^d X_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \epsilon_t$$
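In practice, you rarely solve these equations by hand; a library estimates the coefficients for you. Here is a minimal sketch using statsmodels (the generated series and the order (2, 1, 2) are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)

# An illustrative series: a random walk, so one round of
# differencing (d = 1) makes it stationary.
series = np.cumsum(rng.normal(size=300))

# order = (p, d, q): AR order, differencing order, MA order
model = ARIMA(series, order=(2, 1, 2))
result = model.fit()

print(result.summary())          # estimated alpha and theta coefficients
print(result.forecast(steps=5))  # forecast the next five points
```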

Why do we need both an AR component and an MA component?

The AR(p) component is a combination of the past 'p' values of the series, modelling how each observation depends on its own history; it therefore captures the trend and autocorrelation in the series.
Meanwhile, the MA(q) part is a combination of the past 'q' forecast errors, so it instead captures "shocks" (unexpected changes) to the series. The impact of a shock decreases over time (called "shock decay"); thus, the effect of an error made at a particular point in time diminishes as we move further away from that point.
In short, the AR(p) part describes the overall, long-term changes, whereas the MA(q) part describes short-term changes. Together, they allow the ARIMA model to capture a wide range of time series patterns.
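One way to see both the shock decay and the long-term versus short-term split is to feed a single unit shock into each component and watch the response. A minimal numpy sketch (the coefficients are illustrative):

```python
import numpy as np

n = 10
shock = np.zeros(n)
shock[0] = 1.0  # a single unit shock at t = 0, no other noise

# AR(1): the shock echoes through every later value,
# decaying geometrically (1, 0.6, 0.36, 0.216, ...).
alpha = 0.6
ar = np.zeros(n)
for t in range(n):
    ar[t] = (alpha * ar[t - 1] if t > 0 else 0.0) + shock[t]

# MA(1): the shock is felt only at t = 0 and t = 1, then vanishes.
theta = 0.6
ma = np.zeros(n)
for t in range(n):
    ma[t] = shock[t] + (theta * shock[t - 1] if t > 0 else 0.0)

print("AR(1) response:", np.round(ar, 3))
print("MA(1) response:", np.round(ma, 3))
```

The AR response never fully dies out (it only decays), while the MA response is exactly zero after $q$ lags; this is the long-term versus short-term division described above.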
