
Muinde Esther Ndunge


The Complete Guide to Time Series Models

Table of contents

  1. Introduction
  2. Understanding Time Series Data
  3. Components of Time Series
  4. Methods to Check Stationarity
  5. Converting Non-Stationary Into Stationary
  6. Time Series Models
  7. Python Libraries for Time Series Analysis
  8. Conclusion

1. Introduction

A time series is a collection of observations made sequentially in time, that is, statistical data arranged in the order in which it occurred. Time series models are statistical models used to analyze and forecast such data. They are widely employed in domains including finance, economics, and climate science. This guide provides an overview of time series modelling and its components.

2. Understanding Time Series Data

Time series data is a sequence of observations collected at regular time intervals. It can be univariate (one variable) or multivariate (multiple variables). The key assumption in time series analysis (TSA) is stationarity, which means that the statistical properties of the process do not depend on the origin of time. Understanding the characteristics of time series data is crucial for model selection.
Data can be stationary, in which case it should have no trend or seasonality:

  • The mean should be constant over time
  • The variance should be constant over time

Data can also be non-stationary, meaning that its mean, variance, or covariance changes with respect to time.
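As a rough illustration of the difference, the sketch below compares the mean and variance of the first and second halves of two synthetic series. The white-noise and random-walk series, the seed, and the helper name `half_stats` are assumptions made for this example only; this is an informal check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# White noise: constant mean and variance, so it is stationary.
white_noise = rng.normal(loc=0.0, scale=1.0, size=1000)

# Random walk: the cumulative sum of white noise. Its variance grows
# with time, so it is non-stationary.
random_walk = np.cumsum(rng.normal(loc=0.0, scale=1.0, size=1000))

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name}: first half mean={m1:.2f}, var={v1:.2f}; "
          f"second half mean={m2:.2f}, var={v2:.2f}")
```

For the white noise the two halves should look alike, while for a random walk they typically differ noticeably.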

3. Components of Time Series

Time series data consists of the following components:

  • Trend:
    This is the general tendency of the data to grow or decline over a long period of time, that is, the long-term upward or downward movement in the data.

  • Seasonality:
    Seasonality is characterized by repetitive patterns at fixed intervals, such as every week, quarter, or year. It is caused by rhythmic forces that operate in a regular, periodic manner.

  • Cyclical Variations:
    These are rises and falls in a time series that do not occur at a fixed interval, so their timing and pattern are uncertain.

  • Irregular Variations:
    These are unexpected events that cause short-lived, unpredictable spikes or dips in the data.

4. Methods to Check Stationarity

When preparing data for a TSA model, it is important to assess whether the dataset is stationary. This is done using statistical tests, which include:

Augmented Dickey-Fuller (ADF) Test:

It is done with the following hypotheses:

  • H0: Series has a unit root (non-stationary)
  • HA: Series is stationary
    • p-value > 0.05: Fail to reject H0
    • p-value <= 0.05: Reject H0

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:

It tests the null hypothesis that the time series is stationary around a deterministic trend against the alternative of a unit root. Note that this reverses the ADF setup: here a small p-value is evidence of non-stationarity.

5. Converting Non-Stationary Into Stationary

There are three methods available for this conversion.

Detrending

This involves removing the trend from the data so that only the deviations from the trend remain.
With the trend removed, cyclical patterns become easier to identify.
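One simple way to detrend is to fit a straight line and subtract it. The sketch below does this with `numpy.polyfit`; the synthetic series and the choice of a linear trend are assumptions, and real data may need a different trend model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical series: linear trend + a cycle of length 25 + noise.
t = np.arange(200)
series = 2.0 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 25) + rng.normal(0, 0.5, 200)

# Fit a straight line (degree-1 polynomial) and subtract it.
slope, intercept = np.polyfit(t, series, deg=1)
trend = intercept + slope * t
detrended = series - trend

print(f"Estimated slope={slope:.3f}, intercept={intercept:.3f}")
print(f"Mean of detrended series={detrended.mean():.6f}")
```

After subtraction the series fluctuates around zero, and the cycle that was hidden under the trend is what remains, along with the noise.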

Differencing

This transforms the series into a new series of differences between consecutive observations, which removes the series' dependence on time and stabilizes its mean. Trend and seasonality are reduced by this transformation.

  • Y't = Yt - Yt-1
  • Yt = value at time t, and Y't = differenced value at time t
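In pandas, differencing is available as `Series.diff`. The short sketch below uses a small made-up sample; the seasonal period of 2 is purely illustrative.

```python
import pandas as pd

values = pd.Series([10, 15, 20, 18, 22], name="Value")

# First-order differencing: each value minus the one before it.
# The first entry has no predecessor, so it becomes NaN.
first_diff = values.diff()

# Seasonal differencing subtracts the value one season earlier
# (a period of 2 here is only for illustration).
seasonal_diff = values.diff(periods=2)

print(first_diff.tolist())     # [nan, 5.0, 5.0, -2.0, 4.0]
print(seasonal_diff.tolist())  # [nan, nan, 10.0, 3.0, 2.0]
```

If one round of differencing does not make the series stationary, the same call can be applied again to the differenced series.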

Transformation

This includes three different methods: Power Transform, Square Root, and Log Transform. The most commonly used one is the Log Transform.
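The three transformations can be sketched as follows. The exponential series is made up for illustration, and Box-Cox (via `scipy.stats.boxcox`) is used here as one common example of a power transform, which is an assumption about what the text means by that term.

```python
import numpy as np
from scipy import stats

# Hypothetical strictly positive series with exponential growth.
t = np.arange(1, 101)
series = np.exp(0.05 * t)

log_series = np.log(series)    # log transform (requires values > 0)
sqrt_series = np.sqrt(series)  # square-root transform (requires values >= 0)

# Box-Cox is a power transform; when no lambda is passed, SciPy
# estimates it from the data by maximum likelihood.
boxcox_series, fitted_lambda = stats.boxcox(series)

# After the log transform, exponential growth becomes a straight line,
# so consecutive differences are constant.
print(np.allclose(np.diff(log_series), 0.05))
print(f"Box-Cox lambda={fitted_lambda:.3f}")
```

A log transform followed by differencing is a frequent combination when both the level and the spread of a series grow over time.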

6. Time Series Models

There are several time series models available, each designed to capture different aspects of the data. Here are some common types:

Moving Average(MA) Model

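This is a commonly used time series model. It smooths out random short-term variations and is closely related to the components of a time series. It is represented as MA(q), where q is the order of the moving average. Strictly speaking, an MA(q) model expresses the current value in terms of the previous q error terms; the moving averages described below are the related smoothing technique.

A moving average is calculated by averaging the values of the time series over a window of k periods.
There are three types of moving averages: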

  • Simple Moving Average (SMA)
  • Cumulative Moving Average(CMA)
  • Exponential Moving Average(EMA)

Simple Moving Average (SMA)
SMA calculates the unweighted mean of the previous N points. The size of the sliding window is chosen based on the amount of smoothing required: a larger window gives a smoother line.

import pandas as pd
import matplotlib.pyplot as plt

# Sample time series data
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Value': [10, 15, 20, 18, 22]}

df = pd.DataFrame(data)

# Calculate SMA with a window size of 3
window_size = 3
df['SMA'] = df['Value'].rolling(window=window_size).mean()

# Plotting the time series data and SMA
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Value'], label='Original Data', marker='o')
plt.plot(df['Date'], df['SMA'], label=f'SMA ({window_size}-period)', linestyle='--')

plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Simple Moving Average (SMA)')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()

plt.show()


Cumulative Moving Average (CMA)
CMA considers all data points up to the current period and calculates their average cumulatively.
Here's an example:

import pandas as pd
import matplotlib.pyplot as plt

# Sample time series data
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Value': [10, 15, 20, 18, 22]}

df = pd.DataFrame(data)

# Calculate CMA
df['CMA'] = df['Value'].expanding().mean()

# Plotting the time series data and CMA
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Value'], label='Original Data', marker='o')
plt.plot(df['Date'], df['CMA'], label='CMA', linestyle='--')

plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Cumulative Moving Average (CMA)')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()

plt.show()


Exponential Moving Average (EMA)
EMA gives more weight to recent data points. It is mainly used to identify trends and filter out noise. The weight given to older observations decreases gradually over time.
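A minimal sketch of an EMA with pandas, using a small made-up sample. The `span=3` setting and `adjust=False` are choices made for this illustration.

```python
import pandas as pd

values = pd.Series([10, 15, 20, 18, 22], name="Value")

# span=3 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.5.
# adjust=False uses the recursive form:
#   EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, with EMA_0 = x_0.
ema = values.ewm(span=3, adjust=False).mean()

print(ema.tolist())  # [10.0, 12.5, 16.25, 17.125, 19.5625]
```

A smaller span reacts faster to recent changes, while a larger span produces a smoother line.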

When dealing with TSA in data science and machine learning, we use models such as the Autoregressive Integrated Moving Average (ARIMA) model, which is specified by three orders (p, d, q):

  • p == number of autoregressive lags
  • d == order of differencing
  • q == number of moving average lags

Before we dive deeper into these models let's understand the terms below:

Auto-Correlation Function(ACF)

ACF measures the linear relationship between a time series and its lagged values. It indicates how similar a value is to earlier values within the same series.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Sample time series data
data = np.random.rand(100)

# Create a pandas DataFrame
df = pd.DataFrame({'Value': data})

# Calculate and plot ACF
plot_acf(df['Value'], lags=20)
plt.title('AutoCorrelation Function (ACF)')
plt.xlabel('Lag')
plt.ylabel('ACF')
plt.show()


Partial AutoCorrelation Function(PACF)

PACF measures the direct relationship between a time series and its lagged values after removing the influence of the intermediate lags.
In other words, it shows the correlation of the series with itself at a given lag, with only the direct effect of that lag shown.

from statsmodels.graphics.tsaplots import plot_pacf

# Calculate and plot PACF
plot_pacf(df['Value'], lags=20)
plt.title('Partial AutoCorrelation Function (PACF)')
plt.xlabel('Lag')
plt.ylabel('PACF')
plt.show()

Read together, the two plots suggest which model to try:

  • If the ACF plot declines gradually and the PACF cuts off sharply after lag p, an autoregressive AR(p) model is a good candidate.

  • If the ACF plot cuts off sharply after lag q and the PACF declines gradually, a moving average MA(q) model is a good candidate.

  • If both the ACF and PACF plots decline gradually, an ARMA model is a good candidate.

  • If both drop off immediately, with no significant correlation at any lag, the series resembles white noise and there is little structure for these models to capture.

Auto-Regressive Model

This is a simple model that uses linear regression to predict the value of a variable from its own past values. It is mainly used for forecasting when the values in a time series are correlated with the values that precede them.

Mathematical Representation:
The AR(1) model can be expressed as:

$$X_t = \phi_1 X_{t-1} + \epsilon_t$$

Where:

  • $X_t$ is the value at time $t$.
  • $\phi_1$ is the autoregressive coefficient.
  • $X_{t-1}$ is the value at time $t-1$.
  • $\epsilon_t$ is white noise or the error term.
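To make the AR(1) equation concrete, the sketch below simulates it and then recovers the coefficient by regressing each value on its predecessor. The true coefficient 0.6, the sample size, and the seed are assumptions; the plain least-squares estimate is used here for transparency and is not necessarily what a library would do internally.

```python
import numpy as np

rng = np.random.default_rng(123)

phi_true = 0.6  # illustrative coefficient; |phi| < 1 keeps the process stationary
n = 5000

# Simulate X_t = phi * X_{t-1} + eps_t with standard normal white noise.
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + eps[t]

# Least-squares estimate of phi from regressing X_t on X_{t-1} (no intercept).
x_prev, x_curr = x[:-1], x[1:]
phi_hat = np.dot(x_prev, x_curr) / np.dot(x_prev, x_prev)

# One-step-ahead forecast: the noise has mean zero, so the expected
# next value is phi_hat times the last observed value.
next_forecast = phi_hat * x[-1]

print(f"true phi={phi_true}, estimated phi={phi_hat:.3f}")
print(f"one-step-ahead forecast={next_forecast:.3f}")
```

With a few thousand observations the estimate should land close to the true coefficient.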

Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) Models

ARMA is a combination of the Auto-Regressive and Moving Average models. It describes a weakly stationary stochastic process in terms of two polynomials, one for each part, and so captures both kinds of temporal pattern in time series data.
ARMA is specified by two orders: p for the autoregressive lags and q for the moving average lags.

  • The AR(p) component captures the linear relationship with past values.
  • The MA(q) component accounts for the influence of past white noise or error terms.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Sample time series data
data = np.random.randn(100)  # Random data for illustration

# Create a pandas DataFrame
df = pd.DataFrame({'Value': data})

# The old statsmodels.tsa.arima_model.ARMA class has been removed from
# statsmodels; ARIMA with d=0 fits the same ARMA(p, q) model.

# Fit an AR(2) model, i.e. ARIMA(2, 0, 0)
model_ar = ARIMA(df['Value'], order=(2, 0, 0))
results_ar = model_ar.fit()

# Fit an ARMA(2, 1) model, i.e. ARIMA(2, 0, 1)
model_arma = ARIMA(df['Value'], order=(2, 0, 1))
results_arma = model_arma.fit()

# Print model summaries
print("AR Model Summary:")
print(results_ar.summary())
print("\nARMA Model Summary:")
print(results_arma.summary())

# Plot the original data and model predictions
plt.figure(figsize=(10, 6))
plt.plot(df['Value'], label='Original Data')
plt.plot(results_ar.fittedvalues, label='AR(2) Predictions', linestyle='--')
plt.plot(results_arma.fittedvalues, label='ARMA(2,1) Predictions', linestyle='--')

plt.xlabel('Time')
plt.ylabel('Value')
plt.title('AR and ARMA Model Predictions')
plt.legend()
plt.grid(True)
plt.show()


ARMA works best for stationary series, so ARIMA was developed to support both stationary and non-stationary series.

  • AR ==> Uses past values to predict the future.
  • MA ==> Uses past error terms in the series to predict the future.
  • I ==> Differences the observations to make the data stationary.

7. Python Libraries for Time Series Analysis

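To implement time series models in Python, you can use libraries like the ones used in the examples in this guide:

  • pandas for loading and manipulating time-indexed data
  • NumPy for numerical operations
  • Matplotlib for plotting
  • statsmodels for statistical tests, ACF/PACF plots, and ARMA/ARIMA models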

8. Conclusion

Time series models are powerful tools for analyzing and forecasting time-ordered data. Selecting the right model and understanding the components of the data are both critical for accurate predictions. With the appropriate model and evaluation techniques, you can make informed decisions based on historical trends and patterns.
