DEV Community

parmarjatin4911@gmail.com

Traffic Forecast using Python | Time Series Analysis | FbProphet | Machine Learning Tutorial

Traffic forecasting is the task of predicting future traffic volumes from historical speed and volume data, here using time series analysis in Python. It is an integral part of designing road facilities, from the investment feasibility study through to the working documentation. The same time series techniques also apply to stock prices, product sales, item demand, and similar data.
Traffic Forecast - Time Series Analysis

In this tutorial, we will load and analyze a data set from a transport company, preprocess the data, visualize it with graphs, and apply a prediction model to forecast the traffic.

Dataset Information

Investors are considering an investment in a new form of transportation: JetRail. JetRail uses jet propulsion technology to run rails and move people at high speed. While JetRail has mastered the technology and holds the patent for its product, the investment only makes sense if it can reach more than 1 million monthly users within the next 18 months.

You need to help Unicorn Ventures with the decision. They usually invest in B2C start-ups less than 4 years old that are looking for pre-series A funding. To support their decision, you need to forecast the traffic on JetRail for the next 7 months.

Download the Dataset Data

Import Modules

Let us import all the basic modules we will be needing for this project.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from fbprophet import Prophet

pandas - used to perform data manipulation and analysis

numpy - used to perform a wide variety of mathematical operations on arrays

matplotlib - used for data visualization and graphical plotting

%matplotlib inline - Jupyter magic command that renders plots inline in the notebook

warnings - used to control how warning messages are handled

filterwarnings('ignore') - suppresses the warnings raised by the modules, which keeps the output clean

Prophet - the forecasting model class provided by the fbprophet package

fbprophet must be installed for the import to succeed
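
Since release 1.0 the library is published as prophet, and the import becomes from prophet import Prophet, while older environments still carry fbprophet. The sketch below checks which of the two is installed without importing it. The helper name find_prophet_package is hypothetical and not part of either library.

```python
import importlib.util


def find_prophet_package():
    """Return 'prophet', 'fbprophet', or None if neither is installed."""
    # Prefer the current package name, then fall back to the legacy one.
    for name in ("prophet", "fbprophet"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


package = find_prophet_package()
if package is None:
    print("Prophet is not installed; try: pip install prophet")
else:
    print(f"Import with: from {package} import Prophet")
```

Both package names expose the same Prophet class, so the rest of the tutorial is unaffected by which one is present.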

Loading the Dataset

df = pd.read_csv('Traffic data.csv')
df.head()

Here, we display the first five rows of the data set

The ID column is not needed for this tutorial, so it can be left out for faster processing
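
One way to leave the ID column out is to skip it at load time with the usecols argument of read_csv. The sketch below uses a few made-up rows in the same ID, Datetime, Count layout in place of the real file, so the values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same layout as 'Traffic data.csv'.
sample_csv = io.StringIO(
    "ID,Datetime,Count\n"
    "0,25-08-2012 00:00,8\n"
    "1,25-08-2012 01:00,2\n"
    "2,25-08-2012 02:00,6\n"
)

# usecols keeps only the listed columns, so ID is never loaded.
sample = pd.read_csv(sample_csv, usecols=["Datetime", "Count"])
print(sample.head())
```

If the real file is loaded this way, 'ID' must also be removed from the later df.drop call, because the column no longer exists.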

df

Traffic Forecast Dataset

This displays the whole data frame, truncated for viewing purposes

The data was collected over three calendar years (2012-2014)

Preprocessing the dataset

Check for null values

df.isnull().sum()

As we can see, there are no null values in the data set, which makes the process easier

If the data set did contain null values, you would need to fill them using an imputation technique so that they do not distort the results
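
As an illustration of imputation, here is a minimal sketch with two common options for a count series: linear interpolation and forward fill. The values are made up, and which technique suits the real data is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical counts with one missing reading.
counts = pd.Series([8.0, np.nan, 6.0, 2.0], name="Count")

# Linear interpolation fills the gap with the midpoint of its neighbours.
filled_linear = counts.interpolate()

# Forward fill repeats the last known value instead.
filled_ffill = counts.ffill()

print(filled_linear.tolist())
print(filled_ffill.tolist())
```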

df.info()

Datetime is the column we want to convert, so we first check its data type to know how to process it

Convert object to datetime datatype

df['Datetime'] = pd.to_datetime(df['Datetime'], format='%d-%m-%Y %H:%M')
df.info()

The Datetime column is now converted from the object dtype to a proper datetime dtype
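
Passing the format explicitly matters because a day-first string such as 01-09-2012 is ambiguous without it. A small sketch with made-up timestamps shows that the format string reads the first field as the day:

```python
import pandas as pd

# Hypothetical day-first timestamps in the same layout as the data set.
raw = pd.Series(["01-09-2012 00:00", "25-08-2012 13:00"])

# %d-%m-%Y tells pandas the first field is the day, not the month.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")

print(parsed.dt.month.tolist())  # months of the two parsed values
print(parsed.dtype)
```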

Exploratory Data Analysis (EDA)

plt.figure(figsize=(10,7))
plt.plot(df['Datetime'], df['Count'])
plt.show()

Visualization of past traffic data

The plot shows how traffic grew over the collected date range

Format data for the model

df.index = df['Datetime']
df['y'] = df['Count']
df.drop(columns=['ID', 'Datetime', 'Count'], inplace=True)
df = df.resample('D').sum()
df.head()

This step groups all rows that share the same date and sums their counts, giving one row per day
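
To make the effect of resample('D').sum() concrete, here is a small sketch on made-up readings. Note that a day with no readings still gets a row, with a sum of 0.

```python
import pandas as pd

# Hypothetical readings: two on 25 Aug, none on 26 Aug, one on 27 Aug.
readings = pd.DataFrame(
    {"y": [8, 2, 6]},
    index=pd.to_datetime(
        ["2012-08-25 00:00", "2012-08-25 01:00", "2012-08-27 05:00"]
    ),
)

# 'D' buckets rows by calendar day; sum() adds the counts in each bucket.
daily = readings.resample("D").sum()
print(daily)
```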

df['ds'] = df.index
df.head()

Prophet expects the input dates in a column named 'ds' and the values to forecast in a column named 'y'
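
An alternative way to reach the same two-column layout is to reset the index and rename the columns in one step. This is a sketch on made-up daily totals, not the tutorial's own approach:

```python
import pandas as pd

# Hypothetical daily totals indexed by date, as produced by resampling.
daily_counts = pd.DataFrame(
    {"Count": [10, 7, 12]},
    index=pd.DatetimeIndex(
        ["2012-08-25", "2012-08-26", "2012-08-27"], name="Datetime"
    ),
)

# Move the dates into a regular column, then use the names Prophet expects.
prophet_df = daily_counts.reset_index().rename(
    columns={"Datetime": "ds", "Count": "y"}
)
print(prophet_df)
```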

Input Split

size = 60
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=size/len(df), shuffle=False)

train.tail()

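test_size=size/len(df) - expresses the 60-row test set as a fraction of the data set

shuffle=False - ensures that the data is not shuffled, so the rows stay in chronological order and the test set holds the most recent days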
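
Because the split is purely chronological, the same result can be sketched with plain slicing, which avoids converting the size to a fraction. train_test_split also accepts an integer test_size as an absolute number of rows. The frame below is synthetic and its names are placeholders.

```python
import pandas as pd

size = 60

# Synthetic daily frame standing in for the resampled data set.
dates = pd.date_range("2012-08-25", periods=200)  # daily by default
df_demo = pd.DataFrame({"ds": dates, "y": range(200)})

# Everything except the last `size` rows is training data.
train_demo = df_demo.iloc[:-size]
# The last `size` rows, i.e. the most recent days, form the test set.
test_demo = df_demo.iloc[-size:]

print(len(train_demo), len(test_demo))
```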

test.head()

Display of the first five rows of the test data

test.tail()

Display of the last five rows of the test data

Model Training

model = Prophet(yearly_seasonality=True, seasonality_prior_scale=0.9)
model.fit(train)

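Initializes the Prophet model with yearly seasonality enabled and fits it to the training data. seasonality_prior_scale=0.9 is lower than the default of 10, which dampens the seasonal component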

future = model.make_future_dataframe(periods=60)
future
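
With its default settings, make_future_dataframe returns a single 'ds' column holding the training dates followed by the requested number of new dates. Using periods=60 matches the size of the test set, so the new rows line up with the held-out days. The pandas-only sketch below approximates that behaviour for illustration. It is not Prophet's code, and sketch_future_frame is a hypothetical name.

```python
import pandas as pd


def sketch_future_frame(history_ds, periods, freq="D"):
    """Approximate a 'ds' frame: history dates plus `periods` new dates."""
    history_ds = pd.Series(pd.to_datetime(history_ds)).reset_index(drop=True)
    last_date = history_ds.max()
    # Generate one extra date and drop the first, which equals last_date.
    new_dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)[1:]
    all_dates = pd.concat([history_ds, pd.Series(new_dates)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


# Hypothetical one-week history of daily dates.
history = pd.date_range("2014-07-20", periods=7, freq="D")
future_demo = sketch_future_frame(history, periods=60)
print(future_demo.tail())
```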
