Today's Purpose
Today, I want to share how to run a simple linear regression on daily temperature data using scikit-learn, pandas, and matplotlib.
Libraries Used
scikit-learn (LinearRegression)
pandas
matplotlib
Data file
Temperature_10y.csv
Year,Month,Day,Temperature,Quality,Homogeneity
2006,1,1,3.6,8,1
・・・
2016,12,31,6.3,8,2
※This file contains daily temperature data from January 1, 2006 through December 31, 2016.
Detailed Method
①Import libraries necessary for regression analysis.
②Read the temperature data file.
③Split the temperature data by year: 2006–2015 is the training data and 2016 is the test data.
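④The make_data function prepares the input features and target values from the dataset. Each input row holds the temperatures of 6 consecutive days (interval = 6), and the target is the temperature of the day that follows.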
Example:
temps = [3.6, 4.0, 3.7, 4.0, 3.6, 2.1, 2.8,…]
x = [[3.6, 4.0, 3.7, 4.0, 3.6, 2.1], …]
y = [2.8, …]
⑤Create a model using the training data.
⑥Generate predictions using the model and the test data.
⑦Plot a graph showing both the predicted and test data.
⑧Calculate the absolute difference between the predicted and test data, then print its average, maximum, and minimum.
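The windowing described in step ④ can be sketched on its own as follows. This is a minimal illustration, not the post's exact function: the name make_windows is hypothetical, the first seven values come from the example above, and the eighth value (4.2) is invented so that a second window appears.

```python
def make_windows(temps, interval):
    """Return (x, y): each x row holds `interval` consecutive values,
    and the matching y entry is the value that comes right after them."""
    x, y = [], []
    for i in range(interval, len(temps)):
        x.append(temps[i - interval:i])  # the `interval` values before index i
        y.append(temps[i])               # the value to predict
    return x, y


# First seven values are from the example above; 4.2 is a made-up extra value.
temps = [3.6, 4.0, 3.7, 4.0, 3.6, 2.1, 2.8, 4.2]
x, y = make_windows(temps, interval=6)

print(x)  # two windows of six values each
print(y)  # the two values that follow those windows
```

A series of n values yields n - interval rows, so the first 6 days of each dataset are used only as inputs and never as targets.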
Source Code
①Import libraries
from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt
②Read the temperature file
df = pd.read_csv('Temperature_10y.csv', encoding='utf-8')
③Split the data
# Boolean masks on the 'Year' column (the name must match the CSV header)
train_year = df['Year'] <= 2015
test_year = df['Year'] >= 2016
interval = 6
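To see what the two masks do, here is a small self-contained sketch that reads a four-row CSV from memory instead of the real file. It assumes the column is named Year, as in the file header. The first and last rows come from the file excerpt above; the two middle rows are invented for illustration.

```python
import io

import pandas as pd

# First and last rows are from the file excerpt; the middle two are made up.
sample_csv = io.StringIO(
    "Year,Month,Day,Temperature,Quality,Homogeneity\n"
    "2006,1,1,3.6,8,1\n"
    "2015,12,31,5.0,8,1\n"
    "2016,1,1,4.5,8,2\n"
    "2016,12,31,6.3,8,2\n"
)
sample = pd.read_csv(sample_csv)

# Each comparison returns a boolean Series with one True/False per row.
train_mask = sample["Year"] <= 2015
test_mask = sample["Year"] >= 2016

# Indexing the DataFrame with a mask keeps only the rows marked True.
train = sample[train_mask]
test = sample[test_mask]

print(len(train), len(test))
print(list(test["Temperature"]))
```

Splitting by year rather than at random keeps the test period strictly after the training period, which matters when each input row is built from the preceding days.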
④Prepare data
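def make_data(data):
    x = []  # input rows: the `interval` temperatures before each day
    y = []  # targets: the temperature of the day itself
    temps = list(data['Temperature'])
    for i in range(len(temps)):
        # Skip the first days, which do not have enough history yet
        if i < interval: continue
        y.append(temps[i])
        xa = []
        for p in range(interval):
            d = i + p - interval
            xa.append(temps[d])
        x.append(xa)
    return (x, y)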
train_x, train_y = make_data(df[train_year])
test_x, test_y = make_data(df[test_year])
⑤Create the model
lr = LinearRegression()
lr.fit(train_x, train_y)
⑥Make predictions
pre_y = lr.predict(test_x)
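It can help to see what the fitted model actually computes: one weight per input day plus an intercept. The sketch below uses a synthetic noisy sine series in place of the real temperature file (the series, the seed, and the variable names are all assumptions for illustration), and checks that predict gives the same numbers as the weighted sum written out by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a temperature series (not the real data).
rng = np.random.default_rng(0)
series = 10 + 5 * np.sin(np.arange(200) / 20) + rng.normal(0, 0.5, 200)

interval = 6
# Each row holds 6 consecutive values; the target is the value that follows.
X = np.array([series[i - interval:i] for i in range(interval, len(series))])
y = series[interval:]

model = LinearRegression().fit(X, y)

# A prediction is the weighted sum of the 6 inputs plus the intercept.
manual = X @ model.coef_ + model.intercept_

print(model.coef_.shape)                       # one coefficient per input day
print(np.allclose(manual, model.predict(X)))   # hand-written sum matches predict
```

With the post's own model, lr.coef_ and lr.intercept_ expose the same values after lr.fit has run.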
⑦Plot the results
plt.figure(figsize=(10, 6), dpi=100)
plt.plot(test_y, c='r') # Actual data in red
plt.plot(pre_y, c='b') # Predicted data in blue
plt.savefig('temperature-lr.png')
plt.show()
⑧Calculate differences
diff_y = abs(pre_y - test_y)
print('Average Difference =', sum(diff_y) / len(diff_y))
print('Maximum Difference =', max(diff_y))
print('Minimum Difference =', min(diff_y))
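The average and maximum differences above correspond to two standard metrics, mean absolute error and maximum error. As an alternative sketch, they can be computed with sklearn.metrics. The four actual and predicted values below are invented for illustration and are unrelated to the real results.

```python
import numpy as np
from sklearn.metrics import max_error, mean_absolute_error

# Hypothetical values, only to show that the two approaches agree.
actual = [2.0, 3.5, 5.0, 4.0]
predicted = np.array([2.5, 3.0, 5.0, 6.0])

# Manual version, as in step 8: element-wise absolute difference.
diff = np.abs(predicted - np.asarray(actual))

# Library version of the same quantities.
mae = mean_absolute_error(actual, predicted)
worst = max_error(actual, predicted)

print(mae, diff.mean())   # both are the average absolute difference
print(worst, diff.max())  # both are the largest absolute difference
```

The manual version in step ⑧ works because lr.predict returns a NumPy array, so subtracting the test_y list from it is done element by element.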
Result
temperature-lr.png
avg= 1.6640684971954247
max= 8.471949619908472
min= 0.01187698362363232