- Predict the percentage of marks of an student based on the number of study hours.
- This is a simple linear regression task as it involves just 2 variables.
- Data can be found at clickhere
- You can use R, Python, SAS Enterprise Miner or any other tool.
- What will be predicted score if a student studies for 9.25 hrs/ day?
Demo
Prediction using Supervised Machine Learning
In this regression task I tried to predict the percentage of marks that a student is expected to score based upon the number of hours they studied.
This is a simple linear regression task as it involves just two variables.
Importing the required libraries
# Importing the required libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Reading the data from source
# Reading data from remote link
url = "https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
s_data = pd.read_csv(url)
print("Data import successful")
s_data.head(10)
Step 2 - Input data Visualization
# Plotting the distribution of scores
s_data.plot(x='Hours', y='Scores', style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Score')
plt.show()
From the graph we can safely assume a positive linear relation between the number of hours studied and percentage of score.
Step 3 - Data Preprocessing
This step involved division of data into "attributes" (inputs) and "labels" (outputs).
X = s_data.iloc[:, :-1].values
y = s_data.iloc[:, 1].values
Step 4 - Model Training
Splitting the data into training and testing sets, and training the algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train.reshape(-1,1), y_train)
print("Training complete.")
Step 5 - Plotting the Line of regression
Now since our model is trained now, its the time to visualize the best-fit line of regression.
# Plotting the regression line
line = regressor.coef_*X+regressor.intercept_
# Plotting for the test data
plt.scatter(X, y)
plt.plot(X, line,color='red');
plt.show()
Step 6 - Making Predictions
Now that we have trained our algorithm, it's time to test the model by making some predictions.
For this we will use our test-set data
# Testing data
print(X_test)
# Model Prediction
y_pred = regressor.predict(X_test)
Step 7 - Comparing Actual result to the Predicted Model result
# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
#Estimating training and test score
print("Training Score:",regressor.score(X_train,y_train))
print("Test Score:",regressor.score(X_test,y_test))
Plotting the Bar graph to depict the difference between the actual and predicted value
# Plotting the Bar graph to depict the difference between the actual and predicted value
df.plot(kind='bar',figsize=(5,5))
plt.grid(which='major', linewidth='0.5', color='red')
plt.grid(which='minor', linewidth='0.5', color='blue')
plt.show(
Testing the model with our own data
# Testing the model with our own data
hours = 9.25
test = np.array([hours])
test = test.reshape(-1, 1)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))
Step 8 - Evaluating the model
The final step is to evaluate the performance of algorithm. This step is particularly important to compare how well different algorithms perform on a particular dataset. Here different errors have been calculated to compare the model performance and predict the accuracy.
from sklearn import metrics
print('Mean Absolute Error:',metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R-2:', metrics.r2_score(y_test, y_pred))
Mean Absolute Error: 4.183859899002975
Mean Squared Error: 21.598769307217406
Root Mean Squared Error: 4.647447612100367
R-2: 0.9454906892105355
R-2 gives the score of model fit and in this case we have R-2 = 0.9454906892105355 which is actually a great score for this model.
I was successfully able to carry-out Prediction using Supervised ML task and was able to evaluate the model's performance on various parameters.
Top comments (0)