DEV Community

yaswanthteja

Prediction using Supervised ML

  • Predict the percentage of marks of a student based on the number of study hours.
  • This is a simple linear regression task as it involves just 2 variables.
  • Data can be found at the URL used in the "Reading the data from source" step.
  • You can use R, Python, SAS Enterprise Miner or any other tool.
  • What will be the predicted score if a student studies for 9.25 hrs/day?

Demo

Prediction using Supervised Machine Learning

In this regression task I tried to predict the percentage of marks that a student is expected to score based upon the number of hours they studied.

This is a simple linear regression task as it involves just two variables.

Importing the required libraries

# Importing the required libraries
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np  

Reading the data from source

# Reading data from remote link
url = "https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
s_data = pd.read_csv(url)
print("Data import successful")
s_data.head(10)

Step 2 - Input data Visualization

# Plotting the distribution of scores
s_data.plot(x='Hours', y='Scores', style='o')  
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()

The graph suggests a positive linear relationship between the number of hours studied and the percentage score.
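To back up the visual impression with a number, one option is the Pearson correlation coefficient, which is close to 1 for a strong positive linear relationship. The sketch below computes it from its definition; the `pearson_r` helper and the sample values are hypothetical and are not rows from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only, not rows from the dataset
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [12.0, 25.0, 31.0, 38.0, 52.0]

r = pearson_r(hours, scores)
print(f"Pearson r = {r:.3f}")
```

With the real data loaded, `s_data['Hours'].corr(s_data['Scores'])` should give the equivalent figure in one line.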

Step 3 - Data Preprocessing

This step divides the data into "attributes" (inputs) and "labels" (outputs).

# X: every column except the last (Hours), kept 2-D for scikit-learn
X = s_data.iloc[:, :-1].values
# y: the Scores column as a 1-D array
y = s_data.iloc[:, 1].values
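The two slices produce arrays of different shapes, which matters because scikit-learn estimators expect a 2-D feature matrix of shape (n_samples, n_features) and a 1-D target. A small sketch on a hypothetical NumPy table (the `table`, `X_demo` and `y_demo` names and the values are made up for illustration):

```python
import numpy as np

# Hypothetical two-column table: column 0 = hours, column 1 = score
table = np.array([[2.5, 21.0],
                  [5.1, 47.0],
                  [3.2, 27.0]])

X_demo = table[:, :-1]  # every column except the last -> 2-D array
y_demo = table[:, 1]    # a single column by index -> 1-D array

print(X_demo.shape, y_demo.shape)
```

Because the slice `:-1` keeps the column axis, the features are already two-dimensional and need no further reshaping before fitting.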

Step 4 - Model Training

Splitting the data into training and testing sets, and training the algorithm.

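# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
# X_train is already 2-D, so no reshape is needed
regressor.fit(X_train, y_train)

print("Training complete.")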
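For a single input variable, ordinary least squares has a simple closed form: the slope is the ratio of the co-deviation of x and y to the squared deviation of x, and the intercept makes the line pass through the means. The sketch below shows that arithmetic; it is not scikit-learn's internal implementation, and the `fit_line` helper and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 10x + 2
demo_hours = [1.0, 2.0, 3.0, 4.0]
demo_scores = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(demo_hours, demo_scores)
print(slope, intercept)
```

On the trained model, the corresponding values are available as `regressor.coef_[0]` and `regressor.intercept_`.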

Step 5 - Plotting the Line of regression

Now that our model is trained, it's time to visualize the best-fit regression line.

# Computing the regression line: y = coef * x + intercept
line = regressor.coef_ * X + regressor.intercept_

# Plotting all data points together with the fitted line
plt.scatter(X, y)
plt.plot(X, line, color='red')
plt.show()

Step 6 - Making Predictions

Now that we have trained our algorithm, it's time to test the model by making some predictions.

For this we will use the test-set data.

# Testing data
print(X_test)
# Model Prediction 
y_pred = regressor.predict(X_test)

Step 7 - Comparing Actual result to the Predicted Model result

# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}) 
df

# Estimating training and test score (R-2)
print("Training Score:",regressor.score(X_train,y_train))
print("Test Score:",regressor.score(X_test,y_test))

Plotting the Bar graph to depict the difference between the actual and predicted value

# Plotting the Bar graph to depict the difference between the actual and predicted value

df.plot(kind='bar', figsize=(5, 5))
plt.grid(which='major', linewidth=0.5, color='red')
plt.grid(which='minor', linewidth=0.5, color='blue')
plt.show()

Testing the model with our own data

# Testing the model with our own data
hours = 9.25
test = np.array([hours])
test = test.reshape(-1, 1)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))
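The `reshape(-1, 1)` call turns the single value into a 2-D array with one row and one column, which is the input shape `predict` expects. For a one-feature linear model the prediction itself is just slope times hours plus intercept. The sketch below illustrates both points with hypothetical coefficients (`demo_slope` and `demo_intercept` are made up, so the printed number is not the model's actual prediction).

```python
import numpy as np

# Hypothetical coefficients for illustration only; the real ones come from
# regressor.coef_[0] and regressor.intercept_ after fitting.
demo_slope = 10.0
demo_intercept = 2.0

hours = 9.25
# One sample with one feature: shape (1, 1)
sample = np.array([hours]).reshape(-1, 1)

# The arithmetic a one-feature linear model applies to each row
manual_pred = demo_slope * sample[:, 0] + demo_intercept
print(sample.shape, manual_pred[0])
```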

Step 8 - Evaluating the model

The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a particular dataset. Here several error metrics are calculated on the test set to quantify the model's performance.

from sklearn import metrics  
print('Mean Absolute Error:',metrics.mean_absolute_error(y_test, y_pred)) 
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R-2:', metrics.r2_score(y_test, y_pred))

Mean Absolute Error: 4.183859899002975
Mean Squared Error: 21.598769307217406
Root Mean Squared Error: 4.647447612100367
R-2: 0.9454906892105355

R-2 (the coefficient of determination) measures how much of the variance in the scores the model explains. Here R-2 is about 0.945 on the test set, which indicates a strong fit for this model.
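To make the four metrics concrete, here is a minimal sketch that computes them from their definitions. The `regression_metrics` helper and the actual/predicted pairs are hypothetical and are not the values from this post.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-2 computed from their definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                # share of variance explained
    return mae, mse, rmse, r2

# Hypothetical actual/predicted pairs for illustration only
actual = [20.0, 30.0, 60.0, 70.0]
predicted = [22.0, 28.0, 63.0, 67.0]

mae, mse, rmse, r2 = regression_metrics(actual, predicted)
print(mae, mse, rmse, r2)
```

RMSE is in the same units as the scores (percentage points), which often makes it easier to interpret than MSE.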

I was able to carry out the Prediction using Supervised ML task and evaluate the model's performance using several metrics.
