In this project ,I am predicting an employee's salary based on their years of experience using linear regression model.Linear regression is a statistical method used to model the relationship between a dependent variable and an independent variable.
In this case:
- X(Independent Variable) represents Years Of Experience
- Y(Dependent Variable) represents the salary
Libraries Used
The following Python libraries are used in this project:
- pandas for handling data frames
- seaborn and matplotlib for visualization
- Scikit-learn (sklearn) provides a collection of tools for data preprocessing, model training and evaluation
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.feature_selection import f_regression
%matplotlib inline
Data Preparation
Read the excel file that contains the data set.

Then define the X and Y variables .
The X variable is stored in a DataFrame format.
X=df[['YearsExperience']]#Should be in dataframe or 2d array
y=df['Salary']
Splitting the Data set
The dataset is split into training and testing sets ,both X and Y are split accordingly.
In this project 25% data(test size = 0.25) is used for testing the rest to train the model.
##Train Test Split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)
Model validation
To evaluate whether there is a valid relationship exists between years of experience and salary F-regression is used to calculate the:
- F-value(622.5):Measures how well the independent variable explains the dependent variable.
- p-value helps(0.0):Determines the statistical significance of the relationship.
Since the p-value is less than 0.05 we reject the null hypothesis which states that there is no relationship between salary and years of experience. This confirms that a strong relationship exists between the two variables.
In general, a higher F-value and a smaller p-value indicate a stronger and more significant relationship
Prediction
The Linear Regression model is trained by fitting it to the training dataset using model.fit().
model= LinearRegression()
model.fit(X_train,y_train)
The Linear Regression model follows the formula:
Y=b0+b1X
Where:
Y = Dependent Variable (Salary)
b₀ = Intercept (Predicted salary when years of experience = 0)
b₁ = Slope (Rate of change in salary per year of experience)
X = Independent Variable (Years of Experience)
From the trained model, we can obtain:
- The intercept() The predicted salary for an employee with 0 yrs of experience
model.intercept_
- The slope (coefficient) Represents the change in salary for each additional year of experience
model.coef_
The trained model is then used to predict salaries on the test dataset using model.predict(X_test).
Model Evaluation
The model’s performance is evaluated using the R-squared (R²) metric.
The R² score measures how much of the variation in salary is explained by years of experience.
In this case the R² score is 0.93 (93%) indicating that years of experience explain most of the variation in salary showing a strong model fit.
Plot the Model
To visualize the relationship and the regression line:

Conclusion
This project demonstrates that Linear Regression can effectively model the relationship between years of experience and salary.
Thank you for reading!❤️.



Top comments (0)