5 Regression Projects in Python (with Full Code)
Linear regression is one of the foundational algorithms in machine learning and statistics. But beyond the theory, real-world implementation matters, especially when it comes to building end-to-end predictive systems. In this post, I’ll walk you through five hands-on linear regression projects I built using Python and scikit-learn, each solving a different problem on a different dataset.
📐 Introduction to Linear Regression & Evaluation Metrics
Linear regression is a statistical technique used to model the relationship between a dependent variable (target) and one or more independent variables (features). It's one of the simplest and most widely used regression techniques due to its interpretability and ease of implementation.
📈 Why Use Linear Regression?
- Easy to implement and explain
- Computationally efficient
- Good for initial baseline models
- Interpretable coefficients
In all of the projects below, linear regression served as a baseline approach to see how well basic relationships between features and targets could be modeled.
📐 Understanding R² and MSE
When evaluating a regression model, two of the most important metrics are R² (R-squared) and Mean Squared Error (MSE). Here's how they work and how to interpret them:
🔹 R² Score (Coefficient of Determination)
- What it measures: The proportion of variance in the target variable that is predictable from the features.
- Range: at most 1 (closer to 1 is better); it can go below 0 for bad fits
- Interpretation:
  - An R² of 0.90 means 90% of the variance in the target is explained by the model.
  - A negative R² can occur when the model performs worse than simply predicting the mean, usually a sign of poor fit or flawed features.
🔹 Mean Squared Error (MSE)
- What it measures: The average of the squared prediction errors, expressed in squared units of the target.
- Interpretation:
  - Lower MSE = better performance
  - Sensitive to outliers because the errors are squared
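Both metrics are a single call each in scikit-learn. A minimal sketch with toy numbers (illustrative only, not from any of the projects below):

```python
from sklearn.metrics import mean_squared_error, r2_score

# Toy actual vs. predicted values (illustrative only)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.3, 8.7]

mse = mean_squared_error(y_true, y_pred)  # average of squared errors
r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot

print(f"MSE: {mse:.4f}")
print(f"R²:  {r2:.4f}")
```

Note that MSE depends on the target's units, which is why the salary and insurance projects below report MSE in the millions while the wine project's MSE is below 1.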
🔍 How to Improve These Scores
- Feature scaling and normalization
- Removing or capping outliers
- One-hot encoding for categorical variables
- Feature selection or dimensionality reduction
- Trying polynomial or regularized models (e.g., Ridge, Lasso)
📘 In my experiments, encoding quality and removing outliers had a huge impact on both metrics.
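As a sketch of the two fixes that helped most, here is how outlier capping and one-hot encoding look in pandas. The frame and column names are made up for illustration, not taken from the project datasets:

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative
df = pd.DataFrame({
    "price": [5.0, 6.2, 4.8, 55.0, 5.5],            # 55.0 is an outlier
    "fuel": ["petrol", "diesel", "petrol", "petrol", "cng"],
})

# Cap outliers at the 1st/99th percentiles instead of dropping rows
lo, hi = df["price"].quantile([0.01, 0.99])
df["price"] = df["price"].clip(lo, hi)

# One-hot encode the categorical column (drop_first avoids redundancy)
df = pd.get_dummies(df, columns=["fuel"], drop_first=True)
print(df.columns.tolist())
```

Capping (rather than deleting) keeps the sample size intact while removing the leverage that extreme values have on the squared-error loss.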
🔍 Projects Overview
📈 1. Salary Prediction
- Dataset: Salary_dataset.csv
🧪 My Experience:
Model Accuracy (R²): 0.9024
Mean Squared Error: 49,830,096.86
Sample Predictions:
- Predicted: 115,791.21, Actual: 112,636.00
- Predicted: 71,499.28, Actual: 67,939.00
- Predicted: 102,597.87, Actual: 113,813.00
- Predicted: 75,268.80, Actual: 83,089.00
- Predicted: 55,478.79, Actual: 64,446.00
Coefficient (Experience): 9423.82
Intercept: 24,380.20
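With a single feature, the fitted line is just y = coef · x + intercept, so the coefficient and intercept reported above are enough to reproduce a prediction by hand:

```python
# Coefficient and intercept as reported by the fitted model above
coef = 9423.82        # salary increase per year of experience
intercept = 24380.20  # predicted salary at zero experience

def predict_salary(years_experience: float) -> float:
    """Apply the fitted line: salary = coef * years + intercept."""
    return coef * years_experience + intercept

print(predict_salary(5))  # roughly 71,499, close to the second sample prediction
```

This interpretability is the main selling point of linear regression: each additional year of experience adds about 9,424 to the predicted salary, full stop.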
🚗 2. Car Price Estimation
- Dataset: cars24-car-price-clean2.csv
🧪 My Experience:
Model Accuracy (R²): 0.6328
Mean Squared Error: 32.15
Sample Predictions:
- Predicted: 6.72, Actual: 7.00
- Predicted: 6.32, Actual: 4.75
- Predicted: 8.28, Actual: 6.30
- Predicted: 6.04, Actual: 5.25
- Predicted: 0.53, Actual: 2.10
The model performs moderately well. It shows clear potential in capturing price trends but is sensitive to feature scaling and category encoding. Outliers affect accuracy.
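Since the model is sensitive to feature scaling, one standard fix is to put a scaler in front of the regressor in a pipeline. A minimal sketch with made-up numeric features (not the actual cars24 columns):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy features on very different scales, e.g. (age in years, km driven)
X = np.array([[1.0, 20000], [3.0, 45000], [5.0, 80000], [7.0, 120000]])
y = np.array([8.0, 6.5, 5.0, 3.8])  # price in lakhs, illustrative

# Scaling inside a pipeline keeps train/test statistics separate
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
print(model.predict([[4.0, 60000]]))
```

Scaling does not change plain least-squares predictions, but it makes coefficients comparable across features and becomes essential once regularized models like Ridge enter the picture.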
🍷 3. Wine Price Prediction
- Dataset: wine.csv
🧪 My Experience:
Model Accuracy (R²): 0.5271
Mean Squared Error: 0.18
Sample Predictions:
- Predicted: 7.54, Actual: 7.39
- Predicted: 7.32, Actual: 7.59
- Predicted: 7.67, Actual: 7.50
- Predicted: 6.98, Actual: 6.26
- Predicted: 5.77, Actual: 6.25
The model performs reasonably well but struggles with subtle differences in wine composition. Could benefit from more domain-specific features or nonlinear modeling.
🏥 4. Insurance Charges Prediction
- Dataset: insurance.csv
🧪 My Experience:
Model Accuracy (R²): 0.7836
Mean Squared Error: 33,596,915.85
Sample Predictions:
- Predicted: 8969.55, Actual: 9095.07
- Predicted: 7068.75, Actual: 5272.18
- Predicted: 36858.41, Actual: 29330.98
- Predicted: 9454.68, Actual: 9301.89
- Predicted: 26973.17, Actual: 33750.29
Most influential features: smoker, BMI, and age. The model demonstrates high predictive power but shows deviation in extreme cases.
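Feature influence like this can be read off the fitted coefficients. A sketch with a tiny made-up frame standing in for insurance.csv (raw coefficients are only roughly comparable unless features are scaled, so treat the ranking as indicative):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny illustrative frame; the real insurance.csv has many more rows/columns
df = pd.DataFrame({
    "age":     [19, 33, 45, 62, 27, 51],
    "bmi":     [27.9, 22.7, 30.1, 26.3, 33.8, 28.9],
    "smoker":  [1, 0, 0, 1, 0, 1],      # already binary-encoded
    "charges": [16884, 4449, 8240, 27808, 3866, 24603],
})

X, y = df[["age", "bmi", "smoker"]], df["charges"]
model = LinearRegression().fit(X, y)

# Rank features by absolute coefficient magnitude
ranking = sorted(zip(X.columns, model.coef_), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:.2f}")
```

Even in this toy version, the smoker flag dwarfs the other coefficients, mirroring what the real dataset shows.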
🚖 5. Trip Duration Prediction
- Dataset: train.csv
🧪 My Experience:
Model Accuracy (R²): 0.0227
Mean Squared Error: 23,060,274.73
Sample Predictions:
- Predicted: 791.72, Actual: 473.00
- Predicted: 531.07, Actual: 157.00
- Predicted: 1706.45, Actual: 1862.00
- Predicted: 1403.61, Actual: 1573.00
- Predicted: 1381.69, Actual: 1318.00
Despite a reasonable setup, the model clearly underperformed, likely due to high noise in trip durations, unhandled categorical values, or insufficient feature engineering.
🔄 Common Project Workflow
Each of the five projects follows a similar pipeline:
- Data Loading & Cleaning: Load CSV, remove nulls
- Feature Engineering: Encode categorical features using one-hot or binary encoding
- Train-Test Split: Typically 80/20 split
- Model Training: Use LinearRegression() from scikit-learn
- Evaluation: R² Score and Mean Squared Error (MSE)
- Visualization: Scatter plot of actual vs. predicted values
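Put together, the whole pipeline fits in a few lines. A sketch using a synthetic frame in place of the project CSVs (the visualization step is left out to keep it self-contained):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Data loading & cleaning (synthetic stand-in for pd.read_csv + dropna)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "feature": rng.uniform(0, 10, 200),
    "category": rng.choice(["a", "b"], 200),
})
df["target"] = 3.0 * df["feature"] + (df["category"] == "b") * 5 + rng.normal(0, 1, 200)
df = df.dropna()

# 2. Feature engineering: one-hot encode the categorical column
X = pd.get_dummies(df.drop(columns="target"), columns=["category"], drop_first=True)
y = df["target"]

# 3. Train-test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 4. Model training
model = LinearRegression().fit(X_train, y_train)

# 5. Evaluation
pred = model.predict(X_test)
print("R²: ", r2_score(y_test, pred))
print("MSE:", mean_squared_error(y_test, pred))
```

Swap the synthetic frame for one of the project CSVs and the rest of the pipeline carries over unchanged.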
▶️ How to Run Each Project
Clone the repo, then run any project with:
python Project_X_name.py
Replace Project_X_name.py with the script name for the specific project.
📦 Installation
Ensure dependencies are installed:
pip install -r requirements.txt
📂 Access the Code
You can find the complete repository here:
📎 https://github.com/Ertugrulmutlu/5-Linear-Regression-Projects/tree/main
🧠 What’s Next?
- Upgrade the models with Ridge or Lasso Regression
- Add interactive dashboards with Streamlit
- Deploy the models via web or API
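Swapping in Ridge is nearly a one-line change: it keeps the linear model but adds an L2 penalty that shrinks the coefficients, which often helps when features are correlated. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy single-feature data (illustrative only)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.2, 5.9, 8.1])

# alpha controls the strength of the L2 penalty (alpha=0 is plain OLS)
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_, model.intercept_)
```

Note the slope comes out smaller than the ordinary least-squares slope (about 1.97 here); that shrinkage is exactly the regularization at work. Lasso works the same way but uses an L1 penalty that can zero out coefficients entirely.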
💬 Feedback?
Tried one of the projects? Found a bug? Want to share your own regression experiments?
Let’s chat in the comments or connect via GitHub!
Thanks for reading — and happy modeling!