DEV Community

Viswa M

Beginner's Guide to Linear Regression in Python – Simple, Step‑by‑Step

Overview

Linear regression is a staple of predictive analytics.

In this guide we’ll build a simple linear model from scratch using only NumPy for vectorised math and tqdm for a progress bar. By the end you’ll understand the math behind gradient descent, see how the code maps to the theory, and be able to extend the approach to more complex scenarios.


1️⃣ Introduction

Suppose you have five measurements of how far a ball travels (y) when it’s kicked from various distances (x).

You want a model that can predict the expected distance for any new kick.

The simplest assumption is a linear relationship:

$$y \approx m\,x + b$$

where m is the slope and b the intercept.

To find the best m and b we minimise the Mean Squared Error (MSE) between predictions and observed values, using gradient descent as the optimisation routine.
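To make the loss concrete before deriving anything, here is a minimal sketch (the helper name `mse` is mine, not from the article's code) that scores two candidate lines on the article's dataset; a line closer to the data yields a smaller MSE:

```python
import numpy as np

# Toy data from the article: kick positions (x) and travel distances (y)
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

def mse(m, b):
    """Mean Squared Error of the line y_hat = m*x + b on (X, y)."""
    y_hat = m * X + b
    return np.mean((y - y_hat) ** 2)

print(mse(0.0, 0.0))  # the all-zero line: 17.2
print(mse(0.8, 1.5))  # a line much closer to the data: 0.57
```

Gradient descent is simply an automated way of sliding `m` and `b` towards the values that make this number as small as possible.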


2️⃣ What the Code Does (High‑Level View)

| Step | What happens | Why it matters |
|---|---|---|
| 1 | Load the data into NumPy arrays. | Enables fast, vectorised calculations. |
| 2 | Initialise the model $\hat{y} = m \cdot X + b$ with `m = 0`, `b = 0`. | Provides a starting point for optimisation. |
| 3 | Run gradient descent for a fixed number of iterations (epochs). | Iteratively improves `m` and `b`. |
| 4 | Print the learned slope and intercept. | Shows the final line. |
| 5 | Make a prediction for a new input (`x = 6`). | Demonstrates the model's utility. |

3️⃣ Step‑by‑Step Explanation of the Code

```python
import numpy as np
from tqdm import tqdm

# 1️⃣  The data – two 1‑D arrays
X = np.array([1, 2, 3, 4, 5])          # independent variable (x‑values)
y = np.array([2, 4, 5, 4, 5])          # dependent variable (y‑values)
print(X, y)

# 2️⃣  Initialise model parameters
m = 0
b = 0

# 3️⃣  Hyper‑parameters
lr = 0.01          # learning rate
epochs = 1000      # number of passes over the whole dataset
n = len(X)         # number of training examples

# 4️⃣  Training loop
for _ in tqdm(range(epochs)):
    # 4a  Predict using current parameters
    y_hat = m * X + b

    # 4b  Compute gradients of MSE w.r.t. m and b
    dm = (-2 / n) * np.sum(X * (y - y_hat))
    db = (-2 / n) * np.sum(y - y_hat)

    # 4c  Gradient descent update
    m -= lr * dm
    b -= lr * db

# 5️⃣  Output the learned line
print("Slope:", m)
print("Intercept:", b)

# 6️⃣  Make a prediction for a new x‑value
print("PREDICTIONS...")
input_val = 6
pred = m * input_val + b
print(pred)
```
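As a sanity check (not part of the original post), the gradient-descent estimate can be compared against NumPy's closed-form least-squares fit via `np.polyfit`, which solves the same problem exactly:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Gradient descent, as in the article (tqdm omitted for brevity)
m, b = 0.0, 0.0
lr, epochs, n = 0.01, 1000, len(X)
for _ in range(epochs):
    y_hat = m * X + b
    m -= lr * (-2 / n) * np.sum(X * (y - y_hat))
    b -= lr * (-2 / n) * np.sum(y - y_hat)

# Closed-form least-squares fit for comparison
m_exact, b_exact = np.polyfit(X, y, 1)
print(m, b)              # gradient-descent estimate, close to the exact fit
print(m_exact, b_exact)  # exact solution: m = 0.6, b = 2.2
```

After 1000 epochs the two answers should agree to roughly two decimal places; running more epochs closes the remaining gap.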

Key Programming Concepts

| Concept | How it's used | Why it's useful |
|---|---|---|
| NumPy arrays | `X` and `y` are arrays, enabling vectorised arithmetic. | One operation over the whole dataset instead of loops. |
| Vectorisation | `m * X + b` applies the formula to every element of `X` in one go. | Fast and memory efficient. |
| Gradient descent | Iteratively updates `m` and `b` using the gradients `dm` and `db`. | Simple optimisation routine that converges for a convex loss like MSE. |
| Learning rate (`lr` > 0) | Controls the step size in the parameter update. | Too large → oscillation or divergence; too small → slow convergence. |
| Epochs | Number of full passes over the data. | More epochs → more accurate parameters, but also more computation. |
| Progress bar (`tqdm`) | Wraps the training loop's iterator. | Gives live feedback on training progress. |

---

## 4️⃣ The Maths Behind the Code

### 4.3 Gradients

Differentiating the MSE loss $L(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (m\,x_i + b) \right)^2$ with respect to each parameter gives:

$$\frac{\partial L}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \, (y_i - \hat{y}_i)$$
$$\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$

which is exactly what the code computes:

```python
dm = (-2 / n) * np.sum(X * (y - y_hat))
db = (-2 / n) * np.sum(y - y_hat)
```

### 4.4 Gradient Descent Update Rule

With learning rate $\alpha$ (called `lr` in the code):

$$m \leftarrow m - \alpha \, \frac{\partial L}{\partial m}$$
$$b \leftarrow b - \alpha \, \frac{\partial L}{\partial b}$$

The loop implements exactly this:

```python
m -= lr * dm
b -= lr * db
```
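To see the learning-rate trade-off in action, here is a short sketch (my own, not from the article) that trains the same model with two different rates; on this dataset `lr = 0.01` behaves well, while `lr = 0.1` is already too large and the parameters blow up:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(X)

def train(lr, epochs=1000):
    """Plain gradient descent on MSE, returning the fitted (m, b)."""
    m = b = 0.0
    for _ in range(epochs):
        y_hat = m * X + b
        m -= lr * (-2 / n) * np.sum(X * (y - y_hat))
        b -= lr * (-2 / n) * np.sum(y - y_hat)
    return m, b

print(train(0.01))  # heads towards the best-fit line m ≈ 0.6, b ≈ 2.2
print(train(0.1))   # step too large: each update overshoots and diverges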
---

## 5️⃣ A Concrete Example (Walk‑through with Numbers)

Let’s manually execute one epoch on the dataset to see how the parameters change.

| i | `x_i` | `y_i` |
|---|--------|--------|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 5 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |

**Initial parameters**: `m = 0`, `b = 0` → predictions $\hat{y} = [0, 0, 0, 0, 0]$.

**Step 1 – Compute errors**:

$$y - \hat{y} = [2, 4, 5, 4, 5]$$

**Step 2 – Compute gradients**:

$$dm = -\frac{2}{5}\,(1\cdot 2 + 2\cdot 4 + 3\cdot 5 + 4\cdot 4 + 5\cdot 5) = -\frac{2}{5}\cdot 66 = -26.4$$
$$db = -\frac{2}{5}\,(2 + 4 + 5 + 4 + 5) = -\frac{2}{5}\cdot 20 = -8$$

**Step 3 – Update parameters** (with `lr = 0.01`):

$$m = 0 - 0.01 \cdot (-26.4) = 0.264, \qquad b = 0 - 0.01 \cdot (-8) = 0.08$$

After a single epoch the line has already tilted towards the data; repeating this for 1000 epochs drives `m` and `b` towards the least-squares fit.
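The hand computation above can be checked directly in NumPy by running exactly one iteration of the training loop:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
m, b, lr, n = 0.0, 0.0, 0.01, len(X)

# One epoch, exactly as in the training loop
y_hat = m * X + b                          # [0, 0, 0, 0, 0]
dm = (-2 / n) * np.sum(X * (y - y_hat))    # ≈ -26.4
db = (-2 / n) * np.sum(y - y_hat)          # ≈ -8.0
m -= lr * dm                               # ≈ 0.264
b -= lr * db                               # ≈ 0.08
print(dm, db, m, b)
```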
