DEV Community

Viswa M

Beginner's Guide to Linear Regression in Python – Simple, Step‑by‑Step

Overview

Linear regression is a staple of predictive analytics.

In this guide we’ll build a simple linear model from scratch using only NumPy for vectorised math and tqdm for a progress bar. By the end you’ll understand the math behind gradient descent, see how the code maps to the theory, and be able to extend the approach to more complex scenarios.


1️⃣ Introduction

Suppose you have five measurements of how far a ball travels (y) when it’s kicked from various distances (x).

You want a model that can predict the expected distance for any new kick.

The simplest assumption is a linear relationship:

$$y \approx m\,x + b$$

where m is the slope and b the intercept.

To find the best m and b we minimise the Mean Squared Error (MSE) between predictions and observed values, using gradient descent as the optimisation routine.
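To make the loss concrete before deriving anything, here is a minimal sketch (the helper name `mse` is mine, not from the article's code) that scores two candidate lines on the article's dataset; a line closer to the data yields a smaller MSE:

```python
import numpy as np

# Toy data from the article: kick positions (x) and travel distances (y)
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

def mse(m, b):
    """Mean Squared Error of the line y_hat = m*x + b on (X, y)."""
    y_hat = m * X + b
    return np.mean((y - y_hat) ** 2)

print(mse(0.0, 0.0))  # the all-zero line: 17.2
print(mse(0.8, 1.5))  # a line much closer to the data: 0.57
```

Gradient descent is simply an automated way of sliding `m` and `b` towards the values that make this number as small as possible.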


2️⃣ What the Code Does (High‑Level View)

| Step | What happens | Why it matters |
|---|---|---|
| 1 | Load the data into NumPy arrays. | Enables fast, vectorised calculations. |
| 2 | Initialise the model $\hat{y} = m \cdot X + b$ with `m = 0`, `b = 0`. | Provides a starting point for optimisation. |
| 3 | Run gradient descent for a fixed number of iterations (epochs). | Iteratively improves `m` and `b`. |
| 4 | Print the learned slope and intercept. | Shows the final line. |
| 5 | Make a prediction for a new input (`x = 6`). | Demonstrates the model's utility. |

3️⃣ Step‑by‑Step Explanation of the Code

```python
import numpy as np
from tqdm import tqdm

# 1️⃣  The data – two 1‑D arrays
X = np.array([1, 2, 3, 4, 5])          # independent variable (x‑values)
y = np.array([2, 4, 5, 4, 5])          # dependent variable (y‑values)
print(X, y)

# 2️⃣  Initialise model parameters
m = 0
b = 0

# 3️⃣  Hyper‑parameters
lr = 0.01          # learning rate
epochs = 1000      # number of passes over the whole dataset
n = len(X)         # number of training examples

# 4️⃣  Training loop
for _ in tqdm(range(epochs)):
    # 4a  Predict using current parameters
    y_hat = m * X + b

    # 4b  Compute gradients of MSE w.r.t. m and b
    dm = (-2 / n) * np.sum(X * (y - y_hat))
    db = (-2 / n) * np.sum(y - y_hat)

    # 4c  Gradient descent update
    m -= lr * dm
    b -= lr * db

# 5️⃣  Output the learned line
print("Slope:", m)
print("Intercept:", b)

# 6️⃣  Make a prediction for a new x‑value
print("PREDICTIONS...")
input_val = 6
pred = m * input_val + b
print(pred)
```
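As a sanity check (not part of the original post), the gradient-descent estimate can be compared against NumPy's closed-form least-squares fit via `np.polyfit`, which solves the same problem exactly:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Gradient descent, as in the article (tqdm omitted for brevity)
m, b = 0.0, 0.0
lr, epochs, n = 0.01, 1000, len(X)
for _ in range(epochs):
    y_hat = m * X + b
    m -= lr * (-2 / n) * np.sum(X * (y - y_hat))
    b -= lr * (-2 / n) * np.sum(y - y_hat)

# Closed-form least-squares fit for comparison
m_exact, b_exact = np.polyfit(X, y, 1)
print(m, b)              # gradient-descent estimate, close to the exact fit
print(m_exact, b_exact)  # exact solution: m = 0.6, b = 2.2
```

After 1000 epochs the two answers should agree to roughly two decimal places; running more epochs closes the remaining gap.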

Key Programming Concepts

| Concept | How it's used | Why it's useful |
|---|---|---|
| NumPy arrays | `X` and `y` are arrays, enabling vectorised arithmetic. | One operation over the whole dataset instead of loops. |
| Vectorisation | `m * X + b` applies the formula to every element of `X` in one go. | Fast and memory efficient. |
| Gradient descent | Iteratively updates `m` and `b` using the gradients `dm` and `db`. | Simple optimisation routine that converges for a convex loss like MSE. |
| Learning rate (`lr` > 0) | Controls the step size in the parameter update. | Too large → oscillation or divergence; too small → slow convergence. |
| Epochs | Number of full passes over the data. | More epochs → more accurate parameters, but also more computation. |
| Progress bar (`tqdm`) | Wraps the training loop's iterator. | Gives live feedback on training progress. |

---

## 4️⃣ The Maths Behind the Code

### 4.3 Gradients

Differentiating the MSE loss $L(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (m\,x_i + b) \right)^2$ with respect to each parameter gives:

$$\frac{\partial L}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \, (y_i - \hat{y}_i)$$
$$\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$

which is exactly what the code computes:

```python
dm = (-2 / n) * np.sum(X * (y - y_hat))
db = (-2 / n) * np.sum(y - y_hat)
```

### 4.4 Gradient Descent Update Rule

With learning rate $\alpha$ (called `lr` in the code):

$$m \leftarrow m - \alpha \, \frac{\partial L}{\partial m}$$
$$b \leftarrow b - \alpha \, \frac{\partial L}{\partial b}$$

The loop implements exactly this:

```python
m -= lr * dm
b -= lr * db
```
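To see the learning-rate trade-off in action, here is a short sketch (my own, not from the article) that trains the same model with two different rates; on this dataset `lr = 0.01` behaves well, while `lr = 0.1` is already too large and the parameters blow up:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
n = len(X)

def train(lr, epochs=1000):
    """Plain gradient descent on MSE, returning the fitted (m, b)."""
    m = b = 0.0
    for _ in range(epochs):
        y_hat = m * X + b
        m -= lr * (-2 / n) * np.sum(X * (y - y_hat))
        b -= lr * (-2 / n) * np.sum(y - y_hat)
    return m, b

print(train(0.01))  # heads towards the best-fit line m ≈ 0.6, b ≈ 2.2
print(train(0.1))   # step too large: each update overshoots and diverges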
---

## 5️⃣ A Concrete Example (Walk‑through with Numbers)

Let’s manually execute one epoch on the dataset to see how the parameters change.

| i | `x_i` | `y_i` |
|---|--------|--------|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 5 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |

**Initial parameters**: `m = 0`, `b = 0` → predictions $\hat{y} = [0, 0, 0, 0, 0]$.

**Step 1 – Compute errors**:

$$y - \hat{y} = [2, 4, 5, 4, 5]$$

**Step 2 – Compute gradients**:

$$dm = -\frac{2}{5}\,(1\cdot 2 + 2\cdot 4 + 3\cdot 5 + 4\cdot 4 + 5\cdot 5) = -\frac{2}{5}\cdot 66 = -26.4$$
$$db = -\frac{2}{5}\,(2 + 4 + 5 + 4 + 5) = -\frac{2}{5}\cdot 20 = -8$$

**Step 3 – Update parameters** (with `lr = 0.01`):

$$m = 0 - 0.01 \cdot (-26.4) = 0.264, \qquad b = 0 - 0.01 \cdot (-8) = 0.08$$

After a single epoch the line has already tilted towards the data; repeating this for 1000 epochs drives `m` and `b` towards the least-squares fit.
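The hand computation above can be checked directly in NumPy by running exactly one iteration of the training loop:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
m, b, lr, n = 0.0, 0.0, 0.01, len(X)

# One epoch, exactly as in the training loop
y_hat = m * X + b                          # [0, 0, 0, 0, 0]
dm = (-2 / n) * np.sum(X * (y - y_hat))    # ≈ -26.4
db = (-2 / n) * np.sum(y - y_hat)          # ≈ -8.0
m -= lr * dm                               # ≈ 0.264
b -= lr * db                               # ≈ 0.08
print(dm, db, m, b)
```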
