As I mentioned in my introductory post to this series, I don’t want math to be a hurdle for developers who want to get into AI and machine learning. You can always start coding and come back to the math when you find yourself needing it.
However, there are some concepts in linear regression that you should definitely know, since they are core to many other algorithms as well.
That said, I don’t believe you should memorize equations. When developing, we rely on libraries to handle that for us. But I do recommend going through this post and building a mental model of the core concepts that are rooted in math — very simple math that you can absolutely handle.
On The Line
In the previous post, we predicted the price of a 250 m² house by drawing a line based on a dataset of 10 houses.
But why that line? And how did we get a predicted value (ŷ) from an input (x)?
The answer is this formula:
ŷ = bx + a
a - The intercept: where the line crosses the Y axis (when x = 0)
b - The slope: how steep the line is
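Since this series is for developers, here's the same idea as code: a minimal sketch of the prediction rule, assuming we already know a and b (we'll compute them next):

```python
def predict(x: float, a: float, b: float) -> float:
    """The entire model is a straight line: ŷ = b·x + a."""
    return b * x + a
```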
To find a and b, we use:
a = ((Σy)(Σx^2) - (Σx)(Σxy)) / (n(Σx^2) - (Σx)^2)
b = (n(Σxy) - (Σx)(Σy)) / (n(Σx^2) - (Σx)^2)
Don't worry! It's simpler than it seems:
Σ = Total sum
- Σx = Total sum of all house sizes (50 + 65 + 80 + 95 + 110 + 130 + 150 + 170 + 190 + 210 = 1250)
- Σy = Total sum of all house prices (140 + 210 + 180 + 260 + 240 + 330 + 310 + 420 + 390 + 470 = 2950)
- Σxy = Total sum of (x · y) for each row (7000 + 13650 + 14400 + 24700 + 26400 + 42900 + 46500 + 71400 + 74100 + 98700 = 419750)
- Σx² = Total sum of (x · x) for each row (2500 + 4225 + 6400 + 9025 + 12100 + 16900 + 22500 + 28900 + 36100 + 44100 = 182750)
- n = Total number of rows (10 houses)
So:
a = ((2950)(182750) - (1250)(419750)) / (10(182750) - (1250)^2) ≈ 54.43
b = (10(419750) - (1250)(2950)) / (10(182750) - (1250)^2) ≈ 1.92
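Here's that whole calculation as a minimal Python sketch (the variable names are my own choice; the printed values match the ones above):

```python
# Dataset from the previous post: house size in m² (x) and price (y)
xs = [50, 65, 80, 95, 110, 130, 150, 170, 190, 210]
ys = [140, 210, 180, 260, 240, 330, 310, 420, 390, 470]

n = len(xs)                                   # number of rows (10)
sum_x = sum(xs)                               # Σx  = 1250
sum_y = sum(ys)                               # Σy  = 2950
sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy = 419750
sum_x2 = sum(x * x for x in xs)               # Σx² = 182750

denom = n * sum_x2 - sum_x ** 2               # n(Σx²) - (Σx)² = 265000
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

print(a, b)  # ≈ 54.43 and ≈ 1.92
```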
Now we can predict the price of a 250 m² house:
ŷ = bx + a
ŷ = 1.92 · 250 + 54.43 = 534.43
With the unrounded a and b the prediction is ≈ 535.57, so let's call it roughly 535.
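As a quick check in code, plugging in the unrounded fractions from the worked numbers above:

```python
a = 14425000 / 265000  # intercept from the calculation above, ≈ 54.43
b = 510000 / 265000    # slope from the calculation above, ≈ 1.92

print(b * 250 + a)  # ≈ 535.57 (plugging in the rounded 1.92 and 54.43 gives 534.43)
```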
How Good Is Our Line?
We have a model — but is it a good one?
For each data point, we compare the actual value (y) with the predicted value (ŷ). The difference is called a residual:
residual = y - ŷ
Let's calculate a few residuals using our formula:
ŷ = 1.92x + 54.43
- 50 m² house: ŷ = 1.92 · 50 + 54.43 = 150.43, so the residual is 140 - 150.43 = -10.43
- 65 m² house: ŷ = 1.92 · 65 + 54.43 = 179.23, so the residual is 210 - 179.23 = +30.77
- 80 m² house: ŷ = 1.92 · 80 + 54.43 = 208.03, so the residual is 180 - 208.03 = -28.03
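And here's a short sketch that prints every residual at once (same dataset as before, rounded coefficients):

```python
xs = [50, 65, 80, 95, 110, 130, 150, 170, 190, 210]
ys = [140, 210, 180, 260, 240, 330, 310, 420, 390, 470]
a, b = 54.43, 1.92  # rounded coefficients from earlier

for x, y in zip(xs, ys):
    y_hat = b * x + a     # predicted price
    residual = y - y_hat  # actual minus predicted
    print(f"x={x:>3}  y={y:>3}  ŷ={y_hat:7.2f}  residual={residual:+7.2f}")
```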
Some residuals are positive (we under-predicted), some are negative (we over-predicted). If we simply summed them, they would cancel out — so we square them:
MSE = (1/n) Σ(yᵢ - ŷᵢ)²
This is the Mean Squared Error (MSE) — a single number that tells us how wrong the model is, on average.
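In code, MSE is a one-liner on top of the residuals (again with the same dataset, this time using the unrounded coefficients):

```python
xs = [50, 65, 80, 95, 110, 130, 150, 170, 190, 210]
ys = [140, 210, 180, 260, 240, 330, 310, 420, 390, 470]
a, b = 14425 / 265, 510 / 265  # unrounded intercept and slope

# Mean of the squared residuals
mse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(round(mse, 1))  # ≈ 729.9 for our line
```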
A good line keeps the MSE as small as possible.
How do we do that? We already did.
The formulas for a and b are derived by minimizing the MSE.
This means the line we found is the best possible line for this data — there is no better combination of a and b.
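You can convince yourself of this empirically: nudge a or b in either direction and the MSE only goes up. A quick sketch (the nudge sizes are arbitrary):

```python
xs = [50, 65, 80, 95, 110, 130, 150, 170, 190, 210]
ys = [140, 210, 180, 260, 240, 330, 310, 420, 390, 470]

def mse(a: float, b: float) -> float:
    """Mean squared error of the line ŷ = b·x + a on our dataset."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(xs)

best_a, best_b = 14425 / 265, 510 / 265  # the least-squares solution

print(mse(best_a, best_b))        # ≈ 729.9
print(mse(best_a + 5, best_b))    # ≈ 754.9 (worse)
print(mse(best_a, best_b - 0.1))  # ≈ 912.7 (worse too)
```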
In the next post, we'll code our first model!


