Aanand

Posted on May 22

Why y=mx+b is the heart of AI

#ai #machinelearning #mattermost

We hear words like:

Artificial Intelligence
Machine Learning
Neural Networks

and then imagine something mysterious.

Something extremely advanced.

But surprisingly, one of the core ideas behind neural networks starts from a concept many of us studied in school.

y = mx + b

This is called the slope-intercept form of a line.

y = mx + b

x Input — the number we feed in
y Output — the result we get out
m Slope — how much y changes per unit of x
b Y-intercept — the output when the input x is zero

Example: take y = 2x + 1 and plug in different inputs:
x = 1 → y = 3
x = 2 → y = 5
x = 3 → y = 7
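The same example can be written as a tiny Python function. This is only a sketch: the function name `line` is made up for illustration, and the defaults m = 2 and b = 1 come from the example above.

```python
def line(x, m=2, b=1):
    """Slope-intercept form: multiply the input by the slope, then add the intercept."""
    return m * x + b


for x in (1, 2, 3):
    print(f"x = {x} -> y = {line(x)}")
# x = 1 -> y = 3
# x = 2 -> y = 5
# x = 3 -> y = 7
```

Calling `line(0)` returns 1, which is the y-intercept b.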

Neural networks use the same idea

Inside a neural network, a single neuron does this:

y = wx + b

w Weight — behaves exactly like slope (m)
b Bias — behaves exactly like the y-intercept

The only difference: the variable names changed. The math is the same.

What does weight actually do?

Suppose an AI is predicting a student's exam score. It has three inputs:

Study hours
Hours of sleep
Hours of phone usage

Do all three affect the score equally? No.

Study hours matter the most
Phone usage probably hurts
Sleep helps a little.

So the formula for this prediction might look like:

score = 5(study) + 2(sleep) − 4(phone) + 10

Interpretation:

Study hours have the highest positive weight (5) — they help the most
Sleep has a smaller positive weight (2) — it helps a little
Phone usage has a negative weight (−4) — it actively hurts the score
10 is the bias — the model's default starting score before inputs are counted
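Here is the formula above as runnable Python. The function name `predict_score` and the sample student (6 hours of study, 7 of sleep, 2 on the phone) are invented for illustration; only the weights and the bias come from the formula.

```python
def predict_score(study, sleep, phone):
    """Exam-score formula: each input is multiplied by its weight, then the bias is added."""
    return 5 * study + 2 * sleep - 4 * phone + 10


# Hypothetical student: 6 h study, 7 h sleep, 2 h phone
print(predict_score(6, 7, 2))  # 30 + 14 - 8 + 10 = 46

# With every input at zero, only the bias is left
print(predict_score(0, 0, 0))  # 10
```

One extra hour on the phone lowers the prediction by exactly 4 points, which is what a weight of −4 means.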

But why say "weight" instead of "slope"?

Because neural networks have many inputs at once, and each input needs its own slope. When there are many of them, we call them weights.

y = w₁x₁ + w₂x₂ + w₃x₃ + b

Each input (x₁, x₂, x₃) has its own weight (w₁, w₂, w₃).
The name "weight" just means: the slope assigned to this particular input.
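For any number of inputs, the same computation can be sketched with two lists. The helper name `weighted_sum` is an assumption for illustration, not something from a library.

```python
def weighted_sum(weights, inputs, bias):
    """Compute w1*x1 + w2*x2 + ... + wn*xn + b."""
    if len(weights) != len(inputs):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Exam-score weights from earlier: study, sleep, phone
print(weighted_sum([5, 2, -4], [6, 7, 2], bias=10))  # 46

# With a single input this is just y = wx + b
print(weighted_sum([2], [3], bias=1))  # 7
```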

What is model "training"?

At the start, a model does not know the right weights. They are set randomly. So predictions are terrible:

random start — predictions are bad
y = 0.13x + 7.8

Training is the process of adjusting the weights and bias, little by little, until predictions get better:

getting closer
y = 2.4x + 1.1

much better
y = 3.0x + 0.5

Notice: both w and b are changing. Training adjusts both the weights and the bias to reduce prediction error. That is all training is.
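To make this concrete, here is a minimal sketch of one common way to do the adjusting: gradient descent on mean squared error. The post does not say which method a real model uses, so treat the method, the toy data (generated from an assumed "true" line y = 3x + 0.5), the learning rate and the step count as assumptions. The starting values 0.13 and 7.8 are the random start shown above.

```python
# Toy data from an assumed "true" line y = 3x + 0.5 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 0.5 for x in xs]


def mean_squared_error(w, b):
    """Average squared gap between the predictions w*x + b and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(w, b, learning_rate=0.05, steps=2000):
    """Nudge w and b a little on every step, in the direction that lowers the error."""
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


print(mean_squared_error(0.13, 7.8))  # error at the random start
w, b = train(w=0.13, b=7.8)
print(f"y = {w:.2f}x + {b:.2f}")      # y = 3.00x + 0.50
print(mean_squared_error(w, b))       # error after training, close to zero
```

Both numbers move: the weight climbs from 0.13 toward 3.0 while the bias falls from 7.8 toward 0.5.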

Putting it all together: one neuron

A single neuron in a neural network does four things in order:

Multiply each input by its weight
Add all those results together
Add the bias
Pass the result through an activation function

z = w₁x₁ + w₂x₂ + b

output = f(z)
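The four steps can be sketched in a few lines of Python. The post does not name a specific f, so the sigmoid used here is just one well-known choice picked for illustration, and the function names and sample numbers are hypothetical.

```python
import math


def sigmoid(z):
    """One common activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    # Steps 1-3: multiply each input by its weight, add the results, add the bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 4: pass the result through the activation function
    return activation(z)


# z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
print(neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0))  # 0.5
```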

What is f(z) — the activation function?
After a neuron computes z = w₁x₁ + w₂x₂ + b, it passes z through a function called the activation function before outputting anything.

To understand why this exists, we need to understand one hard problem first.

The linearity problem
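Every neuron computes a weighted sum, which is always linear: a straight line for one input, a flat plane for several. Stack many neurons together without an activation function, and the result is still linear, because a line fed into another line is just a new line.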
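A quick way to see this is to chain two neurons with no activation function and compare the result with a single line. The weights and biases below are arbitrary numbers chosen for illustration.

```python
def linear_neuron(x, w, b):
    """A neuron with no activation function: just w*x + b."""
    return w * x + b


def two_layer(x):
    hidden = linear_neuron(x, w=2.0, b=1.0)       # first neuron:  2x + 1
    return linear_neuron(hidden, w=3.0, b=-4.0)   # second neuron: 3 * hidden - 4


def collapsed(x):
    # 3 * (2x + 1) - 4 = 6x - 1, a single straight line
    return 6.0 * x - 1.0


for x in (-2.0, 0.0, 1.5, 10.0):
    print(x, two_layer(x), collapsed(x))  # the last two columns always match
```

The two-neuron stack behaves like one neuron with weight 6 and bias −1, so the extra layer added nothing.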

The real world is not made of straight lines:

recognizing faces
understanding language
diagnosing disease

None of these are linear.

The activation function breaks the linearity. It bends or squashes the output in a nonlinear way.

This is why activation functions exist. Without them, a deep network would have no advantage over a single straight line.
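As a small illustration of the bending, here is a sketch that uses ReLU, one popular activation function. ReLU is an assumption here, since the post does not name a specific one. Two neurons with ReLU, added together, produce a V shape that no single straight line can match.

```python
def relu(z):
    """ReLU activation: keep positive values, turn negative values into zero."""
    return max(0.0, z)


def v_shape(x):
    # Two hidden neurons (weights +1 and -1, bias 0), each followed by ReLU,
    # then an output neuron that simply adds them (weights 1 and 1, bias 0).
    return relu(1.0 * x) + relu(-1.0 * x)


print([v_shape(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])
# [2.0, 1.0, 0.0, 1.0, 2.0]
```

The output goes down and then back up, which is the absolute value of x. A straight line can only go one way, so this small network already does something a purely linear stack cannot.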

DEV Community