We hear words like:
- Artificial Intelligence
- Machine Learning
- Neural Networks
and we imagine something mysterious, something extremely advanced.
But surprisingly, one of the core ideas behind neural networks starts from a concept many of us studied in school.
y = mx + b
This is called the slope-intercept form of a line.
- x: the input, the number we feed in
- y: the output, the result we get out
- m: the slope, how much y changes per unit of x
- b: the y-intercept, the output when the input x is zero
Example: take y = 2x + 1 and plug in different inputs:
x = 1 → y = 3
x = 2 → y = 5
x = 3 → y = 7
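The same arithmetic can be checked in a few lines of Python. This is only a small sketch; the function name `line` is made up for illustration.

```python
def line(x, m=2, b=1):
    """Slope-intercept form: y = m*x + b."""
    return m * x + b

# Plug in the same inputs as the example above.
for x in [1, 2, 3]:
    print(f"x = {x} -> y = {line(x)}")
```

Changing `m` or `b` changes every output, which is the same knob-turning a neural network does with its weights and bias.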
Neural networks use the same idea
Inside a neural network, a single neuron does this:
y = wx + b
- w: the weight, which behaves exactly like the slope (m)
- b: the bias, which behaves exactly like the y-intercept
The only difference: the variable names changed. The math is the same.
What does weight actually do?
Suppose an AI is predicting a student's exam score. It has three inputs:
- Study hours
- Hours of sleep
- Hours of phone usage
Do all three affect the score equally? No.
- Study hours matter the most
- Phone usage probably hurts
- Sleep helps a little
So the formula for this prediction might look like:
score = 5(study) + 2(sleep) − 4(phone) + 10
Interpretation:
- Study hours have the highest positive weight (5) — they help the most
- Sleep has a smaller positive weight (2) — it helps a little
- Phone usage has a negative weight (−4) — it actively hurts the score
- 10 is the bias — the model's default starting score before inputs are counted
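To see the weights in action, here is a rough sketch of the score formula as a Python function. The function name `predict_score` and the example student (6 hours of study, 7 of sleep, 3 on the phone) are invented for illustration and are not real data.

```python
def predict_score(study, sleep, phone):
    """score = 5(study) + 2(sleep) - 4(phone) + 10, as in the formula above."""
    return 5 * study + 2 * sleep - 4 * phone + 10

# Hypothetical student: 6 h study, 7 h sleep, 3 h phone.
print(predict_score(study=6, sleep=7, phone=3))  # 30 + 14 - 12 + 10 = 42
```

One extra hour of study adds 5 points, while one extra hour on the phone removes 4. That is exactly what the weights say.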
But why say "weight" instead of "slope"?
Because neural networks have many inputs at once, and each input needs its own slope. When there are many of them, we call them weights.
y = w₁x₁ + w₂x₂ + w₃x₃ + b
Each input (x₁, x₂, x₃) has its own weight (w₁, w₂, w₃).
The name "weight" just means: the slope assigned to this particular input.
What happens during "training"?
At the start, a model does not know the right weights. They are set randomly, so the first predictions are terrible:
random start — predictions are bad
y = 0.13x + 7.8
Training is the process of adjusting the weights and bias, little by little, until predictions get better:
getting closer
y = 2.4x + 1.1
much better
y = 3.0x + 0.5
Notice: both w and b are changing. Training adjusts both the weights and the bias to reduce prediction error. That is all training is.
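The article does not say how the adjusting is done. One common method is gradient descent, so the sketch below assumes it. It also assumes made-up training data generated from y = 3.0x + 0.5 and starts from the random-looking values used above. The function name `train`, the learning rate, and the step count are all illustrative choices.

```python
# Hypothetical training data, generated from the line y = 3.0x + 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 0.5 for x in xs]

def train(xs, ys, w, b, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    n = len(xs)
    for _ in range(steps):
        # How far off is each prediction right now?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge both values a little in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Start from the "random" guess y = 0.13x + 7.8.
w, b = train(xs, ys, w=0.13, b=7.8)
print(f"y = {w:.2f}x + {b:.2f}")
```

Each pass through the loop makes a small correction to both `w` and `b`, which is the "little by little" part of training.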
Putting it all together: one neuron
A single neuron in a neural network does four things in order:
- Multiply each input by its weight
- Add all those results together
- Add the bias
- Pass the result through an activation function
z = w₁x₁ + w₂x₂ + b
output = f(z)
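The four steps can be sketched as a short Python function. The article does not name a specific activation function, so the sigmoid used here is an assumption chosen only for illustration. The names `neuron` and `sigmoid` and the sample numbers are also made up.

```python
import math

def sigmoid(z):
    """One possible activation function (an assumption, not from the article)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    # Steps 1 and 2: multiply each input by its weight and add the results.
    # Step 3: add the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 4: pass the result through the activation function.
    return activation(z)

# z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```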
What is f(z) — the activation function?
After a neuron computes z = w₁x₁ + w₂x₂ + b, it passes z through a function called the activation function before outputting anything.
To understand why this exists, we need to understand one hard problem first.
The linearity problem
Every neuron computes a weighted sum, which is always linear: a straight line for one input, a flat plane for several. Stack many neurons together without an activation function, and the result is still linear.
*The real world is not made of straight lines:*
- recognizing faces
- understanding language
- diagnosing disease
None of these are linear.
The activation function breaks the linearity. It bends or squashes the output in a nonlinear way.
This is why activation functions exist. Without them, deep networks would have no advantage over a single straight line.
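The linearity argument can be checked with a tiny experiment. In the sketch below, two neurons are stacked with made-up weights, first without an activation function and then with one in between. ReLU is used as the activation only as an assumption for illustration; the article does not name a specific function.

```python
def linear_neuron(x, w, b):
    return w * x + b

def relu(z):
    """An example activation function (assumed here): negatives become 0."""
    return max(0.0, z)

def stacked_linear(x):
    # Two neurons in a row, no activation function between them.
    hidden = linear_neuron(x, w=2.0, b=1.0)
    return linear_neuron(hidden, w=3.0, b=-4.0)

def collapsed(x):
    # 3(2x + 1) - 4 = 6x - 1: the stack is just one straight line.
    return linear_neuron(x, w=6.0, b=-1.0)

def stacked_with_relu(x):
    # Same two neurons, but with an activation function in between.
    hidden = relu(linear_neuron(x, w=2.0, b=1.0))
    return linear_neuron(hidden, w=3.0, b=-4.0)

for x in [-2.0, 0.0, 2.0]:
    print(x, stacked_linear(x), collapsed(x), stacked_with_relu(x))
```

For every input, `stacked_linear` and `collapsed` agree, so the second neuron added nothing new. With ReLU in between, the output at x = -2 no longer matches the straight line: the function has gained a bend.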