Mahraib Fatima

perceptron - day 01 of dl

when you start learning neural networks, the perceptron is the first thing you meet. it's simple and shows how learning from data points works.

how it works

a perceptron draws a straight line to separate two types of data. it calculates:

output = w1*x1 + w2*x2 + ... + b

if the output is positive, it says "class a".
if it's negative, "class b".
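
as a rough sketch in python (the function name and the numbers are just made up for illustration, with +1 standing for class a and -1 for class b), the prediction step looks like this:

```python
import numpy as np

def predict(weights, bias, x):
    # weighted sum: w1*x1 + w2*x2 + ... + b
    output = np.dot(weights, x) + bias
    # positive -> "class a" (+1), otherwise "class b" (-1)
    return 1 if output > 0 else -1

weights = np.array([0.5, -0.2])
bias = 0.1
print(predict(weights, bias, np.array([1.0, 2.0])))  # prints 1
```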

to learn, it uses this trick:

  1. start with random weights.
  2. check one point.
  3. if wrong, adjust weights toward that point.
  4. repeat until all points are right.

the update looks like this:
new weight = old weight + learning rate * (true label - predicted label) * input

simple idea: if you're wrong, move the line toward the mistake.
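
here's that loop as a minimal python sketch (my own illustration, assuming labels are +1 / -1 and using invented toy data):

```python
import numpy as np

def train_perceptron(X, y, learning_rate=0.1, epochs=100, seed=0):
    # 1. start with random weights (and a zero bias)
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        # 2. check one point at a time
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else -1
            # 3. if wrong, adjust weights toward that point
            if pred != yi:
                w += learning_rate * (yi - pred) * xi
                b += learning_rate * (yi - pred)
                mistakes += 1
        # 4. repeat until all points are right
        if mistakes == 0:
            break
    return w, b

# toy linearly separable data, labels are +1 / -1
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
```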

the problem

the perceptron stops as soon as all training points are correct. but there are often many possible lines that all work perfectly.

imagine separating two groups of points. you could draw the line close to one group, close to the other, or in the middle. all would be 100% correct on your training data.

the perceptron picks whichever of those lines it finds first. which one you get depends on:

  • random starting weights.
  • the order of points.
  • luck.

it has one big flaw: it finds any solution that works, not the best one.
train it twice and you can get two different lines. both work on your training data, but one might be much better than the other.
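
you can see this with the train_perceptron sketch from above (again, purely illustrative): train on the same points from two different random starts and compare the lines.

```python
# same data, two different random starting weights
w1, b1 = train_perceptron(X, y, seed=0)
w2, b2 = train_perceptron(X, y, seed=1)
print(w1, b1)  # one separating line
print(w2, b2)  # usually a different line, yet both are 100% correct on X
```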

why this matters

a line that just barely separates the data is fragile. real data has noise. new points won't be exactly like your training points. a tight boundary will make mistakes easily.

what we want is the line in the middle of the gap, farthest from both groups. this is more robust and handles new data better.

how loss functions help

loss functions change the question. instead of "is this wrong?" they ask "how wrong is this?" or "how confidently right is this?"

look at hinge loss:
loss = max(0, 1 - true label * prediction)

even if a point is classified correctly, there's still loss if the prediction isn't confident enough. this pushes the line away from the points, creating a safety margin.
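
a tiny illustration (not from the post, just plugging numbers into the formula, where the prediction is the raw score w*x + b):

```python
def hinge_loss(true_label, prediction):
    # true_label is +1 or -1, prediction is the raw score w*x + b
    return max(0.0, 1.0 - true_label * prediction)

print(hinge_loss(+1, 2.5))   # 0.0 -> confidently right, no loss
print(hinge_loss(+1, 0.3))   # 0.7 -> right, but not confident enough
print(hinge_loss(+1, -1.0))  # 2.0 -> wrong, big loss
```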

gradient descent: better learning

with loss functions, we don't update based on single points. we look at all the data and find the average error. then we adjust the weights to reduce that error most effectively.

this is gradient descent:
new weight = old weight - learning rate * slope of loss

the minus sign is key: we go downhill toward lower error.
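
as a sketch (my own illustration, using the average hinge loss over the same toy data as before; the function name is hypothetical), one gradient descent step could look like this:

```python
import numpy as np

def hinge_gradient_step(w, b, X, y, learning_rate=0.1):
    # scores and margins for every point
    scores = X @ w + b
    margins = 1.0 - y * scores
    active = margins > 0            # only these points contribute to the loss
    n = len(y)
    # gradient of the average hinge loss: (1/n) * sum of -y_i * x_i over active points
    grad_w = -(y[active][:, None] * X[active]).sum(axis=0) / n
    grad_b = -y[active].sum() / n
    # the minus sign moves us downhill, toward lower average loss
    w_new = w - learning_rate * grad_w
    b_new = b - learning_rate * grad_b
    return w_new, b_new

# the same toy data as before
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.zeros(2), 0.0
for _ in range(200):
    w, b = hinge_gradient_step(w, b, X, y)
```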

the takeaway

the perceptron shows the basics of learning. but it sees the world as binary: right or wrong.

real problems need more nuance. loss functions provide that. they let us:

  • work with data that can't be perfectly separated.
  • measure degrees of wrongness.
  • build robust classifiers.
  • handle multiple classes.

that's why modern neural networks use loss functions with gradient descent. it turns a simple rule follower into a true learner that handles real-world complexity.
