"We now have a new kind of programming paradigm. Instead of telling the computer what to do, we show it examples of what we want, and it figures out how to do it."
-- Michael Nielsen
My Journey Back to the Beginning
My first encounter with Artificial Intelligence was during my college days. I had memorised more than I understood. None of what I studied appeared in the exam, so I wrote whatever I could, and I’m quite certain the professor didn’t understand my answers either.
Fast forward 20 years of building software systems. In all that time, I barely touched AI/ML. Sure, I designed applications that integrated with black-box AI/ML systems for OCR, but that was it.
Then ChatGPT happened.
Like many of you, I started with the ChatGPT web interface, learning prompt engineering. Then I began experimenting: building RAG chatbots, exploring chunking strategies, testing different embedding models and retrieval techniques. I experimented with agents and explored MCP and agentic patterns. I was learning these tools and building with them, but something bothered me.
I didn't understand how any of it actually worked.
So I decided to go back. Not to the latest paper or the newest framework, but to the very beginning. To the first artificial neuron.
Why This Matters
You might wonder why bother learning a decades-old concept when we have ChatGPT, Claude, and countless AI tools at our fingertips.
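Here's why: every neuron in GPT-4, in every transformer, in every neural network you've ever used, builds on the same basic computation as that first artificial neuron: a weighted sum of inputs plus a bias, passed through an activation function. The activation functions and training methods have changed since 1958, but the core idea hasn't. The perceptron isn't history; it's the foundation.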
Understanding it is the first step toward understanding what's actually happening when you call an LLM API. It means knowing why things work, not just that they work.
If you've felt this same curiosity and want to truly understand the foundations beneath the tools we use every day, join me as I learn from first principles, one concept at a time.
From Biology to Silicon
In 1943, Warren McCulloch and Walter Pitts created the first mathematical model of a neuron. But it was Frank Rosenblatt in 1958 who built the perceptron, the first artificial neuron that could actually learn.
Rosenblatt's breakthrough came from mimicking nature. He studied how biological neurons work and translated that logic into mathematics. Here's how they compare:
Biological Neuron:
```
Dendrites  →  Cell Body  →  Threshold Check  →  Axon
(receive)     (process)     (fire if met)       (output)
```
Artificial Neuron (Perceptron):
```
Inputs       →  Weighted Sum  →  Threshold Check  →  Output
x₁, x₂, ...     Σ(xᵢ × wᵢ)       (> threshold?)      0 or 1
```
The key insight: Learning happens by adjusting the weights.
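The diagram describes the decision as a threshold check, while the code in the next section folds the threshold into a bias term. They are the same rule: moving the threshold to the left-hand side gives a bias equal to its negative.

$$
\sum_i w_i x_i > \text{threshold}
\quad\Longleftrightarrow\quad
\sum_i w_i x_i + b > 0,
\qquad \text{where } b = -\text{threshold}
$$

Whether a sum exactly equal to the threshold counts as firing is a convention; the code below treats it as 0.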
How a Perceptron Works
Let's break it down to basics.
A perceptron takes inputs, multiplies each by a weight, adds them up, and makes a decision.
```python
def perceptron_forward(inputs, weights, bias):
    # Multiply each input by its weight
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Add bias (shifts the decision boundary)
    weighted_sum += bias
    # Activation: output 1 if positive, 0 otherwise
    return 1 if weighted_sum > 0 else 0
```
That's it. That's the core of a perceptron.
What's happening:
- Each input has a weight (how important is this input?)
- We sum up: (input₁ × weight₁) + (input₂ × weight₂) + ... + bias
- If the sum is positive, output 1. Otherwise, output 0.
Example: AND gate
Let's say we want to implement the AND logic gate:
- Input: [0, 0] → Output: 0
- Input: [0, 1] → Output: 0
- Input: [1, 0] → Output: 0
- Input: [1, 1] → Output: 1
Traditional way (if/else):
```python
def and_gate_traditional(input1, input2):
    if input1 == 1 and input2 == 1:
        return 1
    else:
        return 0
```
Perceptron way (learned weights):
With the right weights ([0.5, 0.5] and bias -0.7), the perceptron can solve this:
- [0, 0]: 0×0.5 + 0×0.5 - 0.7 = -0.7 → Output: 0 ✓
- [0, 1]: 0×0.5 + 1×0.5 - 0.7 = -0.2 → Output: 0 ✓
- [1, 0]: 1×0.5 + 0×0.5 - 0.7 = -0.2 → Output: 0 ✓
- [1, 1]: 1×0.5 + 1×0.5 - 0.7 = 0.3 → Output: 1 ✓
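To double-check the hand calculation, here's a small script that runs the same forward pass over the whole truth table. It re-declares `perceptron_forward` so it runs on its own; `AND_TABLE` is just a name I chose for the truth table.

```python
def perceptron_forward(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, then a step activation
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

# AND truth table as (inputs, expected output) pairs
AND_TABLE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = [0.5, 0.5], -0.7
for inputs, expected in AND_TABLE:
    output = perceptron_forward(inputs, weights, bias)
    print(inputs, "->", output, "OK" if output == expected else "MISMATCH")
```

All four rows should print `OK`, matching the arithmetic above.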
The difference? The traditional way is hardcoded. Here I picked the perceptron's weights by hand, but the point is that a perceptron can learn weights like these from examples. That's the new programming paradigm Nielsen talked about.
What Clicked for Me
After implementing and testing the perceptron, here's what became clear:
Weights are just numbers. There's no magic. For inputs on the same scale, a weight of 0.5 means "this input counts half as much toward the sum as an input with weight 1.0."
The bias shifts the boundary. Without bias, the decision boundary always goes through the origin. Bias lets it move anywhere.
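A quick way to see why the bias matters: with the bias fixed at 0, the input [0, 0] always produces a weighted sum of 0, so the step activation outputs 0 no matter what the weights are. That rules out the NAND gate, which must output 1 for [0, 0]. The sketch below checks a few arbitrary weight pairs, then shows that adding a bias fixes the problem. The NAND weights [-1, -1] and bias 1.5 are values I picked by hand for illustration.

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

NAND_TABLE = [([0, 0], 1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# Without a bias, [0, 0] sums to 0 for any weights, so the output is always 0.
for weights in ([1.0, 1.0], [-3.0, 2.0], [100.0, -100.0]):
    print(weights, "->", perceptron_forward([0, 0], weights, bias=0))  # prints 0 each time

# With a bias, the boundary can move away from the origin and NAND becomes solvable.
nand_weights, nand_bias = [-1.0, -1.0], 1.5
print(all(perceptron_forward(x, nand_weights, nand_bias) == y for x, y in NAND_TABLE))  # True
```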
Learning is adjustment. When the perceptron makes a mistake, we adjust the weights. That's learning.
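Here's a minimal sketch of that adjustment using the classic perceptron learning rule: after each example, compute `error = target - output`, then nudge every weight by `learning_rate * error * input` and the bias by `learning_rate * error`. Training stops once a full pass over the data (an epoch) produces no mistakes. The function name `train_perceptron`, the learning rate of 0.1, the all-zero starting weights, and the epoch cap are my own illustrative choices, not necessarily what the repository uses.

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

def train_perceptron(examples, learning_rate=0.1, max_epochs=100):
    """Perceptron learning rule. Returns (weights, bias, converged)."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - perceptron_forward(inputs, weights, bias)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                # Move the weights and bias toward the correct answer for this example
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # a clean epoch: every example is classified correctly
            return weights, bias, True
    return weights, bias, False

AND_TABLE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, converged = train_perceptron(AND_TABLE)
print("converged:", converged, "weights:", weights, "bias:", bias)
```

The learned values won't match the hand-picked [0.5, 0.5] and -0.7. That's fine: any weights that put the boundary between [1, 1] and the other three inputs solve AND equally well.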
It's a linear classifier. The perceptron draws a straight line (or hyperplane) to separate classes. This is both its power and its limitation.
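To make the "straight line" concrete: for the AND weights from earlier ([0.5, 0.5] and bias -0.7), the boundary is where the weighted sum equals zero:

$$
0.5\,x_1 + 0.5\,x_2 - 0.7 = 0
\quad\Longleftrightarrow\quad
x_2 = 1.4 - x_1
$$

Inputs on the side where x₁ + x₂ > 1.4 produce 1. Of the four AND inputs, only [1, 1] (with x₁ + x₂ = 2) lies on that side; the others sum to 0 or 1, so they produce 0.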
Explore the Code
I've implemented a complete perceptron from scratch with visualizations:
[Screenshot: sample visualization from the perceptron playground]
GitHub Repository: perceptrons-to-transformers
What you'll find:
- 01-perceptron/perceptron.py - Full implementation with learning algorithm
- 01-perceptron/perceptron_playground.py - Streamlit app to play with it
What's Next
The perceptron can learn AND, OR, and NAND gates perfectly. But it has a fundamental limitation.
No matter how you adjust the weights, there's one simple logic gate it cannot learn. This limitation exposed a critical weakness in single-layer networks.
In the next post, we'll explore this limitation and see why it led to the invention of multilayer networks.
Spoiler: the gate is XOR, and solving it ultimately paved the way for modern deep learning.
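If you want to see the limitation for yourself before then, here's a minimal, self-contained sketch: train a perceptron with the classic learning rule on each gate's truth table and check whether it ever completes an epoch without mistakes. The helper names, the 100-epoch cap and the learning rate are my own illustrative choices. With all-zero starting weights the learning rate only rescales the weights and bias, so I use 1.0 to keep the arithmetic in whole numbers. (Hitting the epoch cap doesn't prove anything by itself; why XOR can never be learned is the subject of the next post.)

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

def train_perceptron(examples, learning_rate=1.0, max_epochs=100):
    """Classic perceptron learning rule; returns True if an epoch ends with no mistakes."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - perceptron_forward(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            return True
    return False

INPUTS = [[0, 0], [0, 1], [1, 0], [1, 1]]
GATES = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "NAND": [1, 1, 1, 0],
    "XOR": [0, 1, 1, 0],
}

results = {}
for name, targets in GATES.items():
    results[name] = train_perceptron(list(zip(INPUTS, targets)))
    print(f"{name}: {'learned' if results[name] else 'not learned after 100 epochs'}")
```

AND, OR and NAND are learned within a few epochs; XOR never gets a clean pass.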
References
- Nielsen, M. (2015). Neural Networks and Deep Learning. Determination Press. Available at: http://neuralnetworksanddeeplearning.com/
Tags: #MachineLearning #AI #DeepLearning #Perceptron #NeuralNetworks
Series: From Perceptron to Transformers
Code: GitHub Repository
