CNNs are not just convolution stacks. This guide explains how activation, pooling, and fully connected layers work together to transform feature maps into predictions.
Cross-posted from Zeromath. Original article: https://zeromathai.com/en/cnn-layer-composition-en/
CNN Layer Composition (Think Like an Engineer)
A CNN is not magic.
It’s a pipeline:
input → feature extraction → filtering → compression → classification
1. Convolution Alone = Not Enough
Convolution is linear.
Stack linear layers:
→ still linear (the whole stack collapses into one linear map)
So:
- no complex decision boundary
- no deep feature learning
A nonlinear activation between layers is mandatory.
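A minimal sketch of the point above, using 1D convolutions in plain Python. The kernels and inputs are made-up illustration values, not from any real model. A linear function satisfies f(x + y) = f(x) + f(y); the stack without an activation passes this check, the stack with ReLU in the middle does not.

```python
def conv1d(x, kernel):
    """Valid 1D cross-correlation without bias."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(values):
    return [max(0, v) for v in values]

def linear_stack(x):
    # Two conv layers with nothing in between.
    return conv1d(conv1d(x, [1, -2, 1]), [2, -1])

def relu_stack(x):
    # Same two conv layers, with ReLU between them.
    return conv1d(relu(conv1d(x, [1, -2, 1])), [2, -1])

def is_additive(f, x, y):
    # Checks f(x + y) == f(x) + f(y) for one pair of inputs.
    summed = [a + b for a, b in zip(x, y)]
    return f(summed) == [a + b for a, b in zip(f(x), f(y))]

x = [1, 4, 2, 7, 3, 5]
y = [6, 0, 5, 1, 8, 2]
print(is_additive(linear_stack, x, y))  # True
print(is_additive(relu_stack, x, y))    # False
```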
2. ReLU — The Switch That Enables Depth
ReLU:
f(x) = max(0, x)
Example:
[-3, -1, 0.5, 2] → [0, 0, 0.5, 2]
Why it matters:
- introduces nonlinearity
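- mitigates vanishing gradients (the gradient is exactly 1 for positive inputs)
- zeroes out negative responses, so only positive activations pass through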
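The example above can be reproduced in a few lines of plain Python. The gradient helper is an illustrative addition; it uses the common convention of a zero gradient at x = 0.

```python
def relu(values):
    # f(x) = max(0, x), applied element-wise.
    return [max(0, v) for v in values]

def relu_grad(values):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return [1 if v > 0 else 0 for v in values]

activations = relu([-3, -1, 0.5, 2])
gradients = relu_grad([-3, -1, 0.5, 2])
print(activations)  # [0, 0, 0.5, 2]
print(gradients)    # [0, 0, 1, 1]
```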
3. Shape Flow (Real Example)
Input:
(224, 224, 3)
Conv (64 filters, stride 1, same padding):
(224, 224, 64)
ReLU:
(224, 224, 64)
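Pooling (2×2, stride 2):
(112, 112, 64)
Key rules:
- pooling shrinks spatial size (here by half per side)
- pooling leaves the channel count unchanged; only conv changes it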
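The shapes above follow from the standard output-size formula, out = floor((n + 2p - k) / s) + 1. Here is a small sketch that traces them. The 3×3 kernel with padding 1 is an assumption (the article only gives the resulting shapes), and the helper names are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0):
    # Standard conv/pool output size along one spatial dimension.
    return (n + 2 * padding - kernel) // stride + 1

def conv_shape(shape, filters, kernel, stride=1, padding=0):
    h, w, _ = shape  # input channels do not affect the output shape
    return (out_size(h, kernel, stride, padding),
            out_size(w, kernel, stride, padding),
            filters)

def pool_shape(shape, kernel=2, stride=2):
    h, w, c = shape  # pooling keeps the channel count
    return (out_size(h, kernel, stride), out_size(w, kernel, stride), c)

after_conv = conv_shape((224, 224, 3), filters=64, kernel=3, padding=1)
after_pool = pool_shape(after_conv)
print(after_conv)  # (224, 224, 64)
print(after_pool)  # (112, 112, 64)
```

ReLU is element-wise, so it never changes the shape.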
4. Why Channels Increase
As depth increases:
- spatial size ↓
- channel count ↑
Why?
→ deeper layers combine simple features into many more distinct patterns, so the model needs more channels to represent them
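A quick arithmetic sketch of this trade-off. The stage sizes below are a hypothetical VGG-like schedule chosen for illustration (halve each side, double the channels), not values taken from the article.

```python
# Hypothetical (side, channels) per stage: side halves, channels double.
stages = [(224, 64), (112, 128), (56, 256), (28, 512)]

def activation_count(side, channels):
    # Number of values in one feature map of shape (side, side, channels).
    return side * side * channels

counts = [activation_count(side, ch) for side, ch in stages]
for (side, ch), n in zip(stages, counts):
    print(f"({side}, {side}, {ch}): {n:,} activations")
```

Under this schedule the total number of activations halves at every stage even though the channel count doubles, which is one reason extra channels are affordable at depth.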
5. Pooling vs Stride
Pooling:
- fixed
- no parameters
Strided Conv:
- learnable
- more flexible
Many modern architectures (e.g. ResNet) do much of their downsampling with strided convolutions, though pooling is still widely used.
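To make the comparison concrete, here is a sketch that computes output size and parameter count for both options. The 3×3 kernel, padding 1, and 64 input/output channels are assumptions picked for illustration.

```python
def out_size(n, kernel, stride, padding=0):
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels filter per output channel, plus biases.
    return kernel * kernel * in_channels * out_channels + out_channels

# Both options halve a 224-wide feature map.
pool_out = out_size(224, kernel=2, stride=2)
conv_out = out_size(224, kernel=3, stride=2, padding=1)

pool_param_count = 0  # max pooling has nothing to learn
strided_param_count = conv_params(3, 64, 64)

print(pool_out, conv_out)                     # 112 112
print(pool_param_count, strided_param_count)  # 0 36928
```

The strided conv pays for its flexibility with extra weights; pooling downsamples for free but cannot adapt to the data.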
6. Max Pooling = Feature Selection
2×2 max pooling:
Input:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Output:
6 8
3 4
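Effect:
- strongest activation in each window survives
- weaker activations are discarded, which suppresses small responses but also drops exact position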
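The worked example above, as a minimal plain-Python sketch of 2×2 max pooling with stride 2. It assumes even height and width.

```python
def max_pool_2x2(grid):
    # Take the maximum of each non-overlapping 2x2 window.
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[r][c], grid[r][c + 1],
                 grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]

grid = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2x2(grid)
print(pooled)  # [[6, 8], [3, 4]]
```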
7. Receptive Field
Deeper layers:
- see more context
- capture higher-level features
Flow:
edges → textures → shapes → objects
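How much context a unit "sees" can be computed with the usual receptive field recurrence: each layer adds (kernel - 1) × jump, where jump is the product of all earlier strides. The layer list below (3×3 conv followed by 2×2 pool, repeated three times) is a hypothetical stack for illustration.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Returns the RF size after each layer."""
    rf, jump = 1, 1
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth is scaled by accumulated stride
        jump *= stride
        sizes.append(rf)
    return sizes

# Hypothetical stack: (3x3 conv, stride 1) then (2x2 pool, stride 2), three times.
layers = [(3, 1), (2, 2)] * 3
sizes = receptive_field(layers)
print(sizes)  # [3, 4, 8, 10, 18, 22]
```

Each pooling step doubles the jump, so later 3×3 convs widen the receptive field much faster than early ones.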
8. Flatten + Dense
Before classification:
(7, 7, 512) → (25088)
Then:
Dense → Softmax → prediction
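A sketch of the numbers behind this step. The 1000-class output and the example logits are assumptions for illustration; the article does not specify a class count.

```python
import math

def flatten_size(shape):
    # Number of values once (H, W, C) is unrolled into one vector.
    return math.prod(shape)

def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output.
    return n_in * n_out + n_out

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

features = flatten_size((7, 7, 512))
head_params = dense_params(features, 1000)  # assumed 1000 classes
print(features)     # 25088
print(head_params)  # 25089000

probs = softmax([2.0, 1.0, 0.1])  # made-up logits
predicted = probs.index(max(probs))
print(predicted)    # 0
```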
9. Modern Trick: Global Average Pooling
Instead of big dense layers:
- average each channel down to one value: (7, 7, 512) → (512)
- far fewer parameters in the classifier head
- often less overfitting
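A minimal sketch of global average pooling on a tiny (2, 2, 2) feature map, followed by a parameter comparison of the two classifier heads. The feature map values and the 1000-class head are made up for illustration.

```python
def global_average_pool(fmap):
    """fmap: nested list with shape (H, W, C). Returns one mean per channel."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

fmap = [
    [[1, 10], [3, 20]],
    [[5, 30], [7, 40]],
]
pooled = global_average_pool(fmap)
print(pooled)  # [4.0, 25.0]

# Classifier head size for a (7, 7, 512) feature map, assuming 1000 classes.
flatten_head = 7 * 7 * 512 * 1000 + 1000
gap_head = 512 * 1000 + 1000
print(flatten_head, gap_head)  # 25089000 513000
```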
10. Full Pipeline
- Conv → detect
- ReLU → filter
- Pool → compress
- Repeat → hierarchy
- Dense → predict
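The steps above can be sketched end to end in plain Python on a single-channel 6×6 image. Everything here is a toy assumption: the image, the hand-picked vertical-edge kernel, and the untrained dense weights exist only to show data flowing through the pipeline.

```python
import math

def conv2d(img, kernel):
    # Valid 2D cross-correlation, single channel, no bias.
    k = len(kernel)
    size = len(img) - k + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(size)]
            for r in range(size)]

def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap):
    # 2x2 windows, stride 2.
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def dense(vec, weights, bias):
    return [sum(v * w for v, w in zip(vec, row)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return [e / sum(exps) for e in exps]

image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]   # dark left, bright right
kernel = [[-1, 0, 1] for _ in range(3)]          # responds to vertical edges

features = conv2d(image, kernel)                 # detect  -> 4x4
features = relu(features)                        # filter
pooled = max_pool(features)                      # compress -> 2x2
flat = [v for row in pooled for v in row]        # flatten -> 4 values
weights = [[0.5] * 4, [-0.5] * 4]                # hand-picked, not trained
probs = softmax(dense(flat, weights, [0.0, 0.0]))  # predict

print(pooled)                    # [[3, 3], [3, 3]]
print(probs.index(max(probs)))   # 0
```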
Debug Mindset
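If the model fails:
- bad features → check the conv layers (filter count, kernel size, depth)
- weak signal → check activations (e.g. many ReLU units stuck at zero)
- too slow → feature maps too large; downsample earlier
- wrong output → check the classifier head (class count, softmax, labels)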
Key Takeaways
- CNN = structured system
- ReLU adds the nonlinearity that makes depth useful
- Pooling controls scale
- Dense layers make decisions
Discussion
In real projects, what matters most?
- architecture design?
- training tricks?
- or data quality?
Curious to hear your experience.