shangkyu shin

Posted on • Originally published at zeromathai.com

CNN Layer Composition — A Practical Developer Guide to Activation, Pooling, and Fully Connected Layers

CNNs are not just convolution stacks. This guide explains how activation, pooling, and fully connected layers work together to transform feature maps into predictions.

Cross-posted from Zeromath. Original article: https://zeromathai.com/en/cnn-layer-composition-en/


CNN Layer Composition (Think Like an Engineer)

A CNN is not magic.

It’s a pipeline:

input → convolution (feature extraction) → ReLU (filtering) → pooling (compression) → dense layers (classification)


1. Convolution Alone = Not Enough

Convolution is linear.

Stack linear layers and you still get a linear map:

→ a composition of linear functions is itself linear

So:

  • no complex decision boundary
  • no deep feature learning

Activation is mandatory.
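This collapse is easy to verify in NumPy (a minimal sketch: two random 4×4 matrices stand in for any pair of linear layers; a 1×1 convolution is just such a matrix multiply per pixel):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear "layers" with no activation between them
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 4))
x = rng.normal(size=(4,))

# Stacking them...
deep = W2 @ (W1 @ x)

# ...is identical to a single linear layer with W = W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # True: no extra expressive power
```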


2. ReLU — The Switch That Enables Depth

ReLU:

f(x) = max(0, x)

Example:

[-3, -1, 0.5, 2] → [0, 0, 0.5, 2]

Why it matters:

  • introduces nonlinearity
  • mitigates vanishing gradients (its gradient is 1 for positive inputs)
  • filters weak signals
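The example above in code (a minimal NumPy sketch):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.5 2. ]
```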

3. Shape Flow (Real Example)

Input:
(224, 224, 3)

Conv:
(224, 224, 64)

ReLU:
(224, 224, 64)

Pooling:
(112, 112, 64)

Key rules:

  • spatial ↓
  • channels same

4. Why Channels Increase

As depth increases:

  • spatial size ↓
  • channel count ↑

Why?

→ model learns more feature types


5. Pooling vs Stride

Pooling:

  • fixed
  • no parameters

Strided Conv:

  • learnable
  • more flexible

Modern models often prefer strided conv.
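Both routes follow the same output-size formula and can produce the same downsampling; a sketch comparing them (kernel/stride/padding values are illustrative):

```python
def out_size(n, k, stride, pad=0):
    # Standard output-size formula for conv and pooling layers
    return (n + 2 * pad - k) // stride + 1

# 2x2 max pooling, stride 2, on a 112-wide feature map:
print(out_size(112, k=2, stride=2))         # 56 (fixed op, no parameters)

# A 3x3 conv with stride 2 and padding 1 downsamples identically,
# but its kernel weights are learned:
print(out_size(112, k=3, stride=2, pad=1))  # 56
```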


6. Max Pooling = Feature Selection

2×2 max pooling:

Input:
1 1 2 4

5 6 7 8

3 2 1 0

1 2 3 4

Output:
6 8

3 4

Effect:

  • strongest signal survives
  • weak activations discarded
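The worked example in NumPy (reshape into 2×2 blocks, take each block's max):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# 2x2 max pooling with stride 2: split into 2x2 blocks, keep each block's max
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 8]
               #  [3 4]]
```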

7. Receptive Field

Deeper layers:

  • see more context
  • capture higher-level features

Flow:

edges → textures → shapes → objects
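The growing context can be computed explicitly with the standard receptive-field recurrence r ← r + (k − 1) · j, where j is the cumulative stride. A sketch (the layer stack here is illustrative):

```python
# Layer specs as (kernel, stride): conv, pool, conv, pool
layers = [(3, 1), (2, 2), (3, 1), (2, 2)]

r, j = 1, 1  # receptive field and cumulative stride ("jump")
for k, s in layers:
    r = r + (k - 1) * j
    j = j * s
    print(f"k={k} s={s} -> receptive field {r}x{r}")
# After four layers each output pixel sees a 10x10 input region
```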


8. Flatten + Dense

Before classification:

(7, 7, 512) → (25088,)   because 7 × 7 × 512 = 25088

Then:

Dense → Softmax → prediction
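A minimal NumPy sketch of flatten → dense → softmax (random weights, 10 classes assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

feat = rng.normal(size=(7, 7, 512))   # final feature map
flat = feat.reshape(-1)               # (25088,)

# One dense layer to 10 classes, then softmax
W = rng.normal(size=(10, flat.size)) * 0.01
logits = W @ flat
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # valid probability distribution

print(flat.shape)  # (25088,)
```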


9. Modern Trick: Global Average Pooling

Instead of big dense layers:

  • average each channel
  • fewer parameters
  • better generalization
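A sketch of GAP and the parameter savings it buys for a 10-class head (sizes assumed from the example above):

```python
import numpy as np

feat = np.arange(7 * 7 * 512, dtype=float).reshape(7, 7, 512)

# Global average pooling: one number per channel
gap = feat.mean(axis=(0, 1))   # shape (512,)

# Weights needed for a 10-class dense head:
#   flatten + dense: 25088 * 10 = 250,880
#   GAP + dense:       512 * 10 =   5,120
print(gap.shape)  # (512,)
```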

10. Full Pipeline

  1. Conv → detect
  2. ReLU → filter
  3. Pool → compress
  4. Repeat → hierarchy
  5. Dense → predict
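The five steps end to end, as a tiny NumPy sketch (1×1 convolutions and random weights keep it short; real models use larger kernels and trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def pool2x2(x):
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply over channels
    return x @ w

x = rng.normal(size=(8, 8, 3))        # tiny input image
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=(16, 32))
w_out = rng.normal(size=(32, 10))

x = pool2x2(relu(conv1x1(x, w1)))     # detect, filter, compress -> (4, 4, 16)
x = pool2x2(relu(conv1x1(x, w2)))     # repeat -> (2, 2, 32)
x = x.mean(axis=(0, 1))               # GAP -> (32,)
logits = x @ w_out                    # predict -> (10,) class scores
print(logits.shape)
```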

Debug Mindset

If model fails:

  • bad features → conv problem
  • weak signal → activation issue
  • too slow → pooling issue
  • wrong output → classifier issue

Key Takeaways

  • CNN = structured system
  • ReLU enables learning
  • Pooling controls scale
  • Dense layers make decisions

Discussion

In real projects, what matters most?

  • architecture design?
  • training tricks?
  • or data quality?

Curious to hear your experience.
