shangkyu shin

Posted on • Originally published at zeromathai.com

CNN Layer Composition — A Practical Developer Guide to Activation, Pooling, and Fully Connected Layers

CNNs are not just convolution stacks. This guide explains how activation, pooling, and fully connected layers work together to transform feature maps into predictions.

Cross-posted from Zeromath. Original article: https://zeromathai.com/en/cnn-layer-composition-en/


CNN Layer Composition (Think Like an Engineer)

A CNN is not magic.

It’s a pipeline:

input → convolution (feature extraction) → ReLU (filtering) → pooling (compression) → dense layers (classification)


1. Convolution Alone = Not Enough

Convolution is linear.

Stack linear layers and you still get a linear map:

→ a composition of linear functions is itself linear

So:

  • no complex decision boundary
  • no deep feature learning

Activation is mandatory.
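This collapse is easy to verify in NumPy (a minimal sketch: two random 4×4 matrices stand in for any pair of linear layers; a 1×1 convolution is just such a matrix multiply per pixel):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear "layers" with no activation between them
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 4))
x = rng.normal(size=(4,))

# Stacking them...
deep = W2 @ (W1 @ x)

# ...is identical to a single linear layer with W = W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # True: no extra expressive power
```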


2. ReLU — The Switch That Enables Depth

ReLU:

f(x) = max(0, x)

Example:

[-3, -1, 0.5, 2] → [0, 0, 0.5, 2]

Why it matters:

  • introduces nonlinearity
  • mitigates vanishing gradients (its gradient is 1 for positive inputs)
  • filters weak signals
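The example above in code (a minimal NumPy sketch):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.5 2. ]
```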

3. Shape Flow (Real Example)

Input:
(224, 224, 3)

Conv:
(224, 224, 64)

ReLU:
(224, 224, 64)

Pooling:
(112, 112, 64)

Key rules:

  • spatial ↓
  • channels same

4. Why Channels Increase

As depth increases:

  • spatial size ↓
  • channel count ↑

Why?

→ model learns more feature types


5. Pooling vs Stride

Pooling:

  • fixed
  • no parameters

Strided Conv:

  • learnable
  • more flexible

Modern models often prefer strided conv.
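Both routes follow the same output-size formula and can produce the same downsampling; a sketch comparing them (kernel/stride/padding values are illustrative):

```python
def out_size(n, k, stride, pad=0):
    # Standard output-size formula for conv and pooling layers
    return (n + 2 * pad - k) // stride + 1

# 2x2 max pooling, stride 2, on a 112-wide feature map:
print(out_size(112, k=2, stride=2))         # 56 (fixed op, no parameters)

# A 3x3 conv with stride 2 and padding 1 downsamples identically,
# but its kernel weights are learned:
print(out_size(112, k=3, stride=2, pad=1))  # 56
```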


6. Max Pooling = Feature Selection

2×2 max pooling:

Input:
1 1 2 4

5 6 7 8

3 2 1 0

1 2 3 4

Output:
6 8

3 4

Effect:

  • strongest signal survives
  • weak activations discarded
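The worked example in NumPy (reshape into 2×2 blocks, take each block's max):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# 2x2 max pooling with stride 2: split into 2x2 blocks, keep each block's max
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 8]
               #  [3 4]]
```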

7. Receptive Field

Deeper layers:

  • see more context
  • capture higher-level features

Flow:

edges → textures → shapes → objects
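The growing context can be computed explicitly with the standard receptive-field recurrence r ← r + (k − 1) · j, where j is the cumulative stride. A sketch (the layer stack here is illustrative):

```python
# Layer specs as (kernel, stride): conv, pool, conv, pool
layers = [(3, 1), (2, 2), (3, 1), (2, 2)]

r, j = 1, 1  # receptive field and cumulative stride ("jump")
for k, s in layers:
    r = r + (k - 1) * j
    j = j * s
    print(f"k={k} s={s} -> receptive field {r}x{r}")
# After four layers each output pixel sees a 10x10 input region
```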


8. Flatten + Dense

Before classification:

(7, 7, 512) → (25088,)   because 7 × 7 × 512 = 25088

Then:

Dense → Softmax → prediction
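A minimal NumPy sketch of flatten → dense → softmax (random weights, 10 classes assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

feat = rng.normal(size=(7, 7, 512))   # final feature map
flat = feat.reshape(-1)               # (25088,)

# One dense layer to 10 classes, then softmax
W = rng.normal(size=(10, flat.size)) * 0.01
logits = W @ flat
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # valid probability distribution

print(flat.shape)  # (25088,)
```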


9. Modern Trick: Global Average Pooling

Instead of big dense layers:

  • average each channel
  • fewer parameters
  • better generalization
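A sketch of GAP and the parameter savings it buys for a 10-class head (sizes assumed from the example above):

```python
import numpy as np

feat = np.arange(7 * 7 * 512, dtype=float).reshape(7, 7, 512)

# Global average pooling: one number per channel
gap = feat.mean(axis=(0, 1))   # shape (512,)

# Weights needed for a 10-class dense head:
#   flatten + dense: 25088 * 10 = 250,880
#   GAP + dense:       512 * 10 =   5,120
print(gap.shape)  # (512,)
```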

10. Full Pipeline

  1. Conv → detect
  2. ReLU → filter
  3. Pool → compress
  4. Repeat → hierarchy
  5. Dense → predict
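The five steps end to end, as a tiny NumPy sketch (1×1 convolutions and random weights keep it short; real models use larger kernels and trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def pool2x2(x):
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply over channels
    return x @ w

x = rng.normal(size=(8, 8, 3))        # tiny input image
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=(16, 32))
w_out = rng.normal(size=(32, 10))

x = pool2x2(relu(conv1x1(x, w1)))     # detect, filter, compress -> (4, 4, 16)
x = pool2x2(relu(conv1x1(x, w2)))     # repeat -> (2, 2, 32)
x = x.mean(axis=(0, 1))               # GAP -> (32,)
logits = x @ w_out                    # predict -> (10,) class scores
print(logits.shape)
```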

Debug Mindset

If model fails:

  • bad features → conv problem
  • weak signal → activation issue
  • too slow → pooling issue
  • wrong output → classifier issue

Key Takeaways

  • CNN = structured system
  • ReLU enables learning
  • Pooling controls scale
  • Dense layers make decisions

Discussion

In real projects, what matters most?

  • architecture design?
  • training tricks?
  • or data quality?

Curious to hear your experience.
