
shangkyu shin


Why CNNs Work: Convolution, Feature Hierarchies, and the Real Difference from Fully Connected Networks

Understanding CNNs is not about memorizing layers.

It’s about understanding why this design exists.

Cross-posted from Zeromath. Original article: https://zeromathai.com/en/convolutional-layer-lec-en/


The Core Problem

Images are structured data.

A fully connected network treats them as flat vectors.

Example:

224×224×3 → ~150K inputs

One dense layer (1,000 units) → ~150 million parameters

Problems:

  • No spatial awareness
  • Too many parameters
  • Overfitting
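The parameter math above can be sketched in a few lines (the 1,000-unit dense layer and 64-filter conv layer are illustrative choices, not from the article):

```python
def dense_params(h, w, c, units):
    """Weights in one fully connected layer on a flattened image (biases ignored)."""
    return h * w * c * units

def conv_params(k, c_in, c_out):
    """Weights in one conv layer: each of c_out filters is k x k x c_in (biases ignored)."""
    return k * k * c_in * c_out

print(dense_params(224, 224, 3, 1000))  # 150528000 -- ~150 million
print(conv_params(5, 3, 64))            # 4800
```

Weight sharing is what makes the second number so small: the same 5×5×3 filter is reused at every image position.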

What CNNs Fix

CNN introduces two key ideas:

  • Local connectivity
  • Weight sharing

Instead of connecting everything:
→ look locally, reuse globally


CNN Pipeline

Image → Conv → ReLU → Pool → Conv → ... → FC → Softmax


Convolution Layer

A filter slides across the image.

At each position:

  • Multiply
  • Sum
  • Output activation

Shape Example

Input: 32×32×3

Filter: 5×5×3

Output: 28×28 per filter (32 − 5 + 1 = 28)
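A minimal sketch of the slide–multiply–sum loop, using NumPy and "valid" padding (no framework, just the raw operation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' convolution: at each position, multiply elementwise and sum."""
    H, W, C = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(image[i:i+k, j:j+k, :] * kernel)
    return out

x = np.random.rand(32, 32, 3)   # toy input
f = np.random.rand(5, 5, 3)     # one filter
print(conv2d_valid(x, f).shape)  # (28, 28)
```

Real libraries vectorize this heavily, but the shape arithmetic is the same.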


Why It Works

  • Detects local patterns
  • Works anywhere
  • Learns reusable features

Feature Maps

Feature maps record how strongly each filter responds across the image.

They answer:

→ where is this feature?


ReLU (Critical)

f(x) = max(0, x)

Without it:

  • Stacked layers collapse into one linear map

With it:

  • Nonlinear learning
  • Better optimization
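A tiny demonstration of the "linear collapse" point (toy 4×4 matrices, just to show the algebra):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

W1 = np.random.randn(4, 4)
W2 = np.random.randn(4, 4)
x = np.random.randn(4)

# Without a nonlinearity, two layers are equivalent to one matrix:
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With ReLU in between, no single matrix reproduces the map in general:
y = W2 @ relu(W1 @ x)
```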

Pooling Layer

28×28 → 14×14 (2×2 max pooling, stride 2)

Benefits:

  • Faster
  • More robust
  • Translation invariant (approx)
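A minimal 2×2 max pooling sketch in NumPy, assuming even input dimensions:

```python
import numpy as np

def max_pool2x2(fmap):
    """2x2 max pooling, stride 2: keep the strongest activation in each block."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.random.rand(28, 28)
print(max_pool2x2(fmap).shape)  # (14, 14)
```

Keeping only the maximum in each block is what buys the (approximate) robustness to small shifts.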

Important Insight

CNNs are not truly translation invariant.

Pooling only makes them more robust to shifts.

Too much pooling:
→ destroys spatial detail

Modern CNNs:
→ reduce pooling

→ use strided convolution
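The standard output-size formula shows why a stride-2 convolution downsamples just like 2×2 pooling (the kernel/stride/padding values are illustrative):

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - k) // stride + 1

print(conv_out_size(28, k=3, stride=2, pad=1))  # 14 -- stride-2 conv
print(conv_out_size(28, k=2, stride=2))         # 14 -- same as 2x2 pooling
```

The difference: the strided convolution's downsampling weights are learned rather than fixed to "take the max".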


Fully Connected Layer

Flatten → combine features → classify

Softmax → probabilities
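The softmax step can be sketched directly (the three class scores are made up for illustration):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # largest score -> largest probability
print(probs.sum())  # sums to 1 (up to float rounding)
```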


Feature Hierarchy (Core Idea)

CNNs learn progressively:

| Layer  | Learns   |
| ------ | -------- |
| Early  | edges    |
| Middle | textures |
| Deep   | objects  |

Example:
edge → eye → face


Why CNNs Beat Dense Networks

CNN:

  • Efficient
  • Spatially aware
  • Generalizes well

Dense:

  • Huge parameter count
  • No structure awareness
  • Overfits

Debugging CNNs (Underrated Skill)

Use:

  • Activation maps
  • Saliency maps
  • Grad-CAM

These help:

  • Debug errors
  • Understand predictions
  • Improve models

Practical Tips

  • Don’t overuse pooling
  • Track feature map sizes
  • Prefer depth over width
  • Visualize early

Final Insight

The real breakthrough of CNNs is not just convolution.

It is the combination of:

  • Locality
  • Parameter sharing
  • Hierarchical learning

That’s what turns pixels into meaning.


For image tasks today, do you still start with CNNs, or jump straight to Vision Transformers?

Let’s discuss 👇
