a hidden layer is an intermediate layer between the input and output layers in a neural network. it's called "hidden" because its outputs are not directly observable as final outputs from the network.
key points:
1. transformation function:
each hidden layer performs:
-
linear transformation:
z = w·x + b(weights × inputs + bias)-
matrix representation:
for a hidden layer withminputs andnneurons:hidden layer output = activation(w·x + b) where: w = weight matrix of shape (n × m) x = input vector of shape (m × 1) b = bias vector of shape (n × 1)
-
-
non-linear activation:
a = f(z)(relu, sigmoid, tanh, etc.)
impact:- sigmoid/tanh: early days, suffers from vanishing gradient.
- relu: modern default, solves vanishing gradient but has "dying relu" problem.
- leaky relu/elu: address dying relu issue.
- swish/mish: recent alternatives, often better performance.
activation functions will be discuss in details.
2. what happens in a hidden layer:
- feature extraction: learns patterns from previous layer's outputs.
- hierarchical learning: early layers learn simple features, deeper layers combine them.
3. why are hidden layers so important?
- example: cat image classification
| layer | what it "sees" |
|---|---|
| input | raw pixels |
| hidden 1 | edge detectors |
| hidden 2 | texture patterns |
| hidden 3 | object parts |
| hidden 4 | whole objects |
| output | classification |
the "deep" in deep learning:
the term "deep" in deep learning specifically refers to having multiple hidden layers. this depth enables:
- automatic feature engineering: no need for manual feature extraction.
- hierarchical understanding: from pixels to concepts.
- transfer learning: early layers often learn general features transferable between tasks.
the takeaway:
hidden layers are learned feature extractors.the depth (number of hidden layers) and architecture of these layers determine what kind of patterns the network can learn and how well it can learn them.
without hidden layers, neural networks would be just linear regression. with them, they can learn the complex patterns that power modern ai applications.
Top comments (0)