DEV Community

Paperium

Posted on • Originally published at paperium.net

Three Factors Influencing Minima in SGD

How SGD’s Three Simple Factors Shape What Neural Nets Learn

Researchers looked at how a common training method, SGD, changes what kind of solution a neural net finds.
They found just three things matter most: the learning rate, the batch size, and the amount of noise in the gradient steps.
Put another way, the ratio between step size and batch size tends to steer training toward wider, more stable solutions.
Those wider minima often give you better generalization, so models guess right on new data more often.
The surprising bit: you can often swap a learning-rate schedule for an equivalent change in batch size and see similar effects.
That means training tricks that feel complex may be simpler than we thought.
It also helps explain why some runs memorize data while others learn patterns: the balance of step size, batch size, and gradient noise decides the path.
This makes training feel less like magic, and more like settings you can tune to get smarter, more reliable models.
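The lr/batch-size ratio idea above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the function names are my own, and the "noise scale" here is just the ratio the summary highlights, treated as a heuristic.

```python
def sgd_noise_scale(learning_rate: float, batch_size: int) -> float:
    """Heuristic from the summary: the gradient-noise level that steers
    SGD toward wider or sharper minima scales like lr / batch_size."""
    return learning_rate / batch_size


# Doubling both the learning rate and the batch size leaves the ratio
# unchanged -- the "swap" between schedule tricks that the summary describes.
small_run = sgd_noise_scale(0.1, 32)
scaled_run = sgd_noise_scale(0.2, 64)
```

Under this heuristic, `small_run` and `scaled_run` are equal, which is why halving the learning rate over time and doubling the batch size over time can trace out similar training dynamics.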

Read the comprehensive review on Paperium.net:
Three Factors Influencing Minima in SGD

🤖 This analysis and review was primarily generated and structured by an AI. The content is provided for informational and quick-review purposes.
