I've come across many blogs on this topic, but most of them explain it in such a complicated way that it's almost impossible to follow.
So I decided to break it down in simple, easy-to-understand language that actually helps you diagnose slowdowns and fix them.
Here are the four reasons why your GPU starts fast but gradually becomes slower during training:
1️⃣ Your workload becomes memory-bound instead of compute-bound
2️⃣ Your workload becomes less parallelizable
3️⃣ Your tensor shapes stop aligning with GPU-friendly sizes
4️⃣ Thermal throttling: GPU heats up and automatically slows down to protect itself
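To make reason 3️⃣ concrete, here is a minimal, framework-agnostic sketch of padding a tensor dimension up to a GPU-friendly multiple. The specific values are illustrative assumptions (not from the book): NVIDIA's Tensor Cores generally run fastest when matrix dimensions are multiples of 8 (FP16) or larger powers of two, which is why some GPT-2 implementations pad the 50,257-token vocabulary to 50,304.

```python
def pad_to_multiple(dim: int, multiple: int = 8) -> int:
    """Round a tensor dimension up to the nearest multiple of `multiple`.

    Keeping dimensions on multiples of 8 (FP16) or 64 helps the GPU
    dispatch efficient Tensor Core kernels instead of slower fallbacks.
    """
    return ((dim + multiple - 1) // multiple) * multiple

# Illustrative example: GPT-2's vocabulary size is 50,257, which is not
# aligned to any Tensor Core-friendly boundary.
vocab_size = 50_257
print(pad_to_multiple(vocab_size, 8))   # -> 50264
print(pad_to_multiple(vocab_size, 64))  # -> 50304
```

The extra embedding rows waste a little memory, but a single aligned matrix multiply is usually faster than an unaligned one, so the trade-off tends to pay off.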
These excerpts are taken from my book "Building a Small Language Model from Scratch: A Practical Guide". If you'd like to dive deeper into the topic, feel free to check out the book.
✅ Gumroad: https://plakhera.gumroad.com/l/BuildingASmallLanguageModelfromScratch
✅ Amazon: https://www.amazon.com/dp/B0G64SQ4F8/
✅ Leanpub: https://leanpub.com/buildingasmalllanguagemodelfromscratch/
🔗 Blog link: https://www.linkedin.com/pulse/why-your-gpu-gets-slower-during-training-even-though-nothing-lakhera-wblsc/
