I thought I found a great deal on an H100.
~$2.50/hour. Way cheaper than what I’d seen elsewhere.
On paper, it looked like a no-brainer.
It wasn’t.
The mistake I made
Like most people, I compared GPU providers on exactly one number:
the hourly price.
That’s how every pricing page is structured.
So naturally, that’s how we evaluate them.
But after actually running workloads, it became obvious:
the hourly rate is one of the least important numbers.
What actually matters: cost per useful compute
The real question isn’t:
“How much does this GPU cost per hour?”
It’s:
“How much does it cost to get the result I want?”
Training run. Inference throughput. Completed job.
Once you look at it that way, things change fast.
Where the extra cost comes from
Here are the biggest hidden cost sources I’ve seen:
1. Idle GPUs (this adds up fast)
GPUs are rarely fully utilized.
- jobs wait on data
- pipelines stall
- you overprovision “just in case”
If your GPU is sitting idle 30–40% of the time, your “cheap” instance isn’t cheap anymore.
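To make that concrete, here’s a rough sketch. The $2.50/hour rate is from the deal above; the 35% idle figure is illustrative, not measured:

```python
# Effective hourly cost once idle time is factored in.
# idle_fraction is an illustrative assumption -- measure yours.
hourly_rate = 2.50      # advertised $/hour
idle_fraction = 0.35    # fraction of billed time the GPU does nothing

effective_rate = hourly_rate / (1 - idle_fraction)
print(f"Effective rate: ${effective_rate:.2f} per useful hour")
# 2.50 / 0.65 ≈ $3.85 -- the "cheap" H100 now costs more than
# a $3.50/hr instance that stays busy
```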
2. Data movement (way bigger than people expect)
At small scale, compute dominates.
At larger scale:
- dataset transfers
- checkpoint syncing
- cross-region traffic
These costs quietly pile up.
In some setups, they can rival or even exceed compute costs.
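A quick back-of-envelope for checkpoint syncing alone. Every number here is an assumption (egress pricing varies a lot by provider), so substitute your own:

```python
# Rough egress cost of periodic checkpoint syncing.
# All inputs are assumed values -- check your provider's pricing.
checkpoint_gb = 50       # size of one checkpoint
syncs_per_day = 24       # hourly checkpointing
egress_per_gb = 0.09     # a common cloud egress rate, $/GB

daily_egress = checkpoint_gb * syncs_per_day * egress_per_gb
print(f"Checkpoint egress: ${daily_egress:.2f}/day")
# 50 * 24 * 0.09 = $108/day -- more than the $60/day the
# GPU itself costs at $2.50/hour
```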
3. Retries + interruptions
Stuff fails.
- spot instances get reclaimed
- jobs crash
- pipelines restart
Every retry:
- wastes progress
- extends runtime
- increases total cost
Cheap infra that fails more often = expensive infra.
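You can put a number on that too. A simplifying assumption here: a failure wastes the whole attempt (no checkpoint resume) and attempts fail independently, so the expected attempt count is 1 / (1 − failure rate):

```python
# Expected cost multiplier from retries.
# Simplification: each failure throws away the full attempt.
failure_rate = 0.25     # 25% of attempts interrupted (illustrative)

expected_attempts = 1 / (1 - failure_rate)
print(f"Expected attempts per completed job: {expected_attempts:.2f}")
# 1 / 0.75 ≈ 1.33 -- a 33% premium baked into the real price
```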
4. Operational overhead
This one’s less obvious, but real:
- time spent debugging infra
- managing clusters
- fixing deployment issues
A slightly more expensive provider that “just works” can be cheaper overall.
Why this keeps happening
Hourly pricing is simple.
It’s easy to compare.
And it looks precise.
But it hides most of the variables that actually drive cost.
A better way to think about it
Instead of comparing:
$/hour
I’ve started thinking in terms of:
- cost per training run
- cost per 1M inferences
- cost per completed job
And asking:
- how utilized is the GPU actually?
- how often do jobs fail?
- how much data is moving around?
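Those questions combine into one formula. Here’s a sketch of cost per completed job; every input is an assumption you’d replace with your own measurements:

```python
# Cost per completed job instead of $/hour.
# All inputs are illustrative placeholders.
hourly_rate = 2.50        # advertised $/hour
job_hours = 10            # useful compute hours per successful run
idle_fraction = 0.25      # billed time spent idle
failure_rate = 0.15       # attempts that die and restart from scratch
data_cost_per_job = 15.0  # transfers, checkpoints, egress

billed_hours = job_hours / (1 - idle_fraction)
expected_attempts = 1 / (1 - failure_rate)
cost_per_job = billed_hours * expected_attempts * hourly_rate + data_cost_per_job
print(f"Cost per completed job: ${cost_per_job:.2f}")
# ≈ $54, vs. the naive 10 * $2.50 = $25 the pricing page suggests --
# roughly the 2x gap described below
```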
The takeaway
The cheapest GPU on paper is often not the cheapest in practice.
And the difference can easily be 2× depending on how things are set up.
I’ve been digging into this while building tools to compare real GPU/cloud costs across providers.
Curious how others are thinking about this.
Are you still comparing providers by hourly price, or looking at full workload cost?