I thought I found a great deal on an H100.
~$2.50/hour. Way cheaper than what I’d seen elsewhere.
On paper, it looked like a no-brainer.
It wasn’t.
The mistake I made
Like most people, I compared GPU providers on exactly one number:
the hourly price.
That’s how every pricing page is structured.
So naturally, that’s how we evaluate them.
But after actually running workloads, it became obvious:
the hourly rate is one of the least important numbers.
What actually matters: cost per useful compute
The real question isn’t:
“How much does this GPU cost per hour?”
It’s:
“How much does it cost to get the result I want?”
Training run. Inference throughput. Completed job.
Once you look at it that way, things change fast.
Where the extra cost comes from
Here are the biggest hidden cost sources I’ve seen:
1. Idle GPUs (this adds up fast)
GPUs are rarely fully utilized.
- jobs wait on data
- pipelines stall
- you overprovision “just in case”
If your GPU is sitting idle 30–40% of the time, your “cheap” instance isn’t cheap anymore.
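To make that concrete, here’s a rough sketch. The $2.50/hour rate is from the deal above; the 35% idle figure is illustrative, not measured:

```python
# Effective hourly cost once idle time is factored in.
# idle_fraction is an illustrative assumption -- measure yours.
hourly_rate = 2.50      # advertised $/hour
idle_fraction = 0.35    # fraction of billed time the GPU does nothing

effective_rate = hourly_rate / (1 - idle_fraction)
print(f"Effective rate: ${effective_rate:.2f} per useful hour")
# 2.50 / 0.65 ≈ $3.85 -- the "cheap" H100 now costs more than
# a $3.50/hr instance that stays busy
```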
2. Data movement (way bigger than people expect)
At small scale, compute dominates.
At larger scale:
- dataset transfers
- checkpoint syncing
- cross-region traffic
These costs quietly pile up.
In some setups, they can rival or even exceed compute costs.
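A quick back-of-envelope for checkpoint syncing alone. Every number here is an assumption (egress pricing varies a lot by provider), so substitute your own:

```python
# Rough egress cost of periodic checkpoint syncing.
# All inputs are assumed values -- check your provider's pricing.
checkpoint_gb = 50       # size of one checkpoint
syncs_per_day = 24       # hourly checkpointing
egress_per_gb = 0.09     # a common cloud egress rate, $/GB

daily_egress = checkpoint_gb * syncs_per_day * egress_per_gb
print(f"Checkpoint egress: ${daily_egress:.2f}/day")
# 50 * 24 * 0.09 = $108/day -- more than the $60/day the
# GPU itself costs at $2.50/hour
```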
3. Retries + interruptions
Stuff fails.
- spot instances get reclaimed
- jobs crash
- pipelines restart
Every retry:
- wastes progress
- extends runtime
- increases total cost
Cheap infra that fails more often = expensive infra.
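You can put a number on that too. A simplifying assumption here: a failure wastes the whole attempt (no checkpoint resume) and attempts fail independently, so the expected attempt count is 1 / (1 − failure rate):

```python
# Expected cost multiplier from retries.
# Simplification: each failure throws away the full attempt.
failure_rate = 0.25     # 25% of attempts interrupted (illustrative)

expected_attempts = 1 / (1 - failure_rate)
print(f"Expected attempts per completed job: {expected_attempts:.2f}")
# 1 / 0.75 ≈ 1.33 -- a 33% premium baked into the real price
```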
4. Operational overhead
This one’s less obvious, but real:
- time spent debugging infra
- managing clusters
- fixing deployment issues
A slightly more expensive provider that “just works” can be cheaper overall.
Why this keeps happening
Hourly pricing is simple.
It’s easy to compare.
And it looks precise.
But it hides most of the variables that actually drive cost.
A better way to think about it
Instead of comparing:
$/hour
I’ve started thinking in terms of:
- cost per training run
- cost per 1M inferences
- cost per completed job
And asking:
- how utilized is the GPU actually?
- how often do jobs fail?
- how much data is moving around?
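Those questions combine into one formula. Here’s a sketch of cost per completed job; every input is an assumption you’d replace with your own measurements:

```python
# Cost per completed job instead of $/hour.
# All inputs are illustrative placeholders.
hourly_rate = 2.50        # advertised $/hour
job_hours = 10            # useful compute hours per successful run
idle_fraction = 0.25      # billed time spent idle
failure_rate = 0.15       # attempts that die and restart from scratch
data_cost_per_job = 15.0  # transfers, checkpoints, egress

billed_hours = job_hours / (1 - idle_fraction)
expected_attempts = 1 / (1 - failure_rate)
cost_per_job = billed_hours * expected_attempts * hourly_rate + data_cost_per_job
print(f"Cost per completed job: ${cost_per_job:.2f}")
# ≈ $54, vs. the naive 10 * $2.50 = $25 the pricing page suggests --
# roughly the 2x gap described below
```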
The takeaway
The cheapest GPU on paper is often not the cheapest in practice.
And the difference can easily be 2× depending on how things are set up.
I’ve been digging into this while building tools to compare real GPU/cloud costs across providers.
Curious how others are thinking about this.
Are you still comparing providers by hourly price, or looking at full workload cost?