
Gandalf the Gato


GPU cloud pricing: a practical checklist before renting H100s

#ai

Every GPU rental page is basically the same trap in a different hat.

The big number says "$2/hr H100" and your brain goes "nice, infrastructure is solved." Then the real bill crawls out from under the couch wearing little shoes.

So I put together a small practical checklist for comparing cloud GPU rentals without opening twelve pricing tabs and pretending this is research.

Live board I am maintaining: https://gpu.fund

The first mistake: comparing sticker prices

Hourly price is useful, but it is not the cost.

A better rough estimate is:

estimated run cost = GPU hourly rate * runtime hours * GPU count
                   + persistent storage
                   + bandwidth or egress
                   + idle setup/debug time
                   + retries, preemptions, or failed runs

For short experiments, idle setup time can matter more than a five-cent hourly difference.

For production inference, utilization matters more than the sticker price.

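For training, topology (how the GPUs are connected to each other, for example NVLink versus plain PCIe, and how nodes are networked) can matter enough to make the cheaper machine secretly expensive.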
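The rough estimate above can be sketched in a few lines of Python. This is a minimal sketch, not a billing model: the field names are made up for illustration, and it assumes idle time and retries are billed at the same GPU rate as useful work, which may not match your provider.

```python
from dataclasses import dataclass


@dataclass
class RunEstimate:
    """Inputs for the rough estimate. Every field name here is illustrative."""
    gpu_hourly_rate: float        # USD per GPU-hour
    runtime_hours: float          # hours of useful work
    gpu_count: int = 1
    storage_cost: float = 0.0     # USD of persistent storage for this run
    egress_cost: float = 0.0      # USD of bandwidth or egress
    idle_hours: float = 0.0       # billed setup/debug hours with GPUs idle
    retry_fraction: float = 0.0   # extra runtime from retries (0.1 = 10%)


def estimated_run_cost(e: RunEstimate) -> float:
    """Assumption: idle time and retries are billed at the same GPU rate."""
    billed_hours = e.runtime_hours * (1 + e.retry_fraction) + e.idle_hours
    compute = e.gpu_hourly_rate * billed_hours * e.gpu_count
    return compute + e.storage_cost + e.egress_cost


# Made-up numbers: a 2-hour experiment on one GPU.
cheaper_sticker = RunEstimate(gpu_hourly_rate=1.95, runtime_hours=2, idle_hours=1.0)
pricier_sticker = RunEstimate(gpu_hourly_rate=2.00, runtime_hours=2, idle_hours=0.25)

print(f"cheaper sticker: ${estimated_run_cost(cheaper_sticker):.2f}")
print(f"pricier sticker: ${estimated_run_cost(pricier_sticker):.2f}")
```

With these invented inputs, the offer that is five cents cheaper per hour comes out at $5.85 and the pricier one at $4.50, purely because of 45 extra minutes of fiddling.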

Check the exact GPU, not just the family

An H100 is not always just an H100.

Before comparing prices, check whether you are looking at H100 SXM, H100 PCIe, H100 NVL, H200, B200, MI300X, A100 80GB, or something with a marketing label that deserves suspicion.

VRAM matters too. If your model fits into 24GB, test on an RTX 3090 or 4090 (24GB each), or a 5090 (32GB), before paying H100 money.

If you actually need 80GB VRAM, compare A100 80GB, H100, H200, and MI300X. Do not default to the newest NVIDIA card just because the number is bigger and your wallet is unsupervised.

Normalize per GPU-hour

Some providers show the machine price. Some show a per-GPU price. Some show a thing that looks like a price until you click three more times and find out the node has eight GPUs and a storage policy written by a goblin.

Normalize everything to per GPU-hour before comparing.

Then separately track:

  • number of GPUs
  • VRAM per GPU
  • total machine price
  • region
  • disk cost
  • bandwidth or egress terms
  • interruptible or reserved status

This is boring, which is how you know it saves money.
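As a small illustration of that normalization step, here is a sketch in Python. The provider names, prices, and the `per_machine` flag are all hypothetical; the only real logic is dividing a whole-node price by the GPU count before comparing.

```python
def per_gpu_hour(listed_price: float, gpu_count: int, per_machine: bool) -> float:
    """Convert a listed hourly price to USD per GPU-hour."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return listed_price / gpu_count if per_machine else listed_price


# Hypothetical listings, not real quotes.
offers = [
    {"provider": "provider-a", "listed": 17.60, "gpus": 8, "per_machine": True},
    {"provider": "provider-b", "listed": 2.35, "gpus": 1, "per_machine": False},
]

for offer in offers:
    offer["per_gpu_hour"] = per_gpu_hour(
        offer["listed"], offer["gpus"], offer["per_machine"]
    )

cheapest = min(offers, key=lambda o: o["per_gpu_hour"])
print(cheapest["provider"], round(cheapest["per_gpu_hour"], 2))
```

Here the scary-looking $17.60 node works out to $2.20 per GPU-hour, which undercuts the $2.35 single-GPU listing. The other tracked fields (region, disk, egress, interruptible status) still need their own columns; this only fixes the headline number.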

Availability beats theoretical cheapness

The cheapest H100 is not cheap if it never launches.

Queue time, failed provisioning, low stock, and preemptions all become cost. If you spend an hour debugging provider weirdness to save 30 cents, congratulations, you have invented negative wage engineering.

For production jobs, I would rather pay slightly more for a provider that launches reliably and gives clear inventory.

For experiments, cheap marketplaces are great. Just assume some time will vanish into the machine fog.
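The "negative wage engineering" arithmetic is easy to make concrete. The sketch below is a back-of-the-envelope check, assuming you can put a dollar figure on an hour of your own time; the $60/hour figure and the function names are placeholders, not a recommendation.

```python
def net_savings(hourly_saving: float, gpu_hours: float,
                debug_hours: float, engineer_hourly_cost: float) -> float:
    """Money saved on the cheaper GPU minus the cost of time lost to it."""
    return hourly_saving * gpu_hours - debug_hours * engineer_hourly_cost


def break_even_gpu_hours(hourly_saving: float, debug_hours: float,
                         engineer_hourly_cost: float) -> float:
    """GPU-hours you must run before the cheaper offer pays back the debugging."""
    return debug_hours * engineer_hourly_cost / hourly_saving


# Placeholder inputs: save $0.30 per GPU-hour, lose one hour, time valued at $60/hr.
print(round(net_savings(0.30, gpu_hours=1, debug_hours=1, engineer_hourly_cost=60), 2))
print(round(break_even_gpu_hours(0.30, debug_hours=1, engineer_hourly_cost=60)))
```

Under those placeholder numbers, one lost hour takes roughly 200 GPU-hours of the cheaper rate to pay back. For a long training job that can still be worth it; for a one-hour experiment it is not.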

Do not forget storage and egress

The classic move is renting a GPU for a few hours, shutting it down, and leaving persistent volumes, snapshots, images, model checkpoints, and data transfer costs quietly nibbling at the card.

Before a run, decide what survives afterward.

After a run, delete what does not.

Deeply advanced stuff. Almost nobody does it.
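To see how fast the nibbling adds up, here is a tiny sketch that totals what leftover storage costs per month. The resource names, sizes, and the flat $0.10 per GB-month price are invented for illustration; real providers often price volumes, snapshots, and object storage differently.

```python
from typing import Mapping


def leftover_monthly_cost(leftover_gb: Mapping[str, float],
                          price_per_gb_month: float) -> float:
    """Monthly cost of everything that survived the run, at one flat price."""
    return sum(leftover_gb.values()) * price_per_gb_month


# Invented leftovers from a single forgotten run, in GB.
leftovers = {
    "persistent volume": 500,
    "snapshots": 200,
    "model checkpoints": 300,
}

monthly = leftover_monthly_cost(leftovers, price_per_gb_month=0.10)
gpu_bill = 3 * 2.00  # a 3-hour run at a hypothetical $2/hr

print(f"GPU bill for the run: ${gpu_bill:.2f}")
print(f"Leftover storage per month: ${monthly:.2f}")
```

In this made-up case the storage left behind costs more every month than the GPU run did in total, which is the whole argument for deleting it.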

My practical shortcut list

  • Debug on the cheapest compatible GPU first.
  • Move to H100/H200/B200 only when throughput or VRAM justifies it.
  • Compare throughput per dollar, not just price per hour.
  • Use spot or interruptible only if your job can resume cleanly.
  • Keep datasets near the compute when possible.
  • Check topology before multi-GPU training.
  • Kill idle boxes aggressively.
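"Throughput per dollar" from the list above is just one division once you have measured throughput on your own workload. A minimal sketch, with throughput and price figures that are invented and are not benchmarks of any real card:

```python
def work_per_dollar(throughput_per_sec: float, price_per_gpu_hour: float) -> float:
    """Units of work (tokens, samples, images) bought by one dollar."""
    if price_per_gpu_hour <= 0:
        raise ValueError("price_per_gpu_hour must be positive")
    return throughput_per_sec * 3600 / price_per_gpu_hour


# Invented measurements: (tokens per second, USD per GPU-hour).
candidates = {
    "boring-gpu": (1000.0, 0.80),
    "fancy-gpu": (2500.0, 2.50),
}

ranked = sorted(
    candidates,
    key=lambda name: work_per_dollar(*candidates[name]),
    reverse=True,
)

for name in ranked:
    print(name, f"{work_per_dollar(*candidates[name]):,.0f} tokens per dollar")
```

With these inputs the boring GPU buys about 4.5 million tokens per dollar and the fancy one about 3.6 million, even though the fancy one is 2.5 times faster. Swap in your own measurements; the ranking can easily flip.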

A small live reference

I am keeping a live GPU rental price board here:

https://gpu.fund

It tracks cloud GPU rental prices across providers and has a report page for quick market checks:

https://gpu.fund/report

I also wrote a longer hidden-cost checklist here:

https://gpu.fund/blog/hidden-costs-cloud-gpu-rentals

Prices move constantly, so treat any static example as stale the moment it is posted. The useful habit is the comparison process, not one magic cheapest provider.

If your model fits on the boring GPU, rent the boring GPU. The boring GPU has saved more startups than the inspirational tweet about scaling AI infrastructure.
