The hidden cost of cloud GPU training: egress, idle time, and lock-in

The GPU hourly rate is the number everyone compares. It is also the number that tells you the least about what a training run actually costs.

The sticker price, say $2 to $3.50 an hour for an H100 on a specialized cloud, is the visible tip. The real bill is built from three things almost nobody puts on the comparison spreadsheet: the GPU sitting idle, the data you have to move, and the cost of ever leaving. This post breaks down each one, with 2026 numbers, and what you can actually do about it.

1. Idle time: paying full price for nothing

The most expensive line item in most setups is not compute. It is compute you pay for but never use.

The 5 percent problem

A 2026 Cast AI report found that average GPU utilization across Kubernetes clusters on major clouds sits around 5 percent. Other analyses are kinder: Anyscale puts sustained production utilization below 50 percent, and FinOps studies land at 20 to 30 percent. The conclusion holds either way: most of every GPU-hour you pay for produces no useful work.

Why it happens

It is structural, not laziness. Workloads bounce between CPU preprocessing, GPU training, and CPU postprocessing. Python dataloaders on the GPU node starve the accelerator. Teams overprovision to dodge out-of-memory errors and default to the biggest instance "just in case."

Why it hurts more than CPU waste

An idle CPU costs cents per hour. An idle GPU costs dollars per hour. A single AWS p4d.24xlarge left idle over one weekend burns about $1,573 for nothing. A month of overnight and weekend idling typically wastes $3,000 to $8,000 per instance.
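The weekend figure is simple arithmetic. Here is a minimal sketch, assuming an on-demand rate of about $32.77 per hour for the p4d.24xlarge (an assumption, not a number from the report; check current pricing) and treating the weekend as 48 idle hours. The function name is illustrative.

```python
# Assumed on-demand rate for an AWS p4d.24xlarge in USD per hour.
# This is an assumption for illustration; verify against current pricing.
ASSUMED_P4D_HOURLY_USD = 32.77


def idle_cost(hourly_rate_usd: float, idle_hours: float) -> float:
    """Cost of an instance that is billed but doing no useful work."""
    return hourly_rate_usd * idle_hours


# A weekend treated as 48 fully idle hours.
weekend_usd = idle_cost(ASSUMED_P4D_HOURLY_USD, 48)
print(f"Idle weekend: ${weekend_usd:,.2f}")
```

Swapping in your own rate and your own count of idle hours per month gives the per-instance waste for your setup.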

What to do

Add idle detection. A script that watches nvidia-smi and scales down an instance after utilization stays below ~5 percent for 30 minutes is the highest-ROI thing most teams can ship. It commonly cuts 20 to 35 percent off GPU spend.
Right-size the hardware. Not every job needs an H100. Running on the biggest card when a smaller one delivers the same result is pure burn.
Fix the pipeline first. If dataloaders are starving the GPU, a bigger GPU does not help. Profile the input pipeline before upgrading hardware.
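The idle-detection idea can be sketched in a few lines. This is a rough outline under stated assumptions, not a production tool: it polls nvidia-smi through its `--query-gpu` interface, treats the node as idle once every GPU has stayed below the threshold for the whole window, and leaves the actual scale-down to an `on_idle` callback, which is hypothetical and depends on your provider. `IdleDetector` and `watch` are illustrative names.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def read_gpu_utilization() -> list[float]:
    """Return per-GPU utilization percentages as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Assumes every line is a plain number; some devices may report "[N/A]".
    return [float(line) for line in out.splitlines() if line.strip()]


class IdleDetector:
    """Reports idle once all GPUs stay below a threshold for a full window."""

    def __init__(self, threshold_pct: float = 5.0, window_s: float = 30 * 60):
        self.threshold_pct = threshold_pct
        self.window_s = window_s
        self.last_busy: Optional[float] = None  # time of last non-idle sample

    def update(self, utilizations: Sequence[float], now: float) -> bool:
        """Record one sample and return True if the idle window has elapsed."""
        busy = max(utilizations, default=0.0) >= self.threshold_pct
        # The first sample counts as busy so the window starts from there.
        if self.last_busy is None or busy:
            self.last_busy = now
        return now - self.last_busy >= self.window_s


def watch(on_idle: Callable[[], None], poll_s: float = 60.0) -> None:
    """Poll nvidia-smi until the node has been idle, then call on_idle once."""
    detector = IdleDetector()
    while True:
        if detector.update(read_gpu_utilization(), time.monotonic()):
            on_idle()  # hypothetical hook: stop or scale down the instance
            return
        time.sleep(poll_s)
```

Keeping the decision logic in `IdleDetector` separate from the nvidia-smi call makes it easy to test without a GPU and to feed from another metrics source later.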

2. Egress: the cost of moving your own data

Uploading data is free everywhere. Moving it out is not, and the asymmetry is deliberate.

The 2026 rates

Verified across multiple pricing surveys: AWS charges about $0.09 per GB outbound, roughly $90 per TB. Google Cloud is higher at $0.12 per GB. Azure sits in the same range. Hetzner includes large free allowances and charges on the order of $1 per TB beyond them. Some object-storage services charge no egress at all.

Why training amplifies it

The volumes are large and recurring. Datasets, checkpoints, and exported weights all move. On a workload pulling 10 TB out per month, the gap between a hyperscaler and a zero-egress provider is the difference between roughly $900 to $1,200 a month and almost nothing.
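To plug in your own volumes, a small calculator is enough. The sketch below uses the per-GB rates quoted in this post and makes two simplifying assumptions: 1 TB is counted as 1,000 GB, and free allowances and tiered discounts are ignored. The provider keys and function name are illustrative.

```python
# Flat per-GB outbound rates in USD, as quoted in this post.
# Free tiers and volume discounts are deliberately ignored here.
EGRESS_USD_PER_GB = {
    "aws": 0.09,
    "google_cloud": 0.12,
    "zero_egress": 0.0,
}


def monthly_egress_cost(tb_out: float, usd_per_gb: float,
                        gb_per_tb: int = 1000) -> float:
    """Egress bill for a month, given outbound volume in TB and a flat rate."""
    return tb_out * gb_per_tb * usd_per_gb


for provider, rate in EGRESS_USD_PER_GB.items():
    cost = monthly_egress_cost(10, rate)
    print(f"{provider}: ${cost:,.2f} per month for 10 TB out")
```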

What to do

Co-locate compute and storage. Keep training data in the same region and provider as the GPUs. Intra-zone transfer is usually free; internet egress is not.
Compress before transfer. Depending on the data, gzip or zstd can cut checkpoint and dataset volume by 30 to 60 percent, and egress is billed per byte.
Price egress into provider comparisons. A cheap GPU hour on a provider with expensive egress can lose to a pricier hour on one with none.
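Before counting on a compression saving, it is worth measuring it on your own files. The sketch below stream-compresses a file with gzip from the Python standard library and returns the compressed-to-original size ratio. gzip is used only because it ships with Python; zstd would follow the same pattern through a third-party package. The function name is illustrative, and the ratio depends entirely on the data.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path, dst: Path, level: int = 6) -> float:
    """Stream-compress src into dst; return compressed size / original size."""
    # copyfileobj streams in chunks, so large checkpoints never sit in memory.
    with src.open("rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    original = src.stat().st_size
    # An empty input has no meaningful ratio; report 1.0 (no saving).
    return dst.stat().st_size / original if original else 1.0
```

A ratio of 0.6 would mean 40 percent fewer bytes billed on the way out. If the ratio comes back close to 1.0, compression is not buying much for that file type.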

3. Lock-in: the bill you pay to leave

The third cost is the one you do not see until you try to escape it. It is the same mechanism as egress, viewed over a longer horizon.

Data gravity is the anchor

Once terabytes of data and checkpoints accumulate in one region, moving them is slow and expensive. The egress fee is not just a per-transfer cost. It is an exit tax that grows with every gigabyte you store. By design, the more your data piles up, the less likely you are to leave.

It is not only data

Proprietary services, custom tooling, and provider-specific orchestration all raise the cost of moving. But for training workloads, raw data gravity is the heaviest anchor.

What to do

Prefer portable formats and open tooling. Standard container images, open checkpoint formats, and provider-agnostic orchestration keep your options open.
Model the exit cost up front. Calculate what it would cost to move everything out at a year's expected data volume. If the number is alarming, factor it in now, not later.
Favor zero or low egress for data-heavy work. When leaving is cheap, lock-in mostly evaporates, and you keep leverage over your own infrastructure.
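Modeling the exit cost can be as simple as projecting stored volume forward and pricing a one-time move at the provider's egress rate. The sketch below assumes linear data growth and a flat per-GB rate. The starting volume and growth figure in the example are hypothetical placeholders, and the function name is illustrative.

```python
def exit_cost(initial_tb: float, growth_tb_per_month: float, months: int,
              usd_per_gb: float, gb_per_tb: int = 1000) -> float:
    """One-time egress bill for moving all stored data out after `months`."""
    # Linear growth is an assumption; real datasets often grow in bursts.
    stored_tb = initial_tb + growth_tb_per_month * months
    return stored_tb * gb_per_tb * usd_per_gb


# Hypothetical project: 20 TB today, growing 5 TB a month, priced at the
# $0.09 per GB rate quoted earlier in this post.
year_one_exit_usd = exit_cost(initial_tb=20, growth_tb_per_month=5,
                              months=12, usd_per_gb=0.09)
print(f"Exit cost after 12 months: ${year_one_exit_usd:,.2f}")
```

Running the same projection with a zero-egress rate shows how much of the lock-in is purely a pricing decision.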

Putting it together

The headline GPU rate is the smallest part of the story. A realistic cost model looks more like this:

(GPU rate x useful GPU-hours / utilization) + storage + egress + the eventual cost of leaving
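One way to turn a cost model like this into numbers is sketched below. It reads the GPU term as the hourly rate times billed hours, where billed hours are the useful hours divided by utilization. That is an interpretation for illustration, not the only possible one. Every figure in the example is hypothetical, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrainingCost:
    gpu_rate_usd_per_hour: float
    useful_gpu_hours: float   # hours of real work the run needs
    utilization: float        # fraction of billed time doing useful work
    storage_usd: float
    egress_usd: float
    exit_usd: float           # eventual one-time cost of leaving

    def __post_init__(self) -> None:
        if not 0 < self.utilization <= 1:
            raise ValueError("utilization must be in (0, 1]")

    def gpu_usd(self) -> float:
        """GPU bill: low utilization inflates billed hours for the same work."""
        return self.gpu_rate_usd_per_hour * self.useful_gpu_hours / self.utilization

    def total_usd(self) -> float:
        return self.gpu_usd() + self.storage_usd + self.egress_usd + self.exit_usd


# Hypothetical run: 1,000 useful H100-hours at $2.50 with 25% utilization.
run = TrainingCost(gpu_rate_usd_per_hour=2.50, useful_gpu_hours=1000,
                   utilization=0.25, storage_usd=300, egress_usd=900,
                   exit_usd=7200)
print(f"GPU: ${run.gpu_usd():,.2f}  Total: ${run.total_usd():,.2f}")
```

In this made-up example the sticker price would suggest $2,500 of compute, while the billed GPU term is four times that before storage, egress, or exit costs are added.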

Two things follow. First, fixing utilization is usually the fastest win, because you are already paying for that waste today. Second, egress and lock-in are decisions you make once, at the start, that compound for as long as the project runs.

This is why the provider landscape is shifting. Specialized GPU clouds and regional providers increasingly compete on exactly these hidden costs: transparent hourly billing instead of a maze of ancillary fees, and zero egress so your data, and your freedom to move it, stays yours. Orion AI Factory in Europe is one example of that model, and the same logic shows up across a growing set of regional and specialized providers. The common thread is pricing the things that used to hide in the footnotes.

None of this needs exotic tooling. Watch your utilization, keep data close to compute, compress what you move, and know your exit cost before you are locked in. The teams that win on cost are not the ones with the biggest budgets. They are the ones who read past the hourly rate.

References

Cast AI, GPU utilization report, 2026
Anyscale, production GPU utilization analysis, January 2026
GPUPerHour, data egress pricing across 44+ providers (https://gpuperhour.com/reference/data-egress), April 2026
LeanOps, AI cloud cost optimization guide, 2026