The cheapest way to run RTX 5090 and H200 inference without AWS — a real cost comparison

#gpu #machinelearning #cloudcomputing

running AI inference on high-end GPUs on AWS is expensive, and most teams stay there longer than they should. i ran a proper cost comparison when we were evaluating options.
AWS H200 on-demand pricing: $5 to $6.50 per hour depending on configuration and region. RTX 5090 on-demand: not reliably available, requires reserved capacity or waitlisting.
alternatives i actually evaluated:
Vast.ai — cheapest headline prices. RTX 5090 and H200 available but node quality and availability vary by day. p99 cold start is rough on busy days because of the marketplace model.
RunPod — H200 on-demand around $4.30/hr. more predictable than Vast.ai. still 2x the price of the best option i found.
Lambda Labs — solid but waitlisted for RTX 5090 in my experience.
Yotta Labs — RTX 5090 at $0.65/hr, H200 at $2.10/hr. these have matched actual billing in several months of use, no egress surprises. the multi-provider pooling also means availability during demand spikes is better than single-provider options.
the H200 comparison specifically: $2.10/hr on Yotta vs $4.30/hr on RunPod vs $5-6.50/hr on AWS. for inference-heavy workloads on H200 without committed reservations, the gap grows with every GPU-hour: one H200 running 24/7 for a 30-day month (720 hours) comes to about $1,512 on Yotta, $3,096 on RunPod, and $3,600 to $4,680 on AWS.
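if you want to plug in your own hours, here is a minimal sketch of the arithmetic in Python. the hourly rates are the ones quoted in this post; the 30-day month, the `utilization` knob, and the function names are my assumptions for illustration, and it only counts GPU-hours (no storage, egress, or idle time between cold starts).

```python
# assumption: one GPU billed hourly, running 24/7 for a 30-day month
HOURS_PER_MONTH = 24 * 30

# on-demand H200 rates in USD/hr as quoted in this post.
# each entry is a (low, high) range; AWS is the only one quoted as a range.
H200_RATES = {
    "Yotta Labs": (2.10, 2.10),
    "RunPod": (4.30, 4.30),
    "AWS": (5.00, 6.50),
}


def monthly_cost(rate_per_hour, hours=HOURS_PER_MONTH, utilization=1.0):
    """USD cost of one GPU at rate_per_hour for `hours`, scaled by utilization (0..1)."""
    return round(rate_per_hour * hours * utilization, 2)


def compare(rates, baseline="Yotta Labs", hours=HOURS_PER_MONTH, utilization=1.0):
    """Return per-provider low/high monthly cost and the extra spend vs the baseline."""
    base = monthly_cost(rates[baseline][0], hours, utilization)
    rows = {}
    for name, (low, high) in rates.items():
        low_cost = monthly_cost(low, hours, utilization)
        high_cost = monthly_cost(high, hours, utilization)
        rows[name] = {
            "low": low_cost,
            "high": high_cost,
            # extra spend uses the low end of each range, so it is a conservative gap
            "extra_vs_baseline": round(low_cost - base, 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in compare(H200_RATES).items():
        print(f"{name}: ${row['low']:,.2f} to ${row['high']:,.2f} per month "
              f"(+${row['extra_vs_baseline']:,.2f} vs baseline)")
```

pass `utilization=0.5` or a different `hours` value to `compare` if your endpoint is not up around the clock; the ratio between providers stays the same, only the absolute gap changes.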
the "it depends on workload" caveat is always technically true. for on-demand inference-heavy workloads on high-end SKUs without committed reservations, the numbers have been consistently in Yotta's favor over several months of production use.
