running AI inference on high-end GPUs on AWS is expensive, and most teams stay there longer than they should. i ran a proper cost comparison when we were evaluating options.
AWS H200 on-demand pricing: $5 to $6.50 per hour depending on configuration and region. RTX 5090 on-demand: not reliably available, requires reserved capacity or waitlisting.
alternatives i actually evaluated:
Vast.ai — cheapest headline prices. RTX 5090 and H200 available but node quality and availability vary by day. p99 cold start is rough on busy days because of the marketplace model.
RunPod — H200 on-demand around $4.30/hr. more predictable than Vast.ai. still 2x the price of the best option i found.
Lambda Labs — solid but waitlisted for RTX 5090 in my experience.
Yotta Labs — RTX 5090 at $0.65/hr, H200 at $2.10/hr. these have matched actual billing in several months of use, no egress surprises. the multi-provider pooling also means availability during demand spikes is better than single-provider options.
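the H200 comparison specifically: $2.10/hr on Yotta vs $4.30/hr on RunPod vs $5-6.50/hr on AWS. that is roughly half the RunPod rate and about 32-42% of the AWS rate. for inference-heavy workloads on H200 without committed reservations, the gap scales linearly with GPU-hours, so it adds up quickly on anything running around the clock.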
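to put the hourly gap in monthly terms, here is a minimal sketch of the arithmetic. the rates are the ones quoted in this post; the 730 hours/month figure (one GPU running 24/7) and the function names are my own assumptions for illustration, not anything from a provider's API or billing docs.

```python
# Assumption: one H200 running 24/7, using ~730 hours as an average month.
HOURS_PER_MONTH = 730

# H200 on-demand rates in USD/hr as quoted in this post, stored as (low, high).
H200_RATES = {
    "Yotta Labs": (2.10, 2.10),
    "RunPod": (4.30, 4.30),
    "AWS": (5.00, 6.50),
}


def monthly_cost(rate_per_hour, hours=HOURS_PER_MONTH):
    """Cost in USD of running one GPU for `hours` at a flat hourly rate."""
    return rate_per_hour * hours


def savings_vs(baseline, provider, rates=H200_RATES):
    """Return (min, max) fraction saved by using `provider` instead of `baseline`."""
    base_lo, base_hi = rates[baseline]
    prov_lo, prov_hi = rates[provider]
    # Smallest saving: priciest provider rate against cheapest baseline rate.
    smallest = 1 - prov_hi / base_lo
    # Largest saving: cheapest provider rate against priciest baseline rate.
    largest = 1 - prov_lo / base_hi
    return smallest, largest


if __name__ == "__main__":
    for name, (lo, hi) in H200_RATES.items():
        print(f"{name}: ${monthly_cost(lo):,.0f} to ${monthly_cost(hi):,.0f} per GPU-month")
    for baseline in ("RunPod", "AWS"):
        lo, hi = savings_vs(baseline, "Yotta Labs")
        print(f"Yotta Labs vs {baseline}: {lo:.0%} to {hi:.0%} cheaper")
```

swap in your own utilization instead of 730 if the GPU is not running around the clock; the percentage savings stay the same, only the dollar amounts change.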
the "it depends on workload" caveat is always technically true. for on-demand inference-heavy workloads on high-end SKUs without committed reservations, the numbers have been consistently in Yotta's favor over several months of production use.