DEV Community

EVAL #005: GPU Cloud Showdown — Lambda Labs vs CoreWeave vs RunPod vs Vast.ai vs Modal vs AWS/GCP/Azure


By Ultra Dune | March 2026


You need GPUs. You needed them yesterday. And the market is a mess.

Between NVIDIA's supply constraints, the explosion of foundation model training, and every startup pivoting to "AI-native," GPU compute has become the most contested resource in tech. The problem isn't just availability — it's the sheer confusion of options. There are now roughly two dozen GPU cloud providers, each with wildly different pricing models, GPU generations, billing granularity, and developer experience.

We spent the last three weeks benchmarking, price-comparing, and stress-testing six categories of GPU cloud: the AI-native specialists (Lambda Labs, CoreWeave), the marketplace/community plays (RunPod, Vast.ai), the serverless-first platform (Modal), and the hyperscalers (AWS, GCP, Azure). This is the no-BS breakdown.

If you're spending more than $500/month on GPU compute — or about to start — this issue will save you real money and real headaches.


The Comparison Table

Prices shown are on-demand per GPU-hour for the most common training GPU (H100 80GB SXM) unless noted. Prices fluctuate; these reflect March 2026 averages.

| Provider | H100 80GB $/hr | A100 80GB $/hr | H200 $/hr | Min billing | Multi-node | Persistent storage | API/CLI |
|---|---|---|---|---|---|---|---|
| Lambda Labs | $2.49 | $1.29 | $3.49 | 1 hr | Yes (8x) | Yes (NFS) | CLI + API |
| CoreWeave | $2.23 | $1.02 | $3.19 | 10 min | Yes (256+) | Yes (block + NFS) | K8s native |
| RunPod | $2.39 | $1.09 | $3.29 | 1 min | Yes (8x) | Yes (network vol) | CLI + API |
| Vast.ai | $1.40–2.10* | $0.70–1.05* | $2.20–3.00* | 1 min | Limited | Yes (local) | CLI + API |
| Modal | $2.89 | $1.38 | n/a | 1 sec | No | Ephemeral (volumes) | Python SDK |
| AWS (p5) | $3.22 | $1.64 | $3.89 | 1 sec | Yes (EFA) | Yes (EBS/EFS) | Full SDK |
| GCP (a3) | $3.06 | $1.54 | $3.69 | 1 min | Yes (GPUDirect) | Yes (PD/Filestore) | Full SDK |
| Azure (ND) | $3.19 | $1.59 | $3.79 | 1 min | Yes (IB) | Yes (managed disk) | Full SDK |

*Vast.ai prices vary by host — shown as typical range. Community GPUs can go lower but reliability varies.

Reserved/committed pricing can cut these by 30–60% across all providers.
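The catch with reserved pricing: you pay for every hour, used or not. A quick back-of-envelope sketch shows where the break-even sits, using CoreWeave's table rates as the example (the 60% utilization figure is a placeholder, plug in your own):

```python
# Sketch: on-demand vs 1-year reserved H100 cost at a given utilization.
# Rates are the March 2026 CoreWeave figures from the table above.

HOURS_PER_YEAR = 8760

def annual_cost(rate_per_hr, utilization, reserved=False):
    """On-demand bills only the hours you use; reserved bills all hours."""
    if reserved:
        return rate_per_hr * HOURS_PER_YEAR  # you pay for the full year
    return rate_per_hr * HOURS_PER_YEAR * utilization

# CoreWeave: $2.23/hr on-demand vs ~$1.45/hr with a 1-year commit
on_demand = annual_cost(2.23, utilization=0.60)
reserved = annual_cost(1.45, utilization=0.60, reserved=True)
print(f"on-demand: ${on_demand:,.0f}  reserved: ${reserved:,.0f}")

# Reserved wins once utilization exceeds reserved_rate / on_demand_rate.
break_even = 1.45 / 2.23
print(f"reserved pays off above {break_even:.0%} utilization")
```

At 60% utilization the on-demand GPU is still cheaper; the commit only pays off above roughly 65%. Run your own utilization number before signing anything.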


Per-Provider Analysis

Lambda Labs — The Developer's Default

Lambda built its reputation on simplicity. You sign up, you get a VM with GPUs, PyTorch is pre-installed, and you SSH in. That's it. No Kubernetes expertise required, no IAM policy nightmares, no 47-click console workflow.

What's good: Fastest time-to-GPU in the industry. Their on-demand H100 clusters are genuinely available most of the time now (a huge improvement over 2024's waitlists). The 1-Click Clusters feature for multi-node training actually works — InfiniBand networking, NCCL optimized, ready to go. Their persistent storage via NFS is simple and reasonably priced at $0.10/GB/month.

What's not: Pricing is mid-range, not cheap. No spot/preemptible instances. Region selection is limited (mostly US). The platform is intentionally bare-bones — if you want autoscaling, load balancers, or managed Kubernetes, look elsewhere. Minimum 1-hour billing stings for quick experiments.

Best for: Individual researchers, small teams doing fine-tuning and training runs. People who value simplicity over features.
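Lambda's API is as minimal as the platform. A hedged sketch of launching an instance over HTTP, endpoint and field names follow Lambda's public Cloud API as of this writing, but verify against current docs; the instance type, region, and SSH key name are placeholders:

```python
import json
import urllib.request

# Sketch: build (not send) a Lambda Cloud launch request. Field names
# follow Lambda's public API docs; values below are placeholders.
API_BASE = "https://cloud.lambdalabs.com/api/v1"

def build_launch_request(api_key, instance_type, region, ssh_key):
    payload = {
        "region_name": region,
        "instance_type_name": instance_type,
        "ssh_key_names": [ssh_key],
    }
    return urllib.request.Request(
        f"{API_BASE}/instance-operations/launch",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_launch_request("LAMBDA_API_KEY", "gpu_8x_h100_sxm5", "us-east-3", "my-key")
print(req.full_url)  # pass to urllib.request.urlopen(req) to actually launch
```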

CoreWeave — The Serious Training Infrastructure

CoreWeave is where you go when you need 64+ GPUs and you know what you're doing. Built on Kubernetes from the ground up, they offer the most infrastructure-grade GPU cloud outside the hyperscalers — arguably better than the hyperscalers for pure GPU workloads.

What's good: Best price-to-performance ratio for large-scale training. InfiniBand networking across massive clusters (they run some of the largest H100 clusters commercially available). Kubernetes-native means you get all the orchestration tooling you already know. 10-minute minimum billing is fair. Their reserved pricing (1-year commit) drops H100s to ~$1.45/hr — hard to beat. Virtual server and bare-metal options. They now have solid H200 availability.

What's not: The learning curve is real. If you're not comfortable with Kubernetes, Helm charts, and persistent volume claims, you'll struggle. Onboarding requires approval and can take days. Documentation has improved but still has gaps. Not great for quick one-off experiments.

Best for: Funded startups and companies doing serious pre-training and large-scale fine-tuning. Teams with DevOps/MLOps capability.
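To make "Kubernetes-native" concrete: your training job is a pod spec like the one below. The `nvidia.com/gpu` resource name is the standard NVIDIA device-plugin resource; the image, command, and sizes are placeholders, and CoreWeave adds its own node-affinity labels on top (check their docs). Built as a plain dict here so the shape is obvious:

```python
import json

# Sketch of a minimal Kubernetes pod requesting 8 GPUs, the shape
# CoreWeave workloads take. Image and resource sizes are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-8xh100"},
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "ghcr.io/example/trainer:latest",  # placeholder image
            "command": ["torchrun", "--nproc_per_node=8", "train.py"],
            "resources": {
                "limits": {"nvidia.com/gpu": 8, "memory": "512Gi", "cpu": 64},
            },
        }],
        "restartPolicy": "Never",
    },
}

print(json.dumps(pod, indent=2))  # kubectl apply -f accepts this JSON directly
```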

RunPod — The Versatile Middle Ground

RunPod has quietly become one of the best all-around GPU clouds. They offer both "GPU Pods" (persistent VMs) and "Serverless" (pay-per-second inference endpoints), covering the full spectrum from training to deployment.

What's good: Per-minute billing keeps costs tight. The template system is excellent — spin up a pod with your exact environment in seconds. Serverless GPU for inference is genuinely competitive with dedicated endpoints at lower traffic. Community Cloud option gives access to cheaper GPUs (like Vast.ai but curated). Their Spot instances save 30–50%. Good GPU variety: everything from RTX 4090s ($0.44/hr) to H100s. Network volumes let you persist data across pods.

What's not: Multi-node training is limited to 8-GPU nodes; no InfiniBand across nodes. Networking bandwidth between pods isn't great. Uptime SLAs are weaker than CoreWeave or hyperscalers. Support can be slow on lower tiers.

Best for: Teams that need both training and inference. Indie developers and small startups. Anyone who wants flexibility without hyperscaler complexity.
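For a feel of the Serverless side: a deployed endpoint is one authenticated POST. The `/runsync` route and `{"input": ...}` envelope follow RunPod's serverless API; the endpoint ID and payload below are placeholders, so treat this as a sketch and check current docs:

```python
import json
import urllib.request

# Sketch: build (not send) a RunPod Serverless invocation. The /runsync
# route returns the result synchronously; /run would queue it instead.
def build_runsync_request(api_key, endpoint_id, payload):
    return urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps({"input": payload}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_runsync_request("RUNPOD_API_KEY", "my-endpoint-id", {"prompt": "hello"})
print(req.full_url)  # pass to urllib.request.urlopen(req) to actually invoke
```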

Vast.ai — The Marketplace Wild Card

Vast.ai is the "Airbnb of GPUs" — a marketplace where anyone with spare GPU capacity can list it, and you rent at market-clearing prices. This makes it the cheapest option, period. But cheap comes with caveats.

What's good: Prices can be 40–60% below on-demand rates elsewhere. A100 80GB for $0.70/hr is real. You can filter by GPU model, VRAM, CPU, RAM, disk, region, reliability score, and internet speed. On-demand and interruptible (bid) pricing. Great for batch jobs, data preprocessing, and inference workloads that tolerate interruption. DLPerf scores help you compare actual GPU performance across hosts.

What's not: Reliability is the elephant in the room. Hosts can go offline. Network speeds vary wildly. Storage is local to the machine — if it goes down, your data may too (always use external backups). Multi-node training is essentially unsupported. No InfiniBand. Security is a concern — you're running on someone else's hardware with limited auditability. Support is community-forum level.

Best for: Cost-optimized batch processing, experimentation, and hobbyist/research workloads where interruption is acceptable. Not for production.
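The practical workflow on Vast is filtering, and the filter that matters is price per unit of performance with a reliability floor. A sketch with hypothetical offers (the real data comes from their search CLI/API, which exposes these same fields):

```python
# Sketch: pick a Vast.ai offer by $/DLPerf among sufficiently reliable
# hosts. The offers below are made-up examples of the fields Vast exposes
# (dollars-per-hour, DLPerf score, host reliability).
offers = [
    {"id": 101, "dph": 1.40, "dlperf": 190.0, "reliability": 0.92},
    {"id": 102, "dph": 1.85, "dlperf": 205.0, "reliability": 0.99},
    {"id": 103, "dph": 1.10, "dlperf": 150.0, "reliability": 0.71},  # cheap but flaky
]

def best_offer(offers, min_reliability=0.90):
    """Cheapest dollars per unit of DLPerf among reliable-enough hosts."""
    eligible = [o for o in offers if o["reliability"] >= min_reliability]
    return min(eligible, key=lambda o: o["dph"] / o["dlperf"])

print(best_offer(offers)["id"])
```

Note the cheapest raw $/hr host (103) loses on both counts here: drop the reliability floor and it wins on $/DLPerf, which is exactly how people get burned.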

Modal — The Serverless Dream (With Limits)

Modal is architecturally different from everything else on this list. There are no VMs to manage. You write Python, decorate functions with GPU requirements, and Modal handles the rest — cold starts, scaling, scheduling, storage. It's the closest thing to "GPU-as-a-function."

What's good: Developer experience is extraordinary. The Python SDK is a joy. Cold starts are fast (often under 15 seconds for common images). Per-second billing means you pay only for actual compute — no idle time. Built-in secrets management, cron scheduling, web endpoints, and shared volumes. Great for inference APIs, batch processing pipelines, and fine-tuning jobs. Scales to zero automatically.

What's not: No SSH access — you must work within Modal's programming model. Long-running training jobs (12+ hours) feel awkward. No multi-node training support. H200/B200 availability is behind other providers. Pricing per GPU-hour is higher than competitors, but effective cost can be lower due to zero idle time. Vendor lock-in is real — your code is Modal-specific.

Best for: Inference APIs, scheduled batch jobs, fine-tuning pipelines. Teams that prioritize developer velocity over raw cost. The best option if you hate managing infrastructure.
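On that pricing point, the higher sticker rate can still net out cheaper once idle time enters the math. A sketch with a bursty workload that keeps the GPU busy 20 minutes per hour, using the table's H100 rates:

```python
# Sketch: per-second billing vs a 1-hour-minimum VM for bursty inference.
# Per-second billing charges only busy time; the VM bills the idle hour.
def hourly_cost(rate_per_hr, busy_fraction, bills_idle):
    return rate_per_hr * (1.0 if bills_idle else busy_fraction)

modal = hourly_cost(2.89, busy_fraction=20 / 60, bills_idle=False)
lambda_vm = hourly_cost(2.49, busy_fraction=20 / 60, bills_idle=True)
print(f"per-second: ${modal:.2f}/hr  1-hr-minimum VM: ${lambda_vm:.2f}/hr")
```

At a third busy, the "expensive" option costs well under half as much. The crossover is at busy fractions above roughly 2.49/2.89, about 86%, where the cheaper always-on VM wins.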

AWS / GCP / Azure — The Enterprise Tax

The hyperscalers need no introduction. They're the most expensive option for raw GPU compute, but they offer things nobody else can: global regions, compliance certifications (HIPAA, SOC2, FedRAMP), managed ML platforms (SageMaker, Vertex AI, Azure ML), and integration with the rest of your cloud stack.

What's good: If you're already on AWS/GCP/Azure, the network effects are real — VPC peering, IAM, managed storage, logging, monitoring all just work. Spot/preemptible instances can cut costs 60–70% (p5 spot H100 at ~$1.30/hr on AWS when available). Managed training services reduce ops burden. Enterprise support and SLAs. Multi-region availability. GCP's A3 Mega instances with H200s and GPUDirect-TCPXO networking are genuinely impressive for large-scale training.

What's not: On-demand pricing is 30–50% more than specialists. Spot availability for GPU instances is unreliable — you'll get interrupted. The console/API complexity is legendary. Quotas are real: getting approval for 8x H100 instances can take weeks. Billing surprises from associated resources (networking, storage, data transfer) add up fast. You're paying the enterprise tax whether you need enterprise features or not.

Best for: Companies with existing cloud commitments, compliance requirements, or need for managed ML pipelines. Large enterprises. Teams already deep in a hyperscaler ecosystem.


The Recommendation Matrix

Here's who should use what, cut to the bone:

"I'm one person fine-tuning models"
→ RunPod or Lambda Labs. RunPod if cost matters. Lambda if you want zero friction.

"We're training a foundation model (100+ GPUs)"
→ CoreWeave. Full stop. Reserved pricing, InfiniBand, Kubernetes orchestration. If you need compliance, GCP A3 Mega.

"We need inference endpoints in production"
→ Modal for serverless/bursty traffic. RunPod Serverless for steady-state. AWS SageMaker if you're already on AWS.

"We're a research lab with tight budgets"
→ Vast.ai for batch experiments. Lambda for cluster work. RunPod Community Cloud as a middle option.

"We're an enterprise with compliance needs"
→ AWS/GCP/Azure. No alternative until CoreWeave finishes SOC2 (expected mid-2026).

"I just want the cheapest H100 hours possible"
→ Vast.ai interruptible, then CoreWeave reserved (1-year commit), then RunPod Spot.

"I hate DevOps and just want to ship"
→ Modal. It's not even close.


The Changelog

Notable releases and updates from the past two weeks:

  1. PyTorch 2.7 released — Improved torch.compile performance (+15% on transformer workloads), native FP4 quantization support, and better FSDP2 integration. Upgrade if you're doing distributed training.

  2. Ollama 0.9 ships structured outputs — JSON mode with schema validation built into the inference engine. Also adds vision model batching. The local inference stack keeps getting better.

  3. vLLM 0.8.2 adds speculative decoding by default — Up to 2.4x throughput on supported models (Llama 3.3, Qwen 2.5, Mistral Large). Draft model selection is automatic.

  4. Hugging Face Inference Endpoints now supports H200 — 2x the VRAM bandwidth vs H100. Noticeable for large-batch inference on 70B+ models.

  5. NVIDIA CUDA 12.8 released — Adds FP4 tensor core support for Blackwell, improved memory pool APIs, and better MIG (Multi-Instance GPU) flexibility for H100/H200.

  6. CoreWeave launches EU region (London) — H100 and H200 availability in LHR-1. Important for data residency requirements under GDPR.

  7. Weights & Biases launches Weave 2.0 — Production LLM monitoring and tracing, now with cost tracking per inference call. Integrates with OpenAI, Anthropic, and open-source endpoints.


The Signal

Trends worth watching:

GPU prices are finally dropping — but not uniformly. H100 spot prices have fallen 25% since January as H200 and B200 supply ramps up. But on-demand pricing has barely moved. The gap between spot and on-demand is widening, which means the market is bifurcating: sophisticated buyers with fault-tolerant workloads are getting deals, while everyone else pays full price. If you're not building spot/preemptible into your training pipeline, you're leaving money on the table.
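The minimum machinery for "building spot into your pipeline" is checkpoint-and-resume: save state every N steps, and on (re)start pick up from the last checkpoint instead of step 0. A toy sketch (a real job would save model and optimizer state, e.g. via torch.save, to durable storage rather than a local temp dir):

```python
import json
import os
import tempfile

# Sketch: preemption-tolerant training loop. Steps are simulated;
# die_at models the provider reclaiming the spot instance.
CKPT_EVERY = 100

def run(total_steps, ckpt_path, die_at=None):
    step = 0
    if os.path.exists(ckpt_path):               # resume after preemption
        with open(ckpt_path) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        if die_at is not None and step >= die_at:
            return step                          # simulated spot reclaim
        step += 1                                # one "training step"
        if step % CKPT_EVERY == 0:
            with open(ckpt_path, "w") as f:      # durable in real life
                json.dump({"step": step}, f)
    return step

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = run(1000, path, die_at=350)   # preempted at step 350
second = run(1000, path)              # resumes from step 300, not 0
print(first, second)
```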

The "GPU cloud" category is consolidating. Two smaller providers (Paperspace/DigitalOcean GPU and Jarvis Labs) have quietly shut down or frozen new signups in Q1 2026. CoreWeave's IPO has given it a war chest. Lambda raised another round. The marketplace models (Vast, RunPod Community) are aggregating long-tail supply. Expect 2-3 more exits or acqui-hires by year end. Pick providers with staying power.

Inference is eating training's lunch in GPU demand. Morgan Stanley estimates that by late 2026, 70% of GPU cloud spend will be inference, not training. This explains why every provider is shipping inference-optimized products (Modal's serverless, RunPod's endpoints, CoreWeave's NVIDIA Triton integration). If you're building a GPU cloud strategy, optimize for inference economics first — that's where your long-term spend will be.


Subscribe

EVAL is a weekly signal for AI engineers who build. No hype, no hand-waving — just what matters.

Subscribe on Buttondown: https://buttondown.com/ultradune
Explore the repo: https://github.com/softwealth/eval-report-skills

If this issue saved you time or money, forward it to someone who's about to spin up their first GPU cluster. They'll thank you.

— Ultra Dune


EVAL #005 | March 2026 | GPU Cloud Showdown
