Thurmon Demich

Posted on May 31 • Originally published at bestgpuforai.com

RTX 5090 for AI in 2026: 6-Month Honest Retrospective

#rtx5090 #ai #retrospective #blackwell

This article was originally published on Best GPU for AI. The full version with interactive tools, FAQ, and live pricing is on the original site.

Quick answer: The RTX 5090 earned its keep for VRAM-bound work: Llama 70B at Q4, Flux.2 in FP16, beefier LoRA batches. But for SDXL and Flux dev image generation and most hobbyist workflows, the $400 premium over a 4090 wasn't worth it. Six months in, we recommend the 5090 only if you're VRAM-limited.

Who this is for

This is for the person sitting on a perfectly good RTX 4090 wondering if the 5090 upgrade is worth $2,000. Or the buyer choosing between them for a fresh AI rig. We've run both daily since January 2026 — SDXL, Flux, Llama 70B inference, LoRA training, some PyTorch research code — and the answer is more nuanced than the launch-week reviews implied.

What actually improved (the wins)

Let's start with what the 5090 genuinely changed.

32GB GDDR7 is the headline, and it earned it. The 4090's 24GB is a real ceiling, and we hit it constantly. Llama 70B at Q4_K_M needs ~40GB, so it does not fit entirely in VRAM on either card: on a 5090 it runs at usable speeds with some layers offloaded, while on a 4090 you're stuck at Q3 or splitting layers to CPU and watching tok/s collapse. Flux.2 at FP16 (unquantized 16-bit weights, not the cut-down FP8 version) wants ~28GB. SDXL LoRA training with batch size 4 and the full text encoder unfrozen? The 4090 OOMs; the 5090 fits with headroom. This is real, not marketing.
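The fit-or-offload reasoning above can be sketched with back-of-envelope arithmetic. This is a rough estimate, not a measurement: the 4.5 bits-per-weight figure for a Q4-class quant and the 2GB allowance for KV cache and runtime overhead are assumptions chosen for illustration, and the function names are hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


def offload_fraction(params_billion: float, bits_per_weight: float,
                     vram_gb: float, overhead_gb: float = 2.0) -> float:
    """Share of the weights that would have to live outside VRAM."""
    need = weight_gb(params_billion, bits_per_weight)
    spill = max(0.0, need + overhead_gb - vram_gb)
    return spill / need


if __name__ == "__main__":
    # Assumption: ~4.5 bits per weight for a Q4-class quant of a 70B model.
    for card, vram in (("RTX 4090", 24.0), ("RTX 5090", 32.0)):
        frac = offload_fraction(70, 4.5, vram)
        print(f"{card}: fits={fits(70, 4.5, vram)}, offloaded={frac:.0%}")
```

Under these assumptions a 70B model lands near the ~40GB quoted above, and the 5090 has to offload a noticeably smaller share of the weights than the 4090, which is the gap the tok/s numbers reflect.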

FP8 training is finally usable on consumer hardware. Blackwell's FP8 tensor cores aren't just a checkbox: we've trained LoRAs in FP8 with measurable VRAM savings and ~1.7x throughput vs BF16 on the same card. The 4090's Ada tensor cores also support FP8, but in our experience the Transformer Engine path on that card is clunky and the speedup evaporates. If you're serious about training, this matters. We've covered this trade-off in more depth in our best GPU for PyTorch guide, where FP8 support genuinely shifts the recommendation.

Memory bandwidth at 1,792 GB/s vs 1,008 GB/s is a real LLM inference boost. Llama 70B Q4 went from "barely usable" (~12-15 tok/s with offloading on a 4090) to "actually fine" (~35-40 tok/s on a 5090, with far less of the ~40GB model spilling out of VRAM). For a 13B model, the 5090 hits ~140 tok/s vs ~95 on the 4090, a gain of about 47%. For models that fit on both cards, a 35-46% spread on LLM tok/s is consistent across our RTX 4090 vs 5090 head-to-head testing and matches the Knightli benchmarks the community has been citing.
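To see how much of the bandwidth jump actually shows up in tok/s, the figures quoted above can be compared directly. This is a minimal sketch: the bandwidth and 13B tok/s numbers are the article's own, while `decode_ceiling_tok_s` is a hypothetical helper based on the common rule of thumb that single-stream decoding reads every weight once per token, so it is an upper bound, not a prediction.

```python
def pct_gain(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rule-of-thumb upper bound on decode speed for a memory-bound model."""
    return bandwidth_gb_s / model_gb


# Figures quoted in the text above.
BANDWIDTH_GB_S = {"RTX 4090": 1008.0, "RTX 5090": 1792.0}
TOK_S_13B = {"RTX 4090": 95.0, "RTX 5090": 140.0}

if __name__ == "__main__":
    bw = pct_gain(BANDWIDTH_GB_S["RTX 5090"], BANDWIDTH_GB_S["RTX 4090"])
    tok = pct_gain(TOK_S_13B["RTX 5090"], TOK_S_13B["RTX 4090"])
    print(f"bandwidth gain: {bw:.1f}%")
    print(f"13B tok/s gain: {tok:.1f}%")
    print(f"share of bandwidth gain realized: {tok / bw:.0%}")
```

The measured gain at 13B comes out well below the raw bandwidth gain, which fits the picture of a card whose software stack has not caught up with its memory system yet.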

For AI research workloads, the 32GB unlocks experiments you couldn't run before. Bigger context windows, full-fp16 attention on longer sequences, MoE models with more experts loaded — the kind of work we cover in our best GPU for AI research deep-dive simply wasn't possible on a 4090 without dropping to quantization or splitting across cards. If you're doing actual research workloads (not just running pretrained models), the 5090's VRAM is a structural advantage.

What didn't change as much as advertised

Now the contrarian half — and this is where the launch-week articles oversold the card.

Image generation only improved 20-25% by our headline estimate. Flux dev went from 18s to ~14s (about 22% less time per image). SDXL went from 6.5s/img to ~4.0s/img, which works out to a larger drop of about 38%. That's nice, but not life-changing. If you're cranking out images, you'll notice. If you generate a few per session, you genuinely will not feel $400. Look: if you only do image gen, save your $400 and buy a 4090.
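A speedup can be quoted either as time saved per image or as extra images per hour, and the two give different percentages for the same timings. The sketch below applies both to the per-image times quoted above; the function names are illustrative.

```python
def time_saved_pct(old_s: float, new_s: float) -> float:
    """Percentage reduction in seconds per image."""
    return (old_s - new_s) / old_s * 100.0


def throughput_gain_pct(old_s: float, new_s: float) -> float:
    """Percentage increase in images per unit time."""
    return (old_s / new_s - 1.0) * 100.0


# Seconds per image as quoted in the text: (RTX 4090, RTX 5090).
TIMINGS = {"SDXL": (6.5, 4.0), "Flux dev": (18.0, 14.0)}

if __name__ == "__main__":
    for model, (old_s, new_s) in TIMINGS.items():
        print(f"{model}: {time_saved_pct(old_s, new_s):.1f}% less time, "
              f"{throughput_gain_pct(old_s, new_s):.1f}% more images/hour")
```

Whichever convention you prefer, it helps to state it when comparing numbers across reviews, since "25% faster" can mean either.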

Gaming-grade BF16 isn't 2x faster. Despite the spec sheet showing ~130 TFLOPS FP16 vs 82.6 on the 4090 (a ~57% theoretical jump), real-world BF16 training throughput sits closer to +35-45% in our runs. The Blackwell scheduler and driver stack are still maturing — we've seen kernels regress between driver versions. Honestly, the 5090 disappointed us here. We expected 1.7-2x, got 1.4x.

The $400+ premium hurts when you account for PSU and case. 575W TGP vs 450W means a lot of 850W PSUs that ran a 4090 fine are now marginal. Add ~$150 for a quality 1000W unit, factor in transient spikes that have crashed some 1000W units (real reports, not theory), and you're looking at $500-600 total upgrade cost over a 4090. For 25% faster image gen.
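The PSU argument above is simple enough to check with arithmetic. In this rough sketch the TGP figures and prices come from the text, while the 150W CPU draw, 75W for the rest of the system, and the 80% sustained-load target are assumptions picked for illustration. It does not model transient spikes.

```python
def psu_headroom_w(psu_w: float, gpu_tgp_w: float, cpu_w: float = 150.0,
                   other_w: float = 75.0, load_target: float = 0.8) -> float:
    """Watts to spare if sustained draw should stay under load_target of the PSU rating.

    cpu_w, other_w and load_target are assumed values, not measurements.
    """
    return psu_w * load_target - (gpu_tgp_w + cpu_w + other_w)


def total_premium(gpu_premium: float, psu_cost: float) -> float:
    """GPU price gap plus the PSU replacement."""
    return gpu_premium + psu_cost


if __name__ == "__main__":
    for psu in (850.0, 1000.0):
        for card, tgp in (("RTX 4090", 450.0), ("RTX 5090", 575.0)):
            print(f"{psu:.0f}W PSU + {card}: {psu_headroom_w(psu, tgp):+.0f}W headroom")
    print(f"total premium: ${total_premium(400.0, 150.0):.0f}")
```

With these assumptions an 850W unit is roughly break-even for a 4090 build and clearly short for a 5090, and the GPU gap plus a new PSU lands inside the $500-600 range quoted above.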

For the majority of AI hobbyists, the 4090 is still the right answer. It's not slower in any way that breaks workflows — it's slower in ways that add seconds, not minutes. We still recommend it as the default in our best GPU for AI cluster guide, and after 6 months with both cards we're not walking that back.

Workload-by-workload verdict

Here's how we'd actually advise per workflow, based on six months of daily use:

Workload	Recommendation	Why
Image gen (SDXL, Flux dev)	RTX 4090	20-25% speedup doesn't justify $400. Both fit the models comfortably.
Image gen (Flux.2 FP16, video models)	RTX 5090	24GB OOMs on Flux.2 at FP16. 32GB is the fix.
LLM inference (≤13B)	RTX 4090	Both run flat-out. 95 vs 140 tok/s is a "nice to have."
LLM inference (34B-70B)	RTX 5090	This is where 32GB earns the upgrade. Q4 70B is usable, not painful.
LoRA fine-tuning (small models)	RTX 4090	Workflows fit. Speedup exists but isn't transformative.
Full fine-tuning / FP8 training	RTX 5090	Native FP8 + bandwidth + VRAM all compound here. Real win.
AI research (long context, MoE, novel architectures)	RTX 5090	The 8GB extra unlocks experiments. Not optional.
Mixed hobbyist (some of everything)	RTX 4090	Honest answer. Most workflows aren't VRAM-limited.
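The table above mostly reduces to two questions: does the workload need more than 24GB, and does it depend on FP8 training? A minimal sketch of that rule follows. The function and its parameters are hypothetical and simplify the table, so treat the output as a starting point and not a verdict.

```python
def recommend(vram_needed_gb: float, fp8_training: bool = False) -> str:
    """Pick a card using the two criteria the table leans on.

    vram_needed_gb is your own estimate of peak VRAM use for the workload.
    """
    if vram_needed_gb > 24.0 or fp8_training:
        return "RTX 5090"
    return "RTX 4090"


if __name__ == "__main__":
    # VRAM figures quoted earlier in the article, plus one illustrative
    # small-workload value (12GB) that is an assumption.
    examples = {
        "Flux.2 FP16 (~28GB)": recommend(28.0),
        "Llama 70B Q4_K_M (~40GB)": recommend(40.0),
        "Small workload (assumed 12GB)": recommend(12.0),
        "FP8 training (assumed 12GB)": recommend(12.0, fp8_training=True),
    }
    for workload, card in examples.items():
        print(f"{workload}: {card}")
```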

Common mistakes we've watched people make

After six months of watching the 4090 → 5090 upgrade discourse, four mistakes keep coming up.

1. Buying the 5090 just for image generation. If your SDXL/Flux workflow runs fine on a 4090, you are buying 25% speed for $400. Save the money or put it toward a better monitor.

2. Underestimating the PSU and thermal upgrade. That 850W gold-rated PSU you bought for your 4090 build is marginal at 575W card TGP plus a modern CPU. Plan for $150-200 of platform upgrades, not just the GPU swap.

3. Assuming "newer architecture" means "always faster." Blackwell drivers were rough through Q1 2026. We saw PyTorch training kernels actually slower than Ada Lovelace on certain ops until the April driver. If you need rock-stable today, the 4090's mature software stack is genuinely an asset.

4. Buying the 5090 because the 5080 is bad. The RTX 5080 at 16GB is a real letdown for AI — too little VRAM for the price. Don't let that push you up to the 5090 unconsidered. The 4090 is the actual sweet spot in that lineup, not the 5090 by default.

Final verdict

	RTX 4090	RTX 5090
VRAM	24GB GDDR6X	32GB GDDR7
Bandwidth	1,008 GB/s	1,792 GB/s
Image gen	6.5s/img SDXL	4.0s/img SDXL
70B inference	Painful	Actually usable
FP8 training	Clunky, speedup evaporates	~1.7x vs BF16 on the same card
TGP	450W	575W
Street price	~$1,600	~$2,000
Our verdict	Default pick for most	Worth it only if VRAM-bound

One-sentence verdict: The RTX 5090 is a real upgrade for VRAM-bound and FP8-training workflows — but for the average AI hobbyist running SDXL and 13B models, the 4090 still wins on value, and we'd buy it again.
