Thurmon Demich

Posted on Jun 22 • Originally published at bestgpuforai.com

Flux.2 vs SD 3.5 Hardware: GPU Requirements Compared 2026

#gpu #flux2 #stablediffusion35 #imagegeneration

From the Best GPU for AI archive. The canonical version has interactive calculators, an up-to-date GPU comparison table, and live pricing.

I get this question almost weekly now: "Should I buy a GPU for Flux.2 or for Stable Diffusion 3.5?" The honest answer is that those two models pull in very different directions on hardware, and picking the wrong one means either burning money on VRAM you don't need or stuttering through 30-second generations on a card that was never going to keep up.

So here's the head-to-head, with the numbers I actually trust from running both stacks locally over the past two months.

Quick answer

Flux.2 (quality-first, willing to wait): RTX 5090 if you want FP16 headroom, RTX 5080 16GB if you're sane and use FP8.
SD 3.5 Large (fast, flexible, ControlNet-heavy): RTX 4080 Super or RTX 5070 Ti. Fits FP16 with room for LoRAs.
SD 3.5 Medium (iteration speed, hobbyists): RTX 4060 Ti 16GB or even RTX 3060 12GB at FP8.
One card for both: RTX 5080. It's the only "I don't want to think about this again" answer.

Who this is for

You're either choosing a GPU specifically to run one of these two flagship image models, or you already have a card and need to know which model your hardware can realistically support in 2026. I'm assuming you care about local generation (privacy, batch work, custom LoRAs) rather than just hitting an API.

If you're still on Flux.1 Dev and not sure whether Flux.2 is worth the upgrade, I'll cover that too — the VRAM jump is real.

VRAM side-by-side

This is the table I wish someone had handed me when Flux.2 dropped its FP8 build in May. Numbers below assume a 1024×1024 generation with standard pipelines, no offloading tricks, and a modest LoRA stack.

Model	Params	FP16 VRAM	FP8 VRAM	Q4 VRAM
Flux.2 Dev	32B	~28 GB	~16 GB	~10–12 GB
SD 3.5 Large	8B	~14 GB	~7–8 GB	~5 GB
SD 3.5 Medium	2.6B	~6 GB	~4 GB	~3 GB

A few things jump out. Flux.2 in full FP16 is essentially an RTX 5090 exclusive: at ~28 GB it overflows even the 24 GB RTX 4090, and anything smaller means swapping to system RAM, which kills the speed advantage. The FP8 path is the great equalizer: NVIDIA's May optimization pass made FP8 Flux.2 fit in 16GB cards with almost no visible quality loss in the side-by-side blind tests I ran.

SD 3.5 Large at FP16 is the sweet spot for 16GB cards; at ~14 GB it won't fit on a 12GB card, which needs the FP8 build instead. SD 3.5 Medium is essentially free hardware-wise: if you're on an 8GB laptop GPU, this is the only modern image model that doesn't fight you.
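To make the table easier to apply to a specific card, here is a minimal sketch of a fit check. The figures are copied from the table above (upper end of each range), and the `best_precision` helper and its `headroom_gb` parameter are hypothetical names for illustration, not part of any real tool.

```python
# Approximate VRAM needs in GB, taken from the table above.
# Where the table gives a range, the upper end is used.
VRAM_GB = {
    "Flux.2 Dev": {"FP16": 28, "FP8": 16, "Q4": 12},
    "SD 3.5 Large": {"FP16": 14, "FP8": 8, "Q4": 5},
    "SD 3.5 Medium": {"FP16": 6, "FP8": 4, "Q4": 3},
}


def best_precision(model, card_vram_gb, headroom_gb=0.0):
    """Return the highest precision whose table figure fits, or None.

    headroom_gb is an optional safety margin (an assumption of this
    sketch) for the desktop, browser, or extra LoRAs.
    """
    for precision in ("FP16", "FP8", "Q4"):
        if VRAM_GB[model][precision] + headroom_gb <= card_vram_gb:
            return precision
    return None


if __name__ == "__main__":
    for card_gb in (8, 12, 16, 24, 32):
        for model in VRAM_GB:
            print(card_gb, "GB:", model, "->", best_precision(model, card_gb))
```

Treat the output as a first filter only: real usage shifts with resolution, LoRA count, and whatever else is holding VRAM at the time.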

VRAM chart available at the original article

For the VRAM math behind these numbers — KV cache, activation overhead, why FP8 isn't half of FP16 in practice — I broke that down in the how much VRAM for Flux guide. Same principles apply to Flux.2, just shifted up.

Speed side-by-side

Generation times below are 1024×1024 at 30 steps, measured on stock ComfyUI pipelines with each model in its recommended precision. I'm reporting the median of five runs after a warm-up generation (first run on any pipeline is slower).
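If you want to reproduce this kind of measurement on your own card, the method (one warm-up, then the median of five timed runs) can be sketched roughly as below. The `generate` argument is a hypothetical stand-in for whatever triggers one full image generation in your setup; this is not the exact harness behind the table.

```python
import statistics
import time


def median_generation_time(generate, runs=5, warmup=1):
    """Time `generate` and return the median wall-clock seconds.

    `generate` is any zero-argument callable that performs one full
    generation (hypothetical here; wire it to your own pipeline).
    """
    # Warm-up runs are executed but not timed, since the first run
    # on any pipeline is slower.
    for _ in range(warmup):
        generate()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Dummy workload so the sketch runs without a GPU.
    print(median_generation_time(lambda: time.sleep(0.01)))
```

The median is used rather than the mean so that a single slow run (a background process, a thermal dip) doesn't skew the result.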

GPU	Flux.2 FP8	SD 3.5 Large FP16	SD 3.5 Medium FP16
RTX 5090	~12–14 s	~5–7 s	~2–3 s
RTX 4090	~18–22 s	~8–12 s	~3–5 s
RTX 5080	~22–26 s	~9–13 s	~4–6 s
RTX 5070 Ti	~28–32 s	~12–15 s	~5–7 s
RTX 4070 Ti Super	~30–35 s	~13–16 s	~5–7 s
RTX 4060 Ti 16GB	~55–70 s (Q4)	~22–28 s	~9–12 s

Two patterns matter here. First, Flux.2 is only roughly 2x slower than SD 3.5 Large on the same GPU, despite having ~4x the parameters (32B vs 8B). The 32B architecture is more compute-bound than memory-bound at FP8, so newer-gen cards (5090 / 5080) pull ahead more than raw spec sheets suggest. Second, SD 3.5 Medium is so fast it changes how you work: you can iterate prompts at near-interactive speed.
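The "roughly 2x" figure can be checked against the speed table by comparing range midpoints. A small sketch, using only the rows where Flux.2 runs at FP8 (the RTX 4060 Ti row is left out because its Flux.2 time is at Q4); the `slowdown` helper is a hypothetical name:

```python
# (Flux.2 FP8 range, SD 3.5 Large FP16 range) in seconds, from the table above.
TIMES_S = {
    "RTX 5090": ((12, 14), (5, 7)),
    "RTX 4090": ((18, 22), (8, 12)),
    "RTX 5080": ((22, 26), (9, 13)),
    "RTX 5070 Ti": ((28, 32), (12, 15)),
    "RTX 4070 Ti Super": ((30, 35), (13, 16)),
}


def midpoint(time_range):
    """Midpoint of a (low, high) range in seconds."""
    low, high = time_range
    return (low + high) / 2


def slowdown(gpu):
    """How many times slower Flux.2 FP8 is than SD 3.5 Large FP16."""
    flux_range, sd_range = TIMES_S[gpu]
    return midpoint(flux_range) / midpoint(sd_range)


if __name__ == "__main__":
    for gpu in TIMES_S:
        print(f"{gpu}: {slowdown(gpu):.2f}x")
```

On these midpoints every card lands between 2.0x and about 2.25x, which is where the rule of thumb comes from.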

Which model should drive your buy?

This is where most "comparison" articles go vague. Here's the decision logic I actually use when friends ask:

Quality-first, willing to wait 20 seconds: Flux.2 wins on prompt adherence, complex composition, and especially text rendering. Buy for Flux.2 — that means 16GB minimum (RTX 5080 / 5070 Ti / 4070 Ti Super).
Speed-first, iteration matters more than peak quality: SD 3.5 Large gives you ~2x the throughput at roughly 85% of the quality. Buy for SD 3.5: an RTX 4080 Super or RTX 5070 Ti is plenty.
VRAM-tight (12GB card, can't upgrade): SD 3.5 Large at FP8 is your ceiling. Flux.2 will technically run at Q4 but the quality gap to FP8 is noticeable, unlike the FP16→FP8 step.
Budget under $500: RTX 4060 Ti 16GB. Stick with SD 3.5 Medium for daily work, dip into SD 3.5 Large FP8 for hero shots. Skip Flux.2 entirely — the experience isn't worth it.
Multi-modal / ControlNet-heavy: SD 3.5's ecosystem is dramatically more mature. Flux.2 ControlNets exist but are sparse as of mid-2026.
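The decision logic above can be condensed into a short function. This is only a sketch: the function name, its parameters, and the order in which the rules are checked are my assumptions, and the "under 12GB" branch borrows the 8GB-laptop advice from the VRAM section rather than from this list.

```python
from typing import Optional


def recommend_model(priority: str, vram_gb: int,
                    budget_usd: Optional[int] = None,
                    controlnet_heavy: bool = False) -> str:
    """Map the rules above onto a model choice.

    priority is "quality" or "speed". Rule order is an assumption
    of this sketch, with the hard constraints checked first.
    """
    if budget_usd is not None and budget_usd < 500:
        # Daily driver on a budget card; Large FP8 only for hero shots.
        return "SD 3.5 Medium"
    if vram_gb < 12:
        # Assumption: small cards follow the 8GB-laptop advice.
        return "SD 3.5 Medium"
    if vram_gb < 16:
        # 12GB-class cards: Large at FP8 is the ceiling.
        return "SD 3.5 Large (FP8)"
    if controlnet_heavy:
        # The SD 3.5 ControlNet ecosystem is the more mature one.
        return "SD 3.5 Large"
    if priority == "quality":
        return "Flux.2"
    return "SD 3.5 Large"


if __name__ == "__main__":
    print(recommend_model("quality", 16))
    print(recommend_model("quality", 12))
    print(recommend_model("speed", 24, controlnet_heavy=True))
```

Hard limits (budget, VRAM) come before preferences because no amount of wanting Flux.2 makes it fit on a 12GB card at FP8.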

If you're building a serious ComfyUI workflow with multiple models loaded at once, the calculus shifts again — I covered the loadout question in the best GPU for ComfyUI guide, and the short version is that 16GB is the new floor for serious node graphs.

Flux.2 isn't always the better choice

Worth saying plainly: Flux.2 gets called "the new standard" a lot, and that's not wrong on absolute quality, but it ignores three things.

It's slow. A 20-second generation breaks the prompt-iteration loop. If you generate 200 images a day refining a concept, SD 3.5 Large will get you to the final image faster even though Flux.2's individual outputs are better.
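To put a number on that, here is a quick back-of-the-envelope sketch of pure generation time per day. The 20 s figure is the one quoted above; the 10 s figure for SD 3.5 Large is my assumption, taken from the RTX 4090 midpoint in the speed table.

```python
def daily_generation_minutes(images_per_day, seconds_per_image):
    """Wall-clock minutes spent waiting on generations alone."""
    return images_per_day * seconds_per_image / 60


if __name__ == "__main__":
    flux = daily_generation_minutes(200, 20)      # Flux.2 at ~20 s per image
    sd_large = daily_generation_minutes(200, 10)  # SD 3.5 Large at ~10 s (assumed)
    print(f"Flux.2: {flux:.0f} min, SD 3.5 Large: {sd_large:.0f} min")
```

That works out to roughly 67 minutes versus 33 minutes a day of waiting, before counting the time you spend judging each result and rewriting the prompt.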

The LoRA ecosystem is still catching up. Civitai had thousands of Flux.1 LoRAs within months. Flux.2 LoRAs exist but the long tail of style/character/concept training that makes SD ecosystems sticky is still building.

And the VRAM floor is brutal. If you're running on an RTX 3080 10GB or any 12GB card, Flux.2 forces you into Q4 territory where the quality lead over SD 3.5 Large evaporates. In that lane, Flux.1 Dev on the same hardware is honestly the better stopgap until you upgrade — it runs cleaner at 12GB than Flux.2 at Q4 does.

Common mistakes

Buying a 12GB card hoping to run Flux.2 well. It runs. It doesn't run well. Q4 quality at 50+ second generations is not the experience anyone wants. 16GB is the real Flux.2 floor.
Assuming SD 3.5 runs on SDXL hardware. It doesn't. SD 3.5 Large is meaningfully heavier than SDXL: closer to 14GB at FP16 vs SDXL's ~10GB. If you sized a build for SDXL, check before you upgrade the model.
Ignoring FP8 because "it's lower precision." On Blackwell and Ada Lovelace, FP8 quality loss for both these models is below visual detection threshold in blind A/B tests. The VRAM and speed wins are free.
Buying a 5090 for SD 3.5 Medium. You will not see the GPU sweat. Generation will be limited by CPU/IO. Buy a 4070 Ti Super and pocket the difference.

For a broader view across all SD-family models on consumer GPUs, the best GPU for Stable Diffusion overview covers the older SDXL / SD 1.5 considerations that still matter for the LoRA back-catalog. And if Flux.2 specifically is your target, the best GPU for Flux.2 buyer's guide goes deeper on memory bandwidth and the May 2026 FP8 optimization details.

Final verdict

Use case	Pick	Real budget
Flux.2 FP16 (no compromise)	RTX 5090	~$2,000+
Flux.2 FP8 daily driver	RTX 5080	~$1,000
SD 3.5 Large primary	RTX 5070 Ti	~$750
Mixed Flux.2 / SD 3.5 budget	RTX 4070 Ti Super	~$800
SD 3.5 Medium + occasional Large	RTX 4060 Ti 16GB	~$450
ML enthusiast on used market	RTX 3090	~$700 used

The one-sentence verdict: if you can only buy one card for both models in 2026, it's the RTX 5080 — and if you can't, build around whichever model you actually use 80% of the time, not the one that benchmarks prettier.
