This article was originally published on runaihome.com
Three generations of image models now live in a typical ComfyUI installation (Windows users: see our ComfyUI Windows setup guide), and the choice between them isn't obvious. SD 1.5 still commands the deepest fine-tune ecosystem ever built around a single model. SDXL is the default backbone for most home-lab artists. Flux.1 produces images that read as professional photography: it handles human hands, readable in-image text, and complex lighting in ways that SD 1.5 and SDXL can't reliably match.
The tradeoff is hardware. Flux requires 12–24 GB VRAM and takes 4–10× longer per image than SDXL on the same GPU. Whether that matters depends on how many images you generate per session and what GPU you're running. This article quantifies those costs: verified generation times across two GPU tiers, converted into dollar-per-image electricity costs at the current US average of $0.182/kWh (EIA 2026 forecast), and a cloud comparison that tells you when a $30/month Midjourney subscription is still the smarter call.
The Three Models at a Glance
| Model | Architecture | Parameters | Native Resolution | VRAM (FP16) | VRAM (FP8/GGUF) |
|---|---|---|---|---|---|
| SD 1.5 | U-Net | 860M | 512×512 | 4–6 GB | — |
| SDXL 1.0 | U-Net (dual text encoders) | 3.5B | 1024×1024 | 8–12 GB | — |
| Flux.1 Dev | DiT transformer | 12B | 1024×1024 | ~24 GB | 12–14 GB |
| Flux.1 Schnell | DiT transformer | 12B | 1024×1024 | ~24 GB | 12–14 GB |
SD 1.5 and SDXL use U-Net architectures — compact, fast, designed for iterative denoising. Flux uses a Diffusion Transformer (DiT) architecture at 12 billion parameters. The quality jump is observable and consistent: Flux renders legible text in generated images, renders human anatomy with significantly fewer errors, and handles complex multi-element compositions more coherently. SDXL cannot do any of these reliably.
Dev vs Schnell: Flux.1 Schnell uses knowledge distillation to produce usable images in 4 steps instead of the 20+ steps Flux Dev requires. Schnell is Apache 2.0 licensed; Dev carries a non-commercial research restriction. For personal home use, either is legally fine. Schnell is faster, but most users running quality-critical work prefer Dev at 20 steps for the added detail — especially for photorealistic subjects.
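The VRAM columns in the table at the top of this section roughly track parameter count multiplied by bytes per weight. The sketch below is a back-of-the-envelope check under one loud assumption: it counts model weights only and ignores activations, text encoders, and the VAE, which is why real-world requirements sit above these figures. The function name `weight_gb` and the dictionaries are illustrative, not part of any library.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Assumption: activations, text encoders, and the VAE are NOT included.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}

# Parameter counts taken from the table in this section.
MODELS = {"SD 1.5": 0.86e9, "SDXL 1.0": 3.5e9, "Flux.1 Dev": 12e9}


def weight_gb(params: float, precision: str) -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        fp16 = weight_gb(params, "fp16")
        fp8 = weight_gb(params, "fp8")
        print(f"{name}: {fp16:.1f} GB at FP16, {fp8:.1f} GB at FP8")
```

For Flux at 12B parameters this gives about 24 GB at FP16 and 12 GB at FP8, which lines up with the table's ~24 GB and 12–14 GB entries once some working memory is added on top.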
Raw Speed: Verified Benchmarks
The SDXL numbers below come from ComfyUI's public benchmark thread (Discussion #2970), which aggregated community-submitted hardware results for SDXL 1.0 at 1024×1024, 20 steps in ComfyUI. The Flux.1 Dev numbers come from ComfyUI Discussion #4571 (RTX 4090 Flux benchmarks, multiple contributors). SD 1.5 timings are derived from Automatic1111 community benchmarks; the 4090 vs 3090 ratio is confirmed by Tom's Hardware testing.
SDXL at 1024×1024, 20 steps
| GPU | it/s | Sec/Image |
|---|---|---|
| RTX 3070 8 GB | 2.26 | 8.8 s |
| RTX 3090 24 GB | 3.61 | 5.5 s |
| RTX 4080 16 GB | 3.53 | 5.7 s |
| RTX 4090 24 GB | 7.61 | 2.6 s |
The RTX 3090 and RTX 4080 16GB land within 3% of each other on SDXL — roughly equal inference speed despite the VRAM difference. The RTX 4090 pulls ~2× ahead.
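The Sec/Image column above is consistent with dividing the step count by the reported it/s. A minimal sketch of that conversion follows; the helper names are illustrative, and the calculation covers sampling time only, so it assumes model loading, text encoding, and VAE decode are negligible or excluded.

```python
def seconds_per_image(steps: int, it_per_s: float) -> float:
    """Sampling time only; ignores model load, text encoding, and VAE decode."""
    return steps / it_per_s


def images_per_minute(steps: int, it_per_s: float) -> float:
    """Throughput implied by the same sampling-only assumption."""
    return 60.0 / seconds_per_image(steps, it_per_s)


if __name__ == "__main__":
    # it/s values from the SDXL table above (1024x1024, 20 steps).
    sdxl = {"RTX 3070": 2.26, "RTX 3090": 3.61, "RTX 4080": 3.53, "RTX 4090": 7.61}
    for gpu, rate in sdxl.items():
        secs = seconds_per_image(20, rate)
        per_min = images_per_minute(20, rate)
        print(f"{gpu}: {secs:.1f} s/image, {per_min:.1f} images/min")
```

The same two functions apply to any it/s figure in this article as long as the matching step count is passed in.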
Flux.1 Dev at 1024×1024, 20 steps
| GPU | Precision | Sec/Image |
|---|---|---|
| RTX 4090 | FP8 + --fast | 9–10 s |
| RTX 4090 | Q8 GGUF | 15–17 s |
| RTX 4090 | FP16 full | 18–41 s |
| RTX 3090 | FP8 | ~14–18 s |
The FP16 time for RTX 4090 varies widely (18–41 s) depending on whether torch.compile is active and whether the VRAM pressure forces any CPU offloading. FP8 with --fast is the practical default on 24 GB cards — it fits cleanly, the quality delta from FP16 is undetectable at normal viewing distances, and the 9–10 second generation time is genuinely workflow-usable.
The RTX 3090 FP8 estimate (~14–18 s) is derived from community reports of the 3090 running approximately 40–45% slower than the 4090 per iteration, consistent with multiple benchmark sources.
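"40–45% slower" can be read two ways, and the text does not say which one the community reports mean. The sketch below, with illustrative function names, scales the 4090's 9–10 s FP8 time under both readings so the spread is visible. It is an arithmetic illustration only, not a measurement.

```python
def time_if_throughput_lower(t_fast: float, pct_slower: float) -> float:
    """Reading A: the 3090 completes pct_slower fewer iterations per second."""
    return t_fast / (1.0 - pct_slower)


def time_if_duration_longer(t_fast: float, pct_slower: float) -> float:
    """Reading B: each iteration takes pct_slower more time on the 3090."""
    return t_fast * (1.0 + pct_slower)


if __name__ == "__main__":
    # 4090 FP8 + --fast range from the table above, in seconds.
    fast_low, fast_high = 9.0, 10.0
    a = (time_if_throughput_lower(fast_low, 0.40),
         time_if_throughput_lower(fast_high, 0.45))
    b = (time_if_duration_longer(fast_low, 0.40),
         time_if_duration_longer(fast_high, 0.45))
    print(f"Reading A (lower it/s):      {a[0]:.1f} to {a[1]:.1f} s")
    print(f"Reading B (longer per step): {b[0]:.1f} to {b[1]:.1f} s")
```

Reading A lands at about 15–18 s and reading B at about 13–14.5 s, so the ~14–18 s estimate is closer to the throughput interpretation.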
Flux.1 Schnell at 1024×1024, 4 steps
| GPU | Precision | Sec/Image |
|---|---|---|
| RTX 4090 | FP8 | ~4–5 s |
| RTX 3090 | FP8 | ~6–8 s |
Schnell at 4 steps lands within about 2× of SDXL at 20 steps in pure generation time on the RTX 4090 (roughly 4–5 s versus 2.6 s). Quality isn't SDXL-equivalent: it's better in photorealism, but weaker in fine-grained compositional control, where SDXL's ecosystem of refined samplers and CFG schedules still has an edge. For prompt-exploration workflows where you're running 50+ generations to find the right composition, Schnell makes Flux economically viable on a 3090.
SD 1.5 at 512×512, 50 steps
| GPU | it/s | Sec/Image |
|---|---|---|
| RTX 4090 | ~37.6 | ~1.3 s |
| RTX 3090 | ~18.8 | ~2.7 s |
SD 1.5's native resolution is 512×512. At that resolution and 50 steps, the RTX 4090 generates roughly 46 images per minute. The gap over SDXL and Flux in raw throughput is dramatic. For workflows that require hundreds of iterations — LoRA testing, prompt engineering sessions, batch rendering concept grids — SD 1.5's speed advantage is real and meaningful.
The Electricity Math
At $0.182/kWh (US residential average, EIA 2026 forecast) and official NVIDIA TDPs (RTX 4090: 450 W, RTX 3090: 350 W), assuming the card draws its full TDP for the entire generation:
Formula: cost = (seconds/image × 1000 images ÷ 3600) × (TDP_kW) × ($/kWh)
| Model | GPU | Sec/Image | TDP | Cost / 1,000 Images |
|---|---|---|---|---|
| SD 1.5 | RTX 4090 | 1.3 s | 450 W | $0.030 |
| SD 1.5 | RTX 3090 | 2.7 s | 350 W | $0.048 |
| SDXL | RTX 4090 | 2.6 s | 450 W | $0.059 |
| SDXL | RTX 3090 | 5.5 s | 350 W | $0.097 |
| Flux Schnell | RTX 4090 | 4.5 s | 450 W | $0.102 |
| Flux Schnell | RTX 3090 | 7.0 s | 350 W | $0.124 |
| Flux Dev (FP8) | RTX 4090 | 10 s | 450 W | $0.228 |
| Flux Dev (FP8) | RTX 3090 | 16 s | 350 W | $0.283 |
Three things stand out:
1. Electricity is not the cost driver — hardware is. Even running Flux Dev on an RTX 3090 at full throughput 24/7 for a month produces roughly 162,000 images and costs about $46 in electricity. The GPU purchase is always the dominant number.
2. Flux Schnell on an RTX 4090 costs roughly the same electricity per image as SDXL on an RTX 3090. The 4090 finishes a Schnell image in about 4.5 s, versus 5.5 s for SDXL on the 3090, and that shorter run time largely cancels out its higher TDP (450 W vs 350 W).
3. The gap from SDXL to Flux Dev is real. At 10 seconds per image versus 2.6 seconds, Flux Dev takes about 3.8× as long on the same 4090, which translates to 3.8× the electricity cost. For 10,000 images monthly, that's $2.28 vs roughly $0.60 in electricity: not consequential on its own, though the difference compounds to about $20 per year.
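The formula and the 24/7 figure in this section can be reproduced with a few lines of Python. This is a minimal sketch under the same assumption the article makes, namely that the GPU draws its full TDP while generating; real draw is often lower, so treat the results as an upper bound. The function names are illustrative.

```python
RATE_USD_PER_KWH = 0.182  # US residential average cited in this article


def cost_per_1000(sec_per_image: float, tdp_watts: float,
                  rate: float = RATE_USD_PER_KWH) -> float:
    """Electricity cost in USD for 1,000 images at full TDP."""
    hours = sec_per_image * 1000 / 3600
    return hours * (tdp_watts / 1000) * rate


def month_at_full_load(sec_per_image: float, tdp_watts: float,
                       days: int = 30, rate: float = RATE_USD_PER_KWH):
    """Return (images, cost in USD) for continuous generation over `days` days."""
    seconds = days * 24 * 3600
    images = seconds / sec_per_image
    cost = (tdp_watts / 1000) * (seconds / 3600) * rate
    return images, cost


if __name__ == "__main__":
    print(f"Flux Dev FP8, RTX 4090: ${cost_per_1000(10, 450):.3f} per 1,000 images")
    images, cost = month_at_full_load(16, 350)
    print(f"Flux Dev FP8, RTX 3090, 30 days: {images:,.0f} images, ${cost:.2f}")
```

Swapping in your own utility rate via the `rate` argument is the main reason to run this rather than read the table, since residential prices vary widely by region.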
VRAM Tiers: What You Can Actually Run
The VRAM question isn't just about whether a model fits — it's about whether it fits at a speed that matches your workflow.
**12 GB cards (RTX 3060 12GB, RTX 4070 12GB):** SD 1.5 runs at full speed. SDXL runs but benefits from 16 GB of headroom, especially with ControlNet or a refiner loaded simultaneously. Flux requires GGUF Q5 or lower quantization and will use CPU offloading for the text encoders, so expect 30–60 seconds per image. That is usable for final production renders but impractical for iterative workflows.
ComfyUI's Dynamic VRAM system (released March 2026) improved the 12 GB Flux experience by reducing peak RAM pressure, but it doesn't change the fundamental compute bottleneck. The 3060 12GB is still a solid SDXL card — it's a slow Flux card.
**16 GB cards ([RTX 4060](https://www.amazon.com/s?k=RTX+4060&tag=ru