This article was originally published on runaihome.com
The question isn't which model looks best in a side-by-side. Flux beats SDXL beats SD 1.5 on output quality — that hierarchy is settled. The question is what that quality upgrade actually costs you in GPU time, electricity, and cloud rental, because the spread is wide enough to change whether you buy hardware or rent it.
At 1,000 images a month for personal use, the difference is noise. At 50,000 product mockups or a fine-tuning dataset, the arithmetic matters: SD 1.5 on a rented RTX 4090 costs roughly $0.19 per 1,000 images; Flux.1 Dev on the same hardware costs $1.70 per 1,000. Nine times more. That changes the build-vs-rent math completely.
This article puts numbers on that gap: VRAM requirements, generation speed on three GPU tiers, electricity cost per 1,000 images at the US average rate, and the cloud rental cost on RunPod. Then it tells you which model makes sense at which volume.
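The per-1,000-image rental figures quoted above are just an hourly rate divided by throughput. A minimal Python sketch of that arithmetic follows. It assumes an RTX 4090 rate of about $0.34/hr, a value backed out from this article's own $0.19 and $1.70 figures rather than a quoted RunPod price, and it uses the steady-state throughput numbers from the benchmark section. The function name is illustrative.

```python
def rental_cost_per_1k(hourly_rate_usd: float, images_per_hour: float) -> float:
    """Rental cost in USD to generate 1,000 images at a steady throughput."""
    if images_per_hour <= 0:
        raise ValueError("images_per_hour must be positive")
    return hourly_rate_usd / images_per_hour * 1000


# Assumption: ~$0.34/hr, implied by the article's per-1,000 figures,
# not taken from a price list.
ASSUMED_RTX_4090_RATE = 0.34

sd15_cost = rental_cost_per_1k(ASSUMED_RTX_4090_RATE, 1800)     # SD 1.5, ~2 s/image
flux_dev_cost = rental_cost_per_1k(ASSUMED_RTX_4090_RATE, 200)  # Flux.1 Dev, ~18 s/image

print(f"SD 1.5:     ${sd15_cost:.2f} per 1,000 images")
print(f"Flux.1 Dev: ${flux_dev_cost:.2f} per 1,000 images")
print(f"Ratio:      {flux_dev_cost / sd15_cost:.0f}x")
```

Because the hourly rate cancels out of the ratio, the ninefold gap depends only on throughput (1,800 vs 200 images per hour), whatever the actual rental price is.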
The three models in 30 seconds
SD 1.5 (Stability AI, 2022): The original 860M-parameter UNet. Native output is 512×512; 768×768 also works at a higher VRAM cost. VRAM floor is 4 GB. Still the fastest option on consumer hardware by a wide margin, and the one to use if your main constraint is throughput or you're stuck with an 8 GB card.
SDXL 1.0 (Stability AI, 2023): Scaled up to 3.5B parameters with a native 1024×1024 output. Noticeably better composition, text rendering, and detail retention at that resolution. It is still a UNet-based model, so its speed scales with steps and resolution the same way SD 1.5's does; it is just slower because of the larger resolution and model size.
Flux.1 (Black Forest Labs, 2024): A 12B-parameter Diffusion Transformer (DiT). Two locally runnable variants:
- Schnell — 4-step distilled model, Apache 2.0 license. Commercial use permitted.
- Dev — 25–50 step version for best quality, non-commercial license only.
The critical architectural fact: each Flux step requires approximately 5× the floating-point operations of an equivalent SD 1.5 step. "Flux.1 Schnell uses only 4 steps" sounds like it should make Schnell much faster than SD 1.5 at 50 steps. In practice on current hardware, Schnell generates one image in roughly the same time SDXL does, because each of those 4 steps is much heavier. Flux.1 Dev at 25–50 steps is consistently the slowest of the three on any given GPU.
VRAM requirements
This table determines which models you can actually run before any discussion of speed.
| Model | Minimum VRAM | Practical VRAM | Notes |
|---|---|---|---|
| SD 1.5 (512×512) | 4 GB | 6 GB | Runs on GTX 1080 with attention slicing |
| SD 1.5 (768×768) | 5 GB | 8 GB | Needs attention slicing on 6 GB cards |
| SDXL base (1024×1024) | 6 GB | 8 GB | 6 GB workable with xformers + attention slicing |
| SDXL + refiner | 8 GB | 12 GB | Both models don't fit simultaneously on 8 GB |
| Flux.1 Schnell/Dev (FP16) | 24 GB | 24 GB | Unquantized 16-bit weights; RTX 3090 or 4090 only |
| Flux.1 Schnell/Dev (FP8) | 12–15 GB | 16 GB | Best quality-per-VRAM tradeoff; runs on RTX 4060 Ti 16GB |
| Flux.1 Schnell/Dev (Q4 GGUF) | 6–8 GB | 8 GB | Fits RTX 3060 12GB; visible quality drop vs FP8 |
An RTX 3060 12GB can generate SD 1.5 images at full speed, SDXL images with some patience, and Flux only at Q4 GGUF — which is a different product visually from what you see in Flux demos. If Flux quality is the goal, 16 GB VRAM (for FP8) or 24 GB (for FP16) is the real entry point.
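The table can be read as a simple lookup: given a card's VRAM, which configurations clear the "Practical VRAM" column? A small Python sketch follows, with the practical figures copied from the table above. The dictionary and function names are illustrative only.

```python
# Practical VRAM (GB) per configuration, copied from the table above.
PRACTICAL_VRAM_GB = {
    "SD 1.5 (512x512)": 6,
    "SD 1.5 (768x768)": 8,
    "SDXL base (1024x1024)": 8,
    "SDXL + refiner": 12,
    "Flux.1 (Q4 GGUF)": 8,
    "Flux.1 (FP8)": 16,
    "Flux.1 (FP16)": 24,
}


def runnable_configs(vram_gb: float) -> list[str]:
    """Return configurations whose practical VRAM fits within vram_gb."""
    return [name for name, need in PRACTICAL_VRAM_GB.items() if need <= vram_gb]


for card, vram in [("RTX 3060 12GB", 12), ("RTX 4060 Ti 16GB", 16), ("RTX 4090", 24)]:
    print(f"{card}: {', '.join(runnable_configs(vram))}")
```

This only checks whether a configuration fits. It says nothing about speed, which the next section covers.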
For a full breakdown of what VRAM tier covers which models, see the GPU buying guide for local AI.
Speed benchmarks by GPU tier
These figures are for ComfyUI with xformers enabled, 1024×1024 output where applicable, measured at steady-state (not first-run with model loading).
RTX 4090 (24 GB VRAM, 450W TBP)
| Model | Steps | Seconds/image | Images/hour |
|---|---|---|---|
| SD 1.5 | 50 | ~2 sec | ~1,800 |
| SDXL | 30 | ~3.2 sec | ~1,125 |
| Flux.1 Schnell (FP8) | 4 | ~4–6 sec | ~600–900 |
| Flux.1 Dev (FP16) | 25 | ~18 sec | ~200 |
The RTX 4090 is the only consumer card that runs Flux.1 Dev at native FP16 with reasonable wait times. At 18 seconds per 1024×1024 image, it's still usable for iterating on prompts — but you feel that gap when you're used to SD 1.5 spitting out 30 images per minute.
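Dividing each row's time by its step count makes the per-step gap explicit. A rough Python sketch follows, using the RTX 4090 figures from the table above, with 5 seconds assumed as the midpoint for Schnell. The rows differ in precision and possibly in resolution, so these are wall-clock comparisons, not FLOP counts.

```python
# (steps, seconds per image) on the RTX 4090, from the table above.
# Assumption: 5 s is used as the midpoint of the ~4-6 s Schnell range.
RTX_4090 = {
    "SD 1.5": (50, 2.0),
    "SDXL": (30, 3.2),
    "Flux.1 Schnell (FP8)": (4, 5.0),
    "Flux.1 Dev (FP16)": (25, 18.0),
}


def seconds_per_step(steps: int, seconds_per_image: float) -> float:
    """Average wall-clock seconds per sampling step (fixed overhead included)."""
    return seconds_per_image / steps


def images_per_hour(seconds_per_image: float) -> float:
    """Steady-state throughput for a given per-image time."""
    return 3600 / seconds_per_image


for model, (steps, secs) in RTX_4090.items():
    print(f"{model:22s} {seconds_per_step(steps, secs):.3f} s/step  "
          f"{images_per_hour(secs):.0f} images/hour")
```

The per-step figure folds in fixed costs such as text encoding and VAE decoding, which weigh more heavily on a 4-step run than on a 50-step one.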
RTX 3090 (24 GB VRAM, 350W TBP)
| Model | Steps | Seconds/image | Images/hour |
|---|---|---|---|
| SD 1.5 | 50 | ~3.3 sec | ~1,090 |
| SDXL | 30 | ~5–6 sec | ~600–720 |
| Flux.1 Schnell (FP16) | 4 | ~19 sec | ~190 |
| Flux.1 Dev (FP16) | 25 | ~25–30 sec | ~120–145 |
The RTX 3090 is 46% slower than the RTX 4090 as a median across benchmark suites, and the gap widens on Flux, helped along by the lower memory bandwidth (936 GB/s vs 1,008 GB/s, about 7% less). The 3090 can run Flux.1 Dev unquantized at FP16 thanks to its 24 GB VRAM, but you're looking at half a minute per image. That's workable for overnight batch runs; painful for iterative prompting.
See the used RTX 3090 evaluation for current street prices, which as of May 2026 are running around $680 used on eBay.
RTX 3060 12GB (12 GB VRAM, 170W TBP)
| Model | Steps | Seconds/image | Images/hour |
|---|---|---|---|
| SD 1.5 | 50 | ~6–8 sec | ~450–600 |
| SDXL | 30 | ~25–35 sec | ~100–145 |
| Flux.1 Schnell Q4 GGUF | 4 | ~60–90 sec | ~40–60 |
| Flux.1 Dev Q4 GGUF | 25 | ~180–300 sec (3–5 min) | ~12–20 |
The RTX 3060 12GB draws the hard line: it is a capable SD 1.5 and SDXL machine but hits a wall with Flux. Q4 GGUF lets you run Flux technically, but the image quality is not representative of what Flux.1 can produce, and the generation time makes it impractical for any volume. If Flux is part of your workflow, you have outgrown this card.
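To see what "impractical for any volume" means in wall-clock terms, convert per-image time into hours for a batch. A Python sketch follows, using midpoints of the RTX 3060 ranges in the table above. The midpoints and the 10,000-image batch size are assumptions chosen for illustration.

```python
def batch_hours(num_images: int, seconds_per_image: float) -> float:
    """Wall-clock hours to generate num_images back to back."""
    return num_images * seconds_per_image / 3600


# Midpoints of the RTX 3060 ranges reported above (assumed for illustration).
RTX_3060_SECONDS = {
    "SD 1.5": 7,
    "SDXL": 30,
    "Flux.1 Schnell Q4 GGUF": 75,   # the 4-step row
    "Flux.1 Dev Q4 GGUF": 240,      # the 25-step row
}

BATCH = 10_000  # hypothetical batch size
for model, secs in RTX_3060_SECONDS.items():
    hours = batch_hours(BATCH, secs)
    print(f"{model:24s} {hours:7.1f} h  (~{hours / 24:.1f} days)")
```

The calculation assumes the card runs continuously with no reloads or failures, so real batches will take somewhat longer.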
Electricity cost per 1,000 images
US average residential electricity rate: $0.1765/kWh (EIA data, February 2026).
GPU power draw during image generation runs at roughly 80–90% of rated TBP. Using conservative estimates for SD 1.5 and SDXL: RTX 4090 at 350W, RTX 3090 at 300W, RTX 3060 at 150W during active generation. The Flux rows in the table assume a higher draw: 400W on the RTX 4090 and 330W on the RTX 3090.
Formula (cost per 1,000 images): (watts / 1000) × (seconds_per_image × 1000 / 3600) × $0.1765
| Model | GPU | Sec/image | Power | kWh per 1K images | Cost per 1K images |
|---|---|---|---|---|---|
| SD 1.5 | RTX 4090 | 2 | 350W | 0.194 | $0.034 |
| SD 1.5 | RTX 3090 | 3.3 | 300W | 0.275 | $0.049 |
| SD 1.5 | RTX 3060 | 7 | 150W | 0.292 | $0.051 |
| SDXL | RTX 4090 | 3.2 | 350W | 0.311 | $0.055 |
| SDXL | RTX 3090 | 5.5 | 300W | 0.458 | $0.081 |
| SDXL | RTX 3060 | 30 | 150W | 1.25 | $0.221 |
| Flux.1 Schnell (FP8) | RTX 4090 | 5 | 400W | 0.556 | $0.098 |
| Flux.1 Dev (FP16) | RTX 4090 | 18 | 400W | 2.0 | $0.353 |
| Flux.1 Dev (FP16) | RTX 3090 | 28 | 330W | 2.57 | $0.453 |
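
The formula is easy to check in code. A minimal Python sketch follows that reproduces a few rows of the table, assuming the same $0.1765/kWh rate and the power figures listed there. Function and variable names are illustrative.

```python
RATE_USD_PER_KWH = 0.1765  # US average residential rate used in this article


def kwh_per_1k_images(watts: float, seconds_per_image: float) -> float:
    """Energy in kWh to generate 1,000 images at a constant power draw."""
    return (watts / 1000) * (seconds_per_image * 1000 / 3600)


def cost_per_1k_images(watts: float, seconds_per_image: float,
                       rate: float = RATE_USD_PER_KWH) -> float:
    """Electricity cost in USD for 1,000 images."""
    return kwh_per_1k_images(watts, seconds_per_image) * rate


rows = [
    ("SD 1.5", "RTX 4090", 350, 2),
    ("SDXL", "RTX 3060", 150, 30),
    ("Flux.1 Dev (FP16)", "RTX 4090", 400, 18),
]
for model, gpu, watts, secs in rows:
    kwh = kwh_per_1k_images(watts, secs)
    usd = cost_per_1k_images(watts, secs)
    print(f"{model:18s} {gpu:9s} {kwh:.3f} kWh  ${usd:.3f}")
```

Passing a different `rate` adapts the estimate to a local tariff. Multiplying the per-1,000 cost by 10 gives the figure for 10,000 images a month.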
For the power bill math behind 24/7 workloads more broadly, see the electricity cost article.
Three things stand out here:
Electricity is not where the cost difference actually hurts. At 10,000 images/month, Flux.1 Dev on an RTX 4090 costs $3.53 in electricity vs. $0.34 for SD 1.5. That's $3.19/month extra — annoying but not meaningful.
The RTX 3060 penalty on SDXL is real. It uses more electricity per image than the RTX 4090 because generation takes so much longer that the lower wattage doesn't compensate. At 30 seconds per SDXL image, a 3060 uses 1.25 kWh per 1,000 images vs. 0.31 kWh for the 4090, about four times as much.
The 4090 is genuinely efficient per image. More watts at the wall, but far fewer seconds of work — net kWh per image drops.
Cloud rental cost per image on RunPod
If you don't own a GPU, you rent one. RunPod Community Cloud pricing as of May 2026:
| GPU | $/hr | SD 1.5 (images/hr) | SDXL (images/hr) | Flux Dev (