This article was originally published on aifoss.dev
TL;DR: Your GPU's VRAM determines which image model you can actually run. Benchmarks don't matter if the model won't fit in memory. SDXL is the 8GB choice with the widest LoRA ecosystem; SD 3.5 Medium earns its place at 6–10GB with better text rendering and a workable commercial license; Flux.1 Dev leads on raw quality at 24GB but is licensed for non-commercial use only.
| | SDXL 1.0 | SD 3.5 Medium | Flux.1 [dev] |
|---|---|---|---|
| Best for | LoRAs, fine-tunes, widest tooling | Consumer GPU, text-in-image | Maximum quality, research |
| Min VRAM | 8GB (comfortable) | ~6GB | 6GB (GGUF Q4) |
| Full-precision VRAM | ~12GB | ~10GB | 24GB |
| Commercial license | Yes (CreativeML Open RAIL++-M) | Yes (up to $1M revenue) | No |
| LoRA ecosystem | Massive | Growing | Growing |
| Steps (typical) | 20–30 | 25–40 | 20–50 |
Honest take: On an 8GB GPU, use SDXL. On 10–16GB, pick SD 3.5 Medium or Flux.2 Klein 4B, depending on which license terms and step count you need. On 24GB, Flux.1 Dev is the default unless you specifically need text rendering, where SD 3.5 Large wins.
The Hardware Question Comes First
Most Flux-vs-SDXL articles lead with quality comparisons. That framing is backwards. The model that scores best in abstract benchmarks is irrelevant if it requires 24GB and you have 12GB. Before you compare quality, figure out what your GPU can actually run without quantization tricks destroying the advantage.
The models in scope here, as of June 2026:
- Flux.1 [dev] — 12B parameters, Black Forest Labs, released August 2024. The quality reference in open-source image generation. Non-commercial license.
- Flux.1 [schnell] — Same architecture, distilled to 4 inference steps. Apache 2.0. The commercial-safe Flux option before Klein arrived.
- Flux.2 Klein 4B — Released January 15, 2026. 4B parameters, Apache 2.0, 4 steps. The current practical recommendation for commercial Flux work under 16GB VRAM.
- SDXL 1.0 — 3.5B parameters, Stability AI, July 2023. The widest LoRA and extension ecosystem in open-source image generation.
- SD 3.5 Medium — 2.5B parameters, Stability AI, October 2024. The sweet spot for consumer hardware with strong text-in-image performance.
- SD 3.5 Large — 8.1B parameters, Stability AI. Higher quality than Medium, but requires 16–18GB at FP16.
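The parameter counts above translate into a rough floor on memory: weights alone take about parameters × bytes per parameter. A minimal sketch of that arithmetic follows. The helper name `weight_gb` is hypothetical, and treating "Q4" as exactly 4 bits is a simplifying assumption (real GGUF Q4 variants store extra scale data). Text encoders, the VAE, and activations all add more on top, so read these as lower bounds rather than measured usage.

```python
# Weight-only memory estimate: parameters x bits per parameter.
# Assumption: "Q4" is treated as exactly 4 bits per weight; actual GGUF Q4
# files are somewhat larger. Encoders, VAE, and activations are NOT included.

BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "Q4": 4}


def weight_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    bytes_per_param = BITS_PER_PARAM[precision] / 8
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Parameter counts as listed in the article.
    models = [
        ("Flux.1 [dev]", 12.0),
        ("SD 3.5 Large", 8.1),
        ("SDXL 1.0", 3.5),
        ("SD 3.5 Medium", 2.5),
    ]
    for name, params in models:
        sizes = ", ".join(
            f"{p}: {weight_gb(params, p):.1f} GB" for p in BITS_PER_PARAM
        )
        print(f"{name:<14} {sizes}")
```

Under these assumptions, 12B parameters at FP16 come to 24 GB of weights and 6 GB at 4 bits, which lines up with the 24GB and 6GB figures quoted for Flux.1 [dev]. SD 3.5 Large at FP16 comes to about 16.2 GB, consistent with the 16–18GB range.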
VRAM and Speed Reference Table
Community benchmarks at 1024×1024, ComfyUI, no batch, no torch.compile unless noted. Numbers vary by driver version, software optimization, and step count.
| Model | Parameters | Min VRAM | Comfortable | Speed (RTX 4090) | Steps |
|---|---|---|---|---|---|
| Flux.1 [dev] | 12B | 6GB (GGUF Q4) | 16GB (FP8) | ~18s/img | 20–50 |
| Flux.1 [schnell] | 12B | 6GB (GGUF Q4) | 16GB (FP8) | ~8s/img | 4 |
| Flux.2 Klein 4B | 4B | ~10GB (Q8) | 13GB (FP16) | ~6s/img | 4 |
| SDXL 1.0 | 3.5B | 8GB | 10–12GB | ~3–4s/img | 20–30 |
| SD 3.5 Medium | 2.5B | 6GB | 8–10GB | ~5s/img | 25–40 |
| SD 3.5 Large | 8.1B | 11GB (FP8) | 16–18GB | ~12s/img | 25–40 |
The RTX 4090 speed for Flux.1 Dev (~18 seconds per image at FP16) comes from ComfyUI community benchmarks; the SDXL speed (~3.2–4 seconds at 20 steps on an RTX 4090) comes from Salad benchmark data. On an RTX 3090, Flux.1 Dev takes approximately 25–35 seconds at FP16, and SDXL generates in roughly 5–7 seconds at 20 steps.
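If you want to apply the "Min VRAM" column mechanically, a small filter is enough. The sketch below transcribes that column from the table above. The `ModelSpec` class and the `loadable` and `describe` helpers are illustrative names, not part of any tool, and "loadable" only means the listed minimum fits, not that the result is fast or looks good.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    min_vram_gb: float  # "Min VRAM" column from the reference table
    min_format: str     # format noted for that minimum, "" if none is listed


# Transcribed from the reference table; "~10GB" is stored as 10.
MODELS = [
    ModelSpec("Flux.1 [dev]", 6, "GGUF Q4"),
    ModelSpec("Flux.1 [schnell]", 6, "GGUF Q4"),
    ModelSpec("Flux.2 Klein 4B", 10, "Q8"),
    ModelSpec("SDXL 1.0", 8, ""),
    ModelSpec("SD 3.5 Medium", 6, ""),
    ModelSpec("SD 3.5 Large", 11, "FP8"),
]


def loadable(vram_gb: float) -> list[str]:
    """Names of models whose listed minimum fits in vram_gb."""
    return [m.name for m in MODELS if m.min_vram_gb <= vram_gb]


def describe(vram_gb: float) -> list[str]:
    """Like loadable(), but annotates the format the minimum depends on."""
    lines = []
    for m in MODELS:
        if m.min_vram_gb <= vram_gb:
            note = f" (minimum assumes {m.min_format})" if m.min_format else ""
            lines.append(f"{m.name}{note}")
    return lines


if __name__ == "__main__":
    for line in describe(8):
        print(line)
```

At 8GB this lists both Flux.1 variants, but only with the GGUF Q4 caveat attached, which is exactly the quantization trade-off to weigh before trusting a "fits in memory" claim.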
If You Have 8GB VRAM
The RTX 3060 Ti, RTX 4060, and AMD RX 7600 all land at 8GB. SDXL was designed for this tier and runs well without workarounds.
Set up ComfyUI with SDXL, enable xformers, and you get 1024×1024 images in 10–20 seconds at 20 steps. The LoRA catalogue on Civitai alone has tens of thousands of style-specific fine-tunes. ControlNet, AnimateDiff, IP-Adapter — all mature and well-documented at this tier. See the 8GB VRAM image generation guide for the specific flags and memory workarounds that matter on this GPU class.
SD 3.5 Medium is technically possible at 8GB with CPU offloading enabled in ComfyUI. The experience degrades: decode times slow noticeably, and the VAE and T5-XXL text encoder compete for memory with the main model. At 8GB, it's a compromise you don't need to make when SDXL runs cleanly.
Flux.1 Dev at 8GB requires GGUF Q4 quantization. It works, and you can find guides showing it running on a 3060, but generation takes 60+ seconds and Q4 quantization visibly degrades fine detail — faces, textures, and fine linework. The quality advantage Flux Dev has over SDXL shrinks considerably at Q4. If you need Flux-level quality output occasionally and don't want to quantize it, a RunPod A100 instance for a few hours is cheaper than the quality tax of Q4 on consumer hardware.
8GB verdict: SDXL. No contest.
If You Have 10–16GB VRAM
This covers the RTX 3080 10GB, RTX 3080 Ti 12GB, RTX 4070, RTX 4070 Ti, and the newer RTX 5070 Ti 16GB. This is the most contested tier because multiple models run well here and the right choice depends on what you're actually building.
SD 3.5 Medium is the general-purpose answer. At 10–12GB it runs at or near full precision without offloading. Quality beats SDXL on prompt adherence, spatial composition, and — critically — text rendered inside images. The triple text encoder (T5-XXL + CLIP-L + OpenCLIP-G) produces coherent on-image text at a reliability SDXL simply cannot match. The Stability AI Community License allows commercial use up to $1M in annual revenue, which covers most independent developers and small teams.
The friction: SD 3.5 Medium's LoRA ecosystem is substantially smaller than SDXL's, and its MMDiT architecture is not compatible with SDXL's UNet-based ControlNet and AnimateDiff implementations. Some workflows built around SDXL don't have direct SD 3.5 equivalents yet.
Flux.2 Klein 4B is the stronger option if you want Flux-lineage quality and need Apache 2.0 licensing. Released January 15, 2026, it runs in approximately 13GB at FP16, generates in 4 inference steps, and produces output quality between Schnell and Dev on complex prompts. It's newer, so fine-tune support and ControlNet integrations are still developing — but if you need a Flux-family model that you can deploy commercially without license friction, Klein 4B is the current answer.
Flux.1 Dev in FP8 fits at 12–16GB and produces noticeably better outputs than Schnell on prompts with complex spatial relationships. The constraint is the license: non-commercial only. It's valid for personal work, research, and building internal tooling that won't face end users commercially.
10–16GB verdict: SD 3.5 Medium for most commercial work; Flux.2 Klein 4B if you want better quality with Apache 2.0; Flux.1 Dev FP8 for non-commercial personal or research work.
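The verdict for this tier is really a short decision tree over licensing. Here is one way to write it down. The function name and parameters are hypothetical, and sending projects above the $1M revenue line to Klein 4B is my reading of the license terms described above, not something the licenses themselves recommend.

```python
REVENUE_CAP_USD = 1_000_000  # Stability AI Community License cap cited above


def pick_mid_tier_model(
    commercial: bool,
    annual_revenue_usd: float = 0.0,
    need_apache: bool = False,
) -> str:
    """Encode the 10-16GB verdict as a licensing decision.

    Assumption: revenue above the cap is routed to the Apache 2.0 model,
    since the article only describes SD 3.5 terms up to $1M.
    """
    if not commercial:
        # Non-commercial personal or research work.
        return "Flux.1 [dev] (FP8)"
    if need_apache or annual_revenue_usd > REVENUE_CAP_USD:
        # Apache 2.0 carries no revenue cap.
        return "Flux.2 Klein 4B"
    # Default for most commercial work in this tier.
    return "SD 3.5 Medium"


if __name__ == "__main__":
    print(pick_mid_tier_model(commercial=True, annual_revenue_usd=200_000))
    print(pick_mid_tier_model(commercial=True, need_apache=True))
    print(pick_mid_tier_model(commercial=False))
```

This deliberately ignores quality preferences and ecosystem needs such as LoRAs or ControlNet, which can still override the licensing default.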
If You Have 24GB VRAM
An RTX 3090 or RTX 4090 runs everything at full or near-full precision. (The RTX 5080 ships with 16GB, so it belongs in the previous tier.) The question shifts from capability to use case.
For photorealism, multi-subject scenes, facial detail, and general-purpose best-quality generation, Flux.1 Dev at FP16 is the default. It generates in ~18–20 seconds per image on a 4090 at 50 steps. At 20 steps you get results in roughly 8–10 seconds with minimal quality loss — useful for draft iteration. The non-commercial license is the one constraint to know before you start.
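The two timings above are enough for a back-of-the-envelope estimate at other step counts. The sketch below fits a straight line through the midpoints of the quoted ranges (20 steps ≈ 9s, 50 steps ≈ 19s). Linear scaling with a fixed overhead is an assumption on my part, not a measured property, and the function names are illustrative.

```python
def fit_line(p1: tuple[float, float], p2: tuple[float, float]) -> tuple[float, float]:
    """Return (seconds_per_step, fixed_overhead_seconds) through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Midpoints of the ranges quoted above for Flux.1 Dev FP16 on an RTX 4090.
DRAFT = (20, 9.0)    # 20 steps, roughly 8-10 seconds
FINAL = (50, 19.0)   # 50 steps, roughly 18-20 seconds

PER_STEP, OVERHEAD = fit_line(DRAFT, FINAL)


def estimate_seconds(steps: int) -> float:
    """Estimated time per image, assuming time grows linearly with steps."""
    return OVERHEAD + PER_STEP * steps


if __name__ == "__main__":
    for steps in (20, 30, 40, 50):
        print(f"{steps} steps: ~{estimate_seconds(steps):.1f}s")
```

Under this assumption each step costs about a third of a second with a little over two seconds of fixed overhead, so a 30-step run lands at around 12 seconds. Treat that as a planning number and measure on your own setup.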
SD 3.5 Large is worth reaching for in specific cases: images where text must appear inside the output, highly compositional scenes where the T5-XXL encoder matters, or when you want to fine-tune and need a model with a commercial-friendly license. At FP1