Jovan Chan

Posted on Jun 2 • Originally published at aifoss.dev


#opensource #ai #selfhosted #linux

---
title: 'Flux vs SDXL vs SD 3.5 2026: Which Image Model Wins'
description: 'Flux.1, SDXL, and SD 3.5 compared on quality, VRAM, and ecosystem. Pick the right open-source image generation model for your hardware and use case in 2026.'
pubDate: 'May 24 2026'

tags: ["stablediffusion", "ai", "imagegeneration", "gpu", "opensource"]
---

The open-source image generation landscape has gone through a consolidation phase. After two years of fragmentation, when every month brought a new model family claiming to beat everything else, three distinct tiers have emerged that cover most real-world use cases: Flux (quality tier), SDXL (ecosystem tier), and SD 3.5 (the contested middle). Where you land depends on your hardware, your use case, and whether you need a commercial license.

Here's a deep look at all three, including the newest additions to each family as of May 2026.

The Three Families at a Glance

| | Flux.1 [dev] | Flux.2 Klein 4B | SDXL 1.0 | SD 3.5 Medium | SD 3.5 Large |
|---|---|---|---|---|---|
| Parameters | 12B | 4B | 3.5B | 2.5B | 8.1B |
| Min VRAM (practical) | 6GB (GGUF Q4) | ~13GB (FP16) | 8GB | ~10GB | ~11GB (FP8) |
| License | Non-commercial | Apache 2.0 | CreativeML Open RAIL++-M | Stability Community | Stability Community |
| Steps (typical) | 20–50 | 4 | 20–30 | 25–40 | 25–40 |
| LoRA ecosystem | Growing | Minimal | Massive | Minimal | Minimal |
| Best for | Quality, research | Fast commercial gen | Fine-tuning, ControlNet | Consumer hardware + text | Quality + text rendering |

Short version: if you need the best images and don't mind a non-commercial license, Flux.1 [dev] is the default. If you need commercial use on under 16GB of VRAM, choose SDXL or Flux.2 Klein 4B. If you're specifically working with text-in-image or complex compositions and have NVIDIA RTX hardware, SD 3.5 Large with TensorRT is the one case where SD 3.5 earns its place.
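
The short version can be written down as a small decision helper. This is only a sketch: the function name `pick_model` and its parameters are hypothetical, and the VRAM thresholds are the practical minimums quoted in the table above, not measured limits.

```python
from typing import Optional


def pick_model(
    vram_gb: float,
    commercial: bool,
    text_heavy: bool = False,
    nvidia_rtx: bool = False,
) -> Optional[str]:
    """Suggest a starting model following the article's rule of thumb.

    Thresholds are the practical minimums from the comparison table
    (assumptions for illustration, not hard limits).
    """
    # Text-in-image or complex compositions on RTX hardware: the SD 3.5 Large case.
    # Note: its Stability Community license has its own commercial terms.
    if text_heavy and nvidia_rtx and vram_gb >= 11:
        return "SD 3.5 Large (TensorRT)"

    if commercial:
        # Apache 2.0 Klein 4B needs roughly 13GB at FP16.
        if vram_gb >= 13:
            return "Flux.2 Klein 4B"
        # SDXL runs at the 8GB baseline and permits commercial use.
        if vram_gb >= 8:
            return "SDXL 1.0"
        return None

    # Non-commercial work: Flux.1 [dev], quantized to GGUF Q4 on small cards.
    if vram_gb >= 6:
        return "Flux.1 [dev]"
    return None


if __name__ == "__main__":
    print(pick_model(vram_gb=12, commercial=True))
```

Returning `None` when nothing fits keeps the helper honest about the cases the rule of thumb does not cover.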

Flux: The Quality Benchmark

Black Forest Labs released the original Flux.1 family in mid-2024, and as of 2026 it remains the reference point for open-source image quality. The family has expanded substantially.

Flux.1 [dev] — The 12B reference model

Flux.1 [dev] is a 12-billion-parameter rectified flow transformer, distilled from the commercial Flux.1 [pro]. In practice, it consistently outperforms SDXL on prompt adherence, face detail, spatial coherence in complex multi-subject scenes, and rendered text inside images. Put them side by side with the same prompt and the gap is immediately obvious on photorealistic subjects.

The hardware cost is the main barrier. Running Flux.1 dev at full FP16 requires approximately 24GB VRAM — an RTX 3090, 4090, or A100. GGUF quantized variants shift this considerably: a Q4_K_M GGUF brings the requirement down to around 6GB VRAM, fitting an RTX 3060. There is a quality trade-off at that level — fine textures and high-frequency detail in faces suffer — but for most use cases it's still better than stock SDXL.
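
The 24GB and 6GB figures follow from simple arithmetic on the parameter count. The sketch below estimates the size of the weights alone, assuming nominal widths of 16, 8, and 4 bits per weight; real GGUF Q4_K_M files carry slightly more than 4 bits per weight, and text encoders, the VAE, and activations add to the total, so treat the result as a lower bound. The name `weight_memory_gb` is hypothetical.

```python
# Nominal storage cost per weight, in bytes (an assumption for illustration).
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights alone, in decimal GB.

    params_billions * 1e9 weights * bytes per weight / 1e9 bytes per GB
    simplifies to params_billions * bytes per weight.
    """
    return params_billions * BYTES_PER_WEIGHT[precision]


if __name__ == "__main__":
    # Flux.1 [dev] has 12B parameters.
    for precision in ("fp16", "fp8", "q4"):
        size = weight_memory_gb(12, precision)
        print(f"Flux.1 [dev] weights at {precision}: ~{size:.0f} GB")
```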

The other barrier: Flux.1 [dev] ships under the FLUX.1-dev Non-Commercial License. You cannot use it in products or services. For production deployments, you need Flux.1 [schnell] or Flux.2 Klein.

Flux.1 [schnell] — Fast, Apache 2.0

Schnell uses 4 inference steps and is fully Apache 2.0 licensed. The quality gap versus dev is real but narrower than expected at 4 steps — it's a solid choice for rapid iteration and applications where sub-second generation matters more than peak fidelity. Most commercial Flux deployments were running schnell before Klein arrived.
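
For reference, a 4-step schnell run might look like the sketch below. It assumes the Hugging Face `diffusers` library (with `torch` and `accelerate` installed), which this article does not otherwise use, and a GPU with enough memory for CPU offloading to work. The function name `generate_schnell` and the output path are hypothetical.

```python
SCHNELL_STEPS = 4  # schnell is distilled for 4 inference steps


def generate_schnell(prompt: str, out_path: str = "schnell.png") -> str:
    """Generate one image with Flux.1 [schnell] and save it to out_path."""
    # Imported here so the module loads even where the GPU stack is missing.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",
        torch_dtype=torch.bfloat16,
    )
    # Moves submodules to the GPU only while they run: slower, lower peak VRAM.
    pipe.enable_model_cpu_offload()

    image = pipe(
        prompt,
        num_inference_steps=SCHNELL_STEPS,
        guidance_scale=0.0,       # schnell is run without classifier-free guidance
        max_sequence_length=256,  # schnell's prompt length limit
    ).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    generate_schnell("a lighthouse at dusk, film photograph")
```

Generation time depends heavily on the GPU and on whether offloading is enabled, so benchmark on your own hardware before relying on a latency target.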

Flux.2 Klein — January 2026, the practical upgrade

On January 15, 2026, Black Forest Labs released FLUX.2 [klein] in two sizes.

The 4B variant (Apache 2.0) generates in 4 inference steps and runs in approximately 13GB VRAM at FP16. Quality sits between schnell and dev — better than schnell on complex prompts, not quite dev at full precision. For commercial applications on under-16GB hardware, this is now the first model to reach for.

The 9B variant raises quality further but reverts to a non-commercial license. At FP16 it needs 27–29GB VRAM; FP8 quantization brings this to roughly 14–16GB, which fits an RTX 4090 with text encoder offloading.

Running Klein 4B locally via the official inference repo:

```bash
# Clone the inference repo
git clone https://github.com/black-forest-labs/flux2
cd flux2

# Install dependencies
pip install -e ".[all]"

# Download Klein 4B weights
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  --local-dir ./models/flux2-klein-4b
```

In ComfyUI, load the .safetensors checkpoint through the UNet Loader node — Klein 4B uses the same node structure as Flux.1, so existing workflows load without modification.

Flux.1 Kontext [dev] — A different tool entirely

Flux.1 Kontext [dev] is the same 12B architecture adapted for image-to-image editing via text instructions: change a background while keeping the subject, swap clothing, add objects, adjust lighting. It is not a replacement for Flux.1 dev for text-to-image generation — it's a separate workflow for iterative image editing. License is non-commercial. If you're building an editing tool, it's the most capable open-weight option available as of mid-2026.

When NOT to use Flux

- You have 8GB VRAM and want clean output without GGUF quantization artifacts. SDXL is the better fit.
- You need a mature LoRA library for a specific subject or style. Civitai has a fraction of SDXL's catalog for Flux, and training Flux LoRAs requires more VRAM than SDXL Dreambooth.
- You need to use Flux.1 [dev] commercially. The non-commercial restriction covers dev, Kontext [dev], and the 9B Klein variant.

SDXL: The Ecosystem Workhorse

SDXL 1.0 is a 3.5B-parameter model that Stability AI released in 2023. In 2026, stock SDXL loses on raw quality to both Flux and SD 3.5 Large. It wins on something else: no other open-source image model has the same depth of community fine-tuning, custom checkpoints, LoRAs, and ControlNet adapters.

The hardware case

SDXL runs on 8GB VRAM — Stability AI's own baseline recommendation. At 8GB, you get the base model at 1024×1024 without the refiner. Adding the refiner (the second-pass model that adds fine detail) pushes requirements to 12–16GB. For most use cases, 8GB gets you 90% of the SDXL experience.

The license is CreativeML Open RAIL++-M. Commercial use is allowed without revenue caps, subject to use-based restrictions (no illegal content, no harmful applications). This is more permissive than the Stability AI Community License on SD 3.5, which caps free commercial use at $1M annual revenue.
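
The license terms mentioned so far can be collected into a lookup table. This is a simplified sketch of the terms as described in this article, not legal advice: the names `LICENSES` and `free_commercial_use` are hypothetical, the $1M figure is the cap quoted above, and use-based restrictions (such as those in the RAIL++-M license) are not modeled.

```python
from typing import Dict, Optional, Tuple

# model -> (commercial use allowed, annual revenue cap in USD or None)
LICENSES: Dict[str, Tuple[bool, Optional[int]]] = {
    "Flux.1 [dev]": (False, None),        # non-commercial
    "Flux.1 [schnell]": (True, None),     # Apache 2.0
    "Flux.2 Klein 4B": (True, None),      # Apache 2.0
    "Flux.2 Klein 9B": (False, None),     # non-commercial
    "SDXL 1.0": (True, None),             # CreativeML Open RAIL++-M
    "SD 3.5 Medium": (True, 1_000_000),   # Stability Community
    "SD 3.5 Large": (True, 1_000_000),    # Stability Community
}


def free_commercial_use(model: str, annual_revenue_usd: float) -> bool:
    """Return True if the article's summary permits free commercial use."""
    allowed, cap = LICENSES[model]
    if not allowed:
        return False
    # No cap means revenue does not matter; otherwise stay under the cap.
    return cap is None or annual_revenue_usd < cap


if __name__ == "__main__":
    for name in LICENSES:
        print(name, free_commercial_use(name, annual_revenue_usd=2_500_000))
```

Always read the current license text before shipping; terms change between model releases.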

The ecosystem argument

Civitai hosts tens of thousands of SDXL LoRAs, fine-tuned checkpoints, embeddings, and ControlNet adapters — covering photography styles, anime, architecture, product visualization, and character consistency training. No other open model family comes close. Fine-tuning SDXL for a specific character or aesthetic using Dreambooth or LoRA training is thoroughly documented, with tooling in both ComfyUI and Automatic1111/Forge.

The ComfyUI custom nodes ecosystem for SDXL is particularly deep. SeargeSDXL, ComfyRoll, and the built-in SDXL nodes enable multi-ControlNet pipelines, LoRA stacking, aspect ratio management, and refiner scheduling in a single workflow. If you need a customized pipeline for a specific domain, SDXL is still where you start.

Where stock SDXL falls short

Raw text-to-image without fine-tuning: SDXL loses to Flux.1 dev on faces, complex multi-subject prompts, and text rendering in images. The gap is large enough to matter for photorealistic use cases. An SDXL checkpoint fine-tuned on domain-specific data often outperforms generic Flux.1 on that domain — but out of the box, SDXL generates images that look
