DiffusionGemma: Text Diffusion LLMs Explained, and When to Actually Use One (2026)

Originally published on rohitraj.tech

Google open-sourced DiffusionGemma on June 10, 2026. It is a 26B MoE model that writes a 256-token block in parallel instead of one token at a time, reaching 700+ tokens/sec on an RTX 5090, up to 4x faster than Gemma 4. The catch: quality sits below standard Gemma 4. Here is the developer read: how text diffusion works, how to run it locally, the speed-vs-quality decision, and when to skip it.
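
To make the "block in parallel" idea concrete, here is a rough back-of-the-envelope sketch that counts sequential forward passes for the two decoding styles. It is not DiffusionGemma's actual scheduler. The 256-token block size comes from the summary above; the `denoise_steps` value of 16 is a hypothetical placeholder, and both function names are made up for illustration.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one sequential forward pass per token."""
    return num_tokens


def block_diffusion_passes(
    num_tokens: int,
    block_size: int = 256,    # block size quoted in the post
    denoise_steps: int = 16,  # ASSUMPTION: illustrative value, not a published figure
) -> int:
    """Block diffusion: every block is refined over a fixed number of passes,
    and each pass updates all tokens in the block at once."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    tokens = 1024
    ar = autoregressive_passes(tokens)
    bd = block_diffusion_passes(tokens)
    print(f"autoregressive: {ar} sequential passes")
    print(f"block diffusion: {bd} sequential passes")
```

A lower pass count does not translate directly into wall-clock speedup: a pass over a whole block may cost more than a single-token pass, which is one plausible reason the measured gain can be far smaller than the ratio of pass counts.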

Read the full version with code samples, diagrams, and architecture details: DiffusionGemma: Text Diffusion LLMs Explained, and When to Actually Use One (2026)

More engineering notes: rohitraj.tech/en/notes