Originally published on rohitraj.tech
Google open-sourced DiffusionGemma on June 10, 2026. It is a 26B mixture-of-experts (MoE) model that writes a 256-token block in parallel instead of one token at a time, hitting 700+ tokens/sec on an RTX 5090 and running up to 4x faster than Gemma 4. The catch: quality sits below standard Gemma 4. Here is the developer read: how text diffusion works, how to run it locally, the speed-vs-quality decision, and when to skip it.
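To make "a block in parallel instead of one token at a time" concrete, here is a toy sketch of iterative block unmasking, a common way text diffusion is described. It is not DiffusionGemma's actual sampler: `toy_denoiser` is a hypothetical stand-in that returns random confidences, and the step count of 8 is an assumption picked for illustration. Only the 256-token block size comes from the announcement.

```python
import random

MASK = "_"  # placeholder for a slot that has not been decided yet


def toy_denoiser(block, rng):
    """Hypothetical stand-in for a model forward pass.

    Proposes a (token, confidence) pair for every still-masked position.
    A real model would predict all positions in one pass; here the
    confidences are just random numbers.
    """
    return {i: (f"tok{i}", rng.random()) for i, t in enumerate(block) if t == MASK}


def generate_block(block_size=256, steps=8, seed=0):
    """Fill a whole block in `steps` passes instead of `block_size` passes."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    passes = 0
    for step in range(steps):
        proposals = toy_denoiser(block, rng)  # one pass covers the whole block
        passes += 1
        remaining_steps = steps - step
        # Commit an even share of the masked slots, most confident first.
        k = -(-len(proposals) // remaining_steps)  # ceiling division
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            block[i] = proposals[i][0]
    return block, passes


block, diffusion_passes = generate_block()
autoregressive_passes = len(block)  # one forward pass per token
print(f"block diffusion: {diffusion_passes} passes, autoregressive: {autoregressive_passes} passes")
```

Fewer passes does not translate one-to-one into wall-clock speedup, since each pass works on the whole block, and committing many tokens at once is one plausible reason quality can lag a token-by-token model.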
Read the full version with code samples, diagrams, and architecture details: DiffusionGemma: Text Diffusion LLMs Explained, and When to Actually Use One (2026)
More engineering notes: rohitraj.tech/en/notes