Two labs race to make AI write whole paragraphs at once instead of word by word

#diffusion #openweight #google #inference

Diffusion text models — which draft an entire block of text at once and then iteratively refine it, rather than generating one token at a time left to right — have moved from research curiosity to a real two-horse race this week. Google released DiffusionGemma as an open-weight model, and Inception Labs launched Mercury 2 as a hosted service, both betting that parallel generation is the future of fast text.

Key facts

What: Diffusion text models generate in parallel blocks rather than left to right; Google's open DiffusionGemma and Inception's Mercury 2 are now competing head-to-head on speed.
When: 2026-06-22
Primary source: read the source

The approach replaces the conventional autoregressive habit — writing one word, then the next, each waiting on the one before it — with a process closer to a photo coming into focus all at once: a rough, garbled draft that is repeatedly cleaned up until it reads correctly. Because diffusion models polish text in parallel rather than sequentially, they can produce output far faster than a conventional model of similar size.
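To make the "draft, then clean up" idea concrete, here is a toy sketch in Python. It is not how DiffusionGemma or Mercury 2 work internally, which this article does not describe. It assumes a masked-token style of refinement, one common family of text diffusion, and `toy_denoiser` is a hypothetical stand-in that peeks at the answer instead of running a trained model. The only point it illustrates is the difference in the number of model passes.

```python
import random

MASK = "_"  # placeholder for a position that has not been filled in yet


def toy_denoiser(draft, target, rng):
    """Hypothetical stand-in for a trained model.

    For every masked position it proposes a token plus a confidence score.
    It cheats by reading the target; the confidence is just a random number.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(draft)
        if tok == MASK
    }


def diffusion_style_decode(target, tokens_per_step=4, seed=0):
    """Fill a fully masked draft, several positions per pass, in any order."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    rng = random.Random(seed)
    draft = [MASK] * len(target)
    steps = 0
    while MASK in draft:
        proposals = toy_denoiser(draft, target, rng)
        # Commit only the most confident proposals from this pass.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            draft[i] = proposals[i][0]
        steps += 1
    return draft, steps


def autoregressive_decode(target):
    """Emit one token per pass, strictly left to right."""
    out = []
    steps = 0
    for tok in target:
        out.append(tok)
        steps += 1
    return out, steps


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog near the river".split()
    _, parallel_steps = diffusion_style_decode(words, tokens_per_step=4)
    _, sequential_steps = autoregressive_decode(words)
    print("parallel passes:", parallel_steps)
    print("sequential passes:", sequential_steps)
```

Fewer passes is where the potential speedup comes from, but pass count is not wall-clock time: each pass of a real model has its own cost, and a real denoiser can propose wrong tokens that later passes must correct.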

The open-weight contender is Google's DiffusionGemma (model card), released under a permissive license so anyone can download and run it. It climbed near the top of the download charts within days even though, unusually, no big cloud company is yet offering it as a ready-to-use hosted service. That gap created a scramble: tooling sprang up to answer the urgent community question of how to run it locally, including fine-tuning support from Unsloth and a community-built local interface (diffusiongemma-lab).

The challenger comes from Inception Labs, whose Mercury 2 (inceptionlabs.ai) is a diffusion text model offered only as a hosted service; the company claims it is faster still. The contest lines up cleanly: an open model you can own but have to set up, versus a closed one you can't inspect but can call instantly. We've covered this paradigm before, in the story of a bigger text model that doesn't write left to right, and the underlying idea is laid out in our explainer on diffusion language models.

Speed isn't a luxury; it changes what's economically possible. A model that can generate a long document or a big chunk of code in a fraction of the time can cost far less to run at scale, because the same hardware serves more requests per hour. It also feels qualitatively different to use: less waiting, more conversation. If diffusion text models keep their quality while running this fast, they could reshape the economics of anything that involves generating a lot of text (summaries, code, drafts, translations) and put real pressure on the one-word-at-a-time approach that has dominated since chatbots began.
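The link between speed and cost is simple arithmetic, sketched below in Python. The function names are illustrative, and every number in the example is a made-up placeholder, not a measurement of DiffusionGemma, Mercury 2, or any other model. The sketch also assumes the hardware bill per hour stays the same and ignores batching, which real serving setups rely on heavily.

```python
def cost_per_million_tokens(hourly_hardware_cost, tokens_per_second):
    """Hardware cost of generating one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_hardware_cost / tokens_per_hour * 1_000_000


def seconds_to_generate(num_tokens, tokens_per_second):
    """How long a single response of num_tokens takes at a steady rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    HOURLY_COST = 2.00   # placeholder: dollars per hour for one accelerator
    SLOW_RATE = 100      # placeholder: tokens per second, sequential model
    FAST_RATE = 500      # placeholder: tokens per second, parallel model

    slow = cost_per_million_tokens(HOURLY_COST, SLOW_RATE)
    fast = cost_per_million_tokens(HOURLY_COST, FAST_RATE)
    # With the hourly cost fixed, the cost ratio equals the speed ratio.
    print("cost ratio (slow / fast):", slow / fast)
    print("seconds for 2000 tokens, slow:", seconds_to_generate(2000, SLOW_RATE))
    print("seconds for 2000 tokens, fast:", seconds_to_generate(2000, FAST_RATE))
```

Under these assumptions a model that is N times faster costs one Nth as much per token, which is why independently measured throughput, not a vendor's headline figure, is the number to plug in.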

The trade-off: the traditional method is like a careful writer composing a sentence one word at a time, committing to each word before choosing the next. It is reliable, but you watch every word appear. The diffusion method is like a sculptor starting with a rough block and refining the whole shape at once. It is potentially much faster, but you're trusting the cleanup process to land in the right place. Both can produce beautiful results; they fail in different ways.

The honest caveat is that speed is the easy part to demonstrate and quality is the hard part to prove. Generating text in parallel makes it trickier for the model to keep a long argument perfectly consistent, since it's not building strictly on what came just before. Researchers are still scrutinizing how these models hold up on long, reasoning-heavy tasks compared to the conventional kind, and asking harder questions about how interpretable they are (How transparent is DiffusionGemma, and why it matters). The speed claims, especially the "we're faster than them" kind traded between two competitors, deserve independent testing before anyone treats them as settled. What's not in doubt is that parallel text generation has gone from a research curiosity to a real race, with one strong open option and one strong closed one pushing each other.

Originally published on Ground Truth, where every claim is checked against the primary source.