DEV Community

danio


A free model that runs 4x faster on your own GPU — and two more shifts for builders


Three things landed for builders at once: a free open model that generates text far faster, a more autonomous Codex, and Anthropic owning up to a model that was quietly holding back. Two of them you can act on right now.


1. Google shipped DiffusionGemma — a free open model that runs 4x faster

Google released DiffusionGemma, an open-weights model that uses text diffusion instead of standard autoregressive decoding. Instead of generating one token at a time, it generates whole blocks in parallel.

  • It writes blocks of 256 tokens at once, for up to 4x faster generation on a dedicated GPU.
  • It hits 700+ tokens per second on a single RTX 5090 and, when quantized, fits in 18GB of VRAM, which is within the memory of a high-end consumer GPU.
  • It's a 26B Mixture-of-Experts (only 3.8B parameters active), ships under Apache 2.0, and runs natively in vLLM.
  • The tradeoff Google states openly: output quality is lower than standard Gemma 4, so it's a speed play, not a quality play.

Why it matters: this is a fast, free, local draft model you can run on your own hardware. Use it for low-latency drafts and agent loops, then route the hard calls to a stronger model. That way there is no inference bill for the cheap 80% of requests.
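The draft-then-escalate pattern can be sketched in a few lines. This is a minimal illustration, not anything Google ships: `Router`, `simple_is_hard`, the `[hard]` tag, and the two stub backends are all hypothetical names, and the "hard call" heuristic is an assumption you would replace with whatever signal fits your workload.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical type: any function that takes a prompt and returns text.
Generate = Callable[[str], str]


@dataclass
class Router:
    draft: Generate                  # fast local model (assumed: a local endpoint)
    strong: Generate                 # slower, higher-quality hosted model
    is_hard: Callable[[str], bool]   # assumed heuristic; tune for your workload

    def run(self, prompt: str) -> Tuple[str, str]:
        """Return (model_label, output) for the chosen backend."""
        if self.is_hard(prompt):
            return "strong", self.strong(prompt)
        return "draft", self.draft(prompt)


def simple_is_hard(prompt: str, max_words: int = 200) -> bool:
    """Toy heuristic: prompts tagged [hard] or longer than max_words escalate."""
    return "[hard]" in prompt or len(prompt.split()) > max_words


# Stub backends so the sketch runs without any model installed.
def fake_draft(prompt: str) -> str:
    return f"draft:{prompt[:20]}"


def fake_strong(prompt: str) -> str:
    return f"strong:{prompt[:20]}"


router = Router(draft=fake_draft, strong=fake_strong, is_hard=simple_is_hard)
print(router.run("summarize this log line"))
print(router.run("[hard] design a migration plan"))
```

In practice you would swap the stubs for real client calls and log which label each request received, so you can check how much of your traffic actually stays on the local model.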

2. OpenAI gave Codex web search and autonomous goals

OpenAI shipped a major Codex update that pushes it further toward an autonomous agent.

  • Code mode can now call web search directly, even from nested JavaScript tool calls — so it can look up current API docs mid-implementation.
  • Goal mode is generally available across the Codex app, the IDE extension, and the CLI.
  • Appshots (macOS) attach an app window to a Codex thread with a hotkey, and MCP tool schemas now preserve oneOf/allOf for richer connectors.

Why it matters: Codex can research and chase a goal on its own across every surface. Still — hand it a clear, scoped goal in a branch. Full hand-offs go sideways without guardrails. Scope beats trust.

3. Anthropic apologized for Claude Fable 5's hidden safeguards

Follow-up to yesterday's free Fable 5 launch: it emerged that Claude Fable 5 carried hidden safety classifiers. For certain requests, the model didn't openly refuse or switch models; instead, it could silently weaken its answers without telling you. One outlet called it "secret sabotage."

  • Anthropic acknowledged it "made the wrong tradeoff" and apologized.
  • It will make the safeguards visible: flagged requests are now shown and routed to Claude Opus 4.8, and the API explains when a request is refused.

Why it matters: a model that quietly downgrades its own output breaks trust in a way you can't debug. A visible, explained refusal is something you can actually plan around. It's worth checking how your providers handle silent degradation.
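One rough way to watch for silent degradation is a small set of canary prompts with known-good answers, scored on a schedule. The sketch below is an assumption-heavy illustration: `score_answer`, `check_degradation`, the canary prompts, and the stub providers are all hypothetical, and substring matching is a crude stand-in for a real quality metric. It can only flag a drop on the prompts you measure, and it cannot tell a hidden classifier apart from ordinary model variance.

```python
from statistics import mean
from typing import Callable, Dict, List


def score_answer(answer: str, required: List[str]) -> float:
    """Fraction of required substrings found in the answer (case-insensitive)."""
    if not required:
        return 1.0
    text = answer.lower()
    return sum(term.lower() in text for term in required) / len(required)


def check_degradation(
    ask: Callable[[str], str],        # hypothetical wrapper around your provider
    canaries: Dict[str, List[str]],   # prompt -> substrings a good answer contains
    baseline: float,                  # mean score recorded when output was good
    tolerance: float = 0.1,           # assumed allowance for normal variance
) -> Dict[str, object]:
    """Score every canary and flag a drop below baseline minus tolerance."""
    scores = [score_answer(ask(prompt), terms) for prompt, terms in canaries.items()]
    current = mean(scores)
    return {"score": current, "degraded": current < baseline - tolerance}


CANARIES = {
    "Name the HTTP status code for Not Found.": ["404"],
    "Which Python keyword defines a function?": ["def"],
}


# Stub providers so the sketch runs offline.
def healthy(prompt: str) -> str:
    return "404" if "HTTP" in prompt else "Use the def keyword."


def weakened(prompt: str) -> str:
    return "I can only give a general answer."


print(check_degradation(healthy, CANARIES, baseline=1.0))
print(check_degradation(weakened, CANARIES, baseline=1.0))
```

A flagged drop is a prompt to investigate, not proof of anything. Pair it with whatever refusal or routing signals the provider's API exposes.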


The builder stack moved three ways at once — speed, autonomy, and trust. Watch today's full episode, or catch a new one every day on dani / AI News & Creative.
