DEV Community

FlareCanary

xAI retired 8 Grok models on May 15 — the slugs still resolve, so your bill and output quality changed silently

On May 15, 2026 at 12:00 PM PT, xAI retired eight model slugs from the Grok API:

  • grok-4-1-fast-reasoning
  • grok-4-1-fast-non-reasoning
  • grok-4-fast-reasoning
  • grok-4-fast-non-reasoning
  • grok-4-0709
  • grok-code-fast-1
  • grok-3
  • grok-imagine-image-pro

Here is the line from xAI's migration notice that makes this dangerous:

The slugs themselves continue to resolve, so you do not need to change your code to avoid breakage.

That sounds reassuring. It is the opposite of reassuring. "You do not need to change your code" is exactly why most teams didn't — and a retirement that requires no code change is a retirement that ships no signal. Nothing 404s. No SDK exception. No deploy. The same request you sent on May 14 still returns 200 on May 16. What changed is underneath the slug, and none of the usual alarms are wired to it.

Here is the silent-fail surface we keep seeing on review.

1. grok-code-fast-1 now bills at grok-4.3 rates — and that's your highest-volume slug

grok-code-fast-1 was xAI's cheap, fast, coding-optimized model. Its entire reason to exist was running a lot of tokens for a little money — agentic coding loops, refactor passes, repo-wide edits, autocomplete backends. High call volume, low unit price. That's the slug people deliberately picked because it was cheap.

After May 15, requests to grok-code-fast-1 redirect to grok-4.3, billed at grok-4.3's rate of $1.25 per 1M input tokens and $2.50 per 1M output tokens — flagship pricing, not the fast-tier pricing you chose. The redirect is the worst possible combination: it lands hardest on the slug with the highest token throughput, and it produces no error, no warning, no changed status code. The first signal is the invoice, and the invoice arrives weeks late.

If you run agentic coding on Grok, this is not a "review next sprint" item. Your cost per run changed on May 15 and your monitoring almost certainly didn't notice, because cost-per-token isn't something most teams alert on until finance asks a question.
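To put a number on a single run, here is a minimal sketch of the arithmetic at the grok-4.3 rates quoted above. The token counts in the example are hypothetical, not measurements from a real workload.

```python
# Rates quoted above for grok-4.3, in USD per 1M tokens.
GROK_4_3_INPUT_PER_M = 1.25
GROK_4_3_OUTPUT_PER_M = 2.50


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one run at the grok-4.3 rates quoted in this post."""
    return (
        input_tokens * GROK_4_3_INPUT_PER_M
        + output_tokens * GROK_4_3_OUTPUT_PER_M
    ) / 1_000_000


# Hypothetical agent run: 400k input tokens, 60k output tokens.
cost = run_cost_usd(400_000, 60_000)
print(f"${cost:.2f} per run")  # prints "$0.65 per run"
```

Multiply that by your daily run count and compare it with what you budgeted for the fast tier.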

2. The reasoning slugs are now answering at low effort

The redirect is not a clean one-to-one swap. xAI maps the retired slugs onto grok-4.3 with a reduced reasoning setting:

  • Every retired reasoning slug (grok-4-fast-reasoning, grok-4-1-fast-reasoning) → grok-4.3 with low reasoning effort.
  • Every retired non-reasoning slug → grok-4.3 with reasoning effort set to none.
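One way to make the mapping visible in your own code is a small lookup table, sketched below. The `Effective` type and `resolve` helper are hypothetical names. The effort is left as `None` wherever this post does not state it, and the grok-4-0709 target is an assumption taken from the general statement above.

```python
from typing import NamedTuple, Optional


class Effective(NamedTuple):
    model: str
    effort: Optional[str]  # None means the post does not state an effort


# Redirects as described in this post; not an official xAI table.
REDIRECTS = {
    "grok-4-1-fast-reasoning": Effective("grok-4.3", "low"),
    "grok-4-fast-reasoning": Effective("grok-4.3", "low"),
    "grok-4-1-fast-non-reasoning": Effective("grok-4.3", "none"),
    "grok-4-fast-non-reasoning": Effective("grok-4.3", "none"),
    "grok-code-fast-1": Effective("grok-4.3", None),
    "grok-3": Effective("grok-4.3", None),
    "grok-4-0709": Effective("grok-4.3", None),  # assumed target
    "grok-imagine-image-pro": Effective("grok-imagine-image-quality", None),
}


def resolve(sent_slug: str) -> Effective:
    """Return the model that actually serves a slug, per the table above."""
    return REDIRECTS.get(sent_slug, Effective(sent_slug, None))


print(resolve("grok-4-fast-reasoning"))
```

Logging `resolve(slug)` next to the slug you sent gives reviewers a field that changes when the provider remaps, even though the request string does not.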

If you picked grok-4-fast-reasoning specifically because a task needed the model to think — structured extraction, multi-step tool planning, anything where you traded latency for correctness — you are now getting low effort by default. The model still answers. The answer is still well-formed JSON, still parses, still passes your schema validation. It's just measurably worse on the hard cases, and there is no field in the response that says "I thought less about this than I used to." Your eval suite is the only thing that would catch it, and only if you re-ran it after May 15 — which nobody schedules, because nothing told them to.

This is the textbook drift shape: a valid-looking response that is a correct answer to a different question than the one your code thinks it asked.

3. Cost-attribution dashboards now lie

A lot of teams tag spend by the model slug they send: a model dimension on a metrics counter, a column in a usage table, a group-by in the monthly cost rollup. Those dashboards key off the string you sent, not the model that actually ran.

Post-May-15, your dashboard still shows a tidy line item for grok-code-fast-1 at the old unit price in your own math — while xAI bills the account at grok-4.3 rates. Internal cost attribution and the actual bill have silently diverged. Every "cost per feature" or "margin per customer" number that flows from that slug is now wrong, and it will stay wrong until someone reconciles the xAI invoice against the dashboard by hand and notices the totals don't match.
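The divergence can be sketched as two estimates over the same usage rows: one priced by the rate your dashboard assumes for the sent slug, one priced at the grok-4.3 rates quoted earlier. `UsageRow`, `estimate_usd`, and the internal rate table are hypothetical, and the "old" rates are placeholders, not xAI's historical prices.

```python
from dataclasses import dataclass


@dataclass
class UsageRow:
    sent_slug: str
    input_tokens: int
    output_tokens: int


def estimate_usd(rows, rates_per_m):
    """Sum cost using (input, output) USD-per-1M rates keyed by sent slug."""
    total = 0.0
    for row in rows:
        in_rate, out_rate = rates_per_m[row.sent_slug]
        total += (
            row.input_tokens * in_rate + row.output_tokens * out_rate
        ) / 1_000_000
    return total


rows = [UsageRow("grok-code-fast-1", 10_000_000, 2_000_000)]

# Placeholder rates your dashboard might still assume (NOT real prices).
dashboard_rates = {"grok-code-fast-1": (0.25, 1.00)}
# Rates quoted in this post for grok-4.3, which now serves the slug.
billed_rates = {"grok-code-fast-1": (1.25, 2.50)}

dashboard_usd = estimate_usd(rows, dashboard_rates)
billed_usd = estimate_usd(rows, billed_rates)
print(f"dashboard ${dashboard_usd:.2f} vs billed ${billed_usd:.2f}")
```

If the two totals differ for the same rows, every downstream margin number derived from the dashboard figure inherits the gap.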

4. grok-imagine-image-pro is a different image model now

grok-imagine-image-pro redirects to grok-imagine-image-quality. That is a different image model, not a renamed one. Anything downstream that made assumptions about the old model's output — dimensions, style, latency budget, cost per image, safety-filter behavior — is now feeding a different generator into the same pipeline with no version bump. Image pipelines are especially exposed here because the output "looks fine" to code; only a human comparing before/after notices the model changed.

5. Fallback chains lost their cheap degraded mode

Routers built during past provider incidents tend to look like this:

primary: grok-4.3
fallback:
  - grok-4-fast-non-reasoning   # cheap degraded mode
  - grok-3

The intent was: if the primary is rate-limited or down, drop to a cheaper model and keep serving. After May 15 both fallback entries resolve to grok-4.3. The "cheap degraded mode" is now full-price grok-4.3 — so the exact moment you fail over under load is the exact moment your per-request cost jumps to flagship rates, with no error and no log line saying the cheap path is gone. Incident plus silent cost blowout, stacked.
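A config check can catch this collapse before an incident does. The sketch below assumes a redirect table covering only the two fallback slugs from the router above; `collapsed_fallbacks` is a hypothetical helper, not part of any router library.

```python
# Redirects for the two fallback slugs, as described in this post.
REDIRECTS = {
    "grok-4-fast-non-reasoning": "grok-4.3",
    "grok-3": "grok-4.3",
}


def collapsed_fallbacks(primary: str, fallbacks: list[str]) -> list[str]:
    """Return fallback slugs that now resolve to the same model as the primary."""
    primary_model = REDIRECTS.get(primary, primary)
    return [
        slug for slug in fallbacks
        if REDIRECTS.get(slug, slug) == primary_model
    ]


dead = collapsed_fallbacks("grok-4.3", ["grok-4-fast-non-reasoning", "grok-3"])
for slug in dead:
    print(f"fallback {slug} is no longer a distinct model")
```

Running this in CI against the router config turns a silent remap into a failing check.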

6. Pinned eval baselines now track a moving target

If you run regression evals against a fixed model slug — standard practice for catching prompt regressions — you have grok-4-fast-reasoning or similar hardcoded in the harness. That pin was the whole point: a stable baseline to diff prompt changes against.

After May 15 the pin resolves to grok-4.3 at low effort. Your "stable baseline" moved. Every prompt-change diff you run against it from now on is measuring two variables at once — your prompt edit and a model swap you didn't make — and the harness has no idea, because the slug string in the config is unchanged.
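One way to keep a harness honest is to key each stored baseline by the model and effort you explicitly requested, and to flag baselines pinned to a retired slug on or before the redirect date. This is a sketch under those assumptions; `baseline_key` and `baseline_is_stale` are hypothetical names.

```python
import hashlib
import json
from datetime import date

REDIRECT_DATE = date(2026, 5, 15)

# The eight slugs listed at the top of this post.
RETIRED_SLUGS = frozenset({
    "grok-4-1-fast-reasoning", "grok-4-1-fast-non-reasoning",
    "grok-4-fast-reasoning", "grok-4-fast-non-reasoning",
    "grok-4-0709", "grok-code-fast-1", "grok-3", "grok-imagine-image-pro",
})


def baseline_key(model: str, effort: str, prompt_version: str) -> str:
    """Stable identifier for a baseline: changes if model or effort changes."""
    payload = json.dumps(
        {"model": model, "effort": effort, "prompt_version": prompt_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def baseline_is_stale(pinned_slug: str, captured_on: date) -> bool:
    """True if the baseline was captured against a slug that has since moved."""
    return pinned_slug in RETIRED_SLUGS and captured_on <= REDIRECT_DATE


print(baseline_is_stale("grok-4-fast-reasoning", date(2026, 4, 30)))  # True
print(baseline_key("grok-4.3", "high", "v12"))
```

The key makes the effort level part of the baseline's identity, so a later effort change cannot be diffed against it by accident.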

What to actually do

The migration itself is small. The detection is the hard part, because there is no schema diff to catch at review time and no error to alert on.

  1. Grep every repo, IaC file, notebook, and prompt config for the retired slugs:
   git grep -nE "grok-(4-1-fast-(reasoning|non-reasoning)|4-fast-(reasoning|non-reasoning)|4-0709|code-fast-1|3|imagine-image-pro)"

Include eval harnesses, fallback/router configs, and cost-attribution code — not just your main call sites. Those three are where this hides.

  2. Pin grok-4.3 explicitly and choose your reasoning effort. Don't keep riding the redirect. The redirect picks low/none for you; only an explicit grok-4.3 call with an explicit effort level (none/low/medium/high) puts the quality/cost tradeoff back in your hands.

  3. Re-run your evals after switching, and treat any pinned-baseline eval as invalidated as of May 15. Capture a fresh baseline against an explicit model+effort you control.

  4. Reconcile one xAI invoice line by line against your internal cost dashboard. If they don't match, your attribution is keying off the sent slug and needs to key off actual billed usage.

  5. Add a cost-per-token alert, not just a request-count alert. This entire class of failure is invisible to availability monitoring and visible only to spend monitoring.
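The cost-per-token alert in the last step could look roughly like the sketch below. It assumes you can obtain billed spend and total tokens per time window from your provider's usage data; the `Window` type and the 25% threshold are illustrative choices, not recommendations from xAI.

```python
from dataclasses import dataclass


@dataclass
class Window:
    billed_usd: float
    total_tokens: int


def usd_per_million(window: Window) -> float:
    """Blended cost per 1M tokens for one time window."""
    if window.total_tokens == 0:
        return 0.0
    return window.billed_usd / window.total_tokens * 1_000_000


def should_alert(baseline: Window, current: Window,
                 max_increase: float = 0.25) -> bool:
    """Alert when blended unit cost rises more than max_increase over baseline."""
    base = usd_per_million(baseline)
    cur = usd_per_million(current)
    if base == 0.0:
        return cur > 0.0
    return (cur - base) / base > max_increase


# Same traffic volume, higher bill: request-count monitoring sees nothing.
last_week = Window(billed_usd=10.0, total_tokens=20_000_000)
this_week = Window(billed_usd=30.0, total_tokens=20_000_000)
print(should_alert(last_week, this_week))  # True
```

The point of normalizing by tokens is that the alert fires even when request volume is flat, which is exactly the shape of a silent pricing remap.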

The reason this one is worth a sprint and not a backlog ticket: every other model retirement this year threw an error eventually. This one is engineered specifically not to. "Your code keeps working" is the failure mode, not the mitigation.


FlareCanary watches your third-party APIs and SDKs for breaking changes like this one — including model retirements, silent slug redirects, and pricing-tier remaps — and surfaces them before the invoice does. Free tier monitors 5 endpoints.
