On Friday afternoon a government order hit Anthropic, and by Saturday morning Fable 5 and Mythos 5 were disabled for every customer worldwide. Not deprecated. Gone. Two days later OpenAI shut Sora down because it was losing fifteen million dollars a day.
I don't have a strong take on the politics. What I had was a smaller, more selfish question at 8am Saturday: if I'd staffed a real workflow on either of those, what would I actually do right now?
So I tested it. Here's what happened.
"We'd just switch" is a hope, not a plan
I'd been telling myself I had redundancy for months. If my main model fell over, I'd move to a second vendor. Easy.
The problem with that sentence is that I had never once run it. A fallback you've never executed isn't a fallback. It's a guess with good posture.
So Saturday I took my single most critical AI-dependent workflow - a spec-to-task-breakdown pipeline I lean on every day - and ran it end to end on a different vendor's model. One time. Just to find out whether the guess held.
It didn't.
Break #1: the prompt was overfit to one model
The first thing that broke was the prompt itself. My prompt had drifted into a shape that worked beautifully on the model I built it against. Tight, terse, lots of implicit structure the model had learned to fill in.
The backup model read the same prompt and produced mush. Not wrong exactly, just vague and unstructured, the kind of output you'd toss.
The fix was real work, not a config flag:
- summarize the spec and break it into tasks
+ You are breaking a spec into engineering tasks.
+ Output JSON only, matching this shape:
+ { "tasks": [{ "title": "", "estimate_pts": 0, "depends_on": [] }] }
+ Rules:
+ - every task must be independently shippable
+ - no task larger than 3 points; split if larger
+ - depends_on references task titles, not indexes
Model A filled in all that structure on its own. Model B needed it spelled out. That's twenty minutes of restructuring I'd much rather spend on a calm Saturday than during an actual outage.
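Once the prompt demands a fixed JSON shape, the rules in it can be checked mechanically, which gives a quick way to judge whether a backup model's output is usable. Here is a minimal sketch of such a check, assuming the shape and rules from the restructured prompt above; `validate_breakdown` and `MAX_POINTS` are hypothetical names, not part of the original pipeline.

```python
import json

MAX_POINTS = 3  # prompt rule: no task larger than 3 points


def validate_breakdown(raw: str) -> list[str]:
    """Return a list of problems found in a model's task-breakdown output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    tasks = data.get("tasks") if isinstance(data, dict) else None
    if not isinstance(tasks, list) or not tasks:
        return ['missing or empty "tasks" list']

    # Collect known titles first so depends_on can point at later tasks too.
    titles = {
        t["title"] for t in tasks
        if isinstance(t, dict) and isinstance(t.get("title"), str)
    }

    problems = []
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i}: not an object")
            continue
        title = task.get("title")
        if not isinstance(title, str) or not title:
            problems.append(f"task {i}: missing title")
        pts = task.get("estimate_pts")
        is_number = isinstance(pts, (int, float)) and not isinstance(pts, bool)
        if not is_number or pts > MAX_POINTS:
            problems.append(f"task {i}: estimate_pts must be a number <= {MAX_POINTS}")
        deps = task.get("depends_on", [])
        if not isinstance(deps, list):
            problems.append(f"task {i}: depends_on must be a list")
            continue
        for dep in deps:
            # Prompt rule: depends_on references task titles, not indexes.
            if not isinstance(dep, str) or dep not in titles:
                problems.append(f"task {i}: unknown dependency {dep!r}")
    return problems
```

An empty list means the output passed. The "independently shippable" rule is a judgment call and is not something this check can verify.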
Break #2: a silent tool-call dependency
The second break scared me more because it was invisible. One step in the pipeline depended on a tool call - a function the model invokes to pull live data. The backup model's tool-calling format was different enough that the call silently no-op'd.
The output still looked plausible. It just used stale data and didn't tell me. That's the worst failure mode there is: confidently wrong, no error, no flag. I only caught it because I was looking for trouble. On a normal day that bad output flows downstream and someone makes a decision on it.
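One way to turn that quiet failure into a loud one is to record which tools actually ran and refuse the output when a required one did not. The sketch below is an assumption about how such a guard could look, not the pipeline's real code: `ToolLedger`, `ToolNotCalledError`, and the tool name `fetch_live_data` are all hypothetical.

```python
class ToolNotCalledError(RuntimeError):
    """Raised when a step finishes without a tool it depends on having run."""


class ToolLedger:
    """Records which tools completed during one pipeline step."""

    def __init__(self):
        self.calls = []

    def wrap(self, name, func):
        """Return func wrapped so each successful call is recorded under name."""
        def recorded(*args, **kwargs):
            result = func(*args, **kwargs)
            self.calls.append(name)  # only reached if func did not raise
            return result
        return recorded

    def require(self, *names):
        """Raise unless every named tool completed at least once."""
        missing = [n for n in names if n not in self.calls]
        if missing:
            raise ToolNotCalledError(f"required tools never ran: {missing}")
```

Register each tool through `ledger.wrap("fetch_live_data", fetch_fn)` before handing it to the model, then call `ledger.require("fetch_live_data")` before the step's output goes downstream. If the backup model never triggers the call, the step now fails with an error instead of passing along stale data.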
Availability belongs on the risk register
Here's the reframe I walked away with. We already handle the API being down. You get a 503, you back off, you retry, it comes back. That's an outage with an SLA and a status page that eventually goes green.
This is the model being gone. No SLA. No restore ETA. No green status page, because it isn't coming back. A policy order or a vendor's burn-rate review can end it overnight, and you find out the same way everyone else does.
For a service you don't control and can't restore, that's a single point of failure on your critical path. We'd never ship that for a database. Most of us are shipping it for the model doing half the thinking.
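In code, the difference between the two cases is which errors are worth retrying. Here is a rough sketch, assuming the client can tell "temporarily unavailable" apart from "this model no longer exists". Both exception classes are hypothetical stand-ins, since how a vendor signals a removed model is not something this post covers.

```python
import time


class TransientError(Exception):
    """Hypothetical: the API is temporarily unavailable (for example a 503)."""


class ModelGoneError(Exception):
    """Hypothetical: the model has been removed and will not come back."""


def call_with_fallback(primary, fallback, prompt,
                       retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient failures on primary; reroute at once if it is gone."""
    for attempt in range(retries):
        try:
            return primary(prompt)
        except TransientError:
            # An outage: back off exponentially and try again.
            sleep(base_delay * 2 ** attempt)
        except ModelGoneError:
            # Not an outage: retrying cannot help, so stop immediately.
            break
    # Reached when the model is gone or the retries are exhausted.
    return fallback(prompt)
```

The `fallback` callable only helps if it has been rehearsed. Wiring it in without ever running it is the same untested guess in a different place.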
The one-pager that deletes your worst hour
The cheapest move turned out to be the most useful. The first hour after a model goes dark gets burned figuring out what just broke - which workflows touched that model, what versions, where the outputs live.
IBM found 88% of enterprises don't keep a complete inventory of the AI and agents they run. You can't reroute around a dead model if you don't know what depended on it. So I wrote one file:
workflows:
  - name: spec-to-tasks
    model: primary-vendor/model-a
    criticality: must-survive
    fallback: tested 2026-06-13, prompt needs restructure
  - name: standup-digest
    model: primary-vendor/model-a
    criticality: can-wait
    fallback: none, recovery order documented
  - name: video-assets
    model: openai/sora
    criticality: can-wait
    export_path: download MP4s + project json before EOL
That last line is the Sora lesson. When a vendor kills a product, not just a model, you also have to ask where your outputs go and how you get them out. One extra column.
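With the inventory in that shape, the "what just broke" question becomes a lookup. Here is a small sketch, assuming the file has already been parsed into a list of dicts (the entries below mirror it by hand to avoid a YAML dependency); `affected_by` is a hypothetical helper name.

```python
# Hand-copied from the inventory file above; a real script would parse the file.
INVENTORY = [
    {"name": "spec-to-tasks", "model": "primary-vendor/model-a",
     "criticality": "must-survive"},
    {"name": "standup-digest", "model": "primary-vendor/model-a",
     "criticality": "can-wait"},
    {"name": "video-assets", "model": "openai/sora",
     "criticality": "can-wait"},
]


def affected_by(inventory, dead_model):
    """Return workflows that used dead_model, must-survive entries first."""
    hits = [w for w in inventory if w["model"] == dead_model]
    # False sorts before True, so must-survive workflows come first;
    # sorted() is stable, so file order is kept within each group.
    return sorted(hits, key=lambda w: w["criticality"] != "must-survive")


for workflow in affected_by(INVENTORY, "primary-vendor/model-a"):
    print(workflow["criticality"], workflow["name"])
```

The must-survive workflows come out first, which is the order to work through them.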
The point isn't fear
I want to be clear, because the lazy version of this post is "AI is unreliable, panic." It isn't, and that's not useful. Depending on these models is the right call. The teams that win aren't the ones who avoided the dependency. They're the ones who can keep the work moving the morning it disappears.
That competence costs an afternoon to build and almost nobody has built it yet:
- Run your most critical workflow on a second model once. The rehearsal is the whole instrument.
- Sort workflows into must-survive-today vs can-wait. Only the short list earns a tested fallback.
- Keep a one-page workflow-to-model list so the first lost hour becomes a glance.
I ran my test on a quiet Saturday and it cost me twenty minutes and a little ego. The alternative was running it for the first time on the morning it counted.
What would break first in your stack if your main model wasn't there tomorrow - and have you ever actually checked?
Top comments (4)
The silent no-op is the part that should change how people design fallbacks. an outage fails loud. this fails quiet, and quiet failures are the ones that ship.
worth pushing further: the prompt restructure (break #1) is annoying but cheap to fix once. the tool-call format mismatch (break #2) isn't a prompt problem, it's a contract problem. your pipeline assumed a specific model's tool-calling shape as if it were a stable interface. that assumption is the actual single point of failure, not the model itself.
did the backup model give any signal at all that the tool call failed, or was the output indistinguishable from a successful run with fresh data?
that's the design debt nobody puts on the backlog — 'can this fail silently?' the tool-call mismatch broke exactly like that: status green, work not done, no error to grep for. now every fallback in my stack has a forced trace log or it doesn't ship.
The silent tool-call dependency is the one that'd keep me up. Stale data masquerading as fresh output — that's worse than an outage because nobody knows to panic. The one-pager is smart, but the rehearsal is what actually saves you.
the 'nobody knows to panic' framing is the right one — the incident doc is about how to react, but the rehearsal is what reveals whether you even know something went wrong. ran ours on a non-critical flow first and found two silent failures in the first 10 minutes.