One AI Vendor Is a Single Point of Failure. Treat It Like One.
The AI model you built your workflow on today may be indistinguishable from its competitor next quarter. That's not a problem. Betting on one of them is.
Pick up any two frontier models and run the same prompt through both. A year ago, you'd get noticeably different outputs: different reasoning styles, different strengths, different failure modes. Today, the gap has narrowed to a point where many enterprise workloads can't reliably tell them apart. That convergence is not an accident. It's the predictable result of an industry eating itself.
And it has significant implications for how you should be building.
Why the Models Are Converging
The reasons stack up fast.
Models trained on models. When a frontier lab releases a strong model, that model's outputs become training data. For researchers. For competitors. For distillation pipelines that compress the big model's behavior into smaller, cheaper ones. [1][2] The knowledge encoded in GPT-4 didn't stay inside OpenAI. It propagated through every dataset that included AI-generated content — which is most of the internet now. Models are increasingly trained on each other's outputs, knowingly or not. OpenAI accused DeepSeek of doing exactly this: using OpenAI API outputs to train a competing model, with the White House AI czar stating there was "substantial evidence that DeepSeek distilled the knowledge out of OpenAI's models." [1] Convergence is baked into the pipeline.
Talent moves. The AI industry has roughly twelve hundred people who understand how to train frontier models at scale, and they circulate. The researchers who built GPT-3 helped found Anthropic: Dario Amodei, Daniela Amodei, and several colleagues left OpenAI and immediately began developing Constitutional AI and mechanistic interpretability. [3] Fortune reported in 2025 that OpenAI engineers were 8x more likely to leave for Anthropic than the reverse; Meta poached at least eleven researchers from OpenAI, DeepMind, and Anthropic in a single hiring sprint. [4] The intellectual property of training methodology travels with the humans who developed it, and the humans don't stay put. The result: training approaches converge because the people designing them are the same people, working from the same theoretical foundations, just wearing different lanyards.
Standards are consolidating — faster than anyone expected. The MCP story is the clearest evidence of how quickly the industry is converging around shared infrastructure, and it's worth pausing on. Anthropic announced the Model Context Protocol in November 2024. [5] OpenAI — Anthropic's direct competitor — adopted it four months later, in March 2025. Sam Altman's post said simply: "People love MCP and we are excited to add support across our products." [6] Google's Demis Hassabis publicly endorsed MCP within weeks, and Google followed with formal support. [7] Microsoft hit general availability at Build 2025. [8] By December 2025, Anthropic had donated MCP to the Linux Foundation — with OpenAI and Block as co-founders of the new Agentic AI Foundation alongside Anthropic, and Google, Microsoft, and AWS as supporting members. [9]
The pace of that trajectory is worth internalizing. A standard created by one lab, adopted by every major competitor within six months, then transferred to neutral open-source governance just over a year after launch. SDK downloads went from roughly 100,000 per month at launch to 97 million per month by late 2025, nearly a thousand-fold increase. [10] There are now over 10,000 active MCP servers. [10] The New Stack ran a piece titled "Why the Model Context Protocol Won." [11]
When competitors adopt your standard that quickly, it means one thing: the underlying problem it solves is universal enough that no one benefits from a proprietary alternative. That's the definition of infrastructure. And infrastructure is, by definition, commodity. Token formats are standardizing on the same logic — which is precisely why model routers can exist as a product category at all. When the interfaces are identical, the model underneath is interchangeable.
The benchmark treadmill. Every major lab is optimizing against the same public benchmarks: MMLU, HumanEval, SWE-bench, GPQA. When you train to the same tests, you build the same competencies. The models get better at the same things, in the same ways. Differentiation exists at the frontier and at the edge cases — the things the benchmarks don't measure. For the vast middle of enterprise use cases, the models are functionally equivalent and getting more so.
The insight that changes the strategy: The model isn't the moat. The model is the commodity. The moat is the workflow, the data, the institutional knowledge of how to use the tool. That remains yours regardless of which model powers it — if you build for portability.
The Mission-Critical Problem
This is a movie that every systems architect has seen on repeat.
No serious organization runs mission-critical infrastructure on a single server without failover. Not because the server is likely to fail, but because the cost of unplanned downtime is high enough that the insurance is worth the overhead. You build redundancy, you test failover, you know what happens when the primary goes down — before it goes down.
AI tools have crossed the mission-critical threshold for a growing number of organizations. [12] When your development team's velocity depends on an AI coding assistant, when your customer service runs through an AI agent, when your analysts are using AI for research synthesis — an outage isn't a convenience problem. It's a business problem.
This is not theoretical risk.
On April 15, 2026, a critical incident took down Claude.ai, the Claude API, Claude Code, and the platform console simultaneously for approximately three hours. [13] Login failures locked out users who hadn't already established a session; the API went completely dark before recovering.
Less than a week later, on April 20, 2026 (today as I write this), OpenAI experienced an outage of over two hours that took down ChatGPT, Codex, and the entire API platform simultaneously. [14] Elevated authentication errors, European region failures, and business workspace disruptions occurred in the same week. An AI workflow with no routing alternative was simply dead in the water.
Also, for the past four days as I write this (April 17 through 20), Google's Gemini API has shown partial outages, and AI Studio has logged partial outages continuously from April 2 through April 20. [15] That's three for three: every top-tier LLM provider had customer-impacting problems over the past week. The pattern is consistent enough that it should inform architecture, not just post-mortems.
The failure modes for AI services are different from server failures. Servers either work or they don't. AI services degrade: quality drops, rate limits hit, new pricing tiers appear, context windows shrink during peak hours, reasoning depth gets quietly dialed back. [16] The service is still technically available. It's just worse. Detecting and responding to that kind of degradation requires a different architecture than traditional high-availability design.
The right architecture has the same shape as any redundant system: multiple providers, automatic failover, and the ability to route work to wherever it can be done best — or cheapest — at any given moment.
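Detecting the soft degradation described above means watching more than up/down status. Here is a minimal sketch, assuming you can attach a latency measurement and some quality score (from your own evals) to each call. The class name, the thresholds, and the 0-to-1 quality scale are all illustrative assumptions, not any provider's API.

```python
from collections import deque
from statistics import mean


class DegradationMonitor:
    """Tracks recent calls to one provider and flags soft degradation.

    Hypothetical sketch: thresholds are placeholders to tune against your own traffic.
    """

    def __init__(self, baseline_latency_s, baseline_quality, window=20):
        self.baseline_latency_s = baseline_latency_s
        self.baseline_quality = baseline_quality
        self.samples = deque(maxlen=window)  # each item: (latency_s, quality, ok)

    def record(self, latency_s, quality, ok=True):
        self.samples.append((latency_s, quality, ok))

    def status(self):
        if not self.samples:
            return "unknown"
        error_rate = sum(1 for _, _, ok in self.samples if not ok) / len(self.samples)
        if error_rate >= 0.5:
            return "down"
        # Below a 50% error rate at least one call succeeded, so `good` is non-empty.
        good = [(lat, q) for lat, q, ok in self.samples if ok]
        avg_latency = mean(lat for lat, _ in good)
        avg_quality = mean(q for _, q in good)
        slow = avg_latency > 2 * self.baseline_latency_s
        worse = avg_quality < 0.8 * self.baseline_quality
        return "degraded" if (slow or worse or error_rate > 0.1) else "healthy"


monitor = DegradationMonitor(baseline_latency_s=1.0, baseline_quality=0.9)
for _ in range(10):
    monitor.record(latency_s=1.1, quality=0.6)  # still fast, but answers got worse
print(monitor.status())  # degraded
```

The point of the third state is that a traditional health check would report this provider as fine: every call succeeded and latency is normal. Only the quality signal reveals the problem.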
The Vendor Lock-in Trap
Most enterprise AI deployments are built on a single provider. One API key. One model. One pricing tier. One support relationship.
This made sense twelve to eighteen months ago, when the capability differences between providers were large enough to justify the dependency. It makes less sense now, and will make even less sense six months from now as convergence accelerates.
The lock-in risk is not primarily about the provider going bankrupt. It's subtler:
- Pricing power. A provider that owns your workflow can raise prices with confidence. You've already proven you depend on them. The negotiating position of an organization that can switch in 48 hours is fundamentally different from one that would need six months of re-integration work.
- Quality degradation without exit. When Anthropic quietly reduced reasoning depth for consumer sessions, organizations locked into Claude had no lever. [16] You can complain on Reddit or you can vote with your workload. Only one of those changes the provider's behavior.
- Capability ceilings. No single model is best at everything. Code generation, long-document synthesis, structured data extraction, creative writing, multi-step reasoning — the rankings shift by task and by model version. An organization that can route each task to the best available tool gets better outputs than one that forces everything through a single model because re-integration is too expensive.
- Geopolitical and regulatory exposure. As AI regulation diverges across jurisdictions and as export controls on AI capabilities tighten, an organization dependent on a single provider inherits all of that provider's regulatory risk. Diversification is also risk management.
Build the Router
The practical answer to commoditization and to vendor lock-in is the same thing: a model router.
Not just for budget control, though that's real and important, as I've written about before. Token costs vary 10x across providers for equivalent capability. [17] Dispatching the right query to the right model at the right price is genuinely valuable. But the budget argument is the least interesting reason to build routing infrastructure. I've built a router, modelrouter, that we're beginning to test now. When I first built it, it was primarily for budget control: "how do we prevent end-of-month sticker shock?"
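To make the sticker-shock problem concrete, here is a rough sketch of a budget guard that projects month-end spend from spend to date and steps down to cheaper tiers as the projection overshoots. It is not how modelrouter works internally; the class, the tier names, the 1.5x cutoff, and the per-million-token prices in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Hypothetical budget guard: picks a model tier from projected monthly spend."""

    monthly_budget_usd: float
    days_in_month: int = 30
    spent_usd: float = 0.0

    def record(self, input_tokens, output_tokens, in_price_per_m, out_price_per_m):
        # Prices are quoted per million tokens, so divide the total by 1,000,000.
        cost = (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000
        self.spent_usd += cost
        return cost

    def projected_month_end(self, day_of_month):
        # Straight-line projection: average daily spend so far times days in the month.
        return self.spent_usd / day_of_month * self.days_in_month

    def tier(self, day_of_month):
        projected = self.projected_month_end(day_of_month)
        if projected <= self.monthly_budget_usd:
            return "premium"
        if projected <= 1.5 * self.monthly_budget_usd:
            return "standard"
        return "economy"


guard = BudgetGuard(monthly_budget_usd=300.0)
guard.record(20_000_000, 4_000_000, in_price_per_m=3.0, out_price_per_m=15.0)  # $120 so far
print(guard.projected_month_end(day_of_month=10))  # 360.0
print(guard.tier(day_of_month=10))                 # standard
```

A straight-line projection is the crudest possible forecast, but it is enough to move the surprise from the invoice to day ten of the month.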
Build it for failover: when one provider is degraded or rate-limited, queries should route automatically to the next best option without human intervention. I have NOT built this in yet, but I can imagine a system that is monitoring results across users, monitoring status pages, and perhaps even monitoring news, and feeding those inputs into a scoring system that adjusts routing.
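As a thought experiment only, the scoring system imagined above might start as a weighted blend of the three signals. Everything here is an assumption: the function names, the weights, the 0-to-1 scales, and the idea that a news feed could be reduced to a single risk number.

```python
def provider_score(success_rate, status_page_ok, news_risk, weights=(0.6, 0.3, 0.1)):
    """Combine three signals into a 0..1 routing score (higher is better).

    success_rate:   fraction of recent calls across users that succeeded (0..1)
    status_page_ok: True if the provider's status page reports no incident
    news_risk:      0..1 estimate of trouble inferred from news (hypothetical input)
    """
    w_success, w_status, w_news = weights
    return (
        w_success * success_rate
        + w_status * (1.0 if status_page_ok else 0.0)
        + w_news * (1.0 - news_risk)
    )


def pick_provider(signals, minimum=0.5):
    """Return the best-scoring provider, or None if nothing clears the minimum."""
    scored = {name: provider_score(**s) for name, s in signals.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] >= minimum else None


signals = {
    "provider_a": dict(success_rate=0.55, status_page_ok=False, news_risk=0.8),
    "provider_b": dict(success_rate=0.97, status_page_ok=True, news_risk=0.1),
}
print(pick_provider(signals))  # provider_b
```

Returning None when no provider clears the minimum is deliberate: it gives the caller a place to queue the work or alert a human instead of routing to something known to be failing.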
Build it for quality routing: longer, more complex reasoning tasks to the model with the best benchmark on that class of problem. Routine extraction and summarization to the cheapest model that clears the quality bar. Real-time interaction to the model with the lowest latency. I built this logic into my nanobot, and immediately extended my token runway without affecting quality (anecdotally).
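The three routing rules in that paragraph can be expressed as a small policy function. This is a sketch under assumptions: the model names, prices, latencies, and quality scores below are invented, and in practice the scores would come from your own evaluations per task class.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    cost_per_m_tokens: float  # hypothetical blended price, USD per million tokens
    p50_latency_s: float      # hypothetical median latency
    quality: dict             # task class -> score from your own evals, 0..1


def route(task_class, models, policy="cost", quality_bar=0.8):
    """Pick a model for a task class under one of three policies."""
    def score(m):
        return m.quality.get(task_class, 0.0)

    if policy == "quality":
        return max(models, key=score)  # complex reasoning: best scorer wins
    # Otherwise only models that clear the bar compete; if none do, use the best scorer.
    eligible = [m for m in models if score(m) >= quality_bar] or [max(models, key=score)]
    if policy == "latency":
        return min(eligible, key=lambda m: m.p50_latency_s)   # real-time interaction
    return min(eligible, key=lambda m: m.cost_per_m_tokens)   # routine work: cheapest


models = [
    ModelProfile("large", 15.0, 4.0, {"reasoning": 0.92, "extraction": 0.95, "chat": 0.90}),
    ModelProfile("mid", 4.0, 1.5, {"reasoning": 0.80, "extraction": 0.93, "chat": 0.88}),
    ModelProfile("small", 1.0, 0.6, {"reasoning": 0.60, "extraction": 0.88, "chat": 0.70}),
]
print(route("reasoning", models, policy="quality").name)  # large
print(route("extraction", models).name)                   # small
print(route("chat", models, policy="latency").name)       # mid
```

Note that the small model loses the chat route despite having the lowest latency, because it does not clear the quality bar for that task class.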
Build it for antagonistic validation: run the same high-stakes output through two different models and compare. Where they agree, confidence goes up. Where they diverge, a human reviews. This is genuinely a different quality control architecture than single-model review: the models have different failure modes, different training biases, different blind spots. Making them check each other's work surfaces errors that neither would catch alone. I've built hooks into my router that could allow behavior like this, and I've used this strategy very productively with agents. A colleague told me this morning that he spins up multiple tmux sessions with different agents running to test the same plugin in different contexts. That would work through my modelrouter architecture as designed.
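The agree-or-escalate rule can be sketched in a few lines. This version uses plain text similarity from Python's standard difflib, which is a crude stand-in chosen for illustration: a real deployment would more likely compare structured fields or use a third model as judge. The function names and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not count.
    return " ".join(text.lower().split())


def cross_check(answer_a, answer_b, agree_threshold=0.85):
    """Compare two models' answers; flag for human review when they diverge."""
    similarity = SequenceMatcher(None, normalize(answer_a), normalize(answer_b)).ratio()
    return {
        "similarity": round(similarity, 3),
        "needs_human_review": similarity < agree_threshold,
    }


same = cross_check("Total exposure: 4.2M USD", "total exposure:   4.2M usd")
print(same["needs_human_review"])  # False

different = cross_check(
    "Total exposure: 4.2M USD",
    "The contract carries no financial exposure.",
)
print(different["needs_human_review"])  # True
```

Agreement between two models is evidence, not proof. Two models that share training data can share the same mistake, which is why this belongs on high-stakes outputs alongside human review rather than in place of it.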
Build it for portability: when the next model generation arrives and reshuffles the capability rankings, your workflows should be able to point at a new endpoint with minimal rework. I built a system into modelrouter to register models and to establish failover paths if a model can't be reached. If Opus is down, try Sonnet. If Sonnet is down, try ChatGPT. If that's down, try Gemini. Or push to locally hosted open-source models in Ollama if you prefer.
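A registry with failover paths might look roughly like the sketch below. This is not modelrouter's actual API, which the article does not show. The class, the exception, and the stub provider functions are all hypothetical; real entries would wrap each vendor's SDK call behind the same signature.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider wrapper when the model cannot be reached."""


class ModelRegistry:
    def __init__(self):
        self._models = {}     # name -> callable(prompt) -> str
        self._fallbacks = {}  # name -> ordered list of fallback names

    def register(self, name, call, fallbacks=()):
        self._models[name] = call
        self._fallbacks[name] = list(fallbacks)

    def complete(self, name, prompt):
        """Try the named model, then each fallback in order."""
        errors = []
        for candidate in [name, *self._fallbacks.get(name, [])]:
            try:
                return candidate, self._models[candidate](prompt)
            except ProviderUnavailable as exc:
                errors.append(f"{candidate}: {exc}")
        raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real SDK calls.
def down(prompt):
    raise ProviderUnavailable("timeout")


def up(prompt):
    return f"echo: {prompt}"


registry = ModelRegistry()
registry.register("opus", down, fallbacks=["sonnet", "local"])
registry.register("sonnet", down)
registry.register("local", up)
print(registry.complete("opus", "hi"))  # ('local', 'echo: hi')
```

Returning the name of the model that actually answered matters for the rest of the stack: logging, cost accounting, and evaluation all need to know when a fallback served the request.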
The Skill Portability Proof
The convergence argument isn't just theoretical. I tested it directly.
The differences in how skills (reusable AI workflows) are defined across Claude Code, Gemini CLI, and Codex have gotten small enough that automated migration is straightforward. I built a tool — skillporter — that ports any coding-agent skill across four major platforms in a single pass: Claude Code, Codex, Antigravity, and Gemini CLI.
The fact that this tool is possible tells you something important. A year ago, the conceptual models were different enough that automated translation would have produced garbage. Today, the translation fidelity is high enough to be genuinely useful. The platforms have converged around similar enough patterns that the same underlying skill, expressed in each platform's native syntax, does the same work.
That's not a coincidence. It's the same convergence dynamic playing out at the tooling layer. And it's an early signal of where the model layer is headed.
The practical takeaway for enterprises building on AI: architect as if the specific model doesn't matter, because increasingly it won't. Your prompt engineering, your context management, your institutional knowledge of how to get quality outputs — those compound over time and belong to you (you the enterprise, or you the individual? That's a debate for another article and well worth thinking deeply about...stay tuned). The model you're running them on is becoming as interchangeable as the cloud region your servers run in.
What To Build For
Three things that compound value regardless of which model wins the next benchmark cycle:
Routing infrastructure. Even a simple implementation — a routing layer that can target different providers with different query types, and that can fail over when a provider is degraded — is worth building now. The harder the dependency is to remove later, the worse your negotiating position and your resilience.
Prompt and context libraries. Well-crafted prompts and context strategies are model-agnostic to a first approximation. The effort you put into specifying exactly what good output looks like, what context the model needs, and how to validate the result pays dividends every time the model underneath changes.
Evaluation harnesses. The organizations that know how to measure AI output quality — not just "does it look right" but "does it pass defined acceptance criteria" — are the ones who can confidently switch models when a better option appears. You can't port to a new model if you can't tell whether it's performing as well as the old one.
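Here is one minimal shape such a harness could take: named acceptance checks per test case, a pass rate per model, and an explicit switching rule. The function names, the 2-point tolerance, and the stub models are assumptions for illustration; real cases would come from your own workflows.

```python
import json


def run_eval(model_call, cases):
    """Run every case through a model and apply its named acceptance checks."""
    results = []
    for case in cases:
        output = model_call(case["prompt"])
        failed = [name for name, check in case["checks"].items() if not check(output)]
        results.append({"id": case["id"], "passed": not failed, "failed_checks": failed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results


def safe_to_switch(current_rate, candidate_rate, tolerance=0.02):
    # The candidate may trail the current model by at most `tolerance`.
    return candidate_rate >= current_rate - tolerance


def is_json_with_total(output):
    try:
        return "total" in json.loads(output)
    except (ValueError, TypeError):
        return False


cases = [
    {
        "id": "invoice-1",
        "prompt": "Extract the invoice total as JSON.",
        "checks": {
            "valid_json_with_total": is_json_with_total,
            "under_200_chars": lambda out: len(out) < 200,
        },
    },
]

# Stub models standing in for real provider calls.
current_rate, _ = run_eval(lambda prompt: '{"total": 42}', cases)
candidate_rate, details = run_eval(lambda prompt: "The total is 42.", cases)
print(current_rate, candidate_rate)                  # 1.0 0.0
print(details[0]["failed_checks"])                   # ['valid_json_with_total']
print(safe_to_switch(current_rate, candidate_rate))  # False
```

The candidate's answer "looks right" to a human reader and still fails, because the downstream system needs JSON. That gap between looking right and passing acceptance criteria is exactly what the harness exists to catch.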
The Bottom Line
The frontier model arms race is producing a useful side effect: the models are getting good enough, fast enough, that the differences between them at the margin are shrinking. For most enterprise use cases, the specific model is becoming the wrong thing to optimize for. The right things to optimize for are workflow quality, routing flexibility, and the organizational competency to evaluate and switch.
Build as if the model is infrastructure — because it is. Commodity infrastructure. And the organizations that have treated it as such will have the leverage and the resilience that single-provider shops are going to spend the next cycle wishing they'd built.
Are you running multi-provider AI infrastructure, or still on a single model? Have you hit the vendor lock-in ceiling yet? I'd like to hear what's actually driving routing decisions in your organization.
If this resonated, here are some related articles:
- For the evidence that providers are already adjusting quality and capacity without announcement — the "reasoning_effort=25" discovery and what it means for your workflows: AI Shrinkflation: Your AI Model Was Quietly Dialed Back | Dev.to | Medium | Substack
- For why the model isn't the moat — and what actually is defensible when AI commoditizes your stack: Software Moats in the Age of AI: What's Actually Defensible? | Substack
- For why compute scarcity is real but the ROI math still favors adoption — the infrastructure cost argument that makes routing a financial priority: AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI | Substack
- For the practical case that tokens are currency and should be budgeted — the financial controls argument that pairs with routing infrastructure: The Token Bill Is Coming. Nobody's Ready for It.
References
[1] Axios, OpenAI says DeepSeek may have used its outputs to train competing model, January 2025. (White House AI czar: "substantial evidence that DeepSeek distilled the knowledge out of OpenAI's models.")
[2] arXiv, A Survey on Knowledge Distillation of Large Language Models, 2024. (Documents how LLM distillation pipelines use a "teacher" model's outputs as training data for smaller "student" models.)
[3] Wikipedia, Anthropic. (Founding team of Dario Amodei, Daniela Amodei, and colleagues departed OpenAI carrying Constitutional AI and mechanistic interpretability methodology.)
[4] Fortune, OpenAI and DeepMind Losing Engineers to Anthropic in a One-Sided Talent War, June 2025. (OpenAI engineers 8x more likely to leave for Anthropic than reverse; Meta poached 11 researchers from OpenAI, DeepMind, and Anthropic.)
[5] Anthropic, Introducing the Model Context Protocol, November 25, 2024.
[6] TechCrunch, OpenAI adopts rival Anthropic's standard for connecting AI models to data, March 26, 2025. (Sam Altman: "People love MCP and we are excited to add support across our products.")
[7] The New Stack, Google Embraces MCP, 2025. (Google DeepMind CEO Demis Hassabis publicly endorsed MCP in April 2025; formal Google Cloud MCP support announced December 10, 2025.)
[8] Microsoft Copilot Studio Blog, Model Context Protocol (MCP) is now generally available in Microsoft Copilot Studio, May 29, 2025. (GA announcement following Microsoft Build 2025.)
[9] Anthropic, Donating the Model Context Protocol and establishing the Agentic AI Foundation, December 9, 2025. (Co-founders: Anthropic, Block, OpenAI; supporting members: Google, Microsoft, AWS, Cloudflare, Bloomberg.)
[10] MCP Blog, One Year of MCP, November 2025. (SDK downloads grew from ~100K/month at launch to 97M/month by late 2025; 10,000+ active servers.)
[11] The New Stack, Why the Model Context Protocol Won, December 7, 2025. (Analysis of MCP's industry-wide adoption trajectory.)
[12] McKinsey, The State of AI: Global Survey 2025, 2025. (88% of organizations regularly use AI in at least one business function; 72% have deployed generative AI, up from 33% in 2024.)
[13] Anthropic Status, Incident: Increased errors across Claude services, April 15, 2026. (~3-hour critical outage affecting Claude.ai, Claude API, Claude Code, and platform console simultaneously.)
[14] OpenAI Status, Users unable to load ChatGPT, Codex and API Platform, April 20, 2026. (~2-hour outage affecting ChatGPT, Codex, and API simultaneously.)
[15] Google AI Studio Status, Service status history, April 2026. (Google AI Studio partial outages April 2–20, 2026; Gemini API partial outages April 17–20, 2026.)
[16] GitHub / Hacker News, Claude Code reasoning depth drop: 67% reduction documented across 6,852 sessions, April 2026. (Analysis by Stella Laurenzo, AMD AI group; reasoning_effort parameter set to 25/100 in consumer sessions.)
[17] Epoch AI, LLM inference prices have fallen rapidly but unequally across tasks, 2025. (Wide cost variance across providers for equivalent capability.)
Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.