Keith MacKay

Posted on Jun 18 • Originally published at tlcmentor.substack.com

Many Are Building Cathedrals on Quicksand

#ai #architecture #infrastructure #technicaldebt

The foundations of AI development shift every quarter. These are the architectural choices that outlast the churn.

Medieval cathedrals were designed to outlast their builders. The architects who laid the first stones at Notre-Dame knew they'd never see it finished. They planned in centuries.

We're doing the opposite. We're building software on foundations that shift every quarter, with vendor relationships that treat genuinely competitive commercial providers as neutral infrastructure, and with code that hard-codes behaviors that will be deprecated before the next sprint cycle.

GPT-4 was state of the art in early 2023. By late 2024, it was middle of the pack [1]. Entire startups built on specific model behaviors woke up to find their core assumption was gone. Not wrong. Not deprecated with a migration guide. Just: gone, or quietly changed, or superseded by something so different the old prompts didn't work anymore.

That's the terrain we're traversing as leaders.

The question isn't whether the ground will shift. It's whether your architecture can handle it when it does.

The Problem with Betting on a Foundation That's Still Being Poured

Here's what the past several years have looked like from where I sit:

2022: GPT-3 was the obvious choice. Build on it.
2023: GPT-4 changes everything. Rebuild or fall behind.
2023 (late): Claude 2, open-source models, local inference. Suddenly the answer wasn't obvious.
2024: GPT-4o, Claude 3 Opus, Gemini Ultra, Llama 3. All competitive. All different.
2025: Reasoning models, multimodal, agents. The architecture question gets much harder.
2026: Tools and harnesses are maturing, workflows are settling, swarms are getting better at parallelizing tasks, and teams are beginning to think about tokenomics. The model is becoming a commodity -- local open-source models are much closer to frontier capabilities. China's coordination across its AI ecosystem is showing real gains against the US ecosystem.

Every one of those transitions created winners and losers, and the losers were almost always the teams that had built the most tightly-coupled solutions to a specific model's API.

Not because those teams were bad engineers. Because they were optimizing for the wrong thing. They were building for today's foundation instead of building for foundation-change.

The deprecation notices tell the story. Anthropic's stated minimum notice window before a model is retired is 60 days -- and several recent models have hit exactly that floor [2]. Claude Sonnet 4 and Claude Opus 4 went from launch to complete retirement in under a year. OpenAI's entire Assistants API product -- a structural foundation many teams built on -- is being removed in August 2026, requiring a complete migration to the Responses API [3]. This isn't a deprecation. It's a teardown with a deadline.

The release pace compounds it. Frontier model releases arrived roughly once every 37 days in 2023. By 2026, the interval had compressed to roughly every 11 days [4]. The ground doesn't just move. It moves faster every year, every quarter, every month, every week.

The cloud-native movement figured this out the hard way a decade ago. The teams that won didn't write code that assumed AWS and only AWS forever. They wrote code that treated AWS as a utility, abstracted behind interfaces they controlled, using APIs that could accommodate hybrid cloud environments. In practice, that meant containerized applications, database abstraction layers, and vendor-agnostic infrastructure-as-code where possible. The payoff shows up in the mergers-and-acquisitions deals I see: limiting acquisition targets to companies on the same cloud provider as the buyer is rarely an acceptable constraint.

Same lesson. Different decade. Somehow we're learning it again from scratch. What's old becomes new again.

What Actually Changes vs. What Stays Stable

A simple mental model is useful here:

Some concepts in AI (or any broad technology category) are stable. Some are not. Your architecture should only hard-code the stable ones.

Stable: tokens, attention mechanisms, context windows as a concept, embeddings as a concept, the basic prompt-completion pattern, retrieval-augmented generation as an approach to prompt augmentation.

Unstable: specific API parameters, model-specific prompt formats, context window sizes (they keep growing, though the maximum usable window for predictable results has not grown much ... yet), pricing structures, rate limits, specific model behaviors that aren't documented as guarantees, fine-tuning APIs, function-calling syntax.

When engineers hard-code model-specific behaviors into business logic, they're writing code with an unknown (but near-certain-to-happen) expiration date. However, if they abstract those behaviors behind interfaces their team controls, they're buying themselves optionality.
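
One way to picture the split is to keep the unstable facts as data and let business logic ask questions of that data. The sketch below is illustrative only: the model names, window sizes, and parameter names are placeholders I made up, not any vendor's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Unstable, vendor-specific facts kept as data (all values are placeholders)."""
    context_window_tokens: int
    max_output_param: str  # name of the output-limit parameter in that API


# Hypothetical registry; in practice this might be loaded from a config file.
PROFILES = {
    "model-a": ModelProfile(context_window_tokens=128_000, max_output_param="max_tokens"),
    "model-b": ModelProfile(context_window_tokens=200_000, max_output_param="max_output_tokens"),
}


def fits_in_context(model: str, prompt_tokens: int, reserve_for_output: int) -> bool:
    """Stable concept (a context window exists); unstable size comes from data."""
    return prompt_tokens + reserve_for_output <= PROFILES[model].context_window_tokens


def request_params(model: str, output_limit: int) -> dict:
    """Translate a stable intent (cap the output) into the model's own parameter name."""
    return {PROFILES[model].max_output_param: output_limit}
```

When a window grows or a parameter is renamed, the change is a one-line edit to the registry rather than a hunt through application code.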

Optionality is the actual product you're building when you build model-agnostic infrastructure.

One concrete example: prompt templates. Teams that wrote prompts directly into application code, formatted specifically for GPT-4's preferred patterns, had real migration work to do when they needed to switch. Teams that externalized prompts into configuration, with a thin layer that could reformat them per model, had a much easier time. Same underlying logic. Very different operational posture.
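
Here is a minimal sketch of that thin layer, assuming prompts are stored as plain data and wrapped per model family at render time. The prompt name, the two style names, and the layouts themselves are hypothetical; they illustrate the shape of the idea, not any vendor's recommended format.

```python
from string import Template

# Hypothetical externalized prompt config; in practice this might live in a
# versioned YAML or JSON file rather than in application code.
PROMPTS = {
    "summarize_ticket": {
        "instructions": "Summarize the support ticket in two sentences.",
        "input_label": "ticket",
    }
}

# Per-family wrappers (illustrative layouts, not vendor recommendations).
FORMATS = {
    "xml_style": Template("$instructions\n<$label>\n$body\n</$label>"),
    "markdown_style": Template("$instructions\n\n### $label\n$body"),
}


def render_prompt(name: str, body: str, style: str) -> str:
    """Render a stored prompt in the layout chosen for one model family."""
    spec = PROMPTS[name]
    return FORMATS[style].substitute(
        instructions=spec["instructions"],
        label=spec["input_label"],
        body=body,
    )
```

The application calls render_prompt("summarize_ticket", text, style) and never sees the layout. Switching models means choosing a different style, or adding one, without touching the prompt's wording.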

The Vendor Lock-In Problem (Again)

OpenAI, Anthropic, and Google are not neutral infrastructure providers.

I don't say that to be critical of any of them. They're building remarkable technology. But they have commercial interests, competitive pressures, and strategic priorities that are not aligned with your need for stable, predictable infrastructure. Treating them like AWS S3 is strategically naive.

AWS S3 has maintained complete API backward compatibility since its 2006 launch -- twenty years. Their own 20th-anniversary post states it plainly: "the code you wrote for S3 in 2006 still works today, unchanged" [5]. That's because AWS built S3 as durable utility infrastructure, and their business model depends on your data staying there.

The frontier model providers are in a race. They're iterating fast because they have to. They're changing behaviors, deprecating models, shifting pricing, and launching new capabilities on a schedule that serves their competitive position, not your deployment stability.

The teams treating AI providers as utilities are building abstraction layers they control. This means:

An LLM gateway or router that sits between your application and the model providers.
A model-agnostic interface that lets you swap the underlying model without touching business logic.
Evaluation frameworks that work across models so you can make the switch decision on evidence instead of intuition.
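
The second item, a model-agnostic interface, can be sketched in a few lines. This is an assumption-laden illustration: ChatModel, Completion, EchoModel, and classify_ticket are names invented for the example, and EchoModel stands in for a real adapter that would wrap a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    text: str
    model: str


class ChatModel(Protocol):
    """The only surface that business logic is allowed to depend on."""

    def complete(self, prompt: str) -> Completion: ...


class EchoModel:
    """Stand-in provider; a real adapter would call a vendor SDK here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> Completion:
        return Completion(text=f"[{self.name}] {prompt}", model=self.name)


def classify_ticket(model: ChatModel, ticket: str) -> str:
    """Business logic: knows about ChatModel, not about any provider."""
    return model.complete(f"Classify this ticket: {ticket}").text
```

Swapping providers then means passing a different adapter into classify_ticket. The function itself does not change.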

For a working example, see https://github.com/keithmackay/modelrouter, a router with a model-agnostic interface that Claude and I wrote in Rust. It has budget controls for individuals, teams, and projects; built-in OTEL observability; hooks for adding DLP, evals, or other checks; and full command-line administration for automation or integration.

The emergence of MCP (Model Context Protocol) is itself evidence the industry arrived at this conclusion independently. Anthropic introduced MCP in November 2024 and donated it to the Linux Foundation for vendor-neutral governance in December 2025, by which point it had 97 million monthly SDK downloads and had been adopted by OpenAI, Google DeepMind, and Microsoft [6]. MCP standardizes how AI models connect to external tools and data sources. That's real and useful. But what does it solve? MCP addresses tool integration portability. It doesn't standardize prompt behavior, context handling, reasoning model APIs, or deprecation schedules. The abstraction layer that sits between your application and which model handles a request still needs to be built by your team.

The teams that haven't figured this out yet are the ones where switching providers means a multi-month engineering project. That's not a technical problem. It's an architectural choice that's going to compound.

The Case for Staying Deep

Before you build the abstraction layer, know what you're giving up.

Claude responds better to XML-structured prompts. GPT-4.x responds better to JSON-delimited instructions. Gemini handles context differently. When you write prompts to the lowest common denominator across models, or maintain per-model variants (which is just hidden coupling), you're trading optimization headroom for portability.

Each gateway hop also adds latency. For real-time voice interfaces or sub-200ms UX targets, that overhead is a real engineering constraint, not a theoretical one [7].

And there's a perverse argument from pricing history. GPT-4 tokens fell roughly 9x in 17 months -- from $30/million input tokens at launch to around $3/million by late 2024, without any migration required [8]. Teams that stayed deep on GPT-4 during that period captured the cost compression at zero switching cost. The question is whether the next price move works in your favor, and whether you can afford to wait.

The model-agnostic argument isn't "abstraction layers have no cost." They do. The argument is that the cost of unplanned migration without them is higher -- and that the migration event is inevitable. You're only choosing whether you're ready for it. Given that Anthropic's minimum deprecation notice is 60 days and OpenAI's Assistants API is disappearing entirely, "we'll deal with it when we need to" is a plan that has already failed for a lot of teams.

What Model-Agnostic Architecture Looks Like

You don't need to over-engineer this. The goal is the right abstraction layers, not abstraction for its own sake.

The LLM gateway layer. A single internal service that all your AI calls go through. It handles routing, rate limiting, cost tracking, model selection, and failover. Your application code doesn't know or care whether it's talking to GPT-4o or Claude 3.5 or a local Llama deployment. That's the gateway's job. Teams that built this early have a meaningful operational advantage right now. The market recognized this fast: LiteLLM, the most widely deployed open-source LLM proxy, has proxied over a billion requests and has nearly 48,000 GitHub stars as of mid-2026 [9]. Gartner predicts that by 2028, 70% of organizations building multi-LLM applications will use AI gateway capabilities -- up from less than 5% in 2024 [10].
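
To make the routing and failover part concrete, here is a deliberately small sketch. It is not how LiteLLM or modelrouter work internally; Backend, Gateway, and ProviderError are hypothetical names, and cost_per_call is a placeholder unit cost rather than real pricing.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderError(Exception):
    """Raised by a backend when a call fails (hypothetical error type)."""


@dataclass
class Backend:
    name: str
    call: Callable[[str], str]  # prompt -> completion text
    cost_per_call: float        # placeholder unit cost, not real pricing


@dataclass
class Gateway:
    backends: list[Backend]     # ordered by preference
    spend: dict[str, float] = field(default_factory=dict)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Try each backend in order; return (backend_name, text)."""
        errors = []
        for backend in self.backends:
            try:
                text = backend.call(prompt)
            except ProviderError as exc:
                errors.append(f"{backend.name}: {exc}")
                continue  # fail over to the next backend
            # Record spend only for the call that succeeded.
            self.spend[backend.name] = (
                self.spend.get(backend.name, 0.0) + backend.cost_per_call
            )
            return backend.name, text
        raise ProviderError("all backends failed: " + "; ".join(errors))
```

A production gateway would add rate limiting, retries with backoff, and per-team budgets, but the core contract is the same: callers hand over a prompt and get text back, and which provider answered is the gateway's concern.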

Prompt portability. Externalize your prompts. Version control them separately from your application code. Build a thin translation layer that can reformat them for different model families. This sounds like overhead until the day you need to migrate, and then it sounds like foresight [11].

Model-agnostic evaluation. How do you know if the new model is better for your use case? You need evals that aren't written assuming a specific model's output format. Build your quality benchmarks against the behavior you actually care about, not against what GPT-4 happened to produce [12].
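
A rough sketch of what behavior-based evals can look like, assuming each case is a prompt plus a list of checks on the output. The helper names (mentions, max_words, pass_rate) and the two sample cases are invented for illustration; real suites would be larger and often use graded rather than pass/fail checks.

```python
from typing import Callable

Check = Callable[[str], bool]


def mentions(term: str) -> Check:
    """Pass if the output contains the term, ignoring case."""
    return lambda out: term.lower() in out.lower()


def max_words(limit: int) -> Check:
    """Pass if the output stays within a word budget."""
    return lambda out: len(out.split()) <= limit


# Each case states the behavior we care about, not an exact expected string.
CASES: list[tuple[str, list[Check]]] = [
    ("Summarize: the invoice is overdue by 30 days.", [mentions("overdue"), max_words(25)]),
    ("Summarize: the customer requested a refund.", [mentions("refund"), max_words(25)]),
]


def pass_rate(model: Callable[[str], str]) -> float:
    """Fraction of cases where the model's output satisfies every check."""
    passed = 0
    for prompt, checks in CASES:
        output = model(prompt)
        if all(check(output) for check in checks):
            passed += 1
    return passed / len(CASES)
```

Because the checks never compare against a reference transcript from one model, the same suite can score any candidate that accepts a prompt and returns text.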

Avoid the model-specific feature trap. Every frontier model has features that look compelling and are only available on that model. Some of them are worth using. But every time you build a core business capability on a feature that's only available from one provider, you're writing a ransom note to yourself.

The test: if Anthropic or OpenAI raised prices by 5x tomorrow, how long would it take you to switch? If the answer is more than a few weeks, you've got architectural debt that's quietly accumulating.

The Organizations Getting This Right

They have a few things in common.

They treat AI infrastructure like they treat cloud infrastructure: with abstraction layers, provider diversity, and a clear strategy for avoiding single-vendor dependency. They're not anti-any-vendor. They're pro-optionality.
They invest in internal capability around the stable concepts: understanding embeddings, retrieval, context management, evaluation frameworks. The engineers who understand why things work are much less disrupted by changes in how things work.
They run structured experiments when new models arrive rather than either ignoring them or immediately migrating. They have the evaluation infrastructure to make that decision on evidence. They know which models perform better for which task types in their specific context, not just what the benchmarks say.
And they're honest about the tradeoff. Model-agnostic architecture has real costs. It's more engineering work upfront. Some model-specific optimizations aren't available through abstraction layers. The organizations doing this well accept those costs explicitly, because they've done the math on the alternative.
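
The structured-experiment habit can end in something as simple as a per-task lookup table. The sketch below assumes you already have eval scores per task type and model; the task names, model names, and scores are made-up placeholders.

```python
# Hypothetical eval results as (task_type, model, score) rows.
RESULTS = [
    ("summarize", "model-a", 0.82),
    ("summarize", "model-b", 0.91),
    ("extract", "model-a", 0.95),
    ("extract", "model-b", 0.88),
]


def best_model_per_task(results: list[tuple[str, str, float]]) -> dict[str, str]:
    """Pick the highest-scoring model for each task type."""
    best: dict[str, tuple[str, float]] = {}
    for task, model, score in results:
        if task not in best or score > best[task][1]:
            best[task] = (model, score)
    return {task: model for task, (model, _) in best.items()}
```

With the placeholder data above, the summarize task routes to model-b and the extract task to model-a, which is the point: the answer is per task in your context, not a single leaderboard winner.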

The Bottom Line

We are early in a long infrastructure transition. The foundational models will keep changing. The toolchains will keep evolving. The vendors will keep competing, which means the landscape will keep shifting.

The cathedral builders who got it right designed for the long arc. Stone that could be added to. Foundations that could bear weight they couldn't yet imagine. Architecture that survived the deaths of the architects.

You can't build that if you're optimizing only for today's model and today's API.

The teams that will look smart in three years are building abstraction layers now. They're externalizing configuration, investing in evals, treating vendors as utilities, and developing engineers who understand the stable underlying concepts instead of just the current API.

The quicksand is real, but it has a texture that experienced developers recognize. You can learn the signs and build on pilings rather than just hoping the ground holds.

What's your current AI infrastructure posture: utility abstraction or single-vendor deep integration? Have you had to migrate yet? I'm curious what the migration cost looked like in practice. Share your experience in the comments.

References

If this resonated, here are some related articles and resources:

modelrouter: an open-source, Rust-based model router with OTEL observability, token-budget cost controls, and command-line control of all features for automated administration: https://github.com/keithmackay/modelrouter
For the cost side of the infrastructure equation -- why AI infrastructure scarcity is driving up spend even as model prices fall, and why ROI still wins: On LinkedIn: AI Infrastructure Scarcity is Raising Costs, but AI Usage Will Still Provide Unbeatable ROI | On Substack | On Medium
For how the abstraction shift changes what developers actually build and maintain -- when the code is written for other AI systems to read, not humans: On LinkedIn: When AI Stops Writing Code for Humans | On Substack | On Medium
For a look at which competitive advantages actually survive when AI commoditizes the software layer -- directly connected to the "optionality is the product" argument here: On LinkedIn: Software Moats in the Age of AI: What's Actually Defensible? | On Medium
For how the return to specification-driven development mirrors the architectural discipline this article argues for: On LinkedIn: The Irony of AI Development: How Context Engineering Is Taking Us Back to Waterfall | On Substack | On Medium

Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude Code and Codex as AI collaborators.
