Something interesting is happening in backend engineering. The tools writing our code are getting smarter every month, but the infrastructure those tools have to target hasn't changed much in a decade. We're pointing increasingly capable AI agents at the same multi-service architectures we built for human developers, and in many cases, the output is more fragile than it needs to be.
The conversation around AI-assisted development has been almost entirely about the models. Which agent is best. Which IDE integration is fastest. Which model scores highest on SWE-bench. But there's a quieter, more consequential question that fewer people are asking: what should the target architecture look like when AI is writing a growing share of the code?
I don't think this requires throwing out everything we know about backend engineering. But I do think it's worth examining which architectural patterns help AI agents succeed and which ones create unnecessary friction.
Where the friction actually lives
Here's the core tension. Modern backend architecture evolved to solve human organizational problems. Microservices exist because teams needed to deploy independently. ORMs exist because developers didn't want to write SQL. Docker exists because "works on my machine" was destroying release cycles. Kubernetes exists because container orchestration is hard.
These tools work. Teams ship production software on them every day and will continue to do so. But none of them were designed with the assumption that a language model would be writing and modifying the code. They were designed for humans working in teams across long time horizons with institutional knowledge about how the pieces fit together.
An AI coding agent doesn't have institutional knowledge. It has a context window. And every additional service in your stack, every configuration file, every implicit dependency between systems, consumes that context window and introduces another surface where the agent can make mistakes.
I've watched Claude Code try to set up a standard production stack: Express, Prisma, Postgres, Redis, a WebSocket server, and Docker Compose. It gets each individual piece maybe 80% right. But the integration between them is where things get shaky. Environment variables don't match between services. The ORM generates a migration that conflicts with the seed data. The cache invalidation logic doesn't account for the way the WebSocket server reads from the database. Each bug is small. Together they cost you an afternoon.
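The environment-variable mismatch is the easiest of these to picture. Here is a deliberately minimal, hypothetical Compose fragment (service and variable names are illustrative, not from any real project) showing the failure class: both containers start cleanly, and nothing breaks until the first query.

```yaml
# Illustrative only: the API code reads process.env.DATABASE_URL,
# but the compose file defines POSTGRES_URL. Compose accepts this,
# the containers boot, and the bug surfaces at runtime.
services:
  api:
    build: .
    environment:
      - POSTGRES_URL=postgres://app:secret@db:5432/app  # app expects DATABASE_URL
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=secret
```

Nothing in this file is wrong in isolation, which is exactly why an agent (or a human) generating each service separately tends not to catch it.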
Can strong teams mitigate this with existing practices? Absolutely. Good tests, thorough code review, and solid CI pipelines catch most of these issues regardless of whether the code was written by a human or an AI. The question isn't whether traditional practices still work. They do. The question is whether the architecture itself could make the AI's job easier in the first place.
How teams are responding
The industry hasn't converged on a single answer yet. Teams are at very different stages of adoption, from simply adding Copilot to their existing workflow to rethinking their infrastructure from the ground up. Many are somewhere in between, and that's fine. But several patterns are emerging that are worth understanding.
Adding AI to existing stacks
The most common approach today is incremental. Keep your existing architecture. Add an AI coding assistant. Use it for boilerplate, test generation, code review, and documentation. Improve your CI pipeline to catch AI-introduced errors.
This works, and it's the right starting point for most teams. You get productivity gains without migration risk. The ceiling on this approach is that the AI is still targeting an architecture that wasn't designed for it, so you're relying more heavily on validation layers (tests, reviews, linting) to catch the integration mistakes that the architecture's complexity makes likely.
There's nothing wrong with this. It's where most of the industry is today and where it will remain for a while.
Backend-as-a-Service platforms
Supabase, Firebase, Appwrite, and Convex all reduce surface area by bundling database, auth, storage, and functions into a managed platform. The developer writes application logic. The platform handles infrastructure.
This works well for AI agents because there's less to configure. An agent writing Supabase code really only needs to know the client SDK and the database schema. It doesn't need to reason about connection pooling or deployment manifests.
The tradeoff is control. You're renting someone else's architecture. When you need something the platform doesn't support, you either work around it or migrate, and migration from a BaaS is notoriously painful. The other tradeoff is performance ceiling. When your database, your auth layer, and your edge functions are all separate services behind a network boundary, there's a latency floor you can't optimize below, no matter how good your queries are.
Infrastructure-from-code tools
Encore, SST, and Pulumi's newer offerings let you declare infrastructure as part of your application code. Instead of writing Terraform separately, you annotate your TypeScript with infrastructure semantics and the tool provisions everything.
This is clever because it keeps the infrastructure definition close to the application logic, which means an AI agent reading your codebase can see both at once. Fewer files to reason about. Fewer implicit dependencies.
The tradeoff is that you're still running multiple services at runtime. The code might be co-located, but your database is still a separate process from your API server, which is still a separate process from your cache. The deployment is simplified, but the runtime architecture is not. An agent can more easily set things up, but the same class of integration bugs still exists once the system is running.
Declarative frameworks
NestJS, Redwood, and Blitz collapse some of the decision space by being opinionated about project structure. They pick the ORM, the testing framework, the file layout. An agent working in a Redwood project has fewer choices to make, which means fewer wrong choices.
But these are still frameworks, not runtimes. They sit on top of the same multi-service architecture underneath. Your Redwood app still needs a database connection, still needs a deployment target, still needs infrastructure decisions that the framework doesn't make for you.
Unified runtimes
This is the approach I find most architecturally interesting, though it's also the most opinionated and the earliest in adoption. Instead of bundling services together at the management layer or the code layer, unified runtimes actually fuse them at the process level. Database, cache, application logic, and messaging run in the same memory space.
Harper is the most developed example of this pattern I've come across. Your data model is a GraphQL schema. REST APIs are generated automatically from that schema. Custom endpoints are JavaScript classes that extend your tables. Real-time messaging is built in via WebSockets, MQTT, and server-sent events. Caching isn't a separate layer because all data access is already in memory, persisted to disk (in practice, solid-state drives).
The entire application is three files. A schema, a config, and a resources module. That's not a simplification of the developer experience on top of hidden complexity. That's the actual architecture. There is no separate database process. There is no Redis instance. There is no message broker to configure.
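As a sketch of what the schema file contains, a Harper-style table declaration might look like the following. The table and field names are hypothetical, and the exact directive set should be checked against Harper's documentation:

```graphql
# schema.graphql — hypothetical example table.
# @table declares the table; @export exposes generated REST and
# real-time endpoints for it; @primaryKey and @indexed shape access.
type Dog @table @export {
  id: ID @primaryKey
  name: String
  breed: String @indexed
}
```

From this one file the runtime derives storage, generated REST endpoints, and the real-time subscription surface; the config and resources module only add behavior on top.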
For AI agents, this is a fundamentally different target. The agent doesn't need to reason about how services communicate because there's only one service. The agent doesn't need to manage connection strings because there are no connections. The schema is the source of truth for the data model, the API, and the access patterns simultaneously.
The tradeoff is real. You're not using Postgres. You're not using standard ORMs. Your team needs to learn Harper's model, and if you decide to leave, you're migrating data and rewriting your API layer. That's a meaningful cost, and for teams with deep investment in their current stack, it may not be worth it.
But there's something worth watching in where this goes long-term. Harper's deployment platform, Harper Fabric, distributes your application across a global cluster by selecting regions and latency targets. Because the runtime knows everything about your application from those three declarative files, it can make deployment decisions that would require significant DevOps expertise in a traditional stack. The gap between "I wrote the code" and "it's running in production across three continents" collapses to a single command.
When you project this forward into a world where AI agents are writing a larger share of initial code, the combination of a minimal-surface-area runtime and an infrastructure-aware deployment platform is compelling. Whether it becomes a dominant pattern or a niche one is still an open question, but it's worth tracking.
What makes a stack AI-friendly
Regardless of which approach you take, a pattern emerges. The stacks where AI agents produce the best output share a few properties. These aren't requirements. They're design principles that reduce the surface area for AI-introduced errors.
Fewer files that matter. Every file in your project is context the agent needs to hold. A three-file application is easier to reason about than a thirty-file application. This isn't about lines of code. It's about the number of distinct configuration surfaces.
Explicit over implicit. When your data model is declared in a schema that generates the API, the agent can see the relationship between data and endpoints. When your API is hand-wired through a routing layer that references a service layer that references a repository layer that references an ORM, the agent has to trace four levels of indirection to understand what a GET request returns.
Declarative over imperative. Telling the system what you want rather than how to do it means the agent makes fewer implementation decisions. Fewer decisions means fewer wrong decisions. A unified runtime's schema annotation like @export, which generates a REST endpoint, is one line the agent needs to write. A hand-coded controller with validation, error handling, and serialization is forty lines the agent needs to get right.
Co-located concerns. When your database schema, your API definition, your caching behavior, and your deployment config all live in the same place or are derived from the same source, changes propagate automatically. The agent doesn't need to remember to update the cache invalidation logic when it changes the data model because they're the same thing.
Deterministic deployment. If the deployment system can derive everything it needs from the application definition, the agent never needs to touch infrastructure config. If deploying requires a separate Dockerfile, Kubernetes manifest, and CI pipeline, the agent needs to maintain consistency across all of them.
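To make the single-source-of-truth idea behind these principles concrete, here is a toy sketch in plain TypeScript (not any real framework's API) in which the route path, cache key, and validation are all derived from one schema object, so a schema change propagates to all three automatically:

```typescript
// Toy sketch: one schema object is the source of truth; everything
// else is derived from it rather than hand-maintained in parallel.
type FieldType = "string" | "number";

interface TableSchema {
  name: string;
  fields: Record<string, FieldType>;
}

const userSchema: TableSchema = {
  name: "user",
  fields: { id: "string", email: "string", age: "number" },
};

// Derived, not hand-written: endpoint path, cache key, validator.
const routePath = (s: TableSchema) => `/${s.name}s/:id`;
const cacheKey = (s: TableSchema, id: string) => `${s.name}:${id}`;

function validate(s: TableSchema, body: Record<string, unknown>): string[] {
  return Object.entries(s.fields)
    .filter(([key, type]) => typeof body[key] !== type)
    .map(([key]) => `invalid field: ${key}`);
}

console.log(routePath(userSchema)); // "/users/:id"
console.log(cacheKey(userSchema, "42")); // "user:42"
console.log(validate(userSchema, { id: "1", email: "a@b.c", age: "30" }));
// ["invalid field: age"] — age arrived as a string, schema says number
```

Renaming the table or adding a field changes the route, the cache key, and the validation in one edit. That is the property an agent benefits from: there is no second location it can forget to update.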
None of this means your existing stack is broken. If you have strong engineering practices, good test coverage, and a team that reviews AI output carefully, you can get excellent results with a traditional architecture. These principles just describe what makes the AI's job easier, which in turn means less time spent on review and debugging.
The shift that's forming
I think we're in the early stages of a broader evolution in how developers and AI collaborate on backend systems. Not a revolution where everything gets replaced overnight, but a gradual shift in what we optimize for when choosing tools.
For twenty years, we've been decomposing backend systems into smaller, more specialized pieces. Separate database. Separate cache. Separate message broker. Separate API gateway. Separate auth service. Each piece is independently excellent. The complexity lives in the composition.
That composition complexity was manageable when humans were doing all the wiring, because humans carry implicit context about how the pieces relate. An experienced engineer knows that when you change the user schema, you also need to update the cache key format and the WebSocket subscription filter. That knowledge lives in their head, not in the codebase.
AI agents don't carry that implicit context. They have what's in the files. And if the relationship between your schema change and your cache invalidation logic is implicit, mediated through three layers of abstraction across two services, the agent is more likely to miss it. Not because it's incapable, but because the architecture made the dependency invisible.
The direction I see forming is toward stacks that make dependencies explicit, keep surface area manageable, and derive as much as possible from a single source of truth. For some teams that means a BaaS. For others, it means infrastructure-from-code. For teams starting fresh, a unified runtime is worth serious consideration.
This doesn't mean traditional stacks are going away. Postgres isn't dying. Kubernetes still has its place. Microservices will continue to be the right answer for large organizations with complex deployment requirements. But the evaluation criteria for choosing tools are expanding. "How productive is a human developer on this stack" is no longer the only question. "How well does an AI agent perform against this architecture" is becoming a real factor in the decision, and the stacks that score well on both dimensions will increasingly have an advantage.
We're early. The patterns are still forming. But the teams paying attention to this question now will have a head start when the rest of the industry catches up.