Aleks

Designing Tech Stacks for AI-Generated Code

Something strange is happening in backend engineering. The tools writing our code are getting smarter every quarter, but the infrastructure those tools have to target hasn't changed in a decade. We're pointing increasingly capable AI agents at the same sprawling, multi-service architectures we built for human developers, and then wondering why the output is fragile.

The conversation around AI-assisted development has been almost entirely about the models. Which agent is best. Which IDE integration is fastest. Which model scores highest on SWE-bench. But there's a quieter, more consequential question that almost nobody is asking: what should the target architecture look like when the developer is an AI?

The mismatch nobody talks about

Here's the core tension. Modern backend architecture evolved to solve human organizational problems. Microservices exist because teams needed to deploy independently. ORMs exist because developers didn't want to write SQL. Docker exists because "works on my machine" was destroying release cycles. Kubernetes exists because container orchestration is hard.

None of these tools were designed with the assumption that code would be written by a language model. They were designed for humans working in teams across long time horizons with institutional knowledge about how the pieces fit together.

An AI coding agent doesn't have institutional knowledge. It has a context window. And every additional service in your stack, every configuration file, every implicit dependency between systems, consumes that context window and introduces another surface where the agent can hallucinate.

I've watched Claude Code try to set up a standard production stack: Express, Prisma, Postgres, Redis, a WebSocket server, and Docker Compose. It gets each individual piece maybe 80% right. But the integration between them is where things collapse. Environment variables don't match between services. The ORM generates a migration that conflicts with the seed data. The cache invalidation logic doesn't account for the way the WebSocket server reads from the database. Each bug is small. Together they cost you an afternoon.
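The environment-variable mismatch is the archetypal version of this failure. A minimal sketch (the variable names and the `requireEnv` helper are hypothetical, just to show the class of drift an agent introduces when it generates each service's config separately):

```javascript
// Two services in the same repo read the same setting under different
// names: the API was generated expecting DATABASE_URL, the worker DB_URL.
const apiEnv = { DATABASE_URL: "postgres://db:5432/app" };
const workerEnv = {}; // nobody ever sets DB_URL

// A fail-fast startup check that surfaces the mismatch at boot
// instead of at the first query:
function requireEnv(env, name) {
  if (!env[name]) throw new Error(`Missing required env var: ${name}`);
  return env[name];
}

requireEnv(apiEnv, "DATABASE_URL"); // ok
try {
  requireEnv(workerEnv, "DB_URL"); // throws: wired to a name nobody sets
} catch (e) {
  console.log(e.message); // "Missing required env var: DB_URL"
}
```

Each service passes its own checks in isolation; only the pair fails, which is exactly why these bugs survive until integration.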

The model isn't the problem. The architecture is the problem.

How the industry is responding

Several approaches are emerging to close this gap, each with different tradeoffs.

Backend-as-a-Service platforms

Supabase, Firebase, Appwrite, and Convex all reduce surface area by bundling database, auth, storage, and functions into a managed platform. The developer writes application logic. The platform handles infrastructure.

This works well for AI agents because there's less to configure. An agent writing Supabase code really only needs to know the client SDK and the database schema. It doesn't need to reason about connection pooling or deployment manifests.

The tradeoff is control. You're renting someone else's architecture. When you need something the platform doesn't support, you either work around it or migrate, and migration off a BaaS is notoriously painful. The other tradeoff is the performance ceiling: when your database, your auth layer, and your edge functions are all separate services behind a network boundary, there's a latency floor you can't optimize below, no matter how good your queries are.

Infrastructure-from-code tools

Encore, SST, and Pulumi's newer offerings let you declare infrastructure as part of your application code. Instead of writing Terraform separately, you annotate your TypeScript with infrastructure semantics and the tool provisions everything.

This is clever because it keeps the infrastructure definition close to the application logic, which means an AI agent reading your codebase can see both at once. Fewer files to reason about. Fewer implicit dependencies.
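The shape of the pattern can be sketched in a few lines. This is a toy illustration, not Encore's or SST's actual API: the point is that resources are declared as values in the same module as the handlers that use them, so a provisioning tool, or an agent, discovers both by reading one file.

```javascript
// Hypothetical infrastructure-from-code sketch: declaring a resource
// registers it for provisioning and returns a handle the app logic uses.
const resources = [];

function bucket(name) {
  const b = { kind: "bucket", name };
  resources.push(b); // the deploy step would provision this
  return b;
}

// Infrastructure declaration and application logic, side by side:
const uploads = bucket("user-uploads");

function handleUpload(file) {
  return `stored ${file} in ${uploads.name}`;
}

console.log(resources); // one declared resource: the bucket
console.log(handleUpload("avatar.png")); // "stored avatar.png in user-uploads"
```

Everything the deploy step needs is derivable from the module itself, which is precisely what makes the codebase legible to an agent.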

The tradeoff is that you're still running multiple services. The code might be co-located, but at runtime your database is still a separate process from your API server, which is still a separate process from your cache. The deployment is simplified but the architecture is not. An agent can more easily set things up, but the same class of integration bugs still exists once the system is running.

Declarative frameworks

NestJS, Redwood, and Blitz collapse some of the decision space by being opinionated about project structure. They pick the ORM, the testing framework, the file layout. An agent working in a Redwood project has fewer choices to make, which means fewer wrong choices.

But these are still frameworks, not runtimes. They sit on top of the same multi-service architecture underneath. Your Redwood app still needs a database connection, still needs a deployment target, still needs infrastructure decisions that the framework doesn't make for you.

Unified runtimes

This is the approach I find most architecturally interesting. Instead of bundling services together at the management layer or the code layer, unified runtimes actually fuse them at the process level. Database, cache, application logic, and messaging run in the same memory space.

Harper is the most developed example of this pattern I've seen. Your data model is a GraphQL schema. REST APIs are generated automatically from that schema. Custom endpoints are JavaScript classes that extend your tables. Real-time messaging is built in via WebSockets and MQTT. Caching isn't a separate layer because all data access is already in-memory.

The entire application is three files. A schema, a config, and a resources module. That's not a simplification of the developer experience on top of hidden complexity. That's the actual architecture. There is no separate database process. There is no Redis instance. There is no message broker to configure.
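For concreteness, here is roughly what the schema file looks like, using the directive names from Harper's documentation; treat the exact shape as illustrative rather than authoritative:

```graphql
# schema.graphql -- one file that defines the table, its primary key,
# and (via @export) the fact that it is exposed as a REST endpoint.
type Dog @table @export {
  id: ID @primaryKey
  name: String
  breed: String
}
```

From this single declaration the runtime derives the storage, the REST API, and the real-time subscription surface.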

For AI agents, this is a fundamentally different target. The agent doesn't need to reason about how services communicate because there's only one service. The agent doesn't need to manage connection strings because there are no connections. The schema is the source of truth for the data model, the API, and the access patterns simultaneously.

The tradeoff is ecosystem lock-in. You're not using Postgres. You're not using standard ORMs. Your team needs to learn Harper's model, and if you decide to leave, you're migrating data and rewriting your API layer. That's a real cost.

But there's something compelling about where this goes long-term. Harper's deployment platform, Fabric, distributes your application across a global cluster based on the regions and latency targets you select. Because the runtime knows everything about your application from those three declarative files, it can make deployment decisions that would require significant DevOps expertise in a traditional stack. The gap between "I wrote the code" and "it's running in production across three continents" collapses to a single command.

When you project this forward into a world where agents are writing most of the initial code, the combination of a minimal-surface-area runtime and an infrastructure-aware deployment platform starts to look like it was designed for this moment even though it predates the current AI wave.

What makes a stack AI-friendly

Across all of these approaches, a pattern emerges. The stacks that work best with AI agents share a few properties:

Fewer files that matter. Every file in your project is context the agent needs to hold. A three-file application is easier to reason about than a thirty-file application. This isn't about lines of code. It's about the number of distinct configuration surfaces.

Explicit over implicit. When your data model is declared in a schema that generates the API, the agent can see the relationship between data and endpoints. When your API is hand-wired through a routing layer that references a service layer that references a repository layer that references an ORM, the agent has to trace four levels of indirection to understand what a GET request returns.

Declarative over imperative. Telling the system what you want rather than how to do it means the agent makes fewer implementation decisions. Fewer decisions means fewer wrong decisions. A schema annotation like @export that generates a REST endpoint is one line the agent needs to write. A hand-coded controller with validation, error handling, and serialization is forty lines the agent needs to get right.

Co-located concerns. When your database schema, your API definition, your caching behavior, and your deployment config all live in the same place or are derived from the same source, changes propagate automatically. The agent doesn't need to remember to update the cache invalidation logic when it changes the data model because they're the same thing.

Deterministic deployment. If the deployment system can derive everything it needs from the application definition, the agent never needs to touch infrastructure config. If deploying requires a separate Dockerfile, Kubernetes manifest, and CI pipeline, the agent needs to maintain consistency across all of them.
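The properties above share one mechanism: derive everything from a single declaration instead of hand-wiring parallel artifacts. A sketch of the principle (the schema shape and helper names are hypothetical, not any tool's API):

```javascript
// One declarative source of truth for the data model:
const schema = {
  user: { fields: ["id", "name", "email"], exported: true },
  session: { fields: ["id", "userId"], exported: false },
};

// The public API surface is derived from the schema, not hand-wired:
function routes(schema) {
  return Object.keys(schema)
    .filter((table) => schema[table].exported)
    .map((table) => `GET /${table}/:id`);
}

// Cache keys are derived from the same source, so renaming or removing
// a table breaks loudly here instead of silently serving stale data:
function cacheKey(table, id) {
  if (!schema[table]) throw new Error(`Unknown table: ${table}`);
  return `${table}:${id}`;
}

console.log(routes(schema)); // -> ["GET /user/:id"]
console.log(cacheKey("user", 42)); // -> "user:42"
```

An agent editing `schema` changes the API and the cache behavior in one move; there is no second file for it to forget.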

The meta-shift

I think we're in the early stages of a larger architectural correction. For twenty years, we've been decomposing backend systems into smaller, more specialized pieces. Separate database. Separate cache. Separate message broker. Separate API gateway. Separate auth service. Each piece is independently excellent. The complexity lives in the composition.

That composition complexity was manageable when humans were doing all the wiring, because humans carry implicit context about how the pieces relate. An experienced engineer knows that when you change the user schema, you also need to update the cache key format and the WebSocket subscription filter. That knowledge lives in their head, not in the codebase.

AI agents don't have that implicit context. They have what's in the files. And if the relationship between your schema change and your cache invalidation logic is implicit, mediated through three layers of abstraction across two services, the agent will miss it. Not because it's stupid, but because the architecture made the dependency invisible.

The stacks that will win in the AI-assisted development era are those that make dependencies explicit, keep surface area small, and derive as much as possible from a single source of truth. Whether that's a BaaS, an infrastructure-from-code tool, or a unified runtime, the direction is the same: less composition, more declaration.

I'm personally most bullish on the unified runtime approach because it solves the problem at the lowest layer rather than papering over it. But the landscape is moving fast, and the right answer probably depends on where your team is starting from and what constraints you're working within.

The one thing I'm confident about: the era of evaluating backend tools purely on human ergonomics is ending. "How well does an AI agent perform against this architecture?" is becoming a first-class evaluation criterion, and the stacks that can't answer that question well are going to feel increasingly dated.
