Ali Baqbani

Originally published at alibaqbani.hashnode.dev

How We Actually Ship Complex Systems with AI Agents

Part 1 of 3: Why the old playbook doesn't work anymore


Every founder I know has the same story. You estimate two weeks. You ship in six. That "simple" payment integration turns into three weeks of chasing edge cases nobody told you about.

Been there. Done that. Too many times.

For years, we just accepted it: complexity = time. Want to build something solid? Slow down. Need to move fast? Cut corners and hope nothing breaks in prod.

But something changed around 2025. And it's why I'm writing this.

AI agents aren't just fancy autocomplete anymore. If you set them up right (and I mean really constrain them properly), they can build actual working systems. Not snippets. Not boilerplate. Real business logic with tests that pass.

The trick isn't just using AI. It's using it in a way that doesn't fall apart the moment you need something complex. I think of it as "contract-first" development: you lock down what you're building before any code gets written.

Let me explain what I mean.


Why Just Chatting with AI Doesn't Scale

I've watched teams burn weeks on this.

They open ChatGPT or Claude, type "write me a function to process payments," copy-paste whatever comes back, and move on. Seems fine. Then they ask for another piece. And another.

Three weeks later, the checkout flow breaks under any real load. Business logic is scattered across a dozen files. Half of it has no tests. And that "simple function" has grown tentacles into everything.

Look, you can't just chat with an AI and expect production-quality code. It doesn't work that way.

What does work: define everything upfront. What the system does, what it doesn't do, how errors work, what the data looks like. Lock that down first. Then let the agent fill in the implementation inside those walls.

That's really the whole idea. Humans set the boundaries. AI does the work inside them.
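To make that concrete, here's a minimal sketch of what those walls can look like, written as plain Python types. The names and error cases are invented for illustration, not taken from any specific library; the point is that the agent implements inside this shape and nothing else.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class PaymentError(Enum):
    """Every failure mode the contract allows -- nothing else gets returned."""
    CARD_DECLINED = "card_declined"
    DUPLICATE_CHARGE = "duplicate_charge"
    PROVIDER_TIMEOUT = "provider_timeout"


@dataclass(frozen=True)
class ChargeRequest:
    order_id: str
    amount_cents: int   # integer cents, never floats
    currency: str       # ISO 4217 code, e.g. "USD"


@dataclass(frozen=True)
class ChargeResult:
    charge_id: str
    succeeded: bool
    error: PaymentError | None = None


class PaymentProcessor(Protocol):
    """The wall the agent works inside: this signature, these types, no more."""
    def charge(self, request: ChargeRequest) -> ChargeResult:
        ...
```

None of this is clever. That's the point: the boring, explicit parts are what keep the generated code from drifting.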


Think of It Like a Team, Not One Person

Here's what clicked for me.

Most people picture "using AI for coding" as one conversation, one agent, one output. But that limits you pretty quickly. What actually works is more like having multiple specialists, except they work at machine speed.

How the pieces fit together

Four-stage AI agent pipeline: Planner → Implementer → Reviewer → Tester, each passing concrete artifacts to the next.

The planner figures out what to build. The implementer writes code. The reviewer catches problems. The tester proves it works. Each one hands off something concrete to the next. No assumptions about what came before.
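Here's a rough sketch of that hand-off in code, with each stage reduced to a typed artifact and a call. The `call_agent` helper is a hypothetical placeholder for whatever model or framework you actually use; only the shape of the pipeline matters.

```python
from dataclasses import dataclass


# Each stage consumes and produces a concrete artifact -- no shared chat history.
@dataclass
class Plan:
    tasks: list[str]        # e.g. ["OpenAPI spec", "domain models", "tests"]

@dataclass
class Implementation:
    files: dict[str, str]   # path -> source code

@dataclass
class Review:
    approved: bool
    issues: list[str]

@dataclass
class TestReport:
    passed: bool
    failures: list[str]


def call_agent(role: str, payload: object):
    """Stand-in for your actual model call (prompt, run, parse). Hypothetical."""
    raise NotImplementedError(f"wire up the {role} agent here")


def run_pipeline(requirements: str) -> TestReport:
    plan: Plan = call_agent("planner", requirements)
    impl: Implementation = call_agent("implementer", plan)
    rev: Review = call_agent("reviewer", impl)
    if not rev.approved:
        raise RuntimeError(f"review blocked merge: {rev.issues}")
    return call_agent("tester", impl)
```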

When do you actually need this?

If you're just writing one function in one file, a single agent is fine. But once you're building something with multiple moving parts, you need different agents handling different jobs. Otherwise you end up with an agent that's trying to do too much and doing none of it well.

Quick example

Say you're building an order service:

  1. Planner says: "Break this into API spec, domain models, database layer, transaction handler, test suite"
  2. One agent generates the OpenAPI spec
  3. Reviewer checks it against your security rules
  4. Another agent implements the actual code
  5. Tester writes tests based on the spec
  6. Final review before merge

Each step produces something concrete. No agent has to guess what the last one did.
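For step 2, the artifact can be nothing more exotic than an OpenAPI document. Here's a trimmed-down sketch of one, written as a Python dict so it can be dumped to JSON or YAML for the next agent; the paths, schemas, and status codes are illustrative assumptions, not a prescription.

```python
import json

# A cut-down version of the kind of OpenAPI document step 2 produces.
ORDER_SERVICE_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Order Service", "version": "0.1.0"},
    "paths": {
        "/orders": {
            "post": {
                "operationId": "createOrder",
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {
                        "schema": {"$ref": "#/components/schemas/CreateOrder"}}},
                },
                "responses": {
                    "201": {"description": "Order created"},
                    "409": {"description": "Duplicate order"},
                    "422": {"description": "Validation error"},
                },
            }
        }
    },
    "components": {
        "schemas": {
            "CreateOrder": {
                "type": "object",
                "required": ["customer_id", "items"],
                "properties": {
                    "customer_id": {"type": "string"},
                    "items": {
                        "type": "array",
                        "minItems": 1,
                        "items": {
                            "type": "object",
                            "required": ["sku", "quantity"],
                            "properties": {
                                "sku": {"type": "string"},
                                "quantity": {"type": "integer", "minimum": 1},
                            },
                        },
                    },
                },
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(ORDER_SERVICE_SPEC, indent=2))
```

The reviewer checks this document, the implementer codes against it, and the tester derives cases from it. Nobody has to reverse-engineer intent from someone else's code.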


Lock the Contract First

So what actually works in practice?

Lock everything down before writing code. Sounds obvious, but almost nobody does it.

Before any implementation happens, you generate strict API specs. OpenAPI if you're doing REST, Protobuf if you're doing gRPC. These become the rules the agent has to follow.

Your job as the architect: look at the domain model and make sure it makes sense. Does User connect to Subscription the right way? Does this actually solve the business problem?

The agent's job: take your model and fill in all the boring details. Error responses, edge cases, data types. All the stuff that's tedious for humans but easy to get wrong.
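To make that split concrete: the sketch below is the kind of domain model the architect signs off on (names and fields are invented for illustration). The agent then expands it into full schemas, error responses, and edge-case handling without changing its shape.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class SubscriptionStatus(Enum):
    TRIAL = "trial"
    ACTIVE = "active"
    PAST_DUE = "past_due"
    CANCELLED = "cancelled"


@dataclass
class Subscription:
    id: str
    plan: str
    status: SubscriptionStatus
    renews_at: datetime


@dataclass
class User:
    id: str
    email: str
    # The relationship the architect actually reviews: one user, many
    # subscriptions, and deleting a user must resolve every active one.
    subscriptions: list[Subscription] = field(default_factory=list)
```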

The point is: design the system to handle complexity from day one, instead of bolting it on later when things break.


The takeaway

Lock the contract first. Then let agents fill in the code.


What's next

Part 2: Setting up context, verification layers, and handling failures.

Part 3: When to step in, common mistakes, and tools that work.


This is Part 1 of the "Building Complex Systems with AI Agents" series.
