TL;DR — AI loves to design "enterprise-grade" systems for you: message queue, distributed lock, state machine service, scheduler, monitoring bus. Half of them aren't real. The cheapest filter I know: before letting AI design anything, walk one concrete scenario through the system. Whatever shows up in the scenario is real. Whatever doesn't — delete. This week it took me from a 5-component design down to 3 — and surfaced one critical component AI had missed entirely.
What I was building
This week I was extending aming-claw (an open-source AI code governance tool I'm building) to support parallel multi-agent development: multiple AI agents working on the same project simultaneously, each on its own branch, all of it merging back into trunk.
I asked AI to help me design it.
It came back fast. Confident. Five components:
- Message queue (so tasks can line up)
- Distributed lock (so agents don't step on each other)
- State machine service (so we track progress)
- Task scheduler (so we know what runs when)
- Monitoring bus (so we see what's happening)
Each component had a paragraph of justification. The diagram looked impressive. The names sounded right.
I almost just said "ok, build it."
Why I didn't
A thing I've learned working with AI on architecture: AI doesn't filter for necessity. It filters for plausibility. The components it lists are real things real systems have — they're just not necessarily things your system needs.
So instead of letting it design the system, I did one thing:
I walked a concrete scenario through the system before agreeing to anything.
Here's an honest framing: nobody can look at a 5-component design and immediately tell you which 2 are load-bearing. AI can't. Most engineers reading this can't, not on inspection.
The good news:
You don't need to know what to design. You just need to walk one scenario.
The scenario does the filtering for you.
Scenario 1: five tasks with dependencies
I started with the most boring scenario I could think of:
Five AI agents working in parallel. Each one on its own branch. The tasks have a dependency chain: 1 → 2 → 3 → 4 → 5. Task 2 needs what task 1 built. Task 5 needs everything before it.
I walked through what the system has to do:
- Five tasks running in parallel — they need to queue for merging. OK, "message queue" was real.
- BUT — they have to merge in dependency order. Not first-come-first-served. So a plain FIFO message queue isn't enough. It has to be an ordered queue.
Already, one component refined. "Message queue" → "ordered merge queue."
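To make "ordered, not FIFO" concrete, here is a minimal in-memory sketch. The class name `OrderedMergeQueue`, the `deps` mapping, and the `submit` method are all hypothetical names for illustration, not aming-claw's actual API. It assumes each task lists its direct upstream tasks and that a branch may merge only once all of them have merged.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedMergeQueue:
    """Holds finished branches until every upstream task has merged."""
    deps: dict[int, set[int]]                        # task_id -> direct upstream task_ids
    ready: set[int] = field(default_factory=set)     # finished, waiting to merge
    merged: list[int] = field(default_factory=list)  # merge order so far

    def submit(self, task_id: int) -> list[int]:
        """An agent reports its branch is done; return the tasks merged as a result."""
        self.ready.add(task_id)
        released = []
        progressed = True
        while progressed:
            progressed = False
            for task in sorted(self.ready):
                # A task may merge only when all of its upstream tasks have merged.
                if self.deps.get(task, set()) <= set(self.merged):
                    self.ready.remove(task)
                    self.merged.append(task)
                    released.append(task)
                    progressed = True
                    break  # re-scan: this merge may have unblocked other tasks
        return released


# The chain from the scenario: 1 -> 2 -> 3 -> 4 -> 5.
chain = {1: set(), 2: {1}, 3: {2}, 4: {3}, 5: {4}}
queue = OrderedMergeQueue(deps=chain)
queue.submit(3)  # finishes first, but nothing merges: task 2 has not merged
queue.submit(1)  # merges task 1 only; task 3 still waits on task 2
```

Arrival order and merge order are decoupled here, which is the whole difference from a plain FIFO queue.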
Nothing has been deleted yet. Keep going.
Scenario 2: the machine reboots mid-batch
Now the machine reboots. When it comes back up: task 1 already merged. Task 2 tried to merge and failed. Task 3 hadn't started yet. Task 4 was waiting in queue. Task 5 was halfway through executing when the power cut.
I walked it again:
- For the system to even know what state each task is in after a reboot, task state has to be on disk, not just in memory. Not a "state machine service" with its own server, just durable per-task state (task_id → status → checkpoint). That's a column in a database, not a service.
- Task 2 failed, but tasks 3-5 are downstream of it. The system has to recognize "upstream failed, downstream blocked" automatically. That's not a separate component; it's a query against the durable state.
- Task 5 was mid-execution when the power cut. When the machine restarts, what stops a second copy from picking it up and racing the half-finished one? Each execution attempt needs a unique token — whoever has the newest token is the live runner, everyone else gets fenced off.
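The first two points can be sketched with nothing more than SQLite from Python's standard library. Everything here is an assumption for illustration: the table name `task_state`, the single `upstream_id` column (which only fits a linear chain like this scenario), and the status strings. The point is that "upstream failed, downstream blocked" falls out of a query, with no separate service.

```python
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the durable per-task state table. Pass a file path to survive reboots."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS task_state (
               task_id     INTEGER PRIMARY KEY,
               upstream_id INTEGER,          -- hypothetical: one upstream per task
               status      TEXT NOT NULL,    -- hypothetical status names
               checkpoint  TEXT
           )"""
    )
    return db


def record(db, task_id, upstream_id, status, checkpoint=None):
    """Persist the latest known state of one task in a single transaction."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO task_state VALUES (?, ?, ?, ?)",
            (task_id, upstream_id, status, checkpoint),
        )


def blocked_tasks(db):
    """Return every task that sits downstream of a failed task."""
    rows = db.execute(
        """WITH RECURSIVE blocked(task_id) AS (
               SELECT task_id FROM task_state
               WHERE upstream_id IN
                     (SELECT task_id FROM task_state WHERE status = 'failed')
               UNION
               SELECT t.task_id FROM task_state AS t
               JOIN blocked AS b ON t.upstream_id = b.task_id
           )
           SELECT task_id FROM blocked ORDER BY task_id"""
    ).fetchall()
    return [row[0] for row in rows]


# The post-reboot picture from the scenario.
db = open_state()
for row in [(1, None, "merged"), (2, 1, "failed"), (3, 2, "pending"),
            (4, 3, "queued"), (5, 4, "running")]:
    record(db, *row)
```

With that data, `blocked_tasks(db)` returns tasks 3, 4, and 5, all of which sit behind the failed task 2.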
Now two more things have surfaced:
- Durable per-task state (which AI called "state machine service" — but it's not a service, it's a table)
- Fence tokens to prevent zombie reruns
And here's the first thing that got deleted: distributed lock.
A distributed lock is "this resource is held by exactly one agent right now." Fence tokens solve the same problem in a much weaker, much cheaper way: "the latest token wins, all stale tokens are ignored." For agent merge work, that's sufficient. Distributed locks would be massive overkill for the actual scenario.
1 component deleted, 0 lines of code written.
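For reference, the fence-token idea fits in a few lines. This is a sketch under stated assumptions: `FencedTaskStore`, `claim`, and `commit` are hypothetical names, and the token counter lives in memory here, whereas in the scenario it would have to sit in the same durable state as the task status so it survives the reboot.

```python
class FencedTaskStore:
    """Accepts results only from the newest execution attempt of each task."""

    def __init__(self):
        self._latest_token = {}  # task_id -> newest token issued
        self.results = {}        # task_id -> accepted result

    def claim(self, task_id):
        """Start a new execution attempt and hand out a strictly larger token."""
        token = self._latest_token.get(task_id, 0) + 1
        self._latest_token[task_id] = token
        return token

    def commit(self, task_id, token, result):
        """Accept the result only if the caller holds the newest token."""
        if token != self._latest_token.get(task_id):
            return False  # stale runner: fenced off
        self.results[task_id] = result
        return True


store = FencedTaskStore()
zombie = store.claim(5)                 # attempt that was running when the power cut
live = store.claim(5)                   # attempt started after the reboot
store.commit(5, zombie, "half-finished work")   # rejected
store.commit(5, live, "clean rerun")            # accepted
```

Nobody ever waits for a lock to be released. A stale runner simply finds out at commit time that its work no longer counts.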
Scenario 3: the ordering itself was wrong
This one wasn't in my original head-list. It only surfaced when I kept walking:
Five tasks ran. Three merged. Then it turns out the dependency order I gave the system was wrong: it should have been 1 → 3 → 2 → 4 → 5, not 1 → 2 → 3 → 4 → 5. The three already-merged tasks need to be rolled back as a batch and replayed in the correct order.
This is a scenario most systems never plan for. Per-task rollback is common — undo one merge. Batch rollback with replay is rarer.
- Plain per-task revert doesn't work: you can't revert task 2 while leaving task 3 (which was merged on top of task 2 under the wrong order) intact.
- The whole batch has to roll back atomically.
- Then the system has to replay them in the new order, with all the graph artifacts (snapshots, indices, semantic projection, test results) re-derived per merge.
This is the component AI had not mentioned at all. It only surfaced because I walked a scenario nobody told me to walk.
Call it BatchMergeRuntime. It's the rarest kind of architectural decision: not "should we have it" but "do we even know we need it?" — and the answer, for most teams, is not until production.
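A toy version of the rollback-and-replay step might look like the following. It is a sketch, not the real BatchMergeRuntime: `Trunk`, `rollback_and_replay`, and the `checkpoint` index are hypothetical, trunk is modeled as a list of task ids rather than git history, and the artifact string stands in for the snapshots, indices, and test results that get re-derived per merge.

```python
from dataclasses import dataclass, field


@dataclass
class Trunk:
    merges: list[int] = field(default_factory=list)          # task ids in merge order
    artifacts: dict[int, str] = field(default_factory=dict)  # derived per merge

    def merge(self, task_id: int) -> None:
        self.merges.append(task_id)
        # Stand-in for re-deriving graph artifacts: depends on everything merged so far.
        self.artifacts[task_id] = "after:" + ",".join(map(str, self.merges))


def rollback_and_replay(trunk: Trunk, checkpoint: int, corrected_order: list[int]) -> None:
    """Undo every merge after `checkpoint`, then replay those tasks in the corrected order."""
    batch = trunk.merges[checkpoint:]
    replay = [t for t in corrected_order if t in batch]
    if set(replay) != set(batch):
        raise ValueError("corrected order does not cover the merged batch")
    saved = (list(trunk.merges), dict(trunk.artifacts))
    try:
        del trunk.merges[checkpoint:]       # roll the whole batch back together
        for t in batch:
            trunk.artifacts.pop(t, None)    # drop artifacts derived under the wrong order
        for t in replay:
            trunk.merge(t)                  # re-derive artifacts per merge
    except Exception:
        trunk.merges, trunk.artifacts = saved  # all-or-nothing: restore on any failure
        raise


trunk = Trunk()
for task in (1, 2, 3):
    trunk.merge(task)                        # merged under the wrong order
rollback_and_replay(trunk, 0, [1, 3, 2, 4, 5])
```

After the call, trunk holds 1, 3, 2, and task 3's artifact reflects only task 1 before it. Tasks 4 and 5 are untouched because they never merged.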
What the architecture actually became
After walking three scenarios:
| Scenario | What it surfaced |
|---|---|
| 5 tasks with dependencies | Ordered merge queue |
| Machine reboots mid-batch | Durable task state + fence tokens |
| Dependency order was wrong | Batch rollback + replay runtime |
| All of the above untested | Test scenario matrix as P0.0 (highest priority) |
Three real components. The fourth — the test scenario matrix itself — is a meta-component: the dry-run scenarios I just walked became the first acceptance bar for every subsequent PR. Anything that ships has to survive these scenarios before merge.
AI's first design vs what scenarios required
| AI's first list | Reality after scenario walk |
|---|---|
| Message queue | ✅ Needed — but ordered, not FIFO |
| Distributed lock | ❌ Deleted — fence tokens are sufficient |
| State machine service | ✅ Needed — but as a table, not a service |
| Task scheduler | ❌ Deleted — the ordered queue is the scheduler |
| Monitoring bus | ❌ Deleted — each component emits its own events |
| (AI did not propose) | ✅ Batch rollback runtime — surfaced only by scenario 3 |
Net: 5 → 3 components. Two of AI's five survived (both reshaped), three were deleted, and the third survivor is the one critical piece AI had missed entirely.
The win is not "I deleted 3 components." The win is I now know why each remaining component exists, which means I can explain it, scope it, and reject scope creep on it. That's the difference between a system you built and a system you understand.
The method, in 3 steps
❌ Don't: "Hey AI, design me a system that does X."
→ AI returns a plausible-looking inventory of components.
→ Half of them aren't real for your specific case.
✅ Do: Step 1. Write one concrete scenario yourself.
(Or: have AI write the scenario, you evaluate it.
Real numbers, real steps, with crashes,
failures, and orderings going wrong.)
Step 2. Walk the scenario through your design.
At each step, ask: "What does the system need here?"
Step 3. Aggregate "what's needed."
That's your minimal architecture.
Anything not in that list — delete.
That's it. Three steps. No architecture-pattern library required. The scenario does the work for you.
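If it helps to see the bookkeeping, the three steps reduce to a small aggregation. This is only an illustration: `walk_scenarios`, `filter_design`, and the component labels are hypothetical, and deciding what each step needs is still the human part. To keep the comparison simple, a refined component is recorded under AI's original name, so fence tokens show up as "missed" even though they replaced the distributed lock.

```python
def walk_scenarios(scenarios):
    """Steps 1-2: map each needed component to the scenario steps that required it."""
    needed = {}
    for name, steps in scenarios.items():
        for step, needs in steps:
            for component in needs:
                needed.setdefault(component, []).append(f"{name}: {step}")
    return needed


def filter_design(proposed, needed):
    """Step 3: split the proposed design into keep / delete, plus what it missed."""
    return {
        "keep": [c for c in proposed if c in needed],
        "delete": [c for c in proposed if c not in needed],
        "missed": [c for c in needed if c not in proposed],
    }


proposed = ["message queue", "distributed lock", "state machine service",
            "task scheduler", "monitoring bus"]
scenarios = {
    "dependencies": [("merge in dependency order", ["message queue"])],
    "reboot": [("know task state after restart", ["state machine service"]),
               ("stop zombie reruns", ["fence tokens"])],
    "wrong order": [("roll back and replay the batch", ["batch rollback runtime"])],
}
verdict = filter_design(proposed, walk_scenarios(scenarios))
```

The `needed` mapping doubles as the justification for each surviving component: every entry points back to the scenario step that demanded it.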
Why this works (and why it's hard to skip)
Three reasons:
1. AI optimizes for plausibility, not necessity. It lists components that sound right for this kind of system, drawing from its training data. It can't know which components are necessary for your specific scenario, because it doesn't see your scenario unless you walk it through.
2. Scenarios surface the negative space. A happy-path design is the union of every component someone might need. A scenario walk is the intersection of components someone definitely needs for that scenario. The intersection is always smaller — and more honest.
3. Scenarios surface what AI missed. The batch-rollback runtime wasn't on AI's list. It surfaced because scenario 3 was a state AI's training data didn't lean on. Whatever your system's weird state is — only your scenarios will find it.
The reason this method is hard to skip is that the pressure to just accept AI's design is enormous. The design looks complete. It uses real words. You feel productive saying "yes, build it." Walking a scenario feels like slowing down. It is. That's the whole point.
What's next in this series
This is part 2 of the AI Collaboration Survival Guide. The previous post was about making AI's claims about completed work auditable via a backlog database. The next ones, lining up:
| Pain | Coming up |
|---|---|
| AI edits one function, breaks 10 callers | Code graph + impact analysis |
| AI modifies code it shouldn't touch | Governance hints as the only authoring surface |
| What did AI even change this week? | Event ledger |
| Every session starts from zero | Project memory layer |
One pain per article. All built around the same open-source project, aming-claw.
About aming-claw
- GitHub: amingclawdev/aming-claw
- What it is: A shared workspace where you and your AI agent see the same dashboard. Backlog database, code graph, event ledger, governance hints — all queryable by AI through MCP.
- Why I'm writing this series: I keep running into the same kind of AI-collaboration pain. Each post fixes one of them. The fixes generalize beyond aming-claw — the scenario-walk method in this post is a 5-minute habit you can adopt in any project.
If the parallel-agent scenario sounded familiar, drop a comment with the architecture decision AI most recently tried to oversell you on — I'll work through it the same way in the comments. Free architectural review, basically. The repo also takes stars and they're free for you to give. 🌟
Part 2 of "AI Collaboration Survival Guide" — practical patterns for the messy reality of shipping with AI agents.