The problem with one agent doing everything
I've been coding with AI assistants for over three years now. The pattern is always the same: you give the agent a meaty task, it begins brilliantly, and somewhere around the 40% mark it starts drifting. It forgets constraints from earlier in the conversation. It makes architectural decisions mid-implementation that contradict the plan. It mixes "what should we build" with "how do we build it" until neither is done well.
This isn't a model quality issue. I've seen it with every generation. The root cause is that planning and implementation are fundamentally different cognitive modes, and cramming both into one context window degrades both.
The fix: two roles, one loop
I started doing something simple: one AI conversation to plan (write specs, break work into tickets with acceptance criteria), and a separate conversation to implement (pick up a ticket, write the code, nothing else). The planner never touches code. The coder never makes architectural decisions.
The improvement was immediate. Specs got sharper because the planning agent wasn't distracted by implementation details. Code got cleaner because the coding agent had a clear, bounded task instead of an open-ended goal.
But managing this manually was tedious. Copy-pasting tickets between conversations, keeping track of what's done, making sure the coder's output actually met the acceptance criteria -- it was a lot of overhead.
Automating the loop
So I built cestDone, a CLI that orchestrates this Director/Coder cycle:
- The Director reads your codebase and your goal, then produces a set of tickets with clear acceptance criteria
- The Coder picks up tickets one at a time and implements them
- The orchestrator manages the handoff, tracks status, and keeps the loop going
Each agent stays in its lane. The Director's context is filled with architecture, requirements, and codebase structure. The Coder's context is filled with the specific ticket and the relevant code files. Neither is polluted with the other's concerns.
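The loop above can be sketched in a few lines. This is an illustrative stub, not cestDone's actual code: `runDirector` and `runCoder` stand in for the real LLM calls, and the ticket shape is an assumption.

```typescript
type TicketStatus = "open" | "in_progress" | "done";

interface Ticket {
  id: number;
  title: string;
  acceptanceCriteria: string[];
  status: TicketStatus;
}

// Planning phase: the Director turns a goal into bounded tickets.
// (Stub — the real call would send codebase context + goal to an LLM.)
function runDirector(goal: string): Ticket[] {
  return [
    { id: 1, title: `Spec: ${goal}`, acceptanceCriteria: ["..."], status: "open" },
  ];
}

// Implementation phase: the Coder handles exactly one ticket, nothing else.
// (Stub — the real call would send only the ticket + relevant files.)
function runCoder(ticket: Ticket): Ticket {
  return { ...ticket, status: "done" };
}

// Orchestrator: hand off tickets one at a time and track status.
function runCycle(goal: string): Ticket[] {
  const tickets = runDirector(goal);
  return tickets.map((t) => runCoder({ ...t, status: "in_progress" }));
}
```

The point of the structure is visible even in the stub: the Director never sees code, the Coder never sees the goal, and the orchestrator is the only thing that touches both.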
It also has a daemon mode with a built-in scheduler -- cron schedules, webhook listeners, and pollers -- so you can trigger the cycle automatically against incoming issues without babysitting it.
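At its core a poller is just an elapsed-time check run on a timer; webhook and cron triggers replace the time check with an external event. A minimal sketch of that check (names are illustrative, not cestDone's internals):

```typescript
// Decide whether the daemon should kick off a new Director/Coder cycle.
// The daemon would call this from a setInterval loop; when it returns
// true, it records the new lastRun timestamp and starts a cycle.
function shouldTrigger(lastRunMs: number, nowMs: number, intervalMs: number): boolean {
  return nowMs - lastRunMs >= intervalMs;
}
```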
What I learned
The Director is the harder role to get right. A bad spec produces bad code no matter how good the Coder is. I spent more time tuning the Director's prompts and context than anything else. The key was giving it enough codebase context to write realistic tickets, but not so much that it lost focus on the goal.
Acceptance criteria are everything. Vague tickets like "improve the login flow" produce vague implementations. Tickets like "add rate limiting to POST /auth/login: max 5 attempts per IP per 15-minute window, return 429 with Retry-After header" produce working code on the first pass.
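The precise ticket works because its acceptance criteria map directly onto code. Here's a minimal in-memory limiter matching that example ticket (5 attempts per IP per 15-minute window, with a Retry-After value for the 429); framework wiring is omitted, and this is my illustration, not output from the tool:

```typescript
const WINDOW_MS = 15 * 60 * 1000;
const MAX_ATTEMPTS = 5;

// Per-IP timestamps of recent attempts.
const attempts = new Map<string, number[]>();

// Returns null if the request is allowed, or a Retry-After value in
// seconds if the caller should respond 429. Rejected requests are not
// counted toward the window in this sketch.
function checkRateLimit(ip: string, nowMs: number): number | null {
  const recent = (attempts.get(ip) ?? []).filter((t) => nowMs - t < WINDOW_MS);
  if (recent.length >= MAX_ATTEMPTS) {
    attempts.set(ip, recent);
    const retryAfterMs = recent[0] + WINDOW_MS - nowMs;
    return Math.ceil(retryAfterMs / 1000);
  }
  recent.push(nowMs);
  attempts.set(ip, recent);
  return null;
}
```

Every constant in the code traces back to a phrase in the ticket, which is exactly why the vague version ("improve the login flow") can't produce this on the first pass.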
Small tickets beat big tickets. When tickets take more than 15-20 minutes of AI coding time, quality drops. Better to have 10 small tickets than 3 large ones.
Why not a bigger framework?
There are plenty of multi-agent orchestrators out there with dashboards, parallel worktrees, and complex infrastructure. cestDone is intentionally minimal. The separation of planning and implementation was the single change that made the biggest difference in my workflow, and I wanted a tool that did just that without the overhead.
Open source, MIT licensed, Node.js: github.com/olkano/cestDone
I'd be curious to hear if others have landed on similar patterns.