As it happens, the last year for me was just big projects: Orome.ai, Relayn.sh. Multiple elements, shifting requirements, ever-growing codebases. Even having multiple layers of documentation did not help. I kept losing context, agents kept losing context, and at some point I started wondering if any of us actually knew what we were doing.
(Spoiler: the agents did not.)
This is about what I built to manage that, and why most of it exists specifically because agents will misbehave the moment you give them an opportunity.
## History: tools, sprints, and learning the hard way
About a year ago I started to take AI-assisted programming seriously. The boost from Cursor and Claude was immediate. Windsurf appeared and then disappeared. I tried to always use "the best tool right now," and I lost an unreasonable amount of time re-calibrating to new tools, re-explaining my codebase, rebuilding configuration files.
Lesson one: stop chasing the best tool. Pick one. Get good at it. (I stayed with Cursor. All of this is about what I do with Cursor.)
I started running long sessions. Non-stop coding sessions pushing twenty hours. Multiple tabs open, many agents running in parallel on different parts of the codebase. My take-out budget did not appreciate this. The codebase did. Sort of.
The next obvious upgrade: MCP memory. I hooked it in as an additional layer on top of local documentation. The agents used it maybe 30% of the time, when they felt like it. Great feature in theory. Agents kept forgetting it existed. I kept reminding them in CLAUDE.md. We were not making progress together on this one.
## The CLI leap and the scale problem
Then Claude Code appeared. Then Cursor shipped its own CLI agent: much more responsive, much easier to control than IDE tabs. I moved most of my work there.
The CLI unlocked something real: I could spin up a dedicated remote dev machine with whatever memory I needed and keep all sessions live at the same time. No more fighting my laptop. Finally, no limit.
Then came the actual limit: context switching.
Twenty agents running in parallel sounds like a dream. In practice it means twenty things competing for your attention, and human context switching is exactly as bad as everyone says. I went from 20 agents to 8, then to 4. Four is manageable. Twenty means you are the bottleneck, which you are.
## Duck is born: the prompt-as-document idea
I read an article on Hacker News. I cannot find it anymore to reference properly, which is exactly the kind of thing that happens when you are running too many things at once.
The idea: one markdown document serves as both the prompt for the agent and the communication channel between sessions. The agent reads it, writes its findings back into it, and the next session picks up exactly where it left off. No shared memory needed. No transcript parsing. The document is the memory.
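The shape is easy to sketch. Here is what such a prompt-document might look like; the section names and the task are illustrative, not the exact format from the article:

```markdown
# Task: investigate flaky websocket reconnects   <!-- hypothetical task -->

## Context
Links, constraints, and notes the agent needs before starting.

## Session Report
<!-- the agent writes its findings here; the next session reads them -->

## Open Questions
<!-- carried forward between sessions instead of a shared memory service -->
```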
That aligned with what I was already thinking about. I started building around it.
I call it Duck, as in Rubber Duck. You talk to it, it does not talk back in the way you expect, but having to explain the problem clearly enough for it to understand actually helps. (The bad joke budget is limited. I will spend it carefully.)
## The last upgrade: a system, not a prompt
The Hacker News idea was the seed. What grew from it was something more structured.
I needed something that survived connectivity issues. Not everyone works with fiber optic under every chair; some of us run agents on a development machine in another region and the SSH session drops at 2 AM. I needed tmux. One session per project, one window per agent. I needed a research-to-implementation pipeline that could pause, resume, and not lose its mind between restarts.
The current system is called projects_control. It follows a two-directory model:
| Directory | Contents | Who works here |
|---|---|---|
| projects_control/&lt;project&gt;/ | Prompts, context, system config, session history | Human and orchestration scripts |
| /home/ubuntu/projects/&lt;project&gt;/ | Actual codebase | Agents |
These two directories never mix. Agents never work inside projects_control. Control artifacts never live in the codebase. This sounds obvious until you watch an agent helpfully create a planning document inside your src/ folder.
The system registers projects in a SQLite registry. Each project has a context.md (project architecture, credential references, current state) and a system.md (infrastructure config, MCP service list). Every prompt run syncs the agent's MCP config from system.md before it starts. Per-project MCP: the agent only sees the services that project actually needs.
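The sync step can be sketched in a few lines of shell. This is illustrative only: the file names follow the article, but the "MCP services:" line format in system.md and the JSON shape of the config are assumptions.

```shell
# Sketch of per-project MCP sync: read the service list from system.md
# and emit a config containing only those services. Format is assumed.
set -euo pipefail
DIR=$(mktemp -d)

cat > "$DIR/system.md" <<'EOF'
# System: demo
MCP services: github, postgres
EOF

# Extract the service list (drop the label, drop the commas).
services=$(sed -n 's/^MCP services: *//p' "$DIR/system.md" | tr -d ',')

# Write an MCP config with exactly those services and nothing else.
{
  echo '{ "mcpServers": {'
  sep=''
  for s in $services; do
    printf '%s  "%s": {}' "$sep" "$s"
    sep=$',\n'
  done
  printf '\n} }\n'
} > "$DIR/mcp.json"

cat "$DIR/mcp.json"
```

The point of the design is the subtraction: the agent never sees a global MCP config, only what system.md declares for that project.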
Prompt types and their critics:
| Type | What the actor does | Critics |
|---|---|---|
| research | Investigate, produce findings | completeness, sourcing, conclusion_validity, actionability |
| plan | Produce implementation tasklist | architecture, contracts, backward_compatibility, dependencies, actionability, notation |
| implement | Write code | plan_compliance, correctness, code_quality |
Three types, three separate actors, three sets of critics. Research agents do not implement. Plan agents do not implement. Implement agents do not plan. In theory.
## The critic loop
The runner for each prompt type is a bash script. Not an agent. Not an orchestrator with opinions. A bash script.
Here is what it does, in order:
| Step | What runs | Model | Output |
|---|---|---|---|
| Prompt gate | Checks if prompt is ready: clear task, bounded scope, proper structure | composer-1.5 | GATE_VERDICT: pass/fail |
| Actor | Reads prompt, writes Session Report | sonnet-4.6-thinking | Findings / plan / code changes |
| length_gate | Checks Session Report is not empty or suspiciously short | composer-1.5 | VERDICT_length_gate: ok/fix |
| Critics (sequential) | Type-specific evaluation | composer-1.5 | VERDICT_<name>: ok/fix |
| inject_feedback | Appends Critics Feedback section to the prompt file | bash | Prompt file updated in place |
| Repeat | Up to MAX_ITERATIONS=6 | -- | Final pass or cap notification |
The actor writes its output into the ## Session Report section of the prompt file. Critics read the same file and write verdict lines. inject_feedback appends their feedback as a new numbered section. The actor runs again, sees the feedback, improves. Up to six times. After six the runner sends a failure notification and stops.
The prompt file accumulates everything: actor output, critic feedback, iteration history. After six iterations it is a complete record of what happened. No need to dig through logs.
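The loop itself is simple enough to sketch. This is a toy stand-in, not the real runner: the actor and critic are stubbed out as shell functions (the critic is rigged to pass on iteration 2), and the verdict strings follow the table above.

```shell
# Toy sketch of the critic loop: actor writes, critic judges,
# feedback is injected, repeat up to a cap. Stubs, not real model calls.
set -euo pipefail
PROMPT=$(mktemp)
MAX_ITERATIONS=6

run_actor() {    # stand-in for the actor model call
  printf '## Session Report (iteration %s)\nFindings...\n' "$1" >> "$PROMPT"
}
run_critic() {   # stand-in critic: rigged to pass on iteration 2
  if [ "$1" -ge 2 ]; then echo "VERDICT_completeness: ok"
  else echo "VERDICT_completeness: fix"; fi
}
inject_feedback() {   # appends feedback plus the no-implement reminder
  printf '## Critics Feedback %s\nUpdate Session Report only, do not implement.\n' \
    "$1" >> "$PROMPT"
}

passed=0
for i in $(seq 1 "$MAX_ITERATIONS"); do
  run_actor "$i"
  verdict=$(run_critic "$i")
  echo "$verdict" >> "$PROMPT"
  case "$verdict" in
    *": ok") passed=1; echo "PASS after $i iteration(s)"; break ;;
    *)       inject_feedback "$i" ;;
  esac
done
[ "$passed" -eq 1 ] || echo "FAIL: hit MAX_ITERATIONS=$MAX_ITERATIONS"
```

Everything appends to one file, so the accumulation described above falls out for free: the prompt file ends the run containing every report, verdict, and feedback round in order.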
Each session is fully specified by two files: a context file (shared across all prompts in a project) and a prompt file (specific to one task). "The prompt file works as both an execution instruction and a final document." That phrase came out of documentation I wrote for someone else. It is accurate.
## Guardrails: do not trust agents to not implement things
Here is the part I am not proud of. I had to build seven separate layers of protection to stop research and plan agents from implementing code.
Seven.
The previous system had one: a --mode plan CLI flag that was supposed to restrict the agent to read-only. The agent respected it sometimes.
| Guardrail | What it does | Failure mode prevented |
|---|---|---|
| --mode plan CLI flag | Sets read-only file access | Basic rogue writes |
| Worktree for research/plan | Agent works in a temp git branch, discarded on exit | Rogue edits contaminating main branch |
| No-implement instruction in actor prompt | Explicitly tells actor: do not edit codebase files | Agent creating files during "analysis" |
| Framing note in all plan critics | Critics phrase feedback as doc corrections, not step-execution commands | Critics accidentally triggering implementation |
| inject_feedback reminder | After every feedback round: "update Session Report only, do not implement" | Actors implementing in response to critic feedback |
| Prompt gate | Before actor starts: checks task is clear, scoped, and structured | Vague prompts leading to improvised scope expansion |
| check_readonly_guard downgrade | Worktree active: skip the old guard. Overlap case: log warning only | Brittle revert behaviour causing partial write issues |
The worktree approach is the honest solution. If the agent implements code in the worktree during a research session, the worktree is discarded on exit. No git checkout, no partial revert, no contaminated history. Containment instead of cleanup.
The no-implement instruction had to appear in three places: the actor prompt, the critic framing note, and the inject_feedback reminder. One place was not enough. Two was also not enough. I am not complaining. I am documenting. Or am I? ;) Buy me a glass of good red and I will tell you a sad little story about how a routine database backup ended up with one famous model shrinking the database by 95% because it was too large to fit in the proxy timeout window.
We had to explicitly tell research agents not to implement code. Repeatedly. This is the correct level of trust to extend to an agent on a large project.
## The prompt file as artifact
The most useful thing in this system is not the critic loop or the guardrails. It is the prompt file itself.
Agents have no memory between sessions. The prompt file does. This is not a limitation. It is a design choice.
Building a complete history across many sessions without requiring any of them to have memory of each other: that is the goal. The prompt file accumulates actor output and critic feedback across every iteration. The human reads the same file to understand what happened. Context is pre-packed at design time: context.md plus the prompt file give the agent everything it needs to run without any external state.
No memory services required. No transcript parsing. The document is the memory.
## What is actually hard
The human prompt.
If you want the system to work, every prompt has to be precise. You are planning structure, thinking about external systems, anticipating edge cases, security, optimisation. You are articulating an entire implementation in a couple of minutes. Do that for several tasks in quick succession and your brain starts to burn. Sometimes after ten minutes of back-to-back planning I cannot think straight.
But that is the job now. The system took away the busy work. What is left is the hard thinking. Token usage dropped by at least 75%. Sessions are compact. Even on a bad day nothing sprawls, because the discipline is in the structure, not in my attention span. The trade-off is that the work that remains is dense.
When a task outgrows what one prompt can carry, I bring in Shotgun -- a tool built by friends of mine -- and get a second pair of cyber-eyes for big refactorings or for weaving a new feature through legacy code. Between the critic loop and an external review tool, the system scales further than I expected.
## Conclusion
I stopped trusting my coding agents. So I built a system to not trust them.
Agents are good at the task you give them, when you give them exactly that task and nothing else, run a critic to check they did it right, and remove every possible way for them to expand scope on their own. That is not a limitation. That is the entire design.
Bash scripts, prompt files, a SQLite registry. Not glamorous. It works because it does not pretend agents are collaborators. They are executors. The system around them does the collaborating.
Four agents. Clear prompts. Dumb runners. Critics that push back. One human who does the hard thinking.
That is the stack.