Somewhere around agent number seven I realized I had a management problem.
Not a technical one. The agents worked fine individually. Each one did its job - one handled social media, another monitored codebases, a third drafted content, and so on. The problem was that nobody was watching the watchers.
The meta-orchestration problem
When you run one AI agent, you manage it directly. Check its output, fix its prompts, adjust its schedule. Simple.
When you run ten, you need something to manage the managers. And that something turns out to be... mostly project management. The same boring PM skills I've been using for fifteen years.
Here's what I mean. Agent A posts a comment. Agent B is supposed to track whether anyone replies. Agent C is supposed to draft a follow-up based on the reply context. Sounds great in theory. In practice, Agent A posts at 10:13, the reply arrives at 10:14, and Agent B's notification check already ran at 10:12. Agent C never fires because Agent B never saw the reply.
The coordination layer is harder than any individual agent.
What actually breaks
It's never the AI model. Claude is smart enough. GPT is smart enough. The failure mode is always one of three things:
State drift. Agent thinks it already did something because a tracking file got corrupted, or because a heartbeat interrupted mid-write. Now it skips the task forever. You don't notice for three days.
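The interrupted-write half of this has a cheap partial fix: never write the tracking file in place. A minimal sketch in Python (the file layout is made up; the write-temp-then-rename pattern is the standard part):

```python
import json
import os
import tempfile

def save_state(path, state):
    # Write to a temp file in the same directory, then atomically
    # swap it in. A crash mid-write leaves the old tracking file
    # intact instead of a half-written one the agent can't parse.
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp, path)     # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

This doesn't stop an agent from writing wrong state, but it does stop a heartbeat interrupting mid-write from corrupting the file.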
Context loss. Agent A has context that Agent B needs but there's no clean way to pass it. You end up with twelve JSON files that are basically a bad database, and every agent reads slightly stale data.
Timing collisions. Two agents try to update the same file within seconds of each other. One wins, one loses. The loser's work just... disappears. No error. No log. Just gone.
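The fix I eventually settled on for shared files is an advisory lock around the whole read-modify-write cycle. A sketch (POSIX-only, since `fcntl` doesn't exist on Windows; the JSON-list format is just for illustration):

```python
import fcntl
import json

def append_entry(path, entry):
    # Exclusive advisory lock: the second agent blocks here instead
    # of silently overwriting the first agent's update.
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        raw = f.read()
        data = json.loads(raw) if raw.strip() else []
        data.append(entry)
        f.seek(0)
        f.truncate()          # rewrite the file under the lock
        json.dump(data, f)
        # lock is released automatically when the file closes
```

Now the loser of the race waits a few milliseconds instead of losing its work with no error and no log.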
The PM framework I accidentally built
After enough of these failures I started treating my agent fleet like a team. Sounds obvious in retrospect but it took me a while.
Daily standups (automated). Each agent writes a heartbeat log at the end of every run. Not just "done" but what it did, what it skipped, what failed. I grep these every morning.
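A heartbeat doesn't need infrastructure; one JSON line per run is enough to grep. A rough sketch (the field names are mine, not from any framework):

```python
import json
from datetime import datetime, timezone

def write_heartbeat(agent, did, skipped, failed, path="heartbeats.jsonl"):
    # One JSON line per run: what the agent did, skipped, and failed,
    # so a morning grep reconstructs what the whole fleet did overnight.
    entry = {
        "agent": agent,
        "ts": datetime.now(timezone.utc).isoformat(),
        "did": did,
        "skipped": skipped,
        "failed": failed,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

The morning check is then something like `grep -v '"failed": \[\]' heartbeats.jsonl` to surface only the runs that had failures.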
Sprint planning (for real). Each agent has a schedule.json that gets generated the night before. Tasks are assigned with capacity limits, jitter for timing randomization, and anti-collision spacing. Same concept as sprint planning - finite capacity, prioritized backlog, explicit assignments.
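To make that concrete, here's roughly what a generator for that schedule.json looks like. All the parameter names and defaults are invented for illustration; the point is the three ingredients together, capacity cap, anti-collision spacing, and jitter:

```python
import json
import random
from datetime import datetime, timedelta

def build_schedule(agent, backlog, capacity=8, start_hour=9,
                   spacing_min=30, jitter_min=10):
    # Finite capacity: take at most `capacity` tasks off the
    # prioritized backlog. Each slot gets fixed spacing plus random
    # jitter so agents never fire at the same minute twice in a row.
    day_start = (datetime.now() + timedelta(days=1)).replace(
        hour=start_hour, minute=0, second=0, microsecond=0)
    slots, t = [], day_start
    for task in backlog[:capacity]:
        t += timedelta(minutes=spacing_min
                       + random.randint(-jitter_min, jitter_min))
        slots.append({"task": task, "at": t.isoformat()})
    return {"agent": agent, "tasks": slots}

schedule = build_schedule("social", ["reply-backlog", "draft-post", "check-dms"])
with open("schedule.json", "w") as f:
    json.dump(schedule, f, indent=2)
```

Because spacing minus jitter is still positive, slots stay strictly ordered, which is the anti-collision guarantee.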
Retrospectives (weekly insights). Each agent updates an INSIGHTS.md with what worked and what didn't. I review these on Sundays. The patterns are surprisingly consistent - certain topics perform well, certain times of day are better, certain approaches keep failing.
Deduplication as a first-class concern. Every agent maintains tracking files. Before doing anything, it checks "did I already do this?" This is the single most important thing. Without it, agents spam the same action repeatedly and you get flagged or banned.
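The check-before-act pattern is only a few lines. A minimal sketch (file name and IDs are placeholders):

```python
import json
import os

TRACK = "actions_done.json"

def _load():
    if not os.path.exists(TRACK):
        return []
    with open(TRACK) as f:
        return json.load(f)

def maybe_act(action_id, action):
    # Ask "did I already do this?" BEFORE acting, and record AFTER
    # the action succeeds. A crash in between means a retry on the
    # next run rather than a permanent silent skip.
    done = _load()
    if action_id in done:
        return False
    action()
    done.append(action_id)
    with open(TRACK, "w") as f:
        json.dump(done, f)
    return True
```

Recording after success trades a rare duplicate (crash between act and record) for never losing an action, which for most side effects is the right default.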
The uncomfortable realization
The agents don't need better AI. They need better project management.
Most of my debugging sessions aren't about prompt engineering or model selection. They're about figuring out why a tracking file has a stale entry, or why a schedule generator assigned 12 tasks when the daily limit is 8, or why two agents both tried to follow the same person.
It's ops work. It's coordination work. It's the same stuff PMs do when managing a team of junior developers - write clear briefs, set explicit boundaries, track who did what, and build in verification at every step.
What I'd tell someone starting this
Don't start with the orchestration framework. Start with one agent. Make it work reliably for a week. Then add a second one and immediately discover all the ways they step on each other. Fix those. Then maybe add a third.
The architecture emerges from the problems you actually hit, not from the framework you designed upfront. Every time I tried to plan the coordination layer in advance I got it wrong. Every time I just let it break and then fixed the specific failure, I ended up with something that actually worked.
Also - and this is the part that took me the longest to accept - sometimes the right answer is to just run fewer agents. Not every task needs automation. The ones that do need it tend to make that pretty obvious.
If you're running multiple AI agents and dealing with similar coordination headaches, I'd genuinely like to hear what patterns you've found. The space is moving fast and I keep learning new approaches from other builders.
Top comments (2)
Hey Mykola — you gave me a lot of sharp questions when I was early in building ttal, so I thought you'd want to see where things landed. A lot of the problems you describe here are exactly what pushed the design.
Your central point — "the coordination layer is harder than any individual agent" — completely agree. Our contexts are different though. Your agents handle social media, content, monitoring. Ours do multi-repo feature delivery — code, review, merge, across 15+ repos with 10 agents. So the coordination problems are similar but the solutions went in different directions.
The key idea that unlocked scaling for us: split agents into two planes.
Manager agents are persistent. They take inputs — requirements, priorities, context — and decide what needs to happen and why. They never write code.
Worker agents are ephemeral. They produce outputs — plans, code, PRs. Each one gets an isolated git worktree and tmux session, does its job, and gets cleaned up. They never make architectural decisions.
Every output goes through a team review before merging. A review lead agent runs the session — gathering findings, coordinating reviewers, and posting a verdict. For PRs it's a code review lead; for plans it's a plan review lead. Only after the review passes can the pipeline advance. No code lands without that quality gate.
That boundary is what let us get past the "seven agent" wall you describe. And it naturally solves the failure modes you identified:
State drift — monotonic tags on tasks. Pipeline stages only move forward: +coded → +reviewing → +lgtm → merged. No tag is ever removed. If an agent crashes, state is still correct — just resume from the last tag.

Context loss — per-task auto-breathe. Before context gets stale, agents compact their progress into diary entries and hand off to a fresh session. Continuity is maintained through structured memory (diary + flicknote). Session forking (JSONL copy) gives zero-loss parallel work when needed.
Timing collisions — two layers. Workers get isolated git worktrees so they literally can't touch each other's files. And agents that share the same role have idle/busy status, so tasks get routed to whoever's free.
Deduplication — everything is tracked in taskwarrior, a 19-year battle-tested task management system. The task either has the tag or it doesn't.
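For what it's worth, the forward-only tag rule is easy to model. This toy sketch shows just the invariant, not ttal's actual taskwarrior integration:

```python
PIPELINE = ["+coded", "+reviewing", "+lgtm", "merged"]

def advance(tags, new_tag):
    # Tags only move forward and are never removed, so after a crash
    # the furthest tag present is always the true pipeline state.
    current = max((PIPELINE.index(t) for t in tags), default=-1)
    proposed = PIPELINE.index(new_tag)  # raises ValueError on unknown stages
    if proposed <= current:
        return tags  # stale or duplicate transition: ignore it
    return tags + [new_tag]
```

Crash recovery is then just "read the tags, resume from the furthest one", with no reconciliation step.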
Your quote — "the architecture emerges from the problems you actually hit" — is exactly how ttal was built. Nothing was designed upfront. Every feature exists because something broke. It took about two months to shape it into a complete toolkit — now it's something anyone can use to manage 10+ repos with Claude Code (Codex support is on the roadmap).
The PM-practice approach (standups, sprints, retros) is interesting — we automated that layer into the pipeline system.
ttal go drives every transition: one command for the entire lifecycle. I wrote more about the multi-repo setup here if you're curious: How I Manage 15+ Repos with Claude Code (Without Losing My Mind)
glad you circled back - genuinely curious where ttal landed on the coordination problem. did you end up with a central orchestrator or let agents signal each other more loosely? i keep hitting the same wall: central control is predictable but brittle, mesh is flexible but debugging is a nightmare when something silently fails mid-chain.