How to Debug 6 AI Agents Running Simultaneously
Something is broken. You have six agents running. You do not know which one caused it.
This is the debugging guide I wish I had when we started.
The Multi-Agent Debugging Problem
Single-agent debugging is familiar: read the error, trace the call, fix the issue.
Multi-agent debugging is different:
- The error may have happened in a previous wave
- The agent that failed may have already terminated
- The corrupted state may be downstream of the actual bug
- Multiple agents may have contributed to the failure
Standard debugging instincts break. You need new ones.
Step 1: Isolate the Wave
First question: which wave did this break in?
Every orchestrator tick should log:
- Wave number
- Agents dispatched
- Timestamp start/end
- Status per agent
If you do not have this log, build it before anything else. You cannot debug a wave you cannot identify.
Our heartbeat file format:
Wave 14 | 2026-04-14 14:23:11
Agents: Apollo, Athena, Hermes, Hephaestus
Apollo: COMPLETE (3 drafts saved)
Athena: COMPLETE (2 blockers cleared)
Hermes: BLOCKED (DNS pending)
Hephaestus: COMPLETE (deploy confirmed)
Next: Wave 15 dispatch
From this, Hermes BLOCKED is immediately visible. That is where you start.
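A minimal sketch of how that scan could be automated, assuming the heartbeat file follows exactly the layout shown above. The names `parse_heartbeat` and `not_complete` are hypothetical helpers, not part of any existing tool.

```python
import re

# Sample record copied from the heartbeat format above.
SAMPLE = """\
Wave 14 | 2026-04-14 14:23:11
Agents: Apollo, Athena, Hermes, Hephaestus
Apollo: COMPLETE (3 drafts saved)
Athena: COMPLETE (2 blockers cleared)
Hermes: BLOCKED (DNS pending)
Hephaestus: COMPLETE (deploy confirmed)
Next: Wave 15 dispatch
"""

# A status line is "<Agent>: <UPPERCASE STATUS> <optional detail>".
# "Agents: ..." and "Next: ..." do not match because their second
# word is not all uppercase.
STATUS_LINE = re.compile(r"^(?P<agent>\w+): (?P<status>[A-Z]+)\b(?P<detail>.*)$")


def parse_heartbeat(text):
    """Return {'wave': int, 'statuses': {agent: (status, detail)}}."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    wave_match = re.match(r"^Wave (\d+) \|", lines[0])
    if wave_match is None:
        raise ValueError("first line must look like 'Wave <n> | <timestamp>'")
    statuses = {}
    for ln in lines[1:]:
        m = STATUS_LINE.match(ln)
        if m:
            statuses[m["agent"]] = (m["status"], m["detail"].strip())
    return {"wave": int(wave_match.group(1)), "statuses": statuses}


def not_complete(record):
    """Agents whose status is anything other than COMPLETE."""
    return {a: s for a, s in record["statuses"].items() if s[0] != "COMPLETE"}
```

Running `not_complete(parse_heartbeat(SAMPLE))` on the record above leaves only Hermes, which is the agent to open first.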
Step 2: Read the Agent Session Log
Every god agent should maintain its own session log — a running record of what it did, what it found, what it returned.
When Hermes is blocked, open the agent session file and read backward from the last entry.
Critical rule: agent logs must include the actual API response, not just the status. "DNS verification failed" is useless. The raw error from the DNS provider is useful.
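One way to make this rule hard to skip is to route every tool call through a logger that stores the response body verbatim. This is a sketch under assumptions: the JSON-lines layout, the field names, and the helpers `log_tool_result` and `read_backward` are all illustrative, not our actual log format.

```python
import json
import time
from pathlib import Path


def log_tool_result(log_path, agent, action, status_code, raw_body):
    """Append one JSON line that keeps the response body unmodified."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "agent": agent,
        "action": action,
        "status_code": status_code,
        "raw_body": raw_body,  # verbatim provider response, never a summary
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_backward(log_path):
    """Return log entries newest first, for reading back from the failure."""
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    return [json.loads(ln) for ln in reversed(lines) if ln.strip()]
```

Because each entry is one line, reading backward from the last entry is a simple reversed scan of the file.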
Enforce this in your agent prompts:
"Log the exact error message, not a summary of it."
Step 3: Check State Corruption First
Most multi-agent bugs are not logic bugs. They are state bugs.
Common patterns:
Stale context packet: The orchestrator dispatched a wave with outdated state. The agent acted on old information.
Fix: Orchestrator reads fresh state at wave dispatch time, not at session start.
Competing writes: Two agents wrote to the same file simultaneously. One overwrote the other.
Fix: Each agent owns its output path exclusively. Orchestrator merges, never agents.
Partial completion logged as complete: Agent reported COMPLETE but a subtask failed silently.
Fix: Agents must validate their own deliverables before reporting COMPLETE. Check file exists, API returned 200, etc.
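The self-validation fix can be sketched as a gate that runs before an agent is allowed to say COMPLETE. The function names and the `FAILED` status are assumptions made for illustration; only the checks themselves (file exists, API returned 200) come from the fix described above.

```python
from pathlib import Path


def validate_deliverables(file_paths, http_statuses):
    """Return a list of problems; an empty list means the work checks out."""
    problems = []
    for p in file_paths:
        path = Path(p)
        if not path.is_file():
            problems.append(f"missing file: {p}")
        elif path.stat().st_size == 0:
            problems.append(f"empty file: {p}")
    for name, code in http_statuses.items():
        if code != 200:
            problems.append(f"{name} returned HTTP {code}")
    return problems


def completion_status(file_paths, http_statuses):
    """Report COMPLETE only when every deliverable passes validation."""
    problems = validate_deliverables(file_paths, http_statuses)
    if problems:
        return ("FAILED", problems)  # hypothetical status name
    return ("COMPLETE", [])
```

The point of the design is that the agent cannot produce the word COMPLETE without the check having run.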
Step 4: Reproduce in Isolation
Once you have identified the failing agent and the failing wave, reproduce the failure with that agent alone.
Give it the exact context packet from the failed wave. Run it solo. See if it fails the same way.
If it does: logic bug in the agent. Fix the agent prompt or the underlying tool.
If it does not: the failure was environmental, such as another agent's state, a race condition, or a network issue during the wave.
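A rough sketch of that solo replay, assuming the context packet was saved as JSON and the agent can be invoked as a plain Python callable. Both are assumptions, and comparing error strings is a deliberately crude test of "fails the same way".

```python
import json


def replay_solo(agent_fn, context_packet_json, original_error):
    """Run one agent alone on the saved packet and classify the outcome."""
    packet = json.loads(context_packet_json)
    try:
        agent_fn(packet)
    except Exception as exc:
        if str(exc) == original_error:
            # Same failure with no other agents running.
            return "logic bug: reproduces in isolation"
        return f"different failure in isolation: {exc}"
    # The agent succeeds alone, so look at the wave's environment instead.
    return "environmental: does not reproduce in isolation"
```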
Step 5: The Three-Question Check
For every multi-agent bug, ask:
- Did the agent receive correct context? (state bug)
- Did the agent's tool actually execute? (tool/API bug)
- Did the agent correctly interpret the result? (prompt bug)
Most bugs fall into one of these three. Identify the category, fix the category.
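The check is ordered, so it can be written down as a tiny triage function. This is only an illustration of the order of the questions; the function name and the `uncategorized` fallback are made up.

```python
def triage(context_correct, tool_executed, result_interpreted):
    """Answer the three questions in order and return the first failing category."""
    if not context_correct:
        return "state bug"
    if not tool_executed:
        return "tool/API bug"
    if not result_interpreted:
        return "prompt bug"
    return "uncategorized"  # all three answers were yes; keep digging
```

Asking in this order matters: a bad context packet makes the other two answers unreliable.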
Tooling That Actually Helps
tmux window-per-agent
Each god agent runs in its own named tmux window. When debugging, split-screen the orchestrator log against the specific agent log.
tmux new-window -n apollo
tmux new-window -n athena
tmux new-window -n hermes
Visual separation makes it immediately obvious when an agent has gone quiet.
The heartbeat check
Orchestrator pings each agent every N seconds. If an agent misses two heartbeats, it is flagged automatically. No manual monitoring.
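The flagging logic is small enough to sketch. Here `last_seen` is assumed to be a mapping from agent name to the timestamp of its last heartbeat, in seconds; the function name and parameters are hypothetical.

```python
def flag_silent_agents(last_seen, now, interval_s, max_missed=2):
    """Return agents whose last heartbeat is at least max_missed intervals old."""
    return sorted(
        agent
        for agent, ts in last_seen.items()
        if (now - ts) >= max_missed * interval_s
    )
```

With a 10-second interval, an agent last seen 25 seconds ago is flagged and one seen 5 seconds ago is not.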
Structured completion reports
PAX Protocol status fields give machine-readable state across all agents at a glance:
Apollo: COMPLETE
Athena: COMPLETE
Hermes: BLOCKED — DNS propagation pending (ETA: 30min)
Hephaestus: COMPLETE
Compare that with reading four separate prose summaries. The structured format wins every time.
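Machine-readable means a few lines of code can turn the report into counts and a list of blockers. A sketch, assuming one `Agent: STATUS` line per agent with an optional reason after a dash, as in the report above; `summarize` is a hypothetical helper.

```python
from collections import Counter

# The status report above; \u2014 is the dash that separates status and reason.
REPORT = """\
Apollo: COMPLETE
Athena: COMPLETE
Hermes: BLOCKED \u2014 DNS propagation pending (ETA: 30min)
Hephaestus: COMPLETE
"""


def summarize(report):
    """Return (status counts, {agent: reason}) for agents that are not COMPLETE."""
    counts = Counter()
    blocked = {}
    for line in report.strip().splitlines():
        agent, _, rest = line.partition(":")
        status, _, reason = rest.strip().partition(" ")
        counts[status] += 1
        if status != "COMPLETE":
            # Drop the leading dash and spaces before the reason text.
            blocked[agent.strip()] = reason.lstrip("\u2014- ").strip()
    return counts, blocked
```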
The Failure Mode Taxonomy
After 30 days and hundreds of waves, here are the failures we hit most:
| Failure | Frequency | Cause |
|---|---|---|
| Agent BLOCKED on external dependency | High | DNS, API rate limits, credential expiry |
| Stale state in context packet | Medium | Orchestrator not refreshing at dispatch |
| Agent reports COMPLETE prematurely | Medium | No self-validation on deliverables |
| Context window overflow | Low | Too much history in agent session |
| Race condition on shared file | Low | Eliminated with exclusive output paths |
The Rule That Changed Everything
Never parrot stale logs. Verify APIs and credentials before reporting blockers.
This one rule eliminated an entire class of ghost bugs — situations where an agent would report failure based on a previous session error, not the current state.
Agents must test the actual thing, not assume from history.
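In code, the rule amounts to running a live probe and reporting a blocker only if that probe fails right now. A minimal sketch; `current_blocker` and the probe callables are hypothetical, and a real probe might be a DNS lookup or an authenticated API call.

```python
def current_blocker(probe):
    """Run a live check and report a blocker only if it fails in this session.

    `probe` is any zero-argument callable that raises on failure, for
    example a DNS lookup or a credentials check. Errors from earlier
    sessions are never consulted.
    """
    try:
        probe()
    except Exception as exc:
        return f"BLOCKED: {exc}"  # error observed just now
    return None  # the dependency works now, whatever old logs say
```

An agent that finds an old failure in its session log calls this first, and stays quiet if it returns `None`.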
When All Else Fails
Nuke the agent session, rebuild its state from the heartbeat log, and re-dispatch from the last successful wave.
This is why wave-level checkpointing exists. You never lose more than one wave of work.
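Finding the re-dispatch point can be sketched as a scan over parsed heartbeat records. The input shape, a list of `(wave_number, {agent: status})` pairs, and the function name are assumptions for illustration.

```python
def last_successful_wave(waves, failed_wave):
    """Return the latest wave before failed_wave where every agent was COMPLETE."""
    candidates = [
        number
        for number, statuses in waves
        if number < failed_wave
        and statuses
        and all(status == "COMPLETE" for status in statuses.values())
    ]
    return max(candidates, default=None)
```

If nothing before the failed wave completed cleanly, the function returns `None` and the rebuild has to start from scratch.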
Running a multi-agent system in production? Drop your hardest debugging story in the comments.