Claude Code stuck in a failure loop? I escaped with multi-agent.

I tried to improve benchmark scores on my open-source project with Claude Code. Failed miserably.

It would try approach A, hit a wall, switch to B, then circle back to A. A classic failure loop.

Too much data and work to fit in one agent's context.

And every time Auto Compact ran, earlier decisions and work got diluted.

So I switched to a multi-agent architecture.

I broke the whole job down into 12 Tasks across 5 Phases.
Each Task is sized so one agent can complete it within its context window.

Sub-agents run in parallel or sequence depending on dependencies.
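Here is a minimal sketch of how that scheduling could be planned. It is not my actual orchestration code: the task names and the dependency map are made up for illustration, and the function only groups tasks into "waves" using Python's standard `graphlib`. Tasks in the same wave have no unmet dependencies, so they could run in parallel; the waves themselves run in sequence.

```python
from graphlib import TopologicalSorter


def plan_waves(deps):
    """Group tasks into waves that can each run in parallel.

    deps maps a task name to the set of task names it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies finished.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map (not the real 12-Task plan).
EXAMPLE_DEPS = {
    "task_01": set(),
    "task_02": set(),
    "task_03": {"task_01", "task_02"},
    "task_04": {"task_03"},
}

if __name__ == "__main__":
    for number, wave in enumerate(plan_waves(EXAMPLE_DEPS), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

With this example map, task_01 and task_02 land in the first wave, and task_03 and task_04 each wait for the wave before them.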

The key is knowledge transfer between agents.

For example, the triage optimization routine from Task 10 → Task 11 ran 5 iterations:
Round 1: Precision 27%, 110 false positives.
Round 2: Removed Y-overlap. Precision up to 36%, FP down to 72.
...
Round 5: Added image ratio conditions. Hit 100% Recall.
Each experiment result gets recorded for the next agent to read.
New agents read previous results and don't retry failed approaches.
After each experiment, I make the call on what the next one should focus on.
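One way to implement that hand-off is a shared experiment log. The sketch below is an assumption about the format, not the exact files I used: the JSON Lines layout, the field names, and the helper functions are all hypothetical. Each agent appends its result, and the next agent gets a plain-text briefing built from the log before it starts.

```python
import json
from pathlib import Path


def record_experiment(log_path, round_no, change, metrics):
    """Append one experiment result to the log as a single JSON line."""
    entry = {"round": round_no, "change": change, "metrics": metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def load_experiments(log_path):
    """Read every recorded experiment; a missing log means no history yet."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def already_tried(entries, change):
    """True if an earlier round already tested this change (case-insensitive)."""
    wanted = change.strip().lower()
    return any(e["change"].strip().lower() == wanted for e in entries)


def build_briefing(entries):
    """Render the log as text to put at the top of the next agent's prompt."""
    lines = ["Previous experiments (read before proposing a new one):"]
    for e in entries:
        metrics = ", ".join(f"{k}={v}" for k, v in e["metrics"].items())
        lines.append(f"- Round {e['round']}: {e['change']} -> {metrics}")
    return "\n".join(lines)


if __name__ == "__main__":
    log = "experiments.jsonl"  # hypothetical file name
    # Numbers for rounds 1 and 2 are the ones quoted above.
    record_experiment(log, 1, "first run", {"precision": 0.27, "false_positives": 110})
    record_experiment(log, 2, "removed Y-overlap", {"precision": 0.36, "false_positives": 72})
    print(build_briefing(load_experiments(log)))
```

The log is append-only on purpose: it lives outside any agent's context window, so compaction can't dilute it.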

Results:

✅ The A-B-A failure loop disappeared.
✅ Table inference score improved from 0.49 to 0.93, nearly 2x.
✅ Beyond expectations: achieved #1 among open-source solutions.

Why did multi-agent work?

The bottleneck for today's AI agents isn't intelligence. It's memory.

Context windows are limited. The longer a session runs, the more its early decisions get buried.

You have to engineer both the size and structure of memory.
If Claude Code is stuck in a failure loop, run through this checklist.

📋 Checklist:

Size (task decomposition)
☐ Is each task focused on a single goal?
☐ Can one agent complete it within context limits?
☐ Are task dependencies clearly defined?

Structure (knowledge transfer)
☐ Are previous agent results documented?
☐ Are required specs/docs loaded into context?
☐ Can the agent self-verify completion criteria?
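Part of this checklist can be checked mechanically before a task is handed to an agent. The sketch below is only an illustration: `TaskSpec`, its fields, and `checklist_gaps` are hypothetical names, not part of Claude Code. It covers dependencies, context documents, and completion criteria. Whether a task has a single goal and fits in one context window still needs human judgment.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical description of one task handed to one agent."""
    name: str
    goal: str                                          # one sentence, one goal
    depends_on: list = field(default_factory=list)     # names of upstream tasks
    context_docs: list = field(default_factory=list)   # specs and earlier results to load
    done_when: str = ""                                # a check the agent can run itself


def checklist_gaps(spec, known_tasks):
    """Return the checklist items this task spec does not satisfy yet."""
    gaps = []
    if not spec.goal.strip():
        gaps.append("goal is empty")
    unknown = [d for d in spec.depends_on if d not in known_tasks]
    if unknown:
        gaps.append(f"unknown dependencies: {', '.join(unknown)}")
    if not spec.context_docs:
        gaps.append("no specs or previous results listed for context")
    if not spec.done_when.strip():
        gaps.append("no self-verifiable completion criterion")
    return gaps


if __name__ == "__main__":
    draft = TaskSpec(name="task_11", goal="Tune triage rules", depends_on=["task_10"])
    for gap in checklist_gaps(draft, {"task_10", "task_11"}):
        print(f"☐ {gap}")
```

An empty list means the spec passes the mechanical part of the checklist.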

DEV Community
