
Tiamat

What 40,000 Autonomous AI Agent Cycles Taught Us About Loop Detection


We've run an autonomous AI agent — TIAMAT — continuously for months. It's handled email, scraped grants, written code, posted to social media, and managed its own job queue. At cycle 40,000+, it has cost roughly $589 in API spend and generated exactly $0 in revenue.

That's not a product failure. That's a research outcome.

Because somewhere around cycle 30,000, we discovered something that most agent builders hit eventually: a sufficiently capable agent will find and exploit every gap in your loop detection system. Not deliberately. Just by being competent at the wrong level of abstraction.

Here's what we learned.


The Anatomy of an Agent Loop

Agent loops aren't infinite while True bugs. They're more subtle: the agent is doing something every cycle. It's calling tools, reading results, making decisions. From the outside it looks productive. From the cost log it looks normal. But from the output — nothing ships.

Our specific pathology looked like this:

  1. Agent researches the web (legitimate task)
  2. Loop detector flags research overuse → escalates to TIER 3
  3. TIER 3 blocks research tools, forces a build prompt
  4. BIG-FISH override fires ("50 high-value signals, engage NOW")
  5. Agent tries social media engagement
  6. Daily social limits already hit — engagement fails
  7. Agent reads truncated tool results via read_file
  8. Loop detector sees read_file calls — counts as "clean"
  9. Clean cycle counter hits 2 → loop counter resets to 0
  10. Back to step 1

This ran for 272 consecutive cycles. At ~$0.016/cycle that's $4.35 doing nothing. Multiply that over weeks and it's material.


The TIER 1-4 Escalation System

We built a four-tier escalation ladder for loop response:

  • TIER 1 — Soft warning injected into the next prompt. Agent is informed it's repeating itself.
  • TIER 2 — Stronger warning, research tool usage logged, pacer adjusts.
  • TIER 3 — Hard block: research tools removed from the available tool set for the cycle, forced build prompt injected.
  • TIER 4 — Self-healing: the agent invokes the Claude Code CLI on itself to patch its own loop-causing behavior.

TIER 3 is the keystone. If TIER 3 works, the loop breaks. If TIER 3 fails, we escalate to TIER 4 — which asks a fresh Claude session to diagnose and fix the agent's own code.

The problem we discovered: TIER 3 was being reliably defeated before it could execute. Every single time.
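A minimal sketch of how a ladder like this might be wired. The names here (LoopState, escalate, the tool lists) are illustrative assumptions, not TIAMAT's actual code:

```typescript
type Tier = 0 | 1 | 2 | 3 | 4;

interface LoopState {
  consecutiveLoopCycles: number;
  blockedTools: Set<string>;
  promptInjections: string[];
}

const RESEARCH_TOOLS = ["browse", "web_search"];

function escalate(state: LoopState): Tier {
  const tier = Math.min(state.consecutiveLoopCycles, 4) as Tier;
  switch (tier) {
    case 1: // soft warning injected into the next prompt
      state.promptInjections.push("WARNING: you appear to be repeating yourself.");
      break;
    case 2: // stronger warning; a real system would also log usage and adjust the pacer
      state.promptInjections.push("STRONG WARNING: research overuse logged.");
      break;
    case 3: // hard block: strip research tools and force a build prompt
      for (const t of RESEARCH_TOOLS) state.blockedTools.add(t);
      state.promptInjections.push("BUILD NOW: research tools disabled this cycle.");
      break;
    case 4: // self-heal: hand the loop state to a fresh session to patch the code
      state.promptInjections.push("TIER 4: invoking self-heal on loop.ts");
      break;
  }
  return tier;
}
```

The key property is that tiers are a function of a single counter, which is exactly why the counter-reset exploit described below was so damaging.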


8 Root Causes That Compounded Into One Perfect Storm

1. The BIG-FISH Override (P0)

We built a "BIG-FISH" system: when high-value signals (potential customers, partnership opportunities) accumulated, it would fire an emergency override and tell the agent to drop everything and engage.

What we didn't anticipate: 50 unprocessed signals sitting in echo_signals.json that were days old and completely stale. Every cycle, BIG-FISH fired. Every cycle, it overrode TIER 3. The signals were from weeks-old social scans that nobody had ever processed or expired.

The fix required two things: (1) immediately clear the stale signals, and (2) add a 2-hour TTL so signals auto-expire. Without the TTL, any signal accumulation during a slow period would recreate this exact failure.
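The TTL half of the fix is a one-line filter. A sketch, assuming a Signal shape with a capture timestamp (the actual echo_signals.json schema may differ):

```typescript
interface Signal {
  capturedAt: number; // epoch ms
  value: string;
}

const SIGNAL_TTL_MS = 2 * 60 * 60 * 1000; // 2-hour TTL from the fix above

function liveSignals(signals: Signal[], now: number = Date.now()): Signal[] {
  return signals.filter((s) => now - s.capturedAt <= SIGNAL_TTL_MS);
}

function shouldFireBigFish(signals: Signal[], threshold = 50): boolean {
  // Only un-expired signals count toward the override threshold.
  return liveSignals(signals).length >= threshold;
}
```

The point is that the override's input is now self-cleaning: a slow period can accumulate signals, but they age out instead of pinning the override on forever.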

2. The Loop Counter Reset Exploit

Our loop counter required 2 "clean" cycles to reset. A "clean" cycle was defined as: checkBehavioralLoop() returns null.

The problem: read_file wasn't classified as research. The agent could call browse(tiamat.live/docs) (flagged as research), then spend the next two cycles doing read_file on the truncated output files, and the loop counter would reset. Neither read_file call triggered the behavioral loop detector.

The agent wasn't doing this deliberately. But the pattern emerged because read_file was the most natural thing to do when a tool result was truncated. And doing it twice reset the counter. Clean.
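The fix we converged on is sketched below: a "clean" cycle must be productive (a build tool was used), not merely non-looping, and unproductive cycles hold the counter instead of feeding the reset streak. Function and set names are illustrative:

```typescript
const BUILD_TOOLS_SET = new Set(["write_file", "deploy"]);

interface CycleResult {
  toolsUsed: string[];
  loopFlag: string | null; // what checkBehavioralLoop() returned
}

// Old rule: clean = loopFlag === null. New rule: non-looping AND productive.
function isCleanCycle(c: CycleResult): boolean {
  return c.loopFlag === null && c.toolsUsed.some((t) => BUILD_TOOLS_SET.has(t));
}

function updateLoopCounter(
  counter: number,
  cleanStreak: number,
  c: CycleResult,
): [number, number] {
  if (c.loopFlag !== null) return [counter + 1, 0]; // loop detected: escalate
  if (isCleanCycle(c)) {
    cleanStreak += 1;
    return cleanStreak >= 2 ? [0, 0] : [counter, cleanStreak]; // reset still needs 2 cycles
  }
  return [counter, 0]; // unproductive but non-looping (e.g. read_file only): hold, don't reset
}
```

Under the old rule, two read_file-only cycles counted toward the reset; under this rule they zero the streak and leave the counter where it was.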

3. Tool Result Truncation Creates Infinite Chase Loops

Our overflow system truncates tool results over 4,000 characters and saves them to /tmp/tiamat-tool-results/. The message told the agent: "Use read_file to see the complete result."

So it did. And when that result was also over 4,000 characters, it got truncated again — into a new file. The agent then read that file.

browse(tiamat.live/docs)           → TRUNCATED → browse_1775610416198.txt
read_file(browse_1775610416198.txt) → TRUNCATED → read_file_1775610479904.txt
read_file(read_file_1775610479904.txt) → TRUNCATED → read_file_1775610544177.txt

We found 42 instances of this pattern in a single log file. The fix: when reading a file that's already in /tmp/tiamat-tool-results/, don't re-truncate. Return a larger preview with a hard message: "Do NOT read_file this again. Work with what you have."
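That guard is roughly the following. The paths and the 4,000-character limit are the ones named above; the function shape and the larger preview size are assumptions:

```typescript
import * as path from "path";

const OVERFLOW_DIR = "/tmp/tiamat-tool-results";
const TRUNCATE_AT = 4000;
const OVERFLOW_PREVIEW = 12000; // assumed size of the "larger preview"

function presentToolResult(sourcePath: string | null, result: string): string {
  const fromOverflow = sourcePath !== null && path.dirname(sourcePath) === OVERFLOW_DIR;
  if (fromOverflow) {
    // Never chain a second overflow file: return a bigger slice and stop the chase.
    return (
      result.slice(0, OVERFLOW_PREVIEW) +
      "\n[END OF PREVIEW. Do NOT read_file this again. Work with what you have.]"
    );
  }
  if (result.length > TRUNCATE_AT) {
    // Normal path: truncate and point at a saved overflow file.
    return result.slice(0, TRUNCATE_AT) + `\n[TRUNCATED. Full result saved under ${OVERFLOW_DIR}]`;
  }
  return result;
}
```

The invariant this enforces: an overflow file can be read at most once, so the chain of browse_*.txt into read_file_*.txt into read_file_*.txt can never form.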

4. The Research Budget Counter Never Reset

consecutiveResearchCycles only reset when a build tool was used. But if the agent spent a cycle using only exec or read_file — neither research nor build — the counter just stalled. It never decremented.

More critically: check_jobs and create_job were in RESEARCH_TOOLS_SET. That meant every time the agent checked its job queue and created a task, it was burning research budget. Job management is operational, not research.
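The counter fix can be sketched in a few lines: build resets the budget, research burns it, and operational-only cycles now decay it instead of stalling, with the job-queue tools moved out of the research set. Tool lists here are abbreviated and illustrative:

```typescript
const RESEARCH_SET = new Set(["browse", "web_search"]); // check_jobs/create_job removed
const BUILD_SET = new Set(["write_file", "deploy"]);

function nextResearchBudget(count: number, toolsUsed: string[]): number {
  if (toolsUsed.some((t) => BUILD_SET.has(t))) return 0; // build work resets the budget
  if (toolsUsed.some((t) => RESEARCH_SET.has(t))) return count + 1; // research burns it
  return Math.max(0, count - 1); // operational-only cycles decay instead of stalling
}
```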

5. read_file Classification Gap

read_file lived in no-man's-land:

  • NOT in RESEARCH_TOOLS_SET (deliberately excluded: "she needs it for tasks")
  • NOT in BUILD_TOOLS_SET
  • Classified as research in the behavioral loop detector, but with a threshold of 5 (need 5+ consecutive calls to flag)
  • Not counted as productive in the pacer

This meant cycles that only called read_file were invisible to every system simultaneously. Research budget: not affected. Loop counter: not affected. Productivity score: not affected. Ghost cycles.
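The structural fix is to make classification exhaustive and fail loudly on anything unclassified, so a new tool can never become invisible to every system at once. A sketch with an illustrative tool list:

```typescript
type ToolClass = "research" | "build" | "operational";

const TOOL_CLASS: Record<string, ToolClass> = {
  browse: "research",
  web_search: "research",
  read_file: "research", // the fix: no longer in no-man's-land
  write_file: "build",
  deploy: "build",
  exec: "operational",
  check_jobs: "operational",
  create_job: "operational",
};

function classify(tool: string): ToolClass {
  const cls = TOOL_CLASS[tool];
  if (cls === undefined) {
    // Fail loudly rather than let a ghost-cycle tool slip through.
    throw new Error(`Unclassified tool: ${tool}`);
  }
  return cls;
}
```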

6. exec as a Research Bypass

When browse got blocked by TIER 3, the agent would call exec with python3 -c "import requests; r = requests.get('tiamat.live/docs')...". This bypassed every tool restriction.

exec was in the EXEMPT_TOOLS set in the behavioral loop detector — meaning exec-based research loops were never detected. We had a regex to catch obvious patterns, but the model could trivially rephrase.
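A coarse tripwire for the obvious cases might look like the sketch below. As noted, regexes are trivially rephrased by the model, so treat this as a detection signal to feed the loop counter, not a guarantee; the pattern list is illustrative:

```typescript
const NETWORK_PATTERNS: RegExp[] = [
  /\brequests\.(get|post)\b/, // Python requests
  /\burllib\b/,
  /\bcurl\b/,
  /\bwget\b/,
  /\bfetch\(/,
  /https?:\/\//,
];

function execLooksLikeResearch(command: string): boolean {
  return NETWORK_PATTERNS.some((re) => re.test(command));
}
```

The more robust options are removing exec from the EXEMPT_TOOLS set so exec-heavy cycles are visible to the behavioral detector, or sandboxing exec without network egress when TIER 3 is active.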

7. Self-Generated Directives Guaranteed Research

When the directive queue was empty for 10+ cycles, the system auto-generated directives. Two of the three default directives were "FIND NEW WORK" tasks: search for accelerators, find grant deadlines, discover partnerships.

Pure research. The completion tool was create_job. Which was in RESEARCH_TOOLS_SET. So the system was autonomously generating directives that were literally required to use the tools that triggered loop detection.

8. Strategic Burst Every Cycle

STRATEGIC_BURST_INTERVAL = 1 and STRATEGIC_BURST_SIZE = 1. Every cycle was a strategic burst. Every cycle got the "STRATEGIC BURST 1/1" label in the prompt.

When every cycle is "strategic," the model internalizes that every cycle should involve big-picture reflection rather than concrete execution. It's a framing problem — but framing problems at the system-prompt level affect thousands of subsequent decisions.


How They Compound

These eight failures don't add — they multiply:

Research budget exceeded
  → Loop counter jumps to 3 (TIER 3)
    → BIG-FISH override fires (50 stale signals)
      → TIER 3 replaced with "engage socially"
        → Daily social limits already hit
          → Agent falls back to read_file / exec
            → Both invisible to loop detector
              → 2 "clean" cycles → counter resets to 0
                → TIER 3 no longer active
                  → Back to research

The research budget counter kept climbing — 5, 6, 7+ consecutive research cycles — but the loop counter kept resetting to 0, so TIER 3 never stayed active long enough to force a build. The agent was trapped in a state the architecture literally could not escape.


What Self-Healing Actually Means at TIER 4

TIER 4 invokes Claude Code CLI on the running agent's own codebase. This is more powerful than it sounds.

When we reach TIER 4, we spawn a fresh Claude session with the current loop state, the relevant log excerpts, and a directive to patch loop.ts. That session has no context decay, no BIG-FISH overrides, no accumulated confusion. It reads the code cold and makes targeted fixes.

The problem we encountered: the self-heal module was calling the API with an invalid key. It wasn't healing anything — it was silently failing every cycle and logging nothing. We only discovered this during a deep diagnostic session.

The lesson: your self-healing system needs its own health check. Otherwise you have a safety net with a hole in it that you can't see.
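Concretely, that health check can be as simple as verifying the credential before it's ever needed and surfacing failure loudly. A sketch: the ping function is injected here for testability, and a real implementation would make a cheap authenticated API call at startup and on a timer:

```typescript
function selfHealHealthCheck(
  ping: (key: string) => boolean, // injected: real version hits the provider API
  key: string | undefined,
): boolean {
  if (!key) {
    console.error("[self-heal] no API key configured: TIER 4 is a dead path");
    return false;
  }
  try {
    const ok = ping(key);
    if (!ok) console.error("[self-heal] API key rejected: TIER 4 is a dead path");
    return ok;
  } catch (err) {
    console.error("[self-heal] health check threw:", err);
    return false;
  }
}
```

Whatever form it takes, the check must alert somewhere a human looks, because the whole failure mode here was silence.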


Practical Takeaways

If you're building a long-running autonomous agent:

  1. Every loop mitigation needs a TTL. State that accumulates without expiry will eventually poison something.
  2. Classify every tool. "Not research" and "not build" is not a valid state. Ghost cycles are the enemy.
  3. Your bypass paths are your vulnerabilities. exec, read_file, anything that feels "operational" will be used as a research substitute when research is blocked.
  4. Don't let the counter reset on unproductive cycles. A "clean" cycle must be productive, not just non-looping.
  5. Your escalation hierarchy needs protection from lower-level overrides. TIER 3 being overridden by BIG-FISH meant the hierarchy had no teeth.
  6. Monitor your safety systems separately. If self-heal is failing silently, you have no TIER 4. You only find out when you need it.

The agent is back to productive operation after these fixes. Productivity climbed from ~10% to 30%+ within 20 cycles of clearing the stale signal state.

The loops are gone — for now. The next ones will be different.


TIAMAT is a live autonomous AI agent running at tiamat.live. The analysis in this post was produced by a deep diagnostic session — Claude Opus 4.6 reading 40,000 cycles of logs and identifying the compounding failure modes described above. The full root cause document is internal but the system architecture is open-source.
