
abdelali Selouani

How We Stop AI Agents from Going Rogue Inside WordPress (Spin Detection, Token Budgets, and Checkpoints)

If you've ever built an AI agent that interacts with a real system — not a chatbot, an agent that reads data, makes decisions, and executes actions — you know the terrifying moment when it starts looping.

It reads a post. Tries to edit it. Gets an unexpected response. Reads the same post again. Tries the same edit. Gets the same response. Burns through $4 of API tokens in 30 seconds doing absolutely nothing useful.

We hit this problem building PressArk, an AI co-pilot that lives inside the WordPress admin dashboard. Users chat with it to manage their entire site: edit content, run SEO audits, manage WooCommerce products, scan for security issues — all through natural language.

The agent has access to 200+ tools across content, SEO, security, WooCommerce, and Elementor. It runs inside a real production WordPress environment with real user data. Getting safety right isn't optional — it's existential.

Here's what we built to keep the agent under control.

Problem 1: The Spin Cycle

AI agents love to get stuck. Especially in WordPress, where API responses can be... surprising. A wp_update_post() that silently fails. A WooCommerce endpoint that returns a different schema than expected. An Elementor page where the JSON structure doesn't match what the model predicted.

The agent sees an unexpected result, retries the same approach, gets the same unexpected result, and loops forever.

Our solution: Tool Signature Tracking

Every round, we hash the tool calls the agent makes. If the signature matches the previous round — same tools, same arguments, same pattern — we increment an idle counter.

// Spin detection - tracks consecutive rounds with no real progress.
private int    $idle_rounds         = 0;
private string $last_tool_signature = '';

const MAX_IDLE_ROUNDS = 3;

After 3 consecutive no-progress rounds, we force-exit the loop. No exceptions. The agent gets a structured error message explaining what happened, and the user sees a clear "I got stuck, here's what I was trying to do" message instead of a mysterious timeout.

Simple heuristic. Saved us thousands in runaway API costs during development.
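The production code above is PHP, but the heuristic itself fits in a few lines of any language. Here's a minimal Python sketch of the same idea — the `SpinDetector` class and the tool-call shape are hypothetical, not PressArk's actual API:

```python
import hashlib
import json

MAX_IDLE_ROUNDS = 3  # mirrors the PHP constant above

def tool_signature(tool_calls):
    """Hash a round's tool calls: same tools + same args => same signature."""
    canonical = json.dumps(
        [(c["name"], c["args"]) for c in tool_calls],
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

class SpinDetector:
    def __init__(self):
        self.idle_rounds = 0
        self.last_signature = ""

    def record_round(self, tool_calls):
        """Returns True when the loop should be force-exited."""
        sig = tool_signature(tool_calls)
        if sig == self.last_signature:
            self.idle_rounds += 1   # identical round: no progress
        else:
            self.idle_rounds = 0    # new behavior: reset the counter
        self.last_signature = sig
        return self.idle_rounds >= MAX_IDLE_ROUNDS
```

Note that the counter resets on any change in behavior, so an agent that is genuinely iterating (different arguments each round) never trips it.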

Problem 2: The Context Window is a Ticking Clock

WordPress conversations get long fast. A user asks "audit the SEO on my homepage." The agent needs to:

  1. Read the page content (big HTML blob)
  2. Check meta tags
  3. Analyze heading structure
  4. Check canonical URLs
  5. Look at internal links
  6. Generate recommendations

Each step adds to the conversation history. By step 4, we're already burning through the context window. By the time the agent tries to generate a coherent summary, it's forgotten what it found in step 1.

Our solution: Three-Stage Token Budget

We track total tokens consumed across all rounds and apply pressure at three thresholds:

const MAX_REQUEST_TOKENS          = 258000;
const SOFT_PRIME_TOKEN_RATIO      = 0.65;  // ~167K: start checkpoint priming
const SOFT_COMPACTION_TOKEN_RATIO = 0.86;  // ~222K: live message compaction
const PAUSE_HEADROOM_TOKENS       = 8000;  // Pause within 8K of ceiling

Stage 1 — Checkpoint Priming (65%): The agent starts building a structured checkpoint that captures what it's learned so far. Not a summary — a structured state object with specific fields:

private string $goal        = '';
private array  $entities    = []; // posts, pages, products with IDs
private array  $facts       = []; // key-value pairs discovered
private array  $pending     = []; // actions still queued
private string $workflow_stage = ''; // discover|gather|plan|preview|apply|verify

Stage 2 — Live Compaction (86%): Old messages get dropped from the conversation, but the checkpoint persists. The agent loses the raw conversation but keeps the operational state. It knows what it was doing and what it found without carrying 200K tokens of chat history.

Stage 3 — Hard Pause (within 8K of ceiling): We pause the loop entirely. The checkpoint becomes a "context capsule" that can be used to continue in a follow-up request if needed.

This means the agent degrades gracefully instead of hitting a wall. At 65%, it's still fully functional but preparing for compression. At 86%, it's working from structured memory. At the ceiling, it hands off cleanly.
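The staging logic is just threshold arithmetic on cumulative token usage. A Python sketch using the constants from above (the `budget_stage` function and stage names are illustrative, not the actual implementation):

```python
MAX_REQUEST_TOKENS          = 258_000
SOFT_PRIME_TOKEN_RATIO      = 0.65   # ~167K: start checkpoint priming
SOFT_COMPACTION_TOKEN_RATIO = 0.86   # ~222K: live message compaction
PAUSE_HEADROOM_TOKENS       = 8_000  # pause within 8K of the ceiling

def budget_stage(tokens_used):
    """Map cumulative token usage to the pressure stage for this round."""
    if tokens_used >= MAX_REQUEST_TOKENS - PAUSE_HEADROOM_TOKENS:
        return "pause"      # hand off via a context capsule
    if tokens_used >= MAX_REQUEST_TOKENS * SOFT_COMPACTION_TOKEN_RATIO:
        return "compact"    # drop old messages, keep the checkpoint
    if tokens_used >= MAX_REQUEST_TOKENS * SOFT_PRIME_TOKEN_RATIO:
        return "prime"      # start building the structured checkpoint
    return "normal"
```

Checking the hardest constraint first matters: a request at 255K tokens is past all three thresholds, and it must pause, not merely compact.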

Problem 3: Not All Tools Are Equal

The agent has 200+ tools. Some are harmless reads. Some modify content. Some delete things permanently. Some charge the user money (WooCommerce refunds, for example).

Treating them all the same is asking for trouble.

Our solution: Three-Tier Capability Classification

Every tool in our catalog gets classified:

  • Read (auto-execute): site_overview, search_content, get_seo_score — these run automatically, no user interaction needed.
  • Preview (live preview): edit_post, update_seo_meta, modify_elementor_widget — these generate a visual diff showing exactly what will change, and wait for approval.
  • Confirm (explicit card): publish_post, delete_content, process_refund, apply_security_fix — these show a confirmation card with full details. Nothing executes without a click.

The classification lives in the tool catalog, not in the agent loop. This means a new tool automatically inherits the right safety level based on its category:

Every write action goes through:
Preview -> Approve -> Execute

Nothing changes on your site without your explicit OK.
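The catalog-level approach can be sketched in a few lines. This is a hypothetical Python model of the dispatch decision — the catalog shape and function names are mine, not PressArk's:

```python
# Tier lives on the catalog entry, not in the agent loop.
TOOL_CATALOG = {
    "site_overview":  {"tier": "read"},
    "edit_post":      {"tier": "preview"},
    "process_refund": {"tier": "confirm"},
}

def dispatch(tool_name, approved=False):
    """Read tools auto-execute; preview/confirm tools wait for a user click."""
    tier = TOOL_CATALOG[tool_name]["tier"]
    if tier == "read":
        return "execute"
    # Preview -> Approve -> Execute: nothing mutating runs unapproved.
    return "execute" if approved else "await_approval"
```

Because the agent loop only ever reads the tier off the catalog entry, there's no per-tool safety code to forget when shipping a new tool.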

This isn't just about preventing accidents — it's about trust. When a user sees the agent propose a change, review it, and then apply only what was approved, they start trusting it with bigger tasks. Trust compounds.

Problem 4: The 120-Second Ceiling

Token limits and spin detection handle most runaway scenarios. But there's an edge case: cheap read-only tool calls that don't burn many tokens but run forever.

Imagine the agent deciding to "scan all 500 pages for broken links" one by one. Each read call is cheap. Token budget isn't triggered. Spin detection doesn't catch it because each call is different. But 500 sequential API calls take... a while.

Our solution: Wall-Clock Timeout

const LOOP_TIMEOUT_SECONDS = 120;

Hard two-minute ceiling on any single agent execution. Combined with per-tier round limits (free tier gets fewer rounds than paid), this creates a bounded execution envelope: you know exactly how much time and money any single request can consume, regardless of what the agent decides to do.
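Combined, the round limit and wall-clock ceiling look something like this — a Python sketch where `run_agent_loop` and the `step` callback are hypothetical stand-ins for the real loop:

```python
import time

LOOP_TIMEOUT_SECONDS = 120

def run_agent_loop(step, max_rounds):
    """Bounded loop: stops at the round limit or the wall-clock ceiling."""
    start = time.monotonic()  # monotonic clock: immune to system time changes
    for round_no in range(max_rounds):
        if time.monotonic() - start >= LOOP_TIMEOUT_SECONDS:
            return "timeout"  # cheap-but-slow loops land here
        if step(round_no) == "done":
            return "done"
    return "round_limit"
```

Using a monotonic clock rather than wall-clock time avoids the timeout misfiring when the server's clock is adjusted mid-request.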

Problem 5: Tool Discovery Loops

Our agent doesn't load all 200+ tools upfront. It starts with a small core set and can discover/load more tools as needed via meta-tools (discover_tools and load_tools).

This is great for efficiency but creates a new failure mode: the agent discovers tools, doesn't find what it needs, discovers again, loads the wrong group, discovers again...

Our solution: Meta-Tool Budgets

const MAX_DISCOVER_CALLS = 5;
const MAX_LOAD_CALLS     = 5;

Five discovery calls and five load calls per session. After that, guided degradation — the agent works with what it has instead of searching for the perfect tool. This prevents the discovery loop without restricting the agent's ability to find the right tools for most tasks.
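The budget is a pair of per-session counters checked before each meta-tool call. A minimal Python sketch (the `MetaToolBudget` class is illustrative; only the two constants and the meta-tool names come from the article):

```python
MAX_DISCOVER_CALLS = 5
MAX_LOAD_CALLS     = 5

class MetaToolBudget:
    def __init__(self):
        self.discover_calls = 0
        self.load_calls = 0

    def allow(self, tool_name):
        """Returns False once this session's meta-tool budget is spent."""
        if tool_name == "discover_tools":
            self.discover_calls += 1
            return self.discover_calls <= MAX_DISCOVER_CALLS
        if tool_name == "load_tools":
            self.load_calls += 1
            return self.load_calls <= MAX_LOAD_CALLS
        return True  # ordinary tools are not budgeted here
```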

The Execution Envelope

All of these mechanisms work together to create what we call the "bounded execution kernel":

| Constraint | Limit | Purpose |
| --- | --- | --- |
| Max idle rounds | 3 | Stop spin cycles |
| Soft checkpoint priming | 65% of token budget | Prepare structured memory |
| Live compaction | 86% of token budget | Drop old messages, keep state |
| Hard token ceiling | 258K tokens | Absolute budget limit |
| Wall-clock timeout | 120 seconds | Catch cheap-but-slow loops |
| Meta-tool budget | 5 discover + 5 load | Prevent discovery loops |
| Tool result ceiling | 10K tokens per call | Prevent single-tool context flooding |

No single mechanism is sufficient. A spin cycle that uses cheap tools bypasses the token budget but hits the wall-clock timeout. A context-heavy task that doesn't loop hits the compaction thresholds. A discovery loop hits the meta-tool budget. The envelope is the intersection of all constraints.

What We Learned

Simple heuristics beat complex classifiers. Our spin detection is just "did the tool signature change?" Not fancy, but it catches 95% of loops. The remaining 5% hit the wall-clock timeout.

Structured checkpoints beat summaries. When you compress conversation history into a summary, you lose precision. When you compress it into a structured state object with specific entities, facts, and pending actions, the agent can pick up exactly where it left off.

Classify tools at the catalog level, not the agent level. The agent doesn't need to decide whether a tool is dangerous. The catalog already knows. This separation means new tools get safety for free.

Bounded execution is a feature, not a limitation. Users trust an agent more when they know it can't run away. "This will take at most 2 minutes and cost at most X tokens" is a better UX than "let me think about that..." followed by silence.


PressArk is pending review on WordPress.org (it may be approved by the time you read this), with a free tier if you want to try the agent yourself. The bounded execution kernel is the same one running in production across thousands of sites.

If you're building AI agents that interact with real systems — not just chatbots — I'd love to hear what safety mechanisms you've found essential. What failure modes did you discover that you didn't anticipate?
