I watched an autonomous agent spend three hours and 40,000 tokens trying to close a GitHub issue that had an open dependency, only to fail because it kept hallucinating a force_close flag that didn't exist in the API. It didn't just fail; it entered a perfect infinite loop: it would call the tool, get a 400 error, interpret the error as a "temporary network glitch," and try again with the exact same payload.
If you've built agents that actually touch production systems, you know this feeling. Prompting the agent to "be careful" or "follow the schema" is a placebo. When you move from a chat window to an autonomous loop, the gap between the LLM's intent and the system's reality becomes a canyon where agents go to die (and burn through your API credits).
For anyone running agent orchestration in a homelab or production environment, you need a safety architecture that doesn't rely on the model's "good behavior." I've moved to a three-layer safety model: Token-Level Enforcement, Pre-Execution Gates, and Execution Isolation.
What I tried first
My first instinct was to lean heavily on PydanticAI. The idea of using Pydantic for type-safe tool calling seemed like the silver bullet. I spent a week building out complex schemas, thinking that if the code validated the output, the agent would simply "learn" to provide the correct format.
I was wrong. I hit a wall where the agent would produce a JSON object that was almost correct, but it would miss a closing brace or add a trailing comma. Pydantic would throw a ValidationError, the agent would see that error in its history, and then it would attempt to "fix" the JSON by adding even more commentary around the code block. This created a feedback loop of ValidationError → Apology → Broken JSON.
Then I tried adding a "supervisor" agent to review the actions of the "worker" agent. This just doubled my latency and doubled my token cost without actually solving the root cause. The supervisor often hallucinated the same API capabilities as the worker because they were using the same base model.
The real problem wasn't the logic; it was the lack of deterministic boundaries. I was treating the LLM as a reliable software component when it's actually a probabilistic engine. To make it safe, I had to stop trying to "convince" the model to be safe and start forcing it to be safe at the infrastructure level.
Layer 1: Token-Level Schema Enforcement
The first layer of safety happens before the agent even finishes its sentence. If you're using Ollama v0.5.0 or newer, you can stop relying on the model to "try its best" with JSON.
Most people use the OpenAI-compatible API layer provided by frameworks, but that often just wraps the prompt in "Please return JSON." Ollama now supports a native format parameter that enforces the schema at the token-sampling level. This means the model physically cannot sample a token that violates the JSON schema.
Here is how I implemented this for my homelab health reports using qwen2.5:14b-instruct. I switched from the 32B model to the 14B variant because the 32B was causing 502 timeouts on my Tesla P40s due to VRAM pressure.
import httpx
from pydantic import BaseModel, Field

# Define the strict structure we want
class HomelabHealthReport(BaseModel):
    node_status: dict[str, str]
    critical_alerts: list[str]
    storage_utilization: float = Field(description="Percentage 0-100")

# Extract the JSON schema for Ollama
schema = HomelabHealthReport.model_json_schema()

def get_safe_report():
    # We bypass the high-level wrappers and hit the API directly
    # to ensure the 'format' parameter is actually passed.
    response = httpx.post(
        "http://ollama:11434/api/chat",
        json={
            "model": "qwen2.5:14b-instruct",
            "stream": False,
            "format": schema,  # This is the magic: token-level enforcement
            # The /api/chat endpoint expects a messages list, not a bare prompt
            "messages": [
                {
                    "role": "user",
                    "content": "Generate a health report for the homelab based on current metrics.",
                }
            ],
        },
        timeout=30.0,
    )
    if response.status_code != 200:
        print(f"API Error: {response.status_code}")
        return None

    return response.json()["message"]["content"]

# Result is guaranteed to be valid JSON matching HomelabHealthReport
By moving the constraint to the sampler, I eliminated the ValidationError loops entirely. The model no longer "guesses" the JSON; it is constrained by the grammar of the schema.
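Because the structure is guaranteed, the consuming side gets simpler too. Here's a minimal sketch of how the report can be handed straight back to Pydantic (get_safe_report and HomelabHealthReport are from the block above; model_validate_json is standard Pydantic v2):

# Parse the constrained output straight into the typed model.
# Formatting errors are no longer possible; only semantic issues
# (e.g. an out-of-range percentage) would still need handling.
raw = get_safe_report()
if raw is not None:
    report = HomelabHealthReport.model_validate_json(raw)
    print(f"Storage at {report.storage_utilization:.1f}%")
    for alert in report.critical_alerts:
        print(f"ALERT: {alert}")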
Layer 2: The Pre-Execution Gate (ActionGate)
Even with perfect JSON, an agent can still decide to do something stupid. Token-level safety ensures the format is right, but it doesn't ensure the intent is safe.
I implemented an ActionGate. This is a deterministic middleware layer that sits between the agent's tool-call and the actual execution. It doesn't use an LLM. It uses hard-coded business logic and state checks.
If an agent tries to close a ticket, the ActionGate checks if there are open dependencies. If it tries to reboot a node, it checks if that node is currently the only one running a critical service.
class SafetyException(Exception):
    pass

def check_action_safety(action_name, params, context):
    """
    Deterministic safety check.
    No LLMs allowed here.
    """
    # Prevent closing issues that have blocking dependencies
    if action_name == "close_issue":
        issue_id = params.get("issue_id")
        if context.get(f"issue_{issue_id}_has_dependency"):
            raise SafetyException(
                f"Safety Violation: Cannot close issue {issue_id} while dependencies are open."
            )

    # Prevent destructive actions on production nodes during peak hours
    if action_name == "reboot_node":
        node_id = params.get("node_id")
        if context.get("is_production") and context.get("peak_hours"):
            raise SafetyException(
                f"Safety Violation: Reboot of {node_id} forbidden during peak hours."
            )

    return True

# Usage in the agent loop
try:
    if check_action_safety(tool_call.name, tool_call.args, current_context):
        result = execute_tool(tool_call)
except SafetyException as e:
    # We feed the specific error back to the agent so it can pivot
    result = f"Action rejected by Safety Gate: {str(e)}"
This prevents the "infinite loop of failure" I mentioned earlier. Instead of the agent getting a generic 400 error from an API and thinking it's a network glitch, it gets a clear, human-readable explanation: "You cannot do this because X." This forces the agent to change its strategy rather than just retrying the same failed request.
Layer 3: Execution Isolation and Shell Safety
The final layer is where the rubber meets the road. I've spent too many hours debugging "quoting hell."
When you have an agent generating a command that needs to run over SSH, inside a Proxmox container (pct exec), as a specific user (su), and then executing a Python script, you have four layers of shell interpretation. If you use f-strings to build these commands, one stray single-quote in the agent's output will break the entire pipeline.
I saw this happen when an agent tried to pass a complex JSON string as an argument to a script. The shell interpreted the quotes, the su command stripped another layer, and by the time it hit Python, the syntax was mangled.
The fix is to stop passing code as shell arguments. Instead, pipe the code directly into the stdin of the remote process.
The wrong way (prone to quoting errors):
# This will break the moment the agent adds a ' or " to the payload
ssh node-a "pct exec 101 -- su - user -c 'python3 -c \"print(\"Hello World\")\"'"
The right way (Shell-safe piping):
I wrote a helper that writes the agent's intended Python logic to a temporary file or pipes it directly. This avoids the shell's interpretation of the string entirely.
# We pipe the actual script content into the remote shell
cat ~/bin/helpers/scout-ideas-helper.py | \
ssh node-a "pct exec 101 -- su - user -c 'python3 -'"
In this setup, python3 - tells Python to execute the code coming from stdin. The shell only sees the command to start Python, not the code itself. This completely eliminates the quoting nightmare.
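The same idea maps cleanly onto Python if you'd rather not shell out with a one-liner: the script source travels through stdin, never through an argument. A rough sketch under my assumptions (the run_in_container helper name and the node/CT IDs are placeholders for my setup):

import subprocess

def run_in_container(node: str, ct_id: int, user: str, script_source: str) -> str:
    """Run Python code inside an LXC container, passing the code via stdin.

    The shell only ever sees a fixed command string; the agent-generated
    code travels as data, so quoting in the payload cannot break the pipeline.
    """
    remote_cmd = f"pct exec {ct_id} -- su - {user} -c 'python3 -'"
    result = subprocess.run(
        ["ssh", node, remote_cmd],
        input=script_source,  # the agent's code goes to stdin, not argv
        capture_output=True,
        text=True,
        timeout=120,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Remote execution failed: {result.stderr}")
    return result.stdout

# Usage: whatever the agent produced, quotes and all, is safe to send
output = run_in_container("node-a", 101, "user", 'print("Hello \'World\'")')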
To manage the tools themselves, I've moved away from custom boilerplate and started using FastMCP. It allows me to wrap my MSAM (Multi-Agent System Architecture) tools into a standardized server that the agents can discover and use without me having to manually update the tool definitions every time I add a new function. I've detailed the setup for this in my post on Building MCP Servers with FastMCP.
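For context, the FastMCP side is only a few lines. A minimal sketch assuming the fastmcp package; check_action_safety is the gate from Layer 2, while load_current_context and tracker stand in for your own state and API layer rather than anything FastMCP provides:

from fastmcp import FastMCP

mcp = FastMCP("homelab-tools")

@mcp.tool()
def close_issue(issue_id: str) -> str:
    """Close a tracked issue, but only if the ActionGate allows it."""
    # The same deterministic gate from Layer 2 runs before anything executes.
    check_action_safety("close_issue", {"issue_id": issue_id}, load_current_context())
    return tracker.close(issue_id)

if __name__ == "__main__":
    mcp.run()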
Why this works
This architecture works because it acknowledges that the LLM is the most unreliable part of the system.
- Token-level enforcement removes the "formatting" problem. The agent can no longer fail because it forgot a comma.
- The ActionGate removes the "logic" problem. The agent can no longer perform an action that is fundamentally unsafe, regardless of how confident it is.
- Execution Isolation removes the "infrastructure" problem. The agent's output is treated as data (stdin) rather than as a command (shell argument).
When you combine these, you move from a system that is "mostly working" to one that is "predictably bounded."
Lessons Learned
The biggest surprise was how much the format parameter in Ollama reduced the need for complex prompt engineering. I spent weeks refining a "System Prompt" to ensure JSON compliance, only to find that a single API parameter did the job better than 500 words of instructions.
If I were to do this over again, I would have implemented the ActionGate much sooner. I spent too much time trying to make the agent "smarter" when I should have just made the environment "stricter."
A few caveats:
- Latency: Each layer adds a small amount of overhead. The ActionGate is negligible (milliseconds), but token-level enforcement can slightly increase the time to first token because the sampler has to do more work.
- VRAM: As I noted, model size matters. Qwen 2.5 14B is the sweet spot for my hardware. If you're running on limited VRAM, don't chase the 32B or 70B models just for the sake of "intelligence" if it leads to 502 timeouts and unstable inference.
- Memory Drift: Ensure your agent's memory is cleaned up. I use a six-layer memory architecture to prevent the agent from getting confused by outdated context, which is often the root cause of why it tries to perform unsafe actions in the first place.
Building autonomous agents isn't about finding the perfect model; it's about building the perfect cage for that model to operate in.