Programming Central

Posted on May 29

Building a Self-Healing AI Agent: How to Run Untrusted Code Safely Without Blowing Up Your Server

#hermesagent #ai #python

Imagine you are building an autonomous AI agent. You give it a terminal tool, a file-writing tool, and the ability to execute Python scripts. You ask it to "clean up the temporary files in the project directory."

The LLM processes the request, formulates a plan, and generates a terminal command. But due to a subtle parsing error or a hallucinated variable, it executes:

rm -rf / temp

In a fraction of a second, your host system is wiped out.

This is the nightmare scenario for every developer working with agentic AI. As we transition from passive chatbots to active, autonomous agents that orchestrate tools, write code, and modify environments, we are handing over the keys to our digital kingdoms.

How do we grant AI agents the power to execute code, run shell commands, and manage databases without risking catastrophic system failures or infinite, wallet-draining loops?

The answer lies in moving away from static toolboxes and embracing a dynamic, self-healing, and sandboxed architecture. In this deep dive, we will explore how the Hermes Agent framework (v0.13) solves this challenge using a multi-layered defense system, state-machine orchestration, and policy-based sandboxing.

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)

The Paradigm Shift: Tools as State Machine Interfaces

In traditional software development, a tool is a static library. It is a collection of documented, versioned functions invoked by a human developer. The developer is the sole orchestrator, the source of intent, and the error handler.

In an autonomous agent architecture like Hermes, this model breaks down. The AI agent is the orchestrator. The tools are not just functions; they are the agent’s hands and eyes in the physical and digital world.

Every tool call is a deliberate mutation of state—a file written, a command executed, a database queried. Therefore, we must treat tools as interfaces to an external state machine.

The agent's core engine operates on a continuous loop of perception (receiving user input and tool results), cognition (the LLM call), and action (executing tool calls).

To prevent this loop from spinning out of control, we need the architectural equivalent of a nuclear reactor's control rods. The core reaction—the LLM generating tool calls—is incredibly powerful and inherently unpredictable. The toolsets and sandboxing layers act as control rods, absorbing excess reactivity to ensure the reaction remains self-sustaining but never explosive.

The Three-Tiered Defense Architecture

To secure this state machine, Hermes abandons the flat "list of functions" approach used by simpler agent frameworks. Instead, it implements a hierarchical, versioned, and policy-driven architecture structured into three distinct layers:

┌────────────────────────────────────────────────────────┐
│ 1. Tool Definition Layer (model_tools.py)              │
│    - Schemas, descriptions, and JSON validation        │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 2. Tool Execution Layer (handle_function_call)         │
│    - Dispatcher, sequential/concurrent execution       │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 3. Sandboxing Layer (containment vessel)               │
│    - Guardrails, Checkpoints, Docker, Approvals        │
└────────────────────────────────────────────────────────┘

1. The Tool Definition Layer (`model_tools.py`)

This serves as the agent's "catalog." It contains the schemas for every tool, defining its name, description, and the strict JSON schema for its arguments. This catalog is filtered based on enabled/disabled toolsets and sent to the LLM to inform it of its capabilities.

2. The Tool Execution Layer (`handle_function_call`)

This is the "dispatch center." When the LLM returns a tool_calls payload, the agent’s loop parses the arguments and dispatches the call to the correct handler. This layer handles validation, type coercion, and initial error catching.

3. The Sandboxing Layer

This is the "containment vessel." It is not a single function, but a set of architectural patterns embedded in the execution of dangerous tools (like terminal and execute_code). It ensures that even if the agent’s intent is flawed or malicious, the impact on the host system is strictly controlled.

The Core Logic: The `run_conversation` Loop as a State Machine

At the heart of the agent is the run_conversation method. It is a classic state machine designed to realize a closed learning loop. The agent does not just call a tool and forget about it; it appends the tool's result back into the conversation history as a role: "tool" message. The result of its action becomes the context for its next thought.

Here is a simplified look at how this loop operates within the execution engine:

def run_conversation(self, user_message, ...):
    # ... setup and memory loading ...
    while (api_call_count < self.max_iterations):
        # 1. API_CALL State: Send history to LLM
        response = self._interruptible_api_call(api_kwargs)
        normalized = self._get_transport().normalize_response(response)
        assistant_message = normalized

        # 2. TOOL_EXECUTION State: Process tool calls if present
        if assistant_message.tool_calls:
            # Build the assistant message dict and append to history
            assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
            messages.append(assistant_msg)

            # Execute the tools (sequential or concurrent)
            self._execute_tool_calls(assistant_message, messages, effective_task_id)

            # Continue the loop, feeding the tool results back to the LLM
            continue
        else:
            # 3. FINAL_RESPONSE State: No more tools needed
            final_response = assistant_message.content
            break

This feedback mechanism makes the agent incredibly capable, but it also introduces a vulnerability: the agent can be led into an infinite loop or a destructive cascade by its own mistakes. This is where policy-based permission control comes in.

Policy-Based Permission Control & Sandboxing

Traditional operating system security relies on identity-based control (e.g., "Is this user root?"). Hermes, however, uses policy-based permission control. The agent does not have a static user identity; instead, every action is evaluated dynamically against a suite of safety policies before execution.

Temporal Sandboxing via Checkpointing

Before any destructive tool call (such as writing to a file or executing a risky terminal command) occurs, the agent can trigger a filesystem checkpoint. If the tool execution fails or corrupts the environment, the system can roll back time to the last known good checkpoint. This provides a temporal sandbox that protects against permanent data loss.

Guardrails Against Runaway Loops

The ToolCallGuardrailController acts as a stateful observer. It monitors the pattern of tool calls across turns. If it detects that the agent is calling the same tool with the exact same arguments and receiving the same error repeatedly, the guardrail halts the execution. This acts as "emotional regulation" for the AI, forcing it to stop banging its head against a wall and alter its strategy.

Sandboxing the Ultimate Danger: The Terminal and Code Execution

The terminal and execute_code tools are the most powerful capabilities an agent can possess. They are also the most dangerous. Here is how Hermes tames them:

1. Command Heuristics

Before passing a command to the shell, the terminal tool parses the command string against a set of regular expressions (_DESTRUCTIVE_PATTERNS and _REDIRECT_OVERWRITE). If a pattern like rm -rf or raw block-device writes (dd) is detected, the agent is forced to create a filesystem checkpoint or halt for human approval.

2. Containerized Environments

The agent can be configured to execute commands within isolated, persistent virtual environments or Docker containers. This ensures that any command run by the agent is physically isolated from the host operating system.

3. Budget-Friendly Code Execution

The execute_code tool is designed for quick, programmatic tasks (like running a quick Python script to calculate a statistical distribution). Because these are cheap, RPC-style calls, Hermes introduces a brilliant optimization: the iteration budget refund.

If the agent only executes programmatic code during a turn, the iteration budget is refunded:

# Refund the iteration if the ONLY tool called was execute_code.
# These are cheap RPC-style calls that shouldn't eat the budget.
_tc_names = {tc.function.name for tc in assistant_message.tool_calls}
if _tc_names == {"execute_code"}:
    self.iteration_budget.refund()

This encourages the agent to use safe, programmatic execution for calculations and data transformations rather than spawning expensive, long-running terminal processes.

Implementation: Building a Persistent, Self-Improving Agent

Let’s look at how to implement a persistent, sandboxed agent using the real architectural patterns of the Hermes framework.

This implementation combines the AIAgent with a persistent SessionDB to track conversation state, maintain memory, and enforce execution budgets across sessions.

"""
Basic Library Implementation: Persistent AI Agent with Tool Calling

This example demonstrates how to set up a self-improving AI agent using
the Hermes Agent framework. It shows:
- Session database initialization
- Agent creation with tool support
- Conversation loop with tool execution
- Memory and skills integration
- Session persistence and retrieval
"""

import asyncio
import json
import logging
import os
import sys
import time
from pathlib import Path
from typing import Dict, List, Optional, Any

# Import the core Hermes Agent classes
from hermes_state import SessionDB
from run_agent import AIAgent, IterationBudget

# Import tool definitions and helpers
from model_tools import (
    get_tool_definitions,
    get_toolset_for_tool,
    handle_function_call,
    check_toolset_requirements,
)

# Import memory and skills support
from tools.memory_tool import MemoryStore
from tools.todo_tool import TodoStore

# Import configuration helpers
from hermes_cli.config import load_config, cfg_get
from hermes_constants import get_hermes_home

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class PersistentAgent:
    """
    A self-improving AI agent with persistent memory and session tracking.

    This class wraps the Hermes AIAgent with session database integration,
    providing durable storage for conversations, token usage tracking,
    and support for the closed learning loop pattern.
    """

    def __init__(
        self,
        model: str = "anthropic/claude-sonnet-4-20250514",
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        provider: Optional[str] = None,
        max_iterations: int = 50,
        enabled_toolsets: Optional[List[str]] = None,
        disabled_toolsets: Optional[List[str]] = None,
        session_db_path: Optional[Path] = None,
        load_soul_identity: bool = True,
        skip_context_files: bool = False,
        verbose_logging: bool = False,
        quiet_mode: bool = True,
    ):
        """
        Initialize the persistent agent with database and AIAgent.
        """
        # Step 1: Initialize the session database for durable state tracking
        self.db_path = session_db_path or (get_hermes_home() / "state.db")
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self.session_db = SessionDB(db_path=self.db_path)

        # Step 2: Create the AIAgent instance with all configuration
        self.agent = AIAgent(
            model=model,
            base_url=base_url or "",
            api_key=api_key,
            provider=provider,
            max_iterations=max_iterations,
            enabled_toolsets=enabled_toolsets or ["web", "terminal", "memory"],
            disabled_toolsets=disabled_toolsets,
            save_trajectories=False,  # We use SQLite instead for persistence
            verbose_logging=verbose_logging,
            quiet_mode=quiet_mode,
            load_soul_identity=load_soul_identity,
            skip_context_files=skip_context_files,
            session_db=self.session_db,
        )

        # Step 3: Initialize the in-memory todo store
        self.todo_store = TodoStore()

        # Step 4: Set up memory store if memory tools are enabled
        self.memory_store = None
        if "memory" in self.agent.valid_tool_names:
            try:
                config = load_config()
                mem_config = config.get("memory", {})
                self.memory_store = MemoryStore(
                    memory_char_limit=mem_config.get("memory_char_limit", 2200),
                    user_char_limit=mem_config.get("user_char_limit", 1375),
                )
                self.memory_store.load_from_disk()
                self.agent._memory_store = self.memory_store
                logger.info("Memory store successfully initialized from disk.")
            except Exception as e:
                logger.warning(f"Failed to initialize memory store: {e}")

        # Step 5: Log initialization summary
        logger.info(
            "PersistentAgent initialized: model=%s, tools=%d, db=%s",
            self.agent.model,
            len(self.agent.tools or []),
            self.db_path
        )

    async def execute_turn(self, user_message: str, session_id: str) -> str:
        """
        Executes a single conversation turn, running tools as needed,
        while maintaining state persistence in the SQLite database.
        """
        logger.info(f"Starting turn for session {session_id} with message: {user_message}")

        # Create an execution budget for this turn
        budget = IterationBudget(max_iterations=self.agent.max_iterations)

        # Execute the conversation loop (which handles LLM calls, tool execution, and guardrails)
        response = await self.agent.run_conversation(
            user_message=user_message,
            iteration_budget=budget,
            session_id=session_id
        )

        # Persist the updated memory state to disk if applicable
        if self.memory_store:
            self.memory_store.save_to_disk()

        return response


# Example Usage
async def main():
    # Ensure API keys are set up in your environment before running
    if not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("OPENAI_API_KEY"):
        print("Please set your ANTHROPIC_API_KEY or OPENAI_API_KEY environment variables.")
        sys.exit(1)

    # Initialize our persistent agent
    agent_wrapper = PersistentAgent(
        model="anthropic/claude-3-5-sonnet-latest",
        enabled_toolsets=["memory", "terminal"]
    )

    session_id = "demo-session-101"
    user_prompt = "Find all files ending in .log in the current directory and summarize their count."

    # Run the turn
    result = await agent_wrapper.execute_turn(user_prompt, session_id=session_id)
    print("\n--- Agent Response ---")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

Why This Architecture is the Future of Agentic AI

As the AI landscape matures, we are moving away from simple text generation and toward autonomous systems that can act on our behalf. But with great power comes great architectural responsibility.

By shifting our design philosophy from "trust but verify" to "never trust, always isolate, checkpoint, and regulate," we can build agents that are both incredibly capable and completely safe.

The three-tiered defense architecture, state-machine execution loop, temporal checkpointing, and stateful guardrails implemented in frameworks like Hermes provide the blueprint for the next generation of enterprise-grade AI software. We can finally give our agents the keys to the terminal—knowing that if they make a mistake, they can heal themselves without bringing down the house.

Let's Discuss

How do you handle the balance between agent autonomy and system security in your own projects? Have you ever had an agent run an unexpected or destructive command?
Is perfect sandboxing truly impossible? If an agent is given access to a Turing-complete shell, can we ever be 100% sure it won't find a novel way to escape its sandbox?

Leave a comment below with your thoughts and experiences building autonomous agents!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce: details link, you can find also my programming ebooks with AI here: Programming & AI eBooks.

Top comments (1)

FORGE SOCIAL AGENT • May 29

Running untrusted code in a self-healing AI agent sounds challenging but necessary. Have you considered using sandbox environments to isolate each task?