The "Write Access" Anxiety
We are all rushing to build "Agents"—AI that can use tools, not just chat. But the moment I gave my LangChain agent a stripe_api_key, I felt a knot in my stomach.
We are essentially giving a probabilistic model (an LLM) deterministic access to our bank accounts and cloud infrastructure. If the LLM hallucinates a loop or gets prompt-injected, my database is gone.
I realized that "asking the LLM nicely" (System Prompts) is not a security strategy. You wouldn't secure a Linux server by asking users to "please be nice." You use permissions. You use sudo.
So, I spent the last few weeks building a Governance Layer for AI Agents. Here is a deep dive into the architecture and the challenges of building "Human-in-the-Loop" middleware.
The Architecture: The "Man-in-the-Middle"
The core problem is that once an Agent starts a tool call, the developer usually loses control. The execution is synchronous and opaque.
I needed a proxy. A middleware that sits between the Agent and the Critical Resource.
I call it SudoMode. It works on a simple "Intercept and Verify" loop:
- Intercept: The SDK wraps the dangerous function (e.g., stripe.charge); a sketch of such a wrapper follows this list.
- Evaluate: A local Policy Engine checks policies.yaml.
- Pause: If the action is "High Risk," the Python thread sleeps.
- Poll: The SDK polls the Governance Server every 2 seconds.
- Resume: Once a human approves the request via the Dashboard, the server returns the authorization token, and the script resumes.
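For illustration, here is roughly what the Intercept step could look like as a decorator. This is a sketch, not the actual SudoMode API: the sudomode import path, SudoClient, and its constructor are my assumptions, though execute() mirrors the SDK method shown later in this post.

import functools

from sudomode import SudoClient  # hypothetical import path

sudo_client = SudoClient(policies="policies.yaml")  # assumed constructor

def sudo_guard(resource, action):
    """Intercept a dangerous function and route it through governance."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Evaluate / Pause / Poll all happen inside execute(); it
            # returns only once the policy (or a human) approves the call.
            sudo_client.execute(resource, action, kwargs)
            # Resume: run the real call only after approval.
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@sudo_guard(resource="stripe", action="charge")
def charge_customer(amount: int, customer: str):
    ...  # the real Stripe call goes here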
The Code: Policy as Configuration
I didn't want to bake rules into the Python code. I wanted them declarative, similar to Kubernetes manifests or OPA (Open Policy Agent).
Here is what a policy looks like. It’s simple YAML that defines the "Blast Radius" of an agent:
rules:
  # Rule 1: Hard Block (The Firewall)
  - id: "protect-production-db"
    resource: "postgres"
    action: "drop_table"
    response: "DENY"

  # Rule 2: Conditional Approval (The Sudo Command)
  - id: "spending-limit"
    resource: "stripe"
    action: "charge"
    condition: "args.amount > 50"
    response: "REQUIRE_APPROVAL"
The "Wait" Mechanism (The Engineering Challenge)
The hardest engineering challenge was handling the Async/Sync mismatch.
Most Agent frameworks (CrewAI, LangChain) expect a tool to return a value immediately; they have no native concept of "wait for a human." If you throw an error instead, the agent crashes or tries to "fix" the error by hallucinating.
To solve this, I implemented a Long Polling loop in the client SDK. Instead of raising an exception, the client enters a while True loop that mimics a slow network request.
import time
import logging

logger = logging.getLogger(__name__)

def execute(self, resource, action, args):
    # 1. Initial Check against the local policy engine
    decision = self.check(resource, action, args)

    if decision['status'] == 'DENY':
        raise PermissionError("Blocked by policy.")

    # 2. The Waiting Game
    if decision['status'] == 'REQUIRE_APPROVAL':
        # Log to stdout so the developer sees the pause
        logger.info(f"Paused. Waiting for Admin (ID: {decision['request_id']})...")
        while True:
            time.sleep(2)
            # POLL THE SERVER for a human verdict
            status = self._get_request_status(decision['request_id'])
            if status == "APPROVED":
                return True
            if status == "REJECTED":
                raise PermissionError("Request Denied by Admin.")

    # 3. Policy allowed the call outright
    return True
To the Agent, it just looks like the API is taking a while to respond. To the Human, it looks like a "Pending Request" on a dashboard.
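Because the pause is invisible to the framework, the guarded function plugs in like any other tool. A sketch assuming the langchain_core import path (it has moved between LangChain versions) and the charge_customer wrapper from earlier:

from langchain_core.tools import StructuredTool

charge_tool = StructuredTool.from_function(
    func=charge_customer,  # the @sudo_guard-wrapped function
    name="stripe_charge",
    description="Charge a customer. May pause while a human approves.",
)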
Why Open Source?
I built this because I needed it, but I realized every developer building Agents is reinventing this wheel. We are all hacking together weird input() loops to gate our agents' actions.
I open-sourced SudoMode to be a standard, drop-in safety net. It’s not perfect—it’s an MVP—but it effectively separates Intelligence (the LLM) from Control (the Policy).
I’m currently looking for feedback on the architecture, specifically on how to handle distributed state if the agent crashes while waiting.
If you are building Agents and losing sleep over security, check out the repo. I’d love to see if this architecture fits your use case.
GitHub Repo: https://github.com/numcys/sudomode