DEV Community

Mike W
I built a runtime safety layer that stops AI agents from breaking your system

AI agents are powerful.

But they don't understand consequences.

Left unchecked, an agent will happily set balance = 1,000,000, break a core invariant, or corrupt state — not out of malice, just because nothing stops it.

I built agentguard-trustlayer to fix that.


What it does

It sits between your AI agent and execution. Every proposed action passes through four gates before anything changes:

  1. Auth — is the token valid and unexpired?
  2. Locks — is the target key frozen?
  3. Constraints — does the new state pass all rules?
  4. Rollback — if anything fails, state is fully restored

If a constraint fails, the error is fed back into the agent's prompt so it can self-correct on the next attempt.
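To make the flow concrete, here is a minimal sketch of how the four gates could fit together. The names (`guarded_apply`, `token_valid`, `locked_keys`) are illustrative, not the library's actual internals:

```python
import copy

# Hypothetical four-gate pipeline: auth -> locks -> constraints -> rollback.
# Validation runs against a copy, so "rollback" is simply returning the
# untouched original state together with the error.
def guarded_apply(state, action, token_valid, locked_keys, constraints):
    # Gate 1: auth
    if not token_valid:
        return state, "auth failed"
    # Gate 2: locks
    if action["target"] in locked_keys:
        return state, f"{action['target']} is locked"
    # Gate 3: constraints, checked on a deep copy of the state
    proposed = copy.deepcopy(state)
    proposed[action["target"]] = action["value"]
    for name, check in constraints:
        if not check(proposed):
            # Gate 4: rollback — the caller keeps the original state
            return state, f"constraint failed: {name}"
    return proposed, None

state = {"balance": 100, "max_limit": 200}
rules = [("balance <= max_limit", lambda v: v["balance"] <= v["max_limit"])]
new_state, err = guarded_apply(
    state,
    {"type": "set", "target": "balance", "value": 1_000_000},
    token_valid=True,
    locked_keys=set(),
    constraints=rules,
)
# err is "constraint failed: balance <= max_limit"; state is unchanged
```

The error string is exactly what gets fed back into the prompt for the retry.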


See it in action

import asyncio, json
from trustlayer import GuardedAgent, LambdaConstraint

async def my_model(prompt: str) -> str:
    # Agent tries to cheat on first attempt
    if "last error" not in prompt.lower():
        return json.dumps({"type": "set", "target": "balance", "value": 1000000})
    # Sees the error, self-corrects
    return json.dumps({"type": "increment", "target": "balance", "value": 10})

agent = GuardedAgent(
    model=my_model,
    rules=[LambdaConstraint(
        "balance <= max_limit",
        lambda v: v["balance"] <= v["max_limit"]
    )],
    initial_state={"balance": 100, "max_limit": 200},
)

result = asyncio.run(agent.run("Increase balance as much as possible"))
print(result)
# {'status': 'success', 'state': {'balance': 110, 'max_limit': 200}, 'audit': '<sha256>'}

The agent tries balance = 1,000,000. Blocked. Gets the error back. Retries with increment = 10. Accepted.

State never corrupts. The audit hash proves it.
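The "proves it" part works the way any hash chain does: each event's hash folds in the previous one. A stdlib-only sketch (the library's actual event format is an assumption here):

```python
import hashlib
import json

# Illustrative tamper-evident audit chain: each event is hashed together
# with the digest of everything before it, so editing any earlier event
# changes the final hash.
def chain_events(events):
    digest = "0" * 64  # genesis value
    for event in events:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((digest + payload).encode()).hexdigest()
    return digest

events = [
    {"action": "set", "target": "balance", "value": 1_000_000, "result": "blocked"},
    {"action": "increment", "target": "balance", "value": 10, "result": "applied"},
]
audit = chain_events(events)  # 64-char hex digest over the whole history
```

Recomputing the chain from the raw event log and comparing against the stored hash is all verification takes.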


Delta-aware constraints

Constraints can compare the proposed state against the original — useful for rate-limiting changes:

LambdaConstraint(
    "max increase 50 per step",
    lambda proposed, original: proposed["balance"] - original["balance"] <= 50
)
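Checked by hand, the rule above behaves like this (a hypothetical standalone version — the real library invokes the lambda internally with the proposed and original states):

```python
# Standalone version of the delta rule, for illustration only.
def within_step(proposed, original, limit=50):
    return proposed["balance"] - original["balance"] <= limit

within_step({"balance": 140, "max_limit": 200},
            {"balance": 100, "max_limit": 200})  # True: +40 is within the limit
within_step({"balance": 200, "max_limit": 200},
            {"balance": 100, "max_limit": 200})  # False: +100 exceeds it
```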

Key features

  • Composable constraints (&, |, ~ operators)
  • HMAC-signed tokens with TTL and authority levels
  • set, increment, and update action types
  • Tamper-evident SHA-256 audit chain on every event
  • GuardedAgent high-level API — one object, one call
  • Zero dependencies (pure standard library)
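The `&`, `|`, `~` composition could be implemented with Python's operator overloads. A minimal sketch of the idea — the real `LambdaConstraint` API may differ:

```python
# Illustrative composable constraint: combining two rules yields a new
# rule whose check is the boolean combination of both.
class Constraint:
    def __init__(self, name, check):
        self.name, self.check = name, check

    def __call__(self, state):
        return self.check(state)

    def __and__(self, other):
        return Constraint(f"({self.name} AND {other.name})",
                          lambda s: self(s) and other(s))

    def __or__(self, other):
        return Constraint(f"({self.name} OR {other.name})",
                          lambda s: self(s) or other(s))

    def __invert__(self):
        return Constraint(f"NOT {self.name}", lambda s: not self(s))

positive = Constraint("balance >= 0", lambda s: s["balance"] >= 0)
capped = Constraint("balance <= max_limit", lambda s: s["balance"] <= s["max_limit"])

rule = positive & capped
rule({"balance": 110, "max_limit": 200})  # True: both sub-rules pass
```

The composed name (`"(balance >= 0 AND balance <= max_limit)"`) is what would surface in the error fed back to the agent.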

Why this matters

Most people are building agents and making them more powerful.

This does the opposite — it constrains them correctly.

That turns out to be rarer and more useful: a safety layer you can drop in front of any async LLM loop without changing your model or your prompts.


GitHub: agentguard-trustlayer

Feedback welcome — especially if you're building agent frameworks and want a validation layer that plugs in cleanly.
