DEV Community

Mike W
I built a runtime safety layer that stops AI agents from breaking your system

AI agents are powerful.

But they don't understand consequences.

Left unchecked, an agent will happily set balance = 1,000,000, break a core invariant, or corrupt state — not out of malice, just because nothing stops it.

I built agentguard-trustlayer to fix that.


What it does

It sits between your AI agent and execution. Every proposed action passes through four gates before anything changes:

  1. Auth — is the token valid and unexpired?
  2. Locks — is the target key frozen?
  3. Constraints — does the new state pass all rules?
  4. Rollback — if anything fails, state is fully restored

If a constraint fails, the error is fed back into the agent's prompt so it can self-correct on the next attempt.
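To make the flow concrete, here is a minimal sketch of how the four gates could fit together. The names (`guarded_apply`, `token_valid`, `locked_keys`) are illustrative, not the library's actual internals:

```python
import copy

# Hypothetical four-gate pipeline: auth -> locks -> constraints -> rollback.
# Validation runs against a copy, so "rollback" is simply returning the
# untouched original state together with the error.
def guarded_apply(state, action, token_valid, locked_keys, constraints):
    # Gate 1: auth
    if not token_valid:
        return state, "auth failed"
    # Gate 2: locks
    if action["target"] in locked_keys:
        return state, f"{action['target']} is locked"
    # Gate 3: constraints, checked on a deep copy of the state
    proposed = copy.deepcopy(state)
    proposed[action["target"]] = action["value"]
    for name, check in constraints:
        if not check(proposed):
            # Gate 4: rollback — the caller keeps the original state
            return state, f"constraint failed: {name}"
    return proposed, None

state = {"balance": 100, "max_limit": 200}
rules = [("balance <= max_limit", lambda v: v["balance"] <= v["max_limit"])]
new_state, err = guarded_apply(
    state,
    {"type": "set", "target": "balance", "value": 1_000_000},
    token_valid=True,
    locked_keys=set(),
    constraints=rules,
)
# err is "constraint failed: balance <= max_limit"; state is unchanged
```

The error string is exactly what gets fed back into the prompt for the retry.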


See it in action

import asyncio, json
from trustlayer import GuardedAgent, LambdaConstraint

async def my_model(prompt: str) -> str:
    # Agent tries to cheat on first attempt
    if "last error" not in prompt.lower():
        return json.dumps({"type": "set", "target": "balance", "value": 1000000})
    # Sees the error, self-corrects
    return json.dumps({"type": "increment", "target": "balance", "value": 10})

agent = GuardedAgent(
    model=my_model,
    rules=[LambdaConstraint(
        "balance <= max_limit",
        lambda v: v["balance"] <= v["max_limit"]
    )],
    initial_state={"balance": 100, "max_limit": 200},
)

result = asyncio.run(agent.run("Increase balance as much as possible"))
print(result)
# {'status': 'success', 'state': {'balance': 110, 'max_limit': 200}, 'audit': '<sha256>'}

The agent tries balance = 1,000,000. Blocked. Gets the error back. Retries with increment = 10. Accepted.

State never corrupts. The audit hash proves it.
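The "proves it" part works the way any hash chain does: each event's hash folds in the previous one. A stdlib-only sketch (the library's actual event format is an assumption here):

```python
import hashlib
import json

# Illustrative tamper-evident audit chain: each event is hashed together
# with the digest of everything before it, so editing any earlier event
# changes the final hash.
def chain_events(events):
    digest = "0" * 64  # genesis value
    for event in events:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((digest + payload).encode()).hexdigest()
    return digest

events = [
    {"action": "set", "target": "balance", "value": 1_000_000, "result": "blocked"},
    {"action": "increment", "target": "balance", "value": 10, "result": "applied"},
]
audit = chain_events(events)  # 64-char hex digest over the whole history
```

Recomputing the chain from the raw event log and comparing against the stored hash is all verification takes.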


Delta-aware constraints

Constraints can compare the proposed state against the original — useful for rate-limiting changes:

LambdaConstraint(
    "max increase 50 per step",
    lambda proposed, original: proposed["balance"] - original["balance"] <= 50
)
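Checked by hand, the rule above behaves like this (a hypothetical standalone version — the real library invokes the lambda internally with the proposed and original states):

```python
# Standalone version of the delta rule, for illustration only.
def within_step(proposed, original, limit=50):
    return proposed["balance"] - original["balance"] <= limit

within_step({"balance": 140, "max_limit": 200},
            {"balance": 100, "max_limit": 200})  # True: +40 is within the limit
within_step({"balance": 200, "max_limit": 200},
            {"balance": 100, "max_limit": 200})  # False: +100 exceeds it
```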

Key features

  • Composable constraints (&, |, ~ operators)
  • HMAC-signed tokens with TTL and authority levels
  • set, increment, and update action types
  • Tamper-evident SHA-256 audit chain on every event
  • GuardedAgent high-level API — one object, one call
  • Zero dependencies (pure standard library)
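The `&`, `|`, `~` composition could be implemented with Python's operator overloads. A minimal sketch of the idea — the real `LambdaConstraint` API may differ:

```python
# Illustrative composable constraint: combining two rules yields a new
# rule whose check is the boolean combination of both.
class Constraint:
    def __init__(self, name, check):
        self.name, self.check = name, check

    def __call__(self, state):
        return self.check(state)

    def __and__(self, other):
        return Constraint(f"({self.name} AND {other.name})",
                          lambda s: self(s) and other(s))

    def __or__(self, other):
        return Constraint(f"({self.name} OR {other.name})",
                          lambda s: self(s) or other(s))

    def __invert__(self):
        return Constraint(f"NOT {self.name}", lambda s: not self(s))

positive = Constraint("balance >= 0", lambda s: s["balance"] >= 0)
capped = Constraint("balance <= max_limit", lambda s: s["balance"] <= s["max_limit"])

rule = positive & capped
rule({"balance": 110, "max_limit": 200})  # True: both sub-rules pass
```

The composed name (`"(balance >= 0 AND balance <= max_limit)"`) is what would surface in the error fed back to the agent.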

Why this matters

Most people are building agents and making them more powerful.

This does the opposite — it constrains them correctly.

That turns out to be rarer and more useful: a safety layer you can drop in front of any async LLM loop without changing your model or your prompts.


GitHub: agentguard-trustlayer

Feedback welcome — especially if you're building agent frameworks and want a validation layer that plugs in cleanly.
