
Juan Torchia

Posted on • Originally published at juanchi.dev

After the Guardrail That Saved My Infrastructure: My Autonomous Agent Architecture in Production


Why do we assume autonomous agents are going to fail in a contained way? I've been asking myself that question for a while, but it wasn't academic until one of my agents nearly destroyed my infrastructure on Railway. What came after — the redesign, the permission architecture, the observability layer I built from scratch — is the stuff that never shows up in the Twitter threads celebrating "the agent that did everything by itself."

This is the day after. The incident hangover. What's left when the guardrail stops the chaos and you have to build something that won't fail the same way again.


Autonomous Agent Architecture in Production: What I Broke and What I Rebuilt

I documented the original incident in the guardrails post. I'm not going to rehash the whole story, but the operational summary is this: an agent with write access to my Railway API executed a sequence of individually valid steps that, in combination, nearly wiped a production Postgres volume. The guardrail stopped it. I sat there staring at the log with my heart in my throat.

What bothered me wasn't that it failed. It's that I had assumed the permission scope was sufficient. I had the agent limited to certain endpoints. What I hadn't modeled is that the combination of valid endpoints could produce destructive effects.

That forced me to think differently. Not about flat permissions — "this agent can do X" — but about permission graphs with temporal context and sequence.

My architecture before the incident looked something like this:

Agent → API Gateway → Services
         (auth token)   (no sequence context)

Clean. Simple. Wrong.


The Permission Graph I Built Post-Incident

The first thing I did after catching my breath was draw the real graph of what the agent could do. Not what I thought it could do: what it could actually execute given the token and exposed endpoints.

The result was uncomfortable. There were 14 possible paths from "list volumes" to "irreversible destructive operation," and I had only blocked 3.

I redesigned with three layers:

Layer 1: Atomic Permissions with Declared Intent

// Before: the agent had a token with scope "read:volumes write:volumes"
// After: each action declares intent and context

interface AgentAction {
  type: 'read' | 'write' | 'delete';
  resource: string;
  intent: string; // human-readable description of why
  reversible: boolean;
  requiresConfirmation: boolean;
}

interface ExecutionContext {
  recentReads: string[];      // resources this session has already read
  actionsInSession: number;   // total actions executed so far in this session
}

const actionAllowed = (action: AgentAction, context: ExecutionContext): boolean => {
  // A write on a resource the agent has already read this session, once the
  // session is more than a few actions deep, is blocked and flagged for manual review
  if (action.type === 'write' && context.recentReads.includes(action.resource)) {
    if (context.actionsInSession > 3) return false; // hard cutoff
  }

  // Deletions are never automatic, no exceptions
  if (action.type === 'delete' && !action.requiresConfirmation) return false;

  return true;
};

Layer 2: Session State with a Sliding Window

// The agent doesn't just have permissions: it has an action budget per window
interface SessionBudget {
  totalActions: number;        // max 20 per session
  writes: number;              // max 5 per session
  criticalActions: number;     // max 1 per session (require approval)
  windowMinutes: number;       // 30 minutes by default
  lastAction: Date;
}

// If the agent hits 80% of budget, it enters read-only mode
// If it hits 100%, the session closes and logs for review
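Sketched as a check, that enforcement could look something like this. The checkBudget helper and the usage counters are illustrative names for the idea, not my exact implementation:

type BudgetStatus = 'ok' | 'read_only' | 'exhausted';

// Compare what the session has already spent against its SessionBudget (sketch)
const checkBudget = (
  budget: SessionBudget,
  used: { actions: number; writes: number; critical: number }
): BudgetStatus => {
  if (
    used.actions >= budget.totalActions ||
    used.writes >= budget.writes ||
    used.critical >= budget.criticalActions
  ) {
    return 'exhausted'; // session closes and gets logged for review
  }
  if (used.actions >= budget.totalActions * 0.8) {
    return 'read_only'; // past 80% of budget: reads only from here on
  }
  return 'ok';
};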

Layer 3: Forbidden Transition Graph

This is the layer that took the longest to model and the one that most fundamentally changed how I think. Blocking individual actions isn't enough: you have to block sequences.

// Transitions that can never occur in direct sequence
const FORBIDDEN_TRANSITIONS = [
  ['list_volumes', 'unmount_volume'],           // too direct
  ['scale_service', 'modify_production_env'],   // destructive combination
  ['rotate_secrets', 'restart_service'],        // no verification pause
] as const;

const validateSequence = (history: string[], nextAction: string): boolean => {
  const lastAction = history[history.length - 1];
  const isForbidden = FORBIDDEN_TRANSITIONS.some(
    ([from, to]) => from === lastAction && to === nextAction
  );
  if (isForbidden) {
    logger.warn(`Forbidden transition detected: ${lastAction} → ${nextAction}`);
    return false;
  }
  return true;
};

This forbidden transition pattern is what would have stopped the original incident before it ever reached the last-resort guardrail. The guardrail is a safety net; this is the scaffolding that should have been there from the start.


The Observability Layer I Built from Scratch

Before the incident I had logs. After the incident I have intent traceability.

The difference is subtle but fundamental. A log says "the agent executed DELETE /volumes/xyz at 23:47". Intent traceability says "the agent declared it was going to 'clean up orphaned volumes', executed these 7 actions in sequence, and action 5 deviated from the declared intent by 40%."

That's what I implemented:

interface AgentTrace {
  sessionId: string;
  declaredIntent: string;           // what the agent said it was going to do
  executedActions: ActionTrace[];
  intentDeviation: number;          // 0-100, calculated by semantic similarity
  alertsGenerated: string[];
  totalTime: number;
  finalState: 'completed' | 'blocked' | 'cancelled' | 'error';
}

interface ActionTrace {
  timestamp: Date;
  action: string;
  parameters: Record<string, unknown>;
  result: 'success' | 'blocked' | 'error';
  tokenCost?: number;               // if the action involves an LLM call
  latencyMs: number;
}

This runs on Postgres (the same stack I documented in the Docker Compose in production post) and gives me an agent session table I can audit. Not glamorous. It's a SQL table with indexes. But in the two weeks it's been running, it already caught three sessions where the agent deviated from its declared intent before it could do anything harmful.
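Persisting a trace is as boring as it sounds. A rough sketch with node-postgres — the table and column names are illustrative, not my actual schema:

import { Pool } from 'pg';

const pool = new Pool(); // connection config comes from the environment

// Illustrative table: agent_traces(session_id, declared_intent, intent_deviation, final_state, actions jsonb)
const saveTrace = async (trace: AgentTrace): Promise<void> => {
  await pool.query(
    `INSERT INTO agent_traces (session_id, declared_intent, intent_deviation, final_state, actions)
     VALUES ($1, $2, $3, $4, $5)`,
    [
      trace.sessionId,
      trace.declaredIntent,
      trace.intentDeviation,
      trace.finalState,
      JSON.stringify(trace.executedActions),
    ]
  );
};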

Concrete numbers from those two weeks:

  • 47 agent sessions executed
  • 3 sessions blocked for intent deviation > 60%
  • 1 session cancelled for budget exhaustion
  • 0 production incidents

That 0 matters to me. But so does the fact that the system generated 11 alerts I reviewed manually, and in 4 of those cases the agent was right and I was being too conservative. Tuning those thresholds is weeks of work.


The Mistakes I Made Redesigning (So You Don't Have To)

Mistake 1: I modeled permissions as if the agent were a human

When I designed the permissions, I thought in terms of "what would a human dev do with this access." The agent is not a human. It can execute 20 actions in 8 seconds without fatigue, without doubt, without the intuitive brake of "wait, this doesn't feel right." The mental model has to change.

Mistake 2: I confused observability with logging

I had Datadog, I had structured logs. I thought that was observability. It's not — at least not for agents. Agent observability requires understanding intent and measuring the distance between what the agent said it would do and what it actually did. Without that dimension, logs are a damage record, not a prevention tool.

This connects to something I worked through when diagnosing deadlocks in production: the problem wasn't that I didn't have data. It was that the data I had didn't show me the system's state at the moment that actually mattered. Same thing with agents.

Mistake 3: I assumed the agent's context was stable

An agent executing 15 steps doesn't have the same "understanding" at step 1 as at step 15. Accumulated context changes its behavior. I designed the permissions for the agent at step 1, not for the agent that's already processed 14 actions and has the full session context loaded. That asymmetry is dangerous.

Now I have context windows with decay: older actions in the session lose weight in the calculation of "how aligned is the agent with its declared intent." Not perfect, but more honest than assuming context is linear.
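The decay itself is simple. Something like this, where the half-life value is illustrative rather than a tuned number:

// Older actions weigh less when scoring how aligned the session still is with its declared intent
const alignmentScore = (
  actionScores: { similarity: number; ageMinutes: number }[]
): number => {
  const HALF_LIFE_MINUTES = 10; // illustrative: an action from 10 minutes ago counts half as much
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { similarity, ageMinutes } of actionScores) {
    const weight = Math.pow(0.5, ageMinutes / HALF_LIFE_MINUTES);
    weightedSum += similarity * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 1 : weightedSum / totalWeight;
};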

Mistake 4: I didn't model the cost of false positives

The first system I deployed was so conservative it blocked the agent every three actions. I shut it down after a day because it generated more friction than value. Security that creates excessive friction gets disabled. That's also a security failure — just a slower one.

Related to what I found when simulating supply chain attacks on dependencies: protection that hurts too much gets removed. You have to calibrate so the cost of the guardrail is lower than the cost of the incident it prevents.


FAQ: Autonomous Agent Architecture in Production

What is a permission graph for agents and why is it better than flat permissions?

A permission graph models not just what an agent can do, but in what sequence and under what context conditions. Flat permissions say "can read and write." The graph says "can write, but only if it hasn't read the same resource more than twice in the last window, and only if the declared intent includes a write operation." The difference is the temporal and sequential dimension. For autonomous agents executing long action chains, it's the difference between a containable system and one that fails in ways you didn't anticipate.

How much latency does this add per agent action?

In my stack, permission validation + sequence check + session state update adds between 8 and 23ms per action, depending on whether it needs to query the full session history. For most use cases, that's acceptable. If the agent is doing things that take seconds (external API calls, LLM generation), 20ms is noise. If it's doing in-memory read operations that take microseconds, then you need to think about whether the overhead is worth it.

What if the agent needs to make a transition I have forbidden but for a legitimate reason?

In my architecture, forbidden transitions can be unlocked with explicit out-of-band approval. The agent cannot self-approve; it has to emit an unlock request that lands in a queue I review. In practice, this happened three times in two weeks and all three times the agent was right. That tells me some of my forbidden transitions are too restrictive. I'm iterating. The alternative — letting the agent self-approve — I don't consider.
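The shape of that unlock request, roughly — the interface and field names are illustrative:

interface UnlockRequest {
  sessionId: string;
  transition: [from: string, to: string]; // the forbidden pair the agent wants to cross
  justification: string;                  // the agent's stated reason, reviewed by a human
  requestedAt: Date;
  status: 'pending' | 'approved' | 'denied';
}

// The agent can only enqueue a request; approval happens out of band, never in-process
const requestUnlock = (queue: UnlockRequest[], request: UnlockRequest): void => {
  queue.push({ ...request, status: 'pending' });
};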

How do you measure the agent's "intent deviation"?

I calculate semantic similarity between the intent declared at the start of the session and a textual description of the actions executed so far. I use embeddings with a lightweight model (not Claude for this — the cost doesn't scale). If similarity drops below a threshold, it enters review mode. I calibrated the threshold empirically over the first two weeks: started at 70%, dropped to 55% after too many false positives. It's still a heuristic; it's not a mathematical guarantee.
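The math behind the number is nothing exotic: cosine similarity over two embedding vectors. A sketch that assumes you've already embedded the declared intent and a textual summary of the actions so far:

const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Maps to the intentDeviation field on AgentTrace: 0 = fully aligned, 100 = unrelated
const intentDeviation = (intentVec: number[], actionsVec: number[]): number =>
  Math.round((1 - cosineSimilarity(intentVec, actionsVec)) * 100);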

Does this work with any agent framework or is it specific to your stack?

The three layers — atomic permissions with intent, session budget, forbidden transition graph — are conceptually framework-agnostic. I implemented this by hand on top of my API Gateway because no framework I evaluated had these primitives natively in 2025. If you're using LangGraph or AutoGen, you can implement the same pattern as middleware between graph nodes. The specific code changes; the mental model doesn't.
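As a rough idea of what that middleware looks like — a generic wrapper over the functions shown earlier, not tied to any framework's real API:

// Wrap whatever executes a node/action with the same three layers of checks (sketch)
const withGuards = <T>(
  runNode: (action: AgentAction) => Promise<T>,
  context: ExecutionContext,
  history: string[]
) => async (actionName: string, action: AgentAction): Promise<T> => {
  if (!actionAllowed(action, context)) {
    throw new Error(`Blocked by permission layer: ${actionName}`);
  }
  if (!validateSequence(history, actionName)) {
    throw new Error(`Blocked by transition graph: ${actionName}`);
  }
  history.push(actionName); // the Layer 2 budget check would slot in here as well
  return runNode(action);
};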

What about agents that create sub-agents? Do permissions get inherited?

This is the question that worries me most and the one I still don't have fully figured out. In my current stack, sub-agents inherit a subset of the parent's budget, never the full budget. If the parent agent has 20 actions available and creates a sub-agent, that sub-agent starts with a maximum of 5. The forbidden transition graph is inherited in full. But declared intent doesn't propagate automatically: the sub-agent has to declare its own intent, which I then validate against the parent's. It's imperfect. What I'm clear on is that full permission inheritance — which is what most frameworks do by default — is a time bomb. I touched on this from a different angle in the post on agents that create accounts and deploy on their own.
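What that inheritance looks like sketched in code — the numbers mirror the 5-of-20 example above, and the critical-action handling is illustrative rather than something any framework gives you:

// A sub-agent gets a capped slice of the parent's budget, never the whole thing (sketch)
const deriveSubAgentBudget = (parent: SessionBudget): SessionBudget => ({
  totalActions: Math.min(5, parent.totalActions),   // 5 of the parent's 20 in the example above
  writes: Math.min(2, parent.writes),               // illustrative cap
  criticalActions: 0,                               // illustrative: sub-agents get no critical actions
  windowMinutes: parent.windowMinutes,
  lastAction: new Date(),
});

// FORBIDDEN_TRANSITIONS is inherited in full; declared intent is not —
// the sub-agent declares its own, which then gets validated against the parent's.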


What I Learned and What Still Doesn't Sit Right

My thesis, after all of this: autonomous agents don't fail from lack of capability, they fail from overconfidence in the permission model. And the permission model we inherited comes from systems where the actor has emotional state, fatigue, and situational judgment. Agents have none of the three.

The redesign I did isn't elegant. It's layers upon layers of formalized distrust. Sequence validation, session budgets, intent traceability. Together they add up to an architecture that's slower, more complex, and harder to maintain than what I had before.

And yet: zero incidents in two weeks. Three deviations caught before they could do damage. Infrastructure I'm still trusting with real work.

What still doesn't sit right is scale. This system works for one agent, for three agents running in parallel. I don't know how it behaves with twenty. The session budget becomes a shared resource that needs coordination, the transition graph gets more complex, the traceability starts to weigh on Postgres. That's the next problem. For now, I'm solving the one I have.

If you're coming from the async Rust edge cases in production post, you know my tendency is to validate in my real codebase first before adopting a pattern. This was no different. Two weeks of real data is worth more than any architecture on a whiteboard.

The system is running. It's failing in ways I can see. For now, that's enough.


This article was originally published on juanchi.dev
