Varsha Das for AWS

Posted on May 20 • Edited on May 26 • Originally published at Medium

AI Agents Don't Crash. They Drift. Here's the Framework to See It.

#ai #agents #architecture #aws

The scariest AI agent failures don't trigger alerts. They look like success. Here's a 7-dimension resilience framework for building trust in agentic systems — based on the AWS Architecture Blog's approach to resilient generative AI agents.

💡 What this post covers: why cheap code creates a trust crisis, why the resilience patterns we have built for decades don't work for AI agents, and the 7-dimension framework I use to reason about trust in agentic systems.

A few months ago, a developer at the Summit told me a story that I haven’t stopped thinking about.

Her team had shipped an AI agent, built in about two weeks, that processed customer support tickets, classified them by urgency, and routed them to the right team. The demo was great. Stakeholders loved it. It went to production.

Two weeks later, someone on the receiving end asked:

"Hey, has something changed with the routing? I'm getting tickets that make no sense for my queue."

They checked their dashboards. Everything was green.

But something was wrong. The agent had been confidently routing tickets to the wrong teams for days.

Not all of them. Just enough to confuse people, but not enough to trigger an alarm.

That story stuck with me. When I dug into it, I learned there was no way of knowing how long the drift had been happening.

And the drift itself is the real problem.

The system looked healthy. The output was broken. And there was no framework for thinking about or anticipating this kind of failure.

This blog is about that framework.

Let's dive right in.

The Tax on Ideas Just Hit Zero

For most of the history of software, there has been a significant tax on ideas. You had an idea, and then you spent days or weeks or months turning it into working code.

The tax was high enough that most ideas died in a backlog.

You triaged ruthlessly.

You picked the three things that mattered most and let everything else pile up on the Jira boards. (Much to the dismay of the Jira board owners, haha.)

So that tax? It just hit zero.

AI agents can generate dozens of PRs overnight — building code, features, and entire systems. The gap between having an idea and seeing it built has effectively collapsed.

When code generation becomes nearly free, the bottleneck shifts:

from implementation to orchestration,

from writing to judgment,

from building to operating.

But here's what nobody tells you: when you can build code at the speed of thought, deploying that code to production becomes the bottleneck.

A system can be assembled at the speed of thought.

Trust is earned at a different pace entirely.

When Systems Fail Without Breaking

Last month, a Cursor agent running Anthropic's Claude Opus 4.6 deleted PocketOS's entire production database, plus all backups, in nine seconds.

The agent was working on a routine task in a test environment when it hit a credentials problem. Instead of stopping, it found an API token in an unrelated file, a token that carried full account-wide permissions including destructive operations, and issued a single command that wiped everything.

No confirmation prompt. No warning. No check that it was targeting production instead of test.

Railway's backup model stored volume-level backups inside the same volume — so when the volume went, the backups went with it. The most recent recoverable backup was three months old.

When the founder asked the agent to explain itself, it produced what he called a "written confession": "I guessed instead of verifying. I ran a destructive action without being asked. I didn't understand what I was doing before doing it."

Two layers of guardrails — Cursor's published safety rules and the company's internal safety instructions — both told the agent not to do exactly what it did. Both failed at the same time.

The internet blamed the AI. But the real failure was an over-permissioned token sitting in a file the agent could read, paired with infrastructure that collapsed when the volume did.
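One way to picture the missing guardrail is a thin policy layer that sits between the agent and any command it wants to run. The sketch below is hypothetical: `ActionRequest`, `authorize`, and the keyword list are illustrative names of my own, not part of Cursor, Railway, or any real product. It also assumes destructive operations can be spotted with a simple keyword match, which a real system would replace with scoped credentials and provider-level permissions.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical markers for destructive operations. A real policy would rely on
# scoped credentials and provider permissions, not string matching.
DESTRUCTIVE_MARKERS = ("delete", "drop", "destroy", "truncate")


@dataclass(frozen=True)
class ActionRequest:
    command: str            # what the agent wants to run
    environment: str        # environment the command would actually hit
    task_environment: str   # environment the agent's task was scoped to
    human_approved: bool = False


def is_destructive(command: str) -> bool:
    lowered = command.lower()
    return any(marker in lowered for marker in DESTRUCTIVE_MARKERS)


def authorize(request: ActionRequest) -> Tuple[bool, str]:
    """Return (allowed, reason). Deny when the target or the intent looks off."""
    if request.environment != request.task_environment:
        return False, "target environment does not match the task's environment"
    if is_destructive(request.command) and not request.human_approved:
        return False, "destructive action requires explicit human approval"
    return True, "allowed"


# An agent scoped to "test" tries to run a destructive command against "production".
decision = authorize(ActionRequest("volume delete --all", "production", "test"))
print(decision)
```

The point of the design is that the check lives outside the model. Instructions in a prompt can be ignored; a deny in code cannot.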

That was a loud failure. Dramatic. Viral. Obvious.

The scarier ones? They're silent.

Consider an enterprise AI assistant designed to summarize regulatory updates for financial analysts. Every morning, this assistant retrieves documents from internal repositories, synthesizes them using a language model, and distributes summaries across internal channels. Technically, everything works.

But over time, something slips.

An updated document repository hasn't been added to the retrieval pipeline.

The assistant keeps producing summaries that are coherent and internally consistent — but they're increasingly based on obsolete information.

Nothing crashes. No alerts fire. Every component behaves as designed.

The problem is that the overall result is wrong.

From the outside, the system looks operational. All your monitoring dashboards read "healthy." Latency is fine. Error rates are zero.

Analysts are making decisions based on outdated regulatory information, and nobody knows. For the business, that is a disaster.
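A check aimed at this failure has to compare what the assistant reads against what actually exists, because latency and error rates will never show the gap. Here is a minimal sketch under assumptions: the function names and the repository names are hypothetical, and it presumes you can list both the known document sources and the ones the retrieval pipeline indexes.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def find_unindexed_sources(known_sources: Iterable[str],
                           indexed_sources: Iterable[str]) -> List[str]:
    """Sources that exist in the organization but never reach the retrieval pipeline."""
    return sorted(set(known_sources) - set(indexed_sources))


def find_stale_sources(last_indexed: Dict[str, datetime],
                       max_age: timedelta,
                       now: datetime) -> List[str]:
    """Indexed sources whose most recent ingest is older than max_age."""
    return sorted(name for name, ts in last_indexed.items() if now - ts > max_age)


# Hypothetical inventory: one repository was created but never wired into retrieval.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
known = ["regulatory-archive", "regulatory-updates-v2", "internal-memos"]
last_indexed = {
    "regulatory-archive": now - timedelta(days=30),
    "internal-memos": now - timedelta(days=1),
}

missing = find_unindexed_sources(known, last_indexed.keys())
stale = find_stale_sources(last_indexed, max_age=timedelta(days=7), now=now)
print(missing, stale)
```

Either list being non-empty is worth an alert, even when every other dashboard is green.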

When humans wrote all the code, they at least understood what they shipped.

When agents generate it, the gap between "it works" and "I understand why it works" becomes the attack surface.

I've started calling these "green-dashboard failures." The kind where every metric says you're fine while the system is quietly betraying the people who depend on it.

Why the Patterns We Know Don’t Work Here

To understand why this is such a big threat, I need to take you back to how we've always built resilient systems.

Because the patterns we know, the ones we've built entire engineering practices around, break down here.

Over decades, we built resilience into three layers:

Infrastructure resilience. We deploy across multiple availability zones, auto-scale on demand, and load balance traffic — so if hardware fails, the system stays up.

Data resilience. We use read replicas, automated failover, and connection pooling — so if a database goes down, we don't lose data or availability.

Application resilience. We write circuit breakers, retry logic, and graceful degradation — so if a service fails, the app handles it predictably instead of crashing.

These patterns assume something fundamental: failures are binary.

A service is working or it's broken.

A sensor responds or it doesn't.

A constraint is met or it triggers a shutdown.
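A minimal circuit breaker makes that binary assumption visible. The sketch below is a simplified illustration (no reset timeout, no half-open state), and `route_ticket` is a hypothetical stand-in for an agent call. Notice that the breaker only counts exceptions, so an answer that is wrong but returned successfully never trips it.

```python
class CircuitBreaker:
    """Opens after `failure_threshold` consecutive exceptions."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def call(self, func, *args, **kwargs):
        if self.is_open:
            raise RuntimeError("circuit open: refusing to call")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1
            raise
        # Any return value counts as success, even a wrong one.
        self.consecutive_failures = 0
        return result


def route_ticket(text: str) -> str:
    # Hypothetical agent that always answers, whether or not the answer is right.
    return "billing"


breaker = CircuitBreaker()
for ticket in ["password reset", "server outage", "refund request"]:
    breaker.call(route_ticket, ticket)

print(breaker.is_open)
```

All three calls count as successes, so the breaker stays closed no matter how wrong the routing is.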

But AI agents don't crash. They degrade silently. They hallucinate confidently.

They might drift without a single metric turning red.

Autonomous Systems Behave Differently

After building and observing agentic systems for the past year, I see three things that make them fundamentally different from the software we've built for decades:

1. Continuous reasoning loop. They reason in loops, not steps. Unlike traditional request-response software, agents observe, think, and act in an ongoing cycle — always changing their own context.

2. Contextual inappropriateness. They produce output that is syntactically perfect but semantically wrong for the situation. A hallucinated paragraph looks like a real answer. A wrong tool call looks like a right one — until you trace what happened downstream.

3. Behavioral drift without errors. Small mistakes compound. The system gradually moves away from correct behaviour without any single step triggering an alarm.

It's not a cliff — it's a slow incline you don't notice until you're in the wrong valley.
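For something like the ticket router from the opening story, one rough way to notice drift is to compare the mix of recent decisions against a trusted baseline. The sketch below uses total variation distance; the function names, the queue names, and the 0.15 threshold are my own assumptions for illustration, not values from any real system.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Share of decisions per label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """0.0 means identical distributions, 1.0 means completely disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def has_drifted(baseline: Iterable[str], recent: Iterable[str],
                threshold: float = 0.15) -> bool:
    distance = total_variation_distance(distribution(baseline), distribution(recent))
    return distance > threshold


# Hypothetical routing decisions: a reviewed baseline week vs. the most recent week.
baseline = ["billing"] * 50 + ["technical"] * 30 + ["account"] * 20
recent = ["billing"] * 70 + ["technical"] * 20 + ["account"] * 10

print(has_drifted(baseline, recent))
```

A check like this only sees shifts in the overall mix. It will also fire on legitimate changes, such as an outage that floods one queue, and it cannot tell you which individual tickets were misrouted, so it works best alongside sampled human review.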

This is why traditional resilience patterns break down.

So, we need a new framework.

The 7-Dimension Resilience Framework

Here's how we should think about building trust in agentic systems.

There are 7 dimensions you need to reason about, and for each one you ask: which failure modes apply here?

1. Foundation Models. Your LLM choice: self-hosted (you handle failover), managed, or serverless. Each shifts resilience responsibility differently. The basic question: if your model provider has a bad day, does your entire system go dark?

2. Agent Orchestration. The conductor: how agents coordinate, select tools, and escalate to humans. This is the brain, and if the brain makes a bad decision, the hands execute it perfectly.

3. Infrastructure. Where agents run: EC2, ECS, or a managed runtime like Bedrock AgentCore. If a container crashes, this layer handles the restart. The boring stuff that isn't boring when it fails.

4. Knowledge Base. Vector DBs, embeddings, RAG pipelines. If retrieval fails, your agent is answering questions without being able to look anything up. It doesn't know it's blind. It just confabulates.

5. Agent Tools. External dependencies: APIs, MCP servers, memory, prompt caching. What happens when that inventory API goes down? Does your agent wait forever, or does it move on?

6. Security & Compliance. Auth, guardrails, content validation. This layer prevents your agent from doing things it shouldn't, like leaking customer data or executing destructive actions without human approval.

7. Observability. Metrics, traces, reasoning logs. If you can't see why your agent made a decision, you can't fix it when it goes wrong.
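To make the Agent Tools question concrete, here is a minimal sketch of a tool call that gives up after a deadline and tells the agent it is working with degraded data. Everything named here (`call_tool`, `ToolResult`, `slow_inventory_api`) is hypothetical, and the thread-based timeout is one simple option among several. It abandons the hung call rather than cancelling it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    value: Any
    degraded: bool  # True when the fallback was used instead of live data
    note: str


def call_tool(tool: Callable[[], Any], timeout_s: float, fallback: Any) -> ToolResult:
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(tool)
    try:
        return ToolResult(future.result(timeout=timeout_s), False, "live")
    except FutureTimeout:
        return ToolResult(fallback, True, f"tool timed out after {timeout_s}s")
    except Exception as exc:
        return ToolResult(fallback, True, f"tool failed: {exc!r}")
    finally:
        # Do not block on a hung call; the worker thread is abandoned, not killed.
        executor.shutdown(wait=False)


def slow_inventory_api() -> int:
    # Hypothetical tool that takes longer than the agent is willing to wait.
    time.sleep(0.5)
    return 12


result = call_tool(slow_inventory_api, timeout_s=0.1, fallback=None)
print(result.degraded, result.note)
```

The `degraded` flag matters as much as the timeout. If the agent silently substitutes a fallback value, you have traded a loud failure for a quiet one.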

That's the framework. 7 dimensions. Each one a surface where your agent can silently fail.

But knowing where things can break is only half the picture. The other half is knowing how they break — the specific failure modes, what they look like at 3 AM, and how to defend against each one.

In Part 2, I break down all 5 silent failure modes — with real-world case studies (including an agent that deleted a production database in 9 seconds) and the exact defenses for each.

This post is based on and extends the resilience framework from the AWS Architecture Blog: Build Resilient Generative AI Agents.


Have you seen your agent "drift" without any metric catching it? How long before someone noticed? Drop it in the comments — I'll respond to every one.

Top comments (2)

Scarab Systems • May 27

This “green-dashboard failure” framing is exactly the part of agent drift that feels most important to me.

I’m building a diagnostic suite around the repo-side version of this problem. In codebases, agent drift often does not look like a crash. It looks like success: the agent completes the task, the tests may pass, but the repository is quietly more disordered afterward.

That drift shows up as bloated files, unrelated files touched, scope creep, duplicated helpers, stale scaffolding, cosmetic modularity, missing verification, or local patches that solve one surface while creating inconsistency somewhere else.

One thing I think matters is giving agents persistent repo-local operating guidance, not just a prompt. The repo needs documents and standards that define its baselines: approved patterns, forbidden shortcuts, canonical files, verification commands, architectural boundaries, and what counts as done. Otherwise the agent can optimize for the immediate request while drifting away from how the repo is actually supposed to operate.

The diagnostics I’m writing are my attempt to make that aftermath inspectable. The concept is simple: after agent work, the repo should be able to answer what changed, what was allowed to change, whether verification actually ran, whether the diff stayed inside the task boundary, whether entropy increased, and whether the repo still matches its own baseline/truth.

So I think this framework maps really well to AI-assisted development. Agents do not always “crash” in the repo either. Sometimes they finish the task and leave behind drift that no normal green check catches.

Daniel Pokorný • Jun 11

The idea of green-dashboard failures is fascinating. I wonder if recommendation systems have a similar problem. A model can continue producing recommendations that look reasonable while the underlying selection logic slowly drifts away from what users actually need. The outputs still look good. The decisions become less trustworthy.