Nnaa
What Happens When an AI Agent Goes Rogue

Originally published on Truthlocks Blog

It is 2:47 AM on a Tuesday. Your monitoring dashboard lights up. One of your AI agents, a customer service bot that normally handles 200 requests per hour, has suddenly spiked to 5,000 requests per hour. It is accessing customer records it has never touched before. It is making API calls to services outside its normal workflow. Something is very wrong.

What do you do?

If your agents use shared API keys, your options are limited and painful. You can rotate the API key, but that kills every agent using that key, not just the one that is misbehaving. You can try to identify and block the specific container or IP address, but in a containerized environment with dynamic networking, that is whack-a-mole. You can shut down the entire service, but that takes everything offline.

None of these options are good. They are all too slow, too broad, or too destructive.

The Kill Switch

With machine identity, you have a better option: the kill switch. One click in the console, one API call from your code, or one automated policy trigger, and the specific agent's identity is revoked. Here is what happens in the next few seconds:

1. The agent's DID is marked as revoked in the trust registry. This is an atomic operation that takes milliseconds.

2. All active sessions for the agent are immediately invalidated. Any system that validates the agent's session token will get a "revoked" response on the next check.

3. A revocation event is broadcast through the transparency log's real-time synchronization channel. Connected systems receive the revocation notification within seconds.

4. The revocation is recorded in the transparency log with the timestamp, the identity of whoever triggered it, and the reason. This creates a permanent, tamper-evident record for post-incident investigation.

Every other agent in your environment continues operating normally. The kill switch is surgical. It targets one identity, not a credential shared by many.
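The core of the mechanism can be sketched in a few lines. This is a minimal, in-memory illustration in TypeScript; the names (`TrustRegistry`, `RevocationRecord`, the `did:example` scheme) are hypothetical and not the Truthlocks API, but they show why revoking one DID is atomic and surgical:

```typescript
// Illustrative in-memory trust registry. Real systems would back this
// with a replicated store and a transparency log.
type RevocationRecord = {
  revokedAt: Date;
  triggeredBy: string;
  reason: string;
};

class TrustRegistry {
  private revoked = new Map<string, RevocationRecord>();

  // Marking a DID as revoked is a single keyed write: effectively atomic,
  // and it touches exactly one identity.
  revoke(did: string, triggeredBy: string, reason: string): RevocationRecord {
    const record = { revokedAt: new Date(), triggeredBy, reason };
    this.revoked.set(did, record);
    return record;
  }

  // Session validators consult the registry on every check, so a revoked
  // DID fails on the very next validation.
  validateSession(did: string): "active" | "revoked" {
    return this.revoked.has(did) ? "revoked" : "active";
  }
}

const registry = new TrustRegistry();
registry.revoke("did:example:agent-42", "analyst:alice", "anomalous_behavior");
console.log(registry.validateSession("did:example:agent-42")); // "revoked"
console.log(registry.validateSession("did:example:agent-99")); // "active"
```

Note that `agent-99` is untouched by the revocation of `agent-42`: nothing shared between agents is invalidated.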

Triggering the Kill Switch

There are three ways to trigger the kill switch, each designed for a different scenario.

Manual activation from the console. A security analyst sees something suspicious and clicks the kill switch button on the agent's detail page. The analyst provides a reason, confirms the action, and the revocation executes immediately. This is the human-in-the-loop option for situations where judgment is needed.

API activation from code. Your monitoring system detects anomalous behavior and calls the revocation API programmatically. This enables integration with your existing incident response automation.

```javascript
await client.agents.revoke(agentId, {
  reason: 'anomalous_behavior_detected',
  triggeredBy: 'automated_monitoring',
});
```


Policy-based automatic activation. You define rules in advance: "if any agent's trust score drops below 20, revoke immediately" or "if any agent exceeds 10 scope violations in an hour, revoke immediately." The platform evaluates these rules continuously and triggers the kill switch automatically when conditions are met. This is the fastest response option because there is no human delay.
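A policy engine of this kind can be sketched as a list of predicates evaluated against an agent's current metrics. The thresholds below mirror the example rules above; the type names and metric fields (`AgentMetrics`, `scopeViolationsLastHour`) are assumptions for illustration, not platform defaults:

```typescript
// Hypothetical metrics snapshot for one agent.
type AgentMetrics = {
  did: string;
  trustScore: number;
  scopeViolationsLastHour: number;
};

// A policy is a named predicate: if it matches, the agent is revoked.
type RevocationPolicy = {
  name: string;
  shouldRevoke: (m: AgentMetrics) => boolean;
};

const policies: RevocationPolicy[] = [
  { name: "low_trust_score", shouldRevoke: (m) => m.trustScore < 20 },
  {
    name: "excessive_scope_violations",
    shouldRevoke: (m) => m.scopeViolationsLastHour > 10,
  },
];

// Returns the name of the first matching policy, or null if the agent
// is within bounds. In production this would run on every metrics update.
function evaluatePolicies(metrics: AgentMetrics): string | null {
  const hit = policies.find((p) => p.shouldRevoke(metrics));
  return hit ? hit.name : null;
}

const suspect: AgentMetrics = {
  did: "did:example:agent-42",
  trustScore: 14,
  scopeViolationsLastHour: 3,
};
console.log(evaluatePolicies(suspect)); // "low_trust_score"
```

When `evaluatePolicies` returns a policy name, the engine would pass it as the revocation reason, giving the transparency log an exact record of which rule fired.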

After the Kill Switch

Stopping the agent is step one. Understanding what happened is step two.

The transparency log contains a complete record of the agent's activity leading up to the revocation. Security teams can query the log by agent DID and time range to see every action the agent took: which APIs it called, which data it accessed, which scopes it used, and at what timestamps.
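The shape of such a query is simple: filter by DID, bound by time, order chronologically. This sketch uses a hypothetical `LogEntry` shape, not the actual Truthlocks log schema:

```typescript
// Hypothetical transparency-log entry for illustration.
type LogEntry = {
  agentDid: string;
  timestamp: Date;
  action: string; // e.g. which API was called or which data was accessed
  scope: string;  // which scope the action used
};

// Filter a log by agent DID and time range, oldest entry first.
function queryLog(log: LogEntry[], did: string, from: Date, to: Date): LogEntry[] {
  return log
    .filter(
      (e) =>
        e.agentDid === did &&
        e.timestamp.getTime() >= from.getTime() &&
        e.timestamp.getTime() <= to.getTime(),
    )
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
}

const log: LogEntry[] = [
  { agentDid: "did:example:agent-42", timestamp: new Date("2025-01-07T02:31:00Z"), action: "read:customer_records", scope: "support.read" },
  { agentDid: "did:example:agent-42", timestamp: new Date("2025-01-07T02:47:00Z"), action: "call:external_api", scope: "support.read" },
  { agentDid: "did:example:agent-07", timestamp: new Date("2025-01-07T02:40:00Z"), action: "read:ticket", scope: "support.read" },
];

const incidentWindow = queryLog(
  log,
  "did:example:agent-42",
  new Date("2025-01-07T02:00:00Z"),
  new Date("2025-01-07T03:00:00Z"),
);
console.log(incidentWindow.map((e) => e.action)); // ["read:customer_records", "call:external_api"]
```

The chronological ordering matters for investigation: it lets analysts walk forward from the last known-good action to the first anomalous one.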

The trust score history shows when the agent's behavior started deviating from its baseline. The individual trust factors pinpoint which dimension changed: was it scope violations? Anomalous request patterns? Session management issues? This narrows the investigation to the specific behavioral change that triggered the concern.

If the agent was compromised through a prompt injection attack, the log entries show the moment the agent's behavior shifted. If a key was stolen, the authentication patterns may reveal unusual source characteristics. If it was a software bug, the trust factor breakdown will show which aspect of behavior changed.

Recovery

Once the root cause is identified and remediated, you can register a new agent identity with fresh cryptographic keys. The old identity remains permanently revoked in the trust registry. This is by design: revocation is irreversible because you can never be fully certain that a compromised identity is clean. A new identity is always safer.

The new agent starts with a fresh trust score and builds its reputation from scratch. This is the trust model working as intended: a new agent should not inherit the trust of an old agent, because the old agent's history includes the period when it was compromised.
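Generating the fresh identity can be sketched with Node's built-in crypto. The DID derivation here (a hash of the new public key under a `did:example` scheme) is purely illustrative, not how Truthlocks derives DIDs; the point is that no key material from the revoked identity is reused:

```typescript
import { generateKeyPairSync, createHash } from "node:crypto";

// Create a brand-new agent identity with fresh Ed25519 keys.
// Never reuse keys from the revoked identity.
function createFreshIdentity(): { did: string; publicKeyPem: string } {
  const { publicKey } = generateKeyPairSync("ed25519");
  const publicKeyPem = publicKey
    .export({ type: "spki", format: "pem" })
    .toString();
  // Hypothetical DID scheme: a fingerprint of the new public key.
  const fingerprint = createHash("sha256")
    .update(publicKeyPem)
    .digest("hex")
    .slice(0, 16);
  return { did: `did:example:${fingerprint}`, publicKeyPem };
}

const replacement = createFreshIdentity();
// The replacement DID is unrelated to the revoked one, so it inherits
// none of the old identity's sessions, scopes, or trust history.
```

Because the DID is derived from new key material, two recoveries never collide, and the old DID stays permanently revoked in the registry.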

Building Your Response Plan

Every organization deploying AI agents should have an incident response plan that covers agent compromise scenarios. At minimum, the plan should define who has authority to trigger the kill switch, what thresholds trigger automatic revocation policies, what the investigation workflow looks like after a revocation, and how recovered agents are redeployed with new identities.

The kill switch is not just a feature. It is the last line of defense in your agent security architecture. Everything else, from trust scores to scope enforcement to behavioral monitoring, is designed to prevent the situation where you need to use it. But when you do need it, you need it to work instantly and decisively.

Configure your kill switch policies in the Truthlocks Console under Agent Management. The documentation includes a complete guide to incident response for AI agent compromises.


Truthlocks provides machine identity infrastructure for AI agents. Register, verify, and manage non-human identities with trust scoring and instant revocation.
