AI agents are no longer just generating text. They're sending emails, pushing code, updating CRM records, and modifying databases.
And when they go wrong, they really go wrong.
I've seen this pattern repeatedly: an agent works perfectly in testing, gets deployed, and then sends 200 emails to the wrong list. Or deletes the wrong GitHub issues. Or overwrites 3 months of CRM data.
The model didn't fail. The prompt was fine. There was just no safety net.
The problem isn't the agent. It's the execution layer.
Most teams handle this with logging. They add Langfuse or Helicone, watch the traces, and hope they catch mistakes before those mistakes do real damage.
But logging tells you what went wrong after it happened. What you actually need is the ability to undo it.
What reversible execution looks like
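The core idea is simple: before any action executes, you log it and snapshot any state it is about to overwrite. After it executes, you store enough information to reverse it, such as the ID of the record it created. If something goes wrong, you unwind in LIFO (last in, first out) order, so the most recent action is compensated first.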
For every connector, you need a compensation handler, a function that defines what "undo" means for that specific action:
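```typescript
// Each snippet is the `compensate` property of one connector's handler.
// sendCorrection, flagThread, crm, and octokit are assumed to be in scope.

// Email sent: can't unsend, but can send a correction
compensate: async (action) => {
  await sendCorrection(action.payload.to, action.payload.subject)
  await flagThread(action.payload.threadId)
}

// CRM record updated: revert to the snapshot taken before the update
compensate: async (action) => {
  await crm.records.update(action.payload.recordId, action.snapshot.before)
}

// GitHub issue created: close it
compensate: async (action) => {
  await octokit.issues.update({
    // owner and repo are required by the GitHub API; assumed here to be
    // available on the original create payload
    owner: action.payload.owner,
    repo: action.payload.repo,
    issue_number: action.result.number,
    state: 'closed'
  })
}
```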
Compensation isn't symmetric. "Undo send email" is not the same as "delete sent email." The action already had consequences. So each handler has to be action-aware, not generic.
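To make the log-then-unwind idea concrete, here is a minimal sketch of a journal that records each step before it runs and compensates completed steps in LIFO order on failure. The names `Action`, `Step`, `ActionJournal`, and `runWithRollback` are hypothetical and exist only for illustration; this is not the AgentRein API.

```typescript
// Hypothetical types for illustration only.
type Action = {
  name: string
  payload: Record<string, unknown>
  result?: unknown
}

type Step = {
  action: Action
  execute: (action: Action) => Promise<unknown>
  compensate: (action: Action) => Promise<void>
}

type Entry = { step: Step; status: 'pending' | 'done' | 'compensated' }

class ActionJournal {
  private entries: Entry[] = []

  // Log the step before executing it, then mark it done on success.
  async run(step: Step): Promise<unknown> {
    const entry: Entry = { step, status: 'pending' }
    this.entries.push(entry)
    step.action.result = await step.execute(step.action)
    entry.status = 'done'
    return step.action.result
  }

  // Compensate completed steps, most recent first (LIFO).
  // Steps that never finished are skipped.
  async rollback(): Promise<string[]> {
    const undone: string[] = []
    for (const entry of [...this.entries].reverse()) {
      if (entry.status !== 'done') continue
      await entry.step.compensate(entry.step.action)
      entry.status = 'compensated'
      undone.push(entry.step.action.name)
    }
    return undone
  }
}

// Run steps in order; if one throws, unwind everything that already ran.
async function runWithRollback(
  steps: Step[]
): Promise<{ ok: boolean; undone: string[] }> {
  const journal = new ActionJournal()
  try {
    for (const step of steps) {
      await journal.run(step)
    }
    return { ok: true, undone: [] }
  } catch {
    return { ok: false, undone: await journal.rollback() }
  }
}
```

This sketch skips the step that failed, on the assumption that it had no partial effects, and it does not handle a compensation that itself throws. A production version would need an explicit policy for both cases.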
Approval gates for high-risk actions
Not every action needs rollback. Some need prevention.
The pattern that works: define a risk threshold per action type. Actions above the threshold pause and wait for human approval before executing.
```typescript
const session = await agentrein.newSession({
  agentId: 'email-agent',
  intent: 'Send follow-up emails to leads from last week',
  approvalRules: [
    { action: 'gmail.send', requireApproval: true },
    { action: 'gmail.draft', requireApproval: false }
  ]
})
```
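The risk-threshold pattern described above can also be sketched without any SDK. In the example below, the risk scores, the `APPROVAL_THRESHOLD` value, and the `needsApproval` and `gatedExecute` helpers are all assumptions made for illustration; how you score risk and how you reach a human approver will depend on your own system.

```typescript
// Hypothetical risk scores per action type (0 = harmless, 1 = irreversible).
const riskByAction: Record<string, number> = {
  'gmail.draft': 0.1,
  'gmail.send': 0.8,
}

// Hypothetical cutoff: anything scoring above this waits for a human.
const APPROVAL_THRESHOLD = 0.5

// Asks a human (via chat, a dashboard, etc.) and resolves to their decision.
type Approver = (action: string, payload: unknown) => Promise<boolean>

// Unknown action types are treated as maximally risky,
// so they always require approval.
function needsApproval(action: string): boolean {
  const risk = riskByAction[action] ?? 1
  return risk > APPROVAL_THRESHOLD
}

// Execute the action only if it is low risk or a human approved it.
async function gatedExecute<T>(
  action: string,
  payload: unknown,
  execute: () => Promise<T>,
  askHuman: Approver
): Promise<{ executed: true; result: T } | { executed: false }> {
  if (needsApproval(action) && !(await askHuman(action, payload))) {
    return { executed: false }
  }
  return { executed: true, result: await execute() }
}
```

Defaulting unknown actions to the highest risk is a deliberate choice in this sketch: a newly added tool has to be scored explicitly before it can run unattended.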
The audit trail problem
Even with rollback and approval gates, you need to know why the agent took each action, not just what it did.
Most logging tools capture the API call. What you actually need is the intent at the time of execution: what was the agent trying to accomplish, and did this action match that goal?
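One way to capture intent alongside the API call is to record both in the same entry at the moment of execution. The sketch below is a rough illustration under that assumption; `AuditRecord`, `AuditTrail`, and their field names are hypothetical and do not describe AgentRein's actual schema.

```typescript
// Hypothetical record shape; field names are illustrative.
type AuditRecord = {
  timestamp: string
  sessionIntent: string // what the session was started to do
  stepIntent: string // why the agent says it is taking this step
  action: string
  payload: unknown
  result?: unknown
  error?: string
}

class AuditTrail {
  readonly records: AuditRecord[] = []
  private readonly sessionIntent: string

  constructor(sessionIntent: string) {
    this.sessionIntent = sessionIntent
  }

  // Write the entry before the call runs, then attach the outcome to it.
  async record<T>(
    action: string,
    stepIntent: string,
    payload: unknown,
    call: () => Promise<T>
  ): Promise<T> {
    const entry: AuditRecord = {
      timestamp: new Date().toISOString(),
      sessionIntent: this.sessionIntent,
      stepIntent,
      action,
      payload,
    }
    this.records.push(entry)
    try {
      const result = await call()
      entry.result = result
      return result
    } catch (err) {
      entry.error = err instanceof Error ? err.message : String(err)
      throw err
    }
  }
}
```

With both intents stored next to each call, a reviewer can compare what the session was meant to do against what each step claimed to be doing, which a bare API trace does not allow.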
What we built
We built AgentRein to solve exactly this. It's a drop-in SDK that wraps your existing tools and adds rollback, approval gates, and audit logs.
```typescript
import { AgentRein } from 'agentrein'
import { Octokit } from '@octokit/rest'

const agentrein = new AgentRein({ apiKey: 'YOUR_API_KEY' })
const octokit = new Octokit({ auth: 'GITHUB_TOKEN' })

// Start a session and declare what the agent is trying to accomplish
const session = await agentrein.newSession({
  agentId: 'onboarding-agent',
  intent: 'Create GitHub issue and notify Slack'
})

// Wrap the existing client so its calls go through the session
const agentOctokit = agentrein.wrap(octokit, session, { connector: 'github' })

const issue = await agentOctokit.issues.create({
  owner: 'my-org',
  repo: 'my-repo',
  title: 'Onboarding task'
})

await agentrein.completeSession(session)
```
It ships with pre-built compensation handlers for GitHub, Stripe, Slack, Gmail, Notion, and HubSpot. A free tier is available.
If you're running agents in production that touch real systems, I'd love your feedback: agentrein.com
Top comments (2)
This is a very practical framing. Once agents move from answering questions to changing CRM records, sending emails, or triggering workflows, undo/compensation becomes part of the architecture, not a nice-to-have. I also think rollback pairs naturally with execution tracing because teams need to understand which step caused the compensating action. Curious how you’re thinking about representing the full action timeline for debugging and audit review.
Exactly right. The timeline is the core of how we represent this in AgentRein. Every action gets a node in the session graph: timestamp, intent at the time of execution, payload, result, and compensation status. When a rollback happens, the graph shows which step triggered it and what the compensation did.
The insight you're pointing at is important: rollback without tracing is blind. You need to know not just "action X was compensated" but "action X was compensated because action Y failed, which happened because the agent drifted from its original intent." That chain is what makes post-incident review actually useful.
What's your current approach for action tracing in production agents?