Your AI Agent Just Changed a Financial Record. Who Stopped It?

Your finance team has been running an AI agent to help with month-end close. It identifies exceptions, pulls evidence from multiple systems, and drafts commentary. The pilot went smoothly. Then one day, without warning, the agent posts a material adjustment that should never have been executed without a manager's review. The financial statements shift. Panic follows.

This isn't a story about a bad model. The model worked perfectly. The problem was control: no mechanism stopped the agent before it took an action that permanently changed business state.

This is the question every company must answer before giving an agent access to production systems: how do you prevent the wrong action before the damage occurs? Observability can only show and explain what has already happened. Prevention requires three components working together: guardrails, a policy engine, and a human approval workflow.

These are three layers of control that must work together at runtime, not just in documentation.

Guardrails Are Not Just Output Filters

The most common mistake is treating guardrails as a content filter at the end of the process: the model generates a response, and the system checks if it's safe. That might work for a simple chatbot. For agentic systems, it's too late. If the agent has already accessed a document it shouldn't have, called the wrong tool, or executed an action that changes a transaction, filtering the final output solves nothing.

In practice, enterprise guardrails need to work at five points:

Input. Check what the user or triggering event is asking. Is the intent aligned with the agent's use case? Is the request trying to bypass official processes? In procurement, a requester shouldn't be able to create a purchase order directly if the process requires intake and category classification first.

Context retrieval. Control what documents, data, and memory the agent can access. A finance agent can pull relevant accounting guidance, but should not have access to every sensitive cross-entity memo. A customer service agent can see the current customer's ticket history, but not another customer's data just because it's semantically similar.

Tool access. Not every available tool should be usable in every situation. An IT operations agent can run diagnostic tools and open tickets, but shouldn't automatically execute production changes. A customer operations agent can check entitlements and prepare a refund, but shouldn't execute refunds above a certain threshold.

Action execution. This is the most critical point. Does the action change business state? Creating a new vendor, posting a journal entry, modifying a credit limit, releasing a payment block, closing an incident as resolved—all of these need controls. This is where companies must clearly distinguish between read, recommend, draft, and execute.

Output. Only after the four points above does output filtering remain relevant. It prevents data leaks, ensures appropriate language, and checks that the final response is supported by evidence. But it must be understood as the last layer, not the primary guardrail.
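
To make the distinction between read, recommend, draft, and execute concrete, here is a minimal sketch of a check that runs before a tool call rather than after the output is generated. The agent name, the tool names, and the `ALLOWED` table are hypothetical and exist only for illustration; a real system would load this from configuration.

```python
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    READ = "read"
    RECOMMEND = "recommend"
    DRAFT = "draft"
    EXECUTE = "execute"


@dataclass(frozen=True)
class ToolCall:
    tool: str
    kind: ActionKind


# Hypothetical allowlist: for each agent, which action kinds each tool may be used for.
ALLOWED = {
    "finance_close_agent": {
        "search_ledger": {ActionKind.READ},
        "journal_entry": {ActionKind.DRAFT},  # may draft an entry, may not post it
    }
}


def check_tool_call(agent: str, call: ToolCall) -> bool:
    """Return True only if the agent may use this tool for this kind of action."""
    allowed_kinds = ALLOWED.get(agent, {}).get(call.tool, set())
    return call.kind in allowed_kinds
```

Anything not listed is denied by default, so a new tool or a new agent starts with no rights until someone grants them explicitly.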

The Policy Engine: Where Permission Decisions Live

If guardrails are the control points, the policy engine is the decision maker at runtime. It answers questions like: can this agent call this tool, in this user or workflow context, for this business object, at this transaction value, with this risk level, and does it need human approval before proceeding?

Without a policy engine, controls end up scattered across prompts, application code, tool configurations, and team habits. The result is inconsistent and hard to audit.

For enterprise use, policy decisions typically need to consider several factors together: the agent's role and delegated authority, the business context (vendor, invoice, order, ticket, contract, employee data), the transaction value or materiality, the risk level (reversible or not, local or cross-system impact), and any regulatory or compliance requirements.
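
As a rough illustration of weighing these factors together, the sketch below evaluates one request against delegated authority, reversibility, regulation, and value. The role name, tool names, and the 10,000 threshold are assumptions made up for the example, not recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # listed for completeness; this sketch never returns it
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class PolicyRequest:
    agent_role: str
    tool: str
    business_object: str  # e.g. "invoice" or "vendor"; carried for the audit trail
    amount: float
    reversible: bool
    regulated: bool


# Hypothetical delegated authority: which tools each agent role may call at all.
AUTHORITY = {"ap_agent": {"read_invoice", "release_payment_block"}}


def evaluate(req: PolicyRequest, approval_threshold: float = 10_000.0) -> Decision:
    """Combine authority, risk, and value into a single runtime decision."""
    if req.tool not in AUTHORITY.get(req.agent_role, set()):
        return Decision.DENY
    if req.regulated or not req.reversible:
        return Decision.REQUIRE_APPROVAL
    if req.amount > approval_threshold:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Keeping this logic in one function, instead of spread across prompts and tool configurations, is what makes the decision testable and auditable.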

Not all policies need to be built the same way. Deterministic rules work best for clear, rigid conditions: transaction values above a threshold, specific vendor categories, production changes during certain hours, or sensitive data that must never be accessed. They're easy to audit, test, and explain, but they become unwieldy when business contexts vary widely.

For more ambiguous situations, a model-based classifier can assess request sensitivity, case risk level, fraud likelihood, or whether the user's intent falls outside scope. It's more flexible but harder to explain, needs periodic evaluation, and shouldn't be the sole control for high-risk actions.

The healthiest pattern is usually a combination: the classifier assesses context or risk signals, then deterministic rules make the final decision. In customer operations, a classifier might flag a case as sensitive or potentially disputed, then deterministic rules decide that all sensitive cases or those above a certain value must go to approval.
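
The combined pattern can be sketched as follows. The classifier here is a keyword stub standing in for a real model, and the cutoffs are invented for the example. The point is only that the classifier supplies a signal while a deterministic rule makes the final call.

```python
from typing import Callable


def classify_sensitivity_stub(case_text: str) -> float:
    """Stand-in for a model-based classifier; returns a score between 0 and 1."""
    keywords = ("dispute", "chargeback", "legal")
    return 1.0 if any(k in case_text.lower() for k in keywords) else 0.1


def needs_approval(
    case_text: str,
    refund_amount: float,
    classifier: Callable[[str], float] = classify_sensitivity_stub,
    sensitivity_cutoff: float = 0.5,  # assumed cutoff
    amount_cutoff: float = 200.0,     # assumed cutoff
) -> bool:
    """Deterministic rule: sensitive cases or high-value refunds go to approval."""
    sensitive = classifier(case_text) >= sensitivity_cutoff
    return sensitive or refund_amount > amount_cutoff
```

Because the final rule is a plain boolean expression, it stays easy to test and explain even when the classifier behind it changes.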

One essential principle: every policy decision must leave an auditable trail. The company should be able to explain which policy was evaluated, what context was used, the result (allow, deny, escalate, or require approval), and when the decision was made. When a user asks why the agent refused an action, the team shouldn't answer "because the system said no." They should show the logic and context.
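
One possible shape for such a trail is a structured record written for every evaluation. The field names and the policy identifier in the usage note are illustrative assumptions, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class PolicyDecisionRecord:
    policy_id: str   # which policy was evaluated
    context: dict    # the inputs the policy saw
    result: str      # "allow", "deny", "escalate", or "require_approval"
    decided_at: str = field(default_factory=_utc_now)  # when the decision was made


def to_audit_line(record: PolicyDecisionRecord) -> str:
    """Serialize one decision as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A call such as `to_audit_line(PolicyDecisionRecord("refund-threshold-v1", {"amount": 250}, "require_approval"))` yields one line that can later answer "why did the agent refuse?" with the policy and the context it saw.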

Human Approval: Selective, Not Automatic

In an agentic enterprise, human-in-the-loop doesn't mean humans check everything. That would destroy the value of agentic AI. What's needed is a selective, risk-based approval workflow.

Human approval is typically needed when an action is high-value, sensitive, irreversible or difficult to reverse, or regulated. This isn't a sign of agent failure. It's a sign that the company understands the boundaries of autonomy in a healthy way.

Some patterns that almost always warrant approval: transactions above a materiality threshold, changes to critical master data, decisions affecting employee rights, customer actions with dispute potential, high-risk production changes, and decisions requiring formal professional judgment.

The most common mistake is creating an approval workflow that simply sends a notification: "Agent recommends action X. Approve?" This fails in practice: the reviewer is confused, has to open multiple systems, or ends up approving blindly out of fatigue. A healthy approval workflow gives the reviewer sufficient context: the agent's recommendation, the evidence used, the relevant policies, the key risks, the confidence level or escalation reason, and alternatives if any.

A supervisor receiving a refund approval request shouldn't just see the refund amount. They need the customer's history, the refund reason, the applicable entitlements, whether similar cases have occurred before, any abuse signals, and why the agent didn't execute automatically. With this context, approval becomes a meaningful decision, not a formality.
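
One way to enforce this is to refuse to send an approval request that lacks context. The sketch below assumes a hypothetical `ApprovalRequest` structure whose fields mirror the list above; the names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalRequest:
    recommendation: str
    evidence: list[str]
    policies: list[str]
    key_risks: list[str]
    escalation_reason: str  # why the agent did not execute automatically
    alternatives: list[str] = field(default_factory=list)  # optional


REQUIRED_FIELDS = ("recommendation", "evidence", "policies", "key_risks", "escalation_reason")


def missing_context(req: ApprovalRequest) -> list[str]:
    """Return the names of required fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(req, name)]
```

If `missing_context` returns anything, the workflow can send the case back to the agent instead of handing the reviewer a bare "Approve?" prompt.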

But there's an equally important trade-off: if too many cases go to approval, cycle time worsens, supervisors become bottlenecks, users lose trust, and the agent becomes a queue-making machine. Approval thresholds should be designed based on risk tiers, not excessive caution. A healthy approach typically looks like: low risk executes with monitoring, medium risk executes with post-review or sampling, high risk requires approval, very high risk stays human-led with agent assistance only.
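
The tiering above can be written down as an explicit table so that routing is a lookup rather than a judgment made inside a prompt. This is a minimal sketch; how a case gets assigned to a tier is left out and would depend on the business.

```python
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    VERY_HIGH = "very_high"


class Handling(Enum):
    EXECUTE_WITH_MONITORING = "execute_with_monitoring"
    EXECUTE_WITH_POST_REVIEW = "execute_with_post_review"  # post-review or sampling
    REQUIRE_APPROVAL = "require_approval"
    HUMAN_LED = "human_led"  # agent assists only


HANDLING_BY_TIER = {
    RiskTier.LOW: Handling.EXECUTE_WITH_MONITORING,
    RiskTier.MEDIUM: Handling.EXECUTE_WITH_POST_REVIEW,
    RiskTier.HIGH: Handling.REQUIRE_APPROVAL,
    RiskTier.VERY_HIGH: Handling.HUMAN_LED,
}


def handling_for(tier: RiskTier) -> Handling:
    """Look up how a case in the given risk tier should be handled."""
    return HANDLING_BY_TIER[tier]
```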

Escalation and Rollback: Knowing When to Stop

A good agent knows not just when to act, but when to stop. Escalation is needed when the agent faces conditions like low confidence, conflicting data sources, policy ambiguity, inconsistent tool results, or situations outside its defined scope. In these conditions, the correct behavior isn't "keep trying until it works." It's to stop, explain the reason, and hand off to a human or another workflow.
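
A stop condition like this can be expressed as a check that returns reasons rather than a bare yes or no, so the handoff explains itself. The `AgentState` fields and the 0.8 confidence floor are assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    confidence: float
    sources_agree: bool
    policy_is_clear: bool
    tool_results_consistent: bool
    in_scope: bool


def escalation_reasons(state: AgentState, min_confidence: float = 0.8) -> list[str]:
    """Return every reason to stop and hand off; an empty list means proceed."""
    reasons = []
    if state.confidence < min_confidence:
        reasons.append("low confidence")
    if not state.sources_agree:
        reasons.append("conflicting data sources")
    if not state.policy_is_clear:
        reasons.append("policy ambiguity")
    if not state.tool_results_consistent:
        reasons.append("inconsistent tool results")
    if not state.in_scope:
        reasons.append("outside defined scope")
    return reasons
```

The agent loop would call this before each action and hand off with the returned reasons instead of retrying.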

For certain actions, control doesn't end with approval. Companies also need to think about what happens if the agent's action turns out to be wrong. Three common patterns exist: rollback, if the system supports direct reversal; a compensating action, if the action can't be directly undone; and manual remediation for more complex cases. Manual remediation needs a clear path: who takes over, how the incident is logged, and how the learning feeds back into policies or guardrails.
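
Choosing among these patterns can be decided ahead of time per action type. The sketch below assumes each action declares whether it supports reversal or has a compensating action; both flags and the class names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Remediation(Enum):
    ROLLBACK = "rollback"
    COMPENSATE = "compensate"
    MANUAL = "manual"


@dataclass(frozen=True)
class ExecutedAction:
    name: str
    supports_reversal: bool        # the target system can undo it directly
    has_compensating_action: bool  # e.g. a reversing journal entry exists


def choose_remediation(action: ExecutedAction) -> Remediation:
    """Prefer direct rollback, then compensation, then a human-owned path."""
    if action.supports_reversal:
        return Remediation.ROLLBACK
    if action.has_compensating_action:
        return Remediation.COMPENSATE
    return Remediation.MANUAL
```

Declaring this before an action type is enabled for an agent forces the question "what do we do if this is wrong?" to be answered up front.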

Without a rollback or remediation path, organizations tend to either become too afraid to grant autonomy or, conversely, too confident without a safety net.

What This Means in Practice

The most practical way to close this discussion is with an autonomy matrix. Not every use case should operate at the same level:

Assist: The agent only helps find context, summarize, or provide insights. Best for ambiguous domains, unstable data, or processes that still heavily depend on human judgment.
Draft: The agent prepares recommendations, documents, or actions, but humans still execute. Best for early transformation phases, domains with high control needs, or processes that need acceleration without execution rights.
Execute with Approval: The agent can prepare and execute actions after human approval. Best for high-value actions, regulated workflows, or areas needing formal control evidence.
Execute with Monitoring: The agent executes automatically within clear policy boundaries, monitored through observability and sampling. Best for high volume, low-to-medium risk, reversible actions, and domains with mature policies.

This matrix helps companies avoid two extremes: granting full autonomy too quickly, or keeping agents in assist mode long after the process is ready for more.
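
The four levels could be encoded as an explicit setting per use case, with a single gate that decides whether the agent itself may execute. This is a sketch under that assumption; the names are illustrative.

```python
from enum import Enum


class Autonomy(Enum):
    ASSIST = "assist"
    DRAFT = "draft"
    EXECUTE_WITH_APPROVAL = "execute_with_approval"
    EXECUTE_WITH_MONITORING = "execute_with_monitoring"


def agent_may_execute(level: Autonomy, human_approved: bool = False) -> bool:
    """Return True if the agent itself is allowed to carry out the action."""
    if level is Autonomy.EXECUTE_WITH_MONITORING:
        return True
    if level is Autonomy.EXECUTE_WITH_APPROVAL:
        return human_approved
    # ASSIST and DRAFT: a human executes, whatever the agent prepared.
    return False
```

Moving a use case from one level to the next then becomes a visible configuration change that can be reviewed, rather than a quiet prompt edit.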

The next time your finance team's agent reaches for a material adjustment, you'll know exactly what should stop it—and whether your system is ready.

This article was originally published on ariefwara.github.io.