<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sathananthan S</title>
    <description>The latest articles on DEV Community by Sathananthan S (@saths).</description>
    <link>https://dev.to/saths</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3762036%2F013b8603-455b-4b11-83de-651a729c1bb2.jpg</url>
      <title>DEV Community: Sathananthan S</title>
      <link>https://dev.to/saths</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saths"/>
    <language>en</language>
    <item>
      <title>Building Enterprise-Ready AI Agents with Guardrails and Human-in-the-Loop Controls</title>
      <dc:creator>Sathananthan S</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:36:45 +0000</pubDate>
      <link>https://dev.to/saths/building-enterprise-ready-ai-agents-with-guardrails-and-human-in-the-loop-controls-559l</link>
      <guid>https://dev.to/saths/building-enterprise-ready-ai-agents-with-guardrails-and-human-in-the-loop-controls-559l</guid>
      <description>&lt;p&gt;A few months ago I wired up an AI agent for an internal procurement workflow. The agent was supposed to review purchase requests, check them against spending policies, and either approve or escalate. It worked great in testing. In production, it approved a $40,000 software license that should have gone to a manager for sign-off, because the policy document it was referencing had been updated the day before and the agent's retrieval still had the old version cached.&lt;/p&gt;

&lt;p&gt;Nobody caught it for two days. The agent was confident. The output was well-formatted. The approval email looked like every other one.&lt;/p&gt;

&lt;p&gt;That's when it clicked for me: building the agent is the easy part. Making it safe enough to trust with real business decisions is a completely different problem. This post walks through how I think about guardrails and human-in-the-loop controls for agents that need to operate in enterprise environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsww885ypb8tgoi3k06j0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsww885ypb8tgoi3k06j0.png" alt=" " width="624" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How an Agent Actually Works (the Short Version)&lt;/h2&gt;

&lt;p&gt;If you haven't built one yet, here's the mental model. An agent runs in a loop.&lt;/p&gt;

&lt;p&gt;Something triggers it. Could be a customer email, could be a flagged transaction. The agent reads that input and goes looking for context. It might hit a database for the customer record, pull a policy document from a vector store, or check what happened the last time this customer filed a similar request. Based on all of that, it decides what to do.&lt;/p&gt;

&lt;p&gt;Sometimes the first thing it decides is that it needs more information, so it calls a tool to get it. Other times it's ready to act right away. Either way, at some point it does something: calls an API, updates a record, sends a notification. That's the moment the agent stops thinking and starts changing things in the real world.&lt;/p&gt;

&lt;p&gt;Then it looks at what happened. Did the API return what it expected? If yes, it moves on. If the response is weird or the call failed, it goes back to the reasoning step and tries something else. On my procurement agent, a single purchase request would sometimes go through this loop five or six times before the agent was satisfied it had the right answer. Each pass through the loop is another chance to catch a mistake, but also another chance to introduce one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Guardrails Fit In&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Guardrails are constraints you put around the agent's behavior. They're the rules that say "you can do this, but not that" or "you can do this, but only if these conditions are met." Without them, you're trusting the LLM's judgment on everything, which, if you've worked with LLMs for more than a week, you know is a bad idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The way I think about it: guardrails go in three places.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt; the agent starts reasoning, you validate the incoming request. Is it well-formed? Does it contain prompt injection attempts? Should the agent even be processing this? I had a case where someone embedded hidden instructions in a support ticket. Without input validation, the agent would have followed those instructions as if they were legitimate. Catching that at the gate is the cheapest defense you'll ever build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In the middle&lt;/strong&gt;, you constrain what the agent is allowed to decide. Spending limits are the simple version: the agent can approve purchases under $5,000 on its own, anything above that gets routed to a human. But it extends to other boundaries too. Which APIs can the agent call? What data can it access? Some teams maintain these constraints as versioned config files so there's a clear audit trail when policies change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt; the agent produces its output, you check the work before it reaches anyone. Did the response leak PII? Does the answer contradict the source documents the agent was supposed to rely on? Is the agent claiming something it doesn't have evidence for? This last check is what catches those confident-but-wrong responses that make everyone nervous about deploying agents in the first place.&lt;/p&gt;

&lt;h3&gt;Here's a pseudo-flow for a procurement agent with all three layers:&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;request comes in&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ validate format, scan for injection&lt;br&gt;
  → agent reasons: pull policy docs, check budget, evaluate request&lt;br&gt;
  → check: amount &amp;gt; $5,000? → route to human&lt;br&gt;
  → agent acts: call approval API, update records&lt;br&gt;
  → verify: response matches policy, no PII leaked&lt;br&gt;
  → send confirmation&lt;/p&gt;

&lt;p&gt;Think of each check as middleware. The agent doesn't skip them. They're part of the execution flow.&lt;/p&gt;
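&lt;p&gt;A minimal sketch of those three layers as plain functions, assuming only the $5,000 limit from above. The function names, the injection marker list, and the substring-based output check are illustrative stand-ins, not any particular framework's API:&lt;/p&gt;

```python
# Three guardrail layers as middleware-style checks. All names and the
# naive substring heuristics are illustrative, not from any framework.
INJECTION_MARKERS = ["ignore previous instructions", "disregard the above"]
APPROVAL_LIMIT = 5000

def validate_input(request):
    """Before: reject malformed or suspicious requests at the gate."""
    text = request.get("text", "").lower()
    if any(marker in text for marker in INJECTION_MARKERS):
        return False, "possible prompt injection"
    if "amount" not in request:
        return False, "missing amount"
    return True, "ok"

def check_limits(request):
    """Middle: constrain what the agent may decide on its own."""
    if request["amount"] > APPROVAL_LIMIT:
        return "route_to_human"
    return "agent_may_approve"

def verify_output(response, source_docs):
    """After: crude check that the response references a known source."""
    return any(doc in response for doc in source_docs)

request = {"text": "License renewal for design tooling", "amount": 40000}
ok, reason = validate_input(request)
decision = check_limits(request) if ok else "rejected"
print(decision)  # → route_to_human
```

&lt;p&gt;Each function either passes the request through or short-circuits it, which is exactly the middleware behavior described above.&lt;/p&gt;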

&lt;h3&gt;Human-in-the-Loop Isn't One Pattern, It's Several&lt;/h3&gt;

&lt;p&gt;"Human-in-the-loop" gets thrown around as if it's a single concept. In practice, I've used at least four different patterns, and picking the wrong one for your situation creates either unnecessary bottlenecks or insufficient oversight.&lt;/p&gt;

&lt;p&gt;The most common is approval gates. The agent does its analysis, prepares a recommendation, then pauses and waits for a human to sign off before executing. This works for high-stakes actions: transactions above a threshold, production deployments, customer-facing communications. The problem is latency. If your approver is in a meeting, the workflow sits there.&lt;/p&gt;
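&lt;p&gt;An approval gate can be sketched as a pending-record queue: the agent's output is a proposal, never an executed action. The queue shape and field names here are hypothetical:&lt;/p&gt;

```python
# Sketch of an approval gate: the agent produces a pending record, and
# nothing executes until a human flips it. Field names are illustrative.
import uuid

pending_approvals = {}

def propose_action(action, payload, rationale):
    """The agent stops here and waits for sign-off."""
    ticket = str(uuid.uuid4())
    pending_approvals[ticket] = {
        "action": action,
        "payload": payload,
        "rationale": rationale,
        "status": "pending",
    }
    return ticket

def human_decision(ticket, approved):
    """Called from the review UI; only after this does execution proceed."""
    record = pending_approvals[ticket]
    record["status"] = "approved" if approved else "rejected"
    return record

ticket = propose_action("issue_po", {"amount": 12000}, "matches budget line 44")
print(pending_approvals[ticket]["status"])  # → pending
```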

&lt;p&gt;Escalation routing is different. The agent doesn't pause on every action; it runs on its own until it hits something confusing. Maybe the request doesn't match any pattern it knows. Maybe two internal policies say opposite things. At that point, the agent stops, writes up what it found and what it's stuck on, and hands the whole thing to a person. Ideally the person's resolution gets recorded and used to improve the system for next time, though in my experience that feedback loop is the part most teams build last.&lt;/p&gt;

&lt;p&gt;At scale, sampling review is what I've found most practical. The agent processes everything on its own, but 5-10% of completed interactions get routed to a human reviewer after the fact. The reviewer scores them and flags anything off. This is how you catch drift over weeks and months without slowing down every individual transaction. A global payments organization used this approach after deploying agentic systems in customer support. Operational workload dropped by nearly 50%, but the review loop kept running to catch quality issues before they compounded.&lt;/p&gt;
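&lt;p&gt;One way to implement sampling review, as a sketch: hash the interaction ID so the same interaction always gets the same answer, which makes the sample deterministic and replayable. The 7% rate is just an example inside the 5-10% range:&lt;/p&gt;

```python
# Sketch of post-hoc sampling review: route a fixed fraction of completed
# interactions to a human queue. Hashing the ID keeps the choice
# deterministic per interaction; the 7% rate is illustrative.
import hashlib

SAMPLE_RATE = 0.07

def needs_review(interaction_id):
    digest = hashlib.sha256(interaction_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform in [0, 1]
    return SAMPLE_RATE > bucket

sampled = sum(needs_review(f"txn-{i}") for i in range(10000))
print(f"{sampled} of 10000 routed to human review")
```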

&lt;p&gt;There are also real-time override controls, where a dashboard shows the agent's reasoning live and a human operator can pause or redirect if things go sideways. This is expensive to staff. Most teams use it during the first couple weeks of a new deployment and then dial it back.&lt;/p&gt;

&lt;p&gt;Start with approval gates on everything, then gradually move actions into autonomous mode as you build confidence through sampling. This &lt;a href="https://www.ciklum.com/blog/ai-agents-explained-the-future-of-task-automation-and-productivity/" rel="noopener noreferrer"&gt;graduated approach&lt;/a&gt; is how you turn a cautious pilot into a production system without anyone losing sleep over it.&lt;/p&gt;

&lt;h2&gt;Failure Modes That Guardrails Actually Catch&lt;/h2&gt;

&lt;p&gt;The procurement agent story I opened with is a good example of policy drift. The agent was referencing a cached policy document that had been superseded. It made decisions based on rules that no longer applied, and the outputs looked perfectly normal. Input-level checks on document freshness would have caught it. I didn't have those yet. Lesson learned.&lt;/p&gt;

&lt;p&gt;Hallucinated tool calls are another one I've hit. The agent decides to call an API that doesn't exist, or passes arguments that don't match the expected schema. If you validate tool calls against a registry of available tools and their schemas before execution, you catch this. If you don't, the agent sends garbage to your production APIs and you find out from your ops team at 2 AM.&lt;/p&gt;

&lt;p&gt;I also set a hard step limit on every workflow now. Usually 8, sometimes 10 depending on the complexity. Before I started doing this, I had an agent on a document processing pipeline that got stuck in a retry loop for 45 minutes. It kept failing on the same API call, backing off, trying again, burning credits the whole time. We didn't notice until the bill spiked. Step limits are boring. They also save you from that kind of thing.&lt;/p&gt;
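&lt;p&gt;The step limit itself is a few lines. In this sketch, &lt;code&gt;run_step&lt;/code&gt; is a hypothetical stand-in for one think-act-observe pass:&lt;/p&gt;

```python
# Hard step ceiling around the agent loop. run_step stands in for one
# think-act-observe pass; the limit of 8 mirrors the number above.
MAX_STEPS = 8

def run_agent(goal, run_step):
    for step in range(MAX_STEPS):
        done, result = run_step(goal, step)
        if done:
            return {"status": "completed", "result": result, "steps": step + 1}
    return {"status": "escalated", "reason": "step limit reached"}

# a stub that never succeeds, like an API call stuck in a retry loop
outcome = run_agent("process document", lambda goal, step: (False, None))
print(outcome["status"])  # → escalated
```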

&lt;p&gt;And then there's the groundedness problem: the agent produces a well-formatted answer that sounds right but doesn't actually come from the retrieved documents. It's fabricated. Output checks that verify whether the answer traces back to a source document catch this before it reaches a customer. Those checks add latency, maybe 200-300ms per response, but compared to the cost of sending fabricated information to a customer, it's nothing.&lt;/p&gt;
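&lt;p&gt;For intuition, here's a deliberately naive groundedness check based on word overlap. Real systems use an NLI or judge model for this; the 0.5 threshold and whitespace tokenization below are placeholders, not a recommendation:&lt;/p&gt;

```python
# Naive groundedness sketch: flag answer sentences with little word overlap
# against the retrieved passages. Production checks use an NLI or judge
# model; the threshold and tokenization here are placeholders.
def ungrounded_sentences(answer, passages, threshold=0.5):
    source_words = set()
    for passage in passages:
        source_words.update(passage.lower().split())
    flagged = []
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:
            continue
        overlap = len(words.intersection(source_words)) / len(words)
        if threshold > overlap:
            flagged.append(sentence.strip())
    return flagged

sources = ["refunds are processed within five business days"]
answer = "Refunds are processed within five business days. Shipping is always free"
print(ungrounded_sentences(answer, sources))  # → ['Shipping is always free']
```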

&lt;h2&gt;Practical Advice for Getting Started&lt;/h2&gt;

&lt;p&gt;What’s worked best in my experience is starting with a single workflow that has well-defined inputs, manageable volume, and safe-to-reverse outcomes if something goes wrong. Avoid tackling your most critical process right away. You need space to experiment, make errors, and learn from them.&lt;/p&gt;

&lt;p&gt;This might seem counterintuitive, but set up your guardrails before writing any agent logic. Establish your policy limits, approval checkpoints, and output validation criteria first. Design the agent to work within those boundaries from the start. If you build the safety mechanisms up front, they become part of the foundation. Every team I’ve seen skip this ends up scrambling to bolt on guardrails after their first production issue, and that always leads to more headaches.&lt;/p&gt;

&lt;p&gt;Logging deserves its own mention. You want to be able to reconstruct exactly what the agent did: which tools it called, what those tools returned, and how the agent interpreted the results before deciding its next move. When something breaks (and it will, probably during a demo), that trace is the only thing standing between you and "I have no idea why it did that." A &lt;a href="https://www.ciklum.com/case-studies/reimagining-processes-automotive-ia/" rel="noopener noreferrer"&gt;UK automotive manufacturer&lt;/a&gt; that went from 200+ RPA bots to intelligent automation learned this early. Process mining surfaced over £10M in procurement problems nobody knew about, and regulatory doc processing costs dropped 80%. None of that would have been possible without visibility into what the systems were actually doing. Visibility came first. The savings followed.&lt;/p&gt;

&lt;p&gt;Start with tight autonomy. Approval gates on everything feels slow, and it is, but it builds the dataset you need to evaluate agent quality. Once you have a few hundred reviewed interactions showing the agent gets it right 95%+ of the time, you can start removing gates on low-risk actions with actual evidence behind the decision.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.ciklum.com/blog/ai-agents-in-the-enterprise-hype-reality-value/" rel="noopener noreferrer"&gt;agents&lt;/a&gt; are getting better fast. But "better" doesn't mean "safe to deploy without guardrails". The teams I've seen ship reliable agent systems are the ones that treated safety as architecture from day one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>software</category>
    </item>
    <item>
      <title>Most automation breaks when something unexpected happens. AI agents do not. They observe, adjust, and move toward the goal. I broke down how this works in real enterprise workflows. 

#EnterpriseTech #AIAgents #DigitalTransformation</title>
      <dc:creator>Sathananthan S</dc:creator>
      <pubDate>Thu, 19 Feb 2026 07:41:36 +0000</pubDate>
      <link>https://dev.to/saths/most-automation-breaks-when-something-unexpected-happens-ai-agents-do-not-they-observe-adjust-5a89</link>
      <guid>https://dev.to/saths/most-automation-breaks-when-something-unexpected-happens-ai-agents-do-not-they-observe-adjust-5a89</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/saths" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3762036%2F013b8603-455b-4b11-83de-651a729c1bb2.jpg" alt="saths"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/saths/ai-agents-explained-how-they-automate-enterprise-workflows-4fp5" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;AI Agents Explained: How They Automate Enterprise Workflows&lt;/h2&gt;
      &lt;h3&gt;Sathananthan S ・ Feb 18&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#automation&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#saas&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>automation</category>
      <category>saas</category>
    </item>
    <item>
      <title>AI Agents Explained: How They Automate Enterprise Workflows</title>
      <dc:creator>Sathananthan S</dc:creator>
      <pubDate>Wed, 18 Feb 2026 12:27:30 +0000</pubDate>
      <link>https://dev.to/saths/ai-agents-explained-how-they-automate-enterprise-workflows-4fp5</link>
      <guid>https://dev.to/saths/ai-agents-explained-how-they-automate-enterprise-workflows-4fp5</guid>
      <description>&lt;p&gt;Last year I spent a few months helping a team wire up an AI agent to their claims processing pipeline. The first version was embarrassingly simple. It was a loop that read a claim, called three APIs, and decided what to do next based on the responses. It looked nothing like the diagrams in the "agentic AI" blog posts I'd been reading.&lt;/p&gt;

&lt;p&gt;But it worked. And the reason it worked is that AI agents aren't actually that exotic if you've spent time building with APIs. You already know how to call a function, read the response, and branch based on what comes back. An agent does the same thing. The difference is that the branching logic is handled by an LLM instead of your if/else statements. The model reads the result, decides what to call next, and keeps going until the job is done or it runs out of steps.&lt;/p&gt;

&lt;p&gt;That's the whole concept. Everything below is about what that looks like when you're building it for real: the loop, the moving parts, and the things that go wrong when you move from a prototype to an actual enterprise workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Loop&lt;/strong&gt;&lt;br&gt;
A chatbot takes an input and returns a response. One shot. An agent takes a goal and works toward it over multiple steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the pattern, called ReAct (Reason + Act):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Goal: "Process this refund request"&lt;/p&gt;

&lt;p&gt;Thought:  I need the customer record. Let me look it up by ID.&lt;br&gt;
Action:   customer_lookup(id="12345")&lt;br&gt;
Result:   { error: "not found" }&lt;/p&gt;

&lt;p&gt;Thought:  ID didn't work. The original request has an email. I'll try that.&lt;br&gt;
Action:   customer_lookup(email="&lt;a href="mailto:jane@example.com"&gt;jane@example.com&lt;/a&gt;")&lt;br&gt;
Result:   { id: "67890", name: "Jane Smith", status: "active" }&lt;/p&gt;

&lt;p&gt;Thought:  Got the record. Now I can process the refund.&lt;br&gt;
Action:   issue_refund(customer_id="67890", amount=49.99, reason="defective item")&lt;br&gt;
Result:   { success: true, transaction_id: "TXN-882" }&lt;/p&gt;

&lt;p&gt;Done.&lt;/p&gt;

&lt;p&gt;Notice what happened on the second step. The lookup failed, and the agent didn't crash or throw a 500. It looked at the error, decided to try the email field instead, and continued. That's the whole point. RPA follows a fixed script. If step 2 breaks, the process breaks. An agent reasons through the failure and tries something else.&lt;/p&gt;

&lt;p&gt;This loop (think, act, observe, adjust) repeats until the goal is met or a step limit stops it. Everything else in this article is built on top of that loop.&lt;/p&gt;
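&lt;p&gt;The loop fits in a dozen lines. Everything below is a stand-in: &lt;code&gt;scripted_decide&lt;/code&gt; replaces the LLM and simply replays the refund trace above, and the tools are stubs. But the control flow is the same think-act-observe cycle:&lt;/p&gt;

```python
# The think-act-observe loop as a minimal sketch. scripted_decide replaces
# the LLM and replays the refund trace; the tools are stubs.
def run_react(goal, llm_decide, tools, max_steps=10):
    history = [("goal", goal)]
    for _ in range(max_steps):
        thought, action, args = llm_decide(history)   # think
        if action == "finish":
            return args
        result = tools[action](**args)                # act
        history.append((thought, action, result))     # observe, then loop
    return {"error": "step limit reached"}

def scripted_decide(history):
    last = history[-1]
    if last[0] == "goal":
        return ("need the customer record", "customer_lookup", {"id": "12345"})
    if last[2].get("error"):
        return ("ID failed, try the email", "customer_lookup",
                {"email": "jane@example.com"})
    if last[1] == "customer_lookup":
        return ("got the record, refund now", "issue_refund",
                {"customer_id": last[2]["id"], "amount": 49.99})
    return ("refund confirmed", "finish", {"status": "refunded"})

TOOLS = {
    "customer_lookup": lambda id=None, email=None:
        {"id": "67890", "name": "Jane Smith"} if email else {"error": "not found"},
    "issue_refund": lambda customer_id, amount:
        {"success": True, "transaction_id": "TXN-882"},
}

print(run_react("Process this refund request", scripted_decide, TOOLS))
# → {'status': 'refunded'}
```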

&lt;h2&gt;What an Agent Is Made Of&lt;/h2&gt;

&lt;p&gt;Four pieces. Every framework organizes these differently, but they're always there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj3qojdkp59pgsn7d5ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj3qojdkp59pgsn7d5ew.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persona.&lt;/strong&gt; The system prompt. It tells the agent what it is, what it can do, and what it must never do. Think of it as a job description combined with a policy manual:&lt;/p&gt;

&lt;p&gt;You are a refund processing agent. You may look up customer records&lt;br&gt;
and issue refunds up to $500. You must NEVER delete customer accounts.&lt;br&gt;
You must ALWAYS confirm the refund amount before executing.&lt;/p&gt;

&lt;p&gt;If you skip this, the agent will improvise. If it has access to a delete_account tool and no rule against using it, it can and eventually will call it. The persona is where you set the guardrails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory.&lt;/strong&gt; Two kinds. Short-term memory is just the conversation context: what's happened so far in this session, held in the LLM's context window. Long-term memory is external: a vector database, a knowledge graph, a regular Postgres table. When the agent needs information beyond the current session (customer history, compliance rules, product specs), it queries long-term memory. This is how RAG works in practice: the agent pulls relevant context from your data before it reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Planning.&lt;/strong&gt; The reasoning engine. This is where the ReAct loop lives. The agent breaks a goal into steps, executes them, and adjusts as it goes. More advanced patterns exist (Plan-and-Solve generates the full plan upfront; Tree-of-Thought explores multiple paths before committing), but ReAct handles most enterprise use cases fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools.&lt;/strong&gt; Functions the agent can call, defined with typed schemas:&lt;/p&gt;

&lt;p&gt;Tool: issue_refund&lt;br&gt;
Description: Issues a refund to a customer's original payment method.&lt;br&gt;
Parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer_id: string (required)&lt;/li&gt;
&lt;li&gt;amount: number (required, max 500)&lt;/li&gt;
&lt;li&gt;reason: string (required)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Returns: { success: boolean, transaction_id: string }&lt;/p&gt;

&lt;p&gt;Here's something most people learn the hard way: the quality of your tool schemas matters more than which model you use. The agent picks tools based on their descriptions. Vague description → wrong tool selected. Missing parameter constraint → runtime error. Teams that build reliable agents spend most of their development time on schema definitions, and it shows.&lt;/p&gt;
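&lt;p&gt;For reference, here's the same &lt;code&gt;issue_refund&lt;/code&gt; schema written as the JSON-Schema-style dict most tool-calling APIs accept. The exact envelope varies by provider; only the field names mirror the article:&lt;/p&gt;

```python
# The issue_refund tool as a JSON-Schema-style definition. The exact
# wrapper object differs by provider; this is the common shape.
ISSUE_REFUND = {
    "name": "issue_refund",
    "description": "Issues a refund to a customer's original payment method.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount": {
                "type": "number",
                "maximum": 500,
                "description": "Refund amount in USD, capped at 500.",
            },
            "reason": {"type": "string"},
        },
        "required": ["customer_id", "amount", "reason"],
    },
}

# The description fields are what the model reads when choosing a tool,
# so they deserve the same care as user-facing docs.
print(ISSUE_REFUND["parameters"]["required"])  # → ['customer_id', 'amount', 'reason']
```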

&lt;h3&gt;Single Agent in Action: Document Processing&lt;/h3&gt;

&lt;p&gt;A compliance team gets hundreds of regulatory filings per week. Each one needs to be classified, checked against policies, and routed to the right reviewer.&lt;/p&gt;

&lt;p&gt;Filing arrives (PDF)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  agent extracts text, classifies document type&lt;/li&gt;
&lt;li&gt;  agent checks text against compliance ruleset&lt;/li&gt;
&lt;li&gt;  result: non-compliant, missing disclosure section&lt;/li&gt;
&lt;li&gt;  agent routes to senior reviewer with findings attached, priority high&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Four steps. The agent handled classification, policy checking, and routing. Those were tasks that used to involve three different people and a shared spreadsheet. And when a filing comes in with a format the agent hasn't seen before, it doesn't silently misclassify. It flags the uncertainty and escalates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3t2h0b6yx4cskij28ie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3t2h0b6yx4cskij28ie.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the same pattern behind a &lt;a href="https://www.ciklum.com/case-studies/pharma-digital-transformation/" rel="noopener noreferrer"&gt;pharmaceutical audit analytics system&lt;/a&gt; that Ciklum built to process over 400,000 audit findings. Manual categorization had been error-prone, and those errors were undermining leadership decisions. The ML pipeline replaced it with automated, context-driven tagging. Every tag was traceable back to the original data.&lt;/p&gt;

&lt;h3&gt;Multiple Agents: Lead-to-Cash Automation&lt;/h3&gt;

&lt;p&gt;Enterprise processes almost never fit inside one agent. A lead-to-cash workflow spans demand generation, quoting, order fulfillment, and invoicing. Different data sources, different rules, different teams.&lt;/p&gt;

&lt;p&gt;Multi-agent systems handle this the way microservices handle a monolith: by splitting responsibilities.&lt;/p&gt;

&lt;p&gt;Orchestrator&lt;br&gt;
  ├── Demand Agent      → qualifies leads, scores opportunities&lt;br&gt;
  ├── Quote Agent       → generates pricing, checks inventory&lt;br&gt;
  ├── Fulfillment Agent → triggers provisioning, tracks delivery&lt;br&gt;
  └── Invoice Agent     → generates invoices, monitors payment&lt;/p&gt;

&lt;p&gt;The orchestrator holds the workflow state and decides which agent runs next. Each sub-agent handles its own domain, calls its own tools, and reports back. When the Quote Agent can't find pricing for a custom configuration, it doesn't guess. It escalates to the orchestrator, which routes the exception to a human.&lt;/p&gt;
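&lt;p&gt;The orchestrator pattern, reduced to a sketch. The agent bodies here are stubs that just show the control flow; the names mirror the diagram, and everything else is illustrative:&lt;/p&gt;

```python
# Minimal sketch of the orchestrator pattern: it owns workflow state and
# picks the next sub-agent; each agent reports back or escalates. Agent
# logic is stubbed to show only the control flow.
def demand_agent(state):
    state["lead_score"] = 82
    return "quote"

def quote_agent(state):
    if state.get("custom_config"):
        return "escalate"        # no pricing found: hand off to a human
    state["quote"] = 1200
    return "fulfillment"

def fulfillment_agent(state):
    state["provisioned"] = True
    return "invoice"

def invoice_agent(state):
    state["invoice_sent"] = True
    return "done"

AGENTS = {
    "demand": demand_agent,
    "quote": quote_agent,
    "fulfillment": fulfillment_agent,
    "invoice": invoice_agent,
}

def orchestrate(state):
    step = "demand"
    while step not in ("done", "escalate"):
        step = AGENTS[step](state)
    return step, state

status, state = orchestrate({"lead": "ACME"})
print(status)  # → done
```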

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfq1gtp910tesvqu41jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfq1gtp910tesvqu41jv.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have seen Ciklum, a leading AI-powered Experience Engineering firm, &lt;a href="https://www.ciklum.com/case-studies/increasing-deal-velocity/" rel="noopener noreferrer"&gt;help a cloud computing company redesign its entire lead-to-cash pipeline using this pattern&lt;/a&gt;. The system combined 40+ automation bots, intelligent document processing, and process mining into a coordinated pipeline. The company serves 100,000+ enterprise customers. At that scale, a single-agent approach wouldn't hold.&lt;/p&gt;

&lt;h2&gt;What Breaks and How to Prevent It&lt;/h2&gt;

&lt;p&gt;Here's where the gap between demo and production shows up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fall9zh9q1drvq2p0dy2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fall9zh9q1drvq2p0dy2u.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents call tools that don't exist.&lt;/strong&gt; Or they pass the wrong argument types. This happens more often when tool descriptions are vague. Fix: validate every tool call against the schema before executing it. Feed validation errors back to the agent so it can self-correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents get stuck in loops.&lt;/strong&gt; A ReAct agent that gets a confusing observation can retry the same action endlessly, or bounce between two actions without progress. Fix: set a max_steps limit. 10–15 steps works for most workflows. If the agent hits the ceiling, it escalates to a human instead of spinning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents trust bad input.&lt;/strong&gt; Indirect prompt injection is a real risk in enterprise settings. A malicious instruction hidden in a document (for example, white text on white background or a comment in a PDF) can redirect the agent's behavior. Fix: treat all external content as untrusted. Scan it with a separate model or classifier before passing it to the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context windows overflow.&lt;/strong&gt; Long-running workflows accumulate token history that degrades the model's attention and inflates cost. Fix: prune the context. After each major step, summarize what's done and drop the raw history. The agent works from the summary plus the current step.&lt;/p&gt;
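&lt;p&gt;A sketch of that pruning step. Here &lt;code&gt;summarize&lt;/code&gt; is a stub that concatenates step outcomes; in a real system it would be an LLM summarization call:&lt;/p&gt;

```python
# Sketch of context pruning: after each major step, collapse raw history
# into a running summary so the context stays bounded. summarize is a
# stub standing in for an LLM summarization call.
def summarize(summary, raw_steps):
    lines = [f"{action}: {outcome}" for action, outcome in raw_steps]
    return (summary + " | " if summary else "") + "; ".join(lines)

def build_context(summary, current_step):
    """What the agent actually sees: the summary plus the current step."""
    return {"summary": summary, "current": current_step}

summary = summarize("", [("lookup", "found 67890"), ("policy", "ok")])
context = build_context(summary, "issue refund")
print(context["summary"])  # → lookup: found 67890; policy: ok
```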

&lt;p&gt;&lt;strong&gt;Nobody can explain what happened.&lt;/strong&gt; When an agent makes a bad decision, standard application logs (200 OK, 43ms response time) tell you nothing about why. You need the full reasoning trace: the system prompt, the input, each thought-action-observation cycle, and the final output. Without this, debugging agent behavior is guesswork. Connecting AI to your business systems through &lt;a href="https://www.ciklum.com/blog/connecting-ai-to-business-mcp-servers-guide/" rel="noopener noreferrer"&gt;standardized protocols like MCP&lt;/a&gt; helps here because logging becomes consistent across every data source instead of needing per-connector instrumentation.&lt;/p&gt;

&lt;h3&gt;Getting Started Without Over-Engineering&lt;/h3&gt;

&lt;p&gt;If you're building your first agent, start small. Pick one workflow that currently involves a human doing the same sequence of steps repeatedly. Map out the tools that workflow needs (probably 3–5 API calls). Write the tool schemas with obsessive detail. Set a strict persona. Set a step limit. Wire up the ReAct loop and see what happens.&lt;/p&gt;

&lt;p&gt;Don't start with a multi-agent system. Don't start with a complex orchestration layer. Get one &lt;a href="https://www.ciklum.com/services/ai-agents-autonomous-orchestration/" rel="noopener noreferrer"&gt;agent working reliably on one workflow&lt;/a&gt;, understand where it fails, and expand from there.&lt;/p&gt;

&lt;p&gt;The engineering discipline is familiar even if the technology feels new. Define clear interfaces. Handle errors by feeding them back into the system. Set boundaries. Validate outputs. The same instincts that make you a good API developer make you a good agent developer. The twist is that you have a probabilistic reasoning engine in the middle of your control flow now, so you have to plan for the cases where it's confidently wrong.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>automation</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
