Paul Okhrem on what we learned deploying AI agents inside Elogic Commerce

#webdev #ai #programming #productivity

AI agents are having a moment. Every platform has them now, or claims to. Every conference has a track on them. Every vendor deck includes a slide that says "agentic AI" above a diagram with a lot of arrows.

We've been building and deploying them inside Elogic Commerce for client projects over the past year. Here's what we actually learned — including the parts that didn't work the way we expected.

What we mean by "AI agents" in this context

The term gets used loosely, so let me be specific. In our implementations, an AI agent is a system that can take a sequence of actions — not just generate a response — to complete a task. It can query systems, make decisions, execute operations, and loop until it reaches a defined outcome or hits a defined limit.
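A minimal sketch of that definition, not any particular framework's API: a loop that picks an action, applies it, and stops on a defined outcome or a defined step limit. The names `run_agent`, `choose_action`, `is_done`, and `max_steps` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

State = dict
Action = Callable[[State], State]


@dataclass
class AgentRun:
    state: State
    steps_taken: int = 0
    status: str = "running"  # becomes "done" or "limit_reached"


def run_agent(
    state: State,
    choose_action: Callable[[State], Action],  # hypothetical decision function
    is_done: Callable[[State], bool],          # the "defined outcome"
    max_steps: int = 10,                       # the "defined limit"
) -> AgentRun:
    run = AgentRun(state=dict(state))
    while run.steps_taken < max_steps:
        if is_done(run.state):
            run.status = "done"
            return run
        action = choose_action(run.state)  # decide what to do next
        run.state = action(run.state)      # execute it
        run.steps_taken += 1
    # Out of budget: report whether the last step happened to finish the task.
    run.status = "done" if is_done(run.state) else "limit_reached"
    return run
```

The step limit is the part a chatbot does not need and an agent does: without it, a loop that never reaches its outcome simply keeps acting.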

In a commerce context, this might look like: an agent that monitors incoming RFQs, looks up the requester's account history, checks current stock and pricing, drafts a quote, routes it for approval if it exceeds a threshold, and sends it when approved. Each step involves a decision. The agent handles the sequence. A human approves at the right moment.
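The RFQ flow above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Quote` fields, the `draft_quote` and `route` helpers, the in-memory `stock` and `prices` lookups, and the 10,000 threshold are all hypothetical.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 10_000.00  # hypothetical value; a real one is a business decision


@dataclass
class Quote:
    account_id: str
    sku: str
    quantity: int
    unit_price: float
    status: str = "draft"

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price


def draft_quote(rfq: dict, stock: dict, prices: dict) -> Quote:
    """Check stock and pricing for the RFQ, then draft a quote."""
    sku, qty = rfq["sku"], rfq["quantity"]
    if stock.get(sku, 0) < qty:
        raise ValueError(f"insufficient stock for {sku}")
    return Quote(rfq["account_id"], sku, qty, prices[sku])


def route(quote: Quote, threshold: float = APPROVAL_THRESHOLD) -> Quote:
    """Send small quotes directly; hold large ones for a human approver."""
    if quote.total > threshold:
        quote.status = "pending_approval"
    else:
        quote.status = "ready_to_send"
    return quote
```

For example, `route(draft_quote({"account_id": "A1", "sku": "X", "quantity": 5}, {"X": 50}, {"X": 100.0}))` ends in `ready_to_send`, while the same request for 200 units, given enough stock, ends in `pending_approval`.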

This is meaningfully different from a chatbot that generates a response. The distinction matters because the failure modes are different — and because the value proposition is different.

Where agents worked well

Routine but multi-step operational tasks. The strongest use cases we found were tasks that were well-defined, happened frequently, and required pulling from multiple systems. Quote generation, order exception handling, supplier communication drafts — these fit the pattern. The agent does the legwork; the human does the judgment.

Tasks with clear success criteria. Agents work best when you can define "done" precisely. A quote is done when it has a price, a lead time, and has been routed to the right approver. The agent can check these conditions. Vaguer tasks — "improve this customer relationship" — are not good agent territory.
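The quote example translates into a check an agent can actually run. This is a sketch under assumed field names (`price`, `lead_time_days`, `routed_to`), none of which come from a real schema.

```python
def quote_is_done(quote: dict) -> tuple[bool, list[str]]:
    """Return (done, missing) for the three completion conditions."""
    missing = []
    if quote.get("price") is None:
        missing.append("price")
    if quote.get("lead_time_days") is None:
        missing.append("lead_time_days")
    if not quote.get("routed_to"):  # empty or absent approver
        missing.append("routed_to")
    return (not missing, missing)
```

Returning the list of missing conditions, rather than a bare boolean, gives the agent something concrete to act on next. There is no equivalent function to write for "improve this customer relationship", which is the point.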

Reducing the cost of scaling operations. One client was handling a significant increase in order volume. Rather than hiring proportionally, they used agents to handle the routine exception queue — orders flagged for address mismatches, payment holds, stock discrepancies — while their team focused on exceptions that required actual judgment. The agent resolved roughly 60% of the queue automatically. The team handled the rest with better context than before.

Where agents didn't work the way we expected

Reliability degrades with sequence length. Each step in an agent's chain introduces potential for error or unexpected state, and because every step depends on the ones before it, those risks compound across the chain. A five-step workflow is noticeably more reliable than a twelve-step one, even though the relationship is not linear. We ended up deliberately breaking some complex workflows into shorter sub-agents with human checkpoints between them, which felt less elegant but was significantly more reliable in production.
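A back-of-the-envelope way to see the effect, assuming every step succeeds independently with the same probability. That is a simplification, and the 0.98 figure below is purely illustrative, not a measurement from any deployment.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    for n in (4, 5, 12):
        print(n, round(chain_success_rate(0.98, n), 3))
```

Under those assumptions a five-step chain completes cleanly about 90% of the time and a twelve-step chain about 78% of the time. Splitting the twelve steps into three four-step sub-agents does not change the arithmetic, but each segment succeeds about 92% of the time and a human checkpoint between segments catches a failure before later steps build on it.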

Context management is a constant problem. An agent working through a multi-step task needs to carry relevant context across steps. When that context gets large, or when the task requires information from earlier steps that's no longer in the active window, things go wrong in ways that are hard to debug. We spent more time on context architecture than we expected — what information to carry forward, what to summarize, what to re-query.
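One way to make the carry, summarize, or re-query decision explicit is to classify each piece of step output up front. The sketch below is hypothetical throughout (`StepContext`, `build_context`, the key names), and plain truncation stands in for real summarization.

```python
from dataclasses import dataclass, field


@dataclass
class StepContext:
    carried: dict = field(default_factory=dict)     # small facts kept verbatim
    summaries: list = field(default_factory=list)   # shortened bulky results
    requery_keys: set = field(default_factory=set)  # fetch fresh when needed


def build_context(step_results: dict, carry_keys: set, requery_keys: set,
                  max_summary_chars: int = 80) -> StepContext:
    ctx = StepContext()
    for step_name, result in step_results.items():
        for key, value in result.items():
            if key in requery_keys:
                # Volatile data (stock, prices): remember to look it up again.
                ctx.requery_keys.add(key)
            elif key in carry_keys:
                ctx.carried[key] = value
            else:
                # Assumption: truncation is a placeholder for a real summary.
                text = f"{step_name}.{key}: {value}"
                ctx.summaries.append(text[:max_summary_chars])
    return ctx
```

Keeping the three buckets separate makes debugging easier: when a later step goes wrong, it is at least clear whether it was working from a verbatim value, a summary, or a fresh lookup.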

Error handling has to be designed explicitly. When a human does a multi-step task and hits an unexpected state, they pause and figure it out. An agent will attempt to continue unless told otherwise. Without explicit error handling — defined behavior for when a system returns unexpected data, when a threshold is exceeded, when a required input is missing — agents fail in ways that can be hard to detect and expensive to clean up. We learned this the hard way early on and now treat error state design as a core part of every agent specification.
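As a sketch of what "defined behavior" can look like, the three cases named above (missing input, unexpected data, exceeded threshold) can each map to an explicit halt instead of a silent continue. `AgentHalt`, `check_price_response`, and the 5,000 limit are hypothetical names and values chosen for illustration.

```python
class AgentHalt(Exception):
    """Raised to stop the agent and hand the task to a human."""

    def __init__(self, reason: str, detail: str):
        super().__init__(f"{reason}: {detail}")
        self.reason = reason
        self.detail = detail


def check_price_response(response: dict, max_unit_price: float = 5_000.0) -> float:
    """Validate a pricing lookup before the agent uses it."""
    if "unit_price" not in response:
        raise AgentHalt("missing_input", "unit_price absent from pricing response")
    price = response["unit_price"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
        raise AgentHalt("unexpected_data", f"unit_price={price!r}")
    if price > max_unit_price:
        raise AgentHalt("threshold_exceeded", f"unit_price={price} > {max_unit_price}")
    return float(price)
```

The machine-readable `reason` matters as much as the halt itself: it lets whoever picks up the task see why the agent stopped without reconstructing the run from logs.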

Human-in-the-loop is not optional for anything consequential. There's a temptation to automate fully and add oversight later. In our experience, the right time to design the oversight is before deployment, not after an incident. For any agent that touches orders, pricing, customer communication, or financial records, we now require defined checkpoints where a human reviews before the agent proceeds to irreversible steps.
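A checkpoint of this kind can be as simple as a gate in front of the actions that cannot be undone. The sketch below assumes a hypothetical set of irreversible action names and a hypothetical `execute` dispatcher; it is one possible shape, not a description of a specific system.

```python
from typing import Callable, Optional

# Hypothetical list; which actions count as irreversible is a per-deployment decision.
IRREVERSIBLE = {"send_quote", "issue_refund", "update_price"}


def execute(action: str, payload: dict,
            handlers: dict[str, Callable[[dict], object]],
            approved_by: Optional[str] = None) -> dict:
    """Run an action, but park irreversible ones until a human has signed off."""
    if action in IRREVERSIBLE and not approved_by:
        # Nothing is executed; the request waits in a review queue.
        return {"status": "awaiting_review", "action": action, "payload": payload}
    result = handlers[action](payload)
    return {"status": "executed", "action": action,
            "result": result, "approved_by": approved_by}
```

Reversible steps such as drafting or looking up data pass straight through, so the reviewer's attention is spent only where a mistake would be expensive.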

The organizational pattern that worked

The most successful agent deployments shared an organizational pattern that wasn't obvious at the start.

They had an owner who understood both the operational domain and the AI system — not deeply technical, but able to read logs, recognize when an agent was doing something unexpected, and articulate what "correct" looked like. This person wasn't the AI engineer. They were the operations manager or team lead who was willing to become fluent in how the agent worked.

Without this, agent systems tend to drift. They get deployed, they mostly work, nobody watches them closely, and the edge cases quietly accumulate. With an operational owner who's monitoring actively, issues get caught early and the system keeps improving.

Where this is going

The agent implementations that are working today are mostly narrow: well-scoped, well-defined tasks in specific operational contexts. Autonomous agents that can handle broad, open-ended work are what gets pitched, but they are not what we're seeing in practice, at least not reliably and at least not yet.

What we are seeing is that the narrow implementations compound. Each time you automate a well-defined routine task, you free up the team's capacity for the next layer of judgment-intensive work. The value isn't one big agent that does everything. It's a steady expansion of what can be handled automatically, making space for humans to operate at a higher level.

That's a less dramatic story than the one vendors are telling. But it's what's actually happening.

Paul Okhrem is a co-founder at Elogic Commerce and advises companies on AI strategy. More at paul-okhrem.com