Kevin

Posted on Jun 10

Stop Building AI Agents. Build Workflows With AI Steps Instead.

#ai #agents #automation #coding

TL;DR: Half the "AI agents" in production are expensive, fragile reimplementations of workflows. If you know the sequence of steps your business process needs, you do not need an agent. You need a state machine with LLM calls in the right places. Agents are for genuinely open-ended problems. Everything else is a workflow in disguise.

I have spent the last year watching companies build "AI agents" that are really just if/else chains wrapped in a chat loop. They spend six figures on prompt engineering to make a workflow reliable when a graph with three nodes and one LLM call would have done the job for fifty bucks a month.

The agent hype made us forget a basic engineering principle: when you know the steps, do not use a system that has to figure them out.

The Real Test: Do You Actually Have an Open-Ended Problem?

An agent is the right tool when the sequence of steps cannot be known in advance. "Book me the cheapest flight to Berlin next week" is an agent problem. The number of sub-steps is large, the data sources are varied, and the optimal path changes with context.

"Extract the invoice number, date, and total from this PDF, validate it against our PO database, and route it to accounting if the total is under 5,000 EUR" is not an agent problem. The steps are fixed. The data sources are known. The decision is a single boolean.

I see teams building the second case as an agent. They give it tools, a system prompt, maybe a ReAct loop, and they pray. Then they spend weeks debugging why it sometimes emails the wrong person or hallucinates a non-existent PO number.

What a "Workflow With AI Steps" Actually Looks Like

Here is the structure I push clients toward. It is not novel. It is just what we should have been building all along:

[Trigger: new document uploaded]
    ↓
[Step 1: extract_text  (deterministic PDF library)]
    ↓
[Step 2: llm_extract  (structured output, schema = invoice fields)]
    ↓
[Step 3: db_lookup    (PO matching, deterministic SQL)]
    ↓
[Step 4: if-else on threshold → [approve] or [human_review]]
    ↓
[Step 5: signed_action  (HITL approval before any external write)]
    ↓
[Step 6: emit_event]

The LLM is one step. It has a defined input (the text). It has a defined output (a JSON object matching a schema). It does not get to decide what happens next. The workflow engine does that.
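Here is a minimal sketch of that contract for the `llm_extract` step. It assumes the model was asked to return a JSON object with three invoice fields. The field names, `Invoice`, and `parse_invoice` are hypothetical, and the model call itself is left out. The sketch shows the part that keeps the LLM a step rather than a decision-maker: its reply becomes either a valid, typed `Invoice` or an error that the workflow routes on.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    """Typed output of the llm_extract step. Field names are illustrative."""
    invoice_number: str
    invoice_date: date
    total_eur: Decimal


class ExtractionError(ValueError):
    """The model's reply does not match the schema; never guess a value."""


EXPECTED_KEYS = {"invoice_number", "invoice_date", "total_eur"}


def parse_invoice(raw: str) -> Invoice:
    """Validate raw model output against the invoice schema or raise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ExtractionError(f"expected exactly the keys {sorted(EXPECTED_KEYS)}")

    number = data["invoice_number"]
    if not isinstance(number, str) or not number.strip():
        raise ExtractionError("invoice_number must be a non-empty string")

    try:
        invoice_date = date.fromisoformat(data["invoice_date"])
    except (TypeError, ValueError) as exc:
        raise ExtractionError("invoice_date must be an ISO 8601 date") from exc

    raw_total = data["total_eur"]
    if isinstance(raw_total, bool) or not isinstance(raw_total, (int, float, str)):
        raise ExtractionError("total_eur must be a number")
    try:
        total = Decimal(str(raw_total))
    except InvalidOperation as exc:
        raise ExtractionError("total_eur must be a number") from exc
    # is_finite() runs first so NaN never reaches the comparison.
    if not total.is_finite() or total < 0:
        raise ExtractionError("total_eur must be a finite, non-negative amount")

    return Invoice(number.strip(), invoice_date, total)
```

If `parse_invoice` raises, the workflow engine can send the document to `human_review` instead of letting the model retry on its own.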

This is boring. Boring is good. Boring systems are auditable, testable, and cheap.
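The diagram above can be sketched as an explicit step graph in Python. This is a minimal sketch under stated assumptions: PDF text extraction and the HITL signing step are left out, the LLM call is replaced by a hypothetical offline stub, and the table and field names are made up. What matters is the `EDGES` table: code decides the next step, and the model does not. The `approve` route only selects the branch. Any external write would still need human sign-off (Step 5 in the diagram).

```python
import sqlite3
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation

APPROVAL_THRESHOLD_EUR = Decimal("5000")  # the 5,000 EUR rule from the example


@dataclass
class Run:
    """State carried between steps. Field names are illustrative."""
    text: str
    fields: dict = field(default_factory=dict)
    po_found: bool = False
    route: str = ""
    events: list = field(default_factory=list)


def llm_extract(run: Run, db: sqlite3.Connection) -> None:
    # Hypothetical stand-in for the single LLM step. A real system would call a
    # model with a JSON schema and validate the reply; this parses "key=value"
    # lines so the sketch runs offline.
    pairs = (line.split("=", 1) for line in run.text.splitlines() if "=" in line)
    run.fields = {key.strip(): value.strip() for key, value in pairs}


def db_lookup(run: Run, db: sqlite3.Connection) -> None:
    # Deterministic PO matching with a parameterized query.
    row = db.execute(
        "SELECT 1 FROM purchase_orders WHERE po_number = ?",
        (run.fields.get("po_number"),),
    ).fetchone()
    run.po_found = row is not None


def decide(run: Run, db: sqlite3.Connection) -> None:
    # Plain if/else: anything missing or malformed goes to a human.
    try:
        total = Decimal(run.fields["total_eur"])
    except (KeyError, InvalidOperation):
        run.route = "human_review"
        return
    ok = run.po_found and total.is_finite() and total < APPROVAL_THRESHOLD_EUR
    run.route = "approve" if ok else "human_review"


def emit_event(run: Run, db: sqlite3.Connection) -> None:
    run.events.append(("invoice_routed", run.fields.get("invoice_number"), run.route))


# Explicit edges: each step names its successor; None ends the run.
STEPS = {"llm_extract": llm_extract, "db_lookup": db_lookup,
         "decide": decide, "emit_event": emit_event}
EDGES = {"llm_extract": "db_lookup", "db_lookup": "decide",
         "decide": "emit_event", "emit_event": None}


def run_workflow(text: str, db: sqlite3.Connection) -> Run:
    run, step = Run(text=text), "llm_extract"
    while step is not None:
        STEPS[step](run, db)
        step = EDGES[step]
    return run


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE purchase_orders (po_number TEXT PRIMARY KEY)")
    db.execute("INSERT INTO purchase_orders VALUES ('PO-42')")
    doc = "invoice_number=INV-7\npo_number=PO-42\ntotal_eur=1200.00"
    print(run_workflow(doc, db).route)  # approve
```

Every step here can be unit-tested on its own, and every run follows a path you can read off the `EDGES` table.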

Why Agents Are Expensive When You Do Not Need Them

An agent that solves a known workflow pays a tax on every step:

Pattern	Cost per task (LLM tokens, relative to one call)	Latency	Failure rate (typical)
Agent with ReAct loop, 5-10 tool calls	10-50× LLM tokens	15-60 seconds	High variance, hard to bound
Workflow with 1-2 LLM steps	1-3× LLM tokens	2-8 seconds	Tightly bounded by validation
Pure deterministic code	0 LLM tokens	<1 second	Near zero

The agent re-sends the system prompt and its growing history on every step. It re-decides the next action each time. It generates tool calls as model output and has to interpret tool results as text. Every cycle burns tokens, adds latency, and introduces a place where the model can go off-script.

When the task is well-defined, this is pure waste.
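Here is a back-of-the-envelope sketch of why the re-sending adds up. It assumes the agent appends each tool call and its result to the context, and it ignores output tokens and any provider-side prompt caching. All token counts are made-up assumptions, not measurements.

```python
def agent_input_tokens(system: int, task: int, per_step: int, steps: int) -> int:
    """Input tokens for a tool loop that re-sends the whole context each step.

    Call i (0-based) sends the system prompt, the task, and the i earlier
    tool-call/result pairs, each assumed to cost `per_step` tokens.
    """
    return sum(system + task + i * per_step for i in range(steps))


def workflow_input_tokens(system: int, task: int, llm_steps: int) -> int:
    """Input tokens when each LLM step sees only its own prompt and input."""
    return llm_steps * (system + task)


if __name__ == "__main__":
    # Made-up sizes: 1,500-token system prompt, 1,000-token document,
    # 500 tokens per tool call plus result, 8 agent iterations.
    agent = agent_input_tokens(system=1500, task=1000, per_step=500, steps=8)
    workflow = workflow_input_tokens(system=1500, task=1000, llm_steps=1)
    print(agent, workflow, round(agent / workflow, 1))  # 34000 2500 13.6
```

The history grows with every iteration, so the agent's input cost grows roughly with the square of the number of steps. The workflow's cost grows linearly with the number of LLM steps.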

When You Actually Need an Agent

I am not against agents. I build them. The question is when.

Use an agent when:

The goal is defined but the path is not
The system needs to discover tools, data sources, or sub-tasks
The cost of a wrong decision is recoverable (it can retry)
The user is willing to accept variable latency and cost

Do not use an agent when:

You can write the steps down in a numbered list
The output must be a specific schema every time
An auditor needs to trace what happened
A wrong step has irreversible consequences (deleting data, sending money, legal exposure)

The last point is the one nobody talks about. An agent in a high-stakes pipeline is a liability machine. It is a workflow that decided to be unpredictable. Your auditor will not accept "the model thought it was the right thing to do" as a control.
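An auditor will accept a control like this one: an irreversible action runs only if a named person approved that exact payload, and every attempt is written to an audit log. This is a minimal sketch that assumes an in-memory store. `ApprovalGate` and its methods are hypothetical illustrations, not the API of Placet or any other product.

```python
import hashlib
import json
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone


def payload_hash(payload: dict) -> str:
    """Stable hash of exactly what the approver saw (key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    approver: str
    payload_sha256: str
    approved_at: str


class ApprovalRequired(PermissionError):
    """Raised when an irreversible action has no matching human approval."""


class ApprovalGate:
    """Blocks irreversible actions until a named human approves the exact payload."""

    def __init__(self) -> None:
        self._pending: dict[str, Approval] = {}
        self.audit_log: list[dict] = []

    def approve(self, action_id: str, approver: str, payload: dict) -> None:
        self._pending[action_id] = Approval(
            approver, payload_hash(payload), datetime.now(timezone.utc).isoformat()
        )

    def execute(self, action_id: str, payload: dict,
                action: Callable[[dict], object]) -> object:
        approval = self._pending.get(action_id)
        if approval is None or approval.payload_sha256 != payload_hash(payload):
            # A changed payload invalidates the approval: the human signed something else.
            self.audit_log.append({"action_id": action_id, "result": "blocked"})
            raise ApprovalRequired(f"{action_id}: no matching human approval")
        del self._pending[action_id]  # one approval authorizes one execution
        result = action(payload)
        self.audit_log.append({
            "action_id": action_id, "result": "executed",
            "approver": approval.approver, "approved_at": approval.approved_at,
            "payload_sha256": approval.payload_sha256,
        })
        return result
```

Hashing the payload makes the approval specific: if anything upstream changes the amount after sign-off, the action is blocked rather than silently executed.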

The Real Cost: Debugging

Production debugging is where the workflow-vs-agent decision gets sharper.

When a workflow with one LLM step fails, you know exactly which step failed. You have the exact input, the exact output, the exact schema. You can replay it. You can fix the prompt, the schema, or the upstream data. The blast radius is one step.
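Replaying a workflow step takes little machinery: wrap each step so that its exact input, output, or error is recorded. This is a minimal sketch. `StepRecorder` and `replay` are hypothetical names, and a real system would write the records to a durable store rather than an in-memory list.

```python
from __future__ import annotations

import json
from typing import Any, Callable


class StepRecorder:
    """Records every step's exact input and output so failures can be replayed."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def run(self, name: str, fn: Callable[[Any], Any], step_input: Any) -> Any:
        record: dict[str, Any] = {"step": name, "input": step_input, "ok": False}
        try:
            record["output"] = fn(step_input)
            record["ok"] = True
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Round-trip through JSON so the record matches what a log store keeps.
            self.records.append(json.loads(json.dumps(record, default=str)))

    def first_failure(self) -> dict[str, Any] | None:
        return next((r for r in self.records if not r["ok"]), None)


def replay(record: dict[str, Any], fn: Callable[[Any], Any]) -> Any:
    """Re-run one step on its recorded input, e.g. after fixing a prompt or schema."""
    return fn(record["input"])
```

Because the failing input is stored verbatim, the fix can be checked against the exact case that broke, not against a fresh run that may behave differently.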

When an agent fails, you have a 2000-token trace of an LLM that decided to do something unexpected four steps ago. You re-run it. The agent makes a different decision. You have not reproduced the bug. You have learned nothing.

Most teams I have seen build agents eventually build tracing on top. Then they build evals. Then they start adding guardrails. Then they build approval steps. At some point they realize they have rebuilt a workflow. Badly, with extra steps.

A Practical Migration Path

If you have an existing "agent" that is really a workflow:

Trace the last 100 successful runs. Mark each decision point. You will see the agent making the same choices in the same order, 90% of the time.
Extract the happy path. Those repeated decisions are your workflow. The 10% of unusual cases are where you still need a human or a smaller agent.
Replace tool-call loops with explicit edges. A graph node per step, with typed inputs and outputs.
Keep the LLM where it adds value. Use it for the messy parts (extraction, classification, summarization) and deterministic code for everything else.
Add a human-in-the-loop (HITL) approval before any irreversible action. This is not optional, even in a workflow.
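The first two steps, tracing runs and extracting the happy path, can start as a few lines of analysis. This sketch assumes you can export each run's sequence of tool calls from your tracing as a list of names. The trace format and tool names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable, Sequence


def happy_path(
    traces: Iterable[Sequence[str]],
) -> tuple[tuple[str, ...], float, list[tuple[str, ...]]]:
    """Find the most common decision sequence across runs.

    Returns the dominant path, the share of runs that followed it, and the
    runs that deviated (candidates for a human or a smaller agent).
    """
    paths = [tuple(trace) for trace in traces]
    if not paths:
        raise ValueError("need at least one trace")
    path, count = Counter(paths).most_common(1)[0]
    outliers = [p for p in paths if p != path]
    return path, count / len(paths), outliers


if __name__ == "__main__":
    traces = [["extract", "lookup_po", "route"]] * 9 + [
        ["extract", "lookup_po", "search_vendor", "route"]
    ]
    path, share, outliers = happy_path(traces)
    print(path, share, len(outliers))  # ('extract', 'lookup_po', 'route') 0.9 1
```

The dominant path becomes the fixed sequence of graph nodes, and the outliers show which cases still need a human or a narrower agent.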

The result is a system that runs in seconds, costs a fraction, fails in known ways, and is auditable end to end. It also happens to be what most teams need.

What I Build With

At centerbit, the default architecture is a workflow with LLM steps, not an agent. Facio is the agent runtime for the cases that genuinely need one. Placet is the HITL inbox for the steps that need a human. The composition is what we ship. The agent is one tool in the box, not the whole system.

This is the boring, reliable, profitable way to put LLMs in production. The agents-first crowd will tell you it is not ambitious enough. They are usually the same people whose demos crash on stage.

Build the workflow first. Promote to an agent only when you can prove the workflow cannot do the job.

I build production AI systems for German SMEs at centerbit. My bias is toward systems that run reliably in regulated industries, not toward what demos well at conferences. HITL is not a workaround; it is the interface between autonomous computation and real-world responsibility.

Top comments (3)

Alex Shev • Jun 11

This framing is much healthier. Most useful systems do not need a free-roaming agent; they need a workflow with a few AI steps, clear inputs, and deterministic checks around the risky parts.

That is also how I think about terminal skills: package the repeatable workflow first, then let AI assist inside the boundaries.
