
Gursharan Singh


AI Agents in Practice — Part 5: Workflow, Agent, or Single LLM Call — How to Decide

Part 5 of 8 — AI Agents in Practice series.

Previous — Five Agent Patterns and the Control Surfaces That Make Them Safe (Part 4)


The mistake: starting with agents instead of task shape

Imagine TechNova had started its support system with one assumption: "Let's build an agent."

The team gives a single agent access to everything it might need: order lookup, shipping status, cancellation, refund rules, warranty checks, customer messaging, and human approval. The demo works. The agent reads the customer's message, checks the order, reasons through the policy, decides what to do next, and drafts a response.

Six months later, the same system is in production. It is slow, expensive, hard to debug, and brittle in ways nobody can quite explain. Some requests take two seconds. Others take forty. The on-call runbook has a page called "agent stuck in a loop."

The uncomfortable part is that the model did not fail. The prompts are fine. The tools work. The architecture was wrong before the first prompt was written.

That is the mistake this article is about: not using an LLM, but choosing the most flexible shape before checking how much flexibility the task requires. Flexibility you do not need is not free — you pay for it in tokens, latency, debugging time, and on-call hours, every request, forever.

The architecture choice is the first decision in any project, and the most expensive one to reverse later. This article walks through five shapes a system can take, the one question that organizes the choice among them, the factors that sharpen it, and the warning signs that you reached too high on the ladder.


Five shapes for the same work

There are five practical architectures available to most production teams. They are not equally attractive options. They are a ladder. Most systems should live in the bottom half.

Single LLM call. One model call, one response. No agent loop, no dynamic tool choice. The model takes input, returns output, and the system either uses the output or doesn't. The surrounding code may add validation, retries, or formatting, but the model itself is doing one task in one turn. This is the simplest possible shape and it solves more production problems than most engineers think — summarize this case, classify this ticket, draft a first reply.
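As a rough illustration of this shape, the sketch below wraps one model call in validation and a bounded retry. `call_model` is a hypothetical stand-in for whatever client a team uses, and the label set is invented for the example; only the shape matters here.

```python
from typing import Callable

# Hypothetical label set; a real system would define its own.
ALLOWED_LABELS = {"refund", "shipping", "technical", "other"}


def classify_ticket(text: str, call_model: Callable[[str], str],
                    max_retries: int = 2) -> str:
    """One task in one turn: ask for a label, then validate it in code."""
    prompt = (
        f"Classify this support ticket as one of {sorted(ALLOWED_LABELS)}. "
        f"Reply with the label only.\n\n{text}"
    )
    for _ in range(max_retries + 1):
        label = call_model(prompt).strip().lower()
        if label in ALLOWED_LABELS:
            return label
    # Fallback chosen by the developer, not by the model.
    return "other"


def fake_model(prompt: str) -> str:
    """Stub standing in for a real model client."""
    return " Refund\n"


print(classify_ticket("I want my money back.", fake_model))
```

The retry and the fallback live in ordinary code around the call. The model never chooses what happens next, which is what keeps this on the bottom rung.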

Predefined workflow. A sequence of steps the developer designed. Steps may include LLM calls, code, tool calls, API requests, database lookups, retries, validation gates, parallel branches, and conditional routing. The graph of possible paths is fixed at design time. The model may make decisions inside steps, but the structure of the graph is the developer's. Think of the state transitions from Part 3 with no agentic next-step choice: the same flow, but every edge drawn by a developer.

Hybrid workflow with one agentic step. A predefined workflow with one bounded decision point where the model is allowed to choose dynamically among predefined options. The workflow handles the predictable parts — authentication, data fetching, validation, the steps that have to happen in order regardless of input. The agent handles the one decision in the middle that doesn't have a deterministic rule. Then the workflow takes over again.

Single agent. A loop where the model decides the next step at runtime based on what it has seen so far. The developer defines the available tools, the stopping condition, the budget, and the boundaries. The model decides the sequence. Each turn observes the state and chooses an action. The path emerges from the interaction between the model and the environment.
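A minimal sketch of such a loop follows, assuming the model is exposed as a `choose_action` callable that returns a tool name and an argument. The tool names, the `"finish"` convention, and the escalation message are all hypothetical; a scripted stub plays the model so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, argument); "finish" ends the loop


def run_agent(goal: str,
              choose_action: Callable[[str, List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Observe, decide, act, repeat, within a developer-set step budget."""
    observations: List[str] = []
    for _ in range(max_steps):
        name, arg = choose_action(goal, observations)  # the model decides
        if name == "finish":                           # stopping condition
            return arg
        if name not in tools:                          # boundary check
            observations.append(f"error: unknown tool {name!r}")
            continue
        observations.append(tools[name](arg))
    return "ESCALATE: step budget exhausted"


def scripted_policy(goal: str, observations: List[str]) -> Action:
    """Stub for the model: look the order up once, then finish."""
    if not observations:
        return ("lookup_order", "A123")
    return ("finish", f"Order status: {observations[-1]}")


tools = {"lookup_order": lambda order_id: "shipped"}
print(run_agent("Where is my order?", scripted_policy, tools))
```

The developer owns everything except the line marked "the model decides": the tool set, the budget, the stopping condition, and what happens when the budget runs out.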

Multi-agent system. Multiple agents, each with its own scope, coordinating to solve a task that no single agent could solve cleanly alone. Specialization is the cost-justifying property — different domains, different tools, different memory, different review responsibilities. The coordination layer is itself a design problem and is rarely free.

D1 — The Architecture Ladder
Architecture ladder showing five system shapes stacked from simplest to most complex. Bottom: single LLM call, a model with a single output. Second: predefined workflow, a five-box chain with one LLM step. Third: hybrid workflow with one agentic decision step. Above these sit the single agent and the multi-agent system.

A reader looking at the ladder for the first time might assume the goal is to climb as high as possible. The opposite is closer to true. The cost of operating each rung — in tokens, latency, debuggability, reliability, audit difficulty, and the engineer-hours required to keep the thing healthy — increases up the ladder. The expressive power increases too, but expressive power that exceeds the requirements of the task is just expensive.

This ladder is not a list of features. RAG, tools, databases, queues, and APIs can appear inside several of these rungs. The same retrieval step can appear as a context fetch before a single call, a node in a predefined workflow, or a tool an agent calls inside its loop. The ladder isn't about what a system contains; it's about who controls the next step and how much runtime freedom the system has.

The goal is not to use the most agentic shape you can justify. It is to use the lowest rung that still handles the task honestly. Hybrid is a legitimate steady-state shape for a meaningful fraction of cases; single agent is correct for fewer; multi-agent for fewer still. If your sense of the distribution runs the other way, the warning-signs section at the end of this article is for you.


The real question: who decides the next step?

The deciding factor isn't complexity. It's who decides the next step.

Suppose TechNova has a rule: if the refund amount is over $500, route to human approval. The model might summarize the case, classify the reason, or draft the reply — but the next step was chosen by code, before the system ever ran. That point is a workflow. Now suppose the order data, the customer's message, the warranty language, and the shipping status all conflict, and the system can't know in advance whether the right next move is to ask for photos, check inventory, escalate to warranty, or draft a replacement offer. If the model picks that next step at runtime, based on what it just saw, that point is agentic.

That is the whole distinction, and it survives every complication you can throw at it. A system with three LLM calls, parallel branches, retries, and a conditional router is still a workflow if the developer drew the graph and the model only chooses among predefined paths. A system with one LLM call and one tool is still an agent if the model decides whether to call the tool, what to pass it, and what to do with the result. Tool use doesn't settle it; making decisions doesn't settle it; calling the model many times doesn't settle it. Only one question settles it: when the next step is unclear, who chooses — the developer at design time, or the model at runtime?

In Part 2 language, we are asking who owns the observe → decide → act loop at that point — code, or the model.

If the developer chose at design time and wrote the choice into code, the system is a workflow at that point. The choice may be conditional ("if the ticket is unresolved after 24 hours, escalate"), branching ("classify into one of three categories"), or a bounded loop ("retry up to three times"). It is still the developer's choice: encoded once, executed every time.

If the model chooses at runtime based on what it has just observed, the system is agentic at that point. The choice cannot be enumerated upfront, because the inputs that would inform it don't exist until the system is running. The model looks at the state, weighs the options, picks one, takes the action, observes the result, and decides again.

Everything else — complexity, cost, tool use, branching, latency — follows from that choice.

Many systems sit in mixed territory. The workflow decides most things; the model decides one thing. That is the hybrid case. Hybrid is just naming that split: workflows own the predictable edges; the model owns one bounded decision where the edges cannot be drawn cleanly. The clean shapes at the top and bottom of the ladder are simpler. The middle is where most deployed systems actually live.

The decision does not need a complicated framework. Start by asking who owns the next-step choice: code, or the model. The diagram below is the practical version of that question; the five decision factors covered later in this article just sharpen it.

D2 — Who Decides the Next Step?
A decision flowchart titled Who Decides the Next Step. A task arrives and branches: if the whole path can be drawn before runtime, it is a predefined workflow. If not, and it is mostly predictable with one messy bounded decision, it is a hybrid workflow — workflow outside, one bounded agentic step inside. If not, and the model needs to observe, choose, act, and repeat, it is a single agent — a runtime loop bounded by tools, budget, stopping conditions, and escalation. A single LLM call sits off to the side as one turn in, one turn out. A dashed path leads from single agent to multi-agent system, used only when coordination earns its cost.

In short:

  • If code chooses the next step before runtime, that point is a workflow.
  • If the model chooses the next step at runtime, based on what it just observed, that point is agentic.
  • Most real systems are hybrid: code owns the predictable edges; the model owns one bounded decision.
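The contrast can be sketched in a few lines. The $500 threshold and the four candidate moves come from the TechNova example above; the function names, the step identifiers, and the `choose` callable standing in for the model are assumptions made for illustration.

```python
from typing import Callable, Sequence

APPROVAL_THRESHOLD = 500.00  # from the TechNova rule in the text

# Candidate moves from the conflicting-data example; identifiers are hypothetical.
OPTIONS = (
    "ask_for_photos",
    "check_inventory",
    "escalate_to_warranty",
    "draft_replacement_offer",
)


def next_step_by_code(refund_amount: float) -> str:
    """Workflow point: the branch was written before the system ran."""
    if refund_amount > APPROVAL_THRESHOLD:
        return "human_approval"
    return "auto_refund"


def next_step_by_model(observed: str, options: Sequence[str],
                       choose: Callable[[str, Sequence[str]], str]) -> str:
    """Agentic point: the model picks at runtime from what it observed."""
    choice = choose(observed, options)
    if choice not in options:
        raise ValueError(f"model returned an unapproved step: {choice!r}")
    return choice


print(next_step_by_code(620.00))
print(next_step_by_model("order data and warranty text conflict", OPTIONS,
                         lambda observed, opts: opts[0]))
```

Both functions return a step name. What differs is where the decision was made: in the `if` statement at design time, or inside `choose` at runtime.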

Workflows are graphs, not just pipelines

A common confusion makes the workflow option look weaker than it is. Many engineers picture a workflow as a linear pipeline — step one, then step two, then step three. Real production workflows are not linear. They are graphs.

A workflow can branch. A workflow can run in parallel. A workflow can route. A workflow can include retries, validation gates, error-handling paths, human review steps, conditional logic based on intermediate results, and fan-out / fan-in patterns. A workflow can call LLMs for classification at one node and for summarization at another, and use the LLM output to choose which branch to follow.

What a workflow cannot do well is decide a new path that was not designed into the system.

That last sentence is the boundary. A workflow operates within a graph the developer drew. The graph can be dense, branching, and rich. But every edge in the graph existed before the system ran. When a workflow encounters an input it doesn't know how to handle, it can route to a default path, escalate to a human, fail with an error, or pattern-match imperfectly — but it cannot choose a path that isn't already there.
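One way to picture that boundary is an edge table written down before the system runs. The node and outcome names below are hypothetical; the point is that an unfamiliar outcome can only land on an edge that already exists, here the designed fallback.

```python
from typing import Dict

# Every edge is written down before the system runs.
# Node and outcome names are hypothetical.
EDGES: Dict[str, Dict[str, str]] = {
    "classify": {"refund": "check_policy", "status": "lookup_order"},
    "check_policy": {"eligible": "issue_refund", "ineligible": "explain_denial"},
    "lookup_order": {"found": "send_status", "missing": "manual_review"},
}
FALLBACK = "manual_review"


def next_node(node: str, outcome: str) -> str:
    """Follow a designed edge, or take the designed fallback."""
    return EDGES.get(node, {}).get(outcome, FALLBACK)


print(next_node("classify", "refund"))
print(next_node("classify", "refund_plus_firmware_bug"))
```

An LLM could produce the `outcome` value at any node and this would still be a workflow, because the set of possible next nodes is fixed in `EDGES`.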

An agent can — within the tools and boundaries you gave it. The developer defines the tool set, the budget, the stopping condition, and the rules of the environment. Within those constraints, the agent can choose an action sequence that wasn't drawn in advance. That is the agent's distinctive move.

One nuance worth naming. The question isn't whether a system is implemented on a workflow engine, a graph framework, or a custom loop. A workflow engine can host an agent, and a custom loop can host a workflow. The implementation is downstream. The question is who owns the next-step decision — code, or the model. A workflow engine can implement an agent; that does not mean every workflow is an agent.

Consider a customer-support system that handles refund requests, order-status questions, technical issues, and complaints. A routing workflow classifies the incoming message and dispatches to the right handler. Each handler is itself a small graph. The system can be deeply branching and still be a workflow — because every category, every handler, and every step within each handler was designed at build time.

A production RAG system makes the same point in a different domain. A question router classifies the user's query and sends it to one of several backends — vector store, SQL database, document store, graph database, external API — then a synthesizer assembles the result. The system has classification, branching, multiple LLM calls, and conditional logic. It is still a workflow if the branches are known ahead of time and the router chooses among predefined paths. Branching does not automatically make a system an agent.
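A compact sketch of that router is shown below, with three of the backends stubbed out. `classify` and `synthesize` are hypothetical callables standing in for the two LLM calls, and the backend names are illustrative rather than taken from any particular framework.

```python
from typing import Callable, Dict

# Backends fixed at design time; the lambdas are stubs for real lookups.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "vector_store": lambda q: f"passages about {q}",
    "sql_database": lambda q: f"rows matching {q}",
    "external_api": lambda q: f"api response for {q}",
}
DEFAULT_BACKEND = "vector_store"


def answer(query: str,
           classify: Callable[[str], str],
           synthesize: Callable[[str, str], str]) -> str:
    route = classify(query)            # LLM call 1: pick a predefined branch
    if route not in BACKENDS:          # unknown label: developer-chosen default
        route = DEFAULT_BACKEND
    context = BACKENDS[route](query)
    return synthesize(query, context)  # LLM call 2: assemble the result


print(answer("q3 revenue",
             classify=lambda q: "sql_database",
             synthesize=lambda q, ctx: f"{q} -> {ctx}"))
```

Two model calls and a branch, yet the model can only select a key that already exists in `BACKENDS`. That is routing inside a drawn graph, not agency.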

Now consider a different customer-support system that receives a message it cannot cleanly classify — a request that mixes a refund question, a technical complaint, an emotional concern, and a deadline pressure. The workflow can fall back to a default handler, route to manual review, or pattern-match on the most prominent signal. What it does not have is a clean designed path for every messy combination of those signals. An agent could choose what to do next based on which concern is most urgent, what information is missing, and what action would help most — within the tool set the workflow could have called too, but without needing each combination drawn in advance.

The workflow handles the cases that fit its graph cleanly, and falls back to defaults or manual review when they don't. The agent handles the cases where falling back isn't enough — where the system needs to actually choose a path, not just pick a default. The question is not which approach is "better." The question is what fraction of your real traffic needs that judgment, and whether the cost of putting an agent in front of all of it is worth what you gain on those ambiguous cases.

For business workflows handling structured processes, the answer is "almost none of the traffic needs runtime judgment." The graph fits, and a workflow is sufficient, often with one agentic decision point at the place where the graph genuinely can't enumerate the options. That hybrid case is common enough that it has its own rung on the ladder and its own section later in this article.

For systems handling open-ended exploration — research tasks, debugging an unfamiliar codebase, conducting an investigation — the graph doesn't fit, and an agent is the right shape.

The mistake is reaching for an agent because workflows feel old-fashioned. Workflows aren't old-fashioned. They're the right tool for any problem whose shape can be drawn in advance.


Five decision factors that organize the choice

The choice of architecture rarely turns on a single criterion. It turns on several factors weighed together. Five factors organize most of the decision space:

Path predictability. Can you draw the decision tree before runtime? If yes, a workflow can encode it. If no, the model has to choose paths at runtime.

Input variability. Is the input shape known and bounded? A bounded input space (orders, tickets, structured forms) favors workflows. An open-ended input space (natural-language conversations, exploratory research questions) favors agents.

Action range. How many distinct actions does the task need to choose among? A small fixed set fits a workflow. A large or open-ended set — especially when the choice depends on intermediate results — favors an agent.

Reliability and auditability. How badly does the system need to do the same thing every time? Regulated domains, financial transactions, anything with compliance or audit requirements: workflows give you traceability that agents don't, by default. If you need to prove what the system did and why, the workflow's predetermined graph is the answer.

Cost and latency tolerance. Agents typically run more LLM calls, more tool calls, and longer loops than workflows. A single LLM call is one round trip; an agent loop can easily become five to fifteen model/tool round trips before the user sees an answer. If the task budget is tight — chat-facing latency under two seconds, cost per request under a fraction of a cent — agents may be priced out before they are evaluated on capability.
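The arithmetic behind that last point is simple enough to write down. The sketch below assumes round trips run one after another and uses a made-up figure of 0.8 seconds per trip; substitute measured numbers before drawing any conclusion.

```python
def loop_latency(round_trips: int, seconds_per_trip: float) -> float:
    """Total latency if round trips run one after another."""
    return round_trips * seconds_per_trip


BUDGET_SECONDS = 2.0    # chat-facing budget from the text
SECONDS_PER_TRIP = 0.8  # hypothetical; measure your own

for trips in (1, 5, 15):
    total = loop_latency(trips, SECONDS_PER_TRIP)
    verdict = "fits" if total <= BUDGET_SECONDS else "over budget"
    print(f"{trips:>2} round trips: {total:.1f}s ({verdict})")
```

Under these assumed numbers a single call fits the budget and even the low end of the agent range does not, before capability is considered at all.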

The five factors don't combine into a formula. They combine into a sense of which shape fits the task. A useful heuristic: if four of the five factors point toward "workflow," it's almost certainly a workflow. If four point toward "agent," it's probably an agent. If they split, you are likely in hybrid territory — most of the system is predictable, but one decision point isn't.
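The heuristic can be written as a small tally, with the caveat that it is a conversation aid and not a formula. The factor keys and the `"workflow"` / `"agent"` vote labels below are naming choices made for this sketch.

```python
from typing import Dict

FACTORS = (
    "path_predictability",
    "input_variability",
    "action_range",
    "reliability_auditability",
    "cost_latency",
)


def suggest_shape(votes: Dict[str, str]) -> str:
    """Each factor votes 'workflow' or 'agent'; four or more wins."""
    missing = [f for f in FACTORS if f not in votes]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    workflow_votes = sum(votes[f] == "workflow" for f in FACTORS)
    agent_votes = sum(votes[f] == "agent" for f in FACTORS)
    if workflow_votes >= 4:
        return "workflow"
    if agent_votes >= 4:
        return "agent"
    return "hybrid"  # a split usually means one decision point is unpredictable


support_case = {
    "path_predictability": "workflow",
    "input_variability": "agent",
    "action_range": "agent",
    "reliability_auditability": "workflow",
    "cost_latency": "workflow",
}
print(suggest_shape(support_case))
```

A three-to-two split like this one lands on hybrid, which matches the support example developed later in the article.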

The table below shows how the five shapes compare on each factor. Treat it as directional, not scientific. A system can rank "low" on input variability and still benefit from an agent for other reasons; a system can rank "high" on cost tolerance and still choose a workflow for auditability. The table is a starting point for the conversation, not the conclusion of it.

| Factor | Single LLM call | Predefined workflow | Hybrid | Single agent | Multi-agent |
|---|---|---|---|---|---|
| Path predictability | High | High | Mostly | Low | Low |
| Input variability | Low | Low–medium | Medium | High | High |
| Action range | None | Fixed, small | Fixed + one decision | Dynamic | Dynamic + delegated |
| Reliability / auditability | Medium | High | High if bounded/logged | Lower | Hardest |
| Cost / latency | Lowest | Low | Medium | Higher | Highest |

Read the table one factor at a time, scanning each row from left to right. Path predictability is high on the left and low on the right. Auditability follows a similar pattern, with hybrid holding up only when the bounded decision and its inputs are logged. Cost and latency move in the opposite direction: lowest on the left, highest on the right. The broad trend is consistent: more expressive power costs more, in most dimensions you care about in production.

Single LLM calls are auditable at the input/output level, but they do not give you the same step-by-step path trace that a predefined workflow does. Agents can approach workflow-like auditability only when you invest in richer traces, strict control surfaces, and explicit decision logs.

This is also why the architecture choice matters before code is written. Reversing it later is expensive. Going from agent to workflow means giving up flexibility you've built tooling around. Going from workflow to agent means rewriting the parts of the system that previously assumed deterministic paths. The cheapest version of the decision is the one made before construction.


Hybrid: the shape most production systems actually want

A customer writes in to TechNova:

"I bought the TechNova SmartHub two weeks ago. After the firmware update it stopped connecting. I threw away the box, but I need this working before Monday. Can you help or send a replacement?"

This is a real-shape support request. It is not a clean refund question, not a clean technical question, and not a clean replacement question. It is partly all three. It has a deadline. It has a customer with a thrown-away box. It has a firmware update as the suspected cause.

A pure workflow handles part of this case well. The system needs to authenticate the customer, fetch the order, check the purchase date, check the return window, check warranty status, and check for known issues with the firmware update. All of these steps are predictable. Every support request needs them. The graph is the same regardless of what the customer wrote.

An agentic decision step handles a different part well. Given the gathered facts, what should the system actually do? Process a return? Send a replacement? Offer troubleshooting? File a warranty claim? Ask for clarification because the box is gone and the proof-of-purchase chain is now harder? Escalate to a human because the deadline pressure raises the stakes?

The first part is rule-based. The second part isn't. Six branches with overlapping conditions, and the choice depends on the conversation context, the customer's tone, the deadline, the firmware history, and how the previous steps resolved. You could try to enumerate the rules. You would build a decision matrix with thirty rows and find it still doesn't cover real cases. The branching logic isn't simple enough for code and isn't open-ended enough to need a full agent.

The hybrid shape splits the difference cleanly. The predictable steps run as a workflow. The messy decision runs as one bounded agentic step. Then the workflow resumes.

D3 — A Hybrid System in Practice
A horizontal workflow diagram showing a customer-support example. The left side has three workflow steps in gray boxes: customer message, authenticate plus fetch order, and run predictable checks. The center shows a single purple decision box.

A hybrid system does not give the agent the whole process. It gives the agent one bounded decision where rules become messy, then hands control back to the workflow.

Two things make this work in production. First, the agentic step is bounded: the model chooses among a known set of next paths, not from an open space. The choice is "which of these six branches," not "what should we do." Second, the agentic step's output is structured: the model returns a path identifier, not free text that the next system has to interpret. The workflow downstream can route deterministically based on that identifier. In Part 4 language, the output schema is a control surface: it limits the agentic step to approved path IDs and lets the workflow treat the result as a deterministic input.

This is the shape most production support, customer service, claims processing, and routing systems actually want. The vast majority of the work is predictable. One decision point in the middle is genuinely ambiguous. A pure workflow forces you to enumerate every rule, and you will get it wrong on edge cases. Giving the model the whole process — including the parts that don't need its judgment — means paying for that judgment on every request.

Hybrid is not a clever trick or a transitional state on the way to a "real" agent. For a meaningful fraction of customer-facing systems, hybrid is the steady-state design. It is the shape worth reaching for when one part of the problem is messy and the rest isn't.

The cost of hybrid is operational. You now have two runtimes inside one system, and the handoff between them needs to be solid — what state the workflow passes in, what the agent is allowed to return, what happens if the agent fails or exceeds its budget. For example: the workflow may pass order status, warranty status, firmware version, known-issue flag, and customer deadline; the agent may return only a structured path such as RETURN, REPLACEMENT, TROUBLESHOOT, WARRANTY, ASK_CLARIFICATION, or MANUAL_REVIEW. These aren't glamorous engineering problems, but they're the difference between a hybrid that ships and one that gets quietly replaced six months later.
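A minimal sketch of that handoff follows. The state fields and the six path IDs are the ones listed above; the class and function names, the sample values, and the `choose` callable standing in for the model are assumptions. Any failure or unapproved output falls back to `MANUAL_REVIEW`, which is one reasonable policy among several.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Path(Enum):
    RETURN = "RETURN"
    REPLACEMENT = "REPLACEMENT"
    TROUBLESHOOT = "TROUBLESHOOT"
    WARRANTY = "WARRANTY"
    ASK_CLARIFICATION = "ASK_CLARIFICATION"
    MANUAL_REVIEW = "MANUAL_REVIEW"


@dataclass(frozen=True)
class CaseFacts:
    """State the workflow passes into the agentic step."""
    order_status: str
    warranty_status: str
    firmware_version: str
    known_issue: bool
    customer_deadline: str


def decide_path(facts: CaseFacts, choose: Callable[[CaseFacts], str]) -> Path:
    """Run the bounded agentic step; anything unexpected goes to a human."""
    try:
        return Path(choose(facts).strip().upper())
    except Exception:  # model error, timeout, or an unapproved path ID
        return Path.MANUAL_REVIEW


# Sample values are invented for the example.
facts = CaseFacts("delivered", "active", "2.1.0", True, "Monday")
print(decide_path(facts, lambda f: "replacement"))
print(decide_path(facts, lambda f: "send a gift card"))
```

Because `decide_path` can only return a `Path` member, the downstream workflow can switch on it like any other deterministic input.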


When you genuinely need an agent

A single agent — without a workflow shell — is justified when the path itself cannot be designed in advance. At the start of the task, you can list the tools the system might use, the kinds of decisions it might face, and the goal. What you cannot do is draw the graph, because the graph emerges from interaction with the environment.

Four conditions usually appear together when a real agent is the right shape. The next step cannot be fully predicted — step four depends on what step three observed. The tool or action choice depends on intermediate results, in a space too large to enumerate as conditional branches. The task needs repeated observe → decide → act loops, with the stopping condition depending on what the system discovers along the way. And the environment gives feedback the model must react to — error messages, unexpected response shapes, missing data — that requires changing approach rather than just retrying. When all four hold, an agent is probably the right shape. When some hold and some don't, hybrid is likely better.

Coding is a good example because it gets used both ways. Coding is the domain. Control flow is the architecture. The same coding task can be solved by a single LLM call ("explain this function"), by a workflow ("read issue → fetch likely files → generate patch → run tests → report"), or by an agent ("read issue → choose which files to inspect → search → open files → edit → run tests → inspect failures → choose next action → repeat"). The architecture isn't determined by the fact that the task involves code; it's determined by whether the next step can be designed upfront.

Agents are powerful where they fit, and more expensive across most dimensions that matter once they're running. The boundaries an agent operates within — tools available, budget allowed, stopping condition, escalation path — aren't optional. They are the work. Building an agent is mostly the work of constraining it.


Multi-agent: a different question

A common path through the architecture decision goes: workflow feels too rigid, so the team builds an agent; the agent feels too messy, so they build several agents to coordinate. That second step is usually wrong.

Multi-agent is not the next step after a single agent feels hard. It is a separate design decision that must earn its coordination cost through specialization, separation, or measurably better results.

Coordination is not free. Each agent has its own context, memory, and scope; the protocol between them is its own design problem. Token cost is multiplicative, latency is additive, and debugging is significantly harder than with a single agent: failures can come from any one agent, from the coordination layer, or from the interaction between them.

Multi-agent earns its cost in a small number of cases. When the work genuinely splits across specialized domains where one model can't hold all the context — say, a system that needs a security expert and a performance expert each reasoning about the same change with their own knowledge bases. When the work needs independent review — one agent generates, another checks, kept separate so the generator can't coach the reviewer. When the work needs separation of authority — one agent has write access to one system, another to a different one, with the boundary enforced by design. In those cases, coordination cost is the price of admission. In most other cases, a single agent with the right tools and the right context does the same work for less.

The most common failure mode is coordination overhead on a problem that didn't need it. Three agents pass messages back and forth to do work one agent could have done directly. The system looks architecturally impressive in design reviews; it costs three times as much, takes three times longer, and fails in ways that take three times as long to diagnose.

There is a useful parallel with how teams approached microservices a decade ago — a legitimate pattern that often got applied to problems that didn't need it. Multi-agent has a similar risk profile. Earn the second agent. Then earn the third.


Warning signs you chose too much architecture

Over-engineering an architecture is harder to spot than under-engineering one. The system runs. The demo works. The cost shows up six months later, in production, with no obvious villain. By then the team has built tooling, monitoring, and operational habits around the wrong shape, and unwinding is expensive.

Four warning signs are worth recognizing early.

Your agent keeps running past the point where the answer was already correct. The system finds the right answer in step three but doesn't stop. It keeps reasoning, keeps calling tools, keeps revising. By step eight the answer is the same as step three, but the user has waited twenty seconds and the system has spent ten times the cost. This usually means the stopping condition is underspecified or the agent has been given too open a goal. Sometimes it means a workflow would have been better — if the answer is reliably correct by step three, perhaps step three didn't need an agent in the first place.
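One possible way to tighten an underspecified stopping condition is to stop as soon as two consecutive steps produce the same answer. This is only a heuristic assumed for illustration, not something the article prescribes, and `step` is a hypothetical callable that returns the agent's current best answer after each turn.

```python
from typing import Callable, Optional, Tuple


def run_until_stable(step: Callable[[int], str],
                     max_steps: int = 8) -> Tuple[Optional[str], int]:
    """Stop when two consecutive steps agree, or when the budget runs out."""
    previous: Optional[str] = None
    for i in range(1, max_steps + 1):
        current = step(i)
        if current == previous:
            return current, i
        previous = current
    return previous, max_steps


# Scripted answers: the agent settles on its final answer at step three.
answers = {1: "checking", 2: "refund likely"}
result = run_until_stable(lambda i: answers.get(i, "refund approved"))
print(result)
```

Here the loop ends at step four, one confirmation after the answer stopped changing, instead of running to step eight.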

Your multi-agent system is just routing in a costume. Three agents pass messages, but the messages always flow the same direction. One classifies. One handles. One responds. There is no genuine coordination — no negotiation, no specialization that couldn't have been a tool call, no review loop that adds value. The system would be cheaper and more reliable as a routing workflow with one or two specialist agents at the leaves.

Your agent never escalates to a human, even when it clearly should. The agent is allowed to take any action within its tool set, but the tool set doesn't include "stop and ask." The agent improvises through situations it doesn't understand, produces confident but wrong outputs, and the team notices only when a customer reports it. Escalation is a designed control surface, not a fallback. If your agent doesn't have one, you have built something more dangerous than what you needed.
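A sketch of escalation as a designed control surface: "stop and ask" is a first-class entry in the tool set, and a gate in code forces it when the proposed action is unapproved or low-confidence. The tool names, the confidence field, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass

ESCALATE = "escalate_to_human"
# "Stop and ask" sits in the tool set next to the ordinary tools.
TOOL_SET = {"lookup_order", "issue_refund", ESCALATE}


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    confidence: float  # hypothetical score supplied by the agent step


def gate(action: ProposedAction, min_confidence: float = 0.7) -> str:
    """Return the tool to run, forcing escalation when in doubt."""
    if action.tool not in TOOL_SET:
        return ESCALATE
    if action.confidence < min_confidence:
        return ESCALATE
    return action.tool


print(gate(ProposedAction("issue_refund", 0.92)))
print(gate(ProposedAction("issue_refund", 0.41)))
```

The agent can choose escalation itself, and the surrounding code can impose it. Either way, the path to a human exists before the first request arrives.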

Your "agent" is really doing one fixed sequence of steps the developer wrote in the prompt. The system prompt contains instructions like "first do X, then Y, then Z, then return the result." The model follows the prompt because it's a competent model. The system functions. But the architecture is a workflow being executed by a model that has no idea it's a workflow. The team is paying agent prices for workflow behavior, and getting workflow rigidity wrapped in agent unpredictability. The right move is to take the steps out of the prompt and put them in code, where they belong.
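Moving the sequence into code can look like the sketch below. The three prompts and the `call_model` callable are placeholders; what matters is that the order X, then Y, then Z is enforced by the function body, not requested in a system prompt.

```python
from typing import Callable, Dict, List


def handle_ticket(text: str, call_model: Callable[[str], str]) -> Dict[str, str]:
    """X, then Y, then Z: the order lives in code, not in a system prompt."""
    summary = call_model(f"Summarize this ticket:\n{text}")
    category = call_model(f"Classify this summary in one word:\n{summary}")
    reply = call_model(f"Draft a reply to a {category} ticket:\n{summary}")
    return {"summary": summary, "category": category, "reply": reply}


calls: List[str] = []


def recording_stub(prompt: str) -> str:
    """Stub model that records the first word of each prompt it receives."""
    calls.append(prompt.split()[0])
    return f"out{len(calls)}"


outcome = handle_ticket("My SmartHub stopped connecting.", recording_stub)
print(calls)
print(outcome)
```

Each model call now does one small job, and each step can be logged, tested, and retried on its own.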

These signs share a pattern. They appear when the architectural choice was made for reasons other than fit. Sometimes the team wanted to build "an agent" because the word sounds advanced. Other times a workflow felt old-fashioned, or a single LLM call sounded too simple to be impressive. The architectures themselves are not at fault. The fit was wrong.

If any of these signs sounds like your system, the fix is rarely a better prompt or a smarter model. It is usually a step down the ladder. An agent that always follows X → Y → Z probably wants to become a workflow. A multi-agent system that only classifies, handles, and responds probably wants to become a router plus one bounded specialist.


Default to the simplest shape that works

Start at the bottom of the ladder. Climb only when the rung below provably cannot carry the load.

A single LLM call solves more problems than most teams give it credit for. When the task is summarization, classification, extraction, simple reasoning, or single-turn generation — that is the shape. Don't add a loop. Don't add tools the task doesn't need. Don't add an evaluator if a single well-prompted call returns the answer.

A predefined workflow handles the next tier — anything where the steps are known, the paths are bounded, and the reliability requirements matter. Most business processes live here. Most support flows live here. Most data-processing pipelines live here. Workflows are not exciting and they are correct.

Hybrid is the right shape when one decision in the middle is genuinely messy and the rest isn't. It is more common in real deployed systems than most introductory writing on agents acknowledges. Most teams should treat hybrid as the default for anything that involves customer-facing decisions, claims, routing across overlapping categories, or any workflow where one step needs judgment the others don't.

Single agent is correct when the path emerges from the interaction with the environment — when the next step really cannot be designed upfront, when tool choice depends on intermediate results, when the system needs to observe, decide, and adapt across many turns. It is a smaller fraction of cases than the current state of the industry suggests, and the agents that succeed in production are the ones where the surrounding constraints — tools, budget, stopping, escalation — are designed as carefully as the loop itself.

Multi-agent is correct when coordination earns its cost through genuine specialization or separation. That is rarer still, and earning the second agent is more work than the first.

The most expensive production agents are the ones that should never have been agents in the first place. The cost is paid in tokens, latency, on-call hours, and the slow accumulation of complexity that nobody can unwind. The cheapest version of any architecture is the right architecture, chosen before construction.


Three takeaways

  1. Start lower on the ladder than your instincts suggest. A single LLM call or predefined workflow often solves the problem with less cost, latency, and debugging pain.
  2. The key question is who decides the next step. If the developer can draw the path ahead of time, use a workflow. If the model must choose the next action at runtime, you are moving into agent territory.
  3. Use agency only where uncertainty earns it. Hybrid is often the practical middle ground: keep predictable steps in the workflow, let the agent handle one bounded decision, then return control to the workflow.

Looking ahead

We now have the five shapes and the one question that chooses among them — who decides the next step. What we do not have yet is what it actually takes to build one of these: the architecture map for a real agent, including the parts that never make it onto a whiteboard but decide whether the thing survives production. Choosing the shape is the first decision. Building it is the next one. That is Part 6.


Source note: this article builds on the workflow-versus-agent distinction and the "start simple" principle from Anthropic's Building Effective Agents (Schluntz & Zhang). The architecture ladder, the "who decides the next step" framing, and the treatment of hybrid as its own rung are this series' own synthesis.

Part of AI in Practice — three practical series on MCP, RAG, and AI Agents, focused on why these patterns exist, where they break, and how to think through the engineering decisions behind them.

