whchi

Posted on Jun 23

4 Design Dimensions of Agentic Workflows


AI agent, AI workflow, and agentic workflow are terms that often get mixed together. Some people call any LLM that can use tools an agent. Others say it only counts if the system can run on its own for a long time. When definitions don't match, asking "is this agentic or not?" usually gets you nowhere.

A more practical approach is to break the system down into a few observable design choices: who decides the next step, how fixed the execution path is, how multiple agents work together, and where humans step in.

The four dimensions below are not industry standards, and they are not four mutually exclusive quadrants. They are just a set of coordinates I use to understand agentic workflows. The same system can land in different positions at different levels, and it will shift depending on the task and the risk involved.

First, Let's Separate Workflow from Agent

Anthropic's classification is a good starting point:

Workflow: The LLM and tools run along a pre-defined code path.
Agent: The LLM dynamically decides its own execution process and how to use tools.

This is not the only definition. The community is still debating where the line between agent and workflow sits, and real-world systems are almost always a mix of both. For example, code might lock in the broad structure of a research process, while the LLM decides what to search for and which tools to call at each stage.

So instead of putting one label on the whole system, it's more useful to clearly describe how it is controlled.

Dimension 1: Who Controls Orchestration?

The first question is: who decides what happens next?

Mode	Description	Best for
Code-driven	Code controls the sequence, branching, retries, and stop conditions; the LLM only handles specific nodes	Clear rules, high cost of errors, needs stable reproducibility
Model-driven	The LLM selects tools, plans steps, or delegates tasks based on the current goal and state	Open-ended tasks, high input variety, paths that are hard to define in advance
Hybrid	Code sets the outer frame and safety boundaries; the LLM makes decisions within a limited scope	Most practical agentic systems

The OpenAI Agents SDK also separates orchestration into LLM-driven and code-driven modes, and explicitly notes that both can be used together.

Note that this dimension is independent of the framework you use. LangGraph, AutoGen, and CrewAI can each express more than one orchestration style, so knowing the framework does not tell you the architecture.
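A minimal sketch of the hybrid mode, assuming a two-stage task. The names `TOOLS`, `fake_model_choose`, and the stage labels are hypothetical stand-ins invented for illustration, not any framework's API; a real system would replace `fake_model_choose` with an LLM call.

```python
from typing import Callable

# Hypothetical tools; real ones would call search APIs, databases, and so on.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: text[:40],
}


def fake_model_choose(stage: str, allowed: list[str]) -> str:
    """Stand-in for an LLM call that picks one tool name for the stage."""
    return "search" if stage == "gather" else "summarize"


def run_hybrid(
    goal: str,
    choose: Callable[[str, list[str]], str] = fake_model_choose,
) -> list[str]:
    # Code-driven outer frame: stage order and per-stage allowlists are fixed.
    stages = [("gather", ["search"]), ("report", ["summarize", "search"])]
    trace: list[str] = []
    data = goal
    for stage, allowed in stages:
        tool = choose(stage, allowed)  # model-driven decision inside the frame
        if tool not in allowed:  # code enforces the safety boundary
            raise PermissionError(f"{tool!r} is not allowed in stage {stage!r}")
        data = TOOLS[tool](data)
        trace.append(f"{stage}:{tool}")
    return trace
```

The model never sees a way to reorder stages or reach a tool outside the allowlist; it only chooses within the slot the code gives it.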

Dimension 2: How Fixed Is the Execution Path?

The second question is: does every run follow the same path?

Mode	Description	Example
Fixed	Steps and order are defined in advance	Fetch data → summarize → review → publish
Conditional	Branches, retries, or skipped steps based on intermediate results	Trigger a deep investigation only when an anomaly is detected
Adaptive	The next step is decided at runtime; the path may keep changing	Open-ended research, debugging, exploratory coding

Conditional branching does not require an LLM — a regular if-else works fine. Adaptive does not mean uncontrolled; a good adaptive system still has budgets, permissions, stop conditions, and tool boundaries to keep it in check.

One more thing: Fixed does not equal DAG. As soon as there are retries, corrections, or an evaluator-optimizer loop, the path can cycle back. The real question is "who decides the path, and when?" — not "is this a DAG?"
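To illustrate the point about cycles, here is a small sketch of a fixed path that still loops back through an evaluator-optimizer step. The `draft` and `review` functions are hypothetical stand-ins for LLM calls, and the round budget is an assumed stop condition.

```python
from typing import Optional


def draft(topic: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM drafting step (hypothetical)."""
    return f"{topic} summary" + (" with sources" if feedback else "")


def review(text: str) -> Optional[str]:
    """Stand-in evaluator: returns feedback, or None when the draft passes."""
    return None if "sources" in text else "cite sources"


def run_fixed_with_loop(topic: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    path = ["fetch"]
    feedback: Optional[str] = None
    for _ in range(max_rounds):  # the stop condition is owned by code
        text = draft(topic, feedback)
        path.append("draft")
        feedback = review(text)
        path.append("review")
        if feedback is None:
            path.append("publish")
            return text, path
    raise RuntimeError("review budget exhausted")
```

The set of steps and their order are defined in advance, yet the recorded path revisits `draft` and `review`, so the run is not a DAG even though code decides everything.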

Dimension 3: How Do Agents Collaborate?

The third question actually covers two things: how many independent agents are in the system, and how they hand work to each other.

Mode	Description	Main cost
Single Agent	One agent uses multiple tools or loads different skills within the same task state	Context can grow too large; role boundaries are weaker
Manager-Worker	A central agent breaks down tasks, calls workers, and integrates the results	The manager can become a bottleneck or a single point of failure
Handoff	One agent passes control to another specialized agent	Requires careful handling of context transfer and responsibility boundaries
Peer / Decentralized	Multiple agents collaborate through a shared protocol with no single central decision-maker	Hardest to debug; highest coordination cost

Multi-agent does not mean "calling the LLM many times." A single agent can have many model calls. What makes something multi-agent is whether each unit has its own distinct instructions, context, tools, state, or responsibility boundary.
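One way to make that distinction concrete is to model each unit explicitly. The sketch below is an assumption-laden illustration: `AgentSpec`, `Handoff`, `is_separate_agent`, and `hand_off` are hypothetical names, and the check only compares instructions and tools, ignoring the other criteria listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentSpec:
    name: str
    instructions: str
    tools: frozenset[str]


@dataclass
class Handoff:
    to: AgentSpec
    summary: str  # what the receiving agent needs to know
    open_questions: list[str] = field(default_factory=list)


def is_separate_agent(a: AgentSpec, b: AgentSpec) -> bool:
    # Simplified test: units differ if their instructions or tools differ.
    return a.instructions != b.instructions or a.tools != b.tools


def hand_off(history: list[str], target: AgentSpec, max_items: int = 2) -> Handoff:
    # Pass a bounded summary rather than the full transcript, so the
    # receiving agent starts with a small, explicit context.
    return Handoff(to=target, summary=" | ".join(history[-max_items:]))
```

Calling the same `AgentSpec` ten times is still one agent under this view; a second spec with its own instructions and tool set is what introduces a responsibility boundary and the context-transfer cost of a handoff.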

Frameworks also do not map one-to-one onto a topology. AutoGen Teams, for instance, offers round-robin, model-selector, swarm, and graph-flow teams side by side. Assuming AutoGen or CrewAI means "no center, behavior emerges naturally" is simply wrong.

Dimension 4: Where Do Humans Step In?

Autonomy should not be measured by how many steps an agent can take in a row. Three tool calls might delete production data; thirty read-only searches might carry almost no risk. A better measure is: which actions require approval, and how recoverable are mistakes?

Mode	Description	Common controls
Human-triggered	Humans decide each major action; AI provides suggestions or drafts	Preview, step-by-step confirmation
Checkpointed	The system can proceed on its own but waits for approval at high-risk nodes	Confirmation before financial transactions, deletions, or external publishing
Goal-driven within guardrails	Humans set the goal and boundaries; the system completes the work within defined permissions, budgets, and stop conditions	Sandbox, allowlist, spending limits, audit log, rollback

High autonomy does not mean that no one is responsible. Observability, failure handling, permission isolation, and post-run auditing are still required in production.
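A checkpointed design can be sketched as a gate keyed on risk rather than on step count. The `HIGH_RISK` set, the `approve` callback, and the audit format below are assumptions chosen for illustration; a real policy would likely consider the target and reversibility as well.

```python
from typing import Callable

# Assumed risk policy: these action names require human approval.
HIGH_RISK = {"delete", "pay", "publish"}


def execute(
    action: str,
    target: str,
    approve: Callable[[str, str], bool],
    audit: list[str],
) -> bool:
    """Run an action, pausing for approval only when it is high risk."""
    if action in HIGH_RISK and not approve(action, target):
        audit.append(f"BLOCKED {action} {target}")
        return False
    # A real system would perform the action here; this sketch only logs it.
    audit.append(f"RAN {action} {target}")
    return True
```

Thirty read-only searches pass through without interruption, while a single `delete` waits for a human, and both outcomes end up in the audit log.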

Four Common Combinations

When you put all four dimensions together, you will notice that what appears in practice is not a single maturity ladder, but a few combinations matched to different task types.
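If it helps to treat the four dimensions as coordinates, they can be written down as plain enums. This is only a notational sketch; the class names are hypothetical, and the values simply mirror the mode names used in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Orchestration(Enum):
    CODE = "Code-driven"
    MODEL = "Model-driven"
    HYBRID = "Hybrid"


class PathShape(Enum):
    FIXED = "Fixed"
    CONDITIONAL = "Conditional"
    ADAPTIVE = "Adaptive"


class Collaboration(Enum):
    SINGLE = "Single Agent"
    MANAGER_WORKER = "Manager-Worker"
    HANDOFF = "Handoff"
    PEER = "Peer / Decentralized"


class Oversight(Enum):
    HUMAN_TRIGGERED = "Human-triggered"
    CHECKPOINTED = "Checkpointed"
    GUARDRAILED = "Goal-driven within guardrails"


@dataclass(frozen=True)
class WorkflowProfile:
    collaboration: Collaboration
    orchestration: Orchestration
    path: PathShape
    oversight: Oversight

    def describe(self) -> str:
        dims = (self.collaboration, self.orchestration, self.path, self.oversight)
        return " × ".join(d.value for d in dims)
```

A profile describes one level of one system for one task; the same product may need several profiles, which is the point of not treating the dimensions as a single ladder.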

1. Reviewable Research Pipeline

Single Agent × Hybrid × Mostly Fixed × Checkpointed

Code or documents define the research stages. The LLM handles searching, analysis, and writing at each stage. Humans review sources, conclusions, and publishing. This design does not aim for maximum autonomy, but it is easy to reproduce, version-control, and resume after interruptions.

2. Deterministic Batch Processing

Single Agent × Code-driven × Conditional × Goal-driven within guardrails

Code handles scheduling, retries, and data flow. The LLM only does classification, extraction, or text judgment. Good for customer service routing, document classification, and contract field extraction.
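A rough sketch of this split, assuming a ticket-routing task: code owns the retry loop, label validation, and fallback, while the classifier is the only model-shaped piece. `keyword_stub` is a hypothetical stand-in for an LLM call, and the label set is invented for the example.

```python
from typing import Callable

LABELS = {"billing", "technical", "other"}  # assumed label set


def keyword_stub(text: str) -> str:
    """Stand-in for an LLM classification call (hypothetical)."""
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "other"


def classify_with_retry(
    text: str, classify: Callable[[str], str], max_attempts: int = 3
) -> str:
    for _ in range(max_attempts):
        try:
            label = classify(text).strip().lower()
        except TimeoutError:
            continue  # code decides to retry, not the model
        if label in LABELS:  # reject anything outside the known labels
            return label
    return "other"  # deterministic fallback once the budget is spent


def route_batch(tickets: list[str], classify: Callable[[str], str]) -> dict[str, str]:
    return {ticket: classify_with_retry(ticket, classify) for ticket in tickets}
```

Because the model's output is checked against a closed label set, a malformed answer costs one retry instead of corrupting downstream data.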

3. Single Adaptive Agent

Single Agent × Model-driven × Adaptive × Checkpointed

The agent reads error messages, edits code, runs tests, and decides the next step based on the results. This works well for exploration and debugging, but it should still pause for human confirmation before destructive operations, external publishing, or access to sensitive data.

4. Manager-Worker Parallel Research

Multi-Agent × Model-driven Manager × Adaptive × Checkpointed

A central agent splits a research question into parts and sends them to multiple workers to handle in parallel, then integrates and fills the gaps. Anthropic's public multi-agent research system uses exactly this orchestrator-worker architecture — not a fully decentralized swarm.
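The general shape of a manager-worker fan-out can be sketched with a thread pool. This is not Anthropic's implementation; `plan`, `worker`, and `manager` are hypothetical functions, and in a real system `plan` and `worker` would each be model calls with their own context.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(question: str) -> list[str]:
    """Stand-in for the manager's decomposition step (hypothetical)."""
    aspects = ("background", "current state", "open problems")
    return [f"{question}: {aspect}" for aspect in aspects]


def worker(subtask: str) -> str:
    """Stand-in for a worker agent researching one subtask."""
    return f"findings on {subtask}"


def manager(question: str, max_workers: int = 3) -> str:
    subtasks = plan(question)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        results = list(pool.map(worker, subtasks))
    # Integration step: here just a join; a real manager would also
    # check for gaps and possibly dispatch follow-up subtasks.
    return "\n".join(results)
```

The manager remains the single point where results are merged, which keeps the system debuggable but also makes it the bottleneck noted in the collaboration table.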

What Do the Community and Research Say?

There is no consensus that "multi-agent is always better," but a few signals keep appearing across official documentation, research papers, and developer discussions.

Start with the Simplest Architecture

Anthropic recommends finding the simplest working solution first and only adding agentic complexity when you actually need it. AutoGen's documentation also suggests optimizing a single agent first, and only moving to a team after confirming the single agent is not enough.

The Hacker News discussion of Anthropic's article mostly endorsed simple, composable workflows, but added two points. First, human-in-the-loop steps introduce long waits, retries, and repeated execution. Second, production systems cannot rely on a prompt and a model loop alone: durable execution, state management, and observability are unavoidable.

Multi-Agent Has a Coordination Tax

Adding another agent is not just one more model call. It also brings context transfer, result integration, error propagation, latency, and higher debugging cost. The cases where it actually pays off are usually:

The task can be clearly split and run in parallel.
Different subtasks each need a large, isolated block of context.
Different agents need different tools, permissions, or responsibility boundaries.
A single agent has already started degrading because it has too many tools or too long a context.

A 2025 paper, Towards a Science of Scaling Agent Systems, compared 260 configurations and found that multi-agent setups may benefit parallelizable tasks, but can actually regress on sequential reasoning tasks due to coordination overhead. This is a controlled result across six agentic benchmarks with a fixed compute budget — it should not be taken as a universal rule for all products. The more reliable takeaway is: match the architecture to the nature of the task; the number of agents is not a measure of capability.

"Fully Automated" Is Not the Only Goal

In finance, healthcare, legal, payments, and production data modification scenarios, Checkpointed is often not a temporary compromise; it is the right product choice from the start. The appropriate level of autonomy should be determined by risk, reversibility, and where responsibility lies, not by how high the autonomy score is.

A Real Example: My Investment Research Repo

My own tw-stock-researcher is closest to this combination:

Single Agent × Hybrid × Mostly Fixed × Checkpointed

Skills are defined in Markdown documents; the LLM loads only the capabilities the task needs.
The overall direction is fixed: case init → data fetching → analysis → conclusions → market observation, in that order.
Within each stage, the LLM can still decide which tools to use and how deep to go based on data quality.
Output is saved as structured Markdown for easy human review, version control, and resumption across sessions.

I originally described it as LLM-as-Orchestrator × Fixed DAG, which was not quite right. A more accurate description is: the macro flow is fixed, the local paths are left to the LLM; documents handle constraints, code and humans guard the boundaries. This hybrid design makes sense for research tasks, because "reviewable" is usually more important than "fully automated."
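The "fixed macro flow, resumable Markdown output" idea can be sketched as follows. This is not the repo's actual code: the file naming scheme, the `run_case` function, and the `run_stage` callback are assumptions made for illustration, with `run_stage` standing in for the LLM-driven work inside each stage.

```python
from pathlib import Path
from typing import Callable

# Stage order taken from the description above; identifiers are assumed.
STAGES = ["case_init", "data_fetching", "analysis", "conclusions", "market_observation"]


def run_case(case_dir: Path, run_stage: Callable[[str], str]) -> list[str]:
    """Run stages in a fixed order, skipping any whose output already exists."""
    case_dir.mkdir(parents=True, exist_ok=True)
    executed: list[str] = []
    for index, stage in enumerate(STAGES, start=1):
        out = case_dir / f"{index:02d}_{stage}.md"
        if out.exists():
            continue  # resume: a reviewed or finished stage is not redone
        body = run_stage(stage)  # local path inside the stage is the model's call
        out.write_text(f"# {stage}\n\n{body}\n", encoding="utf-8")
        executed.append(stage)
    return executed
```

Because each stage leaves a plain file behind, a human can review or delete one stage's output and rerun only that part, and the whole case directory can live in version control.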

How to Choose?

When designing an agentic workflow, you can work through six questions in order:

1. Can this be reliably solved with regular code? Hand the deterministic parts to deterministic code first.
2. How much variation is there in inputs and paths? Low variation favors a fixed flow; high variation favors letting the model plan dynamically.
3. Can the task actually be split up? The extra cost of multi-agent only makes sense when tasks can run in parallel, need isolation, or require different permissions.
4. What is the cost of failure, and how reversible is it? High-risk or irreversible actions must have approval checkpoints and permission restrictions.
5. When something breaks, can you tell where? Every step should have visible inputs, outputs, state, and stop reasons.
6. Is there an eval showing the complexity is worth it? Compare success rates, cost, latency, and manual correction effort; do not judge by how impressive the demo looks.
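For the observability question above, a minimal sketch of per-step tracing might look like this. `StepRecord` and `run_traced` are hypothetical names, and the state is assumed to be a plain dictionary passed from step to step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

State = dict[str, Any]


@dataclass
class StepRecord:
    step: str
    inputs: State
    outputs: Optional[State] = None
    stop_reason: Optional[str] = None


def run_traced(steps: list[tuple[str, Callable[[State], State]]], state: State) -> list[StepRecord]:
    trace: list[StepRecord] = []
    for name, fn in steps:
        record = StepRecord(step=name, inputs=dict(state))  # snapshot the input
        trace.append(record)
        try:
            state = fn(state)
        except Exception as exc:
            # Record why and where the run stopped, then halt.
            record.stop_reason = f"{type(exc).__name__}: {exc}"
            return trace
        record.outputs = dict(state)
    if trace:
        trace[-1].stop_reason = "completed"
    return trace
```

With a trace like this, "where did it break?" becomes a lookup of the last record and its `stop_reason` rather than a rerun of the whole workflow.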

Conclusion

"Agentic" is not an architecture name, and it is not a badge of maturity you pin to a system. The four questions that actually carry information are:

Is the next step decided by code, a model, or both together?
Is the path fixed, conditional, or generated dynamically at runtime?
Is it a single agent, manager-worker, handoff, or decentralized collaboration?
At which risk points do humans step in, and what guardrails does the system have?

When you can answer these clearly, the system's cost, risk, and maintainability become readable. A good agentic workflow is not the one with the most autonomy or the most agents — it is the one that stays flexible where the task needs flexibility, and stays deterministic where it does not.

DEV Community