DEV Community

Seenivasa Ramadurai

Designing Agentic AI Systems: How Real Applications Combine Patterns, Not Hype

Most explanations of AI agent patterns are either too abstract to be useful or too simplified to be accurate. This guide attempts to be both technically precise and genuinely easy to understand by grounding each pattern in a human behavior most engineers, architects, and product leaders already know well.

The Foundation: Two Operating Models of AI Systems

Before discussing agent patterns, we need to establish a distinction that quietly determines almost every architectural decision you will make.

Not all AI systems operate the same way.

In practice, modern LLM systems fall into two operating models defined by where control lives.

Understanding this boundary is essential because it shapes reliability, safety, observability, testing strategy, and governance.

1. Agentic Workflows: Intelligence Inside Deterministic Systems

In an agentic workflow, the system is fundamentally code-driven.

Engineers define:

  • The sequence of steps
  • Branching logic
  • Guardrails
  • Failure handling
  • Termination conditions

The LLM is invoked at specific points to perform bounded tasks such as interpretation, generation, classification, or reasoning, but it operates within a structure defined by deterministic software.

The execution path is known ahead of time.

The system behaves like a controlled pipeline augmented with probabilistic intelligence.

You can think of this as:

A deterministic system that calls an LLM as a capability.

This model aligns with how most production AI systems are built today, including RAG pipelines, prompt chains, tool-augmented services, and orchestrated workflows.
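The shape of this model can be sketched in a few lines. This is a hypothetical illustration, not a real framework: `call_llm` stands in for an actual model client, and the ticket-routing logic is invented for the example. The point is that code owns the control flow and the LLM performs one bounded task inside it.

```python
# Hypothetical sketch: a deterministic pipeline that calls an LLM as one
# bounded capability. `call_llm` is a stand-in for a real model client.
def call_llm(prompt: str) -> str:
    # Stand-in for an API call; returns a canned classification here.
    return "billing" if "invoice" in prompt.lower() else "general"

def handle_ticket(text: str) -> str:
    # Step 1 (code-controlled): validate input deterministically.
    if not text.strip():
        return "rejected: empty ticket"
    # Step 2: the LLM performs one bounded task -- classification.
    category = call_llm(f"Classify this support ticket: {text}")
    # Step 3 (code-controlled): branch on the result with known paths.
    if category == "billing":
        return "routed to billing queue"
    return "routed to general queue"
```

Note that every execution path through `handle_ticket` is visible in the code; only the classification inside one step is probabilistic.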

2. Autonomous Agents: Goal-Driven Adaptive Systems

In an autonomous agent, control shifts.

Instead of code prescribing each step, the system provides:

  • A goal
  • A set of tools
  • Constraints or policies
  • An environment to observe

The LLM then decides:

  • What action to take
  • Which tool to use
  • How to interpret outcomes
  • When to continue or stop

Execution emerges dynamically through an iterative loop, often described in the literature as Reason → Act → Observe (ReAct).

There is no predefined sequence beyond high-level boundaries.

You can think of this as:

A goal-driven system where the model determines the workflow at runtime.

This approach appears in research agents, exploration systems, coding agents, investigative assistants, and adaptive planning environments.
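The Reason → Act → Observe loop described above can be sketched as follows. Everything here is illustrative: `choose_action` is a scripted stub standing in for the model's reasoning, and the `run_agent` and `tools` names are invented for this example. In a real agent, the LLM would pick the next action.

```python
# Hypothetical sketch of the Reason -> Act -> Observe (ReAct) loop. The
# "reasoning" is a scripted stub so the control flow is easy to see.
def choose_action(goal: str, observations: list) -> str:
    # Stub reasoning: search first, then finish once we have a result.
    return "search" if not observations else "finish"

def run_agent(goal: str, tools: dict, max_steps: int = 5) -> list:
    observations = []
    for _ in range(max_steps):                       # high-level boundary, not a plan
        action = choose_action(goal, observations)   # Reason
        if action == "finish":
            return observations
        observations.append(tools[action](goal))     # Act + Observe
    return observations                              # budget exhausted

tools = {"search": lambda q: f"result for: {q}"}
```

The key contrast with the workflow sketch earlier: here the sequence of tool calls is decided inside the loop at runtime, not written out in advance.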

Why This Distinction Matters: A Clear Engineering Explanation

Choosing between an agentic workflow and an autonomous agent changes how you design reliability, testing, monitoring, and governance.

The core idea:

👉 If code controls the flow, you manage risk through software engineering.
👉 If the model controls decisions, you manage risk through evaluation and guardrails.

Where control sits defines where problems appear.

Failure Modes: How Things Break

Agentic Workflows

Failures usually come from traditional engineering issues:

  • Missing logic branches
  • Incorrect orchestration
  • Bad retrieval results
  • API failures
  • Integration bugs
  • Incorrect assumptions coded into the flow

Example:
A RAG pipeline returns wrong documents → answer is wrong.
Root cause is traceable.

Autonomous Agents

Failures come from cognitive behavior:

  • Model misunderstands the goal
  • Takes unnecessary actions
  • Gets stuck in loops
  • Hallucinates tool usage
  • Makes unsafe decisions
  • Drifts from the original objective

Example:
An agent keeps calling tools repeatedly, trying to "improve" its answer.
Root cause is emergent.

Testing Strategy: How You Validate Systems

Workflows

You can test like traditional software:

  • Unit tests
  • Integration tests
  • Regression tests
  • Deterministic scenarios

Same input → same path.

Agents

You test like behavioral systems:

  • Simulation environments
  • Evaluation datasets
  • Adversarial testing
  • Monte Carlo runs: running the agent many times with slight variations or randomness to observe behavior across scenarios and uncover edge cases
  • Human review

Same input may produce different actions.
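The Monte Carlo idea above can be made concrete with a tiny harness. This is a hypothetical sketch: `flaky_agent` is a stand-in for any agent whose behavior varies between runs, and the seeded RNG makes the report reproducible.

```python
import random

# Hypothetical Monte Carlo harness: run an agent many times and measure
# how often it satisfies a behavioral check (here: staying within a
# step budget). `flaky_agent` stands in for a nondeterministic agent.
def flaky_agent(rng: random.Random) -> int:
    # Stand-in for an agent whose step count varies run to run.
    return rng.randint(1, 6)

def monte_carlo(runs: int = 1000, max_steps: int = 5) -> float:
    rng = random.Random(42)             # fixed seed: reproducible report
    ok = sum(flaky_agent(rng) <= max_steps for _ in range(runs))
    return ok / runs                    # fraction of runs within budget
```

Instead of asserting one exact output, you assert a rate: "the agent stays within budget in at least X% of runs," which is the natural shape of a behavioral test.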

Observability: What You Need to Monitor

Workflows

Logs are enough:

  • Step execution
  • API responses
  • Latency
  • Errors

You follow the pipeline.

Agents

You need deeper insight:

  • Reasoning traces
  • Decision trees
  • Tool calls
  • Memory state
  • Goal progress
  • Action outcomes

You monitor behavior, not just execution.

Governance and Safety: How You Control Risk

Workflows

You enforce rules in code:

  • Hard guardrails
  • Approval steps
  • Validation checks
  • Compliance rules

The system cannot deviate.

Agents

You enforce policies around behavior:

  • Tool permissions
  • Budget limits
  • Action constraints
  • Kill switches
  • Human oversight
  • Policy engines

The system can explore, within boundaries.
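A policy gate for agent behavior might look like the sketch below. The `PolicyEngine` class and its interface are invented for illustration; real systems would use a dedicated policy engine, but the shape is the same: every tool call passes one checkpoint that enforces permissions, a budget, and a kill switch.

```python
# Hypothetical policy gate: every tool call must be authorized against
# tool permissions, a spend budget, and a kill switch before it runs.
class PolicyError(Exception):
    """Raised when a proposed agent action violates policy."""

class PolicyEngine:
    def __init__(self, allowed_tools, budget: float):
        self.allowed_tools = set(allowed_tools)
        self.budget = budget
        self.killed = False            # flipping this halts all actions

    def authorize(self, tool: str, cost: float) -> None:
        if self.killed:
            raise PolicyError("kill switch engaged")
        if tool not in self.allowed_tools:
            raise PolicyError(f"tool not permitted: {tool}")
        if cost > self.budget:
            raise PolicyError("budget exceeded")
        self.budget -= cost            # debit the budget on approval
```

Because the gate sits outside the model, the agent can explore freely inside the boundary while unsafe or over-budget actions are refused deterministically.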

Determinism vs. Adaptability: The Tradeoff

Workflows optimize for:

  • Predictability
  • Repeatability
  • Reliability
  • Auditability

Best for:

  • Finance
  • Healthcare
  • HR
  • Claims
  • Compliance

Agents optimize for:

  • Exploration
  • Problem solving
  • Ambiguity handling
  • Learning-like behavior

Best for:

  • Research
  • Coding assistants
  • Investigations
  • Planning
  • Discovery

Mental Model (Simple)

Think of it like this:

  • Agentic workflow → Train on tracks
  • Autonomous agent → Explorer in the wilderness

Train = safe, predictable.
Explorer = powerful, uncertain.

Real Enterprise Impact

This decision affects:

  • Architecture complexity
  • Cost control
  • Production stability
  • Incident response
  • Compliance posture
  • Operational maturity

Many teams underestimate this and get surprised later.

One Sentence Summary

Workflows reduce uncertainty by design. Agents embrace uncertainty to gain capability.

Foundational Capabilities Across All Patterns

Before diving into individual patterns, modern agentic systems rely on a set of shared primitives:

Tools

Mechanisms that allow models to interact with external systems: APIs, databases, workflows, messaging, code execution.

Tools turn reasoning into action.

A2A (Agent-to-Agent Communication)

Mechanisms for agents to collaborate, delegate, and exchange results, which is critical for multi-agent systems and orchestration.

Memory Layers

STM (Short Term Memory)
Session context — conversation history, current task state.

LTM (Long Term Memory)
Persistent knowledge: user preferences, historical interactions, embeddings, knowledge graphs.

Pattern 1: Augmented LLM

What it is (technical)

A plain LLM has three built-in limits:

  • Frozen knowledge (training time only)
  • No durable memory (unless you provide it)
  • No actions (it only generates text)

The Augmented LLM pattern fixes this by equipping the model at runtime with:

Retrieval (RAG): Pull relevant documents/records and inject them into context before answering.

Tools: Let the model call functions (APIs, DB queries, calculators, code execution).

Memory: Persist useful context across turns/sessions (STM in the window; LTM in external storage like vector DB / KG / profile store).

Human equivalent

A specialist (doctor/lawyer/analyst) isn’t powerful because of “brain only.” They’re powerful because they have:

  • the client file (retrieval),
  • live systems (tools),
  • and prior notes (memory).

Augmented LLM is that same upgrade: a model with a desk, not a model in isolation.

Key design notes

  • Retrieval quality is the ceiling. Garbage context → confident wrong answers.
  • Tool schemas must be crystal-clear. Ambiguous tools create silent, hard-to-debug failures.
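A minimal sketch of the wiring, under stated assumptions: `retrieve` is a naive keyword match standing in for vector search, the "generation" step is stubbed to return the grounded context, and the `answer` and `memory` names are invented for this example.

```python
# Hypothetical augmented-LLM wiring: retrieval + memory assembled
# around a stubbed model call.
def retrieve(query: str, docs: list) -> list:
    # Naive keyword overlap, standing in for a vector-similarity search.
    words = query.lower().split()
    return [d for d in docs if any(w in d.lower() for w in words)]

def answer(query: str, docs: list, memory: dict) -> str:
    context = retrieve(query, docs)                 # RAG: ground the prompt
    memory.setdefault("history", []).append(query)  # persist across turns
    # Stubbed generation: return grounded context instead of calling a model.
    return context[0] if context else "no grounded answer"

docs = ["Refunds are processed within 5 days.", "Shipping is free over $50."]
```

The design note above shows up directly here: if `retrieve` returns the wrong documents, `answer` is confidently wrong, because retrieval quality is the ceiling.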

Pattern 2: Durable Agent

What it is (technical)

Most LLM interactions are short-lived: seconds or minutes.

But real workflows:

  • Span days or weeks
  • Require approvals
  • Survive failures
  • Need audit trails

A Durable Agent wraps an AI system in a persistent execution layer that:

  • Checkpoints state after each step
  • Supports pause/resume
  • Retries safely
  • Tracks full history

Typical engines:

  • Temporal
  • Durable Functions
  • Step Functions
  • Workflow engines

Human equivalent

A loan approval process.

It doesn’t restart because someone went on vacation; it resumes exactly where it paused.

Key design notes

  • Idempotency is critical (avoid duplicate actions)
  • Plan schema evolution early
  • Track execution lineage for auditability
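The checkpoint-and-resume idea can be shown with a toy wrapper. This is only an illustration of the mechanism; engines like Temporal or Durable Functions provide it robustly, and the `run_durable` name and JSON-file checkpoint are assumptions made for this sketch.

```python
import json
import os
import tempfile

# Hypothetical durable wrapper: checkpoint completed steps to disk so a
# re-run resumes where it stopped instead of restarting from scratch.
def run_durable(steps, path):
    state = {"done": 0, "results": []}
    if os.path.exists(path):                     # resume from checkpoint
        with open(path) as f:
            state = json.load(f)
    for i in range(state["done"], len(steps)):
        state["results"].append(steps[i](state["results"]))
        state["done"] = i + 1
        with open(path, "w") as f:               # checkpoint after each step
            json.dump(state, f)
    return state["results"]
```

Re-running with the same checkpoint file skips completed steps, which is exactly why idempotency matters: if a step had side effects, replaying it would duplicate them.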

Pattern 3: Prompt Chaining

What it is (technical)

A complex task is broken into sequential steps.
Each step:

  • Performs a focused task
  • Produces structured output
  • Is validated before moving forward

This improves:

  • Reliability
  • Observability
  • Control

Human equivalent

A factory assembly line. Each station does one job, not everything.

Key design notes

  • Prevent error propagation with validation
  • Keep step outputs structured
  • Avoid passing unnecessary context
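A chain with validation between steps might look like the sketch below. Each function stands in for one focused model call; the `extract`, `validate`, and `summarize` names and their toy logic are invented for illustration.

```python
# Hypothetical prompt chain: focused steps, structured outputs, and a
# validation gate between steps to stop error propagation early.
def extract(text: str) -> dict:
    # Stand-in for a model call that produces structured output.
    return {"words": text.split()}

def validate(out: dict) -> dict:
    # Deterministic code between model calls: fail fast on bad output.
    if not out["words"]:
        raise ValueError("empty extraction; stopping the chain")
    return out

def summarize(out: dict) -> str:
    # Stand-in for a downstream model call consuming structured input.
    return f"{len(out['words'])} words extracted"

def chain(text: str) -> str:
    return summarize(validate(extract(text)))
```

Because each step emits structured output, the validation gate is a few lines of ordinary code rather than another model call.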

Pattern 4: Evaluator & Optimizer

What it is (technical)

Introduce a feedback loop:

  • Generate output
  • Evaluate against criteria
  • Improve based on feedback
  • Repeat until acceptable

Human equivalent

A writer and an editor iterating on drafts.

Key design notes

  • Define a clear evaluation rubric
  • Limit iterations
  • Watch for evaluator bias
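The generate-evaluate-refine loop can be sketched as below. The generator and evaluator are stubs (in practice both would be model calls, ideally with different prompts or models to reduce evaluator bias), and the `refine` name and toy rubric are assumptions of this example.

```python
# Hypothetical evaluator-optimizer loop: generate, score against a
# rubric, revise on feedback, and stop on acceptance or iteration cap.
def generate(draft: str, feedback: str) -> str:
    # Stub "improvement": a real system would prompt the model with
    # the draft plus the evaluator's feedback.
    return draft + " (revised)" if feedback else draft

def evaluate(draft: str) -> tuple:
    # Toy rubric: accept once the draft has been revised at least once.
    ok = "revised" in draft
    return ok, "" if ok else "add detail"

def refine(draft: str, max_iters: int = 3) -> str:
    feedback = ""
    for _ in range(max_iters):          # hard cap: limit iterations
        draft = generate(draft, feedback)
        ok, feedback = evaluate(draft)
        if ok:
            break
    return draft
```

The `max_iters` cap is the design note in code form: without it, a picky evaluator can burn tokens indefinitely.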

Pattern 5: Autonomous Agent

What it is (technical)

The model controls its own loop:

  • Decide next action
  • Execute
  • Observe
  • Update plan
  • Repeat

There is no fixed path.

Human equivalent

A detective following leads.

Key design notes

  • Enforce action budgets
  • Require approval for risky actions
  • Log everything
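All three design notes can be expressed as a guardrailed loop. This is a hypothetical sketch: `plan_next` is scripted where a real agent would ask the model, and the `run` and `approve` names are invented for the example.

```python
# Hypothetical guardrailed agent loop: an action budget, an approval
# gate for risky actions, and a log of every decision.
def plan_next(step: int) -> str:
    # Scripted planner standing in for the model's next-action choice.
    return ["read_file", "delete_file", "stop"][min(step, 2)]

def run(approve, budget: int = 10) -> list:
    log = []
    for step in range(budget):                   # hard action budget
        action = plan_next(step)
        if action == "stop":
            break
        if action.startswith("delete") and not approve(action):
            log.append(f"blocked: {action}")     # human-in-the-loop gate
            continue
        log.append(f"ran: {action}")             # log everything
    return log
```

The agent still chooses its own actions; the loop only bounds how many it gets and which classes of action need a human.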

Pattern 6: Parallelization

What it is (technical)

Independent subtasks run concurrently. Two modes:

  • Sectioning: split a task into independent parts and run them in parallel
  • Voting: run the same task several times and aggregate the answers

Human equivalent

A team dividing up the work.

Key design notes

  • Ensure independence
  • Design aggregation carefully
  • Watch cost spikes
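Both modes can be sketched with a thread pool. The `task` callables stand in for model calls, and the `sectioning` and `voting` helper names are invented for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketches of the two parallelization modes.
def sectioning(task, parts):
    # Sectioning: independent subtasks fan out across workers;
    # results come back in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, parts))

def voting(task, prompt, n=3):
    # Voting: run the same task n times and keep the majority answer.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(task, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```

Note the cost implication in the code itself: voting multiplies the number of model calls by `n`, which is exactly the "watch cost spikes" caveat.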

Pattern 7: Routing

What it is (technical)

A classifier directs requests to specialized handlers.

Human equivalent

A hospital triage nurse.

Key design notes

  • Measure routing accuracy
  • Define fallback path
  • Tune confidence thresholds
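A router with a confidence threshold and an explicit fallback might look like this sketch. The classifier is a keyword stub standing in for a model call, and the `route` and `HANDLERS` names are assumptions of the example.

```python
# Hypothetical router: classifier stub + confidence threshold + fallback.
def classify(text: str) -> tuple:
    # Stand-in for an LLM classifier returning (label, confidence).
    if "refund" in text.lower():
        return "billing", 0.9
    return "general", 0.4              # low confidence on this stub

HANDLERS = {
    "billing": lambda t: "billing handler",
    "general": lambda t: "general handler",
    "fallback": lambda t: "human review",
}

def route(text: str, threshold: float = 0.6) -> str:
    label, confidence = classify(text)
    if confidence < threshold:         # tune this threshold per domain
        return HANDLERS["fallback"](text)
    return HANDLERS[label](text)
```

The fallback path is not optional polish: without it, low-confidence requests are silently forced into whichever handler the classifier guessed.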

Pattern 8: Orchestrator & Workers

What it is (technical)

A coordinator decomposes tasks and assigns them to specialists.

Human equivalent

A general contractor managing the trades.

Key design notes

  • Define worker contracts
  • Detect conflicts
  • Avoid over-fragmentation
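The coordinator-and-specialists shape can be sketched as follows. The `WORKERS` registry and `orchestrate` function are invented for illustration; each worker stands in for a specialist model call sharing one contract (document in, finding out).

```python
# Hypothetical orchestrator-workers sketch: the coordinator decomposes
# the task into aspects, assigns each to a specialist worker, and
# synthesizes the findings. Each worker stands in for a model call.
WORKERS = {
    "indemnification": lambda doc: "indemnity: capped",
    "termination": lambda doc: "termination: 30-day notice",
}

def orchestrate(doc: str, aspects: list) -> str:
    findings = []
    for aspect in aspects:                       # decompose + assign
        worker = WORKERS.get(aspect)
        if worker is None:
            findings.append(f"{aspect}: no specialist available")
            continue                             # surface coverage gaps
        findings.append(worker(doc))             # shared worker contract
    return "; ".join(findings)                   # synthesize
```

The shared contract (input string in, finding string out) is what makes synthesis trivial; workers with incompatible output shapes are where conflict detection gets hard.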

How These Patterns Come Together in Real Systems

These patterns aren’t competing approaches; they’re building blocks. In production, they’re layered deliberately, each solving a different class of problem.

Take a contract review system for a legal team.

A routing layer sits at the front, classifying incoming documents (NDA, employment agreement, vendor contract, regulatory filing) and directing each to the appropriate processing path.

Behind that, each path runs as a prompt chain: one step extracts clauses and metadata, another compares them against standard templates, and a third generates a risk summary. Between steps, code validates outputs to prevent errors from propagating.

When agreements become complex (for example, multi-party contracts), the workflow invokes an orchestrator-workers pattern. Specialized workers analyze indemnification, jurisdiction, termination rights, and other domains independently, and their findings are synthesized into a unified assessment.

Every model call operates as an augmented LLM, grounded with retrieval from contract libraries and connected to internal systems through tools.

Before results are delivered, an evaluator-optimizer loop checks the output against defined quality criteria, ensuring completeness, correctness, and appropriate risk classification.

All of this runs within a durable execution layer. If partner review is required, the system pauses, waits, and resumes later without losing state or restarting the process.

One system. Multiple patterns. Each contributing a specific capability the others don’t provide.

Where to Begin

A common mistake in agentic system design is starting with the most sophisticated pattern instead of the most appropriate one. Autonomous agents are compelling in demos, but in production they introduce governance, observability, and reliability challenges that many teams underestimate.

In practice, the most effective approach is evolutionary:

  • Start with an augmented LLM so your system has the right context, tools, and grounding.
  • Introduce prompt chaining when tasks naturally break into sequential steps.
  • Add routing when different request types require different handling strategies.
  • Use parallelization when independent work can improve throughput.
  • Introduce evaluator loops when output quality must be consistently enforced.
  • Adopt orchestrator-workers when problems require multiple specialized perspectives.
  • Wrap workflows in durable execution when processes span time or involve human checkpoints.
  • Explore autonomous agents selectively for open-ended subtasks — with clear limits and safeguards.

You don’t need all patterns. In fact, most systems shouldn’t use all of them.

The real goal is simpler: apply the smallest set of patterns that delivers reliability, clarity, and operational confidence for the problem you’re solving.

Thanks
Sreeni Ramadorai
