A practical, production-oriented guide to AI agents — from why demos break in production to the architecture choices, control surfaces, and failure modes that make them hold up. Patterns over products. No tool hype.
Examples use a fictional company, TechNova, as a running thread.
The Series
Part 1: The Demo Worked. Production Didn't.
Priya's refund went through on a shipped order. The model was right. The system around it wasn't. Why agent demos break the moment they meet production — and what the demo hid that production reveals.
Part 2: What Makes Something an Agent
What an agent actually is in engineering terms: a control loop with tools, state, and boundaries. The three primitives an agent composes (MCP for acting, RAG for knowing, Skills for following reusable procedures), and the bridge from manual ReAct to native tool calling.
Part 3: How the Control Loop Actually Works
What happens turn by turn when the agent runs. State that carries across turns, stopping conditions as real decisions, and context as a finite engineering resource — not just a bigger window.
Part 4: Five Agent Patterns and the Control Surfaces That Make Them Safe
The five shapes an agent loop takes — prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer — and the nine control surfaces that decide whether each shape is safe to ship.
Part 5: Workflow, Agent, or Single LLM Call — How to Decide
Five practical architectures, ordered from least costly to most flexible, and the one question that chooses among them: who decides the next step. Why hybrid is the steady-state shape for most production systems, and the warning signs that you reached too high on the ladder.
Part 6: Building the Production Agent Loop
Build a production agent from the loop up — the architecture map, tool contracts, a packaged procedure, state, budgets, and a trace. The core lesson: a tool response describes the request, not the world, so for irreversible actions the agent has to verify the world changed before it commits. A 200 OK is not proof.
Part 7: When the Loop Goes Wrong: Reading Agent Failures from the Trace
When the agent reports success but the world disagrees, the trace already recorded what happened. Read it before you reach for a bigger model or a blind retry: inspect the trace, classify the failure, decide whether a retry actually helps, and turn the failure into an eval so it cannot come back quietly.
Part 8: The Boundaries That Keep Agents Safe
A correctly running loop can still do more than its job if its authority was never bounded. The four questions that find the gap before it becomes an incident — what the agent can see, do, remember, and prove — plus the lifecycle discipline that keeps those boundaries in place after launch.
This series is complete. All eight parts are linked above.
Related Series in the AI in Practice Hub
All three series live at aiinpracticehub.com — the canonical hub, with a guided "where to start."
MCP in Practice — Read from the beginning
The Model Context Protocol from first principles — what MCP is, why it exists, and how to build production-grade tool servers and clients.
RAG in Practice — Read from the beginning
Retrieval-augmented generation from first principles — why AI gets things wrong, what RAG fixes, and how the full pipeline works.