Gursharan Singh

Posted on May 27 • Edited on Jul 2

AI Agents in Practice — Part 3: How the Control Loop Actually Works

#agents #ai #architecture #llm

Part 3 of 8 — AI Agents in Practice

Previous - What Makes Something an Agent? (Part 2)

Part 2 named the control loop in five words: observe → decide → act → check → repeat.

That's the shape. Here's what it looks like in production, four turns into a multi-turn cancellation case:

Turn 1. Priya: "I'd like to cancel order #4471 and get a refund."
Agent observes the request, decides to check order status first, calls get_order_status(4471).

Turn 2. Tool returns: "status: shipped, carrier: FedEx, tracking: 1Z…, estimated delivery: tomorrow."
Agent observes the result, checks the cancellation procedure (shipped orders can't be cancelled), and plans to offer a return or escalation instead.

Turn 3. Agent to Priya: "This order has already shipped and is due to arrive tomorrow. Would you like me to start a return when it arrives, or connect you with a human agent?"

Turn 4. Priya hasn't replied yet. The conversation is paused on a decision the agent isn't allowed to make alone. The active context now holds: the original cancellation request, the order status, the procedure decision, the offered options, and the waiting state.

By turn four, three engineering problems are alive at the same time:

State — the agent's working state has a paused task waiting on Priya's choice: start a return after delivery, or hand off to a human agent.
Stopping — the original task is paused, not done. When does this conversation end? Which outcome counts as "complete"?
Context — the active context window holds tool outputs, retrieval text, planning notes, and an in-progress decision. Some of this is needed for the next turn. Some is exhaust.

The five-word loop hasn't changed. But each step now has to do real work — and the wrong answer to any of these three problems is what makes production agents fail in the ways Part 1 named.

This article works through each problem in order: first what each loop step actually does, then state discipline, stopping discipline, context discipline, and traces.

The Loop in Five Words

The loop from Part 2: observe → decide → act → check → repeat. Same five words. Different question now: what does each step actually do?

Step	What it does
Observe	Gather the current working state — the relevant pieces for this turn (the user's most recent request, the current task, the tool results from the previous turn, the constraints from any active skill). Observe is a curation step, not a dump.
Decide	Model chooses the next action: call a tool, ask the user a question, or stop. The decision is constrained by what tools are available, what the current state allows, and what the procedure (if any) says is the next legitimate step.
Act	Whatever was decided actually runs: a tool executes, a message is sent, a skill is invoked. The act is what changes things in the world, and it is the step through which most production failures do their damage.
Check	Result flows back. Tool returned what the agent expected, or something different, or it failed, or it timed out. The check step reads what actually happened, not what was intended.
Repeat	Loop runs again with new state, until the agent decides it's done, escalates, or the controller breaks it.
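To make the table concrete, here is a minimal Python sketch of one bounded loop. The `Decision` type, the `decide` callable standing in for the model, and the tool registry are illustrative assumptions, not any particular framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Decision:
    kind: str                                  # "tool", "ask_user", or "stop"
    tool: str = ""
    args: dict = field(default_factory=dict)


def run_loop(state: dict, decide: Callable[[dict], Decision],
             tools: Dict[str, Callable[..., dict]], max_turns: int = 10) -> str:
    """One observe -> decide -> act -> check pass per iteration, bounded by max_turns."""
    for _ in range(max_turns):
        # Observe: hand the model a curated view, not the whole history.
        observed = {"request": state["request"], "facts": state["facts"]}
        # Decide: the model (here a plain function) picks the next action or stops.
        decision = decide(observed)
        if decision.kind == "stop":
            return "final_answer"
        if decision.kind == "ask_user":
            return "waiting_on_user"
        # Act: run the chosen tool.
        try:
            result = tools[decision.tool](**decision.args)
        except Exception as exc:               # failures and timeouts are results too
            result = {"error": str(exc)}
        # Check: record what actually happened, not what was intended.
        state["facts"].append((decision.tool, result))
    # Repeat ran out of budget: the controller breaks the loop.
    return "max_iterations"
```

The loop returns a reason rather than a bare value, so the caller always knows whether it ended on an answer, on a question to the user, or on the iteration bound.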

The loop runs in this order on every turn. The order is the mechanism. The mechanism is what creates room for production-grade behavior: pre-checks before destructive actions, escalation paths before commitment, observation before re-decision.

A control loop that observes after deciding is just a script with hallucination.¹

One practical detail matters here because it shapes the decide step directly:

The tool description is the decision interface.

When the model picks a tool in the decide step, it isn't reading source code or API docs — it's reading the name, the short description, and the argument schema the application exposes. That surface is what the agent decides against. Omit failure behavior and the agent retries on permanent errors; omit when-to-use guidance and it calls the wrong tool confidently. Part 6 shows how this surface is designed in the production build.
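As an illustration, here is a hypothetical description for a `cancel_order` tool, written as a Python dict with a JSON-Schema-style argument block. The wrapper fields differ between model providers, and the status values, error codes, and `idempotency_key` argument are assumptions made for this example. The point is that when-to-use guidance and failure behavior live in the text the model actually reads.

```python
# Hypothetical tool spec. Only the three parts the model decides against are shown:
# the name, the description, and the argument schema.
CANCEL_ORDER_TOOL = {
    "name": "cancel_order",
    "description": (
        "Cancel an order that has NOT shipped yet. "
        "Use only after get_order_status reports 'pending' or 'processing'. "
        "Do not use for shipped or delivered orders; offer a return or escalate instead. "
        "Fails permanently with ORDER_ALREADY_SHIPPED; do not retry that error. "
        "May fail transiently with TIMEOUT; retry once with the same idempotency_key."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "integer", "description": "Numeric order id, e.g. 4471."},
            "idempotency_key": {
                "type": "string",
                "description": "Reuse the same key on retries so the cancel applies at most once.",
            },
        },
        "required": ["order_id", "idempotency_key"],
    },
}
```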

Planning Happens Inside the Loop

One common misconception: agents plan once, then execute the plan unchanged. As if planning is a separate phase that produces a sequence of steps the agent then performs.

That's not how production agents work. Planning happens inside the loop, on each turn, as part of the decide step.

The ReAct pattern (Reasoning → Action → Observation) makes this concrete. Each turn, the model takes stock of where things stand, chooses a next action, watches the result come back, and takes stock again with that new information. The reasoning isn't a single up-front plan; it's a renewed decision each turn.

This matters because plans go stale as soon as the world answers back. At turn one, the reasonable plan might be: cancel the order, then refund the customer. But turn two changes the situation: the tool says the order already shipped. Now the original plan is not just incomplete — it is unsafe. If the agent treats the first plan as fixed, it keeps moving toward the wrong action. If planning happens inside the loop, each new observation can invalidate, narrow, or replace the plan before the next action runs.
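A minimal sketch of the idea, assuming a hypothetical rule-based `plan_next` in place of the model: the plan is recomputed from the latest facts every turn instead of being stored once and executed. The status values and the `issue_refund` action are illustrative.

```python
from typing import Dict, List


def plan_next(request: str, facts: Dict[str, str]) -> List[str]:
    """Re-derive the plan from the latest facts; nothing from an earlier plan is reused."""
    if request != "cancel_and_refund":
        return ["escalate"]                          # outside what this sketch handles
    status = facts.get("order_status")
    if status is None:
        return ["get_order_status"]                  # observe before any destructive action
    if status in ("pending", "processing"):
        return ["cancel_order", "issue_refund"]      # the turn-one plan still holds
    if status == "shipped":
        return ["offer_return_or_escalation"]        # the new observation replaces the plan
    return ["escalate"]                              # unknown status: hand off
```

Calling it with `{"order_status": "shipped"}` returns `["offer_return_or_escalation"]`; the cancel-then-refund plan simply stops being produced once the world answers back.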

Planning inside the loop also creates a debugging problem: when the agent changes direction, what tells you why? That is where visible reasoning helps. The point is not to show private reasoning to the user. The point is to record a safe, inspectable trace of the decision: what state the model saw, what action it chose, and why that action looked valid at that moment. Without that, the agent may still work, but the team cannot explain or debug its behavior. (Part 7 covers traces as their own discipline.)

Brief contrast: a workflow plans up front (the developer wrote the steps). An agent re-plans inside the loop (the model picks the step). The same task can be done by either; the choice depends on whether the steps need to adapt to what comes back. That choice — agent or workflow — is Part 5's question.

State Carries Across Turns

State isn't just "what the agent knows." State is what carries across turns: the working facts the next turn needs, and the task's lifecycle condition. The lifecycle part is a set of recognizable conditions the agent transitions between, and each condition changes what the agent is allowed to do next.

The TechNova cancellation case can be modeled as a small state flow.

The common path moves through open → needs-info → needs-approval. From there, the case can branch into acting → complete, into blocked, or into escalated.

These aren't decorative labels. Each state changes what actions are allowed. From needs-approval, the agent cannot call cancel_order without first receiving customer confirmation. From complete, the agent should not be making more tool calls. From escalated, the agent's job is to summarize and stop, not to keep working.
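One way to make "each state changes what actions are allowed" enforceable is a guard the controller checks before any tool runs. A minimal sketch, assuming a hypothetical per-state allow-list; `cancel_order` and `get_order_status` come from the case above, while `ask_user`, `start_return`, and `summarize` are illustrative names.

```python
from enum import Enum


class CaseState(Enum):
    OPEN = "open"
    NEEDS_INFO = "needs-info"
    NEEDS_APPROVAL = "needs-approval"
    ACTING = "acting"
    COMPLETE = "complete"
    BLOCKED = "blocked"
    ESCALATED = "escalated"


# Hypothetical policy: which actions each state permits.
ALLOWED = {
    CaseState.OPEN: {"get_order_status", "ask_user"},
    CaseState.NEEDS_INFO: {"get_order_status", "ask_user"},
    CaseState.NEEDS_APPROVAL: {"ask_user"},              # no cancel_order until the customer confirms
    CaseState.ACTING: {"cancel_order", "start_return", "get_order_status"},
    CaseState.COMPLETE: set(),                           # no more tool calls
    CaseState.BLOCKED: {"summarize"},                    # summarize what's blocking, hand off
    CaseState.ESCALATED: {"summarize"},                  # summarize and stop
}


def guard(state: CaseState, action: str) -> None:
    """Refuse any action the current state does not permit."""
    if action not in ALLOWED[state]:
        raise PermissionError(f"{action!r} is not allowed in state {state.value!r}")
```

The guard sits outside the model, so a confident but wrong decision still cannot reach `cancel_order` from needs-approval.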

The cancellation case walks through this:

Turn 1 — state is open. Priya asks to cancel.
Turn 2 — state moves to needs-info. Agent fetches order status.
Turn 3 — order is shipped. State moves to needs-approval for the alternative (return or escalation). Agent presents options.
Turn 4 — Priya hasn't replied. State is paused, still needs-approval. The rule: paused tasks waiting on customer choice should not be silently re-decided.

Production agents handle this by modeling state explicitly — a state object passed turn-to-turn, a status field in a database, a structured tag in the system context — not by hoping the model keeps track of it in the prompt. The form varies; the discipline doesn't: state changes are first-class events the system records and can react to, not implicit transitions in natural language.
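A minimal sketch of state as a first-class, recorded object, assuming a hypothetical transition table for the TechNova case. Illegal transitions fail loudly, and every legal one is appended to an event list the next turn (and any trace) can read. In a real system the record would more likely be a database row plus an event log.

```python
from dataclasses import dataclass, field

# Hypothetical transition table built from the state flow described above.
TRANSITIONS = {
    "open": {"needs-info", "escalated"},
    "needs-info": {"needs-approval", "acting", "blocked", "escalated"},
    "needs-approval": {"acting", "blocked", "escalated"},
    "acting": {"complete", "blocked", "escalated"},
    "blocked": {"needs-approval", "acting", "escalated"},
    "complete": set(),
    "escalated": set(),
}


@dataclass
class CaseRecord:
    case_id: str
    status: str = "open"
    events: list = field(default_factory=list)   # every transition is recorded

    def transition(self, new_status: str, reason: str) -> None:
        """Apply a state change only if the table allows it, and record it."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.events.append({"from": self.status, "to": new_status, "reason": reason})
        self.status = new_status
```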

We will get into implementation patterns later. For now, the key discipline is simple: state changes should be explicit, recorded, and available to the next turn.

When Does the Loop Stop?

Stopping is a decision, not an emergent property.

Part 1 said: "the demo stops when the engineer stops it; production agents have to stop themselves." That sentence hides four distinct stopping conditions production agents actually need:

Final answer. The agent has done what was asked and produced the user-facing result. Stop and return. This is the cleanest stop, and the easiest to get wrong — the agent thinks the task is done when the side effects didn't actually complete.
Maximum iterations. A bounded loop count. If the agent hasn't reached a final answer in N turns, stop and report what it tried. This protects against infinite loops that compound cost and damage. The bound is a real engineering choice — too low and useful work gets cut off; too high and runaway loops eat money before anyone notices.
Blocked. The agent cannot proceed without a piece of information or a permission it doesn't have. Stop, summarize what's blocking, hand off to whatever can unblock it (the user, a human agent, a different system).
Escalated. The agent recognizes the case is outside its authority. Not a failure — a designed handoff. Stop the agent loop, route to a human or a more-authorized system, and let that system pick up the case.
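The four conditions can be checked explicitly at the end of every turn. A minimal sketch with hypothetical inputs; note that a final answer only counts once the side effects have been verified, which guards against the "thinks it's done" failure. Escalation is checked first because it applies even when nothing is missing.

```python
from enum import Enum
from typing import Optional


class StopReason(Enum):
    FINAL_ANSWER = "final_answer"
    MAX_ITERATIONS = "max_iterations"
    BLOCKED = "blocked"
    ESCALATED = "escalated"


def should_stop(turn: int, max_turns: int, task_done: bool, side_effects_verified: bool,
                missing: list, outside_authority: bool) -> Optional[StopReason]:
    """Return why the loop should stop now, or None to keep going."""
    if outside_authority:
        return StopReason.ESCALATED          # "I should not continue."
    if missing:
        return StopReason.BLOCKED            # "What do I need before I can continue?"
    if task_done and side_effects_verified:
        return StopReason.FINAL_ANSWER       # done only when the world actually changed
    if turn >= max_turns:
        return StopReason.MAX_ITERATIONS     # bounded loop: stop and report
    return None
```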

Blocked and escalated are related, but they are not the same. Blocked means the agent is missing something required to continue: information, permission, or a system result. Escalated means the agent has enough information to know the case is outside its authority. Blocked asks, "What do I need before I can continue?" Escalated says, "I should not continue."

In Priya's case, the loop does not end just because the original request can't be carried out as asked. It changes shape. If Priya chooses a return, the agent may move into an acting state and complete the return flow. If she chooses a human agent, the agent stops by escalation. If she never replies, the task ends up blocked on customer input. Same conversation, different valid stopping points depending on what happens next.

Two production failure modes around stopping, both worth naming:

The agent stops when it shouldn't — it says "Done!" but the side effects didn't complete, or completed wrongly. This is Part 1's confident-and-wrong failure mode at the stopping boundary.
The agent doesn't stop when it should: it keeps retrying, keeps re-planning, keeps looping. Every turn costs tokens and time, and when the repeated action is destructive and non-idempotent, each retry multiplies the real damage.
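One common guard against the second failure mode is to retry only errors known to be transient, bound the attempts, and reuse one idempotency key so a repeated destructive call is applied at most once. A minimal sketch; the `PermanentError` and `TransientError` classes are hypothetical, and the at-most-once guarantee assumes the downstream API honors such a key.

```python
import uuid


class PermanentError(Exception):
    """Retrying cannot help, e.g. the order has already shipped."""


class TransientError(Exception):
    """Retrying may help, e.g. a timeout."""


def call_with_retry(tool, args: dict, max_attempts: int = 3):
    """Retry transient failures only, a bounded number of times, with one idempotency key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Same key on every attempt, so a server that honors it applies the action at most once.
    call_args = {**args, "idempotency_key": args.get("idempotency_key") or str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return tool(**call_args)
        except PermanentError:
            raise                    # record the conclusion upstream; never retry
        except TransientError as exc:
            last_error = exc         # safe to try again with the same key
    raise last_error
```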

We will come back to detection and enforcement later. Here, the key point is simpler: production agents need explicit stopping conditions, not just a hope that the loop ends cleanly.

Context Is a Real Engineering Resource

The model's context window is finite. That sentence sounds obvious, but most demos hide its consequences.

In a demo, the context fits. The conversation is short, the tool outputs are small, the retrieval is precise. The model has all the room it needs to reason.

In production, by turn four, the context is full of:

System prompt and tool descriptions — the stable preamble that has to be present every turn.
Conversation history — every user turn, every agent turn.
Tool outputs — order status, retrieval results, error messages, partial successes.
Retrieved policy text and any skill files loaded for the current task.
Reasoning notes, plans, and attempts — including half-completed work and course corrections from earlier turns.

By turn ten, all of that has compounded. The model still has the same finite attention budget. The signal-to-noise ratio has degraded. Important state from turn two may be buried under tool outputs from turn seven.

Bigger context windows do not fix this; they delay it. A 1M-token window holding 1M tokens of mostly stale content can make worse decisions than a 50K window holding 50K tokens of curated working state. The size of the window isn't the variable; the quality of what's in the window is.

Two things start happening as the context fills with noise:

Context drift. The model's decisions start drifting because the active context is polluted with stale state. A plan from turn two may still look fresh to the model on turn nine, even though turn three already invalidated it. (Compounding effect: tokens buried mid-window can get less attention than tokens near the edges — critical state in the middle can be effectively invisible.)
Cost compounding. Every turn pays the token cost of the entire context. Every extra token of stale context is something you pay for again on every turn. Prompt caching discounts what's unchanged, but the accumulated stale context is still carried and paid for, and caching does nothing for the attention problem above.
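The compounding is easy to underestimate. If every turn resends the whole context, turn i carries the fixed preamble plus i turns of accumulated history, so total input tokens grow roughly with the square of the turn count. A small calculator; the numbers used with it below are illustrative, not measurements, and it ignores caching discounts.

```python
def total_input_tokens(preamble: int, growth_per_turn: int, turns: int) -> int:
    """Input tokens sent over a conversation when each turn resends the full context."""
    # Turn i carries the fixed preamble plus i turns' worth of accumulated history.
    return sum(preamble + i * growth_per_turn for i in range(1, turns + 1))


def closed_form(preamble: int, growth_per_turn: int, turns: int) -> int:
    """Same total: turns * preamble + growth * turns * (turns + 1) / 2."""
    return turns * preamble + growth_per_turn * turns * (turns + 1) // 2
```

With a 4,000-token preamble and 2,000 tokens of growth per turn, 10 turns send 150,000 input tokens and 20 turns send 500,000: twice the turns, more than three times the tokens. Shrinking what each turn leaves behind in the active context attacks the quadratic term directly.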

So context is a resource. It has a budget. It needs management. That's not premature optimization; it's the everyday engineering reality of multi-turn production agents.

Context Cleanup Is a State Pipeline

Context cleanup is what keeps multi-turn agents from drowning in their own output.

The instinct, when context fills up, is to summarize. That instinct is incomplete. Generic summarization compresses everything indiscriminately, which loses the distinction between active working state (still needed) and exhaust (no longer needed). After summarization, the agent has a smaller context — but the smaller context still contains the same proportions of signal and noise.

The better discipline:

Context cleanup is a state pipeline, not generic summarization.

The pipeline: raw output → parse → extract useful facts → update current state → archive raw output → drop junk from active context.

The discipline applies turn-by-turn, not only at compaction time.

This is the central context-management move for production agents.

Walk through the pipeline on a tool output:

Raw output. Tool returns 500 lines of test logs.
Parse. The system identifies the structure — pass/fail counts, error messages, stack traces.
Extract useful facts. Only the failing tests and their error reasons are needed for the next decision.
Update current state. The agent's working state now includes "tests X and Y failed with reason Z."
Archive raw output. The full 500 lines go to a log store the agent can retrieve from if needed later.
Drop junk from active context. The 500 lines do not stay in the active context window.
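A minimal sketch of the six steps on test output, assuming a hypothetical log format with lines like `FAILED <test> - <reason>` and `PASSED <test>`, and using an in-memory dict as a stand-in for the archive store. Only the short note the function returns goes back into the active context.

```python
import hashlib
import re

FAILED = re.compile(r"^FAILED (\S+) - (.+)$")    # assumed log line shape
PASSED = re.compile(r"^PASSED (\S+)$")


def run_pipeline(raw: str, state: dict, archive: dict) -> str:
    """raw output -> parse -> extract -> update state -> archive raw -> return a short note."""
    lines = raw.splitlines()                                            # parse
    failures = [m.groups() for m in map(FAILED.match, lines) if m]     # extract useful facts
    passed = sum(1 for line in lines if PASSED.match(line))
    ref = hashlib.sha256(raw.encode()).hexdigest()[:12]
    archive[ref] = raw                                                  # archive raw output
    state["tests"] = {"passed": passed,                                 # update current state
                      "failed": dict(failures),
                      "raw_ref": ref}
    # Drop junk: only this one line re-enters the active context.
    summary = ", ".join(f"{name} ({reason})" for name, reason in failures)
    return f"{passed} passed, {len(failures)} failed: {summary} [log {ref}]"
```

The `raw_ref` kept in state is what makes the archive retrievable later without carrying the 500 lines forward.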

Same pipeline applies to:

Tool outputs — extract the useful structured facts; archive the rest.
Old plans — when a new observation invalidates a plan, archive the old plan; do not keep both active.
Stale attempts — when a tool call fails permanently (shipped order can't be cancelled), record the conclusion (do not retry cancel_order; order is shipped); drop the full retry chain.
Duplicate state — the same fact expressed three different ways in different turns becomes one canonical state field.
Reasoning notes — the conclusion stays; the deliberation that produced it can be archived.
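The same fold can be written once and reused for every kind of output. A minimal sketch, assuming a hypothetical update shape with `facts`, `plan`, `failed_attempts`, and `conclusion` keys: facts overwrite one canonical field, a new plan archives the old one, and a failed retry chain collapses to its conclusion. Reasoning notes would follow the same pattern as attempts.

```python
def apply_update(state: dict, archive: list, update: dict) -> None:
    """Fold one turn's output into canonical state; move superseded material to the archive."""
    for key, value in update.get("facts", {}).items():
        state.setdefault("facts", {})[key] = value               # one canonical field per fact
    if "plan" in update:
        if "plan" in state:
            archive.append({"superseded_plan": state["plan"]})   # old plan archived, not kept active
        state["plan"] = update["plan"]
    if "failed_attempts" in update:
        archive.append({"attempts": update["failed_attempts"]})  # full retry chain archived
        state.setdefault("conclusions", []).append(
            update.get("conclusion", "see archive"))             # only the conclusion stays active
```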

In the TechNova cancellation case, the active state should keep facts like order shipped and waiting on Priya's choice. The full tool response, the earlier cancel-then-refund plan, and any failed retry details belong in the archive, not in the active working context.

Generic summarization vs the state pipeline:

Generic summarization (what most teams try)	State pipeline (what works)
Compress everything once the context fills up	Process turn-by-turn, every turn
Loses the distinction between active state and exhaust	Active state preserved; exhaust archived
Smaller context, same signal-to-noise ratio	Smaller context, better signal-to-noise
Model reasoning still drifts on stale data	Model reasoning grounded in current state

The agent's active context after cleanup is small, curated, and accurate. The archive is searchable if something becomes relevant again.

This is a turn-by-turn discipline. Most agents don't get this right by accident. It has to be built into the loop's check step: every turn, the system asks what new state this turn produced and what exhaust can be archived.

We will get into storage, retrieval, and implementation patterns later. For now, the core idea is that cleanup belongs inside the loop, not as an occasional afterthought.

Tracing the Loop, Turn by Turn

Everything in this article is invisible without traces.

A trace records, for each turn: what the agent observed (the working state at the start of the turn), what it decided (the reasoning and the chosen action), what it did (the tool call and arguments), what came back (the tool output), and how the state changed (state transition).

That structure isn't optional. It's how you debug production agents. When Priya's shipped order gets refunded anyway in production, the most useful artifact is the trace of that conversation's loop. Did the agent observe the shipping status? What did it decide based on what it saw? Did the tool description tell it shipped orders can't be cancelled? Did the state transition correctly?

At minimum, the trace should show three layers:

Tool call traces — what the agent called, with what arguments, and what came back.
Decision traces — what the model was reasoning about on each turn.
State transitions — what state the agent was in, before and after each act.
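A minimal sketch of one trace record covering the three layers, assuming JSON lines as the storage format; the field names are illustrative. The `decision` field holds a short, safe rationale rather than raw model reasoning.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TurnTrace:
    turn: int
    observed: dict                       # working state at the start of the turn
    decision: str                        # decision layer: short rationale for the chosen action
    tool: Optional[str] = None           # tool call layer: what was called...
    args: dict = field(default_factory=dict)
    result: Optional[dict] = None        # ...and what came back
    state_before: str = ""               # state transition layer
    state_after: str = ""
    ts: float = field(default_factory=time.time)


def emit(trace: TurnTrace) -> str:
    """Serialize one turn as a JSON line; where it is stored is an implementation choice."""
    return json.dumps(asdict(trace), sort_keys=True)
```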

Part 7 covers traces and evaluations as their own discipline. Part 3's job is just to say: the loop has to be inspectable, every turn, or none of the discipline in this article is verifiable.

A control loop you can't inspect is a control loop you can't trust.

Three takeaways

The loop is the easy part. The patterns wrapped around the loop are what determine production behavior. Observe → decide → act → check → repeat is a shape. What turns the shape into a working system is state discipline, stopping discipline, context discipline, and trace discipline.
Context cleanup is not generic summarization. It is a state pipeline. Raw output → parse → extract → update state → archive → drop. Turn by turn. The discipline that keeps multi-turn agents from drowning in their own output.
A control loop you can't inspect is a control loop you can't trust. Traces aren't a debugging convenience. They're how a team reconstructs what the agent saw, what it chose, and what changed.

Looking ahead

We now have the loop, the state, the stopping condition, the context discipline, and the trace. What we do not have yet is the catalogue of shapes production agents use to arrange these mechanics — prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer — and the control surfaces (tool access, memory, approval, escalation, termination, and more) that decide whether each shape is safe to ship. That is Part 4.

Part of AI in Practice — three practical series on MCP, RAG, and AI Agents, focused on why these patterns exist, where they break, and how to think through the engineering decisions behind them.

This series uses "control loop" as the primary term throughout. Some sources call the same mechanism an "action-feedback loop." Both phrases describe the same thing; using one term consistently helps the reader build a single mental model across the series.

Top comments (4)

Harjot Singh • May 31

The five-word loop is clean, and the word doing the heavy lifting is check. Most people draw the loop as observe-decide-act-repeat and quietly drop the check, which is exactly why their demo dies in production: an agent that acts and immediately loops without verifying the act landed is how you get the seven-empty-docs and the confidently-wrong-refund. The check step is where reliability lives. A cancellation-plus-refund case is the perfect example because the stakes are asymmetric and irreversible: act without checking that the cancel actually succeeded before you fire the refund, and you've either double-refunded or refunded an order that's still active. The discipline that makes the loop safe is that check has to verify the world changed (query the order status), not trust the tool's return or the model's belief that it worked. And on irreversible steps, check-before-next-act becomes confirm-before-act. That observe-decide-act-CHECK-repeat with a real verification gate is the core of how I build agent loops in Moonshift. In your cancellation flow, does the check re-read ground-truth state, or does it trust the tool response that the cancel went through?

Gursharan Singh • May 31

Harjot — really appreciate this. You're right that check is doing more work than Part 3 made visible, and your framing is sharper: tool responses describe the request, not necessarily the world.

For irreversible actions like cancel-then-refund, the safe pattern is a ground-truth re-read of get_order_status after the cancel, not trust in the tool's return. A 200 OK might mean the request was accepted, queued, retried, or partially processed — not that the business state is already safe for the next action. Your "confirm-before-act" reframing captures that better than the article did.

This is exactly the kind of distinction I need to make concrete when I get into the build/architecture part of the series. Curious about Moonshift — do you treat the verification gate as a separate step in the loop, or fold it into check itself?

Harjot Singh • May 31

Separate step, deliberately. In Moonshift the loop is propose-act-verify-commit, and verify is its own phase with its own typed pass/fail, not folded into the agent's check, because the moment verification lives inside the same model call that produced the action, it inherits the same blind spots and you get the model grading its own homework. Keeping it separate means it can be a different mechanism entirely: a schema/postcondition check, a re-read of ground truth (exactly your get_order_status-after-cancel pattern), or a different model, and only a real pass lets the loop commit and move on. The confirm-before-act you described is precisely the gate: the action proposes, the verify step confirms the world actually changed the way the tool claimed, and the irreversible ones do not commit without it. Fold it into check and you lose the independence that makes it trustworthy. Looking forward to the build part of your series; this is the piece I'd most want to see you make concrete.

Gursharan Singh • Jun 1

Got it — separate step, deliberately. “Model grading its own homework” is the cleanest argument I’ve seen for why verification needs independence.

I’m going to keep the five-word loop as-is for the series, but your point changes how I’ll explain the build version: check is still the loop step, and for irreversible actions, check needs a real verify-before-commit gate.

Thanks Harjot — this exchange was genuinely useful.