Agentic Loops

Originally published at agenticloopsai.substack.com
How Agents Work: The Patterns Behind the Magic

You open Claude Code and describe a task: “Migrate all tests from Jest to Vitest.”

The agent reads 47 test files. Rewrites them. Runs the test suite. Gets 12 failures. Fixes them one by one. Updates package.json. Removes old dependencies. Runs tests again. All pass. Commits the changes.

You did nothing. The agent just... figured it out.

Or you’re using GitHub Copilot in agent mode. You paste an error message. It searches your codebase, finds the relevant files, identifies the bug, writes a fix, runs your tests, and opens a PR.

This feels like magic.

But it’s not magic. It’s a pattern. A surprisingly simple one.

The Secret: It’s Just a Loop

Here’s the entire pattern:

def agent_loop(prompt):
    context = [{"role": "user", "content": prompt}]

    while True:
        response = call_llm(context)
        # Keep the assistant's turn (including its tool calls) in history
        context.append(response.as_message())

        if response.has_tool_calls:
            for tool_call in response.tool_calls:
                result = execute_tool(tool_call.name, tool_call.args)
                context.append({"role": "tool", "content": result})
        else:
            return response.text

When you call an LLM, you provide context (instructions, your prompt, and history). The LLM then responds in one of two ways:

  • “I know the answer, here you go.”
  • “I need more information. I see you have a tool—can you run it and let me know the result?”

This simple pattern lets the LLM run autonomously in a loop until it has solved the task.

Tools aren’t magic either. They’re just functions you expose:

# Illustrative only: real tools need error handling and sandboxing
tools = {
    'read_file': lambda path: open(path).read(),
    'write_file': lambda path, content: open(path, 'w').write(content),
    'run_command': lambda cmd: subprocess.run(cmd, shell=True, capture_output=True)
}

Give an LLM a few tools like read_file, write_file or run_command and watch it become a developer.

Here’s what happens on each iteration:

  1. Build context — Combine the system prompt, conversation history, and tool results into a single payload
  2. Call the LLM — Send everything to the model and wait for a response
  3. Check for tool calls — The model either returns final text (done) or requests tool execution (continue)
  4. Execute and update context — Run the requested tools, add their outputs to context, loop back to step 2

The model doesn’t “remember” anything between API calls. Every call includes the full conversation. That’s why agents can reason across multiple steps—they see the entire history each time.
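Concretely, here is the shape of the context after one tool round-trip (an illustrative sketch; the exact role names and fields vary by provider):

```python
# Illustrative shape only; exact role names and fields vary by provider.
context = [
    {"role": "user", "content": "Migrate all tests from Jest to Vitest"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"name": "read_file", "args": {"path": "package.json"}}]},
    {"role": "tool", "content": '{"devDependencies": {"jest": "^29.0.0"}}'},
    # ...the next call_llm(context) receives all of the above, every time
]

# The model is stateless: its "memory" is just this growing list.
assert [m["role"] for m in context] == ["user", "assistant", "tool"]
```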

That’s the core execution loop most coding agents (and not only coding agents) build on. Of course, production agents are more complex than this: there’s context management, rate limiting, cost control, tool sandboxing, and more. We’ll cover that later. Here we’re focusing on the core pattern.

The Prompt Is the Personality

The loop is the skeleton. The prompt encodes behavior.

Every production agent ships with a carefully tuned system prompt that shapes *how* the model reasons. This isn’t “you are a helpful assistant”—it’s operational guidance:

You are a coding agent. Before writing code, read the existing files.
When tests fail, fix one at a time. Never delete files without confirmation.
If stuck after 3 attempts, explain what's blocking you and ask for help.
Prefer simple solutions. Avoid over-engineering.

The prompt encodes:

  • Strategy — When to plan vs. act immediately
  • Guardrails — What actions require confirmation
  • Recovery behavior — How to handle repeated failures
  • Style — Terse or verbose, cautious or aggressive

Two agents with identical tools and the same model behave completely differently based on their prompts. GitHub Copilot’s agent uses a carefully crafted system prompt optimized for code assistance. Claude Code takes a slightly different approach, combining a base system prompt with mode-specific user prompts. Both work—for their use cases.

Ever wonder how popular coding agents like Claude Code or Codex work? We are collecting system prompts and internal configurations from popular AI agents in the agenticloops-ai/agentic-apps-internals repo — study them to see how the pros do it.

When debugging agent behavior, check the prompt first. The loop is usually fine. The instructions are usually the problem.

The Building Blocks

These aren’t historical stages—they’re tools in your toolkit. Pick the right one for your task.

Full code for these patterns is available on GitHub — fork it, break it, build on it: agenticloops-ai/agentic-ai-engineering

Level 1: One-Shot

You ask, it answers:

response = call_llm("Write a function to calculate fibonacci")
print(response)

The model writes code it’s never run. That’s a gamble. No feedback, no iteration, no way to know if it works.

Level 2: Single Tool Call

The model can reach outside itself—once:

def llm_with_tools(prompt):
    tools = {
        'search': lambda q: search_web(q),
        'read_file': lambda path: open(path).read(),
        'calculator': lambda expr: safe_calc(expr)
    }

    response = call_llm(prompt, available_tools=tools)

    if response.wants_to_use_tool:
        result = tools[response.tool_name](response.tool_input)
        return result

    return response.text  # no tool needed: answer directly

Note: This is conceptual pseudocode. Real implementations need schema validation, error handling, and sandboxing.

Now the model can search for information, read files, calculate things. But it only gets one shot. If the search fails or the code has a bug, it’s stuck.

Level 3: The ReAct Loop (Reason + Act)

The breakthrough came in 2022 with the ReAct paper from Princeton and Google. The insight: let the model use tools in a loop.

def react_loop(goal):
    context = [{"role": "user", "content": goal}]

    while True:
        response = call_llm(context)

        print(f"Thought: {response.reasoning}")

        if response.is_final_answer:
            return response.text

        if response.wants_tool:
            result = execute_tool(response.tool_name, response.tool_input)
            print(f"Observation: {result}")

            context.append(response.as_message())
            context.append({"role": "user", "content": f"Observation: {result}"})

This unlocks genuine problem-solving.

Example: You ask “What’s the weather in the city where the Eiffel Tower is located?”

Thought: I need to know which city has the Eiffel Tower
Action: search("Eiffel Tower location")
Observation: The Eiffel Tower is in Paris, France

Thought: Now I can search for Paris weather
Action: search("Paris weather")
Observation: Paris: 18°C, partly cloudy

Answer: It's 18°C and partly cloudy in Paris.

The model chains actions together. Each observation informs the next thought.

For code generation, this is transformative:

Thought: I'll write the fibonacci function
Action: write_file("fib.py", "def fib(n):\n    return n if n < 2 else fib(n-1) + fib(n-2)")

Thought: Let me test it
Action: run_command("python -c 'from fib import fib; print(fib(10))'")
Observation: 55

Thought: It works. Let me add edge cases
Action: write_file("test_fib.py", "...")
Action: run_command("pytest test_fib.py")
Observation: FAILED - RecursionError for fib(1000)

Thought: I need to optimize this
Action: write_file("fib.py", "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a")
Action: run_command("pytest test_fib.py")
Observation: All tests passed

Answer: Done. Fibonacci function with tests.

The key: let the model see what happened and decide what to try next.

Here’s the execution flow:

*(Diagram: the agentic loop)*

The loop continues until verification passes or max iterations reached.
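That exit condition is worth making explicit. Here's a sketch of a bounded ReAct loop, where call_llm and execute_tool are passed in as the same kind of placeholders used above, and verify is a real check (tests, a linter) rather than the model's own claim:

```python
def bounded_react_loop(goal, call_llm, execute_tool, verify, max_iterations=20):
    """ReAct with an explicit exit budget: stop when verification
    passes or the iteration cap is hit."""
    context = [{"role": "user", "content": goal}]

    for _ in range(max_iterations):
        response = call_llm(context)

        if response.is_final_answer:
            if verify():  # trust real checks, not the model's claim
                return response.text
            context.append({"role": "user",
                            "content": "Verification failed, keep going."})
            continue

        # Tool requested: execute it and feed the observation back in
        result = execute_tool(response.tool_name, response.tool_input)
        context.append(response.as_message())
        context.append({"role": "user", "content": f"Observation: {result}"})

    raise RuntimeError("Max iterations reached without passing verification")
```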

Level 4: Planning Patterns

ReAct is reactive—it figures things out step by step. But for complex tasks, you want planning.

The Planning Pattern breaks work into steps first:

def planning_agent(task):
    # Step 1: Create a plan
    plan_prompt = f"Break this into steps: {task}"
    plan = call_llm(plan_prompt)

    # Step 2: Execute each step
    results = []
    for step in plan.steps:
        result = react_loop(step)  # Use ReAct for each step
        results.append(result)

    # Step 3: Synthesize
    return call_llm(f"Combine these results: {results}")

Example: “Build a REST API with authentication”

The agent plans:

  1. Design database schema
  2. Create user model
  3. Implement auth endpoints
  4. Add JWT token handling
  5. Write tests
  6. Deploy

Then executes each step using ReAct loops. Each step can iterate, use tools, recover from errors.

When planning helps:

  • Multi-file changes that need coordination
  • Architecture decisions before coding
  • Tasks with clear dependencies (must do A before B)
  • Large refactors where you need the full picture first

When ReAct is better:

  • Single-file edits or small changes
  • Bug fixes where you explore the problem
  • Responding to errors as they appear
  • Tasks where requirements emerge during work

Key difference: Planning is top-down (design then execute). ReAct is exploratory (act, observe, adjust).

Real agents often combine both:

  1. Use Planning to break down the architecture
  2. Use ReAct within each step to handle details and errors
  3. Planning gives structure, ReAct gives adaptability

Pattern Comparison: Real Example

Let’s say you ask: “Add rate limiting to our API”

One-shot LLM:

Here's rate limiting middleware:
[generates code]

Maybe it works, maybe not. No way to know.

Tool use (single call):

[generates code]
[runs tests once]
Error: Module 'redis' not found

Stuck. Can’t fix the error.

ReAct loop:

Thought: I'll add rate limiting with Redis
Action: write_file("middleware/rate_limit.py", "...")
Action: run_command("python middleware/rate_limit.py")
Observation: Error: Module 'redis' not found

Thought: Need to install redis
Action: run_command("pip install redis")
Observation: Successfully installed

Action: run_command("python middleware/rate_limit.py")
Observation: Error: REDIS_URL not set

Thought: Need to add config
Action: write_file(".env.example", "REDIS_URL=redis://localhost")
Action: run_command("pytest")
Observation: All tests pass

Answer: Rate limiting added and working

Planning pattern:

Plan:
1. Choose rate limiting library
2. Set up Redis connection
3. Implement middleware
4. Add tests
5. Update configuration docs

[Executes each step with ReAct]

The Ralph Mode: Wrapping ReAct in an Outer Loop

Ralph (original concept by Geoffrey Huntley) extends the agentic loop pattern by adding an outer loop. Instead of one agent session, run the agent repeatedly until the entire project is done.

The Core Pattern

def ralph_loop(task_prompt, max_iterations=100):
    for iteration in range(max_iterations):
        # Run full ReAct agent session
        react_agent(prompt=task_prompt)

        # Check if done
        if verify_complete():
            return

        # Agent context resets, but file system persists:
        # - Git history shows all previous attempts
        # - Modified files reflect cumulative changes
        # - progress.txt tracks what was tried
        # - AGENTS.md accumulates learned patterns

The key insight: Agent context resets each iteration (no token limit issues), but state persists through files.

“Better to fail predictably than succeed unpredictably.” — Geoffrey Huntley

Ralph accepts that agents will make mistakes. The question isn’t how to prevent errors—it’s how to make them visible and recoverable. Each iteration adds information. The loop converges toward success.

How It Works in Practice

Setup (following Ryan Carson’s approach):

  1. Write PRD with feature requirements
  2. Convert to atomic user stories (each fits in one context window)
  3. Create completion criteria
  4. Run the loop

Iteration example:

Iteration 1:
- Implements user story 1
- Breaks tests
- Commits with "WIP: story 1 attempt"

Iteration 2:
- Reads progress.txt: "Iteration 1 broke auth tests"
- Reads git log: sees what changed
- Fixes the tests
- Updates progress.txt
- Moves to story 2

Memory between iterations:

  • progress.txt — iteration-to-iteration notes
  • AGENTS.md — permanent patterns and conventions
  • Git history — what was tried and why
  • Modified files — cumulative changes
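The file-based memory needs no framework; a couple of helpers suffice. A sketch (progress.txt is the convention named above; the helper names are illustrative):

```python
from pathlib import Path

PROGRESS = Path("progress.txt")

def read_progress():
    # What previous iterations learned; empty on the very first run.
    return PROGRESS.read_text() if PROGRESS.exists() else ""

def log_progress(iteration, note):
    # Append-only: each iteration leaves a breadcrumb for the next one.
    with PROGRESS.open("a") as f:
        f.write(f"Iteration {iteration}: {note}\n")
```

Inject read_progress() into the task prompt at the start of each iteration, and call log_progress() before the session ends.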

When Ralph Works

Best for:

  • Large refactors (100+ files)
  • Feature implementation with clear requirements
  • Pattern migrations across codebase
  • Test coverage for existing code

Requires:

  • Clear success criteria (tests pass, linter clean)
  • Atomic tasks (each story fits in one context)
  • Good verification (actual checks, not LLM claims)
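“Good verification” can be as simple as shelling out to your existing gates. A sketch (the default commands assume pytest and ruff are your gates; substitute your own):

```python
import subprocess
import sys

def verify_complete(checks=None):
    """Real checks, not LLM claims: 'done' means every command exits 0.
    The defaults assume pytest and ruff; swap in your own gates."""
    checks = checks or [
        [sys.executable, "-m", "pytest", "-q"],   # tests pass
        [sys.executable, "-m", "ruff", "check"],  # linter clean
    ]
    return all(
        subprocess.run(cmd, capture_output=True).returncode == 0
        for cmd in checks
    )
```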

Doesn’t work for:

  • Vague requirements (“make it better”)
  • Architecture decisions
  • Creative/subjective work

Next Frontier: Agent Orchestration

Ralph runs one agent in a loop. The next step is running 20-30 agents in parallel — coordinated swarms across a codebase. Projects like Loom, Claude Flow, and Gas Town are pushing this boundary. Early days, high costs, wild failure modes — but the direction is clear.

We’ll cover multi-agent orchestration patterns in a dedicated post.


Everything Else Is Engineering

Once you understand the core patterns (ReAct, Planning, Ralph), everything else is software engineering. The loop is simple. Making it production-ready is where the real work is.

Production concerns:

  1. Context window management — Summarization, sliding windows, sub-agents
  2. Tool design — Task-specific tool sets, schema validation
  3. Cost control — Budget tracking, early exit, prompt caching
  4. Rate limiting — API quotas, exponential backoff
  5. Error handling — Retries, circuit breakers, graceful degradation
  6. Observability — Logging, tracing, replay for debugging
  7. Safety & sandboxing — Permission controls, execution limits
  8. Verification — Tests, linters, “definition of done” gates, evals
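As one example, rate limiting and error handling (#4 and #5) usually start as an exponential-backoff wrapper around the LLM call. A sketch; the retryable exception types would normally be your provider's rate-limit and overloaded errors:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retryable=(TimeoutError,)):
    """Retry a flaky call with exponential backoff plus jitter.
    'retryable' stands in for your provider's rate-limit exceptions."""
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of budget: surface the error
            # 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```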

Sound familiar? These are the same concerns you're already solving in distributed systems, microservices, and streaming pipelines. You're not learning a new discipline. You're applying good engineering to a new runtime.


From Theory to Practice

The best way to understand agents is to build one. You’ll learn more in a weekend than reading 100 blog posts.

Start with a minimal agent:

The full working code is on GitHub — clone it and experiment: minimal_agent.py

import subprocess
import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "bash",
    "description": "Run a bash command",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def agent(goal: str) -> str:
    messages = [{"role": "user", "content": goal}]

    for _ in range(10):
        response = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=4096, messages=messages, tools=TOOLS
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            return response.content[0].text

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # WARNING: Unsafe - use sandboxing in production
                result = subprocess.run(
                    block.input["command"], shell=True, capture_output=True, text=True, timeout=30
                )
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result.stdout or result.stderr,
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached"

# Try it
if __name__ == "__main__":
    print(agent("List Python files in current directory and count lines in each"))


That’s ~50 lines. Now you have a working agent.

Then level up:

  1. Add more tools (read_file, write_file)
  2. Implement cost tracking
  3. Add better error handling
  4. Build verification checks
  5. Try a small real task
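Step 1 is mostly boilerplate: define schemas in the same format as the bash tool above and dispatch by name. A sketch (execute_tool and the schemas are illustrative, and there is no sandboxing here):

```python
from pathlib import Path

# Tool schemas in the same shape as the bash tool above
EXTRA_TOOLS = [
    {
        "name": "read_file",
        "description": "Read a text file",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Write content to a text file",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
]

def execute_tool(name, args):
    # Dispatch by tool name; in the agent loop, this replaces the
    # inline subprocess.run call. No sandboxing: restrict paths
    # before pointing this at anything real.
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return f"Wrote {len(args['content'])} chars to {args['path']}"
    return f"Unknown tool: {name}"
```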

Pattern Selection Guide

Choose based on your task:

  • One-shot LLM - Quick questions, text generation, explanations, no tools needed
  • Tool use - Needs current data, simple calculations, one external call
  • ReAct loop - Multi-step problems, needs iteration, can fail and retry, most coding tasks
  • Planning pattern - Complex architecture, multiple files, clear stages, dependencies between steps
  • Ralph pattern - Large scale (100+ files), mechanical work, clear success criteria, can run for hours

The progression isn’t replacing patterns. It’s adding options. Start simple, add what you need.


Real Results

The future isn’t coming. It’s already shipping code.


Conclusion

The magic of Claude Code and GitHub Copilot isn’t the LLM. It’s the loop.

The pattern is simple: Reason → Act → Observe → Repeat

But this simplicity creates genuine problem-solving capability. We’ve moved from AI that generates text to AI that accomplishes tasks.

The patterns:

  • Agentic loop (ReAct): For iterative problem-solving
  • Planning: For complex multi-step tasks
  • Ralph: For autonomous large-scale work

None of this requires fancy frameworks. Just an LLM API, some tools, and a loop.

Build one this weekend. You’ll understand agents better than reading 100 blog posts.


Full code for these patterns is available at agenticloops-ai/agentic-ai-engineering on GitHub — fork it, break it, build on it.

We’re publishing agent engineering content every week. No hype. Just code and learned patterns.

Coming next week: Disassembling AI Agents Part 1: How GitHub Copilot Works

What patterns are you using in production? What’s breaking? What’s working? Share in the comments—we’re building this community together.
