Three Layers for Production-Grade Claude API Agents in Python

#claude #ai #python #agentskills

TL;DR

Most Claude API agent tutorials show the happy path. This one focuses on the three engineering layers that make agents actually reliable in production: (1) schema discipline in tool definitions, (2) a correct agentic loop that handles tool errors gracefully, and (3) a retry wrapper with exponential backoff and jitter. Ends with a structured output boundary using Pydantic and messages.parse().

All code is runnable. No placeholder functions.

The Problem With Most Agent Demos

Demo agents work in notebooks because notebooks run one cell at a time, tolerate manual retries, and have a human in the loop who can interpret a malformed response. Production agents do not have those affordances. They need to handle tool exceptions without crashing, survive API rate limits without user-visible errors, and produce output that downstream systems can parse reliably.

This guide walks through a complete customer order lookup pipeline that demonstrates all three layers. We use claude-sonnet-4-6 and the current anthropic Python SDK.

Setup

pip install anthropic pydantic
export ANTHROPIC_API_KEY="your-key-here"

import anthropic
import json
import time
import random
from typing import Any
from pydantic import BaseModel, Field

client = anthropic.Anthropic()

Layer 1: Tool Schema Design

Tool definitions are contracts. The model uses the description field to decide when to call a tool and uses the input_schema to construct arguments. Poor descriptions produce poor calls.

Three practices that eliminate the most common failure modes:

Negative constraints in the description: tell the model when NOT to use the tool.
Enums for finite value sets: prevents hallucinated parameter values entirely.
additionalProperties: false: prevents the model from inventing parameter names.

GET_ORDERS_TOOL = {
    "name": "get_customer_orders",
    "description": (
        "Retrieves all orders for a given customer ID. "
        "Use this tool when the user asks about order history, "
        "order status, or any order-related information for a specific customer. "
        "Do NOT use this tool to look up product information or inventory."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "The unique customer identifier, formatted as 'CUST-XXXXXX'.",
            },
            "status_filter": {
                "type": "string",
                "enum": ["pending", "shipped", "delivered", "cancelled", "all"],
                "description": "Filter orders by status. Defaults to 'all' if not specified.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of orders to return. Must be between 1 and 100.",
                "minimum": 1,
                "maximum": 100,
            },
        },
        "required": ["customer_id"],
        "additionalProperties": False,
    },
}

Layer 2: The Agentic Loop

The critical properties of a correct agentic loop:

Full conversation history on every turn: the model needs complete prior context, including past tool calls and results.
Tool errors returned as is_error results, not raised exceptions: lets the model attempt recovery rather than crashing the loop.

def get_customer_orders(
    customer_id: str,
    status_filter: str = "all",
    limit: int = 20,
) -> dict:
    """In production, replace with a real database client."""
    mock_orders = [
        {"order_id": "ORD-001", "status": "delivered", "total": 149.99, "date": "2026-03-15"},
        {"order_id": "ORD-002", "status": "shipped",   "total": 89.50,  "date": "2026-04-01"},
        {"order_id": "ORD-003", "status": "pending",   "total": 220.00, "date": "2026-04-07"},
    ]
    if status_filter != "all":
        mock_orders = [o for o in mock_orders if o["status"] == status_filter]
    return {
        "customer_id": customer_id,
        "orders": mock_orders[:limit],
        "count": len(mock_orders),
    }

TOOL_REGISTRY: dict[str, Any] = {
    "get_customer_orders": get_customer_orders,
}

def run_agent(
    user_message: str,
    tools: list[dict],
    model: str = "claude-sonnet-4-6",
) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Always append the full assistant response (including tool_use blocks).
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason!r}")

        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                fn = TOOL_REGISTRY.get(block.name)
                if fn is None:
                    raise ValueError(f"Unknown tool: {block.name!r}")
                result = fn(**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
            except Exception as exc:
                # Return errors to the model so it can attempt corrective action.
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Error executing {block.name}: {exc}",
                    "is_error": True,
                })

        messages.append({"role": "user", "content": tool_results})

When is_error: true, the model sees the failure and can retry with different arguments, choose a different tool, or inform the user. Raising an exception, by contrast, terminates the loop with no opportunity for recovery.

Layer 3: API Call Retry Logic

LLM APIs return rate limit errors (429), transient server errors (500, 529), and connection failures regularly at production volume. Exponential backoff with jitter is the standard mitigation.

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 529}

def call_with_retry(
    client: anthropic.Anthropic,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **create_kwargs,
) -> anthropic.types.Message:
    """
    Exponential backoff + jitter for transient API errors.
    Never retries auth errors (401), permission errors (403), or validation errors (400).
    """
    for attempt in range(max_retries + 1):
        try:
            return client.messages.create(**create_kwargs)

        except anthropic.RateLimitError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            print(f"[retry] Rate limited. Waiting {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)

        except anthropic.APIStatusError as exc:
            if exc.status_code not in RETRYABLE_STATUS_CODES or attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            print(f"[retry] HTTP {exc.status_code}. Waiting {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)

        except anthropic.APIConnectionError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            print(f"[retry] Connection error. Waiting {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)

    raise RuntimeError("Unreachable")

Why jitter? Without it, all clients that hit a rate limit simultaneously will retry simultaneously, compounding the problem. Jitter distributes the retry load across time.

Why not retry 401/403/400? These indicate configuration problems (wrong key, missing permissions, invalid request body). Retrying them wastes quota and delays the operator's awareness of the real issue.

Structured Outputs at the Pipeline Boundary

For pipelines whose output is consumed programmatically, use messages.parse() with a Pydantic model as output_format. The SDK guarantees schema compliance before returning; validation errors surface at the API boundary rather than propagating silently through downstream services.

class OrderSummary(BaseModel):
    customer_id: str = Field(description="The customer ID queried.")
    total_orders: int = Field(description="Total number of orders found.")
    total_spend: float = Field(description="Sum of all order totals, in USD.")
    most_recent_status: str = Field(description="Status of the most recent order.")
    plain_summary: str = Field(description="One-sentence natural language summary for display.")

def get_order_summary(customer_id: str) -> OrderSummary:
    """
    Runs the full agentic pipeline and returns a Pydantic-validated summary.
    The agent calls tools autonomously; structured output applies to the final response only.
    """
    messages = [
        {
            "role": "user",
            "content": (
                f"Look up all orders for customer {customer_id} and provide a summary. "
                "Include the total number of orders, the sum of all order totals, "
                "the status of the most recent order, and a one-sentence plain-language summary."
            ),
        }
    ]
    response = client.messages.parse(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[GET_ORDERS_TOOL],
        messages=messages,
        output_format=OrderSummary,
    )
    return response.parsed

if __name__ == "__main__":
    summary = get_order_summary("CUST-123456")
    print(f"Customer:    {summary.customer_id}")
    print(f"Orders:      {summary.total_orders} | Total spend: ${summary.total_spend:.2f}")
    print(f"Most recent: {summary.most_recent_status}")
    print(f"Summary:     {summary.plain_summary}")

Expected output:

Customer:    CUST-123456
Orders:      3 | Total spend: $459.49
Most recent: pending
Summary:     Customer CUST-123456 has 3 orders totalling $459.49, with the most recent still pending as of April 2026.

Key Takeaways

Layer	What it prevents
Schema discipline (enums, `additionalProperties: false`, negative constraints)	Hallucinated parameters, wrong tool selection
`is_error` tool results instead of raised exceptions	Silent pipeline crashes, lost recovery opportunities
Exponential backoff with jitter	Rate limit outages, retry storms
`messages.parse()` with Pydantic	Silent schema drift, malformed data in downstream systems

Each layer is independently testable. The retry wrapper can be unit-tested against mock HTTP responses without touching the agentic loop. The tool registry can be tested with synthetic inputs without calling the API. The Pydantic schema can be validated against fixture data without running the agent at all.