Gabriel Anhaia
Why Strict JSON Mode Doesn't Stop Hallucinated Tool Calls


You've seen the postmortem. Strict JSON mode is on, the schema is tight, the trace shows strict: true, every payload validates. And yet the agent called delete_user(user_id="usr_4f9...") against a user who never existed, with a reason field copied from a different ticket entirely. The Pydantic model was happy. The database was not.

This is the failure mode that surprises teams who treat structured outputs as a safety boundary. Strict mode does exactly what the docs say it does, no more. Anthropic's strict tool use guarantees "Tool input strictly follows the input_schema" and "Tool name is always valid (from provided tools or server tools)." OpenAI's Structured Outputs guide makes the same kind of promise (see also the original announcement): the model won't drop a required key or invent an enum value outside the schema.

Both guarantees are real. Neither says the values are true.

The token-level constraint that produces strict mode operates on grammar, not facts. A user_id: str field with pattern: ^usr_[a-f0-9]{8}$ will always emit a string of the right shape. Whether that string identifies a user who exists in your database is a question the sampler cannot answer.

Four shapes in the wild, then the layer that catches all of them.

1. Hallucinated parameter values

The most common one. The model invents an ID, a date, an email, a project name, and writes it into a perfectly-shaped field. Schema-valid. Factually wrong.

import anthropic
from pydantic import BaseModel, Field

client = anthropic.Anthropic()

class CancelOrder(BaseModel):
    order_id: str = Field(pattern=r"^ord_[a-f0-9]{12}$")
    reason: str

CANCEL_TOOL = {
    "name": "cancel_order",
    "description": "Cancel a customer order by id.",
    "strict": True,
    "input_schema": CancelOrder.model_json_schema(),
}

r = client.messages.create(
    model="claude-sonnet-4-7",
    max_tokens=300,
    tools=[CANCEL_TOOL],
    tool_choice={"type": "tool", "name": "cancel_order"},
    messages=[{
        "role": "user",
        "content": (
            "The customer asked us to cancel their most "
            "recent order. They did not give an order id."
        ),
    }],
)

The model has no order ID. It has been told to call cancel_order, so it will. The order_id will match the regex: twelve hex characters that no row in your orders table has ever held. The schema check passes and the handler runs. Best case it raises OrderNotFound; worst case a defensive except swallows the error and the customer gets "Your order has been cancelled" in the chat reply.

The fix is not in the schema layer. The schema cannot know which order IDs exist. Only your application can.

2. Schema-valid but semantically wrong

The values exist. They just refer to the wrong thing.

class TransferFunds(BaseModel):
    from_account: str = Field(pattern=r"^acc_[0-9]{10}$")
    to_account: str = Field(pattern=r"^acc_[0-9]{10}$")
    amount_cents: int = Field(ge=1)

The conversation has two account IDs in it: the user's own account and the recipient's. The model swaps them. The Pydantic model is satisfied (both fields are valid account-id strings), and the transfer goes the wrong direction. The schema layer has nothing to say about this. Both values pass the regex, the typing is correct, and the integer range is fine. Schema validation is over; the bug is downstream.

This is the most painful version because it is invisible at the API boundary. The audit log shows a clean tool_use block. The 400-or-200 dashboard shows 200. The customer support ticket says "my money went to a stranger."

3. Phantom tool names (when strict mode is off, or with mixed providers)

Both Anthropic strict mode and OpenAI Structured Outputs constrain tool names against the provided list. With strict on, the model cannot emit a tool the API doesn't know about — that part is real.

The phantom-tool failure shows up in three places where that constraint is missing:

  • Strict not enabled. Default tool definitions on Anthropic do not enable strict; you opt in per tool with "strict": true.
  • Output is text, not a structured block. Some agent frameworks instruct the model to "respond with {'tool': '<name>', 'args': {...}} JSON." That is not a real tool call. There is no API-side constraint on the tool field. The model will invent delete_account if your prompt mentions accounts.
  • Multi-step traces with tool lists that change. Step 1 exposes [dispatch_refund, cancel_order]; step 3 narrows the palette to [cancel_order] only. The model still has the step-1 tools in conversation history. It recalls dispatch_refund and calls it again. If your application doesn't validate the name against the current step's tool list, you'll dispatch a refund the current palette had explicitly removed.

The defense is mechanical: re-validate the tool name against the active tool list on every step, including textual JSON envelopes that bypass the API-side constraint entirely.
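That allowlist can be derived from the exact tools payload sent on the current request, so a narrowed palette is enforced automatically. A sketch, with a hypothetical HANDLERS registry standing in for your dispatch table:

```python
# Hypothetical registry mapping tool names to application handlers.
HANDLERS = {
    "cancel_order": lambda args: {"status": "cancelled", **args},
}

def dispatch(step_tools: list[dict], tool_use: dict) -> dict:
    # Build the allowlist from the exact `tools` list sent to the API on
    # *this* request, not from a global registry, so tools removed from the
    # current step's palette are rejected even if an earlier step exposed them.
    active = {t["name"] for t in step_tools}
    name = tool_use["name"]
    if name not in active:
        return {
            "is_error": True,
            "error": (
                f"Tool '{name}' is not in the current tool list. "
                f"Available: {sorted(active)}"
            ),
        }
    return HANDLERS[name](tool_use["input"])
```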

4. Coerced types and field-order assumptions

Not strictly hallucinations, but the same shape of bug because they pass schema validation. Two flavors:

Stringified numbers that downstream code parses back wrong. Your schema says amount: number. Strict mode emits a JSON number. Fine. But somewhere in your pipeline you serialize the tool input to a queue, deserialize on the worker, and the worker uses a JSON parser whose configuration treats numerics as strings (a real configuration option in some Java and Go decoders, e.g. Go's json.Number or a Jackson BigDecimal setup). 0 becomes "0", 0.1 becomes "0.1", and a truthiness check like if amount: suddenly passes for "0", because any non-empty string is truthy in dynamically typed languages downstream.

Field-order assumptions the spec does not preserve. The JSON spec does not require object members to be ordered. JSON Schema does not enforce ordering either. If your validation logic walks the input dict expecting from_account before to_account for a left-to-right "looks reasonable" check, you are relying on a property the spec disclaims.
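The stringified-number failure is easy to reproduce with Python's json module, using its parse_int/parse_float hooks to simulate a decoder configured to keep numerics as strings:

```python
import json

raw = '{"amount": 0}'

# Simulate a decoder configured to keep numerics as strings.
data = json.loads(raw, parse_int=str, parse_float=str)
assert data["amount"] == "0"

# A truthiness check in a dynamically typed consumer now passes,
# because any non-empty string is truthy:
assert bool(data["amount"]) is True  # "0" is truthy
assert bool(0) is False              # the number 0 is not

# Python 3 at least raises TypeError on '"0" > 0'; looser languages
# coerce silently and misbehave instead.
```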

The defense: validate values, not just shapes

Schema validation is the cheapest layer. It catches the easiest failures. It is not the last layer. The pattern that catches the failures above:

from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class CancelOrder(BaseModel):
    order_id: str = Field(pattern=r"^ord_[a-f0-9]{12}$")
    reason: Literal[
        "customer_request",
        "fraud",
        "duplicate",
        "out_of_stock",
    ]

def handle_cancel_order(
    args: dict,
    *,
    user_id: str,
    active_tools: set[str],
    tool_name: str,
) -> dict:
    if tool_name not in active_tools:
        return _tool_error(
            f"Tool '{tool_name}' is not in the current "
            f"tool list. Re-issue from: {sorted(active_tools)}"
        )
    try:
        parsed = CancelOrder.model_validate(args)
    except ValidationError as e:
        return _tool_error(f"Schema validation failed: {e}")

    order = db.orders.get(parsed.order_id)
    if order is None:
        return _tool_error(
            f"order_id '{parsed.order_id}' does not exist. "
            f"Ask the user for their order id, or call "
            f"`list_orders(user_id={user_id!r})` first."
        )
    if order.user_id != user_id:
        return _tool_error(
            f"order_id '{parsed.order_id}' belongs to a "
            f"different user. Refusing to cancel."
        )
    if order.status in ("shipped", "delivered"):
        return _tool_error(
            f"Order is in status '{order.status}'. "
            f"Cancellation requires an unshipped order."
        )

    return _cancel(order, parsed.reason)

def _tool_error(message: str) -> dict:
    return {"error": message, "is_error": True}

Each layer catches a class the others can't:

  1. Tool-name allowlist against the current step's tool list. Catches phantom tools and stale names from earlier steps.
  2. Pydantic validation with Literal enums and regex patterns. Catches type coercion and shape drift.
  3. Business-rule checks against your database. Catches hallucinated IDs, cross-tenant references, and state-based illegality (cancelling a shipped order). This is the layer the schema cannot replace.

The error returns are the second half of the pattern. Send them back to the model as tool_result blocks with is_error: true. Anthropic's tool use guide and the equivalent OpenAI flow both treat tool errors as recoverable signals: the model sees the message, adjusts, and reissues with corrected arguments. That is how you get "order_id does not exist; ask the user" out of an LLM agent without writing a custom recovery prompt.
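On the Anthropic API, that error travels back as a tool_result content block in a user turn, referencing the id of the tool_use block that triggered it. A minimal helper:

```python
def error_result(tool_use_id: str, message: str) -> dict:
    # A user-turn message carrying a tool_result flagged as an error.
    # The model reads `message` on its next turn and can reissue the
    # tool call with corrected arguments.
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": message,
            "is_error": True,
        }],
    }
```

Append this to the messages list and call client.messages.create again; the model's next turn sees the error text verbatim.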

What strict mode is actually for

Strict mode is a parser-elimination feature, not a correctness feature. The bug it kills is the one where you string-replace "True" to "true" in a try/except json.JSONDecodeError loop. Kill that bug and you're left with a different one: the model is confidently wrong about values. That problem lives in your application, not your prompt.
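For reference, the brittle pre-strict pattern looked something like this (a sketch of the anti-pattern, not recommended code):

```python
import json

def parse_tool_call(raw: str) -> dict:
    # Patch the text until json.loads stops raising -- the loop that
    # strict mode eliminates. Each "fix" is a guess that can silently
    # corrupt values that were valid to begin with.
    attempts = [
        raw,
        raw.replace("True", "true").replace("False", "false"),
        raw.replace("'", '"'),
    ]
    for candidate in attempts:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError(f"unparseable tool call: {raw!r}")
```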

The schema is the contract for shape. Your database, tenancy model, and business rules are the contract for meaning. Only your code can see the second contract; the model cannot.

The teams that ship reliable agents write the application-side validators they would have written for any user-input form, and let the model handle the natural-language part. The cleverness budget goes into the validators, not the prompt.

If this was useful

The AI Agents Pocket Guide has a chapter on the validation layers an agent needs between the model output and any state-changing call: schema, identity, ownership, state-machine legality, recovery. If you're writing tools that do anything more dangerous than reading public data, the layered-validation playbook is in there.

