Mukunda Rao Katta

Posted on May 25

My agent called search() with `{"query": null}`. The error message I returned saved the next 30 minutes.

#hermeschallenge #ai #python #agents

The first time I shipped an agent loop with tool calls, this is what happened.

The model called `search(query=None)` because it had reasoned its way into believing the query was already filled in. My tool function received the `None` and raised a `TypeError`. The framework dutifully sent the exception back to the model as a tool error, and the model said "I see, let me try again" and called `search(query=None)` again.

It did that nine times, and each call was a real LLM round-trip. The exception string never told the model what was wrong, because Python's raw `TypeError: argument of type 'NoneType' is not iterable` is not a sentence a language model can act on.
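A hypothetical reconstruction of that failure makes the problem concrete. The tool body below is an assumption for illustration (the original tool code is not shown in this post): a membership test on `query` raises `TypeError` when `query` is `None`, and `call_tool` mimics a framework that forwards `str(exc)` to the model as the tool result.

```python
def search(query=None, limit=10):
    # Hypothetical tool body: remove a "site:" operator before searching.
    # With query=None, the `in` test raises TypeError before any real work.
    if "site:" in query:
        query = query.replace("site:", "")
    return [f"result for {query}"][:limit]


def call_tool(fn, args):
    # Mimics a framework that forwards the raw exception text as the tool result.
    try:
        return fn(**args)
    except TypeError as exc:
        return f"Tool error: {exc}"


print(call_tool(search, {"query": None}))
```

The string the model receives is the bare `TypeError` text: it names a type, but not the argument that was wrong or what to send instead.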

agentvet is the small Python library I wrote to stop this from happening. It is on PyPI as `agentvet`. The whole library is one decorator and one exception type.

The shape of the fix

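```python
from agentvet import vet, ToolArgError

@vet({
    "query": {"type": "string", "minLength": 1, "description": "Search query, non-empty"},
    "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
})
def search(query: str, limit: int = 10):
    # Real work; search_engine stands in for your actual backend.
    return search_engine.run(query, limit=limit)

# Inside the agent loop. model_args is the dict of arguments the model
# produced for this tool call; next_turn() is your own "send a message
# back to the model" helper.
try:
    result = search(**model_args)
except ToolArgError as e:
    # Validation failed and search() never ran: send the hint, not a traceback.
    next_turn(role="tool", content=e.retry_hint)
else:
    # Validation passed: send the real tool output back as usual.
    next_turn(role="tool", content=result)
```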

`ToolArgError.retry_hint` is a string the model can read and act on. It is not a Python traceback and not a generic "invalid args" message, but a specific, model-friendly sentence that names the broken argument and the fix.
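The decorator's core contract (validate first, raise an exception that carries a hint, never run the body on failure) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not agentvet's implementation: `HintedArgError` and `require_nonempty_str` are hypothetical names, only one rule (non-empty string) is checked, and the wrapper accepts keyword arguments only, matching the `search(**model_args)` call style.

```python
import functools


class HintedArgError(Exception):
    """Hypothetical stand-in for agentvet's ToolArgError."""

    def __init__(self, retry_hint: str):
        super().__init__(retry_hint)
        self.retry_hint = retry_hint  # a model-readable sentence, not a traceback


def require_nonempty_str(arg: str):
    """Hypothetical decorator: check one keyword argument before the tool body runs."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):  # keyword-only, like search(**model_args)
            value = kwargs.get(arg)
            if not isinstance(value, str) or not value:
                # Short-circuit: raise before fn is called, so its body never runs.
                raise HintedArgError(
                    f"Argument validation failed for {fn.__name__}().\n"
                    f"Expected `{arg}` to be a non-empty string, got {value!r}.\n"
                    f"Try again with a valid `{arg}`."
                )
            return fn(**kwargs)

        return wrapper

    return decorator


calls = []  # records each time the real tool body executes


@require_nonempty_str("query")
def search(query: str, limit: int = 10):
    calls.append(query)
    return [f"{query} #{i}" for i in range(limit)]


try:
    search(query=None)
except HintedArgError as err:
    print(err.retry_hint)  # this string is what goes back to the model
```

After the failed call, `calls` is still empty: the hint goes back to the model and the tool body never executes.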

What does the retry_hint look like?

Three real examples from the library's corpus tests:

For the missing query case:

```text
Argument validation failed for search().
Expected `query` to be a non-empty string, got None.
Try again with a valid `query`.
```

For the over-limit case:

```text
Argument validation failed for search().
Expected `limit` to be an integer between 1 and 50, got 200.
Try again with a `limit` between 1 and 50.
```

For the wrong-type case:

```text
Argument validation failed for search().
Expected `query` to be a string, got a dict.
Try again with `query` as a plain string. If you meant to search for a structured object, serialize it first.
```

The pattern is: name the tool, name the argument, state what was expected versus what was received, and give one concrete next step. With hints in that shape, models reliably fix the call on the next turn instead of looping.
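The four-part pattern is mechanical enough to live in one helper. A minimal sketch, assuming a hypothetical `format_retry_hint` function (not part of agentvet's documented API) that takes the four parts and renders them in the same three-line layout as the examples above:

```python
def describe(value) -> str:
    """Describe the received value for the model."""
    if value is None:
        return "None"
    if isinstance(value, dict):
        return "a dict"
    if isinstance(value, list):
        return "a list"
    # repr() keeps the string "5" ('5') distinguishable from the integer 5.
    return repr(value)


def format_retry_hint(tool: str, arg: str, expected: str, got, next_step: str) -> str:
    """Render the four parts: tool, argument, expected vs. got, one next step."""
    return "\n".join([
        f"Argument validation failed for {tool}().",
        f"Expected `{arg}` to be {expected}, got {describe(got)}.",
        next_step,
    ])


hint = format_retry_hint(
    "search", "limit", "an integer between 1 and 50", 200,
    "Try again with a `limit` between 1 and 50.",
)
print(hint)
```

Using `repr()` for scalar values is a deliberate choice in this sketch: a string `"5"` renders as `'5'` and an integer as `5`, so the model can see a type mismatch that `str()` would hide.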

What it does NOT do

- It does not execute the tool when args fail. The decorator short-circuits, so the tool function body never runs.
- It does not retry automatically. The retry decision lives in the agent loop, not in the validator; the validator only produces a clean hint.
- It does not call the LLM. It is a pure validation step, so bring your own LLM call site.
- It does not pretend to be a full JSON Schema validator. It implements the subset that maps cleanly to tool-arg validation. If you need full JSON Schema, plug in jsonschema via the `validator=` parameter.
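Because retrying is the agent loop's job, the retry budget belongs there too. The sketch below is hypothetical throughout: `HintedArgError` stands in for agentvet's `ToolArgError`, `fake_model` is a scripted stand-in for a real LLM call, and `max_attempts` is an illustrative cap, not a library setting. Each failed call appends the hint to the transcript as a tool message, and the loop gives up at the cap instead of retrying forever.

```python
from typing import Callable


class HintedArgError(Exception):
    """Hypothetical stand-in for agentvet's ToolArgError."""

    def __init__(self, retry_hint: str):
        super().__init__(retry_hint)
        self.retry_hint = retry_hint


def search(query=None, limit=10):
    # Hypothetical tool with validation inlined: reject a missing or empty query.
    if not isinstance(query, str) or not query:
        raise HintedArgError(
            f"Expected `query` to be a non-empty string, got {query!r}. "
            "Try again with a valid `query`."
        )
    return f"{limit} results for {query!r}"


def run_tool_with_retries(propose_args: Callable[[list], dict], max_attempts: int = 3):
    """The loop, not the validator, decides whether and how often to retry."""
    transcript = []  # tool messages fed back to the model between attempts
    for attempt in range(1, max_attempts + 1):
        args = propose_args(transcript)
        try:
            return search(**args), attempt
        except HintedArgError as err:
            transcript.append({"role": "tool", "content": err.retry_hint})
    raise RuntimeError(f"search() args still invalid after {max_attempts} attempts")


def fake_model(transcript: list) -> dict:
    # Scripted stand-in for an LLM: the first proposal is broken; once a hint
    # is in the transcript, it proposes a valid call.
    if not transcript:
        return {"query": None}
    return {"query": "agent tool validation"}


result, attempts = run_tool_with_retries(fake_model)
print(f"succeeded on attempt {attempts}: {result}")
```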

Inside the lib: one design choice worth showing

The hard call was the type-coercion question. When the model returns `"5"` as a string and the schema expects an integer, do you coerce or reject?

Coerce too aggressively and you mask real bugs. Reject too aggressively and you generate retry traffic on cases where the model meant well.

The library's answer is a `coerce_simple_scalars=True` default that silently coerces numeric strings to numbers and ISO-8601 strings to dates, but only for top-level scalar types. Nested dicts and lists are never coerced. Every coercion is recorded in a `coerced_args` field on a successful return, so the caller can see what happened.

```python
result, coerced = search.with_meta(query="hello", limit="20")
# coerced = {"limit": ("string", "integer", "20", 20)}
```

In production this almost always fixes the numeric-string case silently, without generating a retry. In tests, set `coerce_simple_scalars=False` to catch the same bug at write time.
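The coercion rule can be sketched as a standalone pre-validation pass. This is a minimal sketch, not agentvet's code: the function name mirrors the flag, the `enabled` parameter stands in for `coerce_simple_scalars=False`, the log tuple copies the `(from, to, original, coerced)` shape of the `coerced` example above, and date arguments are assumed to be declared with JSON Schema's `"format": "date"`. The `since` argument is hypothetical.

```python
import re
from datetime import date

_INT_RE = re.compile(r"[+-]?[0-9]+")
_NUM_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def _convert(value: str, rules: dict):
    """Return (target_type, converted) or None when no simple coercion applies."""
    kind = rules.get("type")
    if kind == "integer" and _INT_RE.fullmatch(value):
        return "integer", int(value)
    if kind == "number" and _NUM_RE.fullmatch(value):
        return "number", float(value)
    if kind == "string" and rules.get("format") == "date":
        try:
            return "date", date.fromisoformat(value)
        except ValueError:
            return None
    return None


def coerce_simple_scalars(args: dict, schema: dict, enabled: bool = True):
    """Coerce top-level string args to their declared scalar type.

    Returns (new_args, coerced), where coerced maps each changed arg name to
    (from_type, to_type, original_value, coerced_value).
    """
    out, coerced = dict(args), {}
    if not enabled:  # strict mode, e.g. in tests: pass everything through untouched
        return out, coerced
    for name, rules in schema.items():
        value = out.get(name)
        # Only top-level strings are candidates; dicts, lists, and typed values are skipped.
        if not isinstance(value, str):
            continue
        converted = _convert(value, rules)
        if converted is None:
            continue  # leave it for the validator to reject with a retry hint
        target, new_value = converted
        out[name] = new_value
        coerced[name] = ("string", target, value, new_value)
    return out, coerced


schema = {
    "query": {"type": "string", "minLength": 1},
    "limit": {"type": "integer", "minimum": 1, "maximum": 50},
    "since": {"type": "string", "format": "date"},  # hypothetical date argument
}
args, coerced = coerce_simple_scalars(
    {"query": "hello", "limit": "20", "since": "2024-01-31"}, schema
)
print(coerced["limit"])  # ('string', 'integer', '20', 20)
```

Strings that do not parse cleanly, such as `"2.5"` for an integer, are left untouched, so the validator rejects them with a retry hint instead of the coercer guessing.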

When this is useful

- You are running an agent loop with tool calls and you want fewer dead retries.
- You are writing tools that a model has to call, and you want error messages the model can act on.
- You are wiring agentvet alongside agentsnap (snapshot tests for agent runs) or agenttrace (cost and latency capture), and you want validation to be part of the same composable stack.

When this is NOT what you want

- For human-facing API validation. Use Pydantic or jsonschema directly; the retry_hint is shaped for a language model, not a human user.
- For runtime contract validation between services. agentvet is single-process and synchronous; for inter-service validation, use a proper contract-testing tool.

Install

```shell
pip install agentvet
```

Repo: https://github.com/MukundaKatta/AgentVetPy
Rust port: https://github.com/MukundaKatta/agentvet-rs

Sibling libraries

| Lib | Boundary | Repo |
|---|---|---|
| agentvet | Tool-arg validation with LLM-friendly hints | this repo |
| agentguard | Egress allowlist for tool fetches | https://github.com/MukundaKatta/AgentGuardPy |
| agentsnap | Snapshot tests for agent runs | https://github.com/MukundaKatta/AgentSnapPy |
| agentcast | Structured output enforcement | https://github.com/MukundaKatta/AgentCastPy |
| prompt-shield | Pattern-based prompt-injection detection | https://github.com/MukundaKatta/prompt-shield |

What's next

A `vet_for(model="claude-sonnet-4-7")` configuration, so the retry_hint can be tuned per model family. Different models respond differently to error phrasing, and that is worth measuring.

DEV Community