Mukunda Rao Katta

Posted on May 25

llm-tool-arg-coerce: Coerce LLM Tool Args to Expected Types with a Function-Sig Shortcut

#hermeschallenge #ai #python #agents

The LLM returned "10" for a parameter typed as int. My function crashed with TypeError: '<' not supported between instances of 'str' and 'int'. I found it in a stack trace 45 minutes into a batch run. The LLM had been doing this the whole time: every call that hit an integer comparison had raised, and the error handler had swallowed the exception, logged "tool call failed", and continued.

The easy fix is to add int(args["limit"]) at the top of each tool function. I did that. Then I had a bool that came in as "false" (a string), and bool("false") in Python is True because any non-empty string is truthy. So I added a special case for bools. Then a list came in as a JSON string instead of a parsed list, so I added json.loads() for that. Six weeks later I had twenty tool functions, each with its own slightly different coercion block at the top, written by different people at different times.
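The three pitfalls from that story are easy to reproduce with nothing but the standard library. The dict below is a made-up stand-in for an LLM payload, not output captured from a real run:

```python
import json

# Values as an LLM might send them: every one is a string.
raw = {"limit": "10", "include_archived": "false", "tags": '["a", "b"]'}

# bool() only checks for emptiness, so the string "false" is truthy.
naive_bool = bool(raw["include_archived"])

# Comparing a str to an int raises TypeError in Python 3.
try:
    raw["limit"] < 50
    comparison_error = None
except TypeError as exc:
    comparison_error = str(exc)

# A JSON-encoded list is still a str until it is parsed.
parsed_tags = json.loads(raw["tags"])
```

Each of these needs a different fix, which is how per-function coercion blocks start to drift apart.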

The actual problem is that the LLM returns everything as strings or lightly typed JSON values, but Python functions have real type annotations. The coercion should happen at the boundary, in one place, reading the actual type annotations of the actual function. llm-tool-arg-coerce does that. It calls typing.get_type_hints() on your function and coerces each argument to its annotated type before you ever touch the args dict. A CoercionResult records exactly what was changed.
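For readers who have not used it, this is roughly what typing.get_type_hints() hands back for a function like the one in the next section. The function body here is a placeholder, and how the library consumes the hints internally is not shown:

```python
from typing import get_type_hints


def search_records(query: str, limit: int, include_archived: bool) -> list:
    return []  # placeholder body; only the signature matters here


hints = get_type_hints(search_records)
hints.pop("return", None)  # the return annotation is not an argument

# hints now maps each parameter name to its annotated type:
# {"query": str, "limit": int, "include_archived": bool}
```

That mapping is all a coercion layer needs in order to know the target type for each incoming argument.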

Shape of the fix

from llm_tool_arg_coerce import coerce_to_sig

def search_records(query: str, limit: int, include_archived: bool) -> list:
    # Real database call. Types matter here.
    return db.search(query, limit=limit, include_archived=include_archived)

# What the LLM actually sent:
raw_from_llm = {
    "query": "active users",
    "limit": "10",              # string, needs to be int
    "include_archived": "false" # string, needs to be bool
}

result = coerce_to_sig(search_records, raw_from_llm)
# result.coerced == {"query": "active users", "limit": 10, "include_archived": False}
# result.conversions == [("limit", "str->int"), ("include_archived", "str->bool")]

args = result.coerced
rows = search_records(**args)

# JSON schema path: if you have a schema dict instead of a Python function
from llm_tool_arg_coerce import coerce_to_schema

schema = {
    "type": "object",
    "properties": {
        "count": {"type": "integer"},
        "active": {"type": "boolean"},
        "tags": {"type": "array"},
    }
}

result = coerce_to_schema(schema, {"count": "5", "active": "true", "tags": '["a","b"]'})
# result.coerced == {"count": 5, "active": True, "tags": ["a", "b"]}

# Strict mode: raise on any failed coercion instead of returning a partial result
from llm_tool_arg_coerce import CoercionError

try:
    result = coerce_to_sig(search_records, raw_from_llm, strict=True)
except CoercionError as exc:
    # Raised if any arg cannot be coerced to its annotated type
    print(f"coercion failed: {exc}")

The CoercionResult object gives you coerced (the fixed dict), conversions (a list of (field, "fromtype->totype") tuples), and failures (a list of the fields that could not be coerced, populated only in non-strict mode).
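One way to put those three fields to work is a one-line log message per tool call. The dataclass below is a stand-in that mirrors the fields described above so the example runs without the library installed; summarize is a hypothetical helper, not part of the package:

```python
from dataclasses import dataclass, field


@dataclass
class CoercionResult:  # stand-in with the three fields described above
    coerced: dict
    conversions: list = field(default_factory=list)
    failures: list = field(default_factory=list)


def summarize(result):
    """Return a one-line log message describing what coercion did."""
    changed = ", ".join(f"{name} ({kind})" for name, kind in result.conversions) or "none"
    failed = ", ".join(result.failures) or "none"
    return f"converted: {changed}; failed: {failed}"


example = CoercionResult(
    coerced={"query": "active users", "limit": 10, "include_archived": False},
    conversions=[("limit", "str->int"), ("include_archived", "str->bool")],
)
message = summarize(example)
```

Logging this next to the tool name is far more useful than a bare "tool call failed".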

What it does NOT do

It does not validate that required fields are present. If a field is annotated but not in the args dict, it leaves it absent rather than raising. Use your function's normal TypeError for that, or a schema validator upstream.

It does not handle Optional[T] deeply: Optional[int] coerces the value to int if present and non-null, and passes None through unchanged.

It does not coerce complex nested types like list[dict[str, int]]: it reads the outer container type only. A list annotation will parse a JSON string into a list but will not recursively coerce the element types inside. For deep nested coercion you need a full validation library like Pydantic.

Finally, it does not call the function for you: it gives you the fixed args dict and you call the function yourself.
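The Optional and outer-container limits are easier to see in code. This is a minimal sketch built on typing.get_origin and typing.get_args; unwrap_optional and outer_type are hypothetical helper names, and the library's own internals may differ:

```python
import json
from typing import Optional, Union, get_args, get_origin


def unwrap_optional(annotation):
    """Return T for Optional[T]; return the annotation unchanged otherwise."""
    if get_origin(annotation) is Union:
        non_none = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return annotation


def outer_type(annotation):
    """Return only the outermost container type, e.g. list for list[int]."""
    return get_origin(annotation) or annotation


# Outer-only coercion: the JSON string becomes a list, but the elements stay
# strings even if the annotation was list[int].
shallow = json.loads('["1", "2"]')
```

Here unwrap_optional(Optional[int]) gives int, outer_type(list[dict[str, int]]) gives list, and shallow is still a list of strings.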

Inside the lib

The core is a type resolver that maps Python annotation types to coercion handlers. The handlers cover str, int, float, bool, list, dict, and typing.Optional wrappers. When the source value is already the correct type, the handler returns it unchanged and records no conversion. When coercion is possible, the handler applies it and records the "fromtype->totype" pair. When coercion is not possible (for example, coercing "hello" to int), the behavior depends on the strict flag: in strict mode it raises CoercionError; in non-strict mode it leaves the value as-is and adds the field name to failures.
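The resolver described above can be sketched as a dict of handlers plus one loop. This is an illustration of the idea, not the library's source: HANDLERS, _from_json, and coerce_args are hypothetical names, CoercionError here is a local stand-in, and bool is left out because it needs its own string rules.

```python
import json


class CoercionError(ValueError):
    """Stand-in for the library's exception of the same name."""


def _from_json(expected):
    """Build a handler that parses a JSON string and checks the outer type."""
    def handler(value):
        parsed = json.loads(value)
        if not isinstance(parsed, expected):
            raise ValueError(f"expected JSON {expected.__name__}")
        return parsed
    return handler


# bool is left out on purpose: it needs its own string rules.
HANDLERS = {
    str: str,
    int: int,
    float: float,
    list: _from_json(list),
    dict: _from_json(dict),
}


def coerce_args(hints, args, strict=False):
    """Return (coerced, conversions, failures) for args against hints."""
    coerced, conversions, failures = {}, [], []
    for name, value in args.items():
        target = hints.get(name)
        # Unknown field, unsupported annotation, or already correct: pass through.
        if target not in HANDLERS or type(value) is target:
            coerced[name] = value
            continue
        try:
            coerced[name] = HANDLERS[target](value)
            conversions.append((name, f"{type(value).__name__}->{target.__name__}"))
        except (TypeError, ValueError) as exc:
            if strict:
                raise CoercionError(
                    f"{name}: cannot coerce {value!r} to {target.__name__}"
                ) from exc
            coerced[name] = value  # leave the original value in place
            failures.append(name)
    return coerced, conversions, failures
```

Calling coerce_args({"limit": int}, {"limit": "hello"}) leaves "hello" in place and reports "limit" as a failure, while the same call with strict=True raises.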

Bool coercion is the one that has to be careful. bool("false") in Python gives True. The handler treats the strings "false", "no", "0", "off", and "" as False, and everything else as True, which matches what most LLMs intend when they write "false" as a JSON string. This is a deliberate departure from Python's default bool() behavior and it is documented in the module docstring.
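A bool handler along those lines might look like the sketch below. The set of false-like strings comes from the paragraph above; trimming whitespace and ignoring case are my assumptions, since the post does not say how the library treats "False" or " false ", and coerce_bool is a hypothetical name:

```python
# False-like strings listed in the paragraph above.
FALSE_STRINGS = {"false", "no", "0", "off", ""}


def coerce_bool(value):
    """Coerce an LLM-supplied value to bool using string-aware rules."""
    if isinstance(value, bool):
        return value  # already correct, nothing to do
    if isinstance(value, str):
        # Assumption: compare case-insensitively and ignore surrounding spaces.
        return value.strip().lower() not in FALSE_STRINGS
    return bool(value)  # numbers and other values keep Python's truthiness
```

Note the consequence of "everything else is True": a string like "maybe" also coerces to True, so this handler converts values but does not validate them.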

The JSON-schema path (coerce_to_schema) does the same coercion but drives from a schema dict instead of a function signature. It reads properties[field]["type"] and maps to the same set of handlers. This is useful when you define tools with raw schema dicts rather than Python functions, or when you are consuming tool definitions from an external source.
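Driving from a schema mostly means translating JSON Schema type names into the same Python targets. A minimal sketch, assuming the six basic type names map as shown; JSON_TYPE_TO_PYTHON and targets_from_schema are hypothetical names, and the library's actual table may differ:

```python
# Assumed mapping from JSON Schema type names to Python target types.
JSON_TYPE_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def targets_from_schema(schema):
    """Map each property name to a Python target type, skipping unknown types."""
    targets = {}
    for name, spec in schema.get("properties", {}).items():
        json_type = spec.get("type")
        # "type" may legally be a list of names; this sketch only handles a single name.
        if isinstance(json_type, str) and json_type in JSON_TYPE_TO_PYTHON:
            targets[name] = JSON_TYPE_TO_PYTHON[json_type]
    return targets


schema = {
    "type": "object",
    "properties": {
        "count": {"type": "integer"},
        "active": {"type": "boolean"},
        "tags": {"type": "array"},
    },
}
targets = targets_from_schema(schema)
```

Once the targets dict exists, the rest of the pipeline can be identical to the function-signature path.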

38 tests cover: int/float/bool/list/dict coercion from string inputs, bool string edge cases ("false", "0", "no", "off"), already-correct types pass-through, Optional unwrapping, strict mode CoercionError, failure accumulation in non-strict mode, JSON string to list/dict coercion, schema-based coercion path, and the conversions/failures record format.

When useful

Any tool function with typed annotations where the LLM reliably sends the right field names but not always the right types
Codebases where different engineers wrote coercion logic by hand in each tool function and the behavior is inconsistent
Agent frameworks where tool dispatch is centralized: add one coerce_to_sig call in the dispatch layer and all tools get consistent coercion
Auditing: the conversions list tells you how often the LLM sends mistyped args, which helps you improve your tool descriptions to reduce the frequency
JSON schema-defined tools where you want the same coercion behavior but drive from the schema dict rather than a Python function
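The centralized-dispatch and auditing cases combine naturally. In the sketch below the coercion function is passed in as a parameter so the example does not depend on the package being installed; dispatch, tools, and audit are hypothetical names, and the only thing assumed about the result object is the coerced and conversions fields described earlier:

```python
from collections import Counter


def dispatch(tools, coerce, name, raw_args, audit):
    """Coerce raw_args for tools[name], tally conversions, then call the tool."""
    func = tools[name]
    result = coerce(func, raw_args)
    for field, kind in result.conversions:
        # Count how often each field of each tool arrives mistyped.
        audit[(name, field, kind)] += 1
    return func(**result.coerced)


audit = Counter()
```

With the library installed, coerce would be coerce_to_sig, and audit.most_common() would show which tool descriptions are producing the most mistyped arguments.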

When not useful

Tools where the args are already properly typed because the framework does deserialization before calling the handler
Deeply nested types like list[dict[str, list[int]]]: the library handles the outermost container, not recursive structures
Strict validation of required fields, field presence, or field value ranges: this is type coercion only, not validation
Cases where you already use Pydantic models for tool args: Pydantic coerces values as part of model construction, so adding this on top is redundant. (Plain dataclasses do not coerce; they store whatever they are given.)

Install

pip install llm-tool-arg-coerce

Zero dependencies. Python 3.9+. Uses only typing, inspect, and json from the standard library.

Siblings

Library	Language	What it does
tool-arg-coerce-py	Python	Port of the Rust crate; similar coercion but schema-driven only
agentvet	Python/npm	Validates tool call shape before execution
tool-arg-defaults	Python	Fills missing tool args from schema defaults
tool-arg-rename	Python	Converts arg name case conventions between LLM and handler
tool-arg-fuzzy	Python	Fuzzy-matches LLM arg values to known enum members
tool-schema-from-fn	Python	Generates a JSON Schema from a Python function signature

What's next

The next useful addition is fuller Optional[T] handling: detect when a value is the string "null" or "none" and coerce it to Python None before the Optional unwrap. Right now "null" passes through as a string. Recursive container coercion for simple cases like list[int] is also on the list. Together, those two changes should cover most everyday tool signatures without pulling in Pydantic.
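Neither feature exists in the library yet, so the following is only a sketch of what they could look like. coerce_optional and coerce_list_of are hypothetical names, and matching the null strings case-insensitively is my assumption:

```python
import json

# Null-like strings named in the roadmap paragraph above.
NULL_STRINGS = {"null", "none"}


def coerce_optional(value, inner):
    """Map None and null-like strings to None; otherwise apply inner(value)."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in NULL_STRINGS:
        return None
    return inner(value)


def coerce_list_of(value, element_type):
    """Parse a JSON string if needed, then coerce each element one level deep."""
    items = json.loads(value) if isinstance(value, str) else value
    if not isinstance(items, list):
        raise ValueError("expected a list")
    return [element_type(item) for item in items]
```

One design question this raises: for Optional[str], the literal text "none" becomes ambiguous, so a real implementation would probably want that mapping to be configurable.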

Part of the Hermes Agent Challenge sprint. Source at github.com/MukundaKatta/llm-tool-arg-coerce. PyPI: pip install llm-tool-arg-coerce.
