DEV Community

Mukunda Rao Katta


The Tool That Kept Doing Full-Table Scans: tool-arg-defaults

It started with a slow query.

My agent called get_records, and the underlying API returned 80,000 rows. The response was several megabytes. It blew past the context window, and the whole run failed with a truncation error. I added a max_results parameter to the tool and gave it a default of 10 in the function signature.

The model kept omitting it.

Not sometimes. Almost always. When the model decided get_records was the right tool, it would pass {"table": "customers", "filter": "active"} and nothing else. Python's default parameter kicked in and set max_results=10 at the function level. So far, so good.

Except the underlying API had its own default. The records_client.query() call I was making only passed max_results when it was explicitly included in the kwargs. If I called records_client.query(table=table, filter=filter) without a max_results argument, the API defaulted to no limit. Full-table scan every time.

The function default was 10. But the function body looked like this:

def get_records(table: str, filter: str, max_results: int = 10):
    kwargs = {"table": table, "filter": filter}
    if max_results:
        kwargs["max_results"] = max_results
    return records_client.query(**kwargs)

That if max_results check was a latent bug. It is a truthiness test, so an explicit max_results=None or max_results=0 drops the key and hands the API its unlimited default. On the default path, though, it behaved. max_results=10 is truthy, so when the model passed it explicitly, the limit was applied. When the model omitted it entirely, the function received max_results=10 from the default, the if passed, and the limit was applied. It seemed fine in unit tests because the tests always exercised the default path through the function.
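The truthiness check in that function only misbehaves for falsy values. The sketch below reproduces the function almost verbatim; only the max_results annotation is widened to Optional[int] so that passing None matches the hint. A hypothetical StubRecordsClient stands in for records_client. It assumes, as described above, that the API applies no limit when max_results is not passed at all. With that stub, the default path behaves, while an explicit None or 0 turns into a full-table scan.

```python
from typing import Optional


class StubRecordsClient:
    """Hypothetical stand-in for records_client (not the real API)."""

    def __init__(self, total_rows: int = 80_000) -> None:
        self.rows = list(range(total_rows))

    def query(self, table: str, filter: str, **kwargs) -> list:
        # Mirrors the API described above: no max_results kwarg means no limit.
        if "max_results" not in kwargs:
            return self.rows
        return self.rows[: kwargs["max_results"]]


records_client = StubRecordsClient()


def get_records(table: str, filter: str, max_results: Optional[int] = 10):
    kwargs = {"table": table, "filter": filter}
    if max_results:  # truthiness test: None and 0 both fail it
        kwargs["max_results"] = max_results
    return records_client.query(**kwargs)


print(len(get_records("customers", "active")))                    # 10
print(len(get_records("customers", "active", max_results=None)))  # 80000
print(len(get_records("customers", "active", max_results=0)))     # 80000
```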

In production the model was not omitting the parameter from the function call. It was omitting the key from the dict it passed to the dispatcher. The dispatcher called the function like get_records(**model_args) where model_args was {"table": "customers", "filter": "active"}. No max_results key at all. Python filled in the default. The if branch ran. Limit applied.

But this was only on the current version of the dispatcher. A previous version called records_client.query(**model_args) directly and added max_results as a kwarg only when it was present. That version was still running in one of the staging environments. In that environment, no limit was applied when the model omitted the key.
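The two dispatcher versions can be sketched side by side to show why the same model output produced different row counts. Both dispatch functions and the stub client are hypothetical reconstructions of the behavior described above, not the actual dispatcher code.

```python
from typing import Any, Dict


class StubRecordsClient:
    """Hypothetical stand-in: no max_results kwarg means no limit."""

    def __init__(self, total_rows: int = 80_000) -> None:
        self.rows = list(range(total_rows))

    def query(self, table: str, filter: str, **kwargs) -> list:
        if "max_results" not in kwargs:
            return self.rows
        return self.rows[: kwargs["max_results"]]


records_client = StubRecordsClient()


def get_records(table: str, filter: str, max_results: int = 10):
    kwargs = {"table": table, "filter": filter}
    if max_results:
        kwargs["max_results"] = max_results
    return records_client.query(**kwargs)


def dispatch_current(model_args: Dict[str, Any]):
    # Goes through the tool function, so Python fills in the signature default.
    return get_records(**model_args)


def dispatch_previous(model_args: Dict[str, Any]):
    # Calls the API directly and forwards max_results only if the model sent it.
    kwargs = {"table": model_args["table"], "filter": model_args["filter"]}
    if "max_results" in model_args:
        kwargs["max_results"] = model_args["max_results"]
    return records_client.query(**kwargs)


model_args = {"table": "customers", "filter": "active"}
print(len(dispatch_current(model_args)))   # 10
print(len(dispatch_previous(model_args)))  # 80000
```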

Two environments, same tool function, different behavior, because the default lived in two places and only one of them was being hit.

The Shape of the Fix

tool-arg-defaults makes default application explicit and consistent. You define per-tool defaults in a single dict. Before the arguments reach any dispatcher or tool function, the library fills in the missing keys. The function receives a complete dict, and every layer downstream sees the same values.

from tool_arg_defaults import apply_defaults

DEFAULTS = {
    "get_records": {
        "max_results": 10,
        "order": "desc",
    }
}

# Arguments as returned by the LLM; max_results and order were omitted
model_args = {"table": "customers", "filter": "active"}
filled = apply_defaults(model_args, DEFAULTS["get_records"])
# filled: {"table": "customers", "filter": "active", "max_results": 10, "order": "desc"}

result = get_records(**filled)

The function can now do whatever it wants with max_results. It is always present. No conditional needed. No staging vs production discrepancy.

You can also use the class interface if you are managing many tools:

from tool_arg_defaults import ToolDefaults

defaults = ToolDefaults({
    "get_records": {"max_results": 10, "order": "desc"},
    "send_message": {"priority": "normal", "retry": True},
})

filled = defaults.fill("get_records", model_args)

The fill method does the same thing as apply_defaults, except that it looks up the defaults by tool name. The class is just a container, so you do not have to pass the defaults dict at every call site.
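For readers curious how such a container might work, here is a minimal sketch of a class with the same shape. MiniToolDefaults is a hypothetical name, not the library's code. Returning the args unchanged for an unregistered tool is an assumption; the real ToolDefaults may behave differently.

```python
from typing import Any, Dict


class MiniToolDefaults:
    """Hypothetical per-tool defaults container (not the library's code)."""

    def __init__(self, table: Dict[str, Dict[str, Any]]) -> None:
        # Copy each tool's defaults so later changes to the caller's dict don't leak in.
        self._table = {name: dict(d) for name, d in table.items()}

    def fill(self, tool_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        result = dict(args)  # never mutate the model's args
        # Assumption: an unknown tool simply gets no defaults.
        for key, value in self._table.get(tool_name, {}).items():
            if key not in result:  # key presence, not truthiness
                result[key] = value
        return result


defaults = MiniToolDefaults({
    "get_records": {"max_results": 10, "order": "desc"},
    "send_message": {"priority": "normal", "retry": True},
})
print(defaults.fill("get_records", {"table": "customers", "filter": "active"}))
# {'table': 'customers', 'filter': 'active', 'max_results': 10, 'order': 'desc'}
print(defaults.fill("send_message", {"priority": "high"}))
# {'priority': 'high', 'retry': True}
```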

What It Does NOT Do

A few things are intentionally out of scope:

  • It does not validate the filled args. That is a separate problem. Use agentvet for schema-based validation after defaults are applied.
  • It does not coerce types. If the default is an integer and the model passes a string, the string wins. Type coercion is handled by tool-arg-coerce-py.
  • It does not inject defaults into the schema seen by the model. The LLM still sees the same tool description you gave it. If you want the schema to reflect the defaults, generate it with tool-schema-from-fn.
  • It does not handle nested argument paths. If your tool takes a dict-valued parameter and you want to fill defaults inside that dict, you handle that yourself.
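Two of those boundaries are easy to see with a local copy of the fill logic. It is the same key-presence loop shown under "Inside the Lib", redefined here so the snippet runs without installing anything. A string from the model is not coerced to the default's type. A dict-valued key that is present is kept whole rather than merged.

```python
def apply_defaults(args: dict, defaults: dict) -> dict:
    # Local copy of the key-presence fill: only absent keys get the default.
    result = dict(args)
    for key, value in defaults.items():
        if key not in result:
            result[key] = value
    return result


# No coercion: the model's string survives even though the default is an int.
print(apply_defaults({"max_results": "50"}, {"max_results": 10}))
# {'max_results': '50'}

# No nested fill: a present dict-valued key is kept as-is, not merged.
print(apply_defaults(
    {"config": {"timeout": 5}},
    {"config": {"timeout": 30, "retries": 3}},
))
# {'config': {'timeout': 5}}
```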

Inside the Lib: None as a Real Value

This is the most important design decision in the library, and the one most likely to be done wrong in a naive implementation.

When the model returns {"table": "customers", "max_results": null}, it is making an explicit choice. It wants to query without a limit. null in JSON maps to None in Python. If the library treats None as missing and replaces it with 10, the model's instruction is silently ignored.

The library treats None as a real value. Only a truly absent key triggers the default. A key present with value None is left alone.

from tool_arg_defaults import apply_defaults

defaults = {"max_results": 10}

# Key absent: default is applied
result = apply_defaults({}, defaults)
# {"max_results": 10}

# Key present with None: None is kept
result = apply_defaults({"max_results": None}, defaults)
# {"max_results": None}

# Key present with a value: value is kept
result = apply_defaults({"max_results": 50}, defaults)
# {"max_results": 50}

The caller always wins. But "caller" includes the model passing null. Absent is different from null.

This matters more than it sounds. Consider a search tool where max_results=None means "give me everything" and max_results=10 means "give me the top ten". If the model passes null and the library replaces it with 10, the agent quietly stops returning complete results. That is a behavior change with no error, no log, and no easy trail.

The check inside the library is a strict key presence test, not a truthiness test:

def apply_defaults(args: dict, defaults: dict) -> dict:
    result = dict(args)
    for key, value in defaults.items():
        if key not in result:
            result[key] = value
    return result

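key not in result is false when the key is present with None, so an explicit null survives. By contrast, if not args.get(key) is true for an absent key and also for None, 0, "", and False, so it would overwrite every one of those explicit values with the default. That is the wrong check for this use case.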
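Running both checks over a few JSON payloads makes the difference concrete. The json module turns null into None, so the model's explicit choice arrives as a present key. The function names below are illustrative, not part of the library.

```python
import json


def fill_by_presence(args: dict, defaults: dict) -> dict:
    # Only an absent key is filled (the approach described above).
    result = dict(args)
    for key, value in defaults.items():
        if key not in result:
            result[key] = value
    return result


def fill_by_truthiness(args: dict, defaults: dict) -> dict:
    # Naive approach: any falsy value is treated as "missing".
    result = dict(args)
    for key, value in defaults.items():
        if not result.get(key):
            result[key] = value
    return result


defaults = {"max_results": 10}
for raw in ['{}', '{"max_results": null}', '{"max_results": 0}', '{"max_results": 50}']:
    args = json.loads(raw)  # JSON null becomes Python None
    print(raw, "->",
          fill_by_presence(args, defaults)["max_results"],
          fill_by_truthiness(args, defaults)["max_results"])
# {} -> 10 10
# {"max_results": null} -> None 10
# {"max_results": 0} -> 0 10
# {"max_results": 50} -> 50 50
```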

When This Is Useful

You have tools where the model frequently omits optional parameters and you want consistent behavior regardless of which code path handles the call.

You have a mismatch between function-signature defaults and API defaults. The function default is a Python convention. The API default is a contract with an external service. Keeping them in sync via code is fragile. Applying defaults explicitly before the call makes the intent visible.

You are building a dispatcher that handles many tools and you want one place to define what "unspecified" means for each parameter. Not scattered across function signatures.

You want to log or audit exactly what values were used for each call, including defaults that were filled in. With explicit filling you can diff the original model_args against the filled args and record what was substituted.
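A minimal sketch of that diff, assuming only the apply_defaults(args, defaults) call shape shown in this post (redefined locally so it runs standalone). fill_and_record is a hypothetical helper, not part of the library. Keys the model did not send are exactly the ones the defaults supplied, so an explicit null never shows up as a substitution.

```python
from typing import Any, Dict, Tuple


def apply_defaults(args: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    # Local copy of the key-presence fill.
    result = dict(args)
    for key, value in defaults.items():
        if key not in result:
            result[key] = value
    return result


def fill_and_record(args: Dict[str, Any], defaults: Dict[str, Any]) -> Tuple[dict, dict]:
    # Hypothetical helper: fill, then report which keys came from defaults.
    filled = apply_defaults(args, defaults)
    substituted = {k: v for k, v in filled.items() if k not in args}
    return filled, substituted


model_args = {"table": "customers", "filter": "active", "max_results": None}
filled, substituted = fill_and_record(model_args, {"max_results": 10, "order": "desc"})
print(substituted)  # {'order': 'desc'}
```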

When NOT to Use This

Your function signature defaults are already the single source of truth and you never call the underlying API without going through the function. If there is no layering issue, there is no problem to solve.

Your tools have no optional parameters. If every parameter is required, there is nothing to fill.

You need type coercion or validation as part of the filling step. This library only fills missing keys. For coercion, add tool-arg-coerce-py after this step. For validation, add agentvet after that.

Install

pip install tool-arg-defaults

Zero dependencies. Python 3.9 and above.

from tool_arg_defaults import apply_defaults

# Simplest case
filled = apply_defaults(
    {"table": "orders"},
    {"max_results": 10, "order": "desc"}
)
# {"table": "orders", "max_results": 10, "order": "desc"}

The repo is at MukundaKatta/tool-arg-defaults. 20 tests, all passing. No runtime dependencies.

Siblings

These four libraries cover adjacent parts of the same tool-call lifecycle.

| Library | Boundary | Repo |
|---|---|---|
| agentvet | Validate args against a schema after defaults are filled | MukundaKatta/agentvet |
| tool-arg-coerce-py | Coerce arg types, applied after defaults are set | MukundaKatta/tool-arg-coerce-py |
| tool-schema-from-fn | Generate the schema from the function signature, documenting which args have defaults | MukundaKatta/tool-schema-from-fn |
| agent-fn-registry | Store function, schema, side effects, and defaults in one place | MukundaKatta/agent-fn-registry |

The order that makes sense in a dispatcher: generate the schema with tool-schema-from-fn, receive model args, fill missing keys with tool-arg-defaults, coerce types with tool-arg-coerce-py, validate with agentvet, then call the function. Each step is independent. You can drop any one out.
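Here is a minimal sketch of that ordering (fill, then coerce, then validate, then call). It assumes nothing about the sibling libraries' real APIs. coerce and validate are hypothetical stand-ins written for this example, and the TOOLS registry shape is invented for illustration.

```python
from typing import Any, Callable, Dict, Optional, Set

Args = Dict[str, Any]


def fill(args: Args, defaults: Args) -> Args:
    # Step 1: fill absent keys only (the key-presence rule from this post).
    result = dict(args)
    for key, value in defaults.items():
        if key not in result:
            result[key] = value
    return result


def coerce(args: Args, types: Dict[str, type]) -> Args:
    # Step 2 (hypothetical stand-in): cast known keys, leaving None alone
    # so an explicit "no limit" survives.
    return {
        k: (types[k](v) if k in types and v is not None else v)
        for k, v in args.items()
    }


def validate(args: Args, required: Set[str]) -> Args:
    # Step 3 (hypothetical stand-in): reject calls missing required args.
    missing = set(required) - set(args)
    if missing:
        raise ValueError(f"missing required args: {sorted(missing)}")
    return args


def get_records(table: str, filter: str,
                max_results: Optional[int] = 10, order: str = "desc") -> str:
    # Toy tool that just reports what it would query.
    return f"{table}/{filter}: limit={max_results}, order={order}"


# Hypothetical registry: one entry per tool.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_records": {
        "fn": get_records,
        "defaults": {"max_results": 10, "order": "desc"},
        "types": {"max_results": int},
        "required": {"table", "filter"},
    },
}


def dispatch(tool_name: str, model_args: Args) -> Any:
    spec = TOOLS[tool_name]
    args = fill(model_args, spec["defaults"])
    args = coerce(args, spec["types"])
    args = validate(args, spec["required"])
    fn: Callable[..., Any] = spec["fn"]
    return fn(**args)


print(dispatch("get_records",
               {"table": "customers", "filter": "active", "max_results": "25"}))
# customers/active: limit=25, order=desc
```

Filling first means coercion and validation always see the complete argument set. Skipping null in coerce keeps the "absent is different from null" rule intact further down the pipeline.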

What Is Next

The most obvious gap is nested defaults. If a tool accepts a config dict parameter and you want to fill defaults inside that dict, the current implementation does not help. A recursive fill mode with dot-notation keys would cover that.
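A dot-notation fill could look something like the sketch below. apply_nested_defaults is hypothetical and not in the library today. It assumes two rules. First, missing intermediate dicts are created. Second, an explicit non-dict value (such as null for config) is respected rather than overwritten, matching the None-as-a-real-value rule.

```python
import copy
from typing import Any, Dict


def apply_nested_defaults(args: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical dot-notation fill; not part of tool-arg-defaults."""
    result = copy.deepcopy(args)  # never mutate the model's args
    for dotted_key, value in defaults.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            if part not in node:
                node[part] = {}  # create a missing intermediate dict
            node = node[part]
            if not isinstance(node, dict):
                break  # caller set a non-dict (e.g. None): respect it
        else:
            if leaf not in node:  # same key-presence rule as the flat fill
                node[leaf] = copy.deepcopy(value)
    return result


defaults = {"max_results": 10, "config.timeout": 30, "config.retries": 3}
print(apply_nested_defaults({"config": {"timeout": 5}}, defaults))
# {'config': {'timeout': 5, 'retries': 3}, 'max_results': 10}
print(apply_nested_defaults({"config": None}, defaults))
# {'config': None, 'max_results': 10}
```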

A second useful addition would be a record_filled flag that returns both the filled args and a dict of keys that were substituted. That makes it easy to log what the library changed, which is useful for debugging runs where the model's omission of a parameter was intentional but the default was wrong.

The core problem that this library solves is small and specific. That is the point. A max_results parameter that the model keeps omitting should not require two hours of debugging across staging and production environments. It should require a one-line dict entry.

The full-table scan ran eleven times before I found the root cause. It was not a bug in the model. It was a default defined in three places and only enforced in two of them.
