DEV Community

Papers Mache

JSON-Schema masks can block needed tool calls

Grammar‑based token masks can silently block the very function calls an LLM agent must emit. A lightweight two‑pass inference hack sidesteps the problem without retraining the model.

Before this work, engineers routinely combined JSON‑Schema output constraints with tool‑calling APIs, assuming the two coexist harmlessly. Existing agents simply turned on schema validation and left it to the model to decide when to invoke a tool.

The suppression stems from the way schemas are enforced: “JSON Schema constraints are compiled into grammar‑based token masks that render tool‑call tokens unreachable during decoding” [1]. The mask prunes every token that would start a function call, so decoding never reaches a valid tool‑call token even though the rest of the response complies with the schema.
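To make the mechanism concrete, here is a toy sketch of mask-based decoding. The six-token vocabulary, the `<tool_call>` marker, and the logit values are all invented for illustration; real tokenizers, grammars, and tool-call formats differ by model and serving stack.

```python
import math

# Hypothetical toy vocabulary; real tokenizers and tool-call markers differ.
VOCAB = ["{", "}", '"answer"', ":", '"text"', "<tool_call>"]

def apply_mask(logits, allowed):
    """Set the logit of every token outside `allowed` to -inf."""
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]

def greedy_pick(logits):
    """Return the token with the highest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return VOCAB[best]

# Made-up logits: the model strongly prefers to open a tool call...
logits = [1.0, 0.1, 0.2, 0.1, 0.3, 5.0]
# ...but a schema that expects a JSON object only permits "{" as the first token.
allowed_at_start = {"{"}

unmasked_choice = greedy_pick(logits)
masked_choice = greedy_pick(apply_mask(logits, allowed_at_start))
```

Without the mask the greedy pick is the tool-call marker; with it, the marker's logit is negative infinity, so no decoding strategy can select it, however much the model prefers it.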

Running the model in a transparent two‑pass mode eliminates the dead end. In the second pass the mask is dropped, allowing the model to emit the missing call, and the paper reports that “Tool Invocation Rate increased from 0% to 100%” [1]. The fix preserves full schema compliance while recovering every required tool activation.
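One way the two-pass idea might be wired up is sketched below. This is an assumption-laden illustration, not the paper's implementation: `generate` is a hypothetical callable that takes a prompt and a flag for whether the schema mask is on, `TOOL_PREFIX` is an invented marker, and `fake_generate` is a stub standing in for a real model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TOOL_PREFIX = "<tool_call>"  # hypothetical marker; real formats vary by model

@dataclass
class AgentTurn:
    structured: str            # schema-constrained output from pass 1
    tool_call: Optional[str]   # tool call recovered in pass 2, if any

def two_pass(prompt: str, generate: Callable[[str, bool], str]) -> AgentTurn:
    """Run a masked pass, then an unmasked pass so a tool call can surface."""
    structured = generate(prompt, True)   # pass 1: schema mask on
    free = generate(prompt, False)        # pass 2: schema mask dropped
    tool_call = free if free.startswith(TOOL_PREFIX) else None
    return AgentTurn(structured=structured, tool_call=tool_call)

def fake_generate(prompt: str, constrained: bool) -> str:
    """Stub model: the mask forces JSON; without it, the model calls a tool."""
    if constrained:
        return '{"answer": "unknown"}'
    return TOOL_PREFIX + '{"name": "search", "arguments": {"q": "' + prompt + '"}}'

turn = two_pass("weather in Oslo", fake_generate)
```

In this sketch the structured output always comes from the masked pass, so it stays schema-compliant, while the unmasked pass exists only to reveal a tool call the mask would have hidden. A real pipeline would still need to parse and validate the recovered call before executing it.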

The study evaluates open‑weight model families in a production pipeline, but does not assess closed‑source models or more complex multi‑tool workflows, leaving it unclear whether they suffer the same mask‑induced deadlock. Moreover, the extra decoding pass adds latency, which may be a concern for real‑time agents. That cost suggests a need for smarter mask designs that exclude only truly illegal tokens rather than bluntly cutting off all call prefixes.
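As a rough illustration of what a less blunt mask could look like, the sketch below widens the allowed set to include tool-call openers only at the start of a response. This design is an assumption made for illustration and is not taken from the paper; the function name, the token strings, and the start-of-response rule are all hypothetical.

```python
from typing import Iterable, Set

def widened_mask(schema_allowed: Iterable[str],
                 tool_call_starts: Iterable[str],
                 at_response_start: bool) -> Set[str]:
    """Return the allowed-token set, adding tool-call openers only at the start."""
    allowed = set(schema_allowed)
    if at_response_start:
        # Let the model choose between a schema object and a tool call.
        allowed |= set(tool_call_starts)
    return allowed

# Hypothetical token strings for illustration only.
first = widened_mask({"{"}, {"<tool_call>"}, at_response_start=True)
later = widened_mask({'"answer"', "}"}, {"<tool_call>"}, at_response_start=False)
```

Here the tool-call opener is reachable as the very first token but stays masked once the model has committed to a schema object, which would avoid the second decoding pass if a serving stack supported this kind of mask.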

If the two‑pass pattern holds across deployments, any benchmark that measures tool use under schema constraints should be re‑run with the mask disabled in a second pass. More importantly, production agents can adopt the transparent two‑pass strategy as a default safety net, ensuring that tool calls are never silently lost while still guaranteeing structured output.

References

  1. Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
