You gave your LLM a tool. It called it wrong. Again. Maybe it hallucinated a parameter name, nested an object three levels deep when the API expected a flat string, or cheerfully returned a 200 OK summary of a response that was actually a 422 validation error.
If you've wired up any LLM to a real API -- OpenAI function calling, Anthropic tool use, MCP servers, LangChain agents -- you've hit this wall. The good news: most failures follow predictable patterns, and there are concrete fixes.
Here are three patterns I use to take LLM tool calling from "works 60% of the time" to "works 95%+ of the time."
## Why LLMs Fumble API Calls
Before the patterns, a quick mental model of why this happens.
LLMs generate tokens left-to-right. When your tool schema looks like this:
```json
{
  "body": {
    "type": "object",
    "properties": {
      "user": {
        "type": "object",
        "properties": {
          "address": {
            "type": "object",
            "properties": {
              "street": { "type": "string" },
              "city": { "type": "string" },
              "geo": {
                "type": "object",
                "properties": {
                  "lat": { "type": "number" },
                  "lng": { "type": "number" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
The model has to hold 4 levels of nesting in its attention window while deciding what to generate next. Each nested brace is a point where it can lose track of which object it's inside. It's like asking someone to write valid JSON by hand, blindfolded, one character at a time.
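One way to make this concrete before touching any prompts: measure how deep each tool's schema actually goes. Here's a small sketch (the helper name is mine, assuming standard JSON Schema `object`/`properties` nesting):

```python
def schema_depth(schema, depth=1):
    """Return the deepest level of object nesting in a JSON Schema fragment."""
    props = schema.get("properties", {})
    if not props:
        return depth
    return max(schema_depth(value, depth + 1) for value in props.values())
```

Anything deeper than two or three levels is a candidate for the flattening treatment below.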
Three things go wrong most often:
- **Hallucinated keys** -- the model invents parameter names that sound right but don't exist in the schema
- **Wrong nesting** -- values end up at the wrong depth (`city` inside `geo` instead of `address`)
- **Dropped required fields** -- deep-nested required params get silently skipped
Let's fix each one.
## Pattern 1: Flatten Your Parameter Schemas
This is the single highest-impact change you can make. Instead of handing the LLM a nested object tree, flatten it into a single-depth key-value map using dot-notation or underscore-delimited keys.
Before (nested):
```json
{
  "body": {
    "type": "object",
    "properties": {
      "user": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "address": {
            "type": "object",
            "properties": {
              "city": { "type": "string" },
              "zip": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
After (flat):
```json
{
  "body__user__name": { "type": "string" },
  "body__user__address__city": { "type": "string" },
  "body__user__address__zip": { "type": "string" }
}
```
The LLM now sees a simple list of key-value pairs. No nesting to track, no braces to match. Your middleware reconstructs the nested structure before sending to the actual API.
Here's the transform in Python:
```python
def flatten_schema(schema, prefix="", separator="__"):
    """Flatten a nested JSON schema into single-depth, delimiter-joined keys."""
    flat = {}
    if schema.get("type") == "object" and "properties" in schema:
        for key, value in schema["properties"].items():
            new_prefix = f"{prefix}{separator}{key}" if prefix else key
            if value.get("type") == "object" and "properties" in value:
                flat.update(flatten_schema(value, new_prefix, separator))
            else:
                flat[new_prefix] = value
    else:
        flat[prefix] = schema
    return flat


def unflatten_params(flat_params, separator="__"):
    """Reconstruct a nested dict from flat keys before sending to the API."""
    nested = {}
    for key, value in flat_params.items():
        parts = key.split(separator)
        current = nested
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return nested
```
Usage in your tool-calling middleware:
```python
# 1. Flatten the schema before registering the tool
original_schema = load_openapi_spec("petstore.yaml")
flat_schema = flatten_schema(original_schema["requestBody"])

# 2. Register the tool with the flat schema
register_tool("create_user", parameters=flat_schema)

# 3. When the LLM calls the tool, unflatten before forwarding
def handle_tool_call(name, flat_args):
    nested_args = unflatten_params(flat_args)
    return call_api(name, nested_args)
```
Why this works: every parameter is now a single decision point. The model picks a key, picks a value, moves on. No state tracking across nesting levels. In my testing, this alone cuts parameter hallucination by roughly 40-60% on complex APIs.
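A side benefit of the flat shape: catching hallucinated or missing keys becomes a simple set comparison you can run before the API ever sees the call. Here's a minimal sketch (the function and its error messages are mine, not part of any framework):

```python
def validate_flat_args(flat_args, flat_schema, required=()):
    """Pre-flight check: reject unknown keys and report missing required ones."""
    errors = []
    unknown = set(flat_args) - set(flat_schema)
    if unknown:
        errors.append(f"unknown parameters: {sorted(unknown)}")
    missing = set(required) - set(flat_args)
    if missing:
        errors.append(f"missing required parameters: {sorted(missing)}")
    return errors
```

If validation fails, feed the error list back to the model as the tool result -- it will usually correct itself on the next attempt instead of silently hitting a 422.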
## Pattern 2: Truncate API Responses Intelligently
The second failure mode isn't about calling the API -- it's about what happens after. Your LLM calls a list endpoint and gets back 500 objects, each with 30 fields, nested 4 levels deep. That's easily 50,000+ tokens of raw JSON crammed into the context window.
The LLM either:
- Chokes and produces garbage
- Summarizes incorrectly ("The API returned 3 users" when it returned 500)
- Blows your token budget in one call
Smart truncation solves this by applying two rules:
### Rule 1: Slice arrays to a reasonable size
If the response is an array of 500 items, the LLM almost never needs all 500. Slice it and tell the model what happened.
```python
def truncate_arrays(data, max_items=20):
    """Slice arrays and add a count hint for the LLM."""
    if isinstance(data, list):
        total = len(data)
        sliced = [truncate_arrays(item, max_items) for item in data[:max_items]]
        if total > max_items:
            sliced.append(f"... ({total - max_items} more items, {total} total)")
        return sliced
    if isinstance(data, dict):
        return {k: truncate_arrays(v, max_items) for k, v in data.items()}
    return data
```
### Rule 2: Limit nesting depth
Deep nesting past 4-5 levels rarely carries information the LLM needs for its next decision. Collapse it.
```python
def limit_depth(data, max_depth=5, current_depth=0):
    """Replace deeply nested structures with a type hint."""
    if current_depth >= max_depth:
        if isinstance(data, dict):
            return f"{{...}} ({len(data)} keys)"
        if isinstance(data, list):
            return f"[...] ({len(data)} items)"
        return data
    if isinstance(data, dict):
        return {
            k: limit_depth(v, max_depth, current_depth + 1)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [limit_depth(item, max_depth, current_depth + 1) for item in data]
    return data
```
Combine both:
```python
def smart_truncate(response_data, max_items=20, max_depth=5):
    """Apply both truncation strategies."""
    truncated = truncate_arrays(response_data, max_items)
    truncated = limit_depth(truncated, max_depth)
    return truncated
```
The key insight: you're not losing information the model actually needs. You're giving the LLM a useful summary instead of a data dump. The model can still see the structure, the first N items, and the total count. If it needs item #47, it can ask for a filtered query.
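To put rough numbers on the savings, here's a back-of-the-envelope comparison using the common heuristic of ~4 characters per token for JSON text (the helper and the synthetic data are mine, purely illustrative):

```python
import json

def approx_tokens(data):
    """Rough token estimate for serialized JSON: ~4 characters per token."""
    return len(json.dumps(data)) // 4

# A synthetic list response: 500 items with 30 fields each
full = [{f"field_{i}": i for i in range(30)} for _ in range(500)]

# What smart truncation would keep: the first 20 items plus a count hint
truncated = full[:20] + ["... (480 more items, 500 total)"]

print(approx_tokens(full), "->", approx_tokens(truncated))
```

On this shape the truncated version is well over an order of magnitude smaller -- the difference between a cheap tool call and a blown context window.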
## Pattern 3: Generate Tool Definitions from API Specs
The most error-prone step in LLM tool calling is one most people do by hand: writing the tool definition itself.
You read the API docs, you write a JSON schema, you describe each parameter. And you get it subtly wrong -- a typo in a field name, a missing enum value, a required field marked as optional. Now the LLM is working from a broken map, and no amount of prompt engineering will save it.
The fix: don't write tool definitions by hand. Generate them directly from the API's OpenAPI (Swagger) spec.
Most APIs already have one. If they don't, you can usually generate a rough spec from their docs in minutes.
```python
import yaml

def generate_tools_from_spec(spec_path):
    """Generate flat tool definitions from an OpenAPI spec."""
    with open(spec_path) as f:
        spec = yaml.safe_load(f)

    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            if method not in ("get", "post", "put", "patch", "delete"):
                continue

            # Collect parameters from path, query, header, and body
            params = {}
            for param in operation.get("parameters", []):
                params[param["name"]] = {
                    "type": param.get("schema", {}).get("type", "string"),
                    "description": param.get("description", ""),
                    "required": param.get("required", False),
                }

            # Flatten request body schema if present
            body_schema = (
                operation.get("requestBody", {})
                .get("content", {})
                .get("application/json", {})
                .get("schema", {})
            )
            if body_schema:
                flat_body = flatten_schema(body_schema, prefix="body")
                params.update(flat_body)

            tools.append({
                "name": operation.get("operationId", f"{method}_{path}"),
                "description": operation.get("summary", ""),
                "parameters": params,
            })
    return tools
```
This approach gives you:
- **Accurate field names** -- straight from the spec, no typos
- **Complete enum values** -- the LLM sees every valid option
- **Correct required/optional markers** -- no more dropped fields
- **Automatic updates** -- re-generate when the API version bumps
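The automatic-updates point is worth operationalizing: when you re-generate, diff the new tool set against the old one so breaking changes surface in CI instead of in production. A sketch (the helper name and structure are mine):

```python
def diff_tools(old_tools, new_tools):
    """Report added, removed, and parameter-changed tools between two generated lists."""
    old = {t["name"]: set(t["parameters"]) for t in old_tools}
    new = {t["name"]: set(t["parameters"]) for t in new_tools}
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(
            name for name in set(old) & set(new) if old[name] != new[name]
        ),
    }
```

Fail the build on anything in `removed` or `changed`, and your LLM never works from a stale map.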
I automated this exact pipeline into an open-source tool called mcp-openapi that converts any OpenAPI spec into working MCP tools with flattening and truncation built in -- but the pattern works regardless of what framework you're using.
## Your Next Steps
You don't need to implement all three patterns at once. Here's a prioritized action plan:
1. **Start with flattening (Pattern 1).** Pick your most-used API integration, flatten its schema, and measure the before/after accuracy. You'll likely see an immediate improvement.
2. **Add response truncation (Pattern 2)** for any endpoint that returns lists or deeply nested objects. Start with `max_items=20` and `max_depth=5` -- you can tune from there.
3. **Automate tool generation (Pattern 3)** once you're integrating more than 2-3 APIs. The upfront cost pays for itself fast in fewer bugs and easier maintenance.
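"Measure the before/after accuracy" can be as simple as replaying logged prompts against both schema versions and counting exact matches. A minimal sketch of such a harness (entirely my own, not from any framework):

```python
def exact_match_rate(cases):
    """cases: list of (expected_args, produced_args) pairs from logged tool calls."""
    if not cases:
        return 0.0
    hits = sum(1 for expected, produced in cases if expected == produced)
    return hits / len(cases)
```

Run the same prompts through the nested and the flattened tool definitions and compare the two rates -- that's your before/after number.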
The underlying principle behind all three patterns is the same: reduce ambiguity for the model. Flat schemas reduce structural ambiguity. Truncation reduces information overload. Spec-driven generation reduces human error. Stack all three and your LLM tool calling becomes boring and reliable -- which is exactly what you want in production.
What's the weirdest LLM tool-calling failure you've seen? I once watched GPT-4 confidently pass "latitude": "yes" to a geocoding API. Drop your best horror stories in the comments -- I reply to every one.