You gave your LLM a tool. It called it wrong. Again. Maybe it hallucinated a parameter name, nested an object three levels deep when the API expected a flat string, or cheerfully returned a 200 OK summary of a response that was actually a 422 validation error.
If you've wired up any LLM to a real API -- OpenAI function calling, Anthropic tool use, MCP servers, LangChain agents -- you've hit this wall. The good news: most failures follow predictable patterns, and there are concrete fixes.
Here are three patterns I use to take LLM tool calling from "works 60% of the time" to "works 95%+ of the time."
## Why LLMs Fumble API Calls
Before the patterns, a quick mental model of why this happens.
LLMs generate tokens left-to-right. When your tool schema looks like this:
```json
{
  "body": {
    "type": "object",
    "properties": {
      "user": {
        "type": "object",
        "properties": {
          "address": {
            "type": "object",
            "properties": {
              "street": { "type": "string" },
              "city": { "type": "string" },
              "geo": {
                "type": "object",
                "properties": {
                  "lat": { "type": "number" },
                  "lng": { "type": "number" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
The model has to hold 4 levels of nesting in its attention window while deciding what to generate next. Each nested brace is a point where it can lose track of which object it's inside. It's like asking someone to write valid JSON by hand, blindfolded, one character at a time.
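One way to make this concrete before touching any prompts: measure how deep each tool's schema actually goes. Here's a small sketch (the helper name is mine, assuming standard JSON Schema `object`/`properties` nesting):

```python
def schema_depth(schema, depth=1):
    """Return the deepest level of object nesting in a JSON Schema fragment."""
    props = schema.get("properties", {})
    if not props:
        return depth
    return max(schema_depth(value, depth + 1) for value in props.values())
```

Anything deeper than two or three levels is a candidate for the flattening treatment below.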
Three things go wrong most often:
- **Hallucinated keys** -- the model invents parameter names that sound right but don't exist in the schema
- **Wrong nesting** -- values end up at the wrong depth (`city` inside `geo` instead of `address`)
- **Dropped required fields** -- deep-nested required params get silently skipped
Let's fix each one.
## Pattern 1: Flatten Your Parameter Schemas
This is the single highest-impact change you can make. Instead of handing the LLM a nested object tree, flatten it into a single-depth key-value map using dot-notation or underscore-delimited keys.
Before (nested):
```json
{
  "body": {
    "type": "object",
    "properties": {
      "user": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "address": {
            "type": "object",
            "properties": {
              "city": { "type": "string" },
              "zip": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
After (flat):
```json
{
  "body__user__name": { "type": "string" },
  "body__user__address__city": { "type": "string" },
  "body__user__address__zip": { "type": "string" }
}
```
The LLM now sees a simple list of key-value pairs. No nesting to track, no braces to match. Your middleware reconstructs the nested structure before sending to the actual API.
Here's the transform in Python:
```python
def flatten_schema(schema, prefix="", separator="__"):
    """Flatten a nested JSON schema into single-depth, delimiter-joined keys."""
    flat = {}
    if schema.get("type") == "object" and "properties" in schema:
        for key, value in schema["properties"].items():
            new_prefix = f"{prefix}{separator}{key}" if prefix else key
            if value.get("type") == "object" and "properties" in value:
                flat.update(flatten_schema(value, new_prefix, separator))
            else:
                flat[new_prefix] = value
    else:
        flat[prefix] = schema
    return flat


def unflatten_params(flat_params, separator="__"):
    """Reconstruct a nested dict from flat keys before sending to the API."""
    nested = {}
    for key, value in flat_params.items():
        parts = key.split(separator)
        current = nested
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return nested
```
Usage in your tool-calling middleware:
```python
# 1. Flatten the schema before registering the tool
original_schema = load_openapi_spec("petstore.yaml")
flat_schema = flatten_schema(original_schema["requestBody"])

# 2. Register the tool with the flat schema
register_tool("create_user", parameters=flat_schema)

# 3. When the LLM calls the tool, unflatten before forwarding
def handle_tool_call(name, flat_args):
    nested_args = unflatten_params(flat_args)
    return call_api(name, nested_args)
```
Why this works: every parameter is now a single decision point. The model picks a key, picks a value, moves on. No state tracking across nesting levels. In my testing, this alone cuts parameter hallucination by roughly 40-60% on complex APIs.
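A side benefit of the flat shape: catching hallucinated or missing keys becomes a simple set comparison you can run before the API ever sees the call. Here's a minimal sketch (the function and its error messages are mine, not part of any framework):

```python
def validate_flat_args(flat_args, flat_schema, required=()):
    """Pre-flight check: reject unknown keys and report missing required ones."""
    errors = []
    unknown = set(flat_args) - set(flat_schema)
    if unknown:
        errors.append(f"unknown parameters: {sorted(unknown)}")
    missing = set(required) - set(flat_args)
    if missing:
        errors.append(f"missing required parameters: {sorted(missing)}")
    return errors
```

If validation fails, feed the error list back to the model as the tool result -- it will usually correct itself on the next attempt instead of silently hitting a 422.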
## Pattern 2: Truncate API Responses Intelligently
The second failure mode isn't about calling the API -- it's about what happens after. Your LLM calls a list endpoint and gets back 500 objects, each with 30 fields, nested 4 levels deep. That's easily 50,000+ tokens of raw JSON crammed into the context window.
The LLM either:
- Chokes and produces garbage
- Summarizes incorrectly ("The API returned 3 users" when it returned 500)
- Blows your token budget in one call
Smart truncation solves this by applying two rules:
### Rule 1: Slice arrays to a reasonable size
If the response is an array of 500 items, the LLM almost never needs all 500. Slice it and tell the model what happened.
```python
def truncate_arrays(data, max_items=20):
    """Slice arrays and add a count hint for the LLM."""
    if isinstance(data, list):
        total = len(data)
        sliced = [truncate_arrays(item, max_items) for item in data[:max_items]]
        if total > max_items:
            sliced.append(f"... ({total - max_items} more items, {total} total)")
        return sliced
    if isinstance(data, dict):
        return {k: truncate_arrays(v, max_items) for k, v in data.items()}
    return data
```
### Rule 2: Limit nesting depth
Deep nesting past 4-5 levels rarely carries information the LLM needs for its next decision. Collapse it.
```python
def limit_depth(data, max_depth=5, current_depth=0):
    """Replace deeply nested structures with a type hint."""
    if current_depth >= max_depth:
        if isinstance(data, dict):
            return f"{{...}} ({len(data)} keys)"
        if isinstance(data, list):
            return f"[...] ({len(data)} items)"
        return data
    if isinstance(data, dict):
        return {
            k: limit_depth(v, max_depth, current_depth + 1)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [limit_depth(item, max_depth, current_depth + 1) for item in data]
    return data
```
Combine both:
```python
def smart_truncate(response_data, max_items=20, max_depth=5):
    """Apply both truncation strategies."""
    truncated = truncate_arrays(response_data, max_items)
    truncated = limit_depth(truncated, max_depth)
    return truncated
```
The key insight: you're not losing information the model actually needs. You're giving the LLM a useful summary instead of a data dump. The model can still see the structure, the first N items, and the total count. If it needs item #47, it can ask for a filtered query.
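To put rough numbers on the savings, here's a back-of-the-envelope comparison using the common heuristic of ~4 characters per token for JSON text (the helper and the synthetic data are mine, purely illustrative):

```python
import json

def approx_tokens(data):
    """Rough token estimate for serialized JSON: ~4 characters per token."""
    return len(json.dumps(data)) // 4

# A synthetic list response: 500 items with 30 fields each
full = [{f"field_{i}": i for i in range(30)} for _ in range(500)]

# What smart truncation would keep: the first 20 items plus a count hint
truncated = full[:20] + ["... (480 more items, 500 total)"]

print(approx_tokens(full), "->", approx_tokens(truncated))
```

On this shape the truncated version is well over an order of magnitude smaller -- the difference between a cheap tool call and a blown context window.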
## Pattern 3: Generate Tool Definitions from API Specs
The most error-prone step in LLM tool calling is one most people do by hand: writing the tool definition itself.
You read the API docs, you write a JSON schema, you describe each parameter. And you get it subtly wrong -- a typo in a field name, a missing enum value, a required field marked as optional. Now the LLM is working from a broken map, and no amount of prompt engineering will save it.
The fix: don't write tool definitions by hand. Generate them directly from the API's OpenAPI (Swagger) spec.
Most APIs already have one. If they don't, you can usually generate a rough spec from their docs in minutes.
```python
import yaml

def generate_tools_from_spec(spec_path):
    """Generate flat tool definitions from an OpenAPI spec."""
    with open(spec_path) as f:
        spec = yaml.safe_load(f)

    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            if method not in ("get", "post", "put", "patch", "delete"):
                continue

            # Collect parameters from path, query, header, and body
            params = {}
            for param in operation.get("parameters", []):
                params[param["name"]] = {
                    "type": param.get("schema", {}).get("type", "string"),
                    "description": param.get("description", ""),
                    "required": param.get("required", False),
                }

            # Flatten request body schema if present
            body_schema = (
                operation.get("requestBody", {})
                .get("content", {})
                .get("application/json", {})
                .get("schema", {})
            )
            if body_schema:
                flat_body = flatten_schema(body_schema, prefix="body")
                params.update(flat_body)

            tools.append({
                "name": operation.get("operationId", f"{method}_{path}"),
                "description": operation.get("summary", ""),
                "parameters": params,
            })
    return tools
```
This approach gives you:
- **Accurate field names** -- straight from the spec, no typos
- **Complete enum values** -- the LLM sees every valid option
- **Correct required/optional markers** -- no more dropped fields
- **Automatic updates** -- re-generate when the API version bumps
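The automatic-updates point is worth operationalizing: when you re-generate, diff the new tool set against the old one so breaking changes surface in CI instead of in production. A sketch (the helper name and structure are mine):

```python
def diff_tools(old_tools, new_tools):
    """Report added, removed, and parameter-changed tools between two generated lists."""
    old = {t["name"]: set(t["parameters"]) for t in old_tools}
    new = {t["name"]: set(t["parameters"]) for t in new_tools}
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(
            name for name in set(old) & set(new) if old[name] != new[name]
        ),
    }
```

Fail the build on anything in `removed` or `changed`, and your LLM never works from a stale map.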
I automated this exact pipeline into an open-source tool called mcp-openapi that converts any OpenAPI spec into working MCP tools with flattening and truncation built in -- but the pattern works regardless of what framework you're using.
## Your Next Steps
You don't need to implement all three patterns at once. Here's a prioritized action plan:
1. **Start with flattening (Pattern 1).** Pick your most-used API integration, flatten its schema, and measure the before/after accuracy. You'll likely see an immediate improvement.
2. **Add response truncation (Pattern 2)** for any endpoint that returns lists or deeply nested objects. Start with `max_items=20` and `max_depth=5` -- you can tune from there.
3. **Automate tool generation (Pattern 3)** once you're integrating more than 2-3 APIs. The upfront cost pays for itself fast in fewer bugs and easier maintenance.
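"Measure the before/after accuracy" can be as simple as replaying logged prompts against both schema versions and counting exact matches. A minimal sketch of such a harness (entirely my own, not from any framework):

```python
def exact_match_rate(cases):
    """cases: list of (expected_args, produced_args) pairs from logged tool calls."""
    if not cases:
        return 0.0
    hits = sum(1 for expected, produced in cases if expected == produced)
    return hits / len(cases)
```

Run the same prompts through the nested and the flattened tool definitions and compare the two rates -- that's your before/after number.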
The underlying principle behind all three patterns is the same: reduce ambiguity for the model. Flat schemas reduce structural ambiguity. Truncation reduces information overload. Spec-driven generation reduces human error. Stack all three and your LLM tool calling becomes boring and reliable -- which is exactly what you want in production.
What's the weirdest LLM tool-calling failure you've seen? I once watched GPT-4 confidently pass "latitude": "yes" to a geocoding API. Drop your best horror stories in the comments -- I reply to every one.