Son Seong Jun

Posted on Mar 15 • Originally published at github.com

I gave an LLM 248 tools and accuracy dropped to 12%. Here's what fixed it.

#llm #python #opensource #openapi

LLM agents break when you give them too many tools. I hit this wall with 248 Kubernetes API endpoints — the model's accuracy dropped to 12%. Vector search didn't fix it. Graph-based retrieval did.

Here's the problem, why vector search fails, and how I solved it with graph-tool-call — an open-source, zero-dependency Python library for tool retrieval.

The problem: context overflow kills accuracy

I was building an LLM agent (qwen3:4b) for a Kubernetes cluster, with 248 API endpoints all exposed as tools. I threw every tool definition into the context and asked the model to "scale my deployment."

Accuracy? 12%. The model choked on 8,192 tokens of tool definitions.

This isn't a model problem — it's a retrieval problem. The LLM needs a smaller, relevant subset of tools. But how do you pick the right ones?

Why vector search isn't enough

Natural first instinct: embed all tool descriptions, find the closest matches via cosine similarity. Simple.

Except... when a user says "cancel my order and get a refund," vector search returns cancelOrder. But the actual workflow is:

listOrders → getOrder → cancelOrder → processRefund

Vector search finds one tool. You need the chain. Real API workflows involve sequencing, prerequisites, and complementary operations that flat similarity search completely misses.
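To make the failure mode concrete, here is a minimal sketch of flat similarity ranking. It swaps real embeddings for a bag-of-words cosine similarity so it runs on the stdlib alone, and the tool descriptions are made up for illustration, not taken from any real API. The point is only that each tool is scored independently, so prerequisites such as listOrders never surface.

```python
import math
from collections import Counter

# Hypothetical tool descriptions for the e-commerce example.
TOOLS = {
    "listOrders": "list orders for a customer",
    "getOrder": "fetch order details by id",
    "cancelOrder": "cancel an order",
    "processRefund": "refund the payment for an order",
    "updateAddress": "update the shipping address",
}


def bow(text):
    """Bag-of-words vector: a crude stand-in for an embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, k=1):
    """Rank every tool independently against the query and keep the best k."""
    q = bow(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, bow(TOOLS[name])), reverse=True)
    return ranked[:k]


print(top_k("cancel my order and get a refund", k=1))  # ['cancelOrder']
```

Raising k helps a little, but nothing in the scores says that getOrder has to run before cancelOrder. That ordering information simply isn't in the vectors.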

The solution: graph-based tool retrieval

I built graph-tool-call, which models tool relationships as a directed graph. Tools are connected by typed edges such as PRECEDES, REQUIRES, and COMPLEMENTARY. When you search, it doesn't stop at a single match: it traverses the graph and returns the whole workflow.
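The idea can be sketched in a few lines of stdlib Python. This is not the library's internal code: the edge list, the `expand` and `order_workflow` helpers, and the `getRefundStatus` tool are all hypothetical, chosen to mirror the order-cancellation example. Starting from one matched tool, the sketch walks nearby edges and then orders the result so every PRECEDES edge points forward.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical typed edges: (source, relation, target).
EDGES = [
    ("listOrders", "PRECEDES", "getOrder"),
    ("getOrder", "PRECEDES", "cancelOrder"),
    ("cancelOrder", "PRECEDES", "processRefund"),
    ("processRefund", "COMPLEMENTARY", "getRefundStatus"),
]


def expand(seed, max_hops=2):
    """Collect tools within max_hops of the seed, following edges in either direction."""
    neighbors = {}
    for src, _rel, dst in EDGES:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not walk further than the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen


def order_workflow(tools):
    """Order tools so that each PRECEDES edge goes from earlier to later."""
    sorter = TopologicalSorter({tool: set() for tool in tools})
    for src, rel, dst in EDGES:
        if rel == "PRECEDES" and src in tools and dst in tools:
            sorter.add(dst, src)  # dst must come after src
    return list(sorter.static_order())


workflow = order_workflow(expand("cancelOrder"))
print(workflow)
```

A single hit on cancelOrder now pulls in listOrders, getOrder, and processRefund in call order. Tools linked only by COMPLEMENTARY edges (getRefundStatus here) are included too, but their position in the list is not constrained.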

The retrieval fuses four signals via weighted Reciprocal Rank Fusion (wRRF):

Signal	What it does
BM25	Keyword matching against tool names & descriptions
Graph traversal	Expands results along PRECEDES/REQUIRES/COMPLEMENTARY edges
Embedding	Semantic similarity (optional — Ollama, OpenAI, vLLM, etc.)
MCP annotations	Prioritizes read-only vs destructive tools based on query intent
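For readers unfamiliar with rank fusion, here is a minimal sketch of weighted RRF. It uses the standard formula, where each signal contributes weight / (k + rank) for every tool it ranks. The weights, the k = 60 constant, and the per-signal rankings below are illustrative assumptions, not the values graph-tool-call ships with.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists: score(tool) = sum over signals of weight / (k + rank)."""
    scores = defaultdict(float)
    for signal, ranked_tools in rankings.items():
        weight = weights.get(signal, 1.0)
        for rank, tool in enumerate(ranked_tools, start=1):  # ranks are 1-based
            scores[tool] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-signal rankings for "cancel my order and get a refund".
rankings = {
    "bm25": ["cancelOrder", "getOrder", "listOrders"],
    "graph": ["cancelOrder", "processRefund", "getOrder"],
    "embedding": ["processRefund", "cancelOrder"],
}
weights = {"bm25": 1.0, "graph": 1.0, "embedding": 0.5}

fused = weighted_rrf(rankings, weights)
print(fused)  # ['cancelOrder', 'getOrder', 'processRefund', 'listOrders']
```

Because RRF works on ranks rather than raw scores, signals with very different scales (BM25 scores, cosine similarities, graph distances) can be combined without normalizing them first.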

Benchmark results

Same 248 K8s tools, same model (qwen3:4b, 4-bit quantized):

Setup	Accuracy	Tokens	Token reduction
All 248 tools (baseline)	12%	8,192	—
graph-tool-call (top-5)	82%	1,699	79%
+ embedding + ontology	82%	1,924	76%

On smaller APIs (19–50 tools), baseline accuracy is already high — but graph-tool-call still cuts tokens by 64–91%.
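The token-reduction column is just one minus the ratio of retrieved tokens to baseline tokens. A quick check against the numbers in the table:

```python
BASELINE_TOKENS = 8192  # all 248 tools, from the table above


def token_reduction(tokens, baseline=BASELINE_TOKENS):
    """Fraction of baseline tokens saved by sending only the retrieved tools."""
    return 1 - tokens / baseline


for label, tokens in [("top-5", 1699), ("+ embedding + ontology", 1924)]:
    print(f"{label}: {token_reduction(tokens):.1%}")
# top-5: 79.3%
# + embedding + ontology: 76.5%
```

Both values agree with the table's 79% and 76% to within a percentage point.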

[Demo: token savings, e-commerce workflow search, and GitHub API search]

Zero dependencies

The core runs on Python stdlib only. No numpy, no torch, no heavy ML frameworks. Install only what you need:

pip install graph-tool-call                  # core, zero deps
pip install "graph-tool-call[embedding]"     # + semantic search
pip install "graph-tool-call[mcp]"           # + MCP server mode
pip install "graph-tool-call[all]"           # everything
# Quoting the extras keeps shells like zsh from treating [ ] as a glob pattern.

Try it in 30 seconds

uvx graph-tool-call search "user authentication" \
  --source https://petstore.swagger.io/v2/swagger.json

As an MCP server

Drop this in your .mcp.json and any MCP client (Claude Code, Cursor, Windsurf) gets smart tool search:

{
  "mcpServers": {
    "tool-search": {
      "command": "uvx",
      "args": ["graph-tool-call[mcp]", "serve",
               "--source", "https://api.example.com/openapi.json"]
    }
  }
}

Python API

from graph_tool_call import ToolGraph

tg = ToolGraph.from_url(
    "https://petstore3.swagger.io/api/v3/openapi.json",
    cache="petstore.json",
)

# Retrieve only relevant tools
tools = tg.retrieve("cancel my order", top_k=5)
for t in tools:
    print(f"{t.name}: {t.description}")
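Once you have the subset, the next step is handing it to the model. Below is one possible way to do that, relying only on the `name` and `description` attributes used in the snippet above. The `tools_to_prompt` helper is hypothetical, and the stand-in objects exist only so the sketch runs without the library or network access. Real retrieved tools may well carry a fuller parameter schema that you would want to pass along too.

```python
from types import SimpleNamespace


def tools_to_prompt(tools):
    """Render retrieved tools as a compact list for a system prompt."""
    lines = [f"- {t.name}: {t.description}" for t in tools]
    return "You may call these tools:\n" + "\n".join(lines)


# Stand-ins for retrieved tool objects (illustrative names and descriptions).
demo_tools = [
    SimpleNamespace(name="getOrderById", description="Find purchase order by ID"),
    SimpleNamespace(name="deleteOrder", description="Delete purchase order by ID"),
]

print(tools_to_prompt(demo_tools))
```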

MCP Proxy: 172 tools → 3 meta-tools

Running multiple MCP servers? Their tool definitions pile up in every LLM turn. MCP Proxy bundles them behind a single server:

172 tools across servers → 3 meta-tools (search_tools, get_tool_schema, call_backend_tool)
After search, matched tools are dynamically injected for 1-hop direct calling
Saves ~1,200 tokens per turn

claude mcp add tool-proxy -- \
  uvx "graph-tool-call[mcp]" proxy --config ~/backends.json

What makes this different

	Vector-only	graph-tool-call
Dependencies	Embedding model required	Zero (stdlib only)
Tool source	Manual registration	Auto-ingest from OpenAPI / MCP / Python
Search	Flat similarity	BM25 + graph + embedding + annotations
Workflows	Single tool matches	Multi-step chain retrieval
History	None	Demotes used tools, boosts next-step
LLM dependency	Required	Optional (better with, works without)
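The "History" row is easier to picture with code. A rough sketch of history-aware re-ranking, assuming a simple multiplicative scheme: tools already called are demoted, and tools that follow them in the workflow are boosted. The function name, the 0.5 and 1.5 factors, and the scores are all hypothetical. The library's real scoring may differ.

```python
def rerank_with_history(scores, history, next_steps, demote=0.5, boost=1.5):
    """Demote tools already used and boost their likely next steps."""
    used_tools = set(history)
    adjusted = dict(scores)
    for used in used_tools:
        if used in adjusted:
            adjusted[used] *= demote
        for nxt in next_steps.get(used, ()):
            if nxt in adjusted and nxt not in used_tools:
                adjusted[nxt] *= boost
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Hypothetical retrieval scores and PRECEDES-style successors.
scores = {"getOrder": 0.9, "cancelOrder": 0.8, "processRefund": 0.5}
next_steps = {"getOrder": ["cancelOrder"], "cancelOrder": ["processRefund"]}

print(rerank_with_history(scores, history=[], next_steps=next_steps))
# ['getOrder', 'cancelOrder', 'processRefund']
print(rerank_with_history(scores, history=["getOrder"], next_steps=next_steps))
# ['cancelOrder', 'processRefund', 'getOrder']
```

After getOrder has been called, cancelOrder moves to the top without the user having to rephrase the request.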

Get started

GitHub: github.com/SonAIengine/graph-tool-call
PyPI: pip install graph-tool-call
Docs: Architecture · Benchmarks

If you're dealing with large tool sets in production, I'd love to hear what threshold you hit before retrieval became necessary. Drop a comment or open an issue — contributions welcome 🙌
