<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Son Seong Jun</title>
    <description>The latest articles on DEV Community by Son Seong Jun (@sonaiengine).</description>
    <link>https://dev.to/sonaiengine</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811553%2Fd1c21fdb-b932-4ebc-8c08-d9ed37bc4bf1.jpeg</url>
      <title>DEV Community: Son Seong Jun</title>
      <link>https://dev.to/sonaiengine</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sonaiengine"/>
    <language>en</language>
    <item>
      <title>I Built a Graph-Based Tool Search Engine for LLM Agents — Here's What I Learned After 1068 Tools</title>
      <dc:creator>Son Seong Jun</dc:creator>
      <pubDate>Sun, 22 Mar 2026 08:02:38 +0000</pubDate>
      <link>https://dev.to/sonaiengine/i-built-a-graph-based-tool-search-engine-for-llm-agents-heres-what-i-learned-after-1068-tools-4fj4</link>
      <guid>https://dev.to/sonaiengine/i-built-a-graph-based-tool-search-engine-for-llm-agents-heres-what-i-learned-after-1068-tools-4fj4</guid>
      <description>&lt;p&gt;LLM agents need tools. But when you have 248 Kubernetes API endpoints or 1068 GitHub API operations, you can't stuff them all into the context window. The standard fix is vector search — embed tool descriptions, find the closest match. It works for finding &lt;em&gt;one tool&lt;/em&gt;. But real tasks aren't one tool.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/SonAIengine/graph-tool-call" rel="noopener noreferrer"&gt;graph-tool-call&lt;/a&gt;, a Python library that models tool relationships as a graph and retrieves execution chains, not just individual matches. After reaching v0.15, I ran a fair competitive benchmark comparing 6 retrieval strategies on up to 1068 API endpoints. The results were humbling — and led to a complete architecture rethink.&lt;/p&gt;

&lt;p&gt;This post covers what I found, what I broke, and what I built differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Vector Search Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Consider this user request:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Cancel my order and process a refund"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Vector search finds &lt;code&gt;cancelOrder&lt;/code&gt; — the closest semantic match. But the actual workflow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listOrders → getOrder → cancelOrder → requestRefund
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You need &lt;code&gt;getOrder&lt;/code&gt; first because &lt;code&gt;cancelOrder&lt;/code&gt; requires an &lt;code&gt;order_id&lt;/code&gt;. You need &lt;code&gt;requestRefund&lt;/code&gt; after because that's the business process. Vector search returns one tool; you need a chain of four.&lt;/p&gt;

&lt;p&gt;This isn't a retrieval quality problem. It's a &lt;strong&gt;structural knowledge&lt;/strong&gt; problem. No amount of embedding improvement will teach a vector database that &lt;code&gt;getOrder&lt;/code&gt; must precede &lt;code&gt;cancelOrder&lt;/code&gt;.&lt;/p&gt;
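&lt;p&gt;The structural knowledge in question is just dependency edges. A minimal sketch — with hypothetical &lt;code&gt;REQUIRES&lt;/code&gt; edges, not graph-tool-call's actual data model — shows how a chain falls out of a graph walk in a way no similarity score can:&lt;/p&gt;

```python
# Hypothetical REQUIRES edges: key requires value before it can run.
# Illustrative only — not graph-tool-call's real schema.
requires = {
    "cancelOrder": "getOrder",       # needs an order_id from getOrder
    "getOrder": "listOrders",        # needs a candidate order to inspect
    "requestRefund": "cancelOrder",  # business rule: cancel first
}

def execution_chain(tool):
    """Walk REQUIRES edges backward, then reverse into execution order."""
    chain = [tool]
    while chain[-1] in requires:
        chain.append(requires[chain[-1]])
    chain.reverse()
    return chain

print(execution_chain("requestRefund"))
# → ['listOrders', 'getOrder', 'cancelOrder', 'requestRefund']
```

The chain is recovered from the edges alone; the query never has to mention &lt;code&gt;getOrder&lt;/code&gt; at all.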




&lt;h2&gt;
  
  
  The Competitive Benchmark: 6 Strategies, 9 Datasets
&lt;/h2&gt;

&lt;p&gt;I wanted to know: does graph-tool-call actually beat vector search? Not on my own cherry-picked examples, but on a fair, reproducible benchmark.&lt;/p&gt;

&lt;p&gt;First, I analyzed &lt;a href="https://github.com/langchain-ai/langgraph/tree/main/libs/langgraph-bigtool" rel="noopener noreferrer"&gt;bigtool&lt;/a&gt; (LangGraph's tool retrieval library). Its core is surprisingly simple — it calls &lt;code&gt;store.search()&lt;/code&gt; on LangGraph's vector store. bigtool isn't a retrieval algorithm; it's a wrapper around cosine similarity search.&lt;/p&gt;

&lt;p&gt;So I set up a fair comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cosine similarity with qwen3-embedding (≈ what bigtool does)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25 Only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keyword matching (TF-IDF style)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph Only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graph traversal from category nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25 + Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;graph-tool-call default (no embedding)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector + BM25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid without graph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BM25 + Graph + Embedding + Annotation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I ran all 6 across 9 datasets ranging from 19 tools (Petstore) to 1068 tools (GitHub full API), using the same queries and the same evaluation metrics.&lt;/p&gt;
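&lt;p&gt;For readers who want to reproduce the numbers, here is roughly how I'd compute the three metrics. This is a sketch; the benchmark's exact definitions (e.g. how far down the candidate list Miss% looks) may differ:&lt;/p&gt;

```python
def evaluate(results, expected, k=5):
    """Sketch of Recall@k, MRR, and Miss%.
    results: one ranked tool-name list per query.
    expected: the gold tool name per query."""
    n = len(expected)
    # Recall@k: gold tool appears in the top k
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    # MRR: reciprocal rank of the gold tool, 0 if never retrieved
    rr = sum(1.0 / (ranked.index(gold) + 1)
             for ranked, gold in zip(results, expected) if gold in ranked)
    # Miss%: gold tool absent from the returned candidates entirely
    missed = sum(1 for ranked, gold in zip(results, expected) if gold not in ranked)
    return hits / n, rr / n, 100.0 * missed / n
```

Defining Miss% over the full candidate list (not just the top 5) is why it can sit below 100 − Recall@5 in the tables.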

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Recall@5&lt;/th&gt;
&lt;th&gt;MRR&lt;/th&gt;
&lt;th&gt;Miss%&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.897&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.1%&lt;/td&gt;
&lt;td&gt;176ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM25 Only&lt;/td&gt;
&lt;td&gt;91.6%&lt;/td&gt;
&lt;td&gt;0.819&lt;/td&gt;
&lt;td&gt;7.5%&lt;/td&gt;
&lt;td&gt;1.5ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM25 + Graph&lt;/td&gt;
&lt;td&gt;91.6%&lt;/td&gt;
&lt;td&gt;0.819&lt;/td&gt;
&lt;td&gt;7.5%&lt;/td&gt;
&lt;td&gt;14ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector + BM25&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.897&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.1%&lt;/td&gt;
&lt;td&gt;171ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.897&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.1%&lt;/td&gt;
&lt;td&gt;172ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The uncomfortable truth: embedding dominates.&lt;/strong&gt; Vector Only already hits 96.8% Recall@5. Adding Graph, BM25, or annotations on top of it provides zero additional improvement.&lt;/p&gt;

&lt;p&gt;Without embedding, BM25 alone achieves 91.6% — decent, but clearly below vector search. And here's the worst part: &lt;strong&gt;BM25 + Graph performed the same as BM25 alone.&lt;/strong&gt; The graph wasn't helping at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Bugs That Made Graph Harmful
&lt;/h2&gt;

&lt;p&gt;During the benchmark, I discovered that on some datasets, BM25 + Graph actually scored &lt;em&gt;worse&lt;/em&gt; than BM25 alone. The graph was actively degrading results. I spent days debugging this and found three root causes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 1: &lt;code&gt;set_weights()&lt;/code&gt; Was Silently Ignored
&lt;/h3&gt;

&lt;p&gt;The retrieval engine has adaptive weight selection based on corpus size. When I called &lt;code&gt;set_weights(keyword=1.0, graph=0.0)&lt;/code&gt; to test BM25-only, the adaptive function &lt;strong&gt;overwrote my settings&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_adaptive_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This always ran, ignoring any manual set_weights() call
&lt;/span&gt;    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# hardcoded!
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# hardcoded!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every benchmark strategy produced &lt;strong&gt;identical results&lt;/strong&gt; because the weights never changed. I ran the benchmark 5 times before catching this.&lt;/p&gt;

&lt;p&gt;The fix was adding a &lt;code&gt;_weights_manual&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="c1"&gt;# ... set values ...
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_weights_manual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# disable adaptive
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_adaptive_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_weights_manual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_keyword_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_graph_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... adaptive logic ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bug 2: Graph Just Echoed BM25
&lt;/h3&gt;

&lt;p&gt;The graph channel worked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query → BM25 finds top 10 → Graph expands their neighbors → Done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since graph expansion started from BM25 results, it could only find tools that were &lt;em&gt;already near&lt;/em&gt; what BM25 found. It provided zero independent signal — just amplified BM25's noise.&lt;/p&gt;
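&lt;p&gt;A toy example makes the failure concrete. With hypothetical neighbor data, expansion seeded only by BM25 hits can never escape BM25's neighborhood:&lt;/p&gt;

```python
# Hypothetical adjacency: two clusters. BM25's seeds sit entirely in
# cluster A, so neighbor expansion can never reach cluster B.
neighbors = {
    "getOrder": ["cancelOrder"],
    "cancelOrder": ["getOrder"],
    "requestRefund": ["listPayments"],   # cluster B: relevant, but unreachable
    "listPayments": ["requestRefund"],
}

def expand(seeds, hops=1):
    """Breadth-first expansion starting from the given seed tools."""
    found = set(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        frontier = [n for t in frontier for n in neighbors.get(t, [])]
        found.update(frontier)
    return found

bm25_seeds = ["getOrder"]
# No matter how many hops, cluster B stays invisible
assert "requestRefund" not in expand(bm25_seeds, hops=3)
```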

&lt;p&gt;I needed the graph to find things BM25 &lt;em&gt;couldn't&lt;/em&gt;. That meant starting from a different place entirely — more on this below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 3: Annotations Overwhelmed Precision at Scale
&lt;/h3&gt;

&lt;p&gt;With 248+ Kubernetes tools, many share the same HTTP method. When the intent classifier detected "create" in a query, annotation scoring boosted &lt;em&gt;every&lt;/em&gt; POST endpoint. The correct tool (&lt;code&gt;createCoreV1NamespacedService&lt;/code&gt;) got pushed below a wrong one (&lt;code&gt;createCoreV1Namespace&lt;/code&gt;) because both are POST requests.&lt;/p&gt;

&lt;p&gt;This is a precision vs recall tradeoff. Annotations help recall (finding more candidates), but at scale they destroy precision (ranking the right one first).&lt;/p&gt;
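&lt;p&gt;The effect is easy to simulate. Assuming a flat per-method boost — a simplification of the real annotation scorer — every POST endpoint ties at the top of the annotation channel:&lt;/p&gt;

```python
# Hypothetical annotation channel: every POST endpoint receives the same
# "create"-intent score, so the channel cannot discriminate among them.
methods = ["POST"] * 120 + ["GET"] * 128   # a ~248-tool, Kubernetes-sized corpus
annotation_scores = [1.0 if m == "POST" else 0.0 for m in methods]

top = max(annotation_scores)
tied_at_top = annotation_scores.count(top)
# 120 tools tie for first place: good for recall, useless for precision
assert tied_at_top == 120
```

With 120-way ties, whichever POST endpoint wins the fused ranking is decided by noise from the other channels.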




&lt;h2&gt;
  
  
  The Architecture Fix: Graph as Candidate Injection
&lt;/h2&gt;

&lt;p&gt;The root problem was putting Graph into the wRRF (weighted Reciprocal Rank Fusion) scoring as a 4th channel alongside BM25, Embedding, and Annotation. Since Graph's standalone accuracy was only 69.8% (vs BM25's 91.6%), it dragged down the fused score.&lt;/p&gt;

&lt;p&gt;I completely removed Graph from wRRF. Instead, Graph now acts as an independent &lt;strong&gt;candidate injection&lt;/strong&gt; channel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before (broken):
  BM25 + Graph + Embedding + Annotation → wRRF fusion → results
                  ↑ noise injected here

After (fixed):
  BM25 + Embedding + Annotation → wRRF fusion → primary results
                                                      ↓
  Graph (independent) → inject candidates BM25 missed → final results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical rule: &lt;strong&gt;Graph candidates are always scored below the lowest BM25 result.&lt;/strong&gt; Graph can only &lt;em&gt;add&lt;/em&gt; tools that BM25 missed, never displace a BM25 result. This guarantees &lt;code&gt;BM25 + Graph ≥ BM25 alone&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_inject_graph_candidates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;graph_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="c1"&gt;# Only tools NOT already in primary results
&lt;/span&gt;    &lt;span class="n"&gt;new_candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;final_scores&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# Score below lowest primary result
&lt;/span&gt;    &lt;span class="n"&gt;min_primary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;injection_base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;min_primary&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g_score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_inject&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;final_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;injection_base&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;norm_score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
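&lt;p&gt;A toy check of the rule — with made-up scores, not the library's real output — shows why the guarantee holds: injected candidates can only extend the tail of the ranking:&lt;/p&gt;

```python
# Made-up scores for illustration
primary = {"cancelOrder": 0.9, "getOrder": 0.7}       # wRRF-fused results
graph_only = {"requestRefund": 0.5, "cancelOrder": 0.95}

final = dict(primary)
injection_base = min(final.values()) * 0.8            # 0.56, below every primary
for name, score in graph_only.items():
    if name not in final:                             # duplicates are ignored
        final[name] = injection_base * score

ranking = sorted(final, key=final.get, reverse=True)
# Primary order untouched; the graph candidate is appended at the tail
assert ranking == ["cancelOrder", "getOrder", "requestRefund"]
```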






&lt;h2&gt;
  
  
  Making Graph Search Generic
&lt;/h2&gt;

&lt;p&gt;The original graph search had 49 GitHub-specific aliases hardcoded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old code — only works with GitHub API
&lt;/span&gt;&lt;span class="n"&gt;_RESOURCE_ALIASES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pull request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pulls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pulls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 45 more
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I replaced this with &lt;strong&gt;dynamic reverse-indexing&lt;/strong&gt;. When you ingest any OpenAPI spec, tool names and descriptions are automatically tokenized and mapped to their category nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# New code — works with any API
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;neighbor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_neighbors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# "requestRefund" → tokens: ["request", "refund"]
&lt;/span&gt;    &lt;span class="c1"&gt;# → "refund" maps to "orders" category
&lt;/span&gt;    &lt;span class="n"&gt;name_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;([a-z])([A-Z])&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\1 \2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;neighbor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;name_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;stem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;category_node&lt;/span&gt;

    &lt;span class="c1"&gt;# Also index description keywords
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;description_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;stem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;category_node&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now "refund" → orders, "checkout" → cart, "stargazer" → activity — all automatically from any OpenAPI spec.&lt;/p&gt;




&lt;h2&gt;
  
  
  1068 Tool Stress Test
&lt;/h2&gt;

&lt;p&gt;I fetched the entire GitHub REST API OpenAPI spec — 1068 endpoints. This is where things get interesting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Recall@5&lt;/th&gt;
&lt;th&gt;MRR&lt;/th&gt;
&lt;th&gt;Miss%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector Only&lt;/td&gt;
&lt;td&gt;88.0%&lt;/td&gt;
&lt;td&gt;0.761&lt;/td&gt;
&lt;td&gt;12.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM25 + Graph&lt;/td&gt;
&lt;td&gt;78.0%&lt;/td&gt;
&lt;td&gt;0.643&lt;/td&gt;
&lt;td&gt;22.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Pipeline&lt;/td&gt;
&lt;td&gt;88.0%&lt;/td&gt;
&lt;td&gt;0.761&lt;/td&gt;
&lt;td&gt;12.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At this scale, everything degrades. BM25 + Graph's 22% miss rate is especially high. I analyzed every miss case:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Expected&lt;/th&gt;
&lt;th&gt;Why it missed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Close an existing issue"&lt;/td&gt;
&lt;td&gt;issues/update&lt;/td&gt;
&lt;td&gt;"close" ≠ "update"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Add org member"&lt;/td&gt;
&lt;td&gt;orgs/set-membership-for-user&lt;/td&gt;
&lt;td&gt;Completely different naming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Register self-hosted runner"&lt;/td&gt;
&lt;td&gt;actions/create-registration-token&lt;/td&gt;
&lt;td&gt;Indirect mapping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Trigger workflow dispatch"&lt;/td&gt;
&lt;td&gt;actions/create-workflow-dispatch-event&lt;/td&gt;
&lt;td&gt;"trigger" ≠ "create"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern: &lt;strong&gt;every miss is a semantic gap&lt;/strong&gt; where the query uses different words than the tool name. "Close" means "update status to closed". "Register" means "create registration token". No keyword matching can bridge these gaps — this is fundamentally what embeddings solve.&lt;/p&gt;

&lt;p&gt;This was a humbling realization. I stopped trying to make BM25+Graph beat vector search on ranking accuracy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Graph Actually Wins: Workflow Chains
&lt;/h2&gt;

&lt;p&gt;If Graph can't beat embeddings at ranking individual tools, what &lt;em&gt;can&lt;/em&gt; it do that embeddings can't?&lt;/p&gt;

&lt;p&gt;The answer: &lt;strong&gt;return execution chains, not individual tools.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process a refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. listOrders — prerequisite for requestRefund
2. requestRefund — primary action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, an LLM agent receiving just &lt;code&gt;requestRefund&lt;/code&gt; would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Call &lt;code&gt;requestRefund&lt;/code&gt; → error: "order_id required"&lt;/li&gt;
&lt;li&gt;Figure out it needs an &lt;code&gt;order_id&lt;/code&gt; → call &lt;code&gt;listOrders&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;requestRefund&lt;/code&gt; again → success&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's &lt;strong&gt;3 LLM round trips&lt;/strong&gt;. With the workflow chain, it's &lt;strong&gt;1 round trip&lt;/strong&gt;. Each round trip is an LLM API call — this directly cuts costs.&lt;/p&gt;

&lt;p&gt;The workflow planner uses the graph's &lt;code&gt;REQUIRES&lt;/code&gt; and &lt;code&gt;PRECEDES&lt;/code&gt; edges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Find primary tool&lt;/strong&gt; via resource-first search (query keywords → category → tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand prerequisites&lt;/strong&gt; — follow REQUIRES edges backward, but only same-category GET/LIST methods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topological sort&lt;/strong&gt; — order by dependency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "same-category GET/LIST only" filter was critical. Without it, a single query would pull in 12+ unrelated prerequisites through loose cross-resource REQUIRES edges. With it, chains stay focused: 2-4 steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual Workflow Editor
&lt;/h3&gt;

&lt;p&gt;Auto-generated chains aren't 100% accurate. "Close an issue" can't automatically map to &lt;code&gt;updateIssue&lt;/code&gt; because of the semantic gap. So instead of chasing 100% automation, I built a visual editor.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;plan.open_editor(tools=tg.tools)&lt;/code&gt; opens a browser-based drag-and-drop editor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drag&lt;/strong&gt; to reorder steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click&lt;/strong&gt; tools in the sidebar to add steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X&lt;/strong&gt; to remove steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export JSON&lt;/strong&gt; to save the workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a single HTML file, zero dependencies — consistent with graph-tool-call's philosophy.&lt;/p&gt;

&lt;p&gt;For code-first users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;close an issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reorder&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;getIssue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updateIssue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_param_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updateIssue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;getIssue.response.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;close_issue.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Remote Deployment: SSE Transport
&lt;/h2&gt;

&lt;p&gt;The MCP server previously only supported stdio — local process communication. This meant every developer had to install and run graph-tool-call locally.&lt;/p&gt;

&lt;p&gt;I added SSE (Server-Sent Events) and Streamable-HTTP transport:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy once on a server&lt;/span&gt;
graph-tool-call serve &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; https://api.example.com/openapi.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--transport&lt;/span&gt; sse &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Team members just add a URL to their MCP client config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool-search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://tool-search.internal:8000/sse"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;stdio is 1:1 local. SSE is 1:N network. One server, entire team.&lt;/p&gt;

&lt;p&gt;The proxy mode also supports SSE, so you can aggregate multiple MCP servers behind a single remote endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;graph-tool-call proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; backends.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--transport&lt;/span&gt; sse &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Don't compete with embeddings on ranking
&lt;/h3&gt;

&lt;p&gt;If you have a good embedding model, vector search will beat keyword + graph at ranking individual tools. Period. The benchmark proved this conclusively.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Graph's value is structural, not semantic
&lt;/h3&gt;

&lt;p&gt;Embeddings find semantically similar tools. Graphs encode &lt;em&gt;relationships&lt;/em&gt; — what must come before what, what requires what. These are different kinds of knowledge. Don't try to use one where the other is needed.&lt;/p&gt;
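&lt;p&gt;As an illustration of the structural side, here is a minimal chain-expansion sketch. The tool names, edges, and plain-dict representation are toy examples, not the library's internals; a ranker (BM25 or embedding) picks the seed, and the edges supply what comes next.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# "A PRECEDES B" means A is typically called before B.
# Toy edges for illustration only.
PRECEDES = {
    "listOrders": ["getOrder"],
    "getOrder": ["cancelOrder"],
    "cancelOrder": ["processRefund"],
}

def expand_chain(seed, edges, max_hops=3):
    """Follow PRECEDES edges forward from a matched tool."""
    chain = [seed]
    current = seed
    for _ in range(max_hops):
        nxt = edges.get(current, [])
        if not nxt:
            break
        current = nxt[0]
        chain.append(current)
    return chain

print(expand_chain("listOrders", PRECEDES))
# ['listOrders', 'getOrder', 'cancelOrder', 'processRefund']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;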

&lt;h3&gt;
  
  
  3. Fair benchmarks are humbling
&lt;/h3&gt;

&lt;p&gt;My original story was "Graph beats baseline by 70%." After running a fair 6-strategy comparison, the real story is "Graph ties BM25 at ranking, but uniquely provides workflow chains." Less dramatic. More honest. More useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. 100% automation &amp;lt; good defaults + easy editing
&lt;/h3&gt;

&lt;p&gt;I spent days trying to make the workflow planner automatically find &lt;code&gt;updateIssue&lt;/code&gt; for "close an issue." Then I spent an afternoon building a visual editor. The editor was more valuable than any accuracy improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Zero-dep is a feature
&lt;/h3&gt;

&lt;p&gt;The entire core library has zero Python dependencies. BM25, graph traversal, wRRF fusion — all in stdlib Python. Add &lt;code&gt;[embedding]&lt;/code&gt; for semantic search, &lt;code&gt;[mcp]&lt;/code&gt; for MCP server mode. Users install only what they need.&lt;/p&gt;
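&lt;p&gt;Concretely (quote the extras so shells like zsh don't try to glob the brackets):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install graph-tool-call               &lt;span class="c"&gt;# core, stdlib only&lt;/span&gt;
pip install &lt;span class="s2"&gt;"graph-tool-call[embedding]"&lt;/span&gt;  &lt;span class="c"&gt;# + semantic search&lt;/span&gt;
pip install &lt;span class="s2"&gt;"graph-tool-call[mcp]"&lt;/span&gt;        &lt;span class="c"&gt;# + MCP server mode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;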




&lt;h2&gt;
  
  
  Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supported tool scale&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1068&lt;/strong&gt; (tested)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall@5 (no embedding)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;91.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall@5 (with embedding)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency (no embedding)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.5ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token reduction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64–91%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies (core)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test suite&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;494 tests&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
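&lt;p&gt;For clarity, Recall@5 is read the usual way: the fraction of benchmark queries whose expected tool appears in the top five results. A toy computation (the queries, tool names, and result lists below are made up; the real benchmark runs over the full 1068-tool set):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def recall_at_k(results_by_query, truth, k=5):
    """Fraction of queries whose expected tool is in the top k results."""
    hits = sum(
        1 for query, expected in truth.items()
        if expected in results_by_query[query][:k]
    )
    return hits / len(truth)

# Toy ground truth and retrieval output, for illustration only.
truth = {
    "close an issue": "updateIssue",
    "list open pull requests": "listPulls",
}
results = {
    "close an issue": ["getIssue", "updateIssue", "lockIssue"],
    "list open pull requests": ["getPull", "mergePull"],  # a miss
}
print(recall_at_k(results, truth))  # 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;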




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;graph-tool-call
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;graph_tool_call&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolGraph&lt;/span&gt;

&lt;span class="n"&gt;tg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolGraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://petstore3.swagger.io/api/v3/openapi.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;petstore.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Search
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;place an order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Workflow chain
&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buy a pet and place an order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# WorkflowPlan([addPet → placeOrder])
&lt;/span&gt;
&lt;span class="c1"&gt;# Visual editor
&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open_editor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or try it without installing — the &lt;a href="https://github.com/SonAIengine/graph-tool-call/tree/main/playground" rel="noopener noreferrer"&gt;interactive playground&lt;/a&gt; runs in your browser with demo data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/SonAIengine/graph-tool-call" rel="noopener noreferrer"&gt;github.com/SonAIengine/graph-tool-call&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/graph-tool-call/" rel="noopener noreferrer"&gt;pypi.org/project/graph-tool-call&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback, issues, and stars are welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I gave an LLM 248 tools and accuracy dropped to 12%. Here's what fixed it.</title>
      <dc:creator>Son Seong Jun</dc:creator>
      <pubDate>Sun, 15 Mar 2026 09:45:25 +0000</pubDate>
      <link>https://dev.to/sonaiengine/i-gave-an-llm-248-tools-and-accuracy-dropped-to-12-heres-what-fixed-it-91h</link>
      <guid>https://dev.to/sonaiengine/i-gave-an-llm-248-tools-and-accuracy-dropped-to-12-heres-what-fixed-it-91h</guid>
      <description>&lt;p&gt;LLM agents break when you give them too many tools. I hit this wall with &lt;strong&gt;248 Kubernetes API endpoints&lt;/strong&gt; — the model's accuracy dropped to &lt;strong&gt;12%&lt;/strong&gt;. Vector search didn't fix it. Graph-based retrieval did.&lt;/p&gt;

&lt;p&gt;Here's the problem, why vector search fails, and how I solved it with &lt;a href="https://github.com/SonAIengine/graph-tool-call" rel="noopener noreferrer"&gt;graph-tool-call&lt;/a&gt; — an open-source, zero-dependency Python library for &lt;strong&gt;tool retrieval&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem: context overflow kills accuracy
&lt;/h2&gt;

&lt;p&gt;I was building an LLM agent (qwen3:4b) for a Kubernetes cluster. 248 API endpoints, all exposed as tools. Threw them all into the context and asked the model to "scale my deployment."&lt;/p&gt;

&lt;p&gt;Accuracy? &lt;strong&gt;12%.&lt;/strong&gt; The model choked on 8,192 tokens of tool definitions.&lt;/p&gt;

&lt;p&gt;This isn't a model problem — it's a &lt;strong&gt;retrieval&lt;/strong&gt; problem. The LLM needs a smaller, relevant subset of tools. But how do you pick the right ones?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why vector search isn't enough
&lt;/h2&gt;

&lt;p&gt;Natural first instinct: embed all tool descriptions, find the closest matches via cosine similarity. Simple.&lt;/p&gt;

&lt;p&gt;Except... when a user says "cancel my order and get a refund," vector search returns &lt;code&gt;cancelOrder&lt;/code&gt;. But the actual workflow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listOrders → getOrder → cancelOrder → processRefund
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector search finds &lt;strong&gt;one tool&lt;/strong&gt;. You need the &lt;strong&gt;chain&lt;/strong&gt;. Real API workflows involve sequencing, prerequisites, and complementary operations that flat similarity search completely misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution: graph-based tool retrieval
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/SonAIengine/graph-tool-call" rel="noopener noreferrer"&gt;&lt;strong&gt;graph-tool-call&lt;/strong&gt;&lt;/a&gt; — it models tool relationships as a &lt;strong&gt;directed graph&lt;/strong&gt;. Tools have edges like &lt;code&gt;PRECEDES&lt;/code&gt;, &lt;code&gt;REQUIRES&lt;/code&gt;, &lt;code&gt;COMPLEMENTARY&lt;/code&gt;. When you search, it doesn't just find one match — it &lt;strong&gt;traverses the graph&lt;/strong&gt; and returns the whole workflow.&lt;/p&gt;

&lt;p&gt;The retrieval fuses four signals via &lt;strong&gt;weighted Reciprocal Rank Fusion (wRRF)&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keyword matching against tool names &amp;amp; descriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph traversal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Expands results along PRECEDES/REQUIRES/COMPLEMENTARY edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Semantic similarity (optional — Ollama, OpenAI, vLLM, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP annotations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prioritizes read-only vs destructive tools based on query intent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
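&lt;p&gt;The fusion step itself is small: each signal emits a ranked list, and a tool's fused score is a weighted sum of reciprocal ranks. Here is a minimal sketch; the weights and the &lt;code&gt;k&lt;/code&gt; constant are illustrative, not the library's defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def wrrf(ranked_lists, weights, k=60):
    """score(tool) = sum over signals of weight / (k + rank)."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] = scores.get(tool, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative signal outputs for "cancel my order and get a refund".
bm25_hits = ["cancelOrder", "getOrder", "listRefunds"]
graph_hits = ["getOrder", "cancelOrder", "processRefund"]
print(wrrf([bm25_hits, graph_hits], weights=[1.0, 0.8]))
# ['cancelOrder', 'getOrder', 'listRefunds', 'processRefund']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;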

&lt;h2&gt;
  
  
  Benchmark results
&lt;/h2&gt;

&lt;p&gt;Same 248 K8s tools, same model (qwen3:4b, 4-bit quantized):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Token reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All 248 tools (baseline)&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;td&gt;8,192&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;graph-tool-call (top-5)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,699&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ embedding + ontology&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,924&lt;/td&gt;
&lt;td&gt;76%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
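&lt;p&gt;The token-reduction column is just the relative shrink of the tool-definition payload sent to the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;baseline_tokens = 8192   # all 248 tool definitions in context
retrieved_tokens = 1699  # top-5 after graph-tool-call
reduction = 1 - retrieved_tokens / baseline_tokens
print(f"{reduction:.0%}")  # 79%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;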

&lt;p&gt;On smaller APIs (19–50 tools), baseline accuracy is already high — but graph-tool-call still cuts tokens by &lt;strong&gt;64–91%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in action — token savings, e-commerce workflow search, and GitHub API search:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FSonAIengine%2Fgraph-tool-call%2Fmain%2Fassets%2Fdemo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FSonAIengine%2Fgraph-tool-call%2Fmain%2Fassets%2Fdemo.gif" alt="graph-tool-call demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero dependencies
&lt;/h2&gt;

&lt;p&gt;The core runs on &lt;strong&gt;Python stdlib only&lt;/strong&gt;. No numpy, no torch, no heavy ML frameworks. Install only what you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;graph-tool-call                &lt;span class="c"&gt;# core — zero deps&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;graph-tool-call[embedding]     &lt;span class="c"&gt;# + semantic search&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;graph-tool-call[mcp]           &lt;span class="c"&gt;# + MCP server mode&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;graph-tool-call[all]           &lt;span class="c"&gt;# everything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try it in 30 seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx graph-tool-call search &lt;span class="s2"&gt;"user authentication"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; https://petstore.swagger.io/v2/swagger.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  As an MCP server
&lt;/h3&gt;

&lt;p&gt;Drop this in your &lt;code&gt;.mcp.json&lt;/code&gt; and any MCP client (Claude Code, Cursor, Windsurf) gets smart tool search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool-search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"graph-tool-call[mcp]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
               &lt;/span&gt;&lt;span class="s2"&gt;"--source"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.example.com/openapi.json"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;graph_tool_call&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolGraph&lt;/span&gt;

&lt;span class="n"&gt;tg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolGraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://petstore3.swagger.io/api/v3/openapi.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;petstore.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve only relevant tools
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel my order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  MCP Proxy: 172 tools → 3 meta-tools
&lt;/h2&gt;

&lt;p&gt;Running multiple MCP servers? Their tool definitions pile up in every LLM turn. MCP Proxy bundles them behind a single server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;172 tools&lt;/strong&gt; across servers → &lt;strong&gt;3 meta-tools&lt;/strong&gt; (&lt;code&gt;search_tools&lt;/code&gt;, &lt;code&gt;get_tool_schema&lt;/code&gt;, &lt;code&gt;call_backend_tool&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;After search, matched tools are &lt;strong&gt;dynamically injected&lt;/strong&gt; for 1-hop direct calling&lt;/li&gt;
&lt;li&gt;Saves &lt;strong&gt;~1,200 tokens per turn&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add tool-proxy &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  uvx &lt;span class="s2"&gt;"graph-tool-call[mcp]"&lt;/span&gt; proxy &lt;span class="nt"&gt;--config&lt;/span&gt; ~/backends.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What makes this different
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Vector-only&lt;/th&gt;
&lt;th&gt;graph-tool-call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Embedding model required&lt;/td&gt;
&lt;td&gt;Zero (stdlib only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual registration&lt;/td&gt;
&lt;td&gt;Auto-ingest from OpenAPI / MCP / Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flat similarity&lt;/td&gt;
&lt;td&gt;BM25 + graph + embedding + annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workflows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single tool matches&lt;/td&gt;
&lt;td&gt;Multi-step chain retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;History&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Demotes used tools, boosts next-step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM dependency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Optional (better with, works without)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/SonAIengine/graph-tool-call" rel="noopener noreferrer"&gt;github.com/SonAIengine/graph-tool-call&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;code&gt;pip install graph-tool-call&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://github.com/SonAIengine/graph-tool-call/blob/main/docs/architecture/overview.md" rel="noopener noreferrer"&gt;Architecture&lt;/a&gt; · &lt;a href="https://github.com/SonAIengine/graph-tool-call#benchmark" rel="noopener noreferrer"&gt;Benchmarks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're dealing with large tool sets in production, I'd love to hear what threshold you hit before retrieval became necessary. Drop a comment or open an issue — contributions welcome 🙌&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>opensource</category>
      <category>openapi</category>
    </item>
  </channel>
</rss>
