<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: xaip-agent</title>
    <description>The latest articles on DEV Community by xaip-agent (@xkumakichi).</description>
    <link>https://dev.to/xkumakichi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879438%2F973a5c17-3aa5-4b12-9c4f-50ef1b572d8a.png</url>
      <title>DEV Community: xaip-agent</title>
      <link>https://dev.to/xkumakichi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xkumakichi"/>
    <language>en</language>
    <item>
      <title>AI Agents Pick Tools Blind</title>
      <dc:creator>xaip-agent</dc:creator>
      <pubDate>Tue, 14 Apr 2026 23:43:14 +0000</pubDate>
      <link>https://dev.to/xkumakichi/stop-your-ai-agent-from-picking-broken-mcp-servers-4pa0</link>
      <guid>https://dev.to/xkumakichi/stop-your-ai-agent-from-picking-broken-mcp-servers-4pa0</guid>
      <description>&lt;p&gt;I connected my AI agent to 3 MCP servers.&lt;/p&gt;

&lt;p&gt;It picked one at random.&lt;/p&gt;

&lt;p&gt;The first attempt failed after 8.2 seconds. The second connected but was the wrong tool for the task. The third finally worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;node without-xaip.js
&lt;span class="go"&gt;
→ Trying: unknown-server...
  ✗ error — package not found (8.2s)

→ Trying: sequential-thinking...
  ✓ connected — but wrong tool for docs task

→ Trying: context7...
  ✓ success (3.1s)

Total: 11.3 seconds, 2 wasted calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are over 1,000 MCP servers now. Your agent has no way to tell which ones are reliable, which ones are broken, and which ones are the right fit.&lt;/p&gt;

&lt;p&gt;So I built a fix: one API call that picks the right server first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;node with-xaip.js
&lt;span class="go"&gt;
→ XAIP selected: context7 (trust: 1.0, 248 verified executions)
  ✓ success (3.1s)

Total: 3.1 seconds, 0 wasted calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;XAIP&lt;/a&gt; — trust scoring for AI agents, backed by real execution data. Not benchmarks. Not self-reported metrics. Actual tool-call results, cryptographically signed.&lt;/p&gt;

&lt;h2&gt;A live API you can try right now&lt;/h2&gt;

&lt;p&gt;No signup, no API key. Just curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trust score for a specific MCP server&lt;/span&gt;
curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trust"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trusted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"receipts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;248&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xaip-aggregator (quorum:1)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"riskFlags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"computedFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"248 receipts via XAIP Aggregator BFT (1 nodes)"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
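
&lt;p&gt;A minimal TypeScript sketch of calling this endpoint from an agent, assuming the response has the shape shown above and a runtime with a global &lt;code&gt;fetch&lt;/code&gt; (Node 18 or later). The &lt;code&gt;TrustReport&lt;/code&gt; type, the helper names, and the 50-receipt threshold are illustrative choices for this example, not part of XAIP:&lt;/p&gt;

```typescript
// Shape of the /v1/trust/:slug response as shown in the sample above.
// Only the fields used here are declared; treat the type as an assumption.
type TrustReport = {
  slug: string;
  trust: number;
  verdict: string;
  receipts: number;
  riskFlags: string[];
};

const XAIP_BASE = "https://xaip-trust-api.kuma-github.workers.dev";

// Fetch the trust report for one MCP server slug.
async function fetchTrust(slug: string) {
  const res = await fetch(XAIP_BASE + "/v1/trust/" + encodeURIComponent(slug));
  if (!res.ok) {
    throw new Error("XAIP trust lookup failed with HTTP " + res.status);
  }
  return (await res.json()) as TrustReport;
}

// Hypothetical local policy: require a "trusted" verdict, no risk flags,
// and a minimum amount of evidence before the agent uses the server.
function isUsable(report: TrustReport, minReceipts = 50): boolean {
  if (report.verdict !== "trusted") return false;
  if (report.riskFlags.length > 0) return false;
  return report.receipts >= minReceipts;
}
```

&lt;p&gt;Applied to the sample response above, &lt;code&gt;isUsable&lt;/code&gt; accepts &lt;code&gt;context7&lt;/code&gt;: the verdict is &lt;code&gt;trusted&lt;/code&gt;, there are no risk flags, and 248 receipts clear the threshold. How strict the policy should be is up to the agent.&lt;/p&gt;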



&lt;p&gt;Or let XAIP pick the best server for your task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://xaip-trust-api.kuma-github.workers.dev/v1/select &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "task": "Fetch React documentation",
    "candidates": ["context7", "sequential-thinking", "unknown-server"]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Highest trust (1) from 248 verified executions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rejected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unknown-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unscored — no execution data"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"withoutXAIP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Random selection would pick an unscored server 33% of the time — no execution data, no safety guarantee"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;withoutXAIP&lt;/code&gt; field exists to make the risk visible. It's the answer to "why do I need this?"&lt;/p&gt;
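
&lt;p&gt;The same call from TypeScript might look like the sketch below. The request and response types are inferred from the curl example above, and the helper names and the final sanity check are assumptions made for illustration:&lt;/p&gt;

```typescript
// Request and response shapes inferred from the curl example above.
type SelectRequest = { task: string; candidates: string[] };
type SelectResponse = {
  selected: string;
  reason: string;
  rejected: { slug: string; reason: string }[];
  withoutXAIP?: string;
};

const SELECT_URL = "https://xaip-trust-api.kuma-github.workers.dev/v1/select";

// Build the fetch options separately so they can be inspected without a network call.
function buildSelectInit(task: string, candidates: string[]) {
  const body: SelectRequest = { task, candidates };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Ask XAIP to choose among the candidates and return the chosen slug.
async function selectServer(task: string, candidates: string[]) {
  const res = await fetch(SELECT_URL, buildSelectInit(task, candidates));
  if (!res.ok) {
    throw new Error("XAIP select failed with HTTP " + res.status);
  }
  const data = (await res.json()) as SelectResponse;
  // Hypothetical guard: never accept a slug the caller did not offer.
  if (!candidates.includes(data.selected)) {
    throw new Error("XAIP returned an unexpected server: " + data.selected);
  }
  return data.selected;
}
```

&lt;p&gt;An agent would call &lt;code&gt;selectServer&lt;/code&gt; once before connecting, then connect only to the returned slug instead of trying candidates in sequence.&lt;/p&gt;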

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;XAIP has three moving parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Trust API&lt;/strong&gt; — Returns trust scores for MCP servers. Scores come from real execution data, not self-reported metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Decision Engine&lt;/strong&gt; — &lt;code&gt;POST /v1/select&lt;/code&gt; takes a task and a list of candidate servers, returns the best pick with reasoning. Unscored servers are automatically excluded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Aggregator&lt;/strong&gt; — Collects Ed25519-signed execution receipts. Every tool call produces a cryptographic receipt that feeds back into trust scores.&lt;/p&gt;
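
&lt;p&gt;To make the receipt idea concrete, here is a sketch of signing and verifying one execution record with Ed25519, using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. The &lt;code&gt;Receipt&lt;/code&gt; fields, the serialization, and the tool name are hypothetical; the actual XAIP receipt format and SDK calls may differ:&lt;/p&gt;

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical receipt shape; the real XAIP receipt format may differ.
type Receipt = {
  server: string;
  tool: string;
  success: boolean;
  latencyMs: number;
  timestamp: string;
};

// Serialize with a fixed field order so signer and verifier see identical bytes.
function canonicalize(r: Receipt): Buffer {
  const ordered = [r.server, r.tool, r.success, r.latencyMs, r.timestamp];
  return Buffer.from(JSON.stringify(ordered), "utf8");
}

// A caller would normally load a persistent key; one is generated here for the demo.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const receipt: Receipt = {
  server: "context7",
  tool: "example-tool", // placeholder tool name
  success: true,
  latencyMs: 3100,
  timestamp: "2026-04-14T23:43:14Z",
};

// Ed25519 hashes internally, so the digest algorithm argument is null.
const signature = sign(null, canonicalize(receipt), privateKey);

// The receiving side checks the signature against the caller's public key.
const valid = verify(null, canonicalize(receipt), publicKey, signature);

// Any change to the receipt after signing invalidates the signature.
const tampered: Receipt = { ...receipt, success: false };
const tamperedValid = verify(null, canonicalize(tampered), publicKey, signature);

console.log({ valid, tamperedValid });
```

&lt;p&gt;The original receipt verifies and the tampered copy does not, which is what lets a collector reject results that were altered after the tool call.&lt;/p&gt;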

&lt;p&gt;The trust model is Bayesian (Beta distribution), weighted by caller diversity to prevent single-caller gaming. If only one caller submits receipts for a server, the score reflects that limited evidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select → Execute → Report
  ↑                    │
  └────────────────────┘
     scores improve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
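
&lt;p&gt;The Beta-distribution scoring described above can be sketched as follows. This is an illustration of the general technique under stated assumptions (a uniform Beta(1, 1) prior and a per-caller cap as the diversity weighting), not XAIP's actual formula, and it is not expected to reproduce the live scores:&lt;/p&gt;

```typescript
// Hypothetical receipt summary: who reported it and whether the call succeeded.
type Outcome = { caller: string; success: boolean };
type Tally = { ok: number; bad: number };

// Posterior mean of Beta(priorA + successes, priorB + failures).
function betaMean(successes: number, failures: number, priorA = 1, priorB = 1): number {
  return (priorA + successes) / (priorA + priorB + successes + failures);
}

// Illustrative diversity weighting: each caller contributes at most
// perCallerCap receipts' worth of evidence, so one caller cannot dominate.
function trustScore(outcomes: Outcome[], perCallerCap = 20): number {
  // Prototype-free object so arbitrary caller IDs are safe to use as keys.
  const perCaller: { [caller: string]: Tally } = Object.create(null);
  for (const o of outcomes) {
    const tally = perCaller[o.caller] ?? { ok: 0, bad: 0 };
    if (o.success) tally.ok += 1;
    else tally.bad += 1;
    perCaller[o.caller] = tally;
  }

  let successes = 0;
  let failures = 0;
  for (const tally of Object.values(perCaller)) {
    const total = tally.ok + tally.bad;
    // Scale a heavy caller's counts down to the cap, keeping its success ratio.
    const scale = Math.min(1, perCallerCap / total);
    successes += tally.ok * scale;
    failures += tally.bad * scale;
  }
  return betaMean(successes, failures);
}
```

&lt;p&gt;With no receipts the score is the prior mean of 0.5. In this sketch, 100 successes from a single caller count as only 20, while the same 100 successes spread over five callers count in full, so the diverse evidence yields the higher score.&lt;/p&gt;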



&lt;h2&gt;The data is real&lt;/h2&gt;

&lt;p&gt;This isn't a mock API. Trust scores are computed from 1,127 actual MCP tool-call executions across three servers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Trust&lt;/th&gt;
&lt;th&gt;Receipts&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;context7&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;248&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sequential-thinking&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;285&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;filesystem&lt;/td&gt;
&lt;td&gt;0.909&lt;/td&gt;
&lt;td&gt;594&lt;/td&gt;
&lt;td&gt;caution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Executions are monitored via &lt;a href="https://github.com/xkumakichi/veridict" rel="noopener noreferrer"&gt;Veridict&lt;/a&gt;, a runtime execution monitor that tracks success rates, latency, and failure types.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;filesystem&lt;/code&gt; scores lower because it has real failures in its history — that's the system working correctly. A trust score should reflect reality, not optimism.&lt;/p&gt;

&lt;h2&gt;Try the full demo&lt;/h2&gt;

&lt;p&gt;The dogfooding demo runs the complete loop: select a server, execute MCP tool calls, submit a signed receipt, check the updated score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/xkumakichi/xaip-protocol.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xaip-protocol/demo
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npx tsx dogfood.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo takes about 15 seconds. You'll see XAIP select &lt;code&gt;context7&lt;/code&gt;, execute real tool calls against it, submit a receipt to the Aggregator, and print a comparison table.&lt;/p&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;XAIP is at v0.4.0. The infrastructure is live and the data is real, but adoption is the bottleneck:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More servers&lt;/strong&gt; — Currently scoring 3 MCP servers. The system scales to any server, but needs execution data flowing in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More callers&lt;/strong&gt; — Caller diversity is the main lever for score accuracy. More independent callers = higher confidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform integrations&lt;/strong&gt; — Working toward integration with MCP registries like Smithery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building AI agents that use MCP, you can start using the API today. Scores will keep improving as more execution data flows in.&lt;/p&gt;

&lt;h2&gt;Why this matters beyond today&lt;/h2&gt;

&lt;p&gt;Right now, XAIP helps agents pick working tools.&lt;/p&gt;

&lt;p&gt;But this becomes critical when agents start doing more than calling APIs — paying for services, delegating tasks across organizations, executing autonomous workflows.&lt;/p&gt;

&lt;p&gt;At that point, the question changes from "does this tool work?" to "can I trust this agent with money?"&lt;/p&gt;

&lt;p&gt;XAIP is designed for that future. But it already solves a real problem today.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: &lt;code&gt;https://xaip-trust-api.kuma-github.workers.dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;xkumakichi/xaip-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/xaip-sdk" rel="noopener noreferrer"&gt;xaip-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime monitor&lt;/strong&gt;: &lt;a href="https://github.com/xkumakichi/veridict" rel="noopener noreferrer"&gt;xkumakichi/veridict&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;XAIP doesn't make agents smarter. It prevents them from making dumb choices.&lt;/p&gt;

&lt;p&gt;Built this because I needed it. If your agent is still picking servers blind, &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;give it a try&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
