I have been building agentic workflows for the better part of a year, and the same friction keeps coming back: my agents have access to too many tools and judgement about almost none of them.
MCP fixed the access problem. It standardised how agents call tools, how clients connect, how servers describe themselves. That is real, and it has unlocked a wave of useful servers — browser automation, file systems, search, databases, you name it.
But access is not selection. When I ask Claude Code to "build me a real-time chat app with auth," it does not need a list of every MCP server in the world. It needs to know: which framework, which database, which realtime transport, which auth provider, and whether those four picks are version-compatible with each other. That is a different problem, and it is the one I built ToolCairn to solve.
The shape of the problem
If you have built anything serious on top of MCP, you have probably hit one of these:
- Tool overload. Your agent has 30+ servers connected. Most are noise for the current task. Surface area becomes a context-window problem.
- Wrong-package picks. The model autocompletes to a popular but wrong library. (`requests` vs. `httpx`. `socket.io` vs. native `WebSocket`. `next-auth` vs. `better-auth`.)
- Version drift. The picks individually look fine; together they do not install. Or they install and then crash at runtime on a peer-dep mismatch.
- No "why". A directory listing tells you a tool exists. It does not tell you why an agent should pick it for this specific task, what it composes well with, or what the trust signals are.
These are not exotic problems. They show up in week one of any non-trivial agent build.
What I built
ToolCairn is an MCP server. You install it the same way you install any MCP server:
```bash
claude mcp add toolcairn -- npx @neurynae/toolcairn-mcp
```
Once it is connected, your agent gets a small, focused toolkit:
- `classify_prompt` — decide whether a request is a single-tool need, a multi-layer stack build, a comparison, or unrelated.
- `search_tools` / `search_tools_respond` — find the right tool for one specific need, with a clarification loop when the request is ambiguous.
- `refine_requirement` + `get_stack` — for "build me a SaaS analytics dashboard"-shaped tasks, decompose into sub-needs and return a coherent stack with cross-tool compatibility.
- `compare_tools` — head-to-head when the user asks "X vs Y."
- `check_compatibility` — version-aware peer-dep evaluation across picks.
- `check_issue` — last-resort known-bug lookup before the agent burns three more retries on the same problem.
- `report_outcome` — close the loop after the user actually uses the recommendation, so the graph learns.
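To make the call surface concrete, here is a minimal sketch of driving `search_tools` from a bare MCP client using the official TypeScript SDK. The argument names (`query`) are my guesses for illustration, not ToolCairn's documented input schema; the real schema comes back from `listTools()`.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the ToolCairn server over stdio, the same way `claude mcp add` does.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["@neurynae/toolcairn-mcp"],
});

const client = new Client({ name: "demo-host", version: "0.0.1" });
await client.connect(transport);

// List what the server exposes (classify_prompt, search_tools, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Hypothetical arguments, for illustration only.
const result = await client.callTool({
  name: "search_tools",
  arguments: { query: "realtime transport for a chat app" },
});
console.log(result.content);
```

In normal use you never write this yourself; your agent host (Claude Code, etc.) makes these calls for you. It is just to show the protocol surface.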
Underneath, the recommendations are drawn from a graph of tools indexed across 35+ open-source registries — npm, PyPI, Cargo, Maven, Go, Composer, RubyGems, NuGet, Homebrew, and more. The current graph carries thousands of tools with usage context, registry metadata, and version data — not a flat directory listing.
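For intuition, a node in a graph like this carries more than a name and a star count. The sketch below is hypothetical, my shorthand for the kind of data involved rather than the real schema:

```typescript
// Illustrative only — not ToolCairn's actual schema.
interface ToolNode {
  id: string;                  // e.g. "npm:socket.io"
  registry: string;            // one of the 35+ indexed registries
  capabilities: string[];      // e.g. ["realtime-transport", "websocket"]
  versions: string[];          // published versions, with registry metadata
  // Edges, not just attributes: what this tool composes with.
  composesWith: { id: string; versionRange: string }[];
  conflictsWith: { id: string; reason: string }[];
  outcomes: { task: string; succeeded: boolean }[]; // fed by report_outcome
}
```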
A concrete example
Prompt: "Build me a real-time chat app with auth."
What the agent does, with ToolCairn connected:
- `classify_prompt` → returns `stack_building`.
- `refine_requirement` → decomposes into `web-framework`, `realtime-transport`, `auth-provider`, `database`.
- `get_stack` → returns a ranked stack: Next.js + Socket.IO + NextAuth + PostgreSQL, with a cross-tool compatibility matrix.
- `check_compatibility` → runs the peer-dep evaluation across the four picks and confirms `next@15` ✅ `socket.io-client@4` ✅.
- The agent writes the project. After it ships, `report_outcome` fires and the graph learns from the choice.
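In code terms, the stack step hands back something with roughly this shape. The field names and versions here are my own illustration of the idea, not ToolCairn's documented response schema:

```typescript
// Illustrative get_stack response — field names are assumptions.
const stack = {
  requirement: "real-time chat app with auth",
  picks: [
    { need: "web-framework",      choice: "next",      version: "15.x" },
    { need: "realtime-transport", choice: "socket.io", version: "4.x" },
    { need: "auth-provider",      choice: "next-auth", version: "5.x" },
    { need: "database",           choice: "pg",        version: "8.x" },
  ],
  // Pairwise output of check_compatibility across the picks.
  compatibility: [
    { a: "next@15", b: "socket.io-client@4", ok: true },
    { a: "next@15", b: "next-auth@5",        ok: true },
  ],
};
```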
That is a different shape of response than "here are 12 chat libraries, sorted by GitHub stars."
What this is not
I want to be very specific about scope, because the closest comparison is "directory" and that is the wrong frame.
- It is not a directory. Directories are for humans browsing. ToolCairn is for agents requesting context at task time.
- It is not a replacement for the official MCP Registry. The MCP Registry is the canonical index of MCP servers. ToolCairn is one server inside that index, focused on selection, not listing.
- It is not a ranking algorithm dressed up as a product. Ranking matters, but the load-bearing piece is the graph — how tools relate, what they compose with, what versions work together. (A minimal sketch of that last point follows this list.)
- It is not finished. Trust signals, integration breadth, and the per-task recommendation quality all still have a long runway. That is most of what I want feedback on.
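Concretely, "what versions work together" is mostly a question of whether declared peer-dependency ranges and your concrete picks overlap. A minimal sketch of that primitive with the `semver` package, as a simplified illustration rather than ToolCairn's actual logic:

```typescript
import semver from "semver";

// A pick declares a peer-dependency range; does our other pick satisfy it?
const peerRangeOnNext = ">=14.0.0 <16.0.0"; // hypothetical range from a manifest
const chosenNext = "15.1.0";

console.log(semver.satisfies(chosenNext, peerRangeOnNext)); // true: compatible

// Two ranges can also be checked for any overlap at all.
console.log(semver.intersects("^15.0.0", peerRangeOnNext)); // true
console.log(semver.intersects("^13.0.0", peerRangeOnNext)); // false: version drift
```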
How to try it
- Web: toolcairn.neurynae.com
- Docs: toolcairn.neurynae.com/docs
- Architecture / trust: toolcairn.neurynae.com/about
- GitHub: github.com/neurynae/toolcairn-mcp
- npm: @neurynae/toolcairn-mcp
- Install (Claude Code): `claude mcp add toolcairn -- npx @neurynae/toolcairn-mcp`. For clients that read a JSON config instead, see the sketch below this list.
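If your client uses a JSON MCP config rather than a CLI, the equivalent entry usually looks like this. The exact file location and top-level key vary by client, so check its docs:

```json
{
  "mcpServers": {
    "toolcairn": {
      "command": "npx",
      "args": ["@neurynae/toolcairn-mcp"]
    }
  }
}
```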
What I want feedback on
I genuinely want blunt feedback, not validation:
- Recommendation relevance. Are the picks actually the picks you would have made?
- Missing categories. Where does ToolCairn return nothing useful? Which ecosystems is the graph too thin in?
- Trust signals. What would make a recommendation trustworthy enough that you would let an agent act on it without reviewing every line?
- Client integrations. Claude Code is supported today. Cursor, Codex, Windsurf, VS Code AI — which should come first?
You can leave it on a GitHub issue, a comment under this post, or my DMs. I read everything.