Fight Hallucination
Modern LLM chat clients search the web. That is great for general knowledge. It is a problem for your product: your internal documentation is not on the public internet, and even if parts of it are, web search retrieves whatever ranks highest, not what is authoritative. And even when it finds something, is it the documentation for the version your customer is running?
The result: customers ask their AI assistant about your product and get plausible-sounding wrong answers. The fix is to give the AI access to your actual docs, with correct information, at query time. The question is how.
Why Not RAG?
RAG requires either the customer to build and maintain a retrieval pipeline, or the vendor to host one — a vector DB, embedding model, and retrieval API running 24/7 for a doc corpus that changes a few times a year. Either way, the infrastructure cost is disproportionate to the problem.
Design Decision: Bundle the Docs Into the MCP Server
MCP (Model Context Protocol) is a standard for connecting tools and data sources to AI chat clients.
The decision was to compile the documentation directly into the MCP server. No external database. No embedding pipeline. The customer connects the server and their AI immediately has access to accurate, current product documentation — sourced from the actual manual, not the web.
This shifts the indexing work to the vendor, where it runs once at release time.
How the Doc Search Works: Index Navigation, Not Vector Similarity
The approach has two parts: a preprocessing step done once at release time, and tools exposed through the MCP server at runtime.
Preprocessing — understand and organize the documentation.
Before a release, a preprocessing step runs over the entire documentation set and generates an index that lets the LLM quickly understand the documentation and its structure. The full documentation and this index are compiled into the MCP server.
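The preprocessing step can be sketched roughly like this. The heading-based splitting and the index fields (`id`, `title`, `summary`) are illustrative assumptions, not the actual pipeline:

```python
import json
import re

def build_index(doc: str) -> list[dict]:
    """Split a markdown manual at top-level headings and emit one
    compact index entry (id, title, one-line summary) per section."""
    entries = []
    # re.split with a capturing group keeps the headings in the result:
    # parts[1::2] are headings, parts[2::2] are the bodies that follow them.
    parts = re.split(r"(?m)^(#{1,2} .+)$", doc)
    for heading, body in zip(parts[1::2], parts[2::2]):
        title = heading.lstrip("# ").strip()
        first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
        entries.append({
            "id": title.lower().replace(" ", "-"),   # stable section handle
            "title": title,
            "summary": first_line.strip(),           # cheap one-line summary
        })
    return entries

manual = """\
# User Management
Create, modify, and delete user accounts.

# Backup
Schedule and restore backups.
"""
index = build_index(manual)
print(json.dumps(index, indent=2))
```

A real pipeline would likely generate richer summaries (e.g. with an LLM pass over each section), but the shape is the same: a small, browsable index plus the full sections, both baked into the server at build time.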
Runtime — tools that let the LLM find and retrieve content.
The MCP server exposes tools that give the chat client access to the index. When a customer asks a question, the LLM in the chat client calls these tools to browse the index, identify the relevant section, and fetch that section's content with a second tool call. The retrieved content is the full, intact section — not a fragment — and it comes directly from the binary. No external call. No database.
The result: the LLM understands what is in the documentation and where to find it, can match the customer's question to the right section, and retrieves authoritative content to generate the answer.
End-to-End Flow
Here is what happens when a customer asks "How do I create a user in ABC product?"
```
Customer           Claude            MCP Server          Docs (binary)
   |                 |                   |                   |
   | "How do I       |                   |                   |
   | create a user?" |                   |                   |
   |---------------->|                   |                   |
   |                 | get_index()       |                   |
   |                 |------------------>|                   |
   |                 |                   |-- read index ---->|
   |                 |                   |<-- index ---------|
   |                 |<-- index ---------|                   |
   |                 |                   |                   |
   |          (identifies relevant section)                  |
   |                 |                   |                   |
   |                 | get_section(      |                   |
   |                 |   "User Mgmt")    |                   |
   |                 |------------------>|                   |
   |                 |                   |-- read section -->|
   |                 |                   |<-- full section --|
   |                 |<-- full section --|                   |
   |                 |                   |                   |
   | "To create a    |                   |                   |
   | user, go to     |                   |                   |
   | Admin > Users"  |                   |                   |
   |<----------------|                   |                   |
```
At no point does Claude guess or search the web. Every step is grounded in the documentation compiled into the MCP server.
Cost Comparison
| | RAG | Bundled docs in MCP |
|---|---|---|
| Infrastructure | Vector DB + embedding model | None |
| Ongoing cost | Hosting + API calls | Zero |
| Customer setup | Build retrieval pipeline | Connect the server once |
| Retrieved content | Fragments, chunk boundaries | Whole sections, intact |
| Accuracy on domain-specific queries | Depends on chunking strategy | High — intent-matched index |
Deployment
Package the binary as an MCPB file — a bundle that includes the server and a manifest — and ship it with your product release. The customer imports it into their chat client once. The server runs in STDIO mode as a local subprocess alongside the chat client. No cloud, no ports, no configuration.
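As an illustration, an MCPB manifest for a bundled docs server might look something like the following. The field names and layout here are approximate assumptions — consult the MCPB specification for the exact schema:

```json
{
  "manifest_version": "0.2",
  "name": "abc-product-docs",
  "version": "1.0.0",
  "description": "Bundled documentation server for ABC product",
  "server": {
    "type": "binary",
    "entry_point": "server/docs-server",
    "mcp_config": {
      "command": "${__dirname}/server/docs-server",
      "args": []
    }
  }
}
```

The key point is that everything the server needs ships inside the bundle: the chat client unpacks it and launches the binary over STDIO, with no paths or credentials for the customer to configure.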
Takeaway
If your customers are using AI chat clients — and they are — they are already asking questions about your product. The choice is whether those answers come from your documentation or from whatever the web surfaces.
Bundling your docs into an MCP server makes the right answer the default answer, with no setup required from the customer and no ongoing infrastructure cost on your end.