The RAG System That Mixed Documentation From Different Products (And Created Frankenstein Instructions)

I built a RAG system for a company that sold three different software products. One knowledge base. Three hundred documents. One AI answering questions about all products.
The retrieval worked. The answers were chaos. Customers following the instructions ended up configuring Product A using steps from Product B while referencing features that only existed in Product C.
Nobody could actually complete any task using the AI's guidance.
The Setup
Software company with three products: Enterprise CRM, Marketing Automation Platform, and Analytics Dashboard. Separate products but overlapping concepts. All three had settings pages, user management, API integrations, and data exports.
They wanted one unified support bot. Customer asks a question, bot searches across all documentation, returns the answer. Efficient. No need to maintain three separate bots.
I built it as a standard RAG system. All documentation from all three products went into one vector database. Query comes in, retrieve relevant chunks, generate answer.
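Roughly, that first version looked like the sketch below. This is a minimal reconstruction, not the production code: it assumes ChromaDB for the vector store and the OpenAI client for generation, and the chunk text, IDs, and model name are all illustrative.

```python
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
collection = chroma.create_collection("all_product_docs")  # one corpus for all three products

# Illustrative chunks. In the real system, chunks from ~300 documents
# across all three products landed in this single collection.
collection.add(
    ids=["crm-export", "mkt-export", "ana-export"],
    documents=[
        "CRM: Go to Settings to manage and export your data.",
        "Marketing Automation: Schedule automatic exports under Advanced Options.",
        "Analytics Dashboard: Select CSV format and choose a date range.",
    ],
)

llm = OpenAI()

def answer(question: str) -> str:
    # Top-k by semantic similarity alone. Nothing here tracks which
    # product a chunk came from.
    hits = collection.query(query_texts=[question], n_results=3)
    context = "\n\n".join(hits["documents"][0])
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the question using the documentation below."},
            {"role": "user", "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```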
Tested with fifty questions. Answers looked reasonable. Deployed.
The Frankenstein Instructions
Within days, support tickets arrived with a consistent pattern: customers saying the AI's instructions did not match what they saw in their product.
Customer using the CRM asked: "How do I export contact data?"
AI answer: "Go to Settings, click Data Export, select CSV format, choose date range, and click Generate Report. You can schedule automatic exports under Advanced Options."
Customer response: "There is no Data Export in Settings. There is no Advanced Options menu. Where are these features?"
I checked the retrieved chunks. The answer combined three different sources. "Go to Settings" came from CRM documentation. "Select CSV format and choose date range" came from Analytics Dashboard docs. "Schedule automatic exports under Advanced Options" came from Marketing Automation docs.
Each individual chunk was accurate for its own product. But they described three different export workflows that did not exist in combination anywhere.
The Pattern
The vector search retrieved chunks based on semantic similarity. "Export contact data" matched documentation about exports from all three products. The retrieval did not filter by product context.
The LLM saw five chunks about exporting data. Each chunk described a different part of the export process, but from different products with different UIs and different features. The LLM synthesized these chunks into one coherent-sounding answer that described a workflow that did not exist in any actual product.
Another example: "How do I add team members?"
Retrieved chunks:

- CRM: "Navigate to Team Settings and click Add User"
- Marketing Automation: "Go to Account > Users > Invite New Member"
- Analytics Dashboard: "Open the sidebar, select Team, and click the plus icon"
AI answer: "Navigate to Team Settings in the sidebar, click Add User or the plus icon, then select Invite New Member from the Account menu."
Completely incoherent. Every product had different navigation, different button labels, different flows. The answer was a mashup that worked nowhere.
Why This Happened
The vector embeddings measured semantic similarity, not product identity. The query "add team members" was semantically similar to documentation from all three products about adding users. Retrieval returned chunks from all three.
The prompt told the LLM to answer using the retrieved chunks. It never told the LLM to check if chunks were from compatible contexts. The LLM saw five relevant chunks and synthesized them into one answer, assuming they described parts of the same system.
There was no product boundary in the retrieval or the generation. The entire knowledge base was treated as one unified system when it actually described three separate systems with different architectures.
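You can see the failure mode directly by scoring a query against chunks from all three products. This is a quick demo with sentence-transformers, not part of the production system, and the exact scores will vary by model:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I add team members?"
chunks = [
    "CRM: Navigate to Team Settings and click Add User.",
    "Marketing Automation: Go to Account > Users > Invite New Member.",
    "Analytics Dashboard: Open the sidebar, select Team, and click the plus icon.",
]

# Every chunk scores as a plausible match for the query. Cosine similarity
# sees "adding users" in all of them; product identity is invisible to it.
print(util.cos_sim(model.encode([query]), model.encode(chunks)))
```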
The Failed Fix
I tried adding product names to every chunk. Each chunk was tagged with metadata: `product = CRM`, `product = Marketing Automation`, or `product = Analytics Dashboard`.
Then I filtered retrieval: only search chunks matching the user's product.
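In code, this amounted to a metadata filter at query time. A sketch using ChromaDB's `where` filter, reusing the collection from the first sketch; the tag values and chunk text are illustrative:

```python
# Tag every chunk with its source product...
collection.add(
    ids=["crm-team", "mkt-team", "ana-team"],
    documents=[
        "Navigate to Team Settings and click Add User.",
        "Go to Account > Users > Invite New Member.",
        "Open the sidebar, select Team, and click the plus icon.",
    ],
    metadatas=[
        {"product": "CRM"},
        {"product": "Marketing Automation"},
        {"product": "Analytics Dashboard"},
    ],
)

# ...then restrict retrieval to one product. Correct in principle, but it
# only works if you already know which product the user means.
hits = collection.query(
    query_texts=["How do I add team members?"],
    n_results=5,
    where={"product": "CRM"},
)
```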
That required knowing which product the user had. The bot asked: "Which product are you using?" at the start of every conversation. Customers hated it. Many did not know the official product names. Some used multiple products and did not know which one their question was about.
The Real Solution Was Contextual Separation
The fix required treating the knowledge base as three separate domains, not one merged corpus, but doing so intelligently without forcing users to self-identify upfront.
First, I added product-specific vector namespaces. Each product's documentation lived in its own retrieval space. When a query came in, the system searched all three namespaces in parallel but kept results separated by source.
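As a sketch, with one ChromaDB collection standing in for each namespace, reusing the client from the first sketch (collection names are illustrative):

```python
# One collection per product. A query fans out to all three, but results
# stay bucketed by source instead of collapsing into one ranked list.
namespaces = {
    name: chroma.get_or_create_collection(name)
    for name in ("crm", "marketing_automation", "analytics_dashboard")
}

def retrieve_by_product(question: str, k: int = 5) -> dict[str, list[str]]:
    results = {}
    for product, coll in namespaces.items():
        hits = coll.query(query_texts=[question], n_results=k)
        results[product] = hits["documents"][0]
    return results
```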
Second, I added context detection. Before retrieval, the system analyzed the query for product-specific terminology. Mentions of "campaign builder" indicated Marketing Automation. Mentions of "deal pipeline" indicated CRM. Mentions of "dashboard widgets" indicated Analytics.
If the query contained clear product signals, retrieval prioritized that product's namespace and only pulled from others if the primary namespace had low-confidence matches.
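The detection itself can be as simple as keyword matching. A sketch; these term lists are illustrative, and the real ones would be curated per product:

```python
# Product-specific terminology, mapped to the matching namespace.
PRODUCT_SIGNALS = {
    "crm": {"deal pipeline", "contact data", "lead scoring"},
    "marketing_automation": {"campaign builder", "drip sequence", "email campaign"},
    "analytics_dashboard": {"dashboard widgets", "chart", "report builder"},
}

def detect_product(question: str) -> str | None:
    q = question.lower()
    scores = {
        product: sum(term in q for term in terms)
        for product, terms in PRODUCT_SIGNALS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None means the query is ambiguous
```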
If the query was ambiguous, the system retrieved from all products but presented answers separately: "For CRM: [answer]. For Marketing Automation: [answer]. Which product are you using?"
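Putting detection and the namespaced retrieval together gives a router along these lines. The distance threshold is an illustrative guess, with ChromaDB's query distances standing in for retrieval confidence:

```python
def route_retrieval(question: str, k: int = 5) -> dict[str, list[str]]:
    product = detect_product(question)
    if product is not None:
        hits = namespaces[product].query(query_texts=[question], n_results=k)
        # Use the primary namespace alone when its best match is strong.
        # The 0.8 distance cutoff is an illustrative guess, not a tuned value.
        distances = hits["distances"][0]
        if distances and min(distances) < 0.8:
            return {product: hits["documents"][0]}
    # Ambiguous query, or weak primary matches: retrieve from every product
    # but keep the buckets separate for the generator to handle.
    return retrieve_by_product(question, k)
```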
Third, the generation prompt changed. The LLM was explicitly instructed: "These chunks may come from different products. Do not mix instructions from different products. If chunks conflict, they likely describe different systems. Present separate answers or ask for clarification."
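With the routed buckets feeding a per-product context, the generation step looked roughly like this (a sketch reusing the pieces above; the framing around the quoted instruction is mine):

```python
SYSTEM_PROMPT = (
    "You answer support questions using the retrieved documentation. "
    "These chunks may come from different products. Do not mix "
    "instructions from different products. If chunks conflict, they "
    "likely describe different systems. Present separate answers or "
    "ask for clarification."
)

def answer_with_routing(question: str) -> str:
    buckets = route_retrieval(question)
    # Group the context by product so the model can see source boundaries.
    context = "\n\n".join(
        f"### {product}\n" + "\n".join(chunks)
        for product, chunks in buckets.items()
        if chunks
    )
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Documentation by product:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```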
What Changed
Question: "How do I export contact data?"
Old behavior:

- Retrieved chunks from CRM, Marketing Automation, and Analytics
- Generated a mashup answer
- Customer could not follow the instructions

New behavior:

- Detected "contact data" as CRM-specific terminology
- Retrieved primarily from the CRM namespace
- Generated the answer using only CRM chunks
- Answer matched the actual CRM interface

If the query was ambiguous ("How do I export data?"):

- Retrieved from all products but kept results separate
- Responded: "Export process varies by product. Are you using CRM, Marketing Automation, or Analytics Dashboard?"
- User clarifies
- System provides an accurate product-specific answer
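End to end, the two flows reduce to something like this (illustrative):

```python
# Clear product signal: "contact data" routes to the CRM namespace only.
print(answer_with_routing("How do I export contact data?"))

# Ambiguous: all namespaces are retrieved but kept separate, so the
# prompt steers the model toward asking which product the user means.
print(answer_with_routing("How do I export data?"))
```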
The Results
Before the fix, forty-one percent of answers mixed incompatible instructions from multiple products. Customers could not complete tasks. Support tickets increased because the AI was giving impossible instructions. Trust in the AI dropped to twenty-three percent.
After the fix, cross-product contamination dropped to under three percent, limited to edge cases where products genuinely shared identical features. Answer accuracy for product-specific questions reached eighty-nine percent. Support tickets related to AI confusion disappeared.
The business impact was immediate. Customers using the AI to solve problems actually succeeded. Support ticket deflection went from negative (the AI was creating tickets) to a fifty-four percent deflection rate.
What I Learned
Semantic similarity does not equal contextual compatibility. Two chunks can be topically similar while describing completely different systems. Retrieval must consider source boundaries, not just content similarity.
Merging documentation from multiple products into one undifferentiated knowledge base destroys the contextual boundaries that make instructions actionable. Each product is a separate world with its own vocabulary, UI, and workflows.
LLMs will synthesize retrieved chunks into coherent answers even when those chunks are incompatible. The generation prompt must explicitly forbid cross-context mixing and require product consistency.
The Bottom Line
A RAG system retrieving from three products without source filtering created Frankenstein instructions by mixing incompatible chunks. The fix was product-aware retrieval with context detection and generation prompts that enforced single-product coherence.

Written by Farhan Habib Faraz
Senior Prompt Engineer building conversational AI and voice agents

Tags: rag, multiproduct, contextmixing, retrieval, knowledgebase, productboundaries
