Yaohua Chen

Posted on Jun 10

Don't Build That RAG Knowledge Base — Seven Reasons It Will Fail, and What to Build Instead

Companies Have Been Failing at This for 30 Years — AI Won't Change That by Itself

Clients come to me and say, "We want to build a company-wide AI knowledge base." I used to take those projects on the spot. Today, nine times out of ten, my first move is to talk them out of it.

It's not that knowledge bases are a bad idea. It's that we keep pointing the newest technology at a problem that has resisted every previous attempt for three decades.

Consider what we know about how badly information retrieval works inside companies:

McKinsey estimated that knowledge workers spend nearly 20% of their workweek — roughly one full day — searching for internal information or tracking down colleagues who have it [1].
A 2022 Starmind survey found knowledge workers lose about 1 hour 42 minutes every day hunting for information they need to do their jobs; a third lose more than two hours [2].
A 2026 enterprise search survey found internal searches succeed on the first attempt only ~10% of the time, while Google delivers a useful first page about 95% of the time [3].

None of these are new, AI-era problems. They are the same problems the knowledge management movement of the 1990s failed to solve, the corporate intranet failed to solve, and enterprise search failed to solve. The AI knowledge base is just the latest contender — and the same trap is waiting.

A quick orientation before we start. When I say "AI knowledge base," I mean a system built on RAG — Retrieval-Augmented Generation, the technique where the system searches an index of your documents and feeds the most relevant passages to an AI model before it composes an answer. This article is written for the people who approve, scope, and build these projects: engineering leaders, product owners, and the consultants advising them. One note on the examples ahead — a few of the public failures I cite (Air Canada, NYC's MyCity, DPD, McDonald's) are broad enterprise chatbot failures rather than knowledge-base failures in the strict sense; I use them as cautionary tales of the same underlying product mistake: a broad, under-scoped AI interface that promises to handle anything.

This article walks through the seven reasons these projects fail. For each one, I'll give you the problem, the root causes, and — because diagnosis without a prescription is just complaining — the solutions and best practices that actually work, with real companies and real numbers attached.

Problem 1: A Big Launch, Then Nobody Uses It

The Problem

Here's the script. See if it sounds familiar.

An executive attends an industry conference, hears an impressive talk, and comes back declaring, "We need an AI knowledge base." A project plan appears within weeks: three months for vendor selection, three months for development, three months for a pilot. Launch day goes beautifully — the demo works, the slides look great, leadership posts about it on LinkedIn.

Six months later, someone checks the admin dashboard: active users are below 10% of the company.

This pattern is well documented. Gartner found that 40% of corporate portal initiatives fail to achieve enough adoption to justify their ROI, and 10–15% get scrapped outright — findings that predate AI entirely [4]. The AI version repeats the same arc with a more expensive tech stack: Gartner predicted in mid-2024 that at least 30% of generative AI (GenAI) projects would be abandoned after proof of concept by the end of 2025 [5] — and by January 2026 had revised the realized figure up to 50% [6]. MIT's NANDA initiative went further: across 300 public enterprise AI implementations, 95% of GenAI pilots produced no measurable P&L impact [7].

In the post-mortems, the blamed causes are always the same: employees didn't know how to use it, the documents weren't organized, the model wasn't good enough, we picked the wrong use case. Then a "Knowledge Base 2.0" project gets approved — and dies the same death.

Root Causes

Success is defined as shipping, not adoption. The project is judged on demo day, even though the only metric that matters is whether people are still using it six months later.
No one is accountable for usage. IT owns uptime; nobody owns adoption. When usage craters, there's no owner to notice or act.
The project can't be killed. Without pre-agreed failure criteria, a dying project limps along consuming budget until it's quietly buried.
Procurement precedes validation. Companies sign platform contracts before testing whether a single real workflow improves.

Solutions & Best Practices

Redefine "launch." Write into the project charter that success means weekly active usage and task-completion rates measured six months after go-live — not a working demo. MIT's research found the 5% of GenAI projects that succeeded were judged on narrow, measurable operational outcomes, not on shipping [7].
Write kill criteria up front. For example: "If weekly active users are below 30% of the target group at month three, we stop and re-scope." A project that cannot be killed cannot be honest. Gartner's analysis of failed GenAI projects lists "unclear business value" as a top abandonment driver — kill criteria force that clarity before money is spent [5][6].
Pilot before procurement. Run a 4–6 week pilot with 20–50 real users on one real workflow before signing any platform contract.
Make a business leader — not IT — accountable for adoption. IT can own uptime. Only the business can own usage.
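A kill criterion only works if it is precise enough to be checked mechanically. Here is a minimal sketch of the example above ("below 30% of the target group at month three"); the class and function names are hypothetical, and the numbers in the usage line are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriterion:
    """A pre-agreed failure threshold written into the project charter."""
    month: int            # checkpoint, in months after go-live
    min_wau_share: float  # weekly active users divided by target group size


def should_stop(criterion: KillCriterion, month: int,
                weekly_active_users: int, target_group_size: int) -> bool:
    """Return True when the checkpoint is reached and adoption is below the bar."""
    if target_group_size <= 0:
        raise ValueError("target_group_size must be positive")
    if month < criterion.month:
        return False  # checkpoint not reached yet
    share = weekly_active_users / target_group_size
    return share < criterion.min_wau_share


# The example from the text: below 30% of the target group at month three.
charter = KillCriterion(month=3, min_wau_share=0.30)
print(should_stop(charter, month=3, weekly_active_users=12, target_group_size=50))  # True
```

The point of writing it this way is that nobody gets to reinterpret the threshold after the fact: the inputs come from the usage dashboard, and the answer is yes or no.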

Problem 2: The Technology Matters Far Less Than You Think

The Problem

The first time I ran one of these projects, every client conversation was about technology: which vector database (the system that stores AI-searchable representations of your text), what chunk size (how large a passage the AI reads at once), which embedding model (the algorithm that converts text into those representations), whether to add a reranking step.

The conversations sounded professional. The client was happy. The CTO nodded along.

I later realized all of it was beside the point.

Not because the choices were wrong — but because, in my experience, those choices influence the final outcome by less than 10%. That figure is a practitioner's estimate, not a measured statistic — but every engineer I know who has shipped these systems lands in the same neighborhood. Two stacks with different vector databases, chunking strategies, and embedding models produce real-world quality differences far smaller than the gap between a well-written and a poorly-written prompt template — and smaller still than the gap created by the quality of the documents themselves. Any engineer who has run RAG in production will tell you:

No amount of clever chunking or fancy architecture can fix fundamentally bad data.

The evidence backs this up. Gartner predicts that through 2026, organizations will abandon 60% of AI projects that aren't supported by AI-ready data, and found that 63% of organizations either lack the right data management practices for AI or aren't sure they have them [8]. NTT DATA's survey of 2,300+ IT and business leaders found 70–85% of GenAI deployments failing to meet ROI expectations, with unprepared data and misaligned strategy as leading causes [9].

What actually determines whether a knowledge base succeeds comes down to three things: source quality, user profile, and consumption scenario. All three are business problems, not technical ones.

Root Causes

Technology is what can be budgeted. Tech stacks fit neatly into RFPs, vendor comparisons, and line items. "Fix our documents" doesn't.
Nobody wants to open the data conversation — because step one is admitting, "Our documents are a mess."
Engineers optimize what they control. Chunking strategy is tweakable; the org's writing culture is not. So effort flows to the 10% lever instead of the 90% lever.

Solutions & Best Practices

Invert the budget. As a rule of thumb from my own projects — not a researched benchmark — I'd start at roughly 60% to data governance, 25% to UX and workflow integration, and 15% to the tech stack. Yes, that ratio looks strange in an RFP. That's the point. Gartner's analysis found organizations with successful AI deployments invest several times more in data foundations than those that fail [10].
Build a golden question set before you build anything else. Collect 100–200 real questions from the target users, each with a known-correct answer. Every change — chunking, prompts, document cleanup — gets evaluated against it. This turns "does it feel better?" into a measurable regression test.
Run a one-week document quality assessment as a hard gate. Randomly sample 50 of the documents employees actually use, run the AI against real questions, and score the output. Formalize it into a rubric — freshness, ownership, answerability. No passing score, no project.
Learn from companies that cleaned house first. Before deploying Microsoft 365 Copilot, engine manufacturer Cummins spent the preparation phase on data classification, sensitivity labeling, and retention policies — explicitly because "providing it secure, clean data was critical for generating accurate responses" [11]. IT-services giant Kyndryl ran a six-month archive-and-retention overhaul across its global document corpus before its AI rollout, achieving what it called "AI readiness" along the way [12]. One professional-services firm that cleaned 18.4 TB of redundant and outdated files out of its SharePoint environment saw Copilot answer accuracy double [13].
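The golden question set described above is, in practice, a regression test. A minimal sketch follows, assuming a crude substring match as the grading rule (real projects usually need human or model-assisted grading). The questions, the expected facts, and the two stub answer functions are all made up for illustration.

```python
from typing import Callable

# (question, fact that a correct answer must contain): hypothetical examples
GoldenSet = list[tuple[str, str]]


def score(answer_fn: Callable[[str], str], golden: GoldenSet) -> float:
    """Fraction of golden questions whose answer contains the expected fact."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(
        1 for question, expected in golden
        if expected.lower() in answer_fn(question).lower()
    )
    return hits / len(golden)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """A change is a regression if it scores below the baseline minus tolerance."""
    return candidate < baseline - tolerance


golden = [
    ("How long is the return window?", "14 days"),
    ("Who approves contract discounts?", "regional director"),
]


def before(question: str) -> str:  # stub standing in for the current system
    return "The return window is 14 days." if "return" in question else "I am not sure."


def after(question: str) -> str:   # stub standing in for the system after a change
    if "return" in question:
        return "The return window is 14 days."
    return "Discounts are approved by the Regional Director."


baseline, candidate = score(before, golden), score(after, golden)
print(baseline, candidate, is_regression(baseline, candidate))
```

Run the same script after every chunking tweak, prompt edit, or document cleanup, and "does it feel better?" becomes a number that either moved or didn't.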

Problem 3: Your Documents Are a Mess — and Most Knowledge Was Never Written Down Anyway

The Problem

Knowledge management research has converged on an uncomfortable rule of thumb: only about 20% of what an organization knows is captured in formal systems; the remaining 80% is tacit and lives in people's heads. This 80/20 split is a widely cited estimate, traced to Gartner analysis [14]. IDC puts it bluntly: in knowledge-intensive industries, "the proportion of expertise that lives solely inside people's heads is almost certainly larger than leadership assumes" [15].

What's in that missing 80%?

Judgment — why this contract clause can be conceded for this client but not that one.
Situational awareness — why this bug must not ship on a Friday afternoon, even if every test passes.
Informal know-how — which veteran employee a newcomer must talk to before they can learn the real status of a project.

Most of it has never been written down in a form a knowledge base can use — and asking experts to "write documentation" won't change that (the capture mechanisms that do work come later in this section). The knowledge base you're building indexes the least valuable 20%.

And that 20% is in bad shape, too. The real state of enterprise "documentation": slide decks made for presenting, not reading. Word files made for archiving, not consulting. A wiki written three years ago by someone who has since left. Process diagrams drawn to pass an audit. The most accurate information lives in the heads of five veteran employees who have no time to write it down — and no incentive, because writing down your expertise makes you more replaceable.

Feed this pile to an AI and the AI faithfully retrieves equally bad content. Even purpose-built, professionally engineered RAG systems struggle: Stanford's RegLab and HAI benchmarked the leading commercial legal AI research tools and found they hallucinate between 17% and 33% of the time — and these are products built by LexisNexis and Thomson Reuters with full control over their corpora [16][17]. One data-governance study found RAG running on unvetted content produced a 52% fabrication rate, versus near zero on curated content — same architecture, different source quality [18].

Users hit a few wrong answers, stop trusting the system, and never come back to check whether it improved. Once trust breaks, the system is permanently dead for that user.

Root Causes

Documents were written for other purposes. Reporting, archiving, audit compliance — almost never for "answering a colleague's question."
Tacit knowledge has no capture mechanism. The highest-value knowledge is exchanged in chats, meetings, and code reviews, then evaporates.
The incentives point the wrong way. Documentation effort is invisible in performance reviews, and experts quietly understand that hoarding knowledge is job security.
Garbage in, garbage out is unforgiving in AI. A stale page in a wiki is an inconvenience; the same stale page confidently paraphrased by an AI is a trap.

Solutions & Best Practices

For the explicit 20% that exists on paper:

Give every document an owner and an expiry date. Anything past its review date is automatically quarantined from retrieval — better no answer than a stale one.
Govern before you index. Follow the Cummins and Kyndryl playbook from Problem 2: classification, retention, and cleanup before the AI ever sees the corpus [11][12]. One global healthcare organization assessed 600 TB of file-share data and disposed of 245 TB — 116 million redundant or irrelevant files — before migrating what remained into a system AI tools could safely use [19].
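The owner-and-expiry rule is simple enough to enforce as a filter in front of retrieval. This is a sketch under stated assumptions: the `Doc` fields and the `retrievable` function are hypothetical names, and a real system would read owner and review date from the document store's metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    doc_id: str
    owner: str        # a named person or team; empty string if unowned
    review_by: date   # date after which the content must be re-reviewed


def retrievable(docs: list[Doc], today: date) -> tuple[list[Doc], list[Doc]]:
    """Split documents into (eligible for retrieval, quarantined)."""
    eligible, quarantined = [], []
    for doc in docs:
        if doc.owner and doc.review_by >= today:
            eligible.append(doc)
        else:
            quarantined.append(doc)  # unowned or past its review date
    return eligible, quarantined


docs = [
    Doc("returns-policy", owner="support-ops", review_by=date(2026, 12, 31)),
    Doc("old-wiki-page", owner="", review_by=date(2026, 12, 31)),
    Doc("pricing-2023", owner="sales-ops", review_by=date(2024, 1, 1)),
]
ok, held = retrievable(docs, today=date(2026, 6, 10))
print([d.doc_id for d in ok], [d.doc_id for d in held])
```

The quarantined list doubles as a to-do list for document owners, which is the only reason anyone will ever renew a review date.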

For the tacit 80% that lives in people's heads:

Capture knowledge where it already leaks out. Experts answer questions all day — in Slack threads, support tickets, code reviews, CRM notes. Mine those channels instead of begging people to "write documentation."
Build an "answer once, store forever" loop. When a veteran answers a question in chat, a lightweight workflow promotes that answer into curated, reviewed FAQ content. The expert's marginal cost: near zero.
Fix the incentive directly. Count documentation contributions in performance reviews, and attribute answers to their source by name. Writing things down should build reputation, not replaceability.

Problem 4: Built for Everyone, Useful to No One

The Problem

Page one of every project charter says the same thing: "Build an enterprise-wide AI knowledge base that empowers everyone, giving every employee instant access to the knowledge they need."

Sounds wonderful. For the AI, it's a death sentence.

Legal wants to ask, "Is there a problem with this clause?" Customer service wants to ask, "How do I handle this customer's return?" Sales wants to ask, "What was this client's contract value last year?" HR wants to ask, "Can this candidate's work history be verified?" These four questions require completely different knowledge sources, retrieval methods, context, and answer formats.

Building an "ask me anything" system means cramming four different products into one chat box. All four user groups try it, feel that something is vaguely off, and never click again. And when broad chatbots are pushed into the real world anyway, the failures become public:

Air Canada was held legally liable after its website chatbot invented a bereavement-fare refund policy; a tribunal rejected the airline's argument that "the chatbot is a separate legal entity responsible for its own actions" and ordered damages paid [20].
New York City's MyCity business chatbot was found telling employers they could take a cut of workers' tips and businesses they could refuse cash — both illegal — within months of launch [21].
DPD's customer service chatbot swore at a customer, wrote a poem about its own uselessness, and called its employer "the worst delivery firm in the world"; the screenshots got 1.3 million views before the AI was pulled [22].
McDonald's shut down its IBM-powered AI drive-through ordering at over 100 locations after viral videos showed it adding bacon to ice cream and ringing up hundreds of dollars of unwanted nuggets [23].

(None of these four were RAG knowledge bases in the strict sense — they were customer-facing chatbots. I cite them as cautionary examples of the same product mistake this section is about: a broad, under-scoped AI interface deployed where a narrow, well-defined one was needed.)

Now compare what happens when companies go narrow:

Morgan Stanley built a GPT-4 assistant for exactly one audience — its financial advisors — over exactly one corpus — its 350,000-document research library. Adoption reached 98% of advisor teams, and document access jumped from 20% to 80% [24].
Klarna's customer-service AI, scoped tightly to support conversations, handled 2.3 million chats in its first month — two-thirds of all volume, the workload of 700 agents — and cut resolution time from 11 minutes to under 2 [25].
JPMorgan's COIN does one thing: review commercial loan agreements. It eliminated an estimated 360,000 hours per year of lawyer and loan-officer document review [26].
A&O Shearman (formerly Allen & Overy) deployed Harvey AI specifically for contract review; roughly 2,000 lawyers use its ContractMatrix tool daily, saving about 7 hours per contract [27].

Root Causes

"For everyone" feels safe politically. Nobody's department gets snubbed, so the charter writes itself — at the cost of building for nobody in particular.
One interface cannot serve incompatible needs. Answer style, source corpus, and required precision vary so much across roles that a single generalist system is mediocre at all of them.
Vague scope prevents evaluation. You can't build a golden question set for "everything," so quality is never measured and never improves.

Solutions & Best Practices

Pick one role and one high-frequency scenario — go narrow and go deep. My one client that succeeded built a contract clause review assistant for account managers. That narrow: 80 target users, an average of six uses per person per day, full adoption within three months. That is what a real launch looks like.
When you expand, don't widen the product — add a router. A classification layer dispatches each query to the right vertical assistant. Each vertical stays narrow with its own corpus, prompt template, and answer format; the routing layer creates the illusion of breadth.
Expand only after the previous vertical hits its adoption target. Earn the next persona. Morgan Stanley didn't start with all 50,000 employees — it started with advisors, hit 98% adoption, and only then expanded to other tools and audiences [24].
Don't fear being narrow — it is the precondition for success, not a compromise.
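To make the router idea concrete, here is a minimal sketch. The classification step is a plain keyword overlap, which is an assumption for illustration only; a production router would more likely use a trained classifier or a model call. The vertical names and the lambda "assistants" are placeholders.

```python
import re
from typing import Callable, Optional

Assistant = Callable[[str], str]


def make_router(routes: dict[str, tuple[set[str], Assistant]],
                fallback: Assistant) -> Assistant:
    """Build a router that sends each query to the best-matching vertical."""
    def route(query: str) -> str:
        words = set(re.findall(r"[a-z]+", query.lower()))
        best: Optional[str] = None
        best_overlap = 0
        for name, (keywords, _assistant) in routes.items():
            overlap = len(words & keywords)
            if overlap > best_overlap:
                best, best_overlap = name, overlap
        if best is None:
            return fallback(query)  # no vertical claims the query
        return routes[best][1](query)
    return route


routes = {
    "legal": ({"clause", "contract"}, lambda q: f"[legal assistant] {q}"),
    "support": ({"return", "refund"}, lambda q: f"[support assistant] {q}"),
}
ask = make_router(routes, fallback=lambda q: "No assistant covers this yet.")
print(ask("Is there a problem with this clause?"))
print(ask("How do I handle this customer's return?"))
```

Note the explicit fallback: a router that admits "no assistant covers this yet" is how each vertical stays narrow instead of quietly becoming "ask me anything" again.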

Problem 5: People Don't Know What to Type Into the Box

The Problem

The default knowledge-base interface is a chat box waiting for the user to type a question. This design assumes users can describe their own problem.

In reality, they can't.

Here's what actually happens: a user types "I'd like to learn about policy X," the AI returns a paragraph of boilerplate, the user finishes reading with no idea what to ask next, closes the tab, and never returns.

This isn't a hunch — it's one of the best-documented findings in AI usability. Jakob Nielsen calls it the articulation barrier: most people struggle to express their needs precisely in written prose. Nielsen Norman Group estimates that fewer than 20% of people are articulate enough in writing to make advanced use of prompt-driven AI, and that prompt-only interfaces effectively exclude about half the workforce [28]. Their follow-up research found new users "had a difficult time understanding what a GenAI bot can do," and that visible prompt controls — suggested prompts, role-based galleries, quick-action buttons — measurably improved both usage and answer quality [29].

The user's real need is: "I'm in situation X right now — what does company policy say I should do?" That's specific, contextual, and requires judgment. But they won't type it. Partly effort; partly because they haven't yet articulated the problem even to themselves.

Root Causes

The blank box demands recall; good UX provides recognition. Fifty years of HCI research says menus beat memorization — the chat box ignores all of it.
Users don't know the system's capabilities, so they can't calibrate their questions to what it can actually answer.
The burden of context is on the wrong party. The system knows who the user is, what they're working on, and what just happened — and then asks them to type it all out anyway.

Solutions & Best Practices

Replace the blank chat box with scenario-based entry points. In one client redesign of mine, a "Compliance Knowledge Base" chat box was rebuilt as six buttons — "I need to review a contract," "I need to respond to a customer complaint," "I need to draft an outgoing email," "I need to verify a data request," and so on. Each button opens a short 3–4 field form that collects full context up front; on submission the user gets a specific, actionable answer. Same underlying data — usage rose 3–5x. The entire difference lies in the AI actively collecting context versus waiting for the user to describe the problem.
This is exactly what the big vendors converged on. Microsoft's answer to the articulation barrier at enterprise scale is the Copilot Prompt Gallery — curated prompt collections organized by role and function, with shareable team prompts — which it explicitly positions as its mechanism for driving workplace AI adoption [30].
Auto-inject context. Pull the user's role, the CRM record they're viewing, the email thread they're in. The AI should arrive already knowing 80% of the situation.
Add clarifying-question loops. When a query is vague, the assistant asks one targeted follow-up instead of returning boilerplate.
Use proactive triggers. The AI surfaces when an event happens — a contract is uploaded, a complaint ticket opens — rather than waiting in a tab nobody visits.
Don't skip training. Slack's Workforce Index found only 15% of workers feel adequately trained on AI tools — but workers who get training are up to 19x more likely to report that AI actually improves their productivity [31].
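A scenario button plus a short form is, underneath, a template that refuses to run until the context is complete. The sketch below assumes hypothetical field names and a hypothetical prompt template; the user's role is passed in as if it had been auto-injected from the identity system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    label: str                 # button text shown to the user
    fields: tuple[str, ...]    # form fields collected up front (must not include "role")
    template: str              # prompt skeleton filled from the form


def build_query(scenario: Scenario, form: dict[str, str], user_role: str) -> str:
    """Turn a submitted form into a fully contextualized query."""
    missing = [name for name in scenario.fields if not form.get(name, "").strip()]
    if missing:
        # In a UI this would become a clarifying question, not an error.
        raise ValueError("missing fields: " + ", ".join(missing))
    values = {name: form[name].strip() for name in scenario.fields}
    return scenario.template.format(role=user_role, **values)


REVIEW_CONTRACT = Scenario(
    label="I need to review a contract",
    fields=("client", "clause", "concern"),
    template=(
        "Role: {role}\nClient: {client}\nClause: {clause}\n"
        "Concern: {concern}\nWhat does company policy say I should do?"
    ),
)

print(build_query(
    REVIEW_CONTRACT,
    {"client": "Acme", "clause": "Liability cap", "concern": "Client wants it removed"},
    user_role="account manager",
))
```

The user never writes a prompt. They fill in three fields they already know the answers to, and the system assembles the specific, contextual question they would not have typed.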

Problem 6: Most Companies Asking for One Don't Actually Need One

The Problem

Here I have to say something that may not please the people funding these projects: most clients who say they want a knowledge base are trying to solve problems that don't require building one at all.

After years of asking, I've sorted clients' real needs into three categories:

Category 1: "I have a specific document and want AI to help me understand it." Reviewing a contract, reading a tender, analyzing a financial report. This requires no knowledge base whatsoever — just give the document to the AI as context. Today's frontier models accept a million tokens of context — roughly 750,000 words, longer than the entire Lord of the Rings trilogy [32][52]. For one-off or small, bounded document sets, direct context is more accurate than retrieval and costs a fraction of a RAG pipeline to build — though at high query volumes over large corpora, RAG remains cheaper per query. (Peer-reviewed benchmarks confirm long-context models consistently outperform RAG on accuracy when the corpus fits; RAG's remaining advantage is per-query cost at scale [33].)
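A quick way to check whether a document set belongs in Category 1 is back-of-the-envelope arithmetic using the ratio above (a million tokens is roughly 750,000 words). The sketch below is an estimate only: it counts whitespace-separated words rather than running a real tokenizer, and the 50,000-token reserve for instructions and the answer is my own assumption.

```python
WORDS_PER_TOKEN = 0.75  # from the ratio in the text: 1,000,000 tokens ~ 750,000 words


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(texts: list[str], window_tokens: int = 1_000_000,
                    reserve_tokens: int = 50_000) -> bool:
    """True if the documents plus a reserve for instructions and the answer fit."""
    needed = sum(estimate_tokens(t) for t in texts)
    return needed + reserve_tokens <= window_tokens


contract = "word " * 30_000          # stand-in for a 30,000-word contract
print(estimate_tokens(contract))     # 40000
print(fits_in_context([contract]))   # True
```

If the answer is True with room to spare, the retrieval pipeline is optional, and the burden of proof shifts to whoever wants to build one.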

Category 2: "I want AI to perform a specific action for me." Generating weekly reports, drafting emails, replying to customers. This needs skills plus business-system integration, not a knowledge base. Encapsulate the rules as reusable workflows and connect them to the CRM or ERP — the results far outperform querying a static document index.

Category 3: "I want AI inside my existing work environment." Directly callable in Slack or Teams, auto-drafting in email, surfacing suggestions inside the CRM. This needs embedding and plugins, not a knowledge base. A standalone knowledge base is an island — employees must deliberately switch tabs to use it, and that single extra step is enough to cut active usage in half. The evidence for embedding is overwhelming: GitHub Copilot — AI embedded directly in the editor developers already use — reached 90% of the Fortune 100, with Accenture's 50,000-developer study showing 96% success among initial users precisely because there was no new tab and no new habit to form [34]. Glean, which surfaces answers inside Slack, the browser, and email rather than in a separate portal, reports users averaging five queries a day with a daily-to-monthly active ratio near 40% — double to quadruple typical enterprise software [35]. Gong embeds AI insights directly in the sales workflow and saw AI feature usage grow 50% in a year [36].

In my consulting work, these three categories cover about 80% of the clients who say "we want a knowledge base" (a tally from my own engagements, not a market statistic). In every one of them, the alternative is cheaper, faster, and more effective than building a knowledge base.

The scenarios where a knowledge base genuinely fits are narrow: massive volumes of heterogeneous documents (thousands or more), low-frequency but high-stakes queries (not daily high-frequency tasks), and strong recall-completeness requirements (compliance, audit, legal discovery). In a typical enterprise, such scenarios account for less than 20% of the demand.

Root Causes

"Knowledge base" is the only vocabulary executives have for "AI that knows our stuff." The real need hides behind the label.
Vendors sell platforms, not triage. Nobody's sales team is incentivized to say "you don't need this."
The simpler alternatives are invisible because they don't require a procurement process — pasting a document into a long-context model doesn't generate an RFP.

Solutions & Best Practices

Run every request through this decision tree before building anything:

Only the bottom-right branch justifies a knowledge base. Everything else is cheaper, faster, and more accurate without one.

And note the default at the bottom: agentic search — letting the AI navigate documents with plain-text indexes and search tools, loading files on demand — is increasingly the right starting point even for large corpora. More on that in the closing section.
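The triage in this section can also be sketched as a small function. This is my own encoding of the three categories and the narrow-fit criteria described above. The order of the checks, the parameter names, and the 1,000-document threshold (standing in for "thousands or more") are assumptions, not a specification.

```python
from enum import Enum


class Approach(Enum):
    DIRECT_CONTEXT = "put the document(s) straight into the model's context"
    WORKFLOW = "build a skill or workflow integrated with the business system"
    EMBED = "embed the assistant in the tool people already use"
    KNOWLEDGE_BASE = "build a retrieval-backed knowledge base"
    AGENTIC_SEARCH = "start with agentic search over the files"


def triage(*, specific_documents: bool, wants_action: bool,
           wants_in_existing_tool: bool, doc_count: int,
           high_stakes_low_frequency: bool, needs_complete_recall: bool) -> Approach:
    """Map a stated need to the cheapest approach that plausibly satisfies it."""
    if specific_documents:            # Category 1
        return Approach.DIRECT_CONTEXT
    if wants_action:                  # Category 2
        return Approach.WORKFLOW
    if wants_in_existing_tool:        # Category 3
        return Approach.EMBED
    if doc_count >= 1000 and high_stakes_low_frequency and needs_complete_recall:
        return Approach.KNOWLEDGE_BASE
    return Approach.AGENTIC_SEARCH    # the default


print(triage(specific_documents=False, wants_action=False,
             wants_in_existing_tool=False, doc_count=40_000,
             high_stakes_low_frequency=True, needs_complete_recall=True).name)
```

Only one of the five outcomes is a knowledge base, and it requires every one of the narrow-fit conditions to hold at once.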

Problem 7: Nobody Keeps It Up to Date — So It Quietly Goes Stale

The Problem

The seventh failure is the least dramatic-looking — and the most reliably fatal: maintenance costs.

A knowledge base is not a build-once asset. It is a perishable good with a shelf life. Documents change every month — policy updates, process revisions, product iterations, reorgs. A knowledge base that doesn't keep pace starts serving expired answers within a quarter. Practitioner analyses of enterprise RAG deployments report measurable accuracy degradation within 90 days of going live in most of the deployments they examine — with document staleness as the usual culprit [37]. That's industry commentary rather than a controlled study, but it matches what I've seen in the field. Research on embedding decay shows stale content can quietly degrade retrieval accuracy by up to 20%, with no warning signal to users, and identifies staleness as a primary reason RAG deployments lose adoption three to six months after launch [38].

And no role inside the enterprise has any incentive to maintain it: document authors forget their files the moment they're written, IT owns the system but not the content, and business teams assume maintenance is IT's job.

The result is a knowledge base that becomes a machine for generating confident, outdated answers — still running on the surface, but misleading users daily. This is worse than having no knowledge base at all. As one practitioner put it: if your AI tells a customer "our return policy is 30 days" when it changed to 14 days six months ago, "you don't have a data quality problem — you have a trust problem" [39].

There's also a rarely-discussed technical reason "manual maintenance" cannot work. Updating one document isn't just re-uploading a file. You must re-chunk it, regenerate the embeddings, delete the old vectors, and write the new ones. For a 1,000-document knowledge base with 10% monthly churn, that's 100 such operations every month — each dependent on a human noticing what changed. No organization sustains that for more than six months.
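To see why one edit becomes a pipeline operation, here are the four steps as a sketch. Everything is a stand-in: the index is a plain dictionary, chunking is fixed-size by character, and `embed` is a hash-based placeholder rather than a real embedding model.

```python
import hashlib


def chunk(text: str, size: int = 200) -> list[str]:
    """Fixed-size character chunks; real systems chunk more carefully."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(piece: str) -> list[float]:
    """Placeholder vector derived from a hash; stands in for a real model."""
    digest = hashlib.sha256(piece.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]


def update_document(index: dict[str, dict[int, list[float]]],
                    doc_id: str, new_text: str) -> int:
    """Replace every vector for doc_id; returns the number of vectors written."""
    pieces = chunk(new_text)                   # 1. re-chunk
    vectors = [embed(p) for p in pieces]       # 2. regenerate the embeddings
    index.pop(doc_id, None)                    # 3. delete the old vectors
    index[doc_id] = dict(enumerate(vectors))   # 4. write the new ones
    return len(vectors)


index: dict[str, dict[int, list[float]]] = {}
update_document(index, "returns-policy", "x" * 450)  # first version: 3 chunks
update_document(index, "returns-policy", "y" * 100)  # revised version: 1 chunk

# The arithmetic from the text: 1,000 documents with 10% monthly churn.
monthly_updates = round(1000 * 0.10)
print(len(index["returns-policy"]), monthly_updates)  # 1 100
```

Skip step 3 and the two leftover chunks from the old version stay retrievable, which is exactly how a "maintained" knowledge base keeps serving expired answers.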

Root Causes

Ownership vacuum. Authors, IT, and business units each assume someone else maintains the content.
The architecture multiplies the cost of change. Chunking and embedding turn "edit a document" into a multi-step pipeline operation.
Staleness is invisible until trust is gone. Unlike a crashed server, a wrong answer doesn't page anyone.

Solutions & Best Practices

Make event-driven sync the acceptance criterion, or walk away. Either changes in the source systems (CMS, CRM, HR) automatically trigger re-processing of only the changed document, or the project should not be built. Schemes like "scheduled quarterly updates" or "appoint a knowledge-base administrator" should be refused outright: the workload is architecturally unsuited to manual handling.
This is a solved engineering pattern. Change-data-capture (CDC) pipelines watch source systems and emit every insert, update, and delete as an event; downstream workers re-embed only what changed. Production implementations achieve source-change-to-updated-index latency of seconds to minutes [40]. Modern vector databases like Pinecone index updates within seconds without full re-indexing [41]. Commercial platforms do the same: Glean's connectors combine scheduled crawls with webhooks that process changes within 1–5 minutes for supported systems [42].
Defend against staleness in the answer layer too. Every answer cites its source documents with last-updated dates; documents past their review SLA are excluded or visibly flagged.
Monitor the feedback loop. Log unanswered and thumbs-down queries weekly. That list is simultaneously your maintenance queue and your tacit-knowledge capture queue (Problem 3) — the two hardest problems feeding each other's solution.
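The two halves of this playbook, event-driven updates and dated citations, fit in a few lines. The sketch below assumes a hypothetical `ChangeEvent` shape and an in-memory index; it is not the event format of any particular CDC tool, and the 180-day review SLA is an illustrative number.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                           # "insert", "update", or "delete"
    doc_id: str
    text: Optional[str] = None
    updated_on: Optional[date] = None


@dataclass
class IndexedDoc:
    text: str
    updated_on: date


def apply_event(index: dict[str, IndexedDoc], event: ChangeEvent) -> None:
    """Touch only the document named in the event; everything else is left alone."""
    if event.op == "delete":
        index.pop(event.doc_id, None)
    elif event.op in ("insert", "update"):
        if event.text is None or event.updated_on is None:
            raise ValueError("insert/update events need text and updated_on")
        index[event.doc_id] = IndexedDoc(event.text, event.updated_on)
    else:
        raise ValueError(f"unknown op: {event.op}")


def cite(index: dict[str, IndexedDoc], doc_ids: list[str],
         today: date, review_sla_days: int = 180) -> list[str]:
    """Citation lines with last-updated dates, flagging documents past the SLA."""
    lines = []
    for doc_id in doc_ids:
        doc = index[doc_id]
        age_days = (today - doc.updated_on).days
        flag = " [past review SLA]" if age_days > review_sla_days else ""
        lines.append(f"{doc_id} (last updated {doc.updated_on.isoformat()}){flag}")
    return lines


index: dict[str, IndexedDoc] = {}
apply_event(index, ChangeEvent("insert", "returns", "Returns: 30 days.", date(2025, 1, 5)))
apply_event(index, ChangeEvent("update", "returns", "Returns: 14 days.", date(2026, 5, 20)))
print(index["returns"].text, cite(index, ["returns"], today=date(2026, 6, 10)))
```

The second event is the whole argument: the policy change reaches the index because the source system emitted it, not because a human remembered to re-upload a file.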

Closing Thoughts: Even the Way These Systems Are Built Is Going Out of Date

One more thing worth saying — about the shelf life of this architecture itself.

In 2026, frontier models offer context windows of a million tokens or more — Anthropic now ships a 1M-token window at standard per-token pricing [52], and Google and OpenAI offer comparable windows [32]. For most enterprises' actual document volumes, you don't need a vector database, you don't need chunking, you don't need embeddings — put the documents directly into context. It's more accurate, faster to build, and easier to debug than RAG [33].

There's a striking proof point. Claude Code, Anthropic's own coding agent, uses no vector database. Its creator, Boris Cherny, has described publicly — in interviews and in his own Hacker News comments — how early versions used RAG with a local vector store, but the team dropped it because plain agentic search — the model running ordinary search tools like glob and grep in as many cycles as it needs — "just outperformed everything" while eliminating the security, staleness, and reliability problems of an index; an Anthropic engineer in the same discussion added that it won "by a lot" [43][44]. When the people best positioned to build RAG systems choose not to use RAG in their own flagship tool, that's worth pausing on.
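For readers who haven't seen it, "agentic search" is less exotic than it sounds. The sketch below implements glob-style and grep-style tools in Python and a loop that tries successive queries. It is an illustration of the general pattern, not Claude Code's implementation, and in a real agent the model would choose the next query itself rather than reading from a fixed list.

```python
import re
import tempfile
from pathlib import Path


def glob_tool(root: Path, pattern: str) -> list[Path]:
    """List files under root whose names match a glob pattern."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def grep_tool(paths: list[Path], regex: str) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every line matching the regex."""
    compiled = re.compile(regex, re.IGNORECASE)
    hits = []
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                hits.append((path, number, line.strip()))
    return hits


def search(root: Path, pattern: str, queries: list[str]) -> list[tuple[Path, int, str]]:
    """Try each query in turn and stop at the first one that finds something."""
    files = glob_tool(root, pattern)
    for query in queries:
        hits = grep_tool(files, query)
        if hits:
            return hits
    return []


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "policies").mkdir()
    (root / "policies" / "returns.md").write_text(
        "# Returns\nItems may be returned within 14 days.\n", encoding="utf-8")
    for path, number, line in search(root, "*.md", ["refund window", r"return\w* within"]):
        print(path.name, number, line)
```

There is no index to build and nothing to go stale: the search reads the files as they are at the moment of the question.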

The standalone RAG knowledge base is a 2023 solution. Back then, context windows were 8,000 tokens and retrieval was mandatory. Using 2023 architecture for 2026 problems is, much of the time, manufacturing unnecessary engineering complexity.

"Build a knowledge base" is also a continuation of traditional IT thinking — centralize the information, unify the interface, and wait for users to come. That paradigm has failed for 30 years. The more effective direction is the opposite: push AI to where the work happens — inside Slack and Teams, inside email, inside the CRM. Not users going to find the AI, but the AI showing up at the scene of the user's work. That doesn't require a knowledge base. It requires integration.

What comes after the knowledge base: connect the work itself

After the original version of this essay circulated, the most interesting pushback came from teams a step ahead. Two ideas are worth recording.

First: "We let AI digest the documents into a wiki." One team described their pipeline: feed the raw document pile to a frontier model, have it rewrite everything into a clean canonical wiki, then run repeated cross-review between models before anything is published. Expensive, deliberate, and by their account worth it. Open-source tooling for exactly this pattern now exists, with paragraph-level provenance markers linking every claim back to its source file and line range [45].

This is a genuine upgrade to the Problem 3 playbook — AI fixes form at a scale humans never will. Presentation decks become readable pages; archived files become consultable references. But two cautions:

Beware error laundering. AI digestion can rewrite a stale or wrong source into a confident, polished page. The ugly original at least signaled "old document"; the digested version reads as authoritative. Every digested page must keep dated provenance links to its sources, high-stakes pages need human expert sign-off, and the cross-review between models should be framed as contradiction hunting, not polish [46].
The digested wiki still expires. Problem 7 applies in full. Without source-change triggers for re-digestion, you've built a better-written machine for outdated answers.
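Both cautions come down to bookkeeping that is easy to sketch. The Python below is a minimal illustration, assuming each digested page records a content hash and a digestion date for every source it was built from. The names `Provenance`, `DigestedPage`, `fingerprint`, and `pages_needing_redigest` are hypothetical and do not come from any of the tools cited here.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


def fingerprint(content: str) -> str:
    """Stable hash of a source's content at a point in time."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Provenance:
    source_path: str    # where the claim came from
    source_hash: str    # fingerprint of the source when it was digested
    digested_on: date   # dated, so readers can judge freshness


@dataclass
class DigestedPage:
    title: str
    sources: list[Provenance] = field(default_factory=list)


def pages_needing_redigest(
    pages: list[DigestedPage], current_sources: dict[str, str]
) -> list[str]:
    """Return titles of pages whose sources changed or disappeared.

    `current_sources` maps source path to its current content.
    """
    stale: list[str] = []
    for page in pages:
        for prov in page.sources:
            current = current_sources.get(prov.source_path)
            if current is None or fingerprint(current) != prov.source_hash:
                stale.append(page.title)
                break  # one changed source is enough to flag the page
    return stale
```

Running a check like this whenever a source system reports a change is one way to implement a re-digestion trigger. It only flags pages; deciding whether a human expert must re-approve them is a separate policy choice.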

Second — and deeper: the context graph. The sharpest observation was this: what teams actually need to align on are decisions, reasoning, and ideas. Document content only matters as evidence supporting those. The wiki page is the footnote; the decision in flight is the text.

Follow that logic and you arrive where this essay's closing argument was already pointing: each enterprise's unique context doesn't live in the wiki. It lives in goal tracking, project management, requirements, customer feedback, and code — alignment artifacts produced as a byproduct of work, requiring zero documentation effort, and inherently fresh because they are the work. Platforms that link these systems into a connected graph and expose it to AI agents attack the seven problems structurally:

Problem 3 disappears: the tacit 80% — judgment, rationale, decisions — gets captured where it's made, not where someone is begged to write it down.
Problem 7 takes care of itself: the source systems are the source of truth. There is no separate corpus to go stale.
Problem 6's island effect vanishes: the agent lives where the context lives.
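As a rough picture of what "a connected graph exposed to agents" means, the sketch below models work artifacts as nodes and walks outward from one of them to collect nearby context. It is a toy under stated assumptions: the adjacency-dict representation, the `related_context` function, and the example identifiers are all hypothetical, and real platforms use their own data models.

```python
from collections import deque


def related_context(
    edges: dict[str, list[str]], start: str, max_hops: int = 2
) -> list[str]:
    """Breadth-first walk: collect artifacts within `max_hops` links of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found: list[str] = []
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, dist + 1))
    return found


# Hypothetical links produced as a byproduct of normal work.
edges = {
    "goal:cut-churn": ["ticket:142"],
    "ticket:142": ["pr:87", "feedback:31"],
    "pr:87": ["file:billing.py"],
}

context = related_context(edges, "goal:cut-churn", max_hops=2)
```

With a two-hop budget, the walk from the goal reaches the ticket, then the pull request and the customer feedback linked to it, and stops before the individual file. Nothing here was written as documentation; the links exist because the work happened.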

This is now a real product category, not a thought experiment. Atlassian's Teamwork Graph connects over 150 billion objects and relationships across Jira, Confluence, Goals, and 75+ third-party tools, and exposes that graph to any AI agent via a Model Context Protocol (MCP) server [47][48]. Microsoft's Work IQ layer makes Microsoft 365 work data available to agents with the explicit pitch "no need to manage vector stores, data sync jobs, or custom compliance enforcement" [49]. Glean built its enterprise search on a knowledge graph linking content, people, and activity rather than a flat document index [50]. And MCP itself, the open standard Anthropic launched in 2024 and since adopted by OpenAI, Google, and Microsoft, is the connective tissue that lets external agents query all of these systems without centralizing the data [51].

Two honest caveats before anyone over-rotates:

The context graph inherits the culture problem. If tickets have empty descriptions and goals are annual theater, the graph is a network of voids. Garbage-in still applies — just to a different substrate. Technology is still the smallest lever.
Agent permissions are mostly unsolved. A graph spanning HR, CRM, and legal needs per-user, per-agent access boundaries. Most organizations haven't designed these, and "the agent saw a document the user couldn't" is a worse incident than any stale answer.
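One common way to frame the permissions caveat is that an agent acting for a user should see only what both the user and the agent are cleared for. The Python sketch below shows that intersection rule in its simplest form. It assumes access is granted per domain (such as HR, CRM, or legal); the `Node` type and `visible_nodes` function are hypothetical, and real systems usually need finer-grained, per-object checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    domain: str  # e.g. "hr", "crm", "legal" (assumed coarse-grained labels)


def visible_nodes(
    nodes: list[Node], user_domains: set[str], agent_domains: set[str]
) -> list[Node]:
    """Keep only nodes in domains that BOTH the user and the agent may read."""
    allowed = user_domains & agent_domains
    return [n for n in nodes if n.domain in allowed]


nodes = [
    Node("offer-letter", "hr"),
    Node("acme-deal", "crm"),
    Node("nda", "legal"),
]

# The user may read CRM and legal; this agent is scoped to CRM and HR.
shown = visible_nodes(nodes, {"crm", "legal"}, {"crm", "hr"})
```

Here only the CRM record survives the filter: the agent's HR scope does not widen what the user can see, and the user's legal access does not widen what the agent can touch.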

But the direction is right, and it's the same direction this whole essay argues: stop centralizing documents and start connecting context. The knowledge base asked, "Where do we put what we know?" The context graph asks the better question: "How does the AI see what we're actually doing?"

Before approving your next "AI knowledge base project," stop and ask yourself:

Is the problem you actually want to solve being held hostage by the two words "knowledge base"?

References

Sources that are not primary research or first-party announcements are annotated inline: (vendor case study), (vendor commentary), (practitioner commentary), (industry commentary), or (secondary summary of primary research).

McKinsey Global Institute, The Social Economy: Unlocking Value and Productivity Through Social Technologies (2012) — https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-social-economy
Starmind, Future of Work Report: The High Cost of Inaccessible Knowledge (2022) — https://www.starmind.ai/hubfs/Assets%202022/Future%20of%20Work%20Report%20-%20The%20High%20Cost%20of%20Inaccessible%20Knowledge/Future%20of%20work_Research%20report.pdf
Slite, Enterprise Search Survey Report (2026) — https://slite.com/learn/enterprise-search-survey-findings
Gartner portal adoption findings (2012 Portal, Content and Collaboration Summit), via Prescient Digital (secondary summary of primary research) — https://prescientdigital.com/articles/intranet-articles/five-common-portal-problems-and-their-solutions
Gartner press release, Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025 (July 2024) — https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025
Gartner, Why Half of GenAI Projects Fail: Avoid These 5 Common Mistakes (January 2026) — https://www.gartner.com/en/articles/genai-project-failure
MIT NANDA, The GenAI Divide: State of AI in Business 2025 (August 2025), via Tech.co (secondary summary of primary research) — https://tech.co/news/mit-enterprise-ai-pilots-fail-revenues
Gartner press release, Lack of AI-Ready Data Puts AI Projects at Risk (February 2025) — https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
NTT DATA, Global GenAI Report 2024 — https://www.nttdata.com/global/en/insights/focus/2024/between-70-85p-of-genai-deployment-efforts-are-failing
Truescreen, Why GenAI Projects Fail: The Data Authenticity Problem (citing Gartner) (secondary summary of primary research) — https://truescreen.io/articles/why-genai-projects-fail-data-authenticity/
Microsoft Customer Stories, Cummins: Data Governance Before Copilot — https://www.microsoft.com/en/customers/story/18830-cummins-microsoft-365-e5
Iron Mountain, Kyndryl: Streamlining Data Complexity to Achieve Audit and AI Readiness — https://www.ironmountain.com/en-ca/resources/case-studies/s/streamlining-data-complexity-to-achieve-audit-and-ai-readiness
Aparavi, Microsoft Copilot Case Study: Professional Services (2025) (vendor case study) — https://aparavi.com/wp-content/uploads/2025/11/Microsoft-Copilot-case-study-Professional-Services-9.pdf
KMHelpDesk, Tacit vs. Explicit Knowledge (citing Gartner's estimate that only ~20% of enterprise knowledge is captured in formal systems) (secondary summary of primary research) — https://www.kmhelpdesk.com/tacit-vs-explicit-knowledge.php
IDC, The Knowledge Your AI May Never Have — https://www.idc.com/resource-center/blog/the-knowledge-your-ai-may-never-have/
Stanford RegLab & HAI, Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, Journal of Empirical Legal Studies (2025) — https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf
Stanford HAI, AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries — https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-or-more-benchmarking-queries
Atlan, LLM Knowledge Base Data Quality (vendor commentary) — https://atlan.com/know/llm-knowledge-base-data-quality/
Thrivence, Enhancing Data Governance for a Global Healthcare Organization — https://www.thrivence.com/insights/enhancing-data-governance-and-streamlining-information-management-for-a-global-healthcare-organization/
Ars Technica, Air Canada Must Honor Refund Policy Invented by Airline's Chatbot (February 2024) — https://arstechnica.com/tech-policy/2024/02/air-canada-must-honor-refund-policy-invented-by-airlines-chatbot/
The Markup, NYC's AI Chatbot Tells Businesses to Break the Law (March 2024) — https://themarkup.org/artificial-intelligence/2024/03/29/nycs-ai-chatbot-tells-businesses-to-break-the-law
BBC News, DPD AI Chatbot Swears, Calls Itself 'Useless' and Criticises Firm (January 2024) — https://www.bbc.com/news/technology-68025677
CNBC, McDonald's to End IBM AI Drive-Thru Test (June 2024) — https://www.cnbc.com/2024/06/17/mcdonalds-to-end-ibm-ai-drive-thru-test.html
OpenAI Customer Stories, Morgan Stanley — https://openai.com/customer-stories/morgan-stanley
Klarna press release, Klarna AI Assistant Handles Two-Thirds of Customer Service Chats in Its First Month (February 2024) — https://www.klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/
The Independent, JPMorgan Software Does in Seconds What Took Lawyers 360,000 Hours (2017) — https://www.the-independent.com/news/business/news/jp-morgan-software-lawyers-coin-contract-intelligence-parsing-financial-deals-seconds-legal-working-hours-360000-a7603256.html
Harvey, Customer Story: A&O Shearman — https://www.harvey.ai/customers/a-and-o-shearman
Nielsen Norman Group, The Articulation Barrier: Prompt-Driven AI UX Hurts Usability — https://www.nngroup.com/articles/ai-articulation-barrier/
Nielsen Norman Group, Prompt Controls in GenAI Chatbots — https://www.nngroup.com/articles/prompt-controls-genai/
Microsoft Tech Community, New Copilot Prompt Gallery Helps You Discover, Save, and Share Your Favorite Prompts (November 2024) — https://techcommunity.microsoft.com/blog/microsoft365copilotblog/new-copilot-prompt-gallery-helps-you-discover-save-and-share-your-favorite-promp/4279600
Slack, Workforce Index (June 2024) — https://slack.com/blog/news/the-workforce-index-june-2024
Presenc AI, The LLM Context Window Race 2023–2026 (industry commentary) — https://presenc.ai/research/llm-context-window-race-2023-2026
Li et al., Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach (arXiv:2407.16833, 2024) — https://arxiv.org/html/2407.16833v2
GitHub Blog, Research: Quantifying GitHub Copilot's Impact in the Enterprise with Accenture — https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-in-the-enterprise-with-accenture/
Glean, Glean Achieves $100M ARR in Three Years (Business Wire, February 2025) — https://www.businesswire.com/news/home/20250205543527/en/Glean-Achieves-100M-ARR-in-Three-Years-Delivering-True-AI-ROI-to-the-Enterprise
Gong press release, Revenue Organizations Using AI in 2024 Reported 29% Higher Sales Growth Than Their Peers — https://www.gong.io/press/revenue-organizations-using-ai-in-2024-reported-29-percent-higher-sales-growth-than-their-peers-according-to-new-report-from-gong
Tianpan, Enterprise RAG Knowledge Base Governance (April 2026) (practitioner commentary) — https://tianpan.co/blog/2026-04-17-enterprise-rag-knowledge-base-governance
Atlan, LLM Knowledge Base Staleness (vendor commentary) — https://atlan.com/know/llm-knowledge-base-staleness/
Prashant Dudami, RAG: The Data Architecture Problem Nobody Talks About (practitioner commentary) — https://www.prashantdudami.com/blog/rag-data-architecture
RisingWave, Building Real-Time Data Pipelines for RAG Applications — https://risingwave.com/blog/real-time-data-pipeline-rag-applications/
Pinecone, How Pinecone Works — https://www.pinecone.io/how-pinecone-works/
Glean Docs, Connector Crawling Frequency — https://docs.glean.com/connectors/crawling-frequency
The Pragmatic Engineer, Building Claude Code with Boris Cherny (interview with Claude Code's creator on dropping vector-store RAG for agentic search) — https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny
Boris Cherny and Anthropic engineers, Hacker News discussion on Claude Code's retrieval approach ("agentic search outperformed [RAG] by a lot") — https://news.ycombinator.com/item?id=43164253
atomicstrata, llm-wiki-compiler (open-source LLM wiki compilation with provenance) — https://github.com/atomicstrata/llm-wiki-compiler
Longterm Wiki, Reducing AI Hallucinations in Wiki Content (practitioner commentary) — https://longterm-wiki.vercel.app/approaches/reducing-hallucinations
Atlassian, Teamwork Graph at Team '26 — https://www.atlassian.com/blog/company-news/teamwork-graph-team-26
Atlassian Community, Use Teamwork Graph in Rovo MCP Server (Open Beta) — https://community.atlassian.com/forums/Atlassian-AI-Rovo-articles/Use-Teamwork-Graph-in-Rovo-MCP-Server-Open-Beta/ba-p/3227595
Microsoft Learn, Work IQ API Overview — https://learn.microsoft.com/en-us/microsoft-365/copilot/extensibility/work-iq/api-overview
Glean Docs, Knowledge Graph — https://docs.glean.com/security/knowledge-graph
Anthropic, Introducing the Model Context Protocol (November 2024) — https://www.anthropic.com/news/model-context-protocol
Anthropic, 1M Context Is Now Generally Available for Opus 4.6 and Sonnet 4.6 (March 2026) — https://www.claude.com/blog/1m-context-ga
