DEV Community

zu

Stop Optimizing RAG - The LLM Wiki Knowledge Base for Agents

Introduction: A Common Knowledge Base Trap in Agent Development

By 2026, the conversation around RAG knowledge bases has changed. Many developers have grown skeptical of using RAG to build knowledge bases for Agents, as Agent engineering moves toward more precise context management.

When the idea of RAG-powered Agent knowledge bases first gained traction, many teams and leaders became obsessed with dumping every long, outdated legacy document in the company into a knowledge base. They expected RAG retrieval to breathe new life into those documents—or assumed that feeding all existing internal documentation into RAG would allow an Agent to quickly understand internal problems and reduce the burden on engineers.

Traditional RAG is not well suited to serving directly as an Agent's core external knowledge layer. This should be obvious.

Even when an Agent needs a knowledge base, it needs one that is highly semantic—not merely a retrieval-oriented system like RAG.

Many of the technical applications of Agents are still being explored, so no conclusion can cover every business scenario. But a few simple observations are already clear:

  1. For both companies and individuals, only a small portion of their documentation contains genuinely valuable information. Low-value material does not become valuable simply because it is handed to an LLM.
  2. The combination of a powerful knowledge base and an all-purpose Agent is appealing, but helping an LLM understand new knowledge through context requires considerable care. RAG retrieval is too crude for the precision that actual task execution requires.
  3. Leading LLMs already possess sufficiently broad and deep general knowledge. If an Agent product only works after constantly “connecting it to new knowledge,” its use case and product boundaries probably have not been thought through.
  4. Scenarios in which an Agent depends on large amounts of external knowledge will inevitably remain niche.
  5. Treating large-scale external knowledge and better RAG retrieval as the main path to improving Agent capabilities delivers poor efficiency, user experience, and cost-effectiveness. Unless the use case strongly depends on it, investing heavily in this direction is generally not worthwhile.

The Evolution of Knowledge Base Technologies for Agents

RAG knowledge base technology has evolved from fixed-size chunking to semantic chunking and graph-based knowledge organization. More recently, Karpathy's LLM Wiki has introduced a path distinct from traditional RAG: let AI read and organize the material in advance, then continuously maintain a structured, interconnected Markdown knowledge base.

This article examines the evolution of RAG knowledge bases in Agent scenarios, as well as the LLM Wiki concept that is gradually entering the field of Agent engineering.

The evolution of knowledge base technologies for Agents

Outline

  • Traditional RAG: Fixed Chunking and One-Dimensional Retrieval
  • Advanced Semantic RAG: Late Chunking and Parent-Child Decoupling
  • Graph RAG: Graph-Enhanced Knowledge Organization and Reasoning
  • Key Point: The Limit of RAG—Finding Material Rather Than Understanding Knowledge
  • LLM Wiki: From Stateless Retrieval to Stateful, Continuous LLM Compilation
  • Karpathy's Three-Layer LLM Wiki Design
  • Building a Personal Knowledge Base with LLM Wiki
  • Building an Enterprise Agent Knowledge Base with LLM Wiki
  • Where LLM Wiki Works Best
  • Limitations of LLM Wiki
  • A Comparison of Knowledge Base Technologies for Agents
  • Production Architecture: A Layered Hybrid Approach
  • Final Thought 1: Why Is RAG Alone Unsuitable for Agents?
  • Final Thought 2: How Should You Choose the Right Technology?

Traditional RAG: Fixed Chunking and One-Dimensional Retrieval

Traditional RAG is essentially stateless “coarse filtering and hard concatenation.” The system mechanically divides long documents into physical chunks of a fixed character or token length, maps them into a high-dimensional geometric space using an Embedding model, and then performs nearest-neighbor retrieval based on vector similarity.

Core Mechanisms

  • Sliding Window: Adjacent text chunks overlap—typically by 10% to 20%—to mitigate semantic breaks at chunk boundaries.

  • Similarity Calculation: Geometric metrics such as cosine similarity are used to match the user's query vector against chunk vectors in the knowledge base.
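The two mechanisms above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real Embedding model, and the function names and default sizes are assumptions made for the example.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character chunks; neighbors share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an Embedding model (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Note that nothing in this loop knows what a chunk means or which document it came from. That is exactly where the pain points listed next originate.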

Limitations and Pain Points for Agents

  • Fragmented information and unclear pronoun references: Mechanical chunking can easily separate complete causal relationships from their contextual entities. As a result, an Agent may retrieve noisy passages filled with words such as “it” or “the company,” without a clear referent.

  • Semantic isolation across documents: Each document is encoded independently by the Embedding model. Complex references, comparisons, and evolutionary relationships between Document A and Document B cannot be represented by isolated geometric points in vector space.

  • Hallucinations caused by mechanical assembly: During retrieval, the Top-K isolated chunks are often joined with line breaks and little else, destroying the logical structure of the original language. When an Agent encounters conflicting fragments or information from different points in time, it can easily lose logical coherence.

The failure chain of traditional RAG

Advanced Semantic RAG: Late Chunking and Parent-Child Decoupling

To preserve as much semantic completeness as possible within an individual document—without changing the underlying vector retrieval architecture—RAG systems began introducing preprocessing enhancements and structural decoupling.

Common approaches include:

  • Parent-Child Documents: Use small chunks for retrieval, then restore the larger parent block after a match to avoid overly fragmented context.
  • Semantic Chunking: Split content at semantic boundaries instead of mechanically using a fixed length.
  • Late Chunking: Run the whole long document through the Embedding model first, then split the resulting token-level representations into chunks, so that each chunk vector retains more global context.
  • Contextual Retrieval: Before indexing, use a lightweight LLM to add context to each chunk, such as its section, referenced entities, and position in the document.
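Of these approaches, the parent-child pattern is the easiest to show in code. The sketch below is only an illustration: it splits on sentences and scores by word overlap, both of which are hypothetical simplifications standing in for a real chunker and a real vector search.

```python
from dataclasses import dataclass


@dataclass
class Child:
    parent_id: str
    text: str


def build_children(parents: dict[str, str]) -> list[Child]:
    """Split each parent block into sentence-sized children that remember their parent."""
    children = []
    for pid, block in parents.items():
        for sentence in block.split(". "):
            if sentence.strip():
                children.append(Child(pid, sentence.strip()))
    return children


def score(query: str, text: str) -> int:
    """Hypothetical relevance score: the number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parent(query: str, parents: dict[str, str], children: list[Child]) -> str:
    """Match on the small child, then return the full parent block as context."""
    best = max(children, key=lambda c: score(query, c.text))
    return parents[best.parent_id]
```

The small unit is used only for matching. What reaches the Agent is the larger block, so a sentence such as "It must be unused" arrives together with the sentence that says what "it" is.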

Limitations and Pain Points for Agents

These approaches clearly improve retrieval quality within a single document, but the overall architecture still revolves around chunks. They mitigate the semantic loss caused by chunking, but they do not truly solve cross-document knowledge relationships or long-term knowledge accumulation.

Graph RAG: Graph-Enhanced Knowledge Organization and Reasoning

To overcome the physical isolation of intersecting knowledge across documents, Graph RAG moves knowledge organization from unstructured text retrieval toward structured knowledge reasoning.

Graph RAG expands the representation of knowledge from isolated points in vector space into a network topology. During ingestion, it applies LLM capabilities upfront, restructuring the entire document collection into a logical network of entities (Nodes) and relationships (Edges).

Core Mechanisms

  • Entity and relationship extraction: Use an LLM to read documents in batches and extract entities and their relationships.
  • Community detection and group summaries: Cluster the graph, then generate high-level summaries for each knowledge community.
  • Hybrid retrieval: Use community summaries for global questions, while expanding context along entity relationships and vector results for local questions.
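A rough sketch of the local-question path looks like this. The triples are hand-written stand-ins for what LLM extraction would produce (hypothetical data), and community detection and summaries are left out entirely; the point is only to show how context expands along relationships.

```python
from collections import defaultdict, deque

# Hand-written triples standing in for LLM extraction output (hypothetical data).
TRIPLES = [
    ("Billing Service", "depends_on", "Payments API"),
    ("Payments API", "owned_by", "Platform Team"),
    ("Billing Service", "documented_in", "billing-runbook.md"),
]


def build_graph(triples):
    """Store each relationship in both directions so expansion can walk either way."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"inverse:{relation}", head))
    return graph


def expand(graph, entity: str, hops: int = 1) -> set[str]:
    """Breadth-first expansion: collect entities within `hops` edges of the start."""
    seen = {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not walk past the hop limit
        for _, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen
```

If one of those triples had been hallucinated during extraction, every later expansion through that node would carry the error along, which is the compounding risk described next.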

Limitations and Pain Points for Agents

  • Write amplification and high indexing costs: During ingestion, large numbers of LLM calls are required for extraction and summarization, resulting in high token consumption and write latency.

  • Error compounding: If the LLM hallucinates a relationship during offline extraction, that error becomes embedded in the graph and continues to contaminate subsequent retrieval.

Graph RAG is effective at representing explicit entity relationships and cross-document paths, but the graph itself remains an intermediate representation produced by a one-time extraction process. It can help an Agent follow relationships to locate knowledge, but it does not naturally organize every reading, Q&A, and synthesis result into knowledge pages that can be directly read, edited, and reviewed.

Graph RAG ingestion flow and risks

Key Point: The Limit of RAG—Finding Material Rather Than Understanding Knowledge

Looking at these approaches together reveals a clear pattern. From fixed chunking to semantic chunking, parent-child documents, Late Chunking, Contextual Retrieval, and finally Graph RAG, the engineering becomes increasingly complex while optimizing for essentially the same goal:

How can we find potentially relevant material in an external knowledge base and feed it to an LLM?

The primary problem being solved is knowledge selection, not knowledge formation.

RAG remains highly useful for simple factual Q&A, long-tail material, and infrequently accessed documents. But promoting it to the core external knowledge layer of an Agent creates a mismatch. When an Agent accesses a knowledge base, it needs semantic context that can be used directly in the current task—not several passages that merely look similar.

The context an Agent needs should resemble material provided on demand by someone who understands the business: someone who knows the scenario behind the question, which sources are genuinely relevant, and which outdated conclusions, implicit assumptions, or conflicting information could affect the current decision. Ordinary RAG rarely provides this kind of context. It usually returns a set of fragments selected by an algorithm.

Context should consist of high-quality, high-precision semantic prompts—not a pile of undifferentiated text.

The limit of RAG

The shared limitations of the preceding approaches can be summarized in three layers:

  • Traditional RAG treats the knowledge base as a search engine. It can locate material, but it cannot turn that material into knowledge.
  • Semantic RAG makes retrieval smarter, but it still searches for material on demand at query time. Complexity continues to accumulate around chunking, retrieval, ranking, context length, and query rewriting.
  • Graph RAG reaches the relationship layer and works well for cross-document relationships and global summaries. But graph structures, vectors, metadata, and community summaries are still fundamentally designed to answer the question: “How can we select better material?”

This is the ceiling of the RAG approach: it keeps trying to use retrieval algorithms to compensate for knowledge that has never truly been understood or organized. No matter how complex the relationship graph becomes, it cannot fully carry integrated semantics.

What Agents Actually Need Is Organized Knowledge

The valuable knowledge in an Agent's context often comes from conclusions supported by several documents, disagreements recorded in meeting notes, constraints hidden in legacy system documentation, implicit lessons from incident reports, and the tradeoffs among all of them.

Before this material is suitable for an Agent's context, it must first be read, interpreted, compressed, connected, and revised.

Multiple knowledge sources amplify the problem. New material appears; old material expires; sources contradict one another. Old conclusions must be updated, new information merged, conflicts marked, and relationships discovered in a Q&A session may also deserve to be preserved. These are no longer retrieval problems. They are knowledge management problems.

Knowledge itself must be organized, interpreted, compressed, connected, and continuously revised in advance.

This is why I believe LLM Wiki deserves separate attention. It shifts the focus away from “how to retrieve chunks” and changes the basic unit of the knowledge base from fragments waiting to be recalled into Wiki pages that can be read, linked, edited, and evolved. At present, this appears to be an effective direction for future Agent knowledge bases.

On-demand retrieval versus continuous knowledge accumulation

LLM Wiki: From Stateless Retrieval to Stateful, Continuous LLM Compilation

The core idea of LLM Wiki can be summarized as follows: compile knowledge in advance to reduce ad hoc chunk retrieval for every question.

It treats multiple source documents as “source code” (Raw Sources), then uses an LLM to continuously compile them into a highly structured network of Markdown knowledge pages connected by bidirectional links (Backlinks). The system moves from stateless retrieval to stateful knowledge maintenance, and the knowledge base continues to evolve as new material arrives.

When you add an article, a book excerpt, a PDF, or a meeting record, the LLM proactively reads it, extracts key concepts, creates or updates pages, adds links, marks conflicts, and records the changes.

The knowledge base therefore grows richer as more material is added. Relationships and synthesis have already been completed in advance. When a question is asked later, the AI encounters an organized knowledge system instead of having to assemble fragments from raw sources on the fly.
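To make the write-time loop concrete, here is a minimal sketch under heavy assumptions. The wiki is an in-memory dict, `fake_extract` is a hypothetical stand-in for the LLM call, and the `(source: ...)` and `[[Page]]` markers are illustrative conventions. Conflict marking is omitted.

```python
from typing import Callable


def ingest(
    source_name: str,
    source_text: str,
    wiki: dict[str, str],
    log: list[str],
    extract: Callable[[str], dict[str, str]],
) -> None:
    """Compile one source into the wiki: create or update pages, cite, cross-link, and log."""
    notes = extract(source_text)  # concept title -> summary; an LLM call in a real system
    for title, summary in notes.items():
        entry = f"- {summary} (source: {source_name})"
        if title in wiki:
            wiki[title] += "\n" + entry
            log.append(f"updated {title} from {source_name}")
        else:
            wiki[title] = f"# {title}\n{entry}"
            log.append(f"created {title} from {source_name}")
    # Cross-link concepts that appeared in the same source.
    for title in notes:
        for other in notes:
            link = f"[[{other}]]"
            if other != title and link not in wiki[title]:
                wiki[title] += f"\nSee also: {link}"


def fake_extract(text: str) -> dict[str, str]:
    """Hypothetical stand-in for the LLM: treats 'Title: summary' lines as concepts."""
    pairs = (line.split(": ", 1) for line in text.splitlines() if ": " in line)
    return {title.strip(): summary.strip() for title, summary in pairs}
```

The state lives in `wiki` and `log`, not in the query path. A second source that mentions an existing concept updates the same page instead of adding another isolated fragment.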

The continuous maintenance loop of LLM Wiki

There is only one fundamental principle: Agents need knowledge that has already been understood and organized. This work has traditionally centered on the human mind rather than pure engineering. If it is to be automated, the LLM must be at the center. The right direction is to consider how an LLM can take over the cognitive work, rather than designing increasingly complex engineering systems as substitutes.

Karpathy's Three-Layer LLM Wiki Design

Karpathy's LLM Wiki design can be divided into three simple layers: raw, wiki, and schema.

1. Raw: The Source Material Layer

This layer contains original files such as PDFs, web articles, meeting notes, text files, and Markdown documents. They are the source of truth. The AI may read them, but should not modify them. This preserves the original material so that any questionable Wiki content can later be checked against its source.

2. Wiki: The AI-Generated and AI-Maintained Knowledge Base

This layer is a collection of Markdown files, including home pages, index pages, concept pages, entity pages, topic summaries, and comparison pages. Links connect these pages into a knowledge network that humans can understand and modify.

3. Schema: The Knowledge Base Rules

One example is CLAUDE.md in Claude Code. It tells the LLM:

  • What the knowledge base is for.
  • How its directory structure is organized.
  • How new material should be processed.
  • How pages should be formatted.
  • Where to look first when answering questions.
  • How to cite sources.
  • How to handle uncertain information.

Together, these layers form the core LLM Wiki architecture: Sources → Wiki → Schema. When a new document arrives, the LLM acts as a knowledge bookkeeper, reads the source in full, and directly rewrites, merges, or corrects existing Wiki pages.

At retrieval time, the system can move away from relying entirely on geometric vector matching and instead use model-driven navigation (Index-Driven Navigation). A lightweight index distilled from the Wiki pages is first provided to a long-context Agent. The Agent then uses linguistic reasoning to determine its own navigation path and read the exact Wiki pages or text blocks it needs.
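A minimal sketch of index-driven navigation might look like the following. The index format and the `choose` callback are assumptions for illustration; in a real setup the Agent itself would read the index and decide which pages to open, and `keyword_choose` only imitates that decision with word overlap.

```python
from typing import Callable


def build_index(wiki: dict[str, str]) -> str:
    """One line per page: the title plus the first non-heading line as a summary."""
    lines = []
    for title, body in sorted(wiki.items()):
        summary = next(
            (ln.strip() for ln in body.splitlines() if ln.strip() and not ln.startswith("#")),
            "",
        )
        lines.append(f"- {title}: {summary}")
    return "\n".join(lines)


def navigate(question: str, wiki: dict[str, str], choose: Callable[[str, str], list[str]]) -> list[str]:
    """Show the index to the chooser, then load only the pages it names."""
    index = build_index(wiki)
    return [wiki[title] for title in choose(question, index) if title in wiki]


def keyword_choose(question: str, index: str) -> list[str]:
    """Hypothetical stand-in for the Agent's reasoning: pick index lines sharing a word with the question."""
    words = set(question.lower().split())
    picked = []
    for line in index.splitlines():
        title = line[2:].split(":", 1)[0]  # assumes page titles contain no colon
        if words & set(line.lower().replace(":", " ").split()):
            picked.append(title)
    return picked
```

The important property is that the Agent reads whole pages it selected on purpose, rather than receiving whichever fragments happened to score highest.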

The three-layer LLM Wiki architecture and data flow

Building a Personal Knowledge Base with LLM Wiki

For personal use, building an LLM Wiki is not complicated. In fact, it can be remarkably simple. A common approach today is to use Obsidian—a Markdown renderer and knowledge management application—to manage local Markdown files, while local Agents such as Claude Code, Codex, or Cursor read and write files, organize source material, generate Wiki pages, and maintain links.

A minimal structure is usually enough:

wiki-project/
  raw/       # Original source material, read-only
  wiki/      # Knowledge pages organized by AI
  AGENTS.md  # Or CLAUDE.md, defining the maintenance rules

When adding new material, place the original document in raw, then ask the Agent to read and summarize it, split it into pages, create links, and update existing pages. Obsidian is only the browsing and management layer; underneath, the system remains a collection of Markdown files.
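If you would rather script the setup than create the folders by hand, a small helper is enough. This is only a sketch: the contents of `AGENTS_TEMPLATE` are illustrative rules I made up for the example, not a recommended schema.

```python
from pathlib import Path

# Illustrative starter rules (assumption); replace with rules that fit your own material.
AGENTS_TEMPLATE = """# Wiki maintenance rules (illustrative)
- Treat raw/ as read-only source material.
- Write and update knowledge pages only under wiki/.
- Cite the raw file that supports each claim.
"""


def scaffold(root: Path) -> None:
    """Create the minimal layout; existing files are left untouched."""
    (root / "raw").mkdir(parents=True, exist_ok=True)
    (root / "wiki").mkdir(parents=True, exist_ok=True)
    rules = root / "AGENTS.md"
    if not rules.exists():
        rules.write_text(AGENTS_TEMPLATE, encoding="utf-8")
```

Calling `scaffold(Path("wiki-project"))` produces the three entries shown in the tree. Running it again is harmless because it never overwrites an existing `AGENTS.md`.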

There are many tutorials online, and you can also download an LLM Wiki skill to handle the entire process. It is unusually straightforward.

However, unless you work with a large body of knowledge, a so-called personal knowledge base offers little practical value to most people. That does not mean it has no value at all—you can at least get an attractive knowledge graph and some emotional satisfaction from it.

Knowledge graph

Building an Enterprise Agent Knowledge Base with LLM Wiki

Enterprise Agents provide a stronger case for the practical value of LLM Wiki. But if you are looking for a general-purpose framework or a standard set of best practices, you may be disappointed. LLM Wiki is closer to a design philosophy than a standard framework that must be adopted.

So far, it has not converged on a relatively uniform engineering implementation in the way GraphRAG has. The reason is simple: its core principle is not complicated—knowledge should be understood, organized, and maintained during the write phase—and the implementation is highly specific to the business domain.

This actually creates room for Agent developers. Many Agents do not need a generic, complex, and highly abstract LLM Wiki system. A more practical approach is to design a sufficiently small knowledge layer from scratch around the actual knowledge structure of the project:

  • If the Agent depends on SOPs, design the knowledge base around task workflows, rule boundaries, exception handling, and case pages.
  • If the Agent depends on product knowledge, organize it around product modules, functional limitations, API contracts, user scenarios, and change records.
  • If the Agent depends on research material, organize it around topic pages, paper pages, concept pages, evidence pages, and controversy pages.
  • If the Agent depends on project context, organize it around architecture documentation, key decisions, directory boundaries, known issues, and current status.

In other words, do not begin by researching which LLM Wiki frameworks you can integrate. First ask:

What exactly does this Agent need to remember over the long term? How should that knowledge be organized into pages? When should it be updated, and who should validate it?

Once those questions are answered, the implementation can remain lightweight. You can use Codex and Vibe Coding to create a directory structure, page templates, update rules, and validation scripts based on the project's actual needs. For example, retain only four core components—sources/, wiki/, index.md, and AGENTS.md—then define how new material is archived, how pages are updated, how sources are attributed, how conflicts are handled, and how outdated content is checked periodically.

You can then design a retrieval workflow around the actual LLM Wiki structure. A common approach is to build a dedicated retrieval Agent that is tightly coupled to the current Wiki and invoke it as a sub-Agent within the main workflow.

Not pursuing a large, all-encompassing system from the outset is one of the most basic principles of Agent development today.

An entry point for enterprise Agent Wiki design

Where LLM Wiki Works Best

LLM Wiki is not a universal knowledge base solution. It is best suited to scenarios where the knowledge is high-value, stable over time, repeatedly reused, and continuously revised.

Typical examples include long-term research topics, course materials, core SOPs, critical policies, customer interviews, product decisions, project retrospectives, and expert experience. For Agents, LLM Wiki works best as the core knowledge layer that holds task workflows, domain rules, product boundaries, key decisions, and high-value expert knowledge.

Large volumes of rarely accessed material, historical logs, chat records, and archived documents remain better suited to conventional RAG or search systems as a fallback. They should not all be forced into an LLM Wiki.

Limitations of LLM Wiki

LLM Wiki still has clear boundaries.

1. The Scale Wall

It is better suited to a small collection of high-value material. Karpathy's example was roughly on the order of 100 articles. Managing tens of thousands of pages requires more than Markdown files, local folders, and a long-context Agent; it demands more sophisticated infrastructure.

If your product requires feeding a massive document collection into an LLM Wiki, I believe the product design itself is almost certainly flawed.

2. Low Compilation Throughput

Every additional source may trigger a complex chain of LLM-driven rewrites, creating significant write amplification. This makes the approach unsuitable for high-frequency, real-time ingestion. Core knowledge is better consolidated on a schedule: spend a large number of tokens at once in exchange for higher Agent retrieval precision. If you require frequent writes, this is almost certainly the wrong scenario for LLM Wiki.

3. Input Quality Determines Output Quality

If the source material is low-quality, outdated, or disorganized, the Wiki will inherit those problems. Curate a small body of high-value knowledge for the Agent rather than handing it raw human documentation. In practice, the volume is limited, so you can simply ask Codex to filter and organize it first.

4. AI Makes Mistakes

AI may misclassify information, connect concepts incorrectly, or omit important details. Human review therefore remains necessary, especially during the early stages of building the knowledge base. Wiki linting, consistency validation, citation checks, and read-only source material all help reduce these risks.

Because the compiled Wiki remains human-readable, people can participate as needed and make straightforward corrections.
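A basic Wiki lint pass can be very small. The sketch below assumes two conventions that are not mandated by anything above: pages link to each other with `[[Page Title]]`, and every page cites at least one source with a `(source: ...)` marker.

```python
import re

# Captures the target of [[Page]], [[Page|alias]], or [[Page#heading]] links.
LINK = re.compile(r"\[\[([^\]|#]+)")


def lint(wiki: dict[str, str]) -> list[str]:
    """Return human-readable problems: broken [[links]] and pages without a source citation."""
    problems = []
    for title, body in sorted(wiki.items()):
        for target in LINK.findall(body):
            if target.strip() not in wiki:
                problems.append(f"{title}: broken link to [[{target.strip()}]]")
        if "(source:" not in body:
            problems.append(f"{title}: no source citation")
    return problems
```

A check like this does not tell you whether a page is correct. It only narrows down where a human reviewer should look first.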

A Comparison of Knowledge Base Technologies for Agents

| Dimension | Traditional Vector RAG | Advanced Semantic RAG (Late/Parent-Child) | Graph RAG | Compiled LLM Wiki |
| --- | --- | --- | --- | --- |
| Underlying mathematical or logical foundation | Geometric distance in high-dimensional vector space | Deep Attention integration and structural decoupling | Graph topology and community clustering algorithms | Linguistic navigation and structured Markdown |
| Support for cross-document semantics | Very poor; documents remain isolated | Poor; still constrained by individual document boundaries | Excellent; explicit logical connections form a knowledge network | Excellent; offline LLM deduplication, merging, and bidirectional linking |
| Elimination of pronoun and boundary noise | Weak; semantic breaks are common | Strong; achieved through context injection and parent-block restoration | Moderate; depends on triple extraction accuracy | Excellent; the LLM resolves and rewrites them during compilation |
| Knowledge base construction cost (write) | Very low; only Embedding encoding is required | Relatively low; lightweight LLM-assisted labeling or long-text encoding | Very high; requires extensive offline refinement with LLM tokens | High; incremental writes cause significant write amplification |
| Retrieval determinism (read) | Weak; depends on Top-K match quality | Moderate; detailed recall improves substantially | Strong; causal paths can be traced | Strong; logical routing follows indexes and page structure |
| Best-suited Agent scenarios | Simple Q&A and inactive long-tail data | Technically or legally structured manuals with clear, independent rules | Relationship audits, multi-entity comparisons, and complex reasoning | Core SOP preservation and high-value domain expert knowledge bases |

Production Architecture: A Layered Hybrid Approach

A layered Hybrid architecture separates knowledge according to data value, scale, and query patterns. It does not expect one approach to simultaneously handle massive scale, write efficiency, fine-grained recall, and cross-document relationships.

  1. Core knowledge layer: LLM Wiki

Compile a manageable amount of high-value material—core business SOPs, critical policies, company-wide entity definitions, and expert knowledge—into a stateful Wiki. This becomes the Agent's core memory for decision-making.

  2. Relationship graph layer: Graph RAG

Organize documents involving multiple projects, financial reports, or strong causal relationships into a knowledge graph that supports complex, networked, cross-document reasoning.

  3. Large-scale storage layer: Advanced Vector RAG

Use techniques such as Late Chunking, parent-child documents, and Contextual Retrieval for hundreds of millions of long-tail historical logs, past conversations, and infrequently accessed sources. The Agent can initiate vector retrieval when it needs specific details.

These technologies solve problems at different levels: Vector RAG retrieves long-tail material, Graph RAG handles relationship paths, and LLM Wiki continuously compiles and reuses high-value knowledge.
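One simple way to wire the layers together is a priority chain: consult the Wiki first and fall through to the next layer only when it has nothing to offer. The strict ordering here is my assumption for the sketch, and each lookup function is a hypothetical placeholder for a real retrieval call.

```python
from typing import Callable, Optional

# A lookup takes a question and returns context text, or None when it has nothing.
Lookup = Callable[[str], Optional[str]]


def answer_context(question: str, layers: list[tuple[str, Lookup]]) -> tuple[str, str]:
    """Try each layer in priority order and return (layer_name, context) from the first hit."""
    for name, lookup in layers:
        context = lookup(question)
        if context:
            return name, context
    return "none", ""
```

A real router would more likely pick a layer based on the type of question rather than always walking the chain, but even this version makes the division of labor explicit.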

A layered Hybrid knowledge base architecture for Agents

Final Thought 1: Why Is RAG Alone Unsuitable for Agents?

RAG alone is unsuitable as an Agent's core knowledge base because its strengths are fundamentally misaligned with what an Agent requires from knowledge.

LLMs already possess sufficiently broad and deep general knowledge. An Agent usually needs an external knowledge base in only two situations: when it needs deep domain knowledge, or when it needs internal knowledge absent from the model's training data. Both require highly precise knowledge.

This creates a contradiction: RAG is good at vector retrieval across massive knowledge collections, but an Agent actually needs precise, executable semantic context. Knowledge that is genuinely valuable to an Agent should usually be small and focused. If the volume is exceptionally large, it often means the collection contains a great deal of low-value material—or serves an unusually narrow use case.

The limit of RAG is therefore clear. It can find potentially relevant material, but it cannot reliably guarantee that the material is complete, accurate, conflict-free, and aligned with the boundaries of the current task. Similarity retrieval is not understanding. A more sensible division of responsibilities is: use RAG as a fallback for long-tail material, while LLM Wiki or explicit context management carries the core knowledge.

Final Thought 2: How Should You Choose the Right Technology?

Doesn't a Hybrid architecture sound like the ultimate knowledge base solution? It seems to include everything and cover every concern.

Hybrid architectures sound attractive and theoretically comprehensive. But as a business or technical leader, I still believe they are unsuitable for most scenarios—or simply too complex and redundant. Trying to accommodate every technology usually means you lack a clear direction for the design and development of your Agent.

As noted earlier, scenarios in which an Agent depends on large amounts of knowledge are already niche. Scenarios that simultaneously require all three technologies in a Hybrid architecture are even more so. In most Agent architectures, the knowledge base should not carry too much weight. Start simple.

In most cases, I believe a straightforward LLM Wiki is enough. Alternatively, connect a basic RAG retrieval system—it is better than nothing.

Originality Statement

This is an original article. When reposting, please credit the author and include a link to the original article. The illustrations were generated with AI.

AI Transparency Statement for Content Creation

AI transparency statement for content creation
