Bringing the LLM Wiki Idea to a Codebase

#ai #llm #wiki #rag

Andrej Karpathy’s LLM Wiki idea is compelling because it treats knowledge as something that can be gradually ingested, organized, queried, and improved over time. That same idea applies surprisingly well to a codebase.

A repository is not just a pile of files. It is a living body of knowledge: architecture, concepts, flows, conventions, tradeoffs, and accumulated decisions. The challenge is that this knowledge is often scattered across source files, configs, docs, and human memory. A wiki built for an LLM can help turn that into something navigable.

For code projects, the core ideas of ingest, query, and lint still hold:

Ingest builds and updates the project wiki from the source
Query uses the wiki plus source verification to answer questions about the codebase
Lint checks the wiki itself for drift, contradictions, missing coverage, and weak links
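
To make the lint step a little more concrete, here is a minimal sketch of the checks that can be done mechanically. It assumes each wiki page records the source files it was derived from and the other pages it links to; the function name `lint_wiki` and the three report keys are hypothetical, not part of any existing tool. Contradictions between pages are left out because they generally need an LLM or a human to judge.

```python
from typing import Dict, List, Set


def lint_wiki(
    page_sources: Dict[str, Set[str]],
    page_links: Dict[str, Set[str]],
    tracked_files: Set[str],
) -> Dict[str, List[str]]:
    """Report drift, weak links, and missing coverage in a wiki (illustrative only)."""
    report: Dict[str, List[str]] = {
        "missing_sources": [],   # drift: page cites a file that no longer exists
        "broken_links": [],      # weak links: page links to a page that does not exist
        "uncovered_files": [],   # missing coverage: tracked file cited by no page
    }
    known_pages = set(page_sources)
    covered: Set[str] = set()

    for page, sources in sorted(page_sources.items()):
        covered |= sources
        for src in sorted(sources - tracked_files):
            report["missing_sources"].append(f"{page} -> {src}")

    for page, links in sorted(page_links.items()):
        for target in sorted(links - known_pages):
            report["broken_links"].append(f"{page} -> {target}")

    report["uncovered_files"] = sorted(tracked_files - covered)
    return report
```

A report with three empty lists does not prove the wiki is accurate; it only means the structural checks found nothing to flag.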

That already makes the LLM Wiki concept useful for software projects. But codebases have one advantage that many other knowledge collections do not: git.

Why codebases are a particularly good fit

Most project knowledge systems either rescan everything every time or rely on manual upkeep. A codebase, however, already has built-in version control. That gives us a natural way to make wiki maintenance incremental.

The initial ingest can be based on the repository at HEAD. After that, the wiki records the last ingested commit. Future ingests do not need to rescan the whole repo. They only need to look at what changed between that saved commit and the new HEAD.

That means git gives us, almost for free:

changed-file detection
rename and deletion tracking
a natural checkpoint for incremental updates
a practical way to mark stale wiki pages
a much cheaper ingest loop over time

So the wiki stays grounded in the current codebase, while git provides the mechanism for maintaining it efficiently.
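
As a rough illustration of the changed-file and rename tracking listed above, the sketch below parses the output of `git diff --name-status -M`. The `Change` class and both function names are assumptions made for this example, and the parser expects git's default tab-separated output (paths with unusual characters may be quoted by git, which this sketch does not handle).

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    status: str                     # "A", "M", "D", "R", "C", ...
    path: str                       # path at the new commit (or the deleted path)
    old_path: Optional[str] = None  # set only for renames and copies


def parse_name_status(output: str) -> List[Change]:
    """Parse the tab-separated output of `git diff --name-status -M`."""
    changes: List[Change] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        code = fields[0][0]  # "R100" (rename, 100% similar) becomes "R"
        if code in ("R", "C") and len(fields) >= 3:
            changes.append(Change(code, fields[2], old_path=fields[1]))
        else:
            changes.append(Change(code, fields[1]))
    return changes


def changed_files(repo: str, last_commit: str, head: str = "HEAD") -> List[Change]:
    """Ask git what changed between the saved checkpoint and HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", "-M", last_commit, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

Keeping the parser separate from the `subprocess` call makes it easy to test against sample output without needing a real repository.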

The model

The model is simple:

Ingest the project from HEAD
Save the current commit SHA in the wiki index
On the next run, diff from last_commit to HEAD
Update only the affected pages
Advance the checkpoint only when the changed set has been fully processed

This keeps HEAD as the source of truth while using git history as the maintenance engine.
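
The checkpoint rule is the part most worth getting right, so here is a minimal sketch of it. It assumes the wiki index is a plain dictionary with a `last_commit` key and that some callback updates the pages for one changed path and reports success; `incremental_ingest` and `update_pages_for` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Iterable, List


def incremental_ingest(
    index: Dict[str, str],
    head_sha: str,
    changed_paths: Iterable[str],
    update_pages_for: Callable[[str], bool],
) -> List[str]:
    """Process every changed path; advance the checkpoint only if all succeed.

    Returns the paths that failed and are still pending.
    """
    failed = [path for path in changed_paths if not update_pages_for(path)]
    if not failed:
        # Only a fully processed change set moves the checkpoint forward.
        index["last_commit"] = head_sha
    return failed
```

If any path fails, `last_commit` stays where it was, so the next run diffs from the same commit and picks the unfinished work up again instead of silently skipping it.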

That distinction matters. The wiki is not meant to become a commit log or historical archive. Its main job is to explain the project as it exists now. Git just makes it possible to keep that explanation fresh without starting from scratch every time. The git-wiki skill described below follows that approach: the source of truth is the git-tracked files at HEAD, while incremental ingest is driven by the saved last_commit.

What the wiki should do

A code-project wiki is most useful when it acts as a structured knowledge layer over the repository.

It should capture things like:

major features and modules
important concepts and abstractions
entities such as schemas, models, and types
request or execution flows
notable fixes and architecture shifts
open gaps and stale areas worth revisiting
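
One way to picture this knowledge layer is as a set of page records that remember which source files they were derived from. The sketch below is an assumption about how such a record might look, not the format the skill actually uses; `WikiPage`, its fields, and `mark_stale` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class WikiPage:
    title: str
    kind: str                                       # e.g. "feature", "concept", "entity", "flow"
    sources: Set[str] = field(default_factory=set)  # repo paths this page was derived from
    stale: bool = False                             # True when a source changed after the last ingest


def mark_stale(pages: List[WikiPage], changed_paths: Iterable[str]) -> List[WikiPage]:
    """Flag every page whose sources overlap the changed paths and return those pages."""
    changed = set(changed_paths)
    touched: List[WikiPage] = []
    for page in pages:
        if page.sources & changed:
            page.stale = True
            touched.append(page)
    return touched
```

Recording sources per page is what lets a list of changed files translate directly into a list of pages worth revisiting.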

From there, the three core workflows (ingest, query, and lint) follow naturally.

That is where the LLM Wiki idea becomes more than passive documentation. It becomes an active system for maintaining project understanding.

The agent skill

I packaged this idea as an agent skill:

npx skills add yysun/awesome-agent-world --skill git-wiki

The goal is to make the pattern reusable: ingest the codebase once, keep a wiki under .wiki, and then let the agent maintain it incrementally as the project evolves.

Closing thought

What makes the LLM Wiki idea powerful is not just that it produces summaries. It creates a feedback loop between source material, structured knowledge, and future questions.

For a codebase, that loop is even stronger because git gives us a built-in notion of change. Instead of rebuilding understanding from scratch, we can carry it forward commit by commit.

That is the real opportunity here: treat the repository not just as code, but as a knowledge system with memory, structure, and maintenance built in.