Sohardh Chobera
Ustaad: Building a Wiki That Thinks

I have a problem. I hoard information. Browser tabs, Notion pages, PDFs, chat transcripts with AI assistants. They pile up and slowly rot. Two months later I go looking for that one brilliant insight from a conversation with Grok about distributed systems, and I can’t find it. The knowledge existed. I just couldn’t access it.

Traditional note-taking apps solve the storage problem. They don’t solve the synthesis problem.

Then I came across Andrej Karpathy’s gist about using an LLM to maintain a living wiki. The idea stopped me mid-scroll. Instead of using an LLM to answer questions from a pile of documents, you use it to build and maintain a structured, interlinked knowledge base. The LLM handles all the tedious bookkeeping, cross-referencing, deduplication, consistency. You get a wiki that compounds over time. Every new source makes it smarter.

I built that. I call it Ustaad (ਉਸਤਾਦ).

Why Not RAG?

The default answer when people say "I want to query my documents with an LLM" is RAG (Retrieval-Augmented Generation): chunk your documents, embed them, and stuff the relevant chunks into the context window at query time.

RAG works. But it’s fundamentally stateless. Every single query rediscovers the same knowledge from scratch. There’s no memory, no synthesis across sources, no understanding that “entity X from document A is the same as entity X in document B.”

The wiki approach is different. You pay the synthesis cost once, at ingestion time. The LLM reads the new document, looks at the existing wiki, and decides: what pages need to be created? What existing pages need updating? What cross-references are missing? The result gets persisted. The next ingestion starts from a richer base.

How It Works

*(Architecture diagram: ustaad-architecture.png)*

The system has four core modes: Ingest, Query, Lint, and Watch.

Ingest

You drop in a source: a Markdown file, a PDF, a raw dump of a Google Chat thread, anything. The backend (Spring Boot + Spring AI) reads the entire current wiki and passes it to the LLM along with the new source.

The prompt turns the LLM into a strict wiki editor. It returns structured JSON: a list of file operations (create this page, update that page, add these backlinks). The backend safely applies the changes.
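To make "structured JSON file operations" concrete, here is a minimal sketch of what applying them might look like. The operation kinds and field names (CREATE/UPDATE/APPEND, path, content) are my guesses, not Ustaad's actual schema; the point is that the backend validates paths and applies changes deterministically rather than letting the LLM touch disk directly.

```java
import java.util.*;

class WikiOps {
    enum Kind { CREATE, UPDATE, APPEND }
    record Op(Kind kind, String path, String content) {}

    // Apply LLM-returned operations to an in-memory wiki (path -> markdown),
    // refusing unsafe paths so the model can't write outside the wiki root.
    static Map<String, String> apply(Map<String, String> wiki, List<Op> ops) {
        Map<String, String> next = new HashMap<>(wiki);
        for (Op op : ops) {
            if (op.path().contains("..") || op.path().startsWith("/"))
                throw new IllegalArgumentException("unsafe path: " + op.path());
            switch (op.kind()) {
                case CREATE -> next.putIfAbsent(op.path(), op.content());
                case UPDATE -> next.put(op.path(), op.content());
                case APPEND -> next.merge(op.path(), op.content(),
                        (a, b) -> a + "\n" + b);
            }
        }
        return next;
    }
}
```

Keeping the apply step this dumb is deliberate: all the intelligence lives in the prompt, and the backend stays easy to audit.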

The wiki follows Obsidian-flavored Markdown with mandatory YAML frontmatter. Everything lives in typed folders: topics/, entities/, adrs/, code-snippets/, etc. There’s a master index.md catalog and an append-only log.md that records every ingestion.
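For a sense of what a single page looks like, here is a plausible entity page with the mandatory frontmatter. The specific field names are illustrative; the real schema is whatever Ustaad's prompt enforces.

```markdown
---
type: entity
title: Postgres Connection Pool
tags: [database, performance]
created: 2024-05-01
---

# Postgres Connection Pool

Mentioned in [[adrs/adr-0003-connection-pooling]] and [[topics/latency]].
```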

The [[backlinks]] are what make it feel alive. After a few ingestions, pages start referencing each other in ways I never would have manually connected.

Query

Ask a question in plain English. The backend loads the full wiki as context and streams the answer back via Server-Sent Events.

This is where the upfront synthesis cost pays off. The LLM isn’t hunting through raw chunks, it’s reasoning over a pre-organized knowledge graph. The answers feel coherent because the underlying structure is coherent.
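The "load the full wiki as context" step is simple enough to sketch in a few lines: walk the wiki directory, concatenate every Markdown file with a path header so the LLM can cite pages. The delimiter format here is invented; the real prompt layout is whatever Ustaad uses.

```java
import java.io.*;
import java.nio.file.*;
import java.util.stream.*;

class WikiContext {
    // Concatenate every .md file under root into one prompt-sized string,
    // each prefixed with its relative path so answers can reference pages.
    static String load(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(p -> p.toString().endsWith(".md"))
                .sorted()
                .map(p -> {
                    try {
                        return "=== " + root.relativize(p) + " ===\n"
                                + Files.readString(p);
                    } catch (IOException e) { throw new UncheckedIOException(e); }
                })
                .collect(Collectors.joining("\n\n"));
        }
    }
}
```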

Lint

Wikis drift. An entity page goes stale. Two topic pages contradict each other. A code snippet is missing a language tag.

Lint runs nightly (and on demand). It finds orphan pages, flags contradictions with ⚠️ [NEEDS_REVIEW], and ensures every code block is properly tagged. It’s like a CI pipeline for knowledge.
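Two of those checks are mechanical enough that they don't even need an LLM. Here is a toy version of orphan detection and untagged-fence detection over an in-memory wiki; the rules and names are illustrative, not Ustaad's actual linter.

```java
import java.util.*;
import java.util.regex.*;

class Lint {
    static final Pattern LINK = Pattern.compile("\\[\\[([^\\]]+)\\]\\]");

    // Orphans: pages that no other page references via a [[backlink]].
    static Set<String> orphans(Map<String, String> wiki) {
        Set<String> linked = new HashSet<>();
        wiki.values().forEach(body -> {
            Matcher m = LINK.matcher(body);
            while (m.find()) linked.add(m.group(1) + ".md");
        });
        Set<String> orphans = new TreeSet<>();
        for (String path : wiki.keySet()) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (!linked.contains(name) && !path.equals("index.md"))
                orphans.add(path);
        }
        return orphans;
    }

    // Count code fences opened with a bare ``` and no language tag.
    static long untaggedFences(String body) {
        boolean inFence = false;
        long count = 0;
        for (String line : body.split("\n")) {
            if (line.startsWith("```")) {
                if (!inFence && line.strip().equals("```")) count++;
                inFence = !inFence;
            }
        }
        return count;
    }
}
```

Contradiction detection is the part that genuinely needs the model; these structural checks can run for free on every commit.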

Watch

Register any local folder and Ustaad polls it every 60 seconds. New or modified files get auto-ingested. Drop a PDF into a watched folder and it quietly becomes part of the wiki.
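A polling watcher in this spirit fits in a short class: remember each file's mtime, and on every poll report anything new or modified. This is my own minimal sketch of the 60-second loop, not Ustaad's implementation.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

class FolderWatch {
    private final Map<Path, Long> seen = new HashMap<>();
    private final Path root;

    FolderWatch(Path root) { this.root = root; }

    // Returns files that are new or modified since the last poll;
    // the caller would feed each one into the ingest pipeline.
    List<Path> poll() throws IOException {
        List<Path> changed = new ArrayList<>();
        try (var files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                long mtime = p.toFile().lastModified();
                Long prev = seen.put(p, mtime);
                if (prev == null || prev != mtime) changed.add(p);
            });
        }
        return changed;
    }
}
```

Polling is crude next to inotify-style file watching, but it is portable and plenty fast at a 60-second cadence.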

The Stack

Backend is Spring Boot 3 with Spring AI for clean LLM abstraction. I defaulted to Google Gemini because of its large context window (ingesting a source + the entire wiki can get heavy). Ollama works great for fully local runs. OpenAI and Claude support are planned next.

Streaming responses use Server-Sent Events, the same pattern I wrote about earlier when building the hospital queue dashboard.

The wiki itself is just plain Markdown files on disk. No database, no vector store. Just a directory that Git can version-control.

What Actually Surprised Me

The cross-referencing quality blew me away. After ingesting a handful of architecture docs and chat logs, the LLM started creating backlinks I wouldn’t have thought to add myself. It noticed the same database entity mentioned in an ADR and a performance log and quietly linked them.

The lint step catches more than just style issues. It surfaces real inconsistencies between early and later ingestions.

What I underestimated: context length. As the wiki grows, passing the entire thing to the LLM on every ingestion becomes expensive and slow. I’m now thinking about smarter relevance-based chunking. The hard part is deciding what’s “related” without first reading everything.
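One naive starting point for that problem: instead of sending the whole wiki, score each page by word overlap with the incoming source and send only the top-k. This is purely a sketch of the idea (embeddings would do better), not anything Ustaad ships.

```java
import java.util.*;
import java.util.stream.*;

class Relevance {
    // Crude tokenizer: lowercase words, dropping short stopword-ish tokens.
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> w.length() > 3)
                .collect(Collectors.toSet());
    }

    // Rank wiki pages by shared-vocabulary overlap with the new source
    // and keep only the k most relevant paths.
    static List<String> topK(Map<String, String> wiki, String source, int k) {
        Set<String> src = words(source);
        return wiki.entrySet().stream()
                .sorted(Comparator.comparingLong((Map.Entry<String, String> e) -> {
                    Set<String> w = words(e.getValue());
                    w.retainAll(src);
                    return -w.size();   // highest overlap first
                }))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

The catch the post already names applies here too: overlap scoring only sees vocabulary, so it misses pages that are related conceptually but share no words with the new source.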

Try It Yourself

The repo is live at github.com/Sohardh/ustaad.

Point it at a Gemini API key or a local Ollama instance.

Drop in a few documents you’ve been meaning to organize and watch the wiki build itself.

If you’ve found a smarter solution to the context-length problem, or you’ve built something similar, I’d genuinely love to hear about it.
