DEV Community

Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

The One Line in Karpathy's Wiki Gist That 99% of Builders Missed — And Why It's the Whole Point.

We all got the point of a second brain a long time ago. Condense your courses, your books, your PDFs, your notes into one place where you can actually find them again. The concept has been around for ten years, it has been thoroughly digested, and plenty of people have tried. The problem was never the idea. The problem is maintenance. You feed your system for three months, you end up with a hundred and fifty files, you get lost in them, you spend more time reorganizing than adding. Six months in, the brain is sitting in some corner of your disk and you never touch it again.

Karpathy posted a gist two weeks ago that solves exactly that. He adds an auto-organization layer on top: the system files its own pages, merges redundancies, keeps itself coherent. Everyone started building it this week. Three folders, a CLAUDE.md, Obsidian on top. Tutorials everywhere.

Except one thing escaped 99% of builders. And it is a shame, because it is the whole point. Without it, you have a folder that tidies itself. With it, you have a brain that learns from every question you ask it, reads what your tools write to it, and eventually starts building the tools it needs.

TLDR: The architecture is the visible half. One sentence buried in the gist activates a feedback loop that makes the base grow denser every time you use it. Then there is a third channel nobody formalized: your infrastructure feeding the base directly, and the base surfacing which new tools to build, or even building them itself. I activated both on my repo last week. Here is what actually changed.

The Knowledge Base I Already Had (And Never Really Used to Its Full Potential)

I did not start from scratch six months ago. Like most devs who have been doing this for a while, I already had repos scattered on my disk where I had organized knowledge, skills, processes, tools, docs. One folder per domain. Markdown files carefully structured. SEO notes here, code review patterns there, snippets I kept reusing, architecture decisions I did not want to forget. Docker compose recipes I had ended up rewriting at least three times because I could never find the previous version. Infra diagrams. Deploy checklists. Incident post-mortems I had written down for myself and never reread. I was using these repos every day, loading them into Claude Code as context when I needed them, asking questions against them, copy-pasting rules into new projects.

It worked. It was useful. And if you are reading this, odds are you have the same thing somewhere. The repo of stuff you ingested, cleaned up, committed. The one you feel good about on Sunday evening after you add a new file.

The big shift Claude brought, compared to the previous ten years of doing this, is that I stopped asking myself "where did I put that damn thing again." For a decade, the bottleneck of any personal knowledge base was the same: you had the information somewhere, you remembered vaguely writing it down, but finding it meant grep, Spotlight, opening three folders, rereading half a file to check if it was the right one. Now I just ask Claude. The repo is in context, the question gets answered, done. That alone was a huge unlock. It made the repo actually usable for the first time.

But I was still under-exploiting it. Badly. My repo was a very well-organized library that I had to walk through myself every time I wanted to pull a book off the shelf (except now Claude was walking it for me instead of me). Better, faster, but still one-way. I asked, it answered, the conversation ended. The repo learned nothing from any of it. Tomorrow I would ask a related question, Claude would walk the same shelf again, give me a slightly different answer, and that second answer would evaporate too.

I think this is why Karpathy's gist resonated so hard when he dropped it. It was not the architecture. Plenty of us already had something similar. The gist gave a name to a vague feeling most of us had been sitting with for months: this thing I built is useful, but it is clearly not doing what it could be doing. The missing piece was the auto-organization layer. A second brain that files its own pages, merges its own redundancies, maintains its own coherence while you sleep. That is what Karpathy put in front of us.

And that is what everyone started building this week.

Karpathy Posted the Architecture. Everyone Copied the Wrong Half.

The gist is called llm-wiki. Two folders do the real work. raw/ holds the source material, filtered and structured but complete. A full SEO course distilled into 900 lines. A coding book condensed into 600. An ops playbook distilled from three conference talks into 700 lines of what actually applies to your stack. Nobody reads this in production. It is the archive, the place you go back to when the wiki says something that feels off and you want to check the source.

wiki/ is the operational version. One file per domain. It fuses every raw file in that domain into actionable rules, 150 to 200 lines max. This is what the agents load before producing anything. A fraction of the raw size, and it maintains itself. Add a new course on the same domain and the wiki absorbs the new rules without growing. Contradictions get resolved, obsolete patterns get dropped.

On top of that, a CLAUDE.md at the root telling the model how to navigate the whole thing, and Obsidian as a frontend so you can browse it like a normal human.
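The gist does not prescribe what goes in that CLAUDE.md; a minimal sketch of the navigation rules, in my own wording rather than Karpathy's, might read:

```markdown
# Navigation rules

- Answer from `wiki/` first. One file per domain; treat it as current truth.
- Fall back to `raw/` only when a wiki page feels off or lacks detail,
  and say which raw source you used.
- Keep wiki pages under ~200 lines. When ingesting a new source, merge its
  rules into the existing domain page instead of creating a new one.
- Resolve contradictions in favor of the newer source, and drop the old rule.
```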

The architecture is clean. It is also the obvious part. Of course you separate raw from synthesized. Of course you give the model navigation rules. Of course Obsidian is a good viewer.

Then the tutorials hit. Every tech YouTuber with a ring light reposted variations of the same diagram this week. Three folders. CLAUDE.md. Obsidian. Build it like Karpathy. Ship a screenshot. Move on.

I was part of that wave for two days. Rebuilt the structure on top of my existing repo. Ingested more sources. Asked questions. The setup was better than what I had, faster, denser. But the base still sat there, growing only when I manually fed it new things. The auto-organization layer was doing its job on what I put in. It was not doing anything with what I did with the base afterward.

That is when I went back to the gist and read it slower. Not the architecture section. The query section. The part that says what happens after the model answers.

The One Sentence That Turns a Static Archive Into a Living Base

[Figure: Static Archive vs Base with Feedback — left, the linear cycle of a static archive (source → ingest → query → answer, with the answer lost); right, the closed loop (source → ingest → query → answer → filed back → enriches the base for the next query).]

The sentence, from Karpathy's own post about the workflow:

"Often, I end up filing the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always add up in the knowledge base."

Read it twice. It describes a behavior, not an architecture.

What it says, plainly: when you ask a question and the model gives you a useful answer, that answer goes back into the wiki as a new page. The next query, on the same topic or adjacent, starts from a base that already contains the previous answer. The base grows from your usage, not just from your ingestion.
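The filing step itself needs almost no machinery. A minimal Python sketch, assuming a `wiki/<domain>/` layout and a date-plus-slug filename convention of my own:

```python
from datetime import date
from pathlib import Path
import re

WIKI = Path("wiki")

def file_back(domain: str, title: str, answer: str) -> Path:
    """Persist a useful answer as a wiki page so the next query starts from it."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    page = WIKI / domain / f"{date.today().isoformat()}-{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(f"# {title}\n\n{answer}\n")
    return page

# After a query yields something the base did not already know:
file_back("docker", "Memory caps for leaky parsers",
          "Cap the sync container at 512m; the parser leaks steadily.")
```

The point is that "filing back" is just a write into the same tree the model already reads — no second store, no index to rebuild.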

Without this loop, your repo is a RAG with prettier folders. You ingest, you query, the answer flashes on your screen and dies in the chat history. Tomorrow you ask a similar question, the model retrieves from the same source pages, synthesizes the same answer from scratch. You pay for the synthesis every single time (my favorite form of recurring waste, honestly).

With the loop, the base is stateful. The model's job shifts from "synthesize from raw sources" to "find the page where this is already answered, refine it if needed." Faster, cheaper, denser over time.

One paragraph in the gist. The whole reason to build this.

What a Dead Container Taught My Knowledge Base

My repo does not only hold courses and books. It also holds the living documentation of my own services: configs, deploy notes, past incidents, architecture decisions I took at 2am and wrote down before I forgot why. Same pattern as the rest, raw/ with the full history, wiki/ with the operational rules. This is where things started getting interesting.

My distributor catalog sync stopped. The container was up, the process was alive, but it had not pulled a new feed in 34 hours. I noticed because the partner-side product count drifted from what was on my storefront. Customers started ordering things that were no longer in stock upstream.

I opened Claude Code and asked: "what is the state of the distributor sync, when did it last run successfully, and what is the most likely cause of the silence?" The model went through the wiki, pulled the relevant service page, checked the recent log entries I had ingested, and answered: probably a memory leak in the parser, the container is consuming RAM but not crashing because the OOM killer threshold is set too high. Recommended a restart and a memory cap.

Classic Claude Code answer. Useful. Specific. Would have died in chat history.

Except the loop was activated. The answer got filed back into the wiki as a new page under services/distributor-sync/incidents/2026-03-29-silent-failure.md. The page had the symptom, the diagnosis, the resolution, and a flag noting that this service had now failed silently once. Total cost: one query, one filed page.
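For reference, the filed page was nothing fancy; a sketch of its shape (the field names are my convention, not Karpathy's):

```markdown
# 2026-03-29 — silent failure: distributor sync

**Symptom:** container up, process alive, no feed pulled for 34 h.
**Diagnosis:** memory leak in the parser; OOM threshold too high to crash it.
**Resolution:** restart plus a memory cap on the container.
**Flag:** this service has now failed silently once — no heartbeat monitoring.
```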

A week later, I asked an unrelated question about the partner API webhook. The model answered, then added the polite version of "by the way, you might want to look at this": "note that your distributor sync had a silent failure 6 days ago, you currently have no monitoring on its heartbeat, you might want to add one before this happens again." It surfaced that on its own because the wiki had the incident page, and the model had read it while looking for context on adjacent services.

A week earlier, that information would have been gone. The chat session where I diagnosed it would have been closed. The next time the sync died silently, I would have rediscovered the same root cause from scratch.

The wiki did not just remember. It connected.

The Third Channel: When Your Tools Feed the Base, And the Base Builds Its Own Tools

Here is the thing that bugged me about the incident above. I learned the container had failed silently by accident, because customers started complaining. Claude filed the page only after I asked. If I had not asked, the incident would never have existed in the base.

What if the container itself filed the page?

Karpathy's loop is human-driven. You ask, the model answers, the answer gets filed. Two channels feed the base: documents you ingest manually, and queries you run. There is a third channel. It is not in the gist.

Your infrastructure already produces signal continuously. Cron jobs succeed or fail. Containers restart. Services time out. Webhook callbacks return non-200 codes. Most of this signal goes to logs nobody reads, or to alerting systems that fire once and forget. None of it ends up anywhere the model can use.

What I built on top of Karpathy's pattern is a thin layer that lets the infra itself file pages. A CLI any service can call to append an observation directly into the base. The catalog sync writes a page when it succeeds, with the row count. The webhook handler writes a page when it sees a malformed payload. A cron writes a page when it skips because the previous run was still going. Short pages. Timestamped. They land in a signals/ folder the model knows about.

The reason this works is the same reason CLIs make better signal channels than MCP wrappers for any agent task: a curl one-liner from a Bash script writes to the wiki, no protocol negotiation, no schema dance. The container does not need to know what an LLM is. It just appends a markdown file to a folder.
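A sketch of that emitter in Python (the `signals/` layout and the success/failure/skip vocabulary are from my setup, not the gist):

```python
from datetime import datetime, timezone
from pathlib import Path

def emit(base: str, service: str, kind: str, message: str) -> Path:
    """Append one short, timestamped markdown page the model can read later."""
    now = datetime.now(timezone.utc)
    page = Path(base) / "signals" / service / f"{now:%Y%m%dT%H%M%SZ}-{kind}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(f"# {service}: {kind}\n\n- when: {now.isoformat()}\n\n{message}\n")
    return page

# A service calls this at the end of a run -- success, failure, or skip:
page_path = emit(".", "catalog-sync", "success", "pulled 1842 rows from the feed")
```

The Bash equivalent is a two-line function around `mkdir -p` and a heredoc; either way, the container appends a file, nothing more.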

And then the second-order thing started happening.

Once enough signal accumulated, the model began reading the signals/ folder during queries and pointing out gaps. Not "your container failed" (that part was expected). Things like: "you have three services writing success pages but no failure pages for the webhook handler, which means I cannot tell whether it is working or just silent. You might want to add a failure emitter in that handler." Or: "your cron job for distributor sync writes when it runs, but nothing writes when it skips a cycle. You need a skip emitter."

Then it stopped asking. It started building.

A concrete example from last week, and not even an infra one. I was wondering whether to buy 32 or 48 GB of RAM on my next MacBook. Classic question, classic answer from the guy at the Apple store: "you will be fine with 24, trust me." I did not trust him. I asked Claude Code instead, with my repo in context: "how do I know what I actually need?" The model did not give me a ballpark. It proposed building a monitoring CLI (one script to sample RAM metrics every 5 minutes into a CSV, a second script to compute the summary and recommend a size based on the observed peak plus a 30% margin). Wrote both scripts. Ran them. Three days of data later, the verdict was in my wiki: RAM used was pinned at 22 to 23 GB on a 24 GB machine, 77 MB free at the low point, compressor working overtime at 4.5 GB. Recommendation: 32 minimum, 48 if I wanted peace of mind.
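The sizing half of that verdict is plain arithmetic. A sketch, assuming the tier list and the 30% margin the model proposed for me:

```python
TIERS_GB = [16, 24, 32, 48, 64]  # available RAM options (assumption)

def recommend(samples_gb: list[float], margin: float = 0.30) -> int:
    """Observed peak plus a safety margin, rounded up to the next tier."""
    target = max(samples_gb) * (1 + margin)
    for tier in TIERS_GB:
        if tier >= target:
            return tier
    return TIERS_GB[-1]  # already maxed out

# Three days of 5-minute samples would come from the CSV; here, my peaks:
print(recommend([21.8, 22.4, 23.1]))  # 23.1 * 1.3 ≈ 30 → first tier ≥ 30 is 32
```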

Not a guess. Not marketing. Actual numbers from my actual usage, collected by tools the base built for itself because it knew it did not have the answer yet.

The base was no longer just learning. It was closing its own blind spots.

You ingest. You query. Your tools write. And then the base builds the next tool on its own.

Three Traps Before You Activate the Loop

Three traps I walked into in the first two weeks. Make these calls before you flip the switch, or your base turns into a dumpster fast.

What gets filed back. I started by filing every response. After four days the base had forty-seven variations of "yes that docker command is correct" and I could not find anything useful. The rule now: file back only if the answer reveals something the base did not already know, documents an incident, or makes a decision explicit. Conversational scaffolding dies in chat history where it belongs.

Who decides quality. Self-judging models are too generous with themselves; I discovered that one quickly when Claude filed a page declaring a deprecated API endpoint "current best practice." Full human review does not scale past a hundred pages. I landed on a middle ground: the model files into pending/, a daily cron promotes anything I did not delete within 24 hours. Silence means approval. Laziness is the gate.
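That promotion cron is a handful of lines. A sketch, with folder names from my setup:

```python
import os
import time
from pathlib import Path

PENDING, WIKI = Path("pending"), Path("wiki")
DAY = 24 * 3600

def promote() -> list[Path]:
    """Move any page that survived 24 hours in pending/ into the wiki.
    Deleting a bad page from pending/ before the cron fires is the veto."""
    moved = []
    WIKI.mkdir(exist_ok=True)
    for page in PENDING.glob("*.md"):
        if time.time() - page.stat().st_mtime >= DAY:
            target = WIKI / page.name
            page.rename(target)
            moved.append(target)
    return moved

# Demo: a page filed two days ago gets promoted; a fresh one stays on probation
PENDING.mkdir(exist_ok=True)
(PENDING / "old-note.md").write_text("approved by silence\n")
os.utime(PENDING / "old-note.md", (time.time() - 2 * DAY,) * 2)
(PENDING / "fresh-note.md").write_text("still on probation\n")
moved = promote()
```

Run it daily from cron (e.g. `0 6 * * * python promote.py`) and the gate runs itself.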

How you stop errors from compounding. This one I learned from the gist comments, not from Karpathy. If the model files a wrong answer, that wrong answer becomes a "fact" in the base. Next query reads it, treats it as ground truth, produces a second wrong answer depending on the first. Three weeks in, your base is gaslighting you. The fix is the same kind of contract I described in my prompt contracts framework: every filed page declares its sources, a confidence level, a re-validation date. No sources, fast expiry. High confidence with verified sources, long life. The base self-prunes.

Nail these three. The rest runs on its own.

30 Days Later: Two Bases, Two Different Systems

In 30 days, plenty of devs will have built exactly the same setup. Three folders, a CLAUDE.md, Obsidian on top. Identical down to the folder names.

Half of them will have a dead archive that needs to be hand-fed to stay relevant (which honestly will be forgotten within a month, let's be real). The other half will have a base that learns from every question, reads what the infra writes to it, and builds the tools it needs when it notices a gap. Same architecture. One loop and one channel of difference.

That is what a real personal knowledge base looks like. Not a folder. A system that gets denser every time you use it, and sharper every time it hits something it does not know.

Karpathy wrote the line. Nobody read it.

Sources

  • Andrej Karpathy's llm-wiki gist on GitHub

(*) The cover image was generated by an AI which, to be fair, has been filing its own pages since before we made it a hobby.
