Six months ago, I started building TaterTOS64, an x86_64 kernel. As any systems dev knows, once you pass the 10,000-line mark across a mix of C, Assembly, and linker scripts, your brain starts to leak. I needed a way to document the architectural "why": how the interrupt vectors hand off to the scheduler, and how the paging logic relates to the physical memory map.
Naturally, I tried the modern approach: I fed the code to LLMs.
The Result Was a Disaster
Generic "AI Doc" tools failed me in three specific ways:
- The Context Amnesia: They'd understand a single .c file but completely hallucinate the #include chain. They had no idea where the paging.h constants were actually defined in my repo structure.
- The Hallucination Loop: They would confidently explain my scheduler's "logical flow" while citing methods that didn't exist, or worse, misinterpreting raw Assembly entry points as high-level C signatures.
- The SaaS Tax: I'm building a local kernel. I don't want to pay $20/mo to a cloud service to "rent" access to my own local documentation pipeline.
Building the Solution: TaterBookBuilder
I decided to stop building the kernel for two weeks and build the documentation compiler I actually wanted. I call it TaterBookBuilder.
Instead of a simple "text-to-prompt" wrapper, I built a deterministic analysis engine first.
How it actually works:
- Physical Inclusion Graphing: Before the LLM ever sees a prompt, the engine walks the repo and maps every #include (C) and %include (Assembly) to its canonical repository node. No more guessing where types come from.
- AST-Aware Ingestion: Using Roslyn and custom regex parsers, it builds a logical hierarchy of your system. It identifies "Kernel Boundaries" vs "User Space" based on the directory topology and hot-path signals (like syscall entry points).
- The "Evidence Map" (The Game Changer): I was tired of second-guessing the LLM. I implemented an Evidence Map system. Every claim the book makes is backed by a deterministic ID that points to a specific file and line range in the repo. If the book says "The scheduler uses a Round-Robin approach," there is a footnote pointing exactly to src/kernel/sched.c:L45-L120.
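To make the inclusion graph and the Evidence Map idea concrete, here is a minimal Python sketch. It is not TaterBookBuilder's actual code: the function names, the "ev-" ID format, the sha256-based hashing, the default include directory, and the toy in-memory repo are all assumptions made for illustration. It resolves each #include / %include to a file that really exists in the repo, and records one evidence entry per resolved edge, with a footnote in the same file:Lstart-Lend shape used above.

```python
import hashlib
import posixpath
import re

# Matches C `#include "x.h"` / `#include <x.h>` and NASM `%include "x.inc"`.
# Simplification: single-quoted NASM includes and macro-expanded paths are ignored.
INCLUDE_RE = re.compile(r'^\s*(?:#\s*|%)include\s+["<]([^">]+)[">]')


def resolve(including_file, target, repo, include_dirs):
    """Return the repo-relative path an include refers to, or None if external."""
    search = [posixpath.dirname(including_file), *include_dirs]
    for base in search:
        candidate = posixpath.normpath(posixpath.join(base, target))
        if candidate in repo:
            return candidate
    return None


def evidence_id(path, start, end, claim):
    """Deterministic ID: the same file, line range, and claim always hash the same."""
    digest = hashlib.sha256(f"{path}:{start}-{end}:{claim}".encode()).hexdigest()
    return "ev-" + digest[:12]  # hypothetical ID format


def footnote(entry):
    """Render an evidence entry as file:Lstart-Lend."""
    start, end = entry["lines"]
    return f'{entry["file"]}:L{start}-L{end}'


def build_include_graph(repo, include_dirs=("include",)):
    """repo maps repo-relative path -> file text. Returns (edges, evidence)."""
    edges = {}     # path -> list of repo paths it includes
    evidence = {}  # evidence ID -> {"claim", "file", "lines"}
    for path in sorted(repo):
        resolved = []
        for lineno, line in enumerate(repo[path].splitlines(), start=1):
            match = INCLUDE_RE.match(line)
            if not match:
                continue
            target = resolve(path, match.group(1), repo, include_dirs)
            if target is None:
                continue  # not in the repo, e.g. a toolchain header
            resolved.append(target)
            claim = f"{path} includes {target}"
            eid = evidence_id(path, lineno, lineno, claim)
            evidence[eid] = {"claim": claim, "file": path, "lines": (lineno, lineno)}
        edges[path] = resolved
    return edges, evidence


# Toy repo standing in for a real checkout (file names are made up).
repo = {
    "src/kernel/sched.c": '#include "paging.h"\n#include <stdint.h>\nvoid schedule(void) {}\n',
    "src/kernel/entry.asm": '%include "macros.inc"\nglobal _start\n',
    "src/kernel/macros.inc": "; macros\n",
    "include/paging.h": "#define PAGE_SIZE 4096\n",
}

edges, evidence = build_include_graph(repo)
for eid, entry in evidence.items():
    print(eid, entry["claim"], "->", footnote(entry))
```

The point of hashing the file, line range, and claim together is that the ID is reproducible: rerun the analysis on an unchanged repo and every footnote keeps the same ID, so a changed ID is itself a signal that the underlying code moved.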
The Philosophy: Local-First and Perpetual
Documentation is a permanent asset. It shouldn't depend on a cloud subscription.
I'm shipping TaterBookBuilder as a 77MB Linux AppImage. It's completely turnkey—I even bundled a static binary of Pandoc inside it so you don't have to install a single dependency.
And for the pricing? I'm using the JetBrains Model. You buy it once, you own that version forever. You get a year of maintenance, and if you don't want to renew, your documentation pipeline keeps working exactly as it did on day one.
Documentation should be as rock-solid and local as the code it describes.
Check out the workbench and download the trial here:
https://taterlabs.shop/taterbook.html
I'd love to hear from other systems devs—how are you handling the "trust gap" with AI-generated architecture maps?