I run an open-source MCP server. It exposes browser-automation tools to AI coding agents — click this, read that page, fill this form. The README has a big proud table: "80 tools." It's in the tagline. It's in the nav anchor. It's in the alt text of the social-preview image. It's in two comparison tables. It's even in the pre-written tweet text for the share button.
Eighty. Eighty everywhere.
Last week I ran a boring audit — just diffing what the README claims against what the code actually registers at startup. The code registered 96.
Not 80. Ninety-six. The number had been wrong in nine different places, and I'd been the one typing it each time.
How a number rots
Nobody decides to lie in their README. Drift happens one merge at a time.
You ship a feature. It adds three tools. You update the tool list (because that's the part reviewers look at) but you forget the count in the tagline. Next release adds four more. Now the list says one thing and the tagline says another, and both are behind reality. A month of "small, fast" releases later, the gap is sixteen.
The worst part wasn't even the tagline. When I actually counted the entries in the tool list itself, it documented 83 tools. So I had three different truths living in one file:
- The marketing number: 80
- The documented list: 83
- The code: 96
Thirteen tools existed, worked, shipped to npm, and ran on real users' machines — completely undocumented. They weren't in the README at all. If you only read my docs, you didn't know they existed.
The tools you forget are the ones that matter
Here's the part that turned a docs cleanup into a genuinely uncomfortable afternoon.
I expected the undocumented thirteen to be boring helpers — the safari_wait_for_new_tabs of the world. Some were. But four of them were the native input tools: synthetic keyboard and mouse events driven through the OS-level event API (CGEvent), not through JavaScript injected into the page.
Those are the most powerful tools in the whole project. JavaScript-level automation is sandboxed by the page; OS-level input is not. It types into anything that has focus. It's the difference between "fill this <input>" and "press these physical-looking keys at the system level." It's exactly the category a security-conscious user would want to read about before granting Accessibility permissions.
And it was the category I'd never written down.
That's the real lesson, and it's not "keep your docs updated." It's this: documentation drift is selection-biased toward the things you'd least want undocumented. The tools you forget to document are the ones added in a hurry, in a feature branch, under a deadline, and those tend to be the same tools that are powerful, sharp, or security-relevant. The mundane stuff gets documented because it's easy. The dangerous stuff gets a TODO.
If you want to find the riskiest, least-reviewed surface of any project, don't read the docs. Read the gap between the docs and the code.
"Just update the README" is the wrong fix
The tempting fix is to fix the number. Change 80 → 96, add the thirteen missing entries, commit, done. I did that part. But it's treating the symptom.
The number was wrong because it was hand-maintained. Any fact a human types by hand will eventually disagree with reality. The only durable fix is to make the fact impossible to get wrong — derive it from the source of truth instead of restating it.
The irony: my project already had this solved, in one place. The smoke test — contributed by someone else, not me — boots the server over a real stdio transport, asks it how many tools it registered, and asserts against a count derived from the source. No hardcoded number. When you add a tool, the test's expectation moves with it automatically. That test could never have drifted to 80, because it never stored an 80 to drift from.
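To make "derived, not hardcoded" concrete, here is a minimal sketch in Python. It is not the project's actual smoke test: the `server.tool("name", ...)` registration pattern, the tool names in the usage note, and the plain list standing in for what a live server reports are all assumptions for illustration.

```python
import re

# Assumed registration style: server.tool("tool_name", ...). Adjust to your codebase.
REGISTRATION = re.compile(r"""server\.tool\(\s*["']([A-Za-z0-9_]+)["']""")


def registered_names(source: str) -> list[str]:
    """Return tool names in the order the source text registers them."""
    return REGISTRATION.findall(source)


def check_tool_count(source: str, reported: list[str]) -> None:
    """Fail if the server reports a different number of tools than the source registers.

    `reported` stands in for the tool list a running server would return.
    """
    expected = len(registered_names(source))  # derived from code, never typed by hand
    if len(reported) != expected:
        raise AssertionError(
            f"server reported {len(reported)} tools, source registers {expected}"
        )
```

Called as `check_tool_count(source_text, ["safari_click", "safari_read_page"])` (hypothetical names), the expectation moves whenever a registration is added or removed, so there is no stored number to go stale.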
So the fix isn't "be more disciplined about the README." Discipline is what failed for a month straight. The fix is:
- Counts → generate them. A tiny script that reads the registrations and writes the number into the README at build time. The human never types it.
- Tool lists → generate them too, ideally from the same registry the server reads at startup. The schema is already structured data. Rendering it as a Markdown table is a formatting problem, not a writing problem.
- The prose around them → that's the part humans should actually spend their attention on, because it's the part a generator can't write.
Every fact in your docs is either derivable or it isn't. Derivable facts should never be typed by hand. They are drift waiting to happen.
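A generator for the count and the list can be very small. The sketch below assumes two things that are not in the original README: a pair of HTML comment markers fencing off the generated section, and a tool registry available as a list of dicts with `name` and `description` keys. Both are hypothetical.

```python
import re

# Hypothetical markers; they would be added to the README once, by hand.
START, END = "<!-- tools:start -->", "<!-- tools:end -->"


def render_tools(tools: list[dict]) -> str:
    """Render the count and a Markdown table from the registry data."""
    ordered = sorted(tools, key=lambda t: t["name"])
    rows = [f"| `{t['name']}` | {t['description']} |" for t in ordered]
    header = [f"**{len(tools)} tools**", "", "| Tool | Description |", "| --- | --- |"]
    return "\n".join(header + rows)


def update_readme(readme: str, tools: list[dict]) -> str:
    """Replace everything between the markers with freshly generated content."""
    if START not in readme or END not in readme:
        raise ValueError("README is missing the generated-section markers")
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    replacement = f"{START}\n{render_tools(tools)}\n{END}"
    # A function replacement avoids backslash interpretation in tool descriptions.
    return pattern.sub(lambda _match: replacement, readme, count=1)
```

Running this at build time, and failing CI when the committed README differs from the generated one, keeps the prose outside the markers hand-written and everything inside them derived.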
The check that takes thirty seconds
If you maintain anything with a "we have N features / N tools / N integrations" claim, here's the audit. For me it came down to one command, but it breaks down into three steps:
- Count the real thing in code (registrations, exported functions, route handlers — whatever your N actually counts).
- Count what your README claims.
- If they disagree, you don't just have a stale number. You have a list of things that exist and that nobody chose to write down. Go read that list. It's the most interesting list in your repo.
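The three steps above can be sketched as one function. This assumes a README convention where each documented tool appears in backticks in the first cell of a table row, and where claims are phrased as "N tools". Both patterns are assumptions to adapt, and the set of registered names is expected to come from your own code.

```python
import re

# Assumed README convention: | `tool_name` | description |
DOCUMENTED = re.compile(r"^\|\s*`([A-Za-z0-9_]+)`\s*\|", re.MULTILINE)
# Assumed claim phrasing, e.g. "80 tools"
CLAIM = re.compile(r"\b(\d+)\s+tools\b")


def audit(registered: set[str], readme: str) -> dict:
    """Compare what the code registers against what the README says."""
    documented = set(DOCUMENTED.findall(readme))
    return {
        "registered": len(registered),
        "documented": len(documented),
        # Every distinct number the README claims, wherever it appears.
        "claimed": sorted({int(n) for n in CLAIM.findall(readme)}),
        # Exists in code, missing from docs: the list worth reading.
        "undocumented": sorted(registered - documented),
        # Documented but no longer registered.
        "stale": sorted(documented - registered),
    }
```

The `undocumented` entry is the one to read line by line. The counts say that there is drift, but the names say what drifted.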
I found mine had grown to a sixteen-tool gap, with the project's sharpest tools sitting inside it. Yours might be smaller. But "we never check" and "the number is correct" are not the same state, and the only way to tell them apart is to look.
This is from maintaining Safari MCP, an open-source Safari automation server for AI agents on macOS (the one that now correctly says 96). I write up the unglamorous parts of running a small OSS project at achiya-automation.com.
What's the widest doc-vs-code gap you've ever found in your own project — and was the stuff in the gap boring, or was it the scary stuff?