I write a lot of READMEs. I ship faster than I document. I work with AI agents that write code in seconds and READMEs in minutes, and somewhere between the first commit and the third refactor, the README I wrote on Tuesday stops matching the code I wrote on Friday.
The install command says `npm start`. The package.json defines `start:prod`. Anyone copying that command fails instantly, and I'd never know.
This weekend, for the Replit 10 Year Buildathon, I built README Clew — a tool that audits your own GitHub repo for drift between what your README claims and what your code actually does.
Findings only. No rewrites. No grading. Nothing saved.
Try it → readme-clew--earlgreyhot.replit.app
GitHub → github.com/earlgreyhot1701D/readme-clew
How it works
Paste a public GitHub repo URL. README Clew:
- Fetches your README, your package.json, and a slice of your file tree
- Uses Claude Sonnet 4.5 to extract every checkable claim from the README
- Runs five deterministic verifiers against the actual code
- Returns findings in four buckets
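The fetch step is just the public GitHub REST API. Here's a minimal sketch of what it looks like in TypeScript — the function and variable names are mine, not the actual source:

```typescript
// Illustrative sketch of the fetch step. The endpoints are the public
// GitHub REST API; everything else here is a stand-in for the real code.
const GH = "https://api.github.com";

async function fetchRepoSnapshot(owner: string, repo: string) {
  // Raw README markdown
  const readme = await fetch(`${GH}/repos/${owner}/${repo}/readme`, {
    headers: { Accept: "application/vnd.github.raw+json" },
  }).then((r) => (r.ok ? r.text() : ""));

  // Root package.json, if the repo has one
  const pkgRes = await fetch(`${GH}/repos/${owner}/${repo}/contents/package.json`, {
    headers: { Accept: "application/vnd.github.raw+json" },
  });
  const pkg = pkgRes.ok ? JSON.parse(await pkgRes.text()) : null;

  // Default branch, then a recursive tree listing for the file-path slice
  const { default_branch } = await fetch(`${GH}/repos/${owner}/${repo}`).then((r) => r.json());
  const tree = await fetch(
    `${GH}/repos/${owner}/${repo}/git/trees/${default_branch}?recursive=1`
  ).then((r) => r.json());

  return { readme, pkg, filePaths: (tree.tree ?? []).map((e: { path: string }) => e.path) };
}
```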
The five categories I check:
- Declared dependencies — does your README say "uses X" but X is not in package.json?
- Code-vs-package coverage — does your code import packages your README never mentions?
- Install and run commands — does `npm start` actually exist as a script? (sketched below)
- Environment variables — does your code read env vars your README forgot to document?
- File and link references — do the files and URLs your README points to actually resolve?
The four buckets, in display order:
- Verified — README matches the code (the receipts)
- Unverifiable — claims like "blazingly fast" or "85 passing tests" that are valid but not statically checkable (the honest limit)
- Missing — code does things the README never mentions (the gaps)
- Contradicted — README and code disagree (the drift)
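In code terms, a scan result is essentially a flat list of findings tagged with one of those buckets. A sketch of the shape — field names are illustrative, and the real JSON export may differ:

```typescript
// Sketch of the scan result shape; field names are mine, not necessarily
// the exact JSON the export produces.
type Bucket = "verified" | "unverifiable" | "missing" | "contradicted";

interface Finding {
  bucket: Bucket;
  category: "dependency" | "coverage" | "command" | "env-var" | "file-or-link";
  readmeQuote: string | null;   // verbatim quote, when the claim came from the README
  codeReference: string | null; // e.g. "package.json#scripts" or a file path
  detail: string;
}

interface ScanResult {
  repo: string; // "owner/repo"
  findings: Finding[];
  synthesis?: string; // optional one-line summary from the second Claude call
}
```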
Features
![Screenshot](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jprc7a5ffhpjdr4h4w1f.png)
I shipped more than I planned to:
- Deep-link URLs — every scan has a shareable `?repo=` query param. Paste it anywhere, the scan runs on load.
- OpenGraph cards — when you share a scan link on social, the preview card includes the bucket counts and the synthesis line. The card itself summarizes the audit.
- SVG status badge — `/api/badge?owner=...&repo=...` returns an SVG you can paste into any README. Like a CI status badge but for documentation honesty.
- In-memory cache — the 50 most recent scans power fast OG cards and badges without re-scanning.
- Copy link — one-click copy of the current scan URL.
- Share anywhere — direct buttons for X, LinkedIn, Instagram, and Dev.to from the results page.
- JSON export — full scan result as JSON for automation, archiving, or piping into your own tooling.
- Scan this repo — pre-fills the URL bar with the repo you're viewing so you can re-scan instantly.
- Two-frontend architecture — Vite-built static frontend served separately from the Express API. Scales independently.
The badge is my favorite. Audit your own receipts, then embed your status in the receipt itself.
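An endpoint like that needs very little. A rough Express sketch of what `/api/badge` could look like — the label text, colors, and dimensions here are invented, and the real route reads its counts from the scan cache:

```typescript
import express from "express";

const app = express();

// Sketch of an SVG badge endpoint. Everything rendered below is illustrative;
// the real badge pulls bucket counts from the cached scan.
app.get("/api/badge", (req, res) => {
  const { owner, repo } = req.query as { owner?: string; repo?: string };
  if (!owner || !repo) {
    res.status(400).send("owner and repo are required");
    return;
  }

  const label = "README Clew";
  const status = "0 contradicted"; // in the real app: read from the cache
  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="220" height="20" role="img">
    <rect width="110" height="20" fill="#555"/>
    <rect x="110" width="110" height="20" fill="#2ea44f"/>
    <text x="55" y="14" fill="#fff" text-anchor="middle" font-family="Verdana" font-size="11">${label}</text>
    <text x="165" y="14" fill="#fff" text-anchor="middle" font-family="Verdana" font-size="11">${status}</text>
  </svg>`;

  res.type("image/svg+xml").send(svg);
});
```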
What it doesn't do (limitations are a feature)
- Doesn't rewrite your README. You see the drift, you fix it.
- Doesn't grade your writing. Subjective claims aren't bad — they're just unverifiable.
- Doesn't store anything. Stateless. Run it. Close the tab. The findings are yours.
- Doesn't analyze private repos in v1. Public only.
- JavaScript and TypeScript only in v1. Python and Go later.
I ran it on my own repos
The findings stung in good ways.
On Memoria Clew (4th of 250 in the AI Vibe Coding hackathon, February): twenty undocumented items in my package.json — dependencies my code uses, scripts I'd added, env vars I'd never written down. One actual contradiction in the README too.
On petit-mot: 16 of 21 claims verified. All file references resolve. 5 unverifiable because the project has no package.json — deployment commands fall outside what static analysis can check.
On README Clew itself: zero contradictions, but seventeen things I'd shipped without documenting. The tool found drift in its own creator's documentation. That's the point.
How to trust it
The tool is intentionally narrow. Five deterministic verifiers run on actual code. One Claude API call extracts claims from your README. The system prompt instructs Claude to use only verbatim quotes from the README — not paraphrase, not invent. That's a prompt-level guardrail, not a post-processing validation step. Honest distinction worth making.
Every finding shows you the README quote and the code reference. Don't trust the verdict, trust the receipts. You can always click through and verify yourself.
Known limitations I'm honest about:
- The verbatim-quote rule is enforced by prompt instruction, not by code that validates the quote against the README source. In practice Claude follows the rule; in principle this is a soft guardrail, not a hard one.
- Prose labels in dependency claims produce false positives in the contradicted bucket (Claude sometimes pulls "Frontend (Vite)" as a package name; v2 fix)
- Monorepo support is conventional-path-only — `packages/`, `apps/`, `server/`, `frontend/`, `client/`, `backend/`, `api/`, `web/`. Custom paths won't be discovered.
- Source file scan caps at 20 files for env-var checking
- Two Claude calls per scan (extraction plus a short synthesis pass), adding 10–20s. The synthesis is fail-open: if it errors or times out, you still get full findings.
The transparency is the trust.
Stack
- Node.js + Express 5, TypeScript, esbuild
- Vanilla HTML/CSS/JS frontend (no framework)
- Claude Sonnet 4.5 for claim extraction and synthesis
- GitHub REST API for repo data
- Hosted on Replit Deployments
- 50-entry in-memory scan cache (no database)
Hybrid architecture: deterministic where it matters, AI where it adds value. The AI's role is narrowly scoped to extraction and synthesis, never to the verification itself. Five verifiers, one orchestrator, one extraction call, one optional summarization call. That's the whole pipeline.
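Spelled out, the orchestrator is one function. A hedged sketch where every helper name (`extractClaims`, `verifyDependencies`, and friends) is a stand-in for the real modules:

```typescript
// Sketch of the orchestrator: one extraction call, five deterministic
// verifiers, one optional synthesis call that is allowed to fail.
// All helper names are stand-ins, not the project's actual modules.
async function runScan(owner: string, repo: string): Promise<ScanResult> {
  const snapshot = await fetchRepoSnapshot(owner, repo);

  // Single Claude call: pull checkable, verbatim-quoted claims out of the README
  const claims = await extractClaims(snapshot.readme);

  // Deterministic verification, no AI involved
  const findings = [
    ...verifyDependencies(claims, snapshot.pkg),
    ...verifyCoverage(snapshot),
    ...verifyCommands(claims, snapshot.pkg),
    ...verifyEnvVars(claims, snapshot),
    ...verifyFileRefs(claims, snapshot.filePaths),
  ];

  // Optional synthesis pass: fail-open, findings come back either way
  let synthesis: string | undefined;
  try {
    synthesis = await synthesize(findings);
  } catch {
    synthesis = undefined;
  }

  return { repo: `${owner}/${repo}`, findings, synthesis };
}
```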
Security posture
Building anything that takes user input and calls external APIs needs to take the threat model seriously, even at hackathon scale.
- Prompt injection protection on every LLM call: README content wrapped in `<readme>` delimiters, and the system prompt explicitly instructs the model to treat input as data, not commands. Tested with an `IGNORE ALL PREVIOUS INSTRUCTIONS` injection — extraction correctly ignored the injection and returned only the legitimate `express` claim.
- Strict input validation at the server boundary. The GitHub URL must match `github.com/owner/repo` exactly. No SQL surface (no DB), no command injection surface (no shell), no path traversal.
- Rate limiting at 10 scans/hour per IP. Reads `X-Forwarded-For` correctly behind Replit's proxy.
- API keys server-side only via `process.env`. Zero references to `process.env` in any frontend file.
- XSS-safe DOM rendering throughout. All user-supplied data inserted via `.textContent`, never `.innerHTML`. Explicitly commented in the code.
- Hard timeouts: 60 seconds on the full scan, 9 seconds on the optional synthesis call. README content truncated at 50KB before sending to Claude. File scan capped at 20 files.
- Logging captures no PII or repo content — request ID, method, URL path (without query params), status code, response time. Nothing else.
- Helmet middleware for security headers (CSP, HSTS, X-Content-Type-Options, X-Frame-Options).
- Body size limit of 4kb on POST requests.
- Fail-open architecture: if the optional synthesis call fails, the scan still completes and returns findings. The user gets data even when non-essential calls degrade.
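Most of that fits in a few lines each. A sketch of the URL validation and rate limit, assuming express-rate-limit — the exact regex and wiring in the real code may differ:

```typescript
import rateLimit from "express-rate-limit";

// Sketch of the server boundary check; the real regex and middleware wiring may differ.
const GITHUB_URL = /^https:\/\/github\.com\/([A-Za-z0-9_.-]+)\/([A-Za-z0-9_.-]+)\/?$/;

export function parseRepoUrl(input: string): { owner: string; repo: string } | null {
  const match = GITHUB_URL.exec(input.trim());
  return match ? { owner: match[1], repo: match[2] } : null;
}

// 10 scans per hour per IP
export const scanLimiter = rateLimit({
  windowMs: 60 * 60 * 1000,
  max: 10,
  standardHeaders: true,
});
// app.set("trust proxy", 1) elsewhere so req.ip reflects X-Forwarded-For behind Replit's proxy
```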
Not production-hardened (no auth, no audit log, intentional). Solid for a public stateless tool.
Try it
🔗 readme-clew--earlgreyhot.replit.app
📦 github.com/earlgreyhot1701D/readme-clew
🎂 Built for the Replit 10 Year Buildathon
Run it on one of your own repos. Tell me what it finds.
Apache 2.0 licensed.
AI assisted. Human approved. Powered by NLP.