Every AI coding agent reads an instruction file. CLAUDE.md, AGENTS.md, .cursorrules, whatever your agent uses. You write rules in it. The agent says "Done." And you have no idea whether it followed any of them.
We wanted to know what's actually inside these files. Not what people think they contain, but what a machine can extract and verify through static analysis. So we scraped instruction files from 568 public GitHub repos with 10+ stars, ran them through a parser backed by 102 matchers across 8 verifier engines (AST, filesystem, regex, tree-sitter, preference, tooling, config-file, git-history), and counted what came out.
The short version: across the entire corpus, 3.8% of lines were extracted as verifiable coding rules. The other 96% is markdown headers, code examples, project descriptions, build commands, agent behavior directives, and contextual prose.
The dataset
580 instruction files from 568 repos, including Sentry (43k stars), PingCAP/TiDB (40k), Lerna (36k), Dragonfly (30k), Kubernetes/kops (17k), javascript-obfuscator (16k), RabbitMQ (14k), Google APIs (14k), Redpanda (12k), and hundreds of others. Six file formats represented: AGENTS.md (149 files), CLAUDE.md (111), .cursorrules (102), .windsurfrules (95), GEMINI.md (89), and copilot-instructions.md (34). This sample skews toward larger public repos. Enterprise internal repos with stricter governance, or solo projects with tightly scoped instruction files, may look different. We'd like to see that data.
The parser reads each file and classifies every line: is this a rule that can be checked against code, or is it something else? "Something else" includes headers, blank lines, code blocks, explanatory prose, build instructions, and agent personality configuration.
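The decision described above can be sketched as a toy classifier. This is illustrative only: the real parser uses 102 matchers across 8 verifier engines, while this version shows just the shape of the per-line decision, with invented regexes.

```typescript
// Toy sketch of the line-classification pass (not RuleProbe's actual logic).
type LineKind = "rule" | "header" | "blank" | "code" | "prose";

function classifyLine(line: string, inCodeBlock: boolean): LineKind {
  const trimmed = line.trim();
  if (trimmed === "") return "blank";
  if (inCodeBlock || trimmed.startsWith("```")) return "code";
  if (/^#{1,6}\s/.test(trimmed)) return "header";
  // A "rule" here means an imperative verb plus a concrete, checkable target,
  // e.g. "Use camelCase for function names" or "No `any` types".
  const imperative = /^(-\s*)?["']?(use|prefer|avoid|never|no|always|don't)\b/i;
  const concreteTarget =
    /(camelCase|PascalCase|snake_case|kebab-case|`[^`]+`|exports?|\bany\b)/i;
  if (imperative.test(trimmed) && concreteTarget.test(trimmed)) return "rule";
  return "prose";
}
```

The caller would track code-fence state across lines; everything that isn't a rule falls into the 96% bucket.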
What instruction files actually contain
The 96% that isn't rules breaks down into several categories. Some of it is necessary context (project structure explanations, build command documentation). Some of it is agent behavior configuration ("be succinct," "avoid providing explanations"). Some of it is just markdown formatting overhead.
Here's what stood out: 430 of the 580 files (74%) had zero extractable rules. Of those, 67 were completely empty to the parser: zero extracted, zero unparseable. Many were single-line redirects. Dragonfly's .cursorrules (30k stars) says "READ AGENTS.md." Umi's .cursorrules (16k stars) contains the single word "RULE.md." Mautic's GEMINI.md says "Read and follow all instructions in ./AGENTS.md."
At the other end, a few files were dense with rules. Apache Skywalking-java's CLAUDE.md extracted 6 rules from 26 lines (23%). Cloudflare chanfana's AGENTS.md: 5 rules from 21 lines (24%). But those files tend to be short, focused lists of concrete instructions.
The heavy files tell a different story. javascript-obfuscator's CLAUDE.md (16k stars): 197 lines, zero rules extracted. These files are documentation with no machine-verifiable instructions embedded.
Only 2 files (0.3%) had parse rates at or above 80%. Nearly three quarters had zero.

Parse rate distribution across all 580 files:

| Parse Rate | Files | Percentage |
|---|---|---|
| 0% (no rules) | 430 | 74.1% |
| 1-9% | 70 | 12.1% |
| 10-19% | 54 | 9.3% |
| 20-29% | 13 | 2.2% |
| 30-49% | 11 | 1.9% |
| >= 80% | 2 | 0.3% |
Types of content the parser correctly skips
"3.8% extraction rate" sounds like the parser is broken. It isn't. These are lines that genuinely aren't rules:
- Markdown structure (headers, horizontal rules, blank lines)
- Code examples showing how to use a function or run a command
- Project descriptions explaining what the repo does
- Build and deployment instructions
- Links to external documentation
- Agent behavior directives that have no code-level representation ("be concise," "ask before making changes")
- Workflow instructions ("use this branch strategy," "run tests before pushing")
The parser isn't failing on these. It's correctly identifying them as not-rules. The denominator is every line in the file, not every line that looks like it could be a rule.
A second metric tells the complementary story. 150 of 580 files (25.9%) contained at least one extractable rule. Across those 150 files, the 309 extracted rules average 2.1 per file. So only a quarter of instruction files contain anything enforceable at all, and when they do, they typically contain about two rules. The 3.8% describes the corpus-wide line ratio. The 25.9% and 2.1-per-file numbers describe what rule-writers are actually producing.
What a "verifiable rule" looks like
The 309 rules that did get extracted map to concrete checks. Things like:
- "Use camelCase for function names" (AST naming check)
- "No `any` types" (TypeScript type safety check)
- "Use named exports, not default exports" (import pattern check)
- "Prefer `const` over `let`" (preference ratio check)
- "Test files must exist for every source file" (filesystem check)
- "Use Yarn, not npm" (tooling check)
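To make the first example concrete, here is a hypothetical regex-level approximation of the "camelCase function names" check. A real AST verifier would walk function declarations in a parse tree; this sketch just pattern-matches `function <name>(` occurrences.

```typescript
// Illustrative only: flag function names that aren't camelCase.
function findNonCamelCaseFunctions(source: string): string[] {
  const violations: string[] = [];
  const camelCase = /^[a-z][a-zA-Z0-9]*$/;
  for (const match of source.matchAll(/function\s+([A-Za-z_$][\w$]*)\s*\(/g)) {
    const name = match[1];
    if (!camelCase.test(name)) violations.push(name);
  }
  return violations;
}
```

The point is that the rule maps to a deterministic check with evidence (the offending names), which is what separates it from "follow good naming conventions."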
Each rule gets a category, a verifier type (AST, filesystem, regex, tree-sitter, preference, tooling, config-file, or git-history), and a qualifier (always, prefer, when-possible, avoid-unless, try-to, never).
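The record shape implied by those three fields might look like the type below. The field names are assumptions for illustration, not RuleProbe's actual schema; only the verifier and qualifier vocabularies come from the article.

```typescript
// Assumed shape of an extracted rule record (field names are hypothetical).
type Verifier =
  | "ast" | "filesystem" | "regex" | "tree-sitter"
  | "preference" | "tooling" | "config-file" | "git-history";

type Qualifier =
  | "always" | "prefer" | "when-possible"
  | "avoid-unless" | "try-to" | "never";

interface ExtractedRule {
  text: string;         // the original instruction line
  category: string;     // e.g. "naming", "type-safety", "import-pattern"
  verifier: Verifier;   // which engine can check it
  qualifier: Qualifier; // how strictly the rule is stated
  filePattern?: string; // optional scope, e.g. "**/*.ts"
}

const example: ExtractedRule = {
  text: "Use camelCase for function names",
  category: "naming",
  verifier: "ast",
  qualifier: "always",
};
```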
Naming rules dominate: 55% of all extracted rules. That's likely a combination of two factors. Naming conventions ("use camelCase," "kebab-case filenames") are the most concrete, unambiguous instructions people write, so they appear frequently. They're also the rule class that static analysis matchers handle most cleanly, so the parser has high affinity for them. We can't fully separate how much of the 55% is user behavior vs. parser strength, but both contribute.

Rule extraction by category:

| Category | Rules Extracted |
|---|---|
| naming | 169 |
| structure | 44 |
| code-style | 29 |
| forbidden-pattern | 24 |
| type-safety | 20 |
| dependency | 12 |
| error-handling | 5 |
| import-pattern | 4 |
| test-requirement | 2 |
copilot-instructions.md had the highest extraction rate (5.9%), likely because those files tend to be shorter and more prescriptive. GEMINI.md files had the lowest (1.4%).

Rule extraction by instruction file type:

| Type | Files | Files with Rules | Rules | Total Lines | Rate |
|---|---|---|---|---|---|
| copilot-instructions.md | 34 | 13 | 33 | 556 | 5.9% |
| .cursorrules | 102 | 37 | 79 | 1,508 | 5.2% |
| AGENTS.md | 149 | 49 | 97 | 1,961 | 4.9% |
| .windsurfrules | 95 | 22 | 50 | 1,866 | 2.7% |
| CLAUDE.md | 111 | 20 | 38 | 1,501 | 2.5% |
| GEMINI.md | 89 | 9 | 12 | 830 | 1.4% |
E2E verification: does excalidraw follow its own instruction files?
This is a pipeline demonstration on one repo, not broad validation across ecosystems. We ran the full pipeline on excalidraw (~95k stars) because it's large, well-maintained, and has instruction files with extractable rules: both a CLAUDE.md and a copilot-instructions.md.
The parser found 9 verifiable rules across both files. Deterministic analysis scored 66.1% compliance. Semantic analysis (structural fingerprinting of 626 source files) produced 9 verdicts, all resolved via fast-path vector similarity. Zero LLM calls, zero cost:
| Rule | Compliance | Method |
|---|---|---|
| Prefer functional components | 0.976 | structural-fast-path |
| PascalCase type naming | 0.976 | structural-fast-path |
| Async try/catch usage | 0.983 | structural-fast-path |
| Contextual error logging | 0.979 | structural-fast-path |
| Yarn as package manager | 0.50 | no matching topic |
| TypeScript required | 0.50 | no matching topic |
| Optional chaining preference | 0.50 | no matching topic |
| camelCase variables | 0.50 | no matching topic |
| UPPER_CASE constants | 0.50 | no matching topic |
Rules that match established code pattern topics (component-structure, error-handling) score 0.97+, meaning the codebase's structural fingerprint strongly matches the instruction. The remaining five rules scored a neutral 0.50 because they describe tooling choices and naming conventions that don't have structural AST representations. That's itself a finding: even among the 4% of lines that get extracted as verifiable rules, some fall into categories that resist automated verification beyond simple presence checks. The verifier is real, but not comprehensive. No static analysis tool covers every rule class, and pretending otherwise would be dishonest.
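The fast-path idea can be sketched as a vector-similarity lookup with a neutral fallback. Everything below is an assumption for illustration: the vectors, the 0.9 threshold, and the function names are invented, not taken from the verifier's implementation. Only the 0.50 neutral score comes from the results above.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical fast path: score a rule against known code-pattern topics,
// falling back to the neutral 0.50 when no topic is close enough.
function scoreRule(ruleVec: number[], topicVecs: number[][]): number {
  const best = Math.max(...topicVecs.map((t) => cosine(ruleVec, t)));
  return best >= 0.9 ? best : 0.5; // 0.9 threshold is an invented example
}
```

Under this framing, "Yarn as package manager" scores 0.50 not because the repo violates it, but because no structural topic vector exists to compare it against.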
Privacy note: 626 files were scanned, and all file IDs are opaque sequential integers. No source code strings, file paths, variable names, or comments appear in any payload. In this case, no LLM was even called.
What this means for anyone writing instruction files
Two clarifications before the takeaways. First, "96% can't be verified" means can't be verified through static analysis, not "is useless." Agent behavior configuration, project context, and workflow documentation all have value. They guide the agent even if no tool can confirm compliance after the fact. Second, the 4% that is verifiable still matters. Excalidraw's 9 extractable rules produced a 66.1% deterministic compliance score with specific failures at specific line numbers. Nine rules doesn't sound like much until three of them fail and you find the agent ignored your naming conventions across 626 files.
The real problem isn't that instruction files contain documentation. It's that most people don't know which of their lines are enforceable and which are suggestions the agent can silently drop. That ratio isn't fixed, either. People write unverifiable instructions because nobody's told them which phrasings produce checkable rules.
To write rules that can actually be checked:
- Use imperative verbs with specific targets. "Use camelCase for all function names" is verifiable. "Follow good naming conventions" isn't.
- Specify the tool or pattern, not the principle. "Prefer const over let" is a ratio check. "Write immutable code" is philosophy.
- Include the file patterns your rules apply to. "All .ts files must use named exports" scopes the check. "Use named exports" is vague.
- Keep rules and documentation separate. Rules are instructions. Documentation explains why. Mixing them dilutes both.
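The "Prefer const over let" tip above is a ratio check, and a toy version is short enough to show. Counting keyword tokens with a regex is an assumption made for brevity; a real verifier would count declarations in the AST.

```typescript
// Toy "preference ratio" check for "Prefer const over let":
// fraction of const among all const/let declarations.
function constLetRatio(source: string): number {
  const consts = (source.match(/\bconst\b/g) ?? []).length;
  const lets = (source.match(/\blet\b/g) ?? []).length;
  const total = consts + lets;
  return total === 0 ? 1 : consts / total; // no declarations: vacuously compliant
}
```

A verifier would then compare the ratio against a threshold and report the `let` sites as evidence.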
RuleProbe on GitHub: parse your own instruction files and see what's actually verifiable
The tool
RuleProbe is the parser and verifier behind this analysis. It reads 7 instruction file formats, extracts machine-verifiable rules using 102 built-in matchers across 14 categories, and checks agent output against each one. Deterministic by default, no API keys needed for the core pipeline. Optional semantic analysis for pattern-matching and consistency rules.
npx ruleprobe parse CLAUDE.md --show-unparseable
npx ruleprobe verify CLAUDE.md ./src --format summary
The --show-unparseable flag shows you exactly which lines were skipped and why. That's often the most useful output: it tells you which of your "rules" aren't rules at all.