<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gábor Mészáros</title>
    <description>The latest articles on DEV Community by Gábor Mészáros (@cleverhoods).</description>
    <link>https://dev.to/cleverhoods</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647906%2F2ae4010e-7f1a-4906-9598-c259abb6e222.jpeg</url>
      <title>DEV Community: Gábor Mészáros</title>
      <link>https://dev.to/cleverhoods</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cleverhoods"/>
    <language>en</language>
    <item>
      <title>The State of AI Instruction Quality</title>
      <dc:creator>Gábor Mészáros</dc:creator>
      <pubDate>Tue, 21 Apr 2026 12:41:52 +0000</pubDate>
      <link>https://dev.to/reporails/the-state-of-ai-instruction-quality-35mn</link>
      <guid>https://dev.to/reporails/the-state-of-ai-instruction-quality-35mn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Everybody has opinions about AGENTS.md/CLAUDE.md files. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Best practices get shared. Templates get copied, and this folk knowledge dominates the industry. Last year, &lt;a href="https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/" rel="noopener noreferrer"&gt;GitHub analyzed 2,500 repos&lt;/a&gt; and published best-practice advice. We wanted to go further: measure at scale, publish the data, and let anyone verify.&lt;/p&gt;

&lt;p&gt;When the agent doesn't follow instructions and does something contradictory, the usual suspects are: &lt;em&gt;the model is inconsistent, LLMs are not deterministic, you need better guardrails, you need retries.&lt;/em&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The failures almost always get attributed to the model.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So we decided to measure. We built a diagnostic tool &lt;strong&gt;that treats instruction files as structured objects with measurable properties&lt;/strong&gt;. Deterministic. Reproducible. No LLM-as-judge. Then we pointed it at GitHub repositories with instruction files for five agents: &lt;strong&gt;Claude, Codex, Copilot, Cursor, and Gemini&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;28,721 repositories. 165,063 files. 3.3 million instructions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;... and one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the instructions are the problem?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The dataset
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;28,721 projects.&lt;/strong&gt; Sourced from GitHub via API search, cloned, and deterministically analyzed. Each project was scanned for instruction files across five coding agents — then deduplicated to remove false positives from agent detection overlap.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Projects&lt;/th&gt;
&lt;th&gt;% of corpus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;12,356&lt;/td&gt;
&lt;td&gt;43.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;11,206&lt;/td&gt;
&lt;td&gt;39.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;7,755&lt;/td&gt;
&lt;td&gt;27.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;7,291&lt;/td&gt;
&lt;td&gt;25.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;5,942&lt;/td&gt;
&lt;td&gt;20.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j1xpnj80ntk84v8g6e3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j1xpnj80ntk84v8g6e3.png" alt="Claude leads adoption at 43%, but all five agents have significant presence."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The percentages add up to more than 100% because &lt;strong&gt;37% of projects configure multiple agents&lt;/strong&gt;. More on that later.&lt;/p&gt;

&lt;p&gt;Key distributions stabilized early. A 9,582-repo sub-sample produced identical tier shares (±0.2pp) and the same mean scores as the 12,076-repo intermediate sample. The final 28,721-repo corpus moved nothing. The patterns reported below are not small-sample artifacts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All classifications are deterministic&lt;/strong&gt; — the same file produces the same result every time. No LLM-as-judge. Sample classifications are published for inspection (methodology below). The tool is &lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;source-available&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How we measured
&lt;/h2&gt;

&lt;p&gt;The analyzer parses each instruction file into &lt;strong&gt;atoms&lt;/strong&gt; — the smallest semantically distinct units of content. A heading is one atom. A bullet point is one atom. A paragraph is one atom. Each atom gets classified along a few dimensions, all deterministic, no LLM involved:&lt;/p&gt;
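To make "atom" concrete, here is a minimal sketch of the kind of splitting involved. This is an illustrative heuristic, not the reporails implementation: the function name and splitting rules are invented for the example.

```python
import re

def split_atoms(markdown: str) -> list[str]:
    """Split an instruction file into atoms: one heading, one bullet,
    or one paragraph per atom. Illustrative heuristic only."""
    atoms: list[str] = []
    # Blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = block.splitlines()
        bullets = [l for l in lines if re.match(r"\s*([-*+]|\d+\.)\s", l)]
        if bullets:
            # Each bullet point is its own atom.
            atoms.extend(b.strip() for b in bullets)
        else:
            # A heading or paragraph is a single atom.
            atoms.append(" ".join(l.strip() for l in lines))
    return atoms

doc = """# Testing

Run the suite before committing.

- Use pytest
- Do not mock the database
"""
print(split_atoms(doc))  # 4 atoms: heading, paragraph, two bullets
```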

&lt;p&gt;&lt;strong&gt;Charge classification.&lt;/strong&gt; A three-phase pipeline determines whether an atom is a directive ("use X"), a constraint ("do not use Y"), neutral content (context, explanation, structure), or ambiguous (could be read either way). Phase 1 detects negation and prohibition patterns. Phase 2 detects modal auxiliaries and direct commands. Phase 3 uses syntactic dependency parsing to catch imperatives that the first two phases missed. First definitive match wins. Atoms that partially match but don't clear any phase are marked ambiguous. Everything else is neutral.&lt;/p&gt;
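The phased approach can be sketched roughly as follows. The pattern lists here are invented for the example, and the real Phase 3 uses syntactic dependency parsing, which this sketch stands in for with a leading-imperative-verb check.

```python
import re

# Invented pattern lists for illustration; the real pipeline's
# phases are richer and Phase 3 uses dependency parsing.
PROHIBITION = re.compile(r"\b(never|do not|don't|avoid|must not)\b", re.I)
MODAL = re.compile(r"\b(must|should|always|shall)\b", re.I)
IMPERATIVE = {"use", "run", "format", "prefer", "add", "keep", "write"}
HEDGED = {"consider", "try", "ideally"}

def classify_charge(atom: str) -> str:
    """Three-phase charge classification; first definitive match wins."""
    text = atom.strip().lstrip("-*+ ")
    m = re.match(r"\W*(\w+)", text)
    first = m.group(1).lower() if m else ""
    if PROHIBITION.search(text):   # Phase 1: negation / prohibition
        return "constraint"
    if MODAL.search(text):         # Phase 2: modal auxiliaries
        return "directive"
    if first in IMPERATIVE:        # Phase 3 stand-in: leading imperative
        return "directive"
    if first in HEDGED:            # partial signal, no definitive match
        return "ambiguous"
    return "neutral"
```

Note that Phase 1 runs first, so "must not commit secrets" lands in constraints rather than matching the modal "must" in Phase 2.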

&lt;p&gt;&lt;strong&gt;Specificity.&lt;/strong&gt; Binary: does the instruction name a specific construct — a tool, file, command, flag, function, or config key — or does it stay at the category level? "Use consistent formatting" is abstract. "Format with &lt;code&gt;ruff format&lt;/code&gt;" is named. This is a text property, not a judgment call.&lt;/p&gt;
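A rough sketch of how such a binary check might work. The cue patterns are invented for illustration; the actual detector lives in the reporails CLI.

```python
import re

# Invented cue patterns: inline code spans, file names, CLI flags,
# dotted config keys. Illustrative only.
NAMED_CUES = [
    re.compile(r"`[^`]+`"),                              # inline code span
    re.compile(r"\b\S+\.(?:md|py|ts|json|yml|toml)\b"),  # file name
    re.compile(r"(?<!\w)--?[a-z][\w-]*"),                # CLI flag (-v, --fix)
    re.compile(r"\b\w+\.\w+\.\w+\b"),                    # dotted config key
]

def is_named(instruction: str) -> bool:
    """True if the instruction names a concrete construct."""
    return any(p.search(instruction) for p in NAMED_CUES)

print(is_named("Format with `ruff format` before committing"))  # True
print(is_named("Use consistent code formatting"))               # False
```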

&lt;p&gt;&lt;strong&gt;File categorization.&lt;/strong&gt; Each file is classified as base config (your main CLAUDE.md or .cursorrules), a rule file, a skill definition, or a sub-agent definition — based on file path conventions for each agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content type.&lt;/strong&gt; Charge classification separates behavioral content (directives and constraints) from structural content (headings, context paragraphs, examples). That's how we know what fraction of your file is actually doing work.&lt;/p&gt;

&lt;p&gt;The full tool is source-available (&lt;a href="https://github.com/reporails/cli/blob/main/LICENSE" rel="noopener noreferrer"&gt;BUSL-1.1&lt;/a&gt;). You can run &lt;code&gt;npx @reporails/cli check&lt;/code&gt; on your own project and inspect every finding. More on that at the end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding 1: Most of your instruction file isn't instructions
&lt;/h2&gt;

&lt;p&gt;Here's what the median instruction file actually contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50 content items&lt;/strong&gt; total&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;12 of those are actual directives&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The rest is headings, context paragraphs, examples, structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg45f7k3xx4n6naso8gok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg45f7k3xx4n6naso8gok.png" alt="Median instruction file: 50 content items, 12 actual directives. The rest is structure."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Only 27% of your instruction file is doing what you think it does.&lt;/strong&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The other 73% is scaffolding. Headings that organize but don't instruct. Explanation paragraphs that compete for the model's attention without adding behavioral weight. Example blocks. Context-setting prose.&lt;/p&gt;

&lt;p&gt;That's not inherently bad. Structure matters. But if you're writing a 200-line CLAUDE.md and only 54 lines are actual instructions, you should probably know that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The average instruction is &lt;strong&gt;8.9 words&lt;/strong&gt; long. That's a sentence fragment.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Finding 2: 90% of instructions don't name what they're talking about
&lt;/h2&gt;

&lt;p&gt;This is the big one.&lt;/p&gt;

&lt;p&gt;We measured whether each instruction references specific tools, files, commands, or constructs by name — or whether it stays at the category level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two-thirds of all instructions are abstract.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Names specific constructs&lt;/th&gt;
&lt;th&gt;Uses category language&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;39.3%&lt;/td&gt;
&lt;td&gt;60.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;38.3%&lt;/td&gt;
&lt;td&gt;61.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;33.3%&lt;/td&gt;
&lt;td&gt;66.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;30.8%&lt;/td&gt;
&lt;td&gt;69.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;30.6%&lt;/td&gt;
&lt;td&gt;69.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What does this look like in practice?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: "Use consistent code formatting"&lt;br&gt;
&lt;strong&gt;Specific&lt;/strong&gt;: "Format with &lt;code&gt;ruff format&lt;/code&gt; before committing"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: "Avoid using mocks in tests"&lt;br&gt;
&lt;strong&gt;Specific&lt;/strong&gt;: "Do not use &lt;code&gt;unittest.mock&lt;/code&gt; — use the real database via &lt;code&gt;test_db&lt;/code&gt; fixture"&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/cleverhoods/instruction-best-practices-precision-beats-clarity-lod"&gt;previous controlled experiments&lt;/a&gt;, specificity produced a 10.9x odds ratio in compliance (N=1000, p&amp;lt;10⁻³⁰). The instruction that names the exact construct gets followed. The one that describes it abstractly... mostly doesn't. This is consistent with independent findings from RuleArena (&lt;a href="https://arxiv.org/abs/2412.08972" rel="noopener noreferrer"&gt;Zhou et al., ACL 2025&lt;/a&gt;), where LLMs struggled systematically with complex rule-following tasks — even strong models fail when the rules themselves are ambiguous or underspecified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;89.9% of all agent configurations&lt;/strong&gt; contain at least one instruction that doesn't name what it means. It's not a few projects. It's nearly everyone.&lt;/p&gt;


&lt;h2&gt;
  
  
  Finding 3: &lt;code&gt;agents.md&lt;/code&gt; is the most common instruction file
&lt;/h2&gt;

&lt;p&gt;Before we get into quality, let's look at what people are actually naming their files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agents.md&lt;/code&gt; / &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;20,654&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;claude.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;14,014&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemini.md&lt;/code&gt; / &lt;code&gt;GEMINI.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;5,703&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,647&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.cursorrules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2,415&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;49,071 unique file paths&lt;/strong&gt; across the corpus. That's not a typo. The format fragmentation is real.&lt;/p&gt;

&lt;p&gt;A few things jumped out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;claude.md&lt;/code&gt; (lowercase, 10,642) is &lt;strong&gt;3x more common&lt;/strong&gt; than &lt;code&gt;CLAUDE.md&lt;/code&gt; (3,372). Both work. The community clearly prefers lowercase.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agents.md&lt;/code&gt; dominates — the Codex/generic format is the single most popular instruction file name.&lt;/li&gt;
&lt;li&gt;Skills and rules are already showing up in meaningful numbers: &lt;code&gt;.claude/rules/testing.md&lt;/code&gt; (422), &lt;code&gt;.agents/skills/tailwindcss-development/skill.md&lt;/code&gt; (334).&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Finding 4: Different agents, completely different config philosophies
&lt;/h2&gt;

&lt;p&gt;Not all agents are configured the same way. Not even close.&lt;/p&gt;

&lt;p&gt;We categorized every file into four types: &lt;strong&gt;base config&lt;/strong&gt; (your main CLAUDE.md, .cursorrules, etc.), &lt;strong&gt;rules&lt;/strong&gt; (scoped rule files), &lt;strong&gt;skills&lt;/strong&gt; (task-specific skill definitions), and &lt;strong&gt;sub-agents&lt;/strong&gt; (role-based agent definitions).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Base&lt;/th&gt;
&lt;th&gt;Rules&lt;/th&gt;
&lt;th&gt;Skills&lt;/th&gt;
&lt;th&gt;Sub-agents&lt;/th&gt;
&lt;th&gt;Total files&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;18,733&lt;/td&gt;
&lt;td&gt;4,638&lt;/td&gt;
&lt;td&gt;10,692&lt;/td&gt;
&lt;td&gt;10,538&lt;/td&gt;
&lt;td&gt;44,601&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;5,903&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19,843&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6,237&lt;/td&gt;
&lt;td&gt;1,716&lt;/td&gt;
&lt;td&gt;33,699&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;16,026&lt;/td&gt;
&lt;td&gt;4,486&lt;/td&gt;
&lt;td&gt;10,352&lt;/td&gt;
&lt;td&gt;3,012&lt;/td&gt;
&lt;td&gt;33,876&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;19,001&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;td&gt;8,911&lt;/td&gt;
&lt;td&gt;165&lt;/td&gt;
&lt;td&gt;28,158&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;10,253&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;3,039&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;13,419&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3bqyzkqqkzg0cs1vag1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3bqyzkqqkzg0cs1vag1.png" alt="Cursor is 60% rules files. Codex is 68% base config. Same goal, completely different structure."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor is 60% rules files.&lt;/strong&gt; The &lt;code&gt;.cursor/rules/&lt;/code&gt; system dominates its configuration surface. One agent's config looks nothing like another's.&lt;/p&gt;

&lt;p&gt;Claude is the only agent with a roughly balanced architecture across all four config types. Codex and Gemini are almost entirely base config — single-file setups.&lt;/p&gt;

&lt;p&gt;The median Cursor project has &lt;strong&gt;3 instruction files&lt;/strong&gt;. The median Codex project has &lt;strong&gt;1&lt;/strong&gt;. These aren't just different tools. They're different &lt;em&gt;configuration philosophies&lt;/em&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Finding 5: 37% of projects configure multiple agents
&lt;/h2&gt;

&lt;p&gt;10,620 projects in the corpus target two or more agents. That's not a niche pattern — it's over a third of all projects.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agents&lt;/th&gt;
&lt;th&gt;Projects&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;18,101&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;6,776&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2,687&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;949&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;208&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bnski07fneblyvftsat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bnski07fneblyvftsat.png" alt="Over a third of projects configure instructions for multiple coding agents."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dominant pair is &lt;strong&gt;Claude + Codex&lt;/strong&gt; (5,038 projects). Makes sense — &lt;code&gt;CLAUDE.md&lt;/code&gt; + &lt;code&gt;AGENTS.md&lt;/code&gt; is the most natural multi-agent starting point.&lt;/p&gt;

&lt;p&gt;Here's what's interesting about multi-agent repos: &lt;strong&gt;the same developer, writing instructions at the same time, for the same project, produces measurably different instruction quality across agents.&lt;/strong&gt; The person didn't change. The project didn't change. The instruction format did.&lt;/p&gt;

&lt;p&gt;Some of that is structural. Cursor's &lt;code&gt;.mdc&lt;/code&gt; rules enforce a different format than Claude's markdown. Codex's &lt;code&gt;AGENTS.md&lt;/code&gt; invites a different writing style than Copilot's &lt;code&gt;copilot-instructions.md&lt;/code&gt;. The format shapes the content.&lt;/p&gt;


&lt;h2&gt;
  
  
  Finding 6: The most-copied skills are the vaguest
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting.&lt;/p&gt;

&lt;p&gt;13,309 unique skills across the corpus. Some of them appear in hundreds of repos — clearly copied from shared templates or community sources. So we measured them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named%&lt;/strong&gt; = what fraction of a skill's instructions name a specific tool, file, or command (instead of using category language).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Repos&lt;/th&gt;
&lt;th&gt;Named%&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;frontend-design&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;271&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Almost entirely abstract advice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web-design-guidelines&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;197&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generic design principles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vercel-react-best-practices&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;315&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mix of specific and vague&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pest-testing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;216&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;55.1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Names actual test constructs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;livewire-development&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;87&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Names specific Livewire components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;next-best-practices&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Names almost everything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;frontend-design&lt;/code&gt; is in 271 repos with 2.8% specificity. It's a wall of "follow responsive design principles" and "ensure accessibility compliance." That reads well. It sounds professional. It gives the model almost nothing concrete to act on.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;next-best-practices&lt;/code&gt; is in 76 repos with 92.6% specificity. It says things like "use &lt;code&gt;next/image&lt;/code&gt; for all images" and "prefer &lt;code&gt;server&lt;/code&gt; components over &lt;code&gt;client&lt;/code&gt;." It reads like a checklist. It tells the model exactly what to do.&lt;/p&gt;

&lt;p&gt;One is shared 3.5x more than the other.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The most popular skills are the most decorative.&lt;/strong&gt; The well-written ones barely spread.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeksw2bf5spqdgvfba2x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeksw2bf5spqdgvfba2x.png" alt="Each bubble is a community skill. The most popular ones cluster in the top-left — widely adopted, almost entirely abstract."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The best and worst skills (&amp;gt;50 repos)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Most specific:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Repos&lt;/th&gt;
&lt;th&gt;Named%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;next-best-practices&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;92.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;shadcn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;82.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;livewire-development&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;87&lt;/td&gt;
&lt;td&gt;75.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pest-testing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;216&lt;/td&gt;
&lt;td&gt;55.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;laravel-best-practices&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;94&lt;/td&gt;
&lt;td&gt;49.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Most vague:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Repos&lt;/th&gt;
&lt;th&gt;Named%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openspec-explore&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;td&gt;2.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;frontend-design&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;271&lt;/td&gt;
&lt;td&gt;2.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web-design-guidelines&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;197&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vercel-composition-patterns&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;131&lt;/td&gt;
&lt;td&gt;10.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;find-skills&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;td&gt;18.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice a pattern? The Laravel/Livewire ecosystem produces specific skills. The generic frontend/design ones stay abstract. &lt;strong&gt;Domain-specific communities write better instructions than cross-cutting ones.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Finding 7: Sub-agents are almost entirely persona prompts
&lt;/h2&gt;

&lt;p&gt;5,526 unique sub-agent roles in the corpus. Developers are building agent teams: code reviewers, architects, debuggers, testers, security auditors.&lt;/p&gt;

&lt;p&gt;The problem? &lt;strong&gt;Sub-agents are the most abstract config type in the entire corpus.&lt;/strong&gt; Only 17% of sub-agent instructions name specific constructs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Repos&lt;/th&gt;
&lt;th&gt;Named%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;code-reviewer.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;236&lt;/td&gt;
&lt;td&gt;14.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;architect.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;18.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;debugger.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;9.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-auditor.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;14.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-runner.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;10.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;frontend-developer.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;9.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;p&gt;Most of these are persona prompts. "You are a senior code reviewer. You care about code quality, security, and maintainability." That's a role description, not an instruction set. It tells the model &lt;em&gt;who to be&lt;/em&gt;, not &lt;em&gt;what to do&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Compare this to a base config that says "run &lt;code&gt;uv run pytest tests/ -v&lt;/code&gt; before suggesting any commit" — that's 100% named, and the model knows exactly what action to take.&lt;/p&gt;


&lt;h2&gt;
  
  
  The anatomy chart: more directives, worse quality
&lt;/h2&gt;

&lt;p&gt;Here's where it all comes together.&lt;/p&gt;

&lt;p&gt;We measured three things for each config type: how big the files are, how many directives they contain, and what fraction of those directives actually name something specific.&lt;/p&gt;
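As a sketch of the aggregation, with toy records and invented field names standing in for the real analyzer output:

```python
from collections import defaultdict
from statistics import median

# Toy per-file records; real values come from the analyzer output.
files = [
    {"type": "base",     "items": 50, "directives": 11, "named": 5},
    {"type": "base",     "items": 48, "directives": 9,  "named": 4},
    {"type": "subagent", "items": 61, "directives": 17, "named": 3},
]

by_type = defaultdict(list)
for f in files:
    by_type[f["type"]].append(f)

summary = {}
for cfg, group in by_type.items():
    total_directives = sum(f["directives"] for f in group)
    summary[cfg] = {
        "median_items": median(f["items"] for f in group),
        "median_directives": median(f["directives"] for f in group),
        # Specificity: share of directives naming a concrete construct.
        "specificity_pct": round(
            100 * sum(f["named"] for f in group) / total_directives, 1),
    }
print(summary)
```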

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8i6m6ud3s9dwnj3jft3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8i6m6ud3s9dwnj3jft3.png" alt="Sub-agents have the most directives per file — and the least specific ones. More instructions doesn’t mean better instructions."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sub-agents have the &lt;strong&gt;largest&lt;/strong&gt; files (61 items median), the &lt;strong&gt;most&lt;/strong&gt; directives (17), and the &lt;strong&gt;worst&lt;/strong&gt; specificity (17%). They're the wordiest config type in the corpus and the least effective.&lt;/p&gt;

&lt;p&gt;Base configs are the opposite. Fewer directives (11), but 40% of them name specific constructs. The developer writing their own CLAUDE.md by hand, for their own project, produces the most actionable instructions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config type&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Median size&lt;/th&gt;
&lt;th&gt;Median directives&lt;/th&gt;
&lt;th&gt;Specificity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base configs&lt;/td&gt;
&lt;td&gt;69,916&lt;/td&gt;
&lt;td&gt;50 items&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;39.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rules files&lt;/td&gt;
&lt;td&gt;29,122&lt;/td&gt;
&lt;td&gt;34 items&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;31.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;39,231&lt;/td&gt;
&lt;td&gt;59 items&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agents&lt;/td&gt;
&lt;td&gt;15,484&lt;/td&gt;
&lt;td&gt;61 items&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear: &lt;strong&gt;what developers write by hand is the most specific. What gets templated and shared gets progressively vaguer. And what tries hardest to sound authoritative — sub-agent persona prompts — is the most hollow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;More instructions is not better instructions.&lt;/p&gt;

&lt;p&gt;Independent research supports the structural angle: FlowBench (&lt;a href="https://arxiv.org/abs/2406.14884" rel="noopener noreferrer"&gt;Xiao et al., 2024&lt;/a&gt;) found that presenting workflow knowledge in structured formats (flowcharts, numbered steps) improved LLM agent planning by 5-6 percentage points over prose — across GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo. Structure is not decoration. It changes what the model retrieves.&lt;/p&gt;


&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;Five things to know about these numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sampling bias.&lt;/strong&gt; GitHub API search, public repos only, English-skewed. Enterprise configurations, private repos, and non-English projects are not represented. This is not a random sample of all instruction files in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classification accuracy.&lt;/strong&gt; The charge classifier is deterministic but not perfect. Edge cases exist: mixed-charge sentences, implicit constructs, domain jargon that looks like a category term but is actually a named tool. Specificity detection (named vs abstract) is simpler and more robust. Sample classifications are &lt;a href="https://github.com/reporails/30k-corpus" rel="noopener noreferrer"&gt;published for inspection&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Association, not causation.&lt;/strong&gt; "More directives correlate with lower specificity" is an observed pattern. We do not claim that adding directives &lt;em&gt;causes&lt;/em&gt; quality to drop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snapshot.&lt;/strong&gt; Collected March–April 2026. Instruction practices are changing fast — &lt;code&gt;agents.md&lt;/code&gt; didn't exist six months ago. These numbers describe the ecosystem at collection time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No popularity weighting.&lt;/strong&gt; A 10-star hobby project counts the same as a 50K-star production repo. The distribution of instruction quality in &lt;em&gt;production&lt;/em&gt; agent work may differ.&lt;/p&gt;


&lt;h2&gt;
  
  
  What this means
&lt;/h2&gt;

&lt;p&gt;This isn't an article about AI models being bad at following instructions. The models are fine.&lt;/p&gt;

&lt;p&gt;This is an article about what we actually give them to work with.&lt;/p&gt;

&lt;p&gt;Most instruction files are three-quarters scaffolding. Two-thirds of the actual instructions don't name what they're talking about. The most popular community skills are the most decorative. Sub-agent definitions are the wordiest files in the corpus and the least specific.&lt;/p&gt;

&lt;p&gt;None of that is obvious from reading your own files. It wasn't obvious to us before we measured it. A well-structured CLAUDE.md &lt;em&gt;feels&lt;/em&gt; thorough. A shared skill with 271 repos &lt;em&gt;feels&lt;/em&gt; battle-tested. A sub-agent with 17 directives &lt;em&gt;feels&lt;/em&gt; comprehensive.&lt;/p&gt;

&lt;p&gt;Measurement shows something different.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://medium.com/@cleverhoods/the-undiagnosed-input-problem-03231442219d" rel="noopener noreferrer"&gt;The Undiagnosed Input Problem&lt;/a&gt;, I argued that the industry is great at inspecting outputs and weak at inspecting inputs. This corpus analysis is the evidence for that claim.&lt;/p&gt;

&lt;p&gt;The instruction files are there. The developers wrote them. They just have no way to know which parts are working and which parts are wallpaper.&lt;/p&gt;


&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;The analyzer we used for this corpus analysis is available as a CLI you can run against your own instruction files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt;&lt;/strong&gt; — instruction diagnostics for coding agents. Deterministic. No LLM-as-judge. 97 rules across structure, content, efficiency, maintenance, and governance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @reporails/cli check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That scans your project, detects which agents are configured, and reports findings with specific line numbers and rule IDs. Here's what the output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reporails — Diagnostics

  ┌─ Main (1)
  │ CLAUDE.md
  │   ⚠       Missing directory layout             CORE:C:0035
  │   ⚠ L9    7 of 7 instruction(s) lack reinfor…  CORE:C:0053
  │     ... and 16 more
  │
  └─ 21 findings

  Score: 7.9 / 10  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░

  21 findings · 4 warnings · 1 info
  Compliance: HIGH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The corpus analysis used the same classification pipeline at scale. Fix the findings, run again, watch your score improve.&lt;/p&gt;

&lt;h3&gt;
  
  
  The dataset
&lt;/h3&gt;

&lt;p&gt;The full corpus is published at &lt;strong&gt;&lt;a href="https://github.com/reporails/30k-corpus" rel="noopener noreferrer"&gt;reporails/30k-corpus&lt;/a&gt;&lt;/strong&gt;. Three files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Records&lt;/th&gt;
&lt;th&gt;What it contains&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;repos.jsonl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;28,721&lt;/td&gt;
&lt;td&gt;Per-project record: agents configured, stars, language, license, topics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stats_public.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Every aggregate statistic in this article&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validation_key.csv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2,814&lt;/td&gt;
&lt;td&gt;Sample classifications with source text for inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verify any claim:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# "28,721 repositories"&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;repos.jsonl | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;

&lt;span class="c"&gt;# "43% Claude"&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;repos.jsonl | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import sys, json
repos = [json.loads(l) for l in sys.stdin]
claude = sum(1 for r in repos if 'claude' in r['canonical_agents'])
print(f'{claude}/{len(repos)} = {claude/len(repos)*100:.1f}%')
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every number in every table traces to that dataset. If you disagree with a finding, count the rows.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of the Instruction Quality series. Previous: &lt;a href="https://medium.com/@cleverhoods/the-undiagnosed-input-problem-03231442219d" rel="noopener noreferrer"&gt;The Undiagnosed Input Problem&lt;/a&gt;. Related: &lt;a href="https://cleverhoods.medium.com/instruction-best-practices-precision-beats-clarity-e1bcae806671" rel="noopener noreferrer"&gt;Precision Beats Clarity&lt;/a&gt; · &lt;a href="https://cleverhoods.medium.com/do-not-think-of-a-pink-elephant-7d40a26cd072" rel="noopener noreferrer"&gt;Do Not Think of a Pink Elephant&lt;/a&gt; · &lt;a href="https://cleverhoods.medium.com/claude-md-best-practices-7-formatting-rules-for-the-machine-a591afc3d9a9" rel="noopener noreferrer"&gt;7 Formatting Rules for the Machine&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>The Undiagnosed Input Problem</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Wed, 08 Apr 2026 11:51:12 +0000</pubDate>
      <link>https://dev.to/reporails/the-undiagnosed-input-problem-4pmc</link>
      <guid>https://dev.to/reporails/the-undiagnosed-input-problem-4pmc</guid>
      <description>&lt;p&gt;The AI agent ecosystem has built a serious industry around controlling outputs. Guardrails. Safety classifiers. Output validation. Monitoring. Retry systems. Human review.&lt;/p&gt;

&lt;p&gt;All of that matters, but there is a simpler upstream question that still goes mostly unmeasured:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are the instructions any good?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds obvious, &lt;strong&gt;yet it is not how the industry behaves.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an agent fails to follow instructions, the usual explanations come fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Models are probabilistic&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Agents are inconsistent&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need stronger guardrails&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need better monitoring&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need retries&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need humans in the loop&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and while those explanations are right to a certain degree, they also have a side effect: &lt;strong&gt;they turn instruction quality into a blind spot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ecosystem has become extremely good at inspecting what comes out of the model, and surprisingly weak at inspecting what goes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;Consider &lt;a href="https://sierra.ai/blog/benchmarking-ai-agents" rel="noopener noreferrer"&gt;τ-bench&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It gives agents policy instructions and measures whether they follow them in realistic customer-service tasks. Airline and retail workflows. Real constraints. Real multi-step behavior.&lt;/p&gt;

&lt;p&gt;The benchmark result that gets repeated is the model result: even strong systems still fail a large share of tasks, and consistency across repeated attempts remains weak.&lt;/p&gt;

&lt;p&gt;The conclusion most people draw is straightforward: &lt;strong&gt;we need better models, better agents, better orchestration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My take: &lt;strong&gt;&lt;em&gt;Maybe&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But there is another question sitting underneath the benchmark:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Were the instructions themselves well-formed and well structured?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not just present. Not just long enough. Not just sincere.&lt;/p&gt;

&lt;p&gt;Well-formed. Well-structured. Well-organized.&lt;/p&gt;

&lt;p&gt;Specific enough to anchor behavior. Structured enough to survive context mixing. Non-conflicting across files. Positioned where the model can actually use them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Those questions almost never get asked.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The industry response
&lt;/h2&gt;

&lt;p&gt;I had a conversation recently where a lead solutions architect put the standard view plainly:&lt;/p&gt;

&lt;p&gt;“&lt;em&gt;The instruction merely influences the probability distribution over outputs. It doesn’t override it.&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;That is right about the mechanism, but wrong about what follows from it.&lt;/p&gt;

&lt;p&gt;Yes, instructions operate probabilistically. &lt;strong&gt;But that does not mean all instructions are weak in the same way.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The shape of the distribution is not fixed. It changes with the properties of the instruction itself. Specificity sharpens it. Structure sharpens it. Conflict flattens it. Vague abstractions flatten it. Bad formatting can suppress it almost entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Across my earlier controlled experiments, small changes in wording and placement produced large changes in compliance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/do-not-think-of-a-pink-elephant-7d40a26cd072" rel="noopener noreferrer"&gt;Instruction&lt;/a&gt; ordering moved compliance by 25 percentage points with the same model and the same directive.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/instruction-best-practices-precision-beats-clarity-e1bcae806671" rel="noopener noreferrer"&gt;Specificity&lt;/a&gt; produced roughly a 10x compliance effect when the instruction named the exact construct instead of describing it abstractly.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/claude-md-best-practices-7-formatting-rules-for-the-machine-a591afc3d9a9" rel="noopener noreferrer"&gt;Formatting&lt;/a&gt; changed whether the model reliably registered the instruction at all.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The problem is that most instruction systems are built without diagnostics.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;That is not an AI limitation. That is an engineering failure.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The folk system
&lt;/h2&gt;

&lt;p&gt;Right now, instruction practice spreads mostly through imitation.&lt;/p&gt;

&lt;p&gt;A popular repository posts “best practices” for Claude Code. Shared Cursor rules circulate as templates. People copy &lt;code&gt;AGENTS.md&lt;/code&gt; files between projects. Teams accumulate &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.cursorrules&lt;/code&gt;, &lt;code&gt;copilot-instructions.md&lt;/code&gt;, and other project-specific rule files across multiple tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Copy, paste, hope, repeat.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some of that advice is useful. Almost none of it is tested in any controlled, reproducible way. That would be fine if instruction quality were self-evident. &lt;strong&gt;It is not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A long instruction file can feel thorough while being internally contradictory. A highly opinionated ruleset can feel disciplined while producing almost no behavioral influence on the model.&lt;/p&gt;

&lt;p&gt;A sprawling multi-file setup can look sophisticated while making the system worse.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Without diagnostics, developers do not know which instructions are binding, which are noise, and which are actively interfering with each other.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The gap
&lt;/h2&gt;

&lt;p&gt;The tooling split is now pretty clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output tooling&lt;/strong&gt; is mature. Guardrails AI validates structure. Lakera focuses on prompt injection and security. NeMo Guardrails enforces safety and conversational rails. Llama Guard classifies risky content. The output edge is crowded.&lt;/p&gt;

&lt;p&gt;Prompt testing is real. Promptfoo, Braintrust, and LangSmith can all help evaluate behavior. But they are primarily black-box systems: did the prompt produce the output you wanted?&lt;/p&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;It is not the same as measuring the instruction artifact itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instruction-quality tooling&lt;/strong&gt; exists only in fragments. Some tools use LLM-as-judge. Some use deterministic local rules. But the category is still early, inconsistent, and mostly disconnected from measured behavioral outcomes.&lt;/p&gt;

&lt;p&gt;What is still largely missing is a deterministic way to inspect instruction files as engineered objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how specific they are&lt;/li&gt;
&lt;li&gt;how directly they state intent&lt;/li&gt;
&lt;li&gt;whether they conflict across files&lt;/li&gt;
&lt;li&gt;whether they overuse headings&lt;/li&gt;
&lt;li&gt;whether they provide alternatives instead of bare prohibitions&lt;/li&gt;
&lt;li&gt;whether the system is getting denser while getting weaker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code gets static analysis.&lt;/p&gt;

&lt;p&gt;Instruction systems usually get &lt;em&gt;vibes&lt;/em&gt;.&lt;/p&gt;
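&lt;p&gt;Several of the properties in that list reduce to cheap, deterministic lexical checks. Here is a toy sketch of what such an inspection could look like (illustrative heuristics of my own, not the reporails rule set):&lt;br&gt;
&lt;/p&gt;

```python
import re

# Toy heuristics only; a real analyzer uses far richer rules.
HEDGES = re.compile(r"\b(try to|where possible|if you must|avoid|should probably)\b", re.I)
NAMED = re.compile(r"`[^`]+`")  # backticked constructs: files, imports, commands

def inspect(instruction: str) -> dict:
    """Score one instruction line on two deterministic axes."""
    return {
        "hedged": bool(HEDGES.search(instruction)),  # escape hatches weaken binding
        "named": bool(NAMED.search(instruction)),    # names a concrete construct
    }

weak = inspect("Try to avoid mocking external services where possible.")
strong = inspect("Run `./vendor/bin/phpstan analyse src/` before every commit.")
```

&lt;p&gt;The weak line trips the hedge check and names nothing; the strong line is unhedged and anchors to a concrete command. The point is not these two regexes but that the inspection is static analysis, not a judgment call.&lt;/p&gt;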

&lt;h2&gt;
  
  
  What we measured
&lt;/h2&gt;

&lt;p&gt;We built an analyzer that treats instruction files as structured objects with measurable properties. Deterministic. Reproducible. No LLM-as-judge.&lt;/p&gt;

&lt;p&gt;I am running it across a large live corpus of real repositories. The full run completes this week; what follows is what the partial sample already shows: stable enough to publish, not yet the full picture.&lt;/p&gt;

&lt;p&gt;Quality is reported on a 0-to-100 scale: &lt;code&gt;0&lt;/code&gt; means the file produces no measurable influence on model behavior, &lt;code&gt;100&lt;/code&gt; is the ceiling the framework can score.&lt;/p&gt;

&lt;p&gt;A fresh aggregation over &lt;strong&gt;12,076&lt;/strong&gt; completed instruction-file scans is virtually identical to an earlier &lt;strong&gt;9,582&lt;/strong&gt;-repo sample:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;bottom tier:&lt;/strong&gt; &lt;code&gt;40.3%&lt;/code&gt; vs &lt;code&gt;40.1%&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;top tier:&lt;/strong&gt; &lt;code&gt;12.1%&lt;/code&gt; vs &lt;code&gt;12.2%&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;mean quality score:&lt;/strong&gt; &lt;code&gt;27&lt;/code&gt; vs &lt;code&gt;27&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;directive content ratio:&lt;/strong&gt; &lt;code&gt;27.9%&lt;/code&gt; vs &lt;code&gt;27.9%&lt;/code&gt;, the share of instruction sentences that directly tell the model what to do&lt;/p&gt;
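&lt;p&gt;To make the directive content ratio concrete, here is a naive proxy for it (a simplification for illustration; the production classifier is richer than a prefix match):&lt;br&gt;
&lt;/p&gt;

```python
import re

# Naive proxy: a sentence counts as a directive if it opens with an
# imperative cue. The real classification pipeline is more nuanced.
IMPERATIVE = re.compile(r"^(use|run|do not|don't|never|always|prefer|add|write)\b", re.I)

def directive_ratio(sentences: list[str]) -> float:
    directives = sum(1 for s in sentences if IMPERATIVE.match(s.strip()))
    return directives / len(sentences)

sample = [
    "This project is a Laravel API.",            # descriptive scaffolding
    "Run `composer test` before every commit.",  # directive
    "Never commit directly to main.",            # directive
    "The team prefers small PRs.",               # descriptive
]
ratio = directive_ratio(sample)  # 2 of 4 sentences tell the model what to do
```

&lt;p&gt;In this sample, half the sentences bind behavior; in the corpus, the measured share is barely over a quarter.&lt;/p&gt;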

&lt;p&gt;That matters because it means the pattern is stable.&lt;/p&gt;

&lt;p&gt;This does not look like a small-sample artifact.&lt;/p&gt;

&lt;p&gt;And the strongest finding is not what I expected.&lt;/p&gt;
&lt;h2&gt;
  
  
  More rules, lower quality
&lt;/h2&gt;

&lt;p&gt;The common response to bad agent behavior is to add more rules.&lt;/p&gt;

&lt;p&gt;More files. More guidance. More scoping. More edge-case coverage.&lt;/p&gt;

&lt;p&gt;The corpus says that strategy tends to backfire.&lt;/p&gt;

&lt;p&gt;Across &lt;strong&gt;12,076&lt;/strong&gt; repositories, instruction quality falls as instruction-file count rises:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Files per repo     N      Mean score   Bottom tier %   Top tier %
1                  4681   28           46.3%           16.9%
2-5                4796   26           37.3%            9.5%
6-20               1972   26           36.0%            8.8%
21-50               438   25           31.3%            5.7%
51-500              186   25           33.3%            5.4%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key number is the top-tier share.&lt;/p&gt;

&lt;p&gt;It collapses from &lt;code&gt;16.9%&lt;/code&gt; in single-file setups to &lt;code&gt;5.4%&lt;/code&gt; in repositories with &lt;code&gt;51&lt;/code&gt; to &lt;code&gt;500&lt;/code&gt; instruction files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is a roughly 3x drop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The article version of that finding is simple:&lt;/p&gt;

&lt;p&gt;Developers respond to bad agent behavior by adding more rules. In the corpus, that strategy correlates with a 3x collapse in the probability of landing in the top tier.&lt;/p&gt;

&lt;p&gt;That does not prove file count causes low quality by itself.&lt;/p&gt;

&lt;p&gt;But it does show that rule proliferation is not rescuing these systems. At scale, it is associated with weaker instruction quality, not stronger.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sweet spot
&lt;/h2&gt;

&lt;p&gt;There is also a more subtle result in the partial sample. Instruction quality appears to be non-monotonic in directive density: more directives help at first, then stop helping, and past a point start to hurt.&lt;/p&gt;

&lt;p&gt;The full curve is in next week’s piece. The short version is that there is an optimal density range, after which additional directives stop strengthening the system.&lt;/p&gt;

&lt;p&gt;Enough force to bind behavior. Not so much that the system turns into an overpacked rules document.&lt;/p&gt;

&lt;h2&gt;
  
  
  A real example
&lt;/h2&gt;

&lt;p&gt;Here is the kind of instruction block the corpus is full of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Code should be clear, well documented, clear PHPDocs.

# Code must meet SOLID DRY KISS principles.

# Should be compatible with PSR standards when it need.

# Take care about performance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is not malicious. It is not absurd.&lt;/p&gt;

&lt;p&gt;It is just &lt;strong&gt;weak.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything is abstract. Nothing is anchored. Headings are doing the work prose should do. The agent can read it, represent it, and still walk past most of it.&lt;/p&gt;

&lt;p&gt;Now compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Never use &lt;span class="sb"&gt;`&lt;/span&gt;var_dump&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; or &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;committed code. Use &lt;span class="sb"&gt;`&lt;/span&gt;Log::debug&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; instead.
Run &lt;span class="sb"&gt;`&lt;/span&gt;./vendor/bin/phpstan analyse src/&lt;span class="sb"&gt;`&lt;/span&gt; before every commit. Level 6 minimum.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same general intent. Completely different binding strength.&lt;/p&gt;

&lt;p&gt;The second version names the construct, names the alternative, names the command, and names the threshold. &lt;strong&gt;It gives the model something concrete to hold onto.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is what diagnostics should make visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means
&lt;/h2&gt;

&lt;p&gt;Output guardrails still matter.&lt;/p&gt;

&lt;p&gt;Prompt evaluation still matters.&lt;/p&gt;

&lt;p&gt;Safety systems still matter.&lt;/p&gt;

&lt;p&gt;But they do not answer the upstream question: &lt;strong&gt;Are the instructions themselves well-formed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, then a large class of downstream failures will keep showing up as mysterious agent unreliability when the real problem is earlier and simpler.&lt;/p&gt;

&lt;p&gt;The agent loaded the instruction and walked past it.&lt;/p&gt;

&lt;p&gt;That is often not a model problem.&lt;/p&gt;

&lt;p&gt;It is an input problem.&lt;/p&gt;

&lt;p&gt;And input quality is measurable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next
&lt;/h2&gt;

&lt;p&gt;These are corpus-level findings from a partial sample, not universal laws.&lt;/p&gt;

&lt;p&gt;The sample is still in flight. The strongest claims here are about association, not proof of causality. Specific conflict-count case studies need source verification before publication. Popularity weighting is not yet applied, so “40% of repositories score in the bottom tier” is not the same claim as “40% of production agent work scores in the bottom tier.”&lt;/p&gt;

&lt;p&gt;The full corpus run completes this week. Next week I publish the end-of-run analysis across the full sample — the complete distribution, the cross-cuts the partial sample cannot yet support, and the specific case studies this article deliberately held back. If you want to know where your stack lands, that is the piece to come back for.&lt;/p&gt;

&lt;p&gt;For now, the central pattern is already stable enough to matter:&lt;/p&gt;

&lt;p&gt;The ecosystem keeps responding to weak agent behavior by adding more instructions, while the corpus shows that more instruction files are usually associated with lower measured quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is the undiagnosed input problem.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Not that instructions do not matter.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;That they matter, measurably, and most teams still have no way to see whether theirs are helping or hurting.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;This is part of the Instruction Best Practices series. Previous: &lt;a href="https://cleverhoods.medium.com/do-not-think-of-a-pink-elephant-7d40a26cd072" rel="noopener noreferrer"&gt;Do NOT Think of a Pink Elephant&lt;/a&gt;, &lt;a href="https://cleverhoods.medium.com/instruction-best-practices-precision-beats-clarity-e1bcae806671" rel="noopener noreferrer"&gt;Precision Beats Clarity&lt;/a&gt;, &lt;a href="https://cleverhoods.medium.com/claude-md-best-practices-7-formatting-rules-for-the-machine-a591afc3d9a9" rel="noopener noreferrer"&gt;7 Formatting Rules for the Machine&lt;/a&gt;. I’m building instruction diagnostics for coding agents. Follow for the full corpus analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>claude</category>
      <category>performance</category>
    </item>
    <item>
      <title>Do NOT Think of a Pink Elephant</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:19:14 +0000</pubDate>
      <link>https://dev.to/cleverhoods/do-not-think-of-a-pink-elephant-383n</link>
      <guid>https://dev.to/cleverhoods/do-not-think-of-a-pink-elephant-383n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You thought of a pink elephant, didn't you?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The same goes for LLMs.&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Do not use mocks in tests.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;Clear, direct, unambiguous instruction. The agent read it — I can see it in the trace. Then it wrote a test file with &lt;code&gt;unittest.mock&lt;/code&gt; on line 3. Thanks...&lt;/p&gt;

&lt;p&gt;I've seen this play out hundreds of times. A developer writes a rule, the agent loads it, and it does exactly what the rule said not to do. The natural conclusion: instructions are unreliable. The agent is probabilistic. You can't trust it.&lt;/p&gt;

&lt;p&gt;That's wrong. The instruction was the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pink elephant
&lt;/h2&gt;

&lt;p&gt;There's a well-known effect in psychology called ironic process theory (Daniel Wegner, 1987). Tell someone "don't think of a pink elephant," and they immediately think of a pink elephant. The act of suppressing a thought requires activating it first.&lt;/p&gt;

&lt;p&gt;Something structurally similar happens with AI instructions.&lt;/p&gt;

&lt;p&gt;"Do not use mocks in tests" introduces the concept of mocking into the context. The tokens &lt;code&gt;mock&lt;/code&gt;, &lt;code&gt;tests&lt;/code&gt;, &lt;code&gt;use&lt;/code&gt; — these are exactly the tokens the model would produce when writing test code with mocks. You've put the thing you're banning right in the generation path.&lt;/p&gt;

&lt;p&gt;This doesn't mean restrictive instructions are useless. It means a bare restriction is incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anatomy of a complete instruction
&lt;/h2&gt;

&lt;p&gt;The instructions that work — reliably, across thousands of runs — have three components. But the order you write them in matters as much as whether they're there at all.&lt;/p&gt;

&lt;p&gt;Here's how most people write it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Human-natural ordering — constraint first&lt;/span&gt;
Do not use unittest.mock in tests.
Use real service clients from tests/fixtures/.
Mocked tests passed CI last quarter while the production
integration was broken — real clients catch this.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three components are present. Restriction, directive, context. But the restriction fires first — the model activates &lt;code&gt;{mock, unittest, tests}&lt;/code&gt; before it ever sees the alternative. You've front-loaded the pink elephant.&lt;/p&gt;

&lt;p&gt;Now flip it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Golden ordering — directive first&lt;/span&gt;
Use real service clients from tests/fixtures/.
Real integration tests catch deployment failures and configuration
errors that would otherwise reach production undetected.
Do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same three components. Different order. The directive establishes the desired pattern first. The reasoning reinforces it. The restriction fires last, when the positive frame is already dominant.&lt;/p&gt;

&lt;p&gt;In my experiments — 500 runs per condition, same model, same context — constraint-first produces violations 31% of the time. Directive-first with positive reasoning: 7%.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pink elephant isn't just about missing components. It's about which concept the model sees first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three layers, in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Directive&lt;/strong&gt; — what to do. This goes first. It establishes the pattern you want in the generation path before the prohibited concept appears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; — why. Reasoning that reinforces the directive &lt;em&gt;without mentioning the prohibited concept&lt;/em&gt;. "Real integration tests catch deployment failures" adds mass to the positive pattern. Reasoning that mentions the prohibited concept doubles the violation rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restriction&lt;/strong&gt; — what not to do. This goes last. Negation provides weak suppression — but weak suppression is enough when the positive pattern is already dominant.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The part nobody expects
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me: &lt;strong&gt;the ordering effect is larger than any other variable I've measured.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Precise naming vs. vague categories? 28 percentage points. Exact scope vs. broad scope? 74 points across the range. But reordering — same words, same components, just flipped — accounts for 25 points on its own. And it compounds with everything else.&lt;/p&gt;

&lt;p&gt;Most developers write instructions the way they'd write them for a human: state the problem, then the solution. "Don't do X. Instead, do Y." It's natural. It's also the worst ordering for an LLM.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never write "Don't use X. Instead, use Y." Write "Use Y. Here's why Y works. Don't use X."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Formatting helps too — structure is not decoration. I covered that in depth in &lt;a href="https://dev.to/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l"&gt;7 Formatting Rules for the Machine&lt;/a&gt;. But formatting on top of bad ordering is polishing the wrong end. Get the order right first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Here's a real instruction I see in the wild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;When writing tests, avoid mocking external services. Try to
use real implementations where possible. This helps catch
integration issues early. If you must mock, keep mocks minimal
and focused.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Count the problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Avoid" — hedged, not direct&lt;/li&gt;
&lt;li&gt;"external services" — category, not construct&lt;/li&gt;
&lt;li&gt;"Try to" — escape hatch built into the instruction&lt;/li&gt;
&lt;li&gt;"where possible" — another escape hatch&lt;/li&gt;
&lt;li&gt;"If you must mock" — reintroduces mocking as an option &lt;em&gt;within the instruction that prohibits it&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Constraint-first ordering — the prohibition leads, the alternative follows&lt;/li&gt;
&lt;li&gt;No structural separation — restriction, directive, hedge, and escape hatch all in one paragraph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now rewrite it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**Use the service clients**&lt;/span&gt; in &lt;span class="sb"&gt;`tests/fixtures/stripe.py`&lt;/span&gt; and
&lt;span class="sb"&gt;`tests/fixtures/redis.py`&lt;/span&gt;.
&lt;span class="gt"&gt;
&amp;gt; Real service clients caught a breaking Stripe API change&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; that went undetected for 3 weeks in payments - integration&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; tests against live endpoints surface these immediately.&lt;/span&gt;

&lt;span class="ge"&gt;*Do not import*&lt;/span&gt; &lt;span class="sb"&gt;`unittest.mock`&lt;/span&gt; or &lt;span class="sb"&gt;`pytest.monkeypatch`&lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directive first — names the exact files. Context second — the specific incident, reinforcing &lt;em&gt;why the directive matters&lt;/em&gt; without mentioning the prohibited concept. Restriction last — names the exact imports, fires after the positive pattern is established. No hedging. No escape hatches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;For any instruction in your AGENTS.md/CLAUDE.md or SKILLS.md files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the directive.&lt;/strong&gt; Name the file, the path, the pattern. Use backticks. If there's no alternative to lead with, you're writing a pink elephant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add the context.&lt;/strong&gt; One sentence. The specific incident or the specific reason the directive works. Do not mention the thing you're about to prohibit — reasoning that references the prohibited concept halves the benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End with the restriction.&lt;/strong&gt; Name the construct — the import, the class, the function. Bold it. No "try to avoid" or "where possible."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format each component distinctly.&lt;/strong&gt; The directive, context, and restriction should be visually and structurally separate. Don't merge them into one paragraph.&lt;/li&gt;
&lt;/ol&gt;
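&lt;p&gt;Step 3 is mechanical enough to check automatically. A minimal Python sketch: the hedge-phrase list is my own assumption for illustration, not part of any published tooling.&lt;/p&gt;

```python
# Minimal check for step 3: restrictions must not hedge.
# The phrase list below is a hypothetical assumption for illustration.
HEDGES = ["try to avoid", "where possible", "if you can", "ideally"]

def find_hedges(instruction: str) -> list[str]:
    """Return the hedge phrases present in an instruction line."""
    lowered = instruction.lower()
    return [h for h in HEDGES if h in lowered]

print(find_hedges("Try to avoid unittest.mock where possible."))
print(find_hedges("Do not import unittest.mock."))
```

&lt;p&gt;A restriction that returns anything from this check needs a rewrite; the clean form returns an empty list.&lt;/p&gt;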

&lt;blockquote&gt;
&lt;p&gt;If your instruction is just "don't do X" — you've told the model to think about X.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tell it what to think about instead. And tell it &lt;em&gt;first&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>agentskills</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Instruction Best Practices: Precision Beats Clarity</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:12:30 +0000</pubDate>
      <link>https://dev.to/cleverhoods/instruction-best-practices-precision-beats-clarity-lod</link>
      <guid>https://dev.to/cleverhoods/instruction-best-practices-precision-beats-clarity-lod</guid>
      <description>&lt;p&gt;Two rules in the same file. Both say "don't mock."&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When working with external services, avoid using mock objects in tests.

When writing tests for src/payments/, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same intent. Same file. Same model. One gets followed. One gets ignored.&lt;/p&gt;

&lt;p&gt;I stared at the diff for a while, convinced something was broken. The model loaded the file. It read both rules. It followed one and walked past the other like it wasn't there.&lt;/p&gt;

&lt;p&gt;Nothing was broken. The words were wrong.&lt;/p&gt;

&lt;h1&gt;
  
  
  The experiment
&lt;/h1&gt;

&lt;p&gt;I ran controlled behavioral experiments: same model, same context window, same position in the file. One variable changed at a time. Over a thousand runs per finding, with statistically significant differences between conditions.&lt;/p&gt;

&lt;p&gt;Two findings stood out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt; &lt;em&gt;(and the one that surprised me most)&lt;/em&gt;: when instructions have a conditional scope ("When doing X..."), precision matters enormously. &lt;strong&gt;A broad scope is worse than a wrong scope.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;: instructions that name the exact construct get followed roughly &lt;strong&gt;10 times more often&lt;/strong&gt; than instructions that describe the category. "&lt;code&gt;unittest.mock&lt;/code&gt;" vs "mock objects" — same rule, same meaning to a human. Not the same to the model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Scope it or drop it
&lt;/h1&gt;

&lt;p&gt;Most instructions I see in the wild look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When working with external services, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That "When working with external services" is the scope — it tells the agent &lt;em&gt;when&lt;/em&gt; to apply the rule. Scopes are useful. But the wording matters more than you'd expect.&lt;/p&gt;

&lt;p&gt;I tested four scope wordings for the same instruction:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Exact scope — best compliance
When writing tests for src/payments/, do not use unittest.mock.

# Universal scope — nearly as good
When writing tests, do not use unittest.mock.

# Wrong domain — degraded
When working with databases, do not use unittest.mock.

# Broad category — worst compliance
When working with external services, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read that ranking again. &lt;strong&gt;Broad is worse than wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"When working with databases" has nothing to do with the test at hand. But it gives the agent something concrete - a specific domain to anchor on. The instruction is scoped to the wrong context, but it's still a clear, greppable constraint.&lt;/p&gt;

&lt;p&gt;"When working with external services" is technically correct. It even sounds more helpful. But it activates a cloud of associations - HTTP clients, API wrappers, service meshes, authentication, retries - and the instruction gets lost in the noise.&lt;/p&gt;

&lt;p&gt;The rule: &lt;strong&gt;if your scope wouldn't work as a grep pattern, rewrite it or drop it.&lt;/strong&gt;&lt;/p&gt;
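&lt;p&gt;The grep test can be made literal. A hedged sketch: the heuristic below (a scope passes when it contains a path separator, glob, or dotted identifier) is my own simplification, not something measured in the experiments.&lt;/p&gt;

```python
import re

# Heuristic sketch of the grep test: a scope earns its place when it looks
# like a path, glob, or dotted identifier rather than a prose category.
# The regex is an assumption for illustration only.
def greppable(scope: str) -> bool:
    return bool(re.search(r"[/*]|\w+\.\w+", scope))

print(greppable("src/payments/"))      # path-like
print(greppable("unittest.mock"))      # dotted identifier
print(greppable("external services"))  # prose category
```

&lt;p&gt;Only the first two survive; "external services" fails the test and should be rewritten or dropped.&lt;/p&gt;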

&lt;p&gt;An unconditional instruction beats a badly scoped conditional:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Broad scope — fights itself
When working with external services, prefer real implementations
over mock objects in your test suite.

# No scope — just say it
Do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second version is blunter. It's also more effective. Universal scopes ("When writing tests") cost almost nothing — they frame the context without introducing noise. But broad category scopes actively hurt.&lt;/p&gt;

&lt;h1&gt;
  
  
  Name the thing
&lt;/h1&gt;

&lt;p&gt;Here's what the difference looks like across domains.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Describes the category — low compliance
Avoid using mock objects in tests.

# Names the construct — high compliance
Do not use unittest.mock.

# Category
Handle errors properly in API calls.

# Construct
Wrap calls to stripe.Customer.create() in try/except StripeError.

# Category
Don't use unsafe string formatting.

# Construct
Do not use f-strings in SQL queries. Use parameterized queries
with cursor.execute().

# Category
Avoid storing secrets in code.

# Construct
Do not hardcode values in os.environ[]. Read from .env
via python-dotenv.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The pattern: if the agent could tab-complete it, use that form. If it's something you'd type into an import statement, a grep, or a stack trace - that's the word the agent needs.&lt;/p&gt;

&lt;p&gt;Category names feel clearer to us humans. "Mock objects" is plain English. But the model matches against what it would actually generate, not against what the words mean in English. "&lt;code&gt;unittest.mock&lt;/code&gt;" matches the tokens the model would produce when writing test code. "Mock objects" matches everything and nothing.&lt;/p&gt;

&lt;p&gt;Think of it like search. A query for &lt;code&gt;unittest.mock&lt;/code&gt; returns one result. A query for "mocking libraries" returns a thousand. The agent faces the same problem: a vague instruction activates too many associations, and the signal drowns.&lt;/p&gt;

&lt;h1&gt;
  
  
  The compound effect
&lt;/h1&gt;

&lt;p&gt;When both parts of the instruction are vague - vague scope, vague body - the failures compound. When both are precise, the gains compound.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Before — vague everywhere
When working with external services, prefer using real implementations
over mock objects in your test suite.

# After — precise everywhere
When writing tests for `src/payments/`:
Do not import `unittest.mock`.
Use the sandbox client from `tests/fixtures/stripe.py`.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same intent. The rewrite takes ten seconds. The difference is not incremental; it's categorical.&lt;/p&gt;

&lt;p&gt;Formatting gets the instruction &lt;em&gt;read&lt;/em&gt; - headers, code blocks, hierarchy make it scannable. Precision gets the instruction &lt;em&gt;followed&lt;/em&gt; - exact constructs and tight scopes make it actionable. They work together. A well-formatted vague instruction still gets ignored. A precise instruction buried in a wall of text still gets missed. You need both.&lt;/p&gt;

&lt;h1&gt;
  
  
  When to adopt this
&lt;/h1&gt;

&lt;p&gt;This matters most when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your instruction files mention categories more than constructs: "services," "libraries," "objects," "errors"&lt;/li&gt;
&lt;li&gt;You use broad conditional scopes: "when working with...," "for external...," "in general..."&lt;/li&gt;
&lt;li&gt;You have rules that are loaded and read but not followed&lt;/li&gt;
&lt;li&gt;You want to squeeze more compliance out of existing instructions without restructuring the file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It matters less when your instructions are already construct-level ("do not call &lt;code&gt;eval()&lt;/code&gt;") or unconditional.&lt;/p&gt;

&lt;h1&gt;
  
  
  Try it
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Open your instruction files.&lt;/li&gt;
&lt;li&gt;Find every instruction that uses a category word: "services," "objects," "libraries," "errors," "dependencies."&lt;/li&gt;
&lt;li&gt;Replace it with the construct the agent would encounter at runtime - the import path, the class name, the file glob, the CLI flag.&lt;/li&gt;
&lt;li&gt;For conditional instructions: replace broad scopes with exact paths or file patterns. If you can't be exact, drop the condition entirely - unconditional is better than vague.&lt;/li&gt;
&lt;/ol&gt;
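&lt;p&gt;Step 2 can be scripted. A small sketch, assuming a hypothetical word list; treat every flagged line as a candidate for a construct-level rewrite.&lt;/p&gt;

```python
import re

# Sketch of step 2: flag category words in an instruction file so each can
# be replaced with a concrete construct. The word list is an assumption.
CATEGORY_WORDS = ["services", "objects", "libraries", "errors", "dependencies"]

def flag_category_lines(text: str) -> list[tuple[int, str]]:
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(re.search(rf"\b{word}\b", line, re.IGNORECASE)
               for word in CATEGORY_WORDS):
            flagged.append((lineno, line.strip()))
    return flagged

rules = "Avoid using mock objects in tests.\nDo not import unittest.mock."
print(flag_category_lines(rules))
```

&lt;p&gt;The first rule gets flagged for "objects"; the construct-level rule passes untouched.&lt;/p&gt;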

&lt;p&gt;Then run your agent on the same task that was failing. You'll see the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formatting is the signal. Precision is the target.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>performance</category>
      <category>agents</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: 7 formatting rules for the Machine</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 03 Mar 2026 13:06:00 +0000</pubDate>
      <link>https://dev.to/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l</link>
      <guid>https://dev.to/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l</guid>
      <description>&lt;p&gt;I watched an agent ignore a rule I wrote 2 hours earlier.&lt;/p&gt;

&lt;p&gt;Not a vague rule. A specific one. &lt;strong&gt;"run pytest before committing."&lt;/strong&gt; It was right there in the CLAUDE.md, paragraph two, between the project description and the linting setup. The agent read the file. I saw it in the context. It just... didn't follow it.&lt;/p&gt;

&lt;p&gt;I moved the same instruction under a &lt;code&gt;## Testing&lt;/code&gt; header, wrapped &lt;code&gt;pytest&lt;/code&gt; in backticks, and added a one-line rationale. Next run, the agent followed it to the letter.&lt;/p&gt;

&lt;p&gt;The instruction didn't change. The &lt;strong&gt;signal strength&lt;/strong&gt; did.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2"&gt;last post&lt;/a&gt;, we got the agent oriented — &lt;code&gt;/bootstrap&lt;/code&gt; loads the map, the workflows, the boundaries. But orientation and compliance are different things. You can hand someone a perfect briefing and still lose them if the briefing is a wall of text. Same with agents.&lt;/p&gt;

&lt;p&gt;The question isn't whether your instructions are loaded. It's whether the agent &lt;em&gt;follows&lt;/em&gt; them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The comparison
&lt;/h2&gt;

&lt;p&gt;Here's the same instruction, two ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version A:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;When working on this project, always make sure to run the test suite
before committing any changes. The command to run tests is pytest and
you should run it from the project root. If tests fail, fix them before
committing. Also make sure to use ruff for formatting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version B:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`pytest`&lt;/span&gt; — run from project root before every commit
&lt;span class="p"&gt;-&lt;/span&gt; Fix failures before committing

&lt;span class="gu"&gt;## Formatting&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`ruff check --fix &amp;amp;&amp;amp; ruff format`&lt;/span&gt; — run before committing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same content. Version B gets followed. Version A gets buried.&lt;/p&gt;

&lt;p&gt;This isn't about aesthetics. Structural elements — headers, code fences, lists — create anchor points that agents latch onto. Prose paragraphs don't. The more structure you provide, the more reliably each instruction lands.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not just about length
&lt;/h2&gt;

&lt;p&gt;You already learned to keep your CLAUDE.md short. It's a good start, but it's not sufficient. A 20-line prose paragraph gets lost just as easily as a 200-line one. The variable isn't word count. It's structure.&lt;/p&gt;

&lt;p&gt;A short file with no headers, no code blocks, and no rationale will underperform a longer file that's well-structured.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Length is the ceiling. Formatting is the signal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Seven structural rules
&lt;/h2&gt;

&lt;p&gt;These aren't content guidelines. They're formatting choices that determine whether instructions survive the trip from file to agent behavior. I'll start with the three you won't find in other guides, then cover the four that everyone mentions but nobody explains &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Include rationale
&lt;/h3&gt;

&lt;p&gt;"Never force push" is an instruction. "Never force push — rewrites shared history, unrecoverable for collaborators" is an instruction the agent &lt;em&gt;weighs&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Without rationale&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use &lt;span class="sb"&gt;`rm -rf`&lt;/span&gt; on the project root
&lt;span class="p"&gt;-&lt;/span&gt; Always run tests before committing
&lt;span class="p"&gt;-&lt;/span&gt; Don't modify package-lock.json manually

&lt;span class="gh"&gt;# With rationale&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use &lt;span class="sb"&gt;`rm -rf`&lt;/span&gt; on the project root — irrecoverable
&lt;span class="p"&gt;-&lt;/span&gt; Always run tests before committing — CI will reject untested code
&lt;span class="p"&gt;-&lt;/span&gt; Don't modify package-lock.json manually — causes merge conflicts
  and dependency resolution issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rationale doesn't just explain — it gives the agent a way to generalize. An agent that understands &lt;em&gt;why&lt;/em&gt; force push is forbidden will also avoid &lt;code&gt;git reset --hard origin/main&lt;/code&gt; without being told. The "why" turns a single rule into a class of behaviors.&lt;/p&gt;

&lt;p&gt;This is the most undervalued formatting choice. Every prohibition should carry its reason.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Keep heading hierarchy shallow
&lt;/h3&gt;

&lt;p&gt;Three levels is enough. &lt;code&gt;h1&lt;/code&gt; for the file title, &lt;code&gt;h2&lt;/code&gt; for sections, &lt;code&gt;h3&lt;/code&gt; for subsections. That's it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before (5 levels deep)&lt;/span&gt;
&lt;span class="gh"&gt;# Project&lt;/span&gt;
&lt;span class="gu"&gt;## Development&lt;/span&gt;
&lt;span class="gu"&gt;### Testing&lt;/span&gt;
&lt;span class="gu"&gt;#### Unit Tests&lt;/span&gt;
&lt;span class="gu"&gt;##### Mocking Strategy&lt;/span&gt;

&lt;span class="gh"&gt;# After (3 levels max)&lt;/span&gt;
&lt;span class="gh"&gt;# Project&lt;/span&gt;
&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="gu"&gt;### Unit tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deep nesting dilutes attention. An &lt;code&gt;h5&lt;/code&gt; competes with every heading above it for the agent's focus. It doesn't lose the &lt;code&gt;h2&lt;/code&gt;, but the hierarchy creates ambiguity about which level governs. Flat structures keep every instruction at the surface. &lt;strong&gt;If you need an &lt;code&gt;h4&lt;/code&gt;, you probably need a separate file.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Name files descriptively
&lt;/h3&gt;

&lt;p&gt;When an agent searches your project - browsing a directory listing, running a glob, deciding which file to read - the file name is the first filter. Before content, before headers, before anything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Before
docs/guide.md
docs/notes.md
scripts/setup.sh

# After
docs/api-authentication.md
docs/deployment-checklist.md
scripts/setup-local-dev.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent sees a directory listing and picks what to open. &lt;code&gt;api-authentication.md&lt;/code&gt; tells it whether the file might be relevant to the current task. &lt;code&gt;guide.md&lt;/code&gt; forces it to open and read before it can decide. Descriptive names save the agent a round trip. &lt;strong&gt;In a project with dozens of files, that adds up.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This applies to any file the agent might discover: docs, scripts, configs.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Now the four you've heard before - but with a &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Use headers
&lt;/h3&gt;

&lt;p&gt;Agents scan headers the way developers scan a README: as a table of contents. A header says "&lt;strong&gt;new topic, reset attention.&lt;/strong&gt;"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
The project uses TypeScript with strict mode enabled. For testing we
use vitest. The CI pipeline runs on GitHub Actions.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="gu"&gt;## Language&lt;/span&gt;

TypeScript with strict mode enabled.

&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`npx vitest`&lt;/span&gt; — run from project root

&lt;span class="gu"&gt;## CI&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`.github/workflows/`&lt;/span&gt; — GitHub Actions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One topic per header. The agent navigates to the right section instead of parsing the whole paragraph. Without headers, every instruction competes with every other instruction for attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Put commands in code blocks
&lt;/h3&gt;

&lt;p&gt;Commands in prose get read as descriptions. Commands in code blocks get treated as executable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
You can run the linter by running npm run lint and the tests
by running npm test.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`npm run lint`&lt;/span&gt; — check for issues
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`npm test`&lt;/span&gt; — run test suite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do nothing else from this post, wrap your commands in backticks. It's the single highest-impact change - &lt;strong&gt;a command in a code fence is a command. A command in a sentence is a suggestion&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Use standard section names
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;## Testing&lt;/code&gt; gets recognized instantly. &lt;code&gt;## Quality Assurance Verification Process&lt;/code&gt; doesn't.&lt;/p&gt;

&lt;p&gt;Agents have been trained on millions of README files. They know what &lt;code&gt;## Testing&lt;/code&gt;, &lt;code&gt;## Commands&lt;/code&gt;, &lt;code&gt;## Structure&lt;/code&gt;, and &lt;code&gt;## Conventions&lt;/code&gt; mean. Those names carry built-in context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instead of&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quality Assurance&lt;/td&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development Guidelines&lt;/td&gt;
&lt;td&gt;Conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational Instructions&lt;/td&gt;
&lt;td&gt;Commands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety and Compliance&lt;/td&gt;
&lt;td&gt;Boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project Organization&lt;/td&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The familiar name is the signal. The creative name is noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Make instructions actionable
&lt;/h3&gt;

&lt;p&gt;"Follow best practices" is not an instruction. "&lt;em&gt;Use ruff for formatting, run before committing&lt;/em&gt;" is.&lt;/p&gt;

&lt;p&gt;The test: could an agent execute this instruction right now, without asking a clarifying question? If not, it's too vague.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
Make sure code quality is maintained and follows our standards.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Format with &lt;span class="sb"&gt;`ruff format`&lt;/span&gt; before committing
&lt;span class="p"&gt;-&lt;/span&gt; Type annotations on all public functions
&lt;span class="p"&gt;-&lt;/span&gt; No &lt;span class="sb"&gt;`print()`&lt;/span&gt; in production code — use &lt;span class="sb"&gt;`logging`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every instruction should pass the "act on it immediately" test. If it can't be acted on, it's a wish, not an instruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compound effect
&lt;/h2&gt;

&lt;p&gt;Each rule alone is a small improvement. Together, they're multiplicative - not because the rules add up, but because they reinforce each other. Headers create sections. Sections hold code blocks. Code blocks contain actionable commands. Rationale explains why. Descriptive file names route attention to the right file. Shallow hierarchy keeps everything findable.&lt;/p&gt;

&lt;p&gt;Here's a realistic before/after applying all seven:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;This project is a Python CLI tool. We use pytest for testing and ruff
for linting. Make sure to run tests before you commit anything. The
source code is in src/myapp and tests are in tests/. Don't modify
anything in the dist/ folder because that's generated. Also we have
some rules about how to write tests — they should test behavior not
implementation details, and use parametrize instead of writing lots
of individual test functions that do the same thing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`pytest`&lt;/span&gt; — run from project root before every commit
&lt;span class="p"&gt;-&lt;/span&gt; Test behavior, not implementation — assert on outcomes, not internal calls
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`@pytest.mark.parametrize`&lt;/span&gt; when cases share the same assertion shape

&lt;span class="gu"&gt;## Formatting&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`ruff check --fix &amp;amp;&amp;amp; ruff format`&lt;/span&gt;

&lt;span class="gu"&gt;## Structure&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Source: &lt;span class="sb"&gt;`src/myapp/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Tests: &lt;span class="sb"&gt;`tests/`&lt;/span&gt;

&lt;span class="gu"&gt;## Boundaries&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`dist/`&lt;/span&gt; — generated, do not modify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same information. Half the words. Every instruction lands.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to reformat
&lt;/h2&gt;

&lt;p&gt;If you notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent apologizes for missing an instruction that's in your file&lt;/li&gt;
&lt;li&gt;The same rule gets violated in consecutive sessions&lt;/li&gt;
&lt;li&gt;You keep adding more words to an instruction hoping the agent will "get it"&lt;/li&gt;
&lt;li&gt;Your CLAUDE.md is one long section with no headers&lt;/li&gt;
&lt;li&gt;Commands appear in sentences instead of code blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your instructions don't need more content. They need more structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The connection to /bootstrap
&lt;/h2&gt;

&lt;p&gt;In the previous posts we built the delivery system: &lt;code&gt;backbone.yml&lt;/code&gt; maps the project, Mermaid draws the workflows, &lt;code&gt;/bootstrap&lt;/code&gt; loads both in seconds. That's the &lt;em&gt;orientation&lt;/em&gt; layer - the agent knows where it is and how things work.&lt;/p&gt;

&lt;p&gt;This is about &lt;strong&gt;attention budget allocation&lt;/strong&gt;. The agent has a limited context window. What matters isn't just what's in it — it's how the agent decides what's relevant at each step. Structure is what makes your instructions win that competition.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Orientation without compliance means the agent knows your project but ignores your rules. Compliance without orientation means the agent follows instructions but works in the wrong place. You need both.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open your CLAUDE.md (or whatever instruction file your agent reads)&lt;/li&gt;
&lt;li&gt;Find the longest prose paragraph&lt;/li&gt;
&lt;li&gt;Break it: one header per topic, one code block per command, one sentence of rationale per prohibition&lt;/li&gt;
&lt;li&gt;Run your agent on the same task you ran yesterday&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The instructions didn't change. The signal did.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Don't just write more instructions. Format the ones you have.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>agents</category>
      <category>ai</category>
      <category>documentation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why /bootstrap should be the first Command in every Agent session</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 24 Feb 2026 12:39:23 +0000</pubDate>
      <link>https://dev.to/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2</link>
      <guid>https://dev.to/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2</guid>
      <description>&lt;p&gt;After a 2.5 hour session you accidentally close your coding agent terminal mid session. The output is there, the commits are there, but something important is gone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The synergy you spent hours building up.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You reopen the console hoping the two of you can start over, but it feels like you are strangers now. The agent is "&lt;em&gt;Somebody that you used to know.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;No, this isn't the opening of a light romance novel; it's the usual experience with coding agents. Coding agents are stateless by design, so every new session is a new beginning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The resume illusion
&lt;/h2&gt;

&lt;p&gt;Some agents have &lt;code&gt;--resume&lt;/code&gt; functionality. Claude Code has it. Codex has it. Gemini CLI has it. It's useful, but it has limitations.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--resume&lt;/code&gt; only &lt;strong&gt;replays&lt;/strong&gt; the conversation log. It doesn't restore the curated mental model - the understanding of your project's topology, constraints, and current state that the agent built up over those 2.5 hours.&lt;/p&gt;

&lt;p&gt;Resume gives you only the transcript. Not the understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two primitives I already had
&lt;/h2&gt;

&lt;p&gt;Over the last few weeks I wrote about two separate ideas:&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;The backbone.yml Pattern&lt;/a&gt;, I introduced a YAML manifest that maps your project's topology - agents, directories, configs, schemas. &lt;strong&gt;Information.&lt;/strong&gt; The agent reads it once and knows where everything is. No more exploration tax.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;Mermaid for Workflows&lt;/a&gt;, I showed how flowcharts give agents reliable step-by-step processes to follow. &lt;strong&gt;Process.&lt;/strong&gt; Structured syntax that sticks out in a context window full of prose, backed by research showing agents follow flowcharts more reliably than natural language.&lt;/p&gt;

&lt;p&gt;Backbone tells the agent &lt;em&gt;what exists&lt;/em&gt;. Workflows tell the agent &lt;em&gt;how to operate&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But I was using them separately. I'd tell Claude "read the backbone" at session start, then invoke workflows as needed. Manual orchestration. Every session, same ritual. &lt;/p&gt;

&lt;p&gt;Why am I doing this separately? &lt;strong&gt;Isn't context just &lt;em&gt;Information&lt;/em&gt; + &lt;em&gt;Process&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Read the map. Follow the process. Produce a working mental model. Every session, one command.&lt;/p&gt;

&lt;p&gt;That's &lt;code&gt;/bootstrap&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What /bootstrap does
&lt;/h2&gt;

&lt;p&gt;One command. Two modes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First run&lt;/strong&gt; (no backbone exists): scans the project, detects agents and structure, generates a &lt;code&gt;backbone.yml&lt;/code&gt;, then synthesizes a context report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every subsequent run&lt;/strong&gt; (backbone exists): reads the backbone, maps agents, loads constraints, checks project state, and produces a mental model.&lt;/p&gt;

&lt;p&gt;Both modes use the diagram + prose combo from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - flowcharts for the branching, prose for the reasoning behind each step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lb18lctptwks9ktwug7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lb18lctptwks9ktwug7.png" alt="Bootstrap workflow" width="431" height="1291"&gt;&lt;/a&gt;&lt;/p&gt;
Bootstrap workflow



&lt;p&gt;The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bootstrap complete.

Project: my-app v1.2.0 (branch: feature/auth)
Agents: claude (CLAUDE.md), copilot (.github/copilot-instructions.md)
Structure: src/, tests/, docs/, config/

Navigation:
  Agent config → backbone.agents.{agent}
  Project dirs → backbone.paths.{key}
  Schemas      → backbone.schemas.{name}

Operations:
  Build  → npm run build
  Test   → npm test
  Deploy → ./scripts/deploy.sh

Constraints:
  - Never modify config/production.yml directly
  - Always run tests before committing

State: v1.2.0, 3 unreleased changes (auth module)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, the agent knows where things are, how to operate, what's off limits, and what's in progress. No exploration. No guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Seed mode: the smart first run
&lt;/h2&gt;

&lt;p&gt;Most bootstrapping tools drop a blank template and say "fill this in." That's 0% useful on day one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/bootstrap&lt;/code&gt; scans first, generates second. It detects agents across the ecosystem:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.cursorrules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.windsurfrules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Windsurf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.clinerules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.aider*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.continue/config.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Continue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It maps directories, finds configs, detects build/test workflows from &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;Makefile&lt;/code&gt;, CI configs. The generated backbone is 70-80% correct from the scan alone.&lt;/p&gt;

&lt;p&gt;The remaining 20-30% - semantic connections, domain concepts - gets marked with &lt;code&gt;# TODO: refine&lt;/code&gt; so you know exactly where to invest review time. Verified topology. Flagged guesses. One command.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill structure
&lt;/h2&gt;

&lt;p&gt;I built this as an &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;Agent Skill&lt;/a&gt; - the open standard for packaging reusable instructions across agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bootstrap/
  SKILL.md              # Entry point - frontmatter + instructions
  workflows/
    seed.md             # Scan + generate (mermaid flowchart)
    bootstrap.md        # Read + synthesize (mermaid flowchart)
  templates/
    backbone.yml        # Starter backbone shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the two primitives? The &lt;code&gt;templates/backbone.yml&lt;/code&gt; is the information layer from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;backbone post&lt;/a&gt;. The &lt;code&gt;workflows/*.md&lt;/code&gt; files are the process layer from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - complete with flowcharts, key decisions, and edge cases.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/bootstrap&lt;/code&gt; is their love child. One skill that reads both primitives and turns them into a loaded context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-agent by design
&lt;/h2&gt;

&lt;p&gt;The SKILL.md format is an open standard created by Anthropic and now adopted by OpenAI, Google, Cursor, and others. A skill authored once works across 30+ agents - the format is filesystem-based, not API-dependent.&lt;/p&gt;

&lt;p&gt;Drop the &lt;code&gt;bootstrap/&lt;/code&gt; folder into &lt;code&gt;.claude/skills/&lt;/code&gt; for Claude Code, &lt;code&gt;.agents/skills/&lt;/code&gt; for Codex CLI, or wherever your agent looks. Same skill, same result.&lt;/p&gt;
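&lt;p&gt;Concretely, installation is just a copy (paths as above; adjust for whichever agent you run):&lt;/p&gt;

```shell
# Install the bootstrap skill for Claude Code
mkdir -p .claude/skills
cp -r bootstrap .claude/skills/

# Same skill for Codex CLI - only the destination differs
mkdir -p .agents/skills
cp -r bootstrap .agents/skills/
```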

&lt;p&gt;This matters because the bootstrap concept isn't Claude-specific. Every coding agent is stateless. Every agent benefits from a loaded mental model at session start. The problem is universal, so the solution should be too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes after bootstrap
&lt;/h2&gt;

&lt;p&gt;Before bootstrap, every session starts with the agent exploring. After bootstrap, every session starts with the agent &lt;em&gt;understanding&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No more &lt;code&gt;find&lt;/code&gt; / &lt;code&gt;ls&lt;/code&gt; / &lt;code&gt;grep&lt;/code&gt; loops&lt;/strong&gt; to discover what the backbone already maps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more wrong assumptions&lt;/strong&gt; about where configs live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more repeated corrections&lt;/strong&gt; - "no, the tests are in &lt;code&gt;spec/&lt;/code&gt;, not &lt;code&gt;tests/&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more context poisoning&lt;/strong&gt; from exploration artifacts cluttering the window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent reads the backbone, follows the workflow, synthesizes the context, and starts working. Every session. In seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The progression
&lt;/h2&gt;

&lt;p&gt;Looking back at this series, the progression is clear:&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm"&gt;capability levels post&lt;/a&gt; - what maturity looks like for instruction files.&lt;br&gt;
In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;backbone.yml post&lt;/a&gt; - give the agent a map (information).&lt;br&gt;
In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - give the agent reliable processes (workflows).&lt;br&gt;
Now - combine both into a single command that loads a mental model.&lt;/p&gt;

&lt;p&gt;Map + Process = Understanding. That's the whole idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The bootstrap skill will be published as a cross-agent compatible Agent Skill in the &lt;a href="https://github.com/reporails/skills" rel="noopener noreferrer"&gt;Reporails skills repo&lt;/a&gt; this week.&lt;/p&gt;

&lt;p&gt;In the meantime, the pattern works even without the skill:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;code&gt;backbone.yml&lt;/code&gt; mapping your project (&lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;template here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Add a workflow with a mermaid flowchart for session initialization (&lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;approach here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Start every session with: "Load the backbone, follow the bootstrap workflow, and tell me what you understand"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's manual bootstrap. The skill just makes it &lt;code&gt;/bootstrap&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't start a session. Bootstrap it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;This post is part of the &lt;a href="https://dev.to/cleverhoods/series/35305"&gt;Reporails series&lt;/a&gt;. Previous: &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;Mermaid for Workflows&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>architecture</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: Mermaid for Workflows</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 17 Feb 2026 12:04:57 +0000</pubDate>
      <link>https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb</link>
      <guid>https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A picture says a thousand words. I wanted to see my system.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not the code. I wanted to see the &lt;strong&gt;workflows&lt;/strong&gt;. What happens when a rule gets validated. What happens when a session starts. What happens when compaction triggers. Systems are workflows, and I couldn't see mine.&lt;/p&gt;

&lt;p&gt;I had them written down, of course. Prose paragraphs in CLAUDE.md/SKILL.md or RULES describing each process step by step. But past four or five steps with branching, the prose became unreadable. I'd write it, come back a week later, and need to re-parse the whole thing to understand what I'd written. Mental overload, every time.&lt;/p&gt;

&lt;p&gt;My coding agent had the same problem. Research calls it "&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;lost in the middle&lt;/a&gt;" - LLMs perform best with information at the beginning and end of their context, and significantly worse with information buried in the middle. My prose workflows were exactly that: critical branching logic buried in paragraphs, sandwiched between other instructions. Claude would miss steps. Skip branches. Drift from the intended process.&lt;/p&gt;

&lt;p&gt;And the workflows themselves drifted too. I'd remove a pipeline phase and update one paragraph but miss another. Prose makes that invisible - three sentences can reference a removed step and nothing looks broken.&lt;/p&gt;

&lt;p&gt;So I rewrote my workflows as Mermaid diagrams. And three things happened at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;I could see the system.&lt;/strong&gt; Rendered Mermaid gives you a visual map of what's happening - for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude followed them more reliably.&lt;/strong&gt; Structured syntax sticks out in a context window full of prose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They stopped rotting.&lt;/strong&gt; You can't leave a dangling arrow in a flowchart the way you can leave a stale sentence in a paragraph.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Turns out there's research backing all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FlowBench&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2406.14884" rel="noopener noreferrer"&gt;Xiao et al., EMNLP 2024&lt;/a&gt;) tested how LLM agents perform when given the same workflow knowledge in different formats - natural language, pseudo-code, and flowcharts. Across 51 scenarios on GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flowcharts achieved the best trade-off for agent performance&lt;/li&gt;
&lt;li&gt;Combining formats (text + code + flowcharts) outperformed any single format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Format matters. It measurably affects how well the agent follows your instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to convert
&lt;/h2&gt;

&lt;p&gt;Not everything benefits equally from a diagram. The rule:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If it has branches, it needs a diagram. If it has judgment, it also needs prose. Most real workflows need both.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deterministic pipelines - CI/CD, deployment, validation, review workflows - are pure flowchart territory. Every step has a defined outcome, every branch has a condition.&lt;/p&gt;

&lt;p&gt;But most workflows aren't purely deterministic. They have branching &lt;em&gt;and&lt;/em&gt; judgment: "if the tests fail with a type error, fix inline; if it's a logic error, rethink the approach." The diagram captures the branch. The prose below it captures the judgment. Neither format alone carries both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before and after
&lt;/h2&gt;

&lt;p&gt;Here's what my rule validation workflow looked like before - prose only, describing the same process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rule Validation&lt;/span&gt;

Run validation on all rules. For each rule, first validate the
schema (fields, types, format). If that passes, check the contract
(.md and .yml matching). If the contract is valid, resolve template
variables and run OpenGrep validation on pattern syntax. If OpenGrep
returns exit 2 or 7, report the error. If it returns 0 or 1,
the rule passes. After all rules are checked, output a summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's what the Mermaid version looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    START([/validate-rules options]) --&amp;gt; COLLECT[Collect rules from paths]
    COLLECT --&amp;gt; LOOP[For each rule]
    LOOP --&amp;gt; SCHEMA[1. Schema validation&amp;lt;br/&amp;gt;Fields, types, format]
    SCHEMA --&amp;gt;|fail| REPORT
    SCHEMA --&amp;gt;|pass| CONTRACT[2. Contract validation&amp;lt;br/&amp;gt;.md and .yml matching]
    CONTRACT --&amp;gt;|fail| REPORT
    CONTRACT --&amp;gt;|pass| RESOLVE[Resolve template variables]
    RESOLVE --&amp;gt; OPENGREP[3. OpenGrep validation&amp;lt;br/&amp;gt;Pattern syntax]
    OPENGREP --&amp;gt;|exit 2 or 7| REPORT
    OPENGREP --&amp;gt;|exit 0 or 1| REPORT[Report results]
    REPORT --&amp;gt; NEXT{More rules?}
    NEXT --&amp;gt;|yes| LOOP
    NEXT --&amp;gt;|no| SUMMARY[Summary output]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhocw4w3crfqdbwndsul7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhocw4w3crfqdbwndsul7.png" alt="Rendered Mermaid workflow from Reporails rule validation" width="800" height="1222"&gt;&lt;/a&gt;&lt;/p&gt;
Rendered Mermaid workflow from Reporails rule validation



&lt;p&gt;Same information. But the flowchart makes every branch explicit and every failure path visible. Claude can't accidentally skip a validation step or misinterpret which exit codes mean failure.&lt;/p&gt;

&lt;p&gt;But the diagram alone is still only half the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The combo: diagram + prose
&lt;/h2&gt;

&lt;p&gt;FlowBench's strongest finding wasn't "use flowcharts" - it was "combine formats." Each format carries what it's best at.&lt;/p&gt;

&lt;p&gt;Here's what one of my actual workflows looks like after conversion - &lt;a href="https://github.com/reporails/rules/blob/main/.shared/workflows/rule-validation.md" rel="noopener noreferrer"&gt;rule-validation.md&lt;/a&gt; from Reporails:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rule Validation Workflow&lt;/span&gt;

​mermaid
flowchart TD
    START([/validate-rules options]) --&amp;gt; COLLECT[Collect rules from paths]
    COLLECT --&amp;gt; LOOP[For each rule]
    LOOP --&amp;gt; SCHEMA[1. Schema validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;Fields, types, format]
    SCHEMA --&amp;gt;|fail| REPORT
    SCHEMA --&amp;gt;|pass| CONTRACT[2. Contract validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;.md and .yml matching]
    CONTRACT --&amp;gt;|fail| REPORT
    CONTRACT --&amp;gt;|pass| RESOLVE[Resolve template variables]
    RESOLVE --&amp;gt; OPENGREP[3. OpenGrep validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;Pattern syntax]
    OPENGREP --&amp;gt;|exit 2 or 7| REPORT
    OPENGREP --&amp;gt;|exit 0 or 1| REPORT[Report results]
    REPORT --&amp;gt; NEXT{More rules?}
    NEXT --&amp;gt;|yes| LOOP
    NEXT --&amp;gt;|no| SUMMARY[Summary output]
​

&lt;span class="gu"&gt;## Why Three Layers in This Order&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Schema validation**&lt;/span&gt; catches structural errors (missing fields, wrong
   types) with zero external dependencies. Cheapest check - filters out
   rules that would cause confusing downstream failures.
&lt;span class="p"&gt;
2.&lt;/span&gt; &lt;span class="gs"&gt;**Contract validation**&lt;/span&gt; confirms that rule.md and rule.yml agree.
   Catches the class of bugs where one file was updated but the other
   wasn't. Requires both files to be schema-valid first.
&lt;span class="p"&gt;
3.&lt;/span&gt; &lt;span class="gs"&gt;**OpenGrep validation**&lt;/span&gt; runs actual patterns against the syntax
   checker. Most expensive step - requires template resolution, file I/O,
   agent config loading. Only runs on rules that are already structurally
   sound.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diagram shows the three-step pipeline with its branches. The prose explains &lt;em&gt;why&lt;/em&gt; that ordering - cheapest first, most expensive last, each layer depending on the previous one being clean. Neither format alone carries both the flow and the reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to adopt this
&lt;/h2&gt;

&lt;p&gt;If your CLAUDE.md has any of these, you have a flowchart waiting to happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"First do X. If X passes, do Y. If Y fails, do Z."&lt;/li&gt;
&lt;li&gt;"Run A, then B, then C. If any step fails, stop."&lt;/li&gt;
&lt;li&gt;"Check for X. If found, do Y. Otherwise, do Z."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sequential steps with conditions = flowchart. Convert those, leave everything else as prose.&lt;/p&gt;
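&lt;p&gt;For instance, the last bullet above - "Check for X. If found, do Y. Otherwise, do Z." - converts to a four-node flowchart in one pass:&lt;/p&gt;

```mermaid
flowchart TD
    START([Check for X]) --> FOUND{X found?}
    FOUND -->|yes| Y[Do Y]
    FOUND -->|no| Z[Do Z]
```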

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Find a workflow in your CLAUDE.md that reads like a recipe with conditions&lt;/li&gt;
&lt;li&gt;Rewrite the control flow as Mermaid&lt;/li&gt;
&lt;li&gt;Keep the rationale and judgment calls as prose below the diagram&lt;/li&gt;
&lt;li&gt;Delete the original prose-only version&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One converted workflow. See if Claude follows it more reliably - and enjoy being able to &lt;em&gt;see&lt;/em&gt; your system for the first time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't describe the path. Draw the map.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;The FlowBench paper is at &lt;a href="https://arxiv.org/abs/2406.14884" rel="noopener noreferrer"&gt;arxiv.org/abs/2406.14884&lt;/a&gt;. The "lost in the middle" paper is at &lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;arxiv.org/abs/2307.03172&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm building instruction file governance at &lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; - this finding led to a new rule category (Context Quality) that I'll cover in the next post.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous in series: &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;The backbone.yml Pattern&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Reporails: Copilot adapter, built with copilot, for copilot.</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Mon, 16 Feb 2026 07:54:28 +0000</pubDate>
      <link>https://dev.to/cleverhoods/reporails-copilot-adapter-built-with-copilot-for-copilot-2gfo</link>
      <guid>https://dev.to/cleverhoods/reporails-copilot-adapter-built-with-copilot-for-copilot-2gfo</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; is a validator for AI agent instruction files: CLAUDE.md, AGENTS.md, copilot-instructions.md. It scores your files, tells you what's missing, and helps you fix it.&lt;/p&gt;

&lt;p&gt;The project already supported Claude Code and Codex. For this challenge, I added &lt;strong&gt;GitHub Copilot CLI as a first-class supported agent&lt;/strong&gt; - using Copilot CLI itself to build the adapter.&lt;/p&gt;

&lt;p&gt;The architecture was already multi-agent by design. A &lt;code&gt;.shared/&lt;/code&gt; directory holds agent-agnostic workflows and knowledge. Each agent gets its own adapter that wires into the shared content. Claude does it through &lt;code&gt;.claude/skills/&lt;/code&gt;, Copilot through &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Adding Copilot took &lt;strong&gt;113 lines&lt;/strong&gt;. Not because the work was trivial - but because the architecture was ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repos:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLI: &lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;reporails/cli&lt;/a&gt; (v0.3.0)&lt;/li&gt;
&lt;li&gt;Rules: &lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;reporails/rules&lt;/a&gt; (v0.4.0)&lt;/li&gt;
&lt;li&gt;Recommended: &lt;a href="https://github.com/reporails/recommended" rel="noopener noreferrer"&gt;reporails/recommended&lt;/a&gt; (v0.2.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;After adding Copilot support, each agent gets its own rule set with no cross-contamination:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Rules&lt;/th&gt;
&lt;th&gt;Breakdown&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;30 CORE - 1 excluded + 0 COPILOT-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;30 CORE - 1 excluded + 10 CLAUDE-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;30 CORE + 7 CODEX-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Run it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @reporails/cli check &lt;span class="nt"&gt;--agent&lt;/span&gt; copilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It understood the architecture immediately
&lt;/h3&gt;

&lt;p&gt;I explained the &lt;code&gt;.shared/&lt;/code&gt; folder — that it was created specifically so both Claude and Copilot (and other agents) can reference the same workflows and knowledge without duplication. Copilot got it on the first exchange:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoefznpb6t0hjh8l7old.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoefznpb6t0hjh8l7old.png" alt="Copilot understanding .shared/ architecture" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;
Copilot understanding .shared/ architecture



&lt;p&gt;The key insight it surfaced: "The .shared/ content is already agent-agnostic. Both agents reference the same workflows. No duplication is needed - just different entry points."&lt;/p&gt;

&lt;p&gt;That's exactly right. Claude reaches shared workflows through &lt;code&gt;/generate-rule&lt;/code&gt; → &lt;code&gt;.claude/skills/&lt;/code&gt; → &lt;code&gt;.shared/workflows/rule-creation.md&lt;/code&gt;. Copilot reads instructions → &lt;code&gt;.shared/workflows/rule-creation.md&lt;/code&gt;. Same destination, different front doors.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it built
&lt;/h3&gt;

&lt;p&gt;Copilot created the full adapter in three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Foundation&lt;/strong&gt; - &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, &lt;code&gt;agents/copilot/config.yml&lt;/code&gt;, updated &lt;code&gt;backbone.yml&lt;/code&gt;, verified test harness supports &lt;code&gt;--agent copilot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Wiring&lt;/strong&gt; - entry points in copilot-instructions.md, context-specific conditional instructions, wired to &lt;code&gt;.shared/workflows/&lt;/code&gt; and &lt;code&gt;.shared/knowledge/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; - updated README and CONTRIBUTING with agent-agnostic workflow guidance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrcu4g25up7ii012xtx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrcu4g25up7ii012xtx6.png" alt="Copilot Contribution Parity Complete" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;
Copilot Contribution Parity Complete



&lt;h3&gt;
  
  
  The bug it found (well, helped find)
&lt;/h3&gt;

&lt;p&gt;While testing the Copilot adapter, I discovered that the test harness had a cross-contamination bug. When running &lt;code&gt;--agent copilot&lt;/code&gt;, it was testing CODEX rules too — because &lt;code&gt;_scan_root()&lt;/code&gt; scanned ALL &lt;code&gt;agents/*/rules/&lt;/code&gt; directories indiscriminately.&lt;/p&gt;

&lt;p&gt;The fix was three lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# If agent is specified, only scan that agent's rules directory
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;agent_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2wqbd0tqkpiet99eyvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2wqbd0tqkpiet99eyvn.png" alt="Test Harness Agent Isolation Fix" width="800" height="383"&gt;&lt;/a&gt;Test Harness Agent Isolation Fix&lt;/p&gt;

&lt;h3&gt;
  
  
  The model selector surprise
&lt;/h3&gt;

&lt;p&gt;When I opened the Copilot CLI model selector, the default model was &lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt;. The irony of building a Copilot adapter using Copilot CLI running Claude was not lost on me.&lt;/p&gt;

&lt;h3&gt;
  
  
  What worked, honestly
&lt;/h3&gt;

&lt;p&gt;Copilot CLI understood multi-agent architecture without hand-holding. It generated correct config files matching existing adapter patterns. The co-author signature was properly included in all commits. It didn't try to duplicate content that was already shared - it just wired the entry points.&lt;/p&gt;

&lt;p&gt;The whole experience reinforced something I've been thinking about: the tool matters less than the architecture underneath. If your project is structured well, any competent agent can extend it. That's the whole point of reporails - making sure your instruction files are good enough that the agent can actually help you.&lt;/p&gt;

&lt;h3&gt;
  
  
  What also happened during this challenge
&lt;/h3&gt;

&lt;p&gt;While building the Copilot adapter, I also rebuilt the entire rules framework from scratch. Went from 47 rules (v0.3.1) to 35 rules (v0.4.0) - fewer rules, dramatically higher quality. Every rule is now distinct, detectable, and backed by evidence. But that's a story for another post.&lt;/p&gt;




&lt;p&gt;Try it: &lt;code&gt;npx @reporails/cli check&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://dev.to/cleverhoods"&gt;Previous posts&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: The backbone.yml Pattern</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 10 Feb 2026 12:31:44 +0000</pubDate>
      <link>https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi</link>
      <guid>https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi</guid>
      <description>&lt;p&gt;There's a Dutch scouting tradition called "dropping." Kids get driven to an unfamiliar forest at night - sometimes blindfolded - and have to find their way back to camp. It builds independence, problem-solving, resilience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;That's what most people do to their AI agents.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Drop them in a codebase. No orientation. Figure it out. (&lt;em&gt;Veel succes en heel gezellig&lt;/em&gt;, as the Dutch would say.)&lt;/p&gt;

&lt;p&gt;The difference is that, unlike people, an AI agent's memory goes only as far as its context window allows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.yml"&lt;/span&gt; &lt;span class="nt"&gt;-type&lt;/span&gt; f
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"config"&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.md"&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; .claude/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent explores. Makes wrong assumptions. Gets corrected. Tries again. Eventually it finds what it needs - or it doesn't, and quietly poisons its context.&lt;/p&gt;

&lt;p&gt;I call this the &lt;strong&gt;exploration tax&lt;/strong&gt; - the &lt;strong&gt;tokens&lt;/strong&gt; and &lt;strong&gt;time&lt;/strong&gt; spent orienting instead of working.&lt;/p&gt;

&lt;h2&gt;
  
  
  Give the agent a map
&lt;/h2&gt;

&lt;p&gt;The fix is simple: one file that maps your project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backbone.yml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="na"&gt;structure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config/&lt;/span&gt;
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tests/&lt;/span&gt;
  &lt;span class="na"&gt;docs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs/&lt;/span&gt;

&lt;span class="na"&gt;conventions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.test.ts"&lt;/span&gt;
  &lt;span class="na"&gt;config_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yaml&lt;/span&gt;

&lt;span class="na"&gt;boundaries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;never_modify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;migrations/&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;vendor/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's enough to start. Claude reads this once and knows: config lives in &lt;code&gt;config/&lt;/code&gt;, tests are &lt;code&gt;*.test.ts&lt;/code&gt;, never touch &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;migrations/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;No more exploration loops. No more wrong guesses. No more "sorry, I thought the config was in the root directory."&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling up
&lt;/h2&gt;

&lt;p&gt;As your project grows, so can your backbone. Here's what mine looks like for &lt;a href="https://github.com/reporails/rules/blob/main/.reporails/backbone.yml" rel="noopener noreferrer"&gt;Reporails rules&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;claude&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main_instruction_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLAUDE.md&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/claude/config.yml&lt;/span&gt;
    &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.claude/skills/&lt;/span&gt;
    &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.claude/tasks/&lt;/span&gt;
  &lt;span class="na"&gt;codex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/codex/config.yml&lt;/span&gt;

&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;core&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/&lt;/span&gt;
  &lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;rule_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{category}/{slug}/"&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rule.md"&lt;/span&gt;
    &lt;span class="na"&gt;test_pass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/pass/"&lt;/span&gt;
    &lt;span class="na"&gt;test_fail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/fail/"&lt;/span&gt;
  &lt;span class="na"&gt;categories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;structure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/structure/&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/content/&lt;/span&gt;
    &lt;span class="na"&gt;efficiency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/efficiency/&lt;/span&gt;
    &lt;span class="na"&gt;maintenance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/maintenance/&lt;/span&gt;

&lt;span class="na"&gt;schemas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/rule.schema.yml&lt;/span&gt;
  &lt;span class="na"&gt;capability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/capability.schema.yml&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/agent.schema.yml&lt;/span&gt;

&lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/capabilities.yml&lt;/span&gt;
  &lt;span class="na"&gt;levels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/levels.yml&lt;/span&gt;
  &lt;span class="na"&gt;coordinate_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/coordinate-map.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple agents, rule patterns, schemas, registries - all mapped. Claude can construct paths directly instead of exploring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring it up
&lt;/h2&gt;

&lt;p&gt;The backbone file alone isn't enough - you need to tell Claude to use it. Add this to your CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Initialization&lt;/span&gt;

Read these files before searching or modifying anything:
&lt;span class="p"&gt;
1.&lt;/span&gt; Read &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; for project structure and path resolution
&lt;span class="p"&gt;2.&lt;/span&gt; Read any registries or schemas referenced there as needed
&lt;span class="p"&gt;3.&lt;/span&gt; Read &lt;span class="sb"&gt;`.claude/rules/`&lt;/span&gt; for context-specific constraints

&lt;span class="gu"&gt;## Structure&lt;/span&gt;

Defined in &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; - the single source of truth for project topology.

&lt;span class="gs"&gt;**BEFORE**&lt;/span&gt; running &lt;span class="sb"&gt;`find`&lt;/span&gt;, &lt;span class="sb"&gt;`grep`&lt;/span&gt;, &lt;span class="sb"&gt;`ls`&lt;/span&gt;, or glob to locate project files, read &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; first. All paths are mapped there. Do not use exploratory commands to discover paths that the backbone already provides.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the key: explicit instruction to read the map before exploring. Without it, Claude might still wander.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a separate file?
&lt;/h2&gt;

&lt;p&gt;You could put all of this directly in your CLAUDE.md. But there's a tradeoff.&lt;/p&gt;

&lt;p&gt;Everything in CLAUDE.md sits in the context window from the start - every session, every message, whether the agent needs it or not.&lt;/p&gt;

&lt;p&gt;backbone.yml is read-on-demand. Claude doesn't load it at session start - it reads it when it would otherwise start exploring. The map replaces discovery rather than adding to it.&lt;/p&gt;

&lt;p&gt;There are also things a directory structure can't express:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patterns.&lt;/strong&gt; &lt;code&gt;{category}/{slug}/rule.md&lt;/code&gt; isn't a folder - it's a convention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationships.&lt;/strong&gt; Which agent owns which config? What schema validates what file?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boundaries.&lt;/strong&gt; What's off-limits? What's deprecated?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directories show what exists. backbone.yml shows how it fits together.&lt;/p&gt;
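&lt;p&gt;Those three kinds of information can live in the backbone itself. A sketch - the key names and paths here are illustrative, not a fixed schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;conventions:
  rule_pattern: "{category}/{slug}/rule.md"   # a convention, not a folder

relationships:
  claude_config: agents/claude/config.yml     # which agent owns which config

boundaries:
  never_modify: [.env, vendor/]
  deprecated: [legacy/]                       # exists, but don't build on it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;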

&lt;h2&gt;
  
  
  The cost of exploration
&lt;/h2&gt;

&lt;p&gt;I tracked my Claude Code usage across 176 sessions. A significant chunk of friction came from wrong assumptions about project structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used the wrong YAML library (PyYAML instead of ruamel.yaml)&lt;/li&gt;
&lt;li&gt;Wrote changes to the wrong repo in a monorepo&lt;/li&gt;
&lt;li&gt;Assumed directories existed that didn't&lt;/li&gt;
&lt;li&gt;Missed config files that were right there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each mistake costs tokens, time, and trust. The models are smart enough - the problem is orientation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this fits
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm"&gt;previous post&lt;/a&gt;, I introduced capability levels for instruction files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L1-L2&lt;/strong&gt;: CLAUDE.md exists, has basic constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L3&lt;/strong&gt;: External references, multiple files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L4&lt;/strong&gt;: Path-scoped rules that load conditionally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L5&lt;/strong&gt;: backbone.yml - maintained structure, active upkeep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L6&lt;/strong&gt;: Dynamic context, skills, MCP integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most setups stop at L2-3. The jump to L5 isn't about adding more rules - it's about making your existing setup navigable. backbone.yml is how you get there.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to adopt this
&lt;/h2&gt;

&lt;p&gt;Not every project needs it. Weekend hack? Basic CLAUDE.md is fine.&lt;/p&gt;

&lt;p&gt;But if you notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude repeatedly exploring the same directories&lt;/li&gt;
&lt;li&gt;Wrong assumptions about project structure&lt;/li&gt;
&lt;li&gt;Corrections like "no, the config is in X, not Y"&lt;/li&gt;
&lt;li&gt;Monorepo confusion about which repo to modify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you're paying the exploration tax. A backbone file pays for itself in the first session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep it accurate
&lt;/h2&gt;

&lt;p&gt;A backbone.yml only works if it's true. Paths that don't resolve, patterns that don't match reality - those are worse than no map at all.&lt;/p&gt;

&lt;p&gt;Structure that rots is worse than no structure.&lt;/p&gt;
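&lt;p&gt;One cheap guard: verify that the paths the backbone promises actually resolve, in CI or a pre-commit hook. A minimal POSIX-shell sketch - it naively greps for directory-style values ending in &lt;code&gt;/&lt;/code&gt;, so a real validator should parse the YAML instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# check_backbone: list directory paths from backbone.yml that don't exist on disk.
# Run from the project root. Returns non-zero if anything is missing.
check_backbone() {
  status=0
  for p in $(grep -Eo '[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)*/' "${1:-backbone.yml}" | sort -u); do
    if [ ! -e "$p" ]; then
      echo "missing: $p"
      status=1
    fi
  done
  return $status
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;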

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;backbone.yml&lt;/code&gt; in your project root&lt;/li&gt;
&lt;li&gt;Map your directories, configs, conventions&lt;/li&gt;
&lt;li&gt;Add the initialization section to your CLAUDE.md&lt;/li&gt;
&lt;li&gt;Watch Claude stop guessing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I use this with Claude Code daily. The pattern should work for any agent that reads instruction files - Codex, Copilot, Cursor - though I haven't tested all of them. If you try it, let me know how it goes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't drop your agent in the dark. Give it a map.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; is where I'm building instruction file governance. The &lt;a href="https://github.com/reporails/rules/blob/main/.reporails/backbone.yml" rel="noopener noreferrer"&gt;backbone.yml example&lt;/a&gt; above is from there.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>CLAUDE.md best practices - From Basic to Adaptive</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 03 Feb 2026 12:15:28 +0000</pubDate>
      <link>https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm</link>
      <guid>https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm</guid>
      <description>&lt;blockquote&gt;
&lt;h2&gt;
  
  
  &lt;em&gt;How do you learn new things as a developer?&lt;/em&gt;
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;My take on it is to find yourself an actual project (&lt;em&gt;not tutorials&lt;/em&gt;) and start &lt;strong&gt;iterating&lt;/strong&gt;. I wanted to learn LangGraph for my SageCompass project. SageCompass is a monorepo with LangGraph + Drupal (for RAG content management) and Gradio (for UI). &lt;/p&gt;

&lt;p&gt;I iterated ... &lt;em&gt;&lt;strong&gt;a LOT&lt;/strong&gt;&lt;/em&gt;. &lt;strong&gt;&lt;em&gt;A lot lot&lt;/em&gt;&lt;/strong&gt;. &lt;a href="https://dev.to/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp"&gt;A lot lot lot.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After two months of learning the principles of managing a Python project - and, on top of that, a LangGraph project - I felt ready to start using a coding agent (Codex at the time) to reduce refactoring time. As it turned out, coding agents work significantly more reliably when you have strong boundaries. I had my unit test structure hammered out, and my directives and contracts were clear and strongly defined.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;However&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;SageCompass is a monorepo. I needed a far more capable AGENTS.md setup to manage all its components together. The LangGraph part? Tight. Contracts, test structure, clear boundaries - the agent barely needed hand-holding. The Drupal part? I've been working with Drupal for 17 years. I know what I need, but I hadn't written it down for an agent yet. The Gradio part? I was still learning it myself - &lt;em&gt;how do you write instructions for something you don't fully understand yet&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;I couldn't just have one big instruction file. Each component was at a different stage of readiness. Copy-pasting rules across them would have been worse than having no rules at all.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;That's when it hit me&lt;/strong&gt;: instruction setups have capability levels. And if they have levels, they can be measured. And if they can be measured, they can be improved systematically.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Emergence of capability levels
&lt;/h2&gt;

&lt;p&gt;When I tried to port my LangGraph rules to the Gradio component, I needed to figure out which ones were universal and which ones were specific to a well-established, contract-heavy setup. &lt;/p&gt;

&lt;p&gt;A rule like &lt;strong&gt;&lt;em&gt;'never commit .env files'&lt;/em&gt;&lt;/strong&gt; applies everywhere. A rule like &lt;strong&gt;&lt;em&gt;'implement nodes as make_node* factories'&lt;/em&gt;&lt;/strong&gt; is meaningless outside LangGraph.&lt;br&gt;
That forced me to categorize. Not just what rules do, but what level of project capability they assume. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;A basic project needs different instructions than one with enforced contracts and navigation maps.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  What did I find?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;A starting point and six levels: L1 to L6.&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L0  Absent      → No instruction file (The starting point)
L1  Basic       → File exists, tracked
L2  Scoped      → Project-specific constraints  
L3  Structured  → External references, modular
L4  Abstracted  → Path-scoped loading
L5  Maintained  → Structural discipline
L6  Adaptive    → Dynamic context, skills, MCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what each one means in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  L0: Absent
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;No CLAUDE.md. No AGENTS.md. Nothing.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude works from its training data and whatever it can infer from your code. It'll guess your stack from package.json, maybe pick up patterns from existing files. But it has zero guidance about your preferences, constraints, or "never do this" rules.&lt;/p&gt;

&lt;p&gt;For quick scripts or throwaway experiments, this is fine. For anything you'll maintain, you're probably leaving value on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  L1: Basic
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
└── CLAUDE.md       ← exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;A file exists. It's tracked in git.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Content might be &lt;code&gt;/init&lt;/code&gt; boilerplate — the auto-generated stuff Claude Code produces. Might be a few lines you wrote yourself. The point is you've acknowledged that Claude needs context, and you've given it somewhere to live.&lt;/p&gt;

&lt;p&gt;This is the "I know this matters" stage. Most people get here quickly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude has &lt;em&gt;something&lt;/em&gt; project-specific. It knows this isn't just a random repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Rules. Claude knows &lt;em&gt;about&lt;/em&gt; your project, but not your &lt;em&gt;constraints&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L2: Scoped
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project&lt;/span&gt;
E-commerce API, Node.js, PostgreSQL.

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; MUST use TypeScript strict mode
&lt;span class="p"&gt;-&lt;/span&gt; MUST NOT use &lt;span class="sb"&gt;`any`&lt;/span&gt; type  
&lt;span class="p"&gt;-&lt;/span&gt; MUST run tests before committing
&lt;span class="p"&gt;-&lt;/span&gt; NEVER modify migration files directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Explicit constraints. &lt;a href="https://www.rfc-editor.org/rfc/rfc2119.html" rel="noopener noreferrer"&gt;MUSTs and MUST NOTs.&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where you stop describing and start prescribing. Not just "here's what the project is" but "here's what you can and cannot do."&lt;/p&gt;

&lt;p&gt;The language matters. "Prefer TypeScript" is a suggestion Claude might ignore. "MUST use TypeScript strict mode" is a rule it tends to follow.&lt;/p&gt;

&lt;p&gt;For small projects with simple conventions, this is often enough. You have your rules in one place. Claude follows them. Life is reasonable.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude follows &lt;em&gt;your&lt;/em&gt; rules, not just generic best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Scale. When the file gets long, important stuff gets lost in the noise.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L3: Structured
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

See @docs/architecture.md for system overview.
See @docs/api-conventions.md for API patterns.

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;External references. Multiple files. Content split by concern.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You've hit the point where one file isn't working anymore. So you break it up. Architecture in one place. API conventions in another. Your CLAUDE.md becomes a router pointing to the right context.&lt;/p&gt;

&lt;p&gt;This is also where team collaboration gets easier. Different people can own different files.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Separation of concerns. Easier to maintain. Each file has a job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: All files load regardless of what you're working on. Editing tests? Claude still loads your API conventions. Noisy.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L4: Abstracted
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── CLAUDE.md
└── .claude/
    └── rules/
        ├── api-rules.md        # paths: src/api/**
        ├── frontend-rules.md   # paths: src/components/**
        └── test-rules.md       # paths: tests/**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Path-scoped loading. Different rules for different parts of the codebase.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Edit &lt;code&gt;src/api/users.ts&lt;/code&gt;? Only API rules load. Edit &lt;code&gt;tests/user.test.ts&lt;/code&gt;? Only test rules load.&lt;/p&gt;

&lt;p&gt;This is where context efficiency gets real. You're not wasting tokens on irrelevant rules. Claude's attention stays on what matters for the task at hand.&lt;/p&gt;

&lt;p&gt;How you implement this depends on the tool. Claude Code uses &lt;code&gt;.claude/rules/&lt;/code&gt; with frontmatter. Cursor uses &lt;code&gt;.cursor/rules/&lt;/code&gt;. The concept is the same.&lt;/p&gt;
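&lt;p&gt;As a sketch, a path-scoped rule file could look like this - the &lt;code&gt;paths&lt;/code&gt; frontmatter key and the rules themselves are illustrative, so check your tool's docs for the exact field names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
paths: src/api/**
---

# API rules

- MUST validate request bodies before touching the database
- MUST NOT bypass the shared error handler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;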

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude adapts to &lt;em&gt;what you're working on&lt;/em&gt;, not just what project you're in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Maintenance. Structures rot. Rules go stale.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L5: Maintained
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;L4 with discipline.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Same structure, but with habits to keep it current:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A backbone file mapping the codebase, updated when things change&lt;/li&gt;
&lt;li&gt;Some way to track what's stale&lt;/li&gt;
&lt;li&gt;Regular reviews (however often makes sense for you)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference between L4 and L5 isn't features — it's upkeep. L4 is "I set this up." L5 is "I keep it working."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Reliability over time. The setup doesn't quietly rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Dynamic capabilities. Claude follows instructions but can't extend itself.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L6: Adaptive
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── CLAUDE.md
├── .claude/
│   ├── rules/
│   └── skills/
│       ├── database-migrations/
│       │   └── SKILL.md
│       └── api-testing/
│           └── SKILL.md
└── mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Skills that load based on task. MCP servers for external integrations.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this level, Claude doesn't just follow instructions — it loads capabilities. Working on migrations? The migration skill activates with its own context. Need to hit an external API? MCP handles it.&lt;/p&gt;

&lt;p&gt;Very few setups are here yet. The tooling is new. The patterns are still emerging.&lt;/p&gt;
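&lt;p&gt;For orientation: a skill is a folder with a SKILL.md whose frontmatter tells the agent when to load it. A sketch based on Claude Code's skills format at the time of writing - verify the field names against the current docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: database-migrations
description: Use when creating or modifying files under migrations/
---

# Database migrations

1. Generate a new migration file; never edit already-applied ones.
2. Run the migration test suite before committing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;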

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude extends its abilities based on what it detects you're doing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Quick self-check
&lt;/h2&gt;

&lt;p&gt;Where do you land?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;If yes...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you have any instruction file?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L1&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Does it have explicit constraints (MUST/MUST NOT)?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L2&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you use @imports or multiple files?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L3&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do different paths load different rules?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L4&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you actively maintain the structure?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L5&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you use skills or MCP?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;L6&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From what I've seen, most setups are &lt;strong&gt;L1&lt;/strong&gt; &lt;em&gt;(Basic)&lt;/em&gt; or &lt;strong&gt;L2&lt;/strong&gt; &lt;em&gt;(Scoped)&lt;/em&gt;. Some reach &lt;strong&gt;L3&lt;/strong&gt; &lt;em&gt;(Structured)&lt;/em&gt;. &lt;strong&gt;L4&lt;/strong&gt; &lt;em&gt;(Abstracted)&lt;/em&gt; and above is rare - not because it's hard, but because the patterns aren't widely known yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why bother with levels?
&lt;/h2&gt;

&lt;p&gt;It's not about chasing a high score.&lt;/p&gt;

&lt;p&gt;It's about having words for things.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm at &lt;strong&gt;L2&lt;/strong&gt; &lt;em&gt;(Scoped)&lt;/em&gt; and wondering if &lt;strong&gt;L4&lt;/strong&gt; &lt;em&gt;(abstracted)&lt;/em&gt; is worth the effort" &lt;strong&gt;&lt;em&gt;is a conversation you can actually have.&lt;/em&gt;&lt;/strong&gt; "My CLAUDE.md is pretty good" isn't.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The right level depends on your project. A weekend hack doesn't need path scoping. A complex system with multiple domains probably does. The framework just helps you think about where you are and where you might want to go.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm building
&lt;/h2&gt;

&lt;p&gt;I'm working on a validator that uses this framework: &lt;strong&gt;&lt;em&gt;it detects your level, checks structure, and scores your setup&lt;/em&gt;&lt;/strong&gt;. &lt;em&gt;(If you run it from the Claude Code CLI, it helps you fix issues too.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's early. Like, really early. I'm still working through core level implementations. But if you want to poke at it and tell me what's broken, I'd appreciate it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reporails CLI:&lt;/strong&gt; &lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;github.com/reporails/cli&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or just use the levels as a mental model. &lt;strong&gt;&lt;em&gt;That's the real value anyway.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules/blob/main/docs/capability-levels.md" rel="noopener noreferrer"&gt;Capability levels docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Rules repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>claudecode</category>
      <category>devtools</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>CLAUDE.md: Check, Score, Improve &amp; Repeat</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 27 Jan 2026 08:54:55 +0000</pubDate>
      <link>https://dev.to/cleverhoods/claudemd-lint-score-improve-repeat-2om5</link>
      <guid>https://dev.to/cleverhoods/claudemd-lint-score-improve-repeat-2om5</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;The missing quality checker for AI instruction files.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You asked for a small refactor. A &lt;em&gt;&lt;strong&gt;small(!)&lt;/strong&gt;&lt;/em&gt; refactor.&lt;br&gt;
Claude Code rewrote &lt;strong&gt;&lt;em&gt;half&lt;/em&gt;&lt;/strong&gt; the module.&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;You're right, I apologize.&lt;/em&gt;" "&lt;em&gt;Let me fix that.&lt;/em&gt;" "&lt;em&gt;Sorry, I misunderstood.&lt;/em&gt;" — on repeat.&lt;/p&gt;

&lt;p&gt;So you open the &lt;strong&gt;CLAUDE.md&lt;/strong&gt;. Then the &lt;strong&gt;rules&lt;/strong&gt;. Then the &lt;strong&gt;SKILLS&lt;/strong&gt;. Each is at least 400 lines. 24 files total. &lt;br&gt;
You cross-reference the official docs, skim three "best practices" blog posts, dig through GitHub examples. &lt;/p&gt;

&lt;p&gt;Hours of trial and error later, you do what any reasonable person would: you ask Claude to figure it out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ review my CLAUDE.md and rules. Tell me what is wrong.
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Burn ALL the tokens
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code obliges.&lt;/strong&gt; It reads all 24 files, their cross-referencing imports and all the additional relevant documentation. It neatly summarizes them. It suggests improvements, you accept them, it rewrites a few sections, adds here, removes there. &lt;/p&gt;

&lt;p&gt;It burns tokens like kindling. &lt;/p&gt;

&lt;p&gt;Your &lt;strong&gt;CLAUDE.md&lt;/strong&gt;, &lt;strong&gt;rules&lt;/strong&gt;, &lt;strong&gt;SKILLS&lt;/strong&gt; got just a bit longer, but you're fine with that — at least it won't happen again... right? This is fine. Right?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowvqb5elm4m42hdjw2v4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowvqb5elm4m42hdjw2v4.jpg" alt="This is fine" width="561" height="265"&gt;&lt;/a&gt;Everything is fine&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Forty minutes later, you have a slightly different mess and no idea if it's better. So you open &lt;em&gt;CLAUDE.md&lt;/em&gt; ...&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Lint the vibes!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Your code needs &lt;strong&gt;structure&lt;/strong&gt;, &lt;strong&gt;types&lt;/strong&gt;, &lt;strong&gt;format&lt;/strong&gt;. It has &lt;strong&gt;tests&lt;/strong&gt;, &lt;strong&gt;type checks&lt;/strong&gt; and &lt;strong&gt;linters&lt;/strong&gt;. Your AI instructions? &lt;strong&gt;&lt;em&gt;Vibes&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reporails helps with that.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add reporails &lt;span class="nt"&gt;--&lt;/span&gt; uvx &lt;span class="nt"&gt;--from&lt;/span&gt; reporails-cli ails-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Then ask:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ what ails claude?
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Runs deterministic checks and semantic validations. Produces actionable fixes Claude can apply.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
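&lt;p&gt;What a deterministic check might look like under the hood - a hypothetical sketch (not reporails' actual implementation) that flags identical rule lines duplicated across instruction files:&lt;/p&gt;

```python
# Hypothetical illustration of a deterministic check, not reporails'
# actual implementation: flag identical rule lines that appear in
# more than one instruction file.
from collections import defaultdict

def find_duplicate_rules(files):
    """Map each normalized rule line to the files it appears in."""
    seen = defaultdict(set)
    for name, text in files.items():
        for line in text.splitlines():
            norm = line.strip().lower()
            if norm.startswith("- "):
                seen[norm].add(name)
    return {rule: sorted(names) for rule, names in seen.items() if len(names) > 1}

# Sample inputs echoing the duplicate fixed later in this post
rules = {
    "writing-rules.md": "- Content MUST be actionable, not explanatory\n- Use descriptive filenames",
    "instruction-file-style.md": "- Content MUST be actionable, not explanatory\n- Keep every line actionable",
}
```

&lt;p&gt;Checks like this are cheap and reproducible - no model call involved, which is what makes results verifiable at scale.&lt;/p&gt;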

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcugat31q1lj56jpqtdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcugat31q1lj56jpqtdn.png" alt="What ails claude?" width="800" height="665"&gt;&lt;/a&gt;It will tell you what can be done better&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Just run&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ fix the duplicate   
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;... and observe:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● I&lt;span class="s1"&gt;'ll remove the duplicate from writing-rules.md since instruction-file-style.md already covers the "actionable content" requirement for .claude/rules/**.                  

● Update(.claude/rules/writing-rules.md)                                                                                                                                     
  ⎿  Removed 1 line                                                                                                                                                          
      26  - Use descriptive filenames (`api-validation.md` not `rules1.md`)                                                                                                  
      27  - Add `paths` frontmatter to reduce noise when not relevant
      28  - No paths = loads globally for all files
      29 -- Content MUST be actionable, not explanatory

● Done. Removed the duplicate "Content MUST be actionable, not explanatory" from writing-rules.md:29 — this is already covered by instruction-file-style.md:12 ("Every line
  MUST be actionable or informative") which applies to the same scope.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The result
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No more vibes. No more gluing together &lt;em&gt;best practices&lt;/em&gt; blog posts.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● Semantic Evaluation                                                                                                                                                        

  I already have context from the previous read. Evaluating the 9 candidates:

  M4 &lt;span class="o"&gt;(&lt;/span&gt;Contradictions&lt;span class="o"&gt;)&lt;/span&gt;: 2 candidates - all passed

  C6 &lt;span class="o"&gt;(&lt;/span&gt;Duplicates&lt;span class="o"&gt;)&lt;/span&gt;: 6 candidates - all passed &lt;span class="o"&gt;(&lt;/span&gt;the duplicate I fixed is no longer flagged&lt;span class="o"&gt;)&lt;/span&gt;

  G3 &lt;span class="o"&gt;(&lt;/span&gt;Security ownership&lt;span class="o"&gt;)&lt;/span&gt;: 1 candidate - passed &lt;span class="o"&gt;(&lt;/span&gt;organizational guidance, not a security rule&lt;span class="o"&gt;)&lt;/span&gt;

  &lt;span class="nt"&gt;---&lt;/span&gt;
  Final Score
  ┌────────────┬─────────────────────┐
  │   Metric   │        Value        │
  ├────────────┼─────────────────────┤
  │ Score      │ 8.3↑ / 10&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; │
  ├────────────┼─────────────────────┤
  │ Capability │ Governed &lt;span class="o"&gt;(&lt;/span&gt;L5+&lt;span class="o"&gt;)&lt;/span&gt;      │
  ├────────────┼─────────────────────┤
  │ Violations │ 21 &lt;span class="o"&gt;(&lt;/span&gt;↓2 from 23&lt;span class="o"&gt;)&lt;/span&gt;     │
  └────────────┴─────────────────────┘
  The C6 duplicate between instruction-file-style.md and writing-rules.md is resolved.

✻ Cogitated &lt;span class="k"&gt;for &lt;/span&gt;32s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reporails v0.0.1
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;What took hours now takes minutes.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;42 rules&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/tree/main/docs" rel="noopener noreferrer"&gt;Documented&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/blob/main/LICENSE" rel="noopener noreferrer"&gt;Open source&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Easy to extend.&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/reporails-cli/" rel="noopener noreferrer"&gt;PyPI: reporails-cli&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;GitHub: CLI &amp;amp; MCP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;GitHub: Rules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md is just the start. More agents coming soon.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>claudecode</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>devtool</category>
    </item>
    <item>
      <title>From Prompt to Platform: Architecture Rules I Use</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 20 Jan 2026 07:36:43 +0000</pubDate>
      <link>https://dev.to/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp</link>
      <guid>https://dev.to/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp</guid>
      <description>&lt;p&gt;The "&lt;em&gt;build -&amp;gt; &lt;strong&gt;surprise&lt;/strong&gt; -&amp;gt; restructure -&amp;gt; repeat&lt;/em&gt;" loop is amazing early on. However, after a while it's like two clowns trying to out-prank each other: it gets funnier and funnier, lots of laughs... until one of them pulls out a flamethrower for one last prank and the laughter gets a little awkward.&lt;/p&gt;

&lt;p&gt;This type of iteration is fun until it isn't. &lt;strong&gt;So I went looking for guidance.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiences With LangGraph Tutorials
&lt;/h2&gt;

&lt;p&gt;Most examples show you how to build a graph. Define some nodes. Wire them together. Ship it. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Great for prototyping.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They don't show you where to put things when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 nodes&lt;/li&gt;
&lt;li&gt;3 agents&lt;/li&gt;
&lt;li&gt;5 tools&lt;/li&gt;
&lt;li&gt;Shared state across subgraphs&lt;/li&gt;
&lt;li&gt;Middleware for guardrails&lt;/li&gt;
&lt;li&gt;A platform layer that stays framework-independent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I searched. Found bits and pieces, but no complete picture. So I built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Folder Structure That Scales
&lt;/h2&gt;

&lt;p&gt;Here's what my LangGraph component looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── agents/           # Agent factories (build_agent_*)
├── graphs/           # Graph definitions (main, subgraphs, phases)
├── nodes/            # Node factories (make_node_*)
├── states/           # Pydantic state models
├── tools/            # Tool definitions
├── middlewares/      # Cross-cutting concerns (guardrails, redaction)
└── platform/
    ├── core/         # Pure types, contracts, policies (no wiring)
    │   ├── contract/ # Validators: state, tools, prompts, phases
    │   ├── dto/      # Pure data transfer objects
    │   └── policy/   # Pure decision logic
    ├── adapters/     # Boundary translation (DTOs ↔ State)
    ├── runtime/      # Evidence hydration, state helpers
    ├── config/       # Environment, paths
    └── observability/# Logging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this structure?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It mirrors LangGraph's mental model: agents are agents; nodes are nodes; graphs are graphs. In the orchestration layer, things are &lt;strong&gt;easy to find&lt;/strong&gt; and responsibilities stay separated.&lt;/p&gt;

&lt;p&gt;But the real insight is the &lt;code&gt;platform/&lt;/code&gt; layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Platform Layer: Why It Exists
&lt;/h2&gt;

&lt;p&gt;While separating the LangGraph components was easy, separating the wiring was hard. The structure didn't appear on day one. It emerged over many iterations - each cycle surfaced a different missing architectural rule, and every missing rule made refactors harder as new components were added. &lt;/p&gt;

&lt;p&gt;Without architectural rules, everything gets spaghettified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WITHOUT PLATFORM LAYER - Everything mixed together
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;problem_framing_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SageState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Guardrail logic mixed with state management
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsafe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guardrail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GuardrailResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

    &lt;span class="c1"&gt;# Evidence hydration mixed with node orchestration  
&lt;/span&gt;    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_store&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;phase_entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# ... inline hydration logic
&lt;/span&gt;
    &lt;span class="c1"&gt;# Validation mixed with execution
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid state update!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ... good luck writing tests for it!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With the platform layer&lt;/strong&gt;, concerns are separated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WITH PLATFORM LAYER - Clean separation
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;problem_framing_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SageState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Use platform contracts for validation
&lt;/span&gt;    &lt;span class="nf"&gt;validate_state_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use platform runtime helpers for evidence
&lt;/span&gt;    &lt;span class="n"&gt;bundle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;collect_phase_evidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use platform policies for decisions
&lt;/span&gt;    &lt;span class="n"&gt;guardrail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate_guardrails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use adapters for state translation
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;guardrail_to_gating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guardrail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Node only orchestrates - all logic in platform!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The node becomes what it should be: &lt;strong&gt;orchestration only&lt;/strong&gt;. No domain logic. No direct store access. No inline validation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hexagonal Split
&lt;/h2&gt;

&lt;p&gt;The pattern that solved it: &lt;a href="https://alistair.cockburn.us/hexagonal-architecture" rel="noopener noreferrer"&gt;hexagonal architecture&lt;/a&gt;. Core stays pure - no framework dependencies, no imports from the layers above. Everything else can depend on Core, but Core depends on nothing. This makes the boundaries testable and the rules enforceable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│                    APPLICATION LAYER                    │
│  (app/nodes, app/graphs, app/agents, app/middlewares)   │
│  - LangGraph orchestration                              │
│  - Calls platform services via contracts                │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                    PLATFORM LAYER                       │
│  ┌───────────┐ ┌───────────┐ ┌─────────┐ ┌───────────┐  │
│  │  Adapters │ │  Runtime  │ │ Config  │ │Observabil.│  │
│  │DTO&amp;lt;-&amp;gt;State│ │  helpers  │ │env/paths│ │  logging  │  │
│  └─────┬─────┘ └─────┬─────┘ └────┬────┘ └─────┬─────┘  │
│        │             │            │            │        │
│        └─────────────┴──────┬─────┴────────────┘        │
│                             ▼                           │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Core (PURE - no framework dependencies)           │ │
│  │  - Contracts and validators                        │ │
│  │  - Policy evaluation (pure functions)              │ │
│  │  - DTOs (frozen dataclasses)                       │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The rule&lt;/strong&gt;: &lt;code&gt;core/&lt;/code&gt; has NO imports from anything above it - no app orchestration (agents, nodes, graphs, etc.), no wiring, no adapters. Dependencies point inward only.&lt;/p&gt;

&lt;p&gt;This isn't just a guideline. It's enforced.&lt;/p&gt;




&lt;h3&gt;
  
  
  How to enforce a guideline?
&lt;/h3&gt;

&lt;p&gt;Simple: write a test that catches the violation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/unit/architecture/test_core_purity.py
&lt;/span&gt;
&lt;span class="n"&gt;FORBIDDEN_IMPORTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.graphs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.nodes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... all app orchestration and platform wiring
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_core_has_no_forbidden_imports&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Core layer must remain pure - no wiring dependencies.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;core_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app/platform/core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rglob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;core_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;forbidden&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;FORBIDDEN_IMPORTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;forbidden&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; imports &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;forbidden&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - core must stay pure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you break the boundary, the test fails. &lt;strong&gt;No exceptions.&lt;/strong&gt;&lt;/p&gt;
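&lt;p&gt;The substring check is deliberately simple; if comments or docstrings ever mention a forbidden module, a stricter variant could parse real import statements with Python's &lt;code&gt;ast&lt;/code&gt; module - a sketch:&lt;/p&gt;

```python
# A stricter variant of the purity check: parse each file with ast so
# that a mention of "app.nodes" in a comment or docstring does not
# trigger a false failure. Only actual import statements count.
import ast

def imported_modules(source):
    """Return every module path named by import statements in source."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods
```

&lt;p&gt;The assertion loop stays the same - only the detection gets more precise.&lt;/p&gt;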

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Beyond guidelines, you can also define &lt;a href="https://en.wikipedia.org/wiki/Design_by_contract" rel="noopener noreferrer"&gt;contracts&lt;/a&gt; that validate at runtime.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Contracts That Validate
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;core/contract/&lt;/code&gt; directory contains validators that enforce contract rules at runtime:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Contract&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_state_update()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restricts mutations to authorized owners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_structured_response()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Forces validation before persisting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_phase_registry()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensures phase keys match declared schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_allowlist_contains_schema()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensures tool allowlist correctness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't optional - every node calls them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Every state update goes through the contract
&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;phase_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;phase_entry&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="nf"&gt;validate_state_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;next_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
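&lt;p&gt;To make the ownership rule concrete, here is a minimal sketch of the idea - the real validator lives in &lt;code&gt;core/contract/&lt;/code&gt;, and the &lt;code&gt;PHASE_OWNERS&lt;/code&gt; registry below is a hypothetical stand-in:&lt;/p&gt;

```python
# Minimal sketch of the ownership idea behind validate_state_update().
# The real validator lives in core/contract/; this PHASE_OWNERS
# registry is a hypothetical stand-in for illustration.
PHASE_OWNERS = {
    "problem_framing": {"phases", "gating"},
}

def validate_state_update(update, owner):
    """Reject updates touching keys the owner is not authorized to mutate."""
    allowed = PHASE_OWNERS.get(owner, set())
    illegal = set(update) - allowed
    if illegal:
        raise ValueError(f"{owner} may not mutate: {sorted(illegal)}")
```

&lt;p&gt;A node that tries to write outside its lane fails loudly at runtime, not silently three phases later.&lt;/p&gt;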



&lt;p&gt;The contracts themselves are also tested - validation logic, phase dependencies, invalidation cascades. See &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/langgraph/tests/unit/platform/core/contract/test_state.py" rel="noopener noreferrer"&gt;test_state.py&lt;/a&gt; for the full suite.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test Structure That Scales
&lt;/h2&gt;

&lt;p&gt;Tests are organized by type (unit, integration, e2e) and category (architecture, orchestration, platform). This makes coverage gaps obvious and lets you run targeted subsets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests/
├── unit/
│   ├── architecture/      # Boundary enforcement
│   │   ├── test_core_purity.py
│   │   ├── test_adapter_boundary.py
│   │   └── test_import_time_construction.py
│   ├── orchestration/     # Agents, nodes, graphs
│   └── platform/          # Core + adapters
├── integration/
│   ├── orchestration/
│   └── platform/
└── e2e/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With pytest markers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pyproject.toml
# Test markers for categorizing tests by purpose and scope
&lt;/span&gt;&lt;span class="n"&gt;markers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="c1"&gt;# Test Type Markers (by scope)
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit: Fast, isolated tests with no external dependencies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integration: Tests crossing component boundaries (may use test fixtures)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;e2e: End-to-end workflow tests (full pipeline validation)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;# Test Category Markers (organizational categories)
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architecture: Hexagonal architecture enforcement (import rules, layer boundaries)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestration: LangGraph orchestration components (agents, nodes, graphs, middlewares, tools)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform: Platform layer tests (hexagonal architecture - core, adapters, runtime)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run unit architecture tests alone: &lt;code&gt;uv run pytest -m "unit and architecture"&lt;/code&gt;&lt;/p&gt;
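&lt;p&gt;Markers can be applied once per module via &lt;code&gt;pytestmark&lt;/code&gt; instead of decorating every test - a sketch assuming the marker names registered above:&lt;/p&gt;

```python
# Sketch: module-level markers make a whole file selectable with
# "uv run pytest -m 'unit and architecture'". Marker names assume
# the pyproject.toml registration shown earlier.
import pytest

pytestmark = [pytest.mark.unit, pytest.mark.architecture]

def test_boundaries_hold():
    assert True
```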

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The architecture is validated by 110 tests - 11 of which specifically enforce architecture boundaries.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Here's where it gets interesting.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You might be thinking: &lt;em&gt;cool story, but...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr582q6kjk9w0cqylz04m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr582q6kjk9w0cqylz04m.gif" alt="...but why?" width="480" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because when your architecture is predictable and enforceable, something curious happens: &lt;strong&gt;coding agents stop being a liability and start being useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When every node follows the same pattern...&lt;br&gt;
When every state update goes through a validator...&lt;br&gt;
When every boundary is well-defined and tested...&lt;/p&gt;

&lt;p&gt;...an AI agent can't accidentally break your architecture without the tests catching it. It can't import forbidden modules. It can't skip validation. It can't bypass the contracts - not without failing the test suite.&lt;/p&gt;

&lt;p&gt;The rules become more than just documentation. They're guardrails for both humans and AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want the Full Thing?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;46 architecture principles (tiered):&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/docs/langgraph-python-architecture-principles.md" rel="noopener noreferrer"&gt;Architecture principles&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform contracts README:&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/langgraph/app/platform/core/contract/README.md" rel="noopener noreferrer"&gt;Platform contracts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture tests:&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/tree/main/langgraph/tests/unit/architecture" rel="noopener noreferrer"&gt;Architecture tests&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next up
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What happens when you point Claude Code at an architecture it can't break.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The CLAUDE.md file isn't just a conglomerate of instructions - it's a contract that preserves context and enforces boundaries during development.&lt;/p&gt;

&lt;p&gt;I built a framework for it with measurable results.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Coming next: The CLAUDE.md Maturity Model.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of my "From Prompt to Platform" series documenting the SageCompass build. &lt;a href="https://dev.to/cleverhoods/from-zero-to-agentic-platform-building-the-sagecompass-origin-story-series-prologue-2g3i"&gt;Start from the prologue&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langgraph</category>
      <category>langchain</category>
      <category>python</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
