clearloop for OpenWalrus

Posted on • Originally published at openwalrus.xyz

How developers configure their AI agents

The CLAUDE.md that helps you ship your MVP becomes the CLAUDE.md that slows you down during refactoring. You write instructions optimized for generating code fast — "use this pattern, follow this structure, here's the tech stack" — and it works. Then you enter the optimization phase. You're not generating new code anymore. You're restructuring, deleting, rethinking. And the instructions that made Claude a great code generator now make it resist the very changes you need.

Every project goes through this. Nobody talks about it. The instruction file is the highest-leverage file in your project, and most developers set it once and forget it.

This post surveys how the community actually uses CLAUDE.md, MCP servers, skills, and hooks — with real data on adoption, maintenance patterns, and the lifecycle problem that every project hits eventually.

According to the JetBrains 2025 Developer Ecosystem Survey (24,534 developers), 62% use at least one AI coding assistant. The Stack Overflow 2025 survey puts that number at 84%. The instruction file is where the relationship between developer and agent gets defined.
[Interactive chart — see original post]

The instruction file landscape

Every major AI coding tool has its own instruction file format. They're converging, but slowly.

| Feature | CLAUDE.md | .cursorrules | copilot-instructions.md | .windsurfrules | AGENTS.md |
|---|---|---|---|---|---|
| Location | `./CLAUDE.md`, `.claude/rules/` | `.cursor/rules/` | `.github/copilot-instructions.md` | Root | Root |
| Format | Markdown + optional YAML | Markdown + YAML frontmatter | Markdown + YAML | Plain text | Standard Markdown |
| Auto-loaded | Always | Always | Chat/review/agent only | Every prompt | Varies by tool |
| Path scoping | `paths` frontmatter | `globs` frontmatter | `applyTo` frontmatter | None | Directory hierarchy |
| Sub-directory scan | Yes (walks tree) | Yes | No | No | Yes |
| File imports | Yes (`@path` syntax) | No | No | No | No |
| User-level rules | Yes (global + local) | Yes | Yes (IDE settings) | No | No |

The convergence point is AGENTS.md — launched jointly by Google, OpenAI, Factory, Sourcegraph, and Cursor, now stewarded by the Linux Foundation. It's standard Markdown, no special syntax, used by 60,000+ open-source projects. The community advice: "If you use multiple AI tools, put shared instructions in AGENTS.md and keep CLAUDE.md for Claude-specific features."
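One way the community applies that advice is to keep CLAUDE.md as a thin wrapper that pulls in the shared file via the `@path` import syntax from the comparison table above. A sketch; the Claude-specific lines are invented for illustration:

```markdown
<!-- CLAUDE.md: Claude-specific layer on top of the shared AGENTS.md -->
@AGENTS.md

## Claude-specific
- Prefer plan mode for multi-file changes
- Subsystem context lives in src/*/CLAUDE.md
```

Other tools that don't understand `@path` imports simply read AGENTS.md directly, so the shared instructions stay in one place.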

Claude Code has the richest hierarchy: organization-level → project-level (CLAUDE.md) → user-level (~/.claude/CLAUDE.md) → local overrides (CLAUDE.md.local) → sub-directory (src/persistence/CLAUDE.md). Each layer scopes differently and loads lazily when you enter the directory.

How people actually use CLAUDE.md

What goes in the file

Across community templates and real files, the most common sections are:

  1. Project description (3 lines max) — what this project is
  2. Tech stack — framework, language, database, styling, testing
  3. Key commands — build, test, lint, deploy
  4. Architecture overview — directory structure, key files
  5. Code conventions — style, naming, patterns to follow
  6. DO NOT rules — critical prohibitions ("never modify .env", "don't use class components")
  7. References — pointers to detailed docs, not inline content
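Assembled, a minimal file covering those seven sections might look like this (project details invented for illustration):

```markdown
# Invoice API

Flask service that generates and emails PDF invoices.

## Tech stack
Python 3.12, Flask, PostgreSQL, pytest

## Commands
- `make test`: run the test suite
- `make lint`: ruff + mypy

## Conventions
- Type hints on all public functions
- Repository pattern for DB access (see src/persistence/)

## DO NOT
- Never modify .env or hand-edit generated migrations

## References
Architecture details: @docs/architecture.md
```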

The curated collection at josix/awesome-claude-md shows real files from leading open-source projects. Template starters exist at abhishekray07/claude-md-templates and serpro69/claude-starter-kit.

How long should it be

This is the most debated question. The data:

  • HumanLayer recommends under 60 lines as a good benchmark
  • Community consensus targets under 200 lines / 2,000 tokens
  • Anthropic's guidance allows up to 200 lines for the auto-loaded portion
  • One developer's informal survey of GitHub CLAUDE.md files found roughly 10% exceed 500 lines — almost certainly too large
  • A Medium walkthrough showed trimming from 2,800 lines to 200 lines reduced startup tokens from 2,100 to 800 — a 62% token reduction
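These budgets are easy to check mechanically. A rough sketch, assuming the common ~4-characters-per-token heuristic rather than a real tokenizer, so treat the token figure as an estimate:

```python
# Rough budget check for a CLAUDE.md file against the community targets
# (under 200 lines / ~2,000 tokens). The 4-chars-per-token ratio is a
# heuristic, not an exact tokenizer.
def claude_md_budget(text: str, max_lines: int = 200,
                     max_tokens: int = 2000) -> dict:
    lines = text.count("\n") + 1
    est_tokens = len(text) // 4
    return {
        "lines": lines,
        "est_tokens": est_tokens,
        "within_budget": lines <= max_lines and est_tokens <= max_tokens,
    }

sample = "# My Project\n\nA Flask API for invoices.\n" * 10
print(claude_md_budget(sample))
```

Running it over your repo's CLAUDE.md in CI is a cheap way to catch the file creeping past the budget.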

The constraint is fundamental: research on instruction-following at scale shows that model performance begins to degrade around 150-200 instructions — primacy effects kick in and later instructions get ignored. Claude Code's system prompt already carries significant instruction weight. Every instruction you add to CLAUDE.md competes for attention with the ones already there.

How often people update it

Anthropic's own team, led by Boris Cherny (Claude Code's creator), treats CLAUDE.md as a living document: "Anytime we see Claude do something incorrectly, we add it to the CLAUDE.md so Claude knows not to do it next time." In code reviews, he tags @claude on coworkers' PRs to add learnings via the Claude Code GitHub Action.

Most community projects are far less disciplined. One developer proposed a seven-level maturity model (L0-L6) that, while not an official standard, is useful for self-assessment:

  • L0 Absent: No CLAUDE.md at all
  • L1 Basic: File exists, generic instructions, never updated
  • L2 Scoped: Project-specific conventions, occasionally tweaked
  • L3 Structured: Multiple scoped files, sub-directory CLAUDE.md, regular review
  • L4 Abstracted: Reusable patterns extracted, @path imports for modularity
  • L5 Maintained: Staleness tracking, regular reviews, pruning stale rules
  • L6 Adaptive: Hooks enforce rules, skills extend capabilities, context-aware loading

[Interactive chart — see original post]

No rigorous survey exists, but based on community discussion patterns (HN threads, GitHub repos, Reddit), most projects sit at L0-L2. The file exists, it helped once, and nobody touches it again until something breaks badly enough to investigate.

The hierarchy in practice

The most effective pattern from power users:

  • User-level (~/.claude/CLAUDE.md): Personal preferences that apply everywhere — "always use TypeScript strict mode," "prefer functional style," "never auto-commit"
  • Project-level (./CLAUDE.md): Team-shared config checked into git — architecture, conventions, build commands
  • Local overrides (CLAUDE.md.local): Personal project-specific overrides not committed — "I'm working on the auth module this week, focus context there"
  • Sub-directory (src/persistence/CLAUDE.md): Lazy-loaded context for specific subsystems, only consumed when Claude is working in that directory
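On disk, the layers above look roughly like this (paths per the hierarchy described earlier; `CLAUDE.md.local` is typically gitignored):

```
~/.claude/CLAUDE.md           # user-level: personal preferences, all projects
./CLAUDE.md                   # project-level: team-shared, checked into git
./CLAUDE.md.local             # local overrides: personal, not committed
./src/persistence/CLAUDE.md   # sub-directory: lazy-loaded subsystem context
```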

Chris Dzombak documented his streamlined user-level CLAUDE.md. The key insight: keep project-agnostic preferences separate from project-specific instructions.

MCP servers: what people actually install

The top MCP servers by community adoption:

| Server | What it does | Why people use it |
|---|---|---|
| GitHub | Direct interaction with repos, issues, PRs, CI/CD | Most widely used — eliminates context-switching |
| Context7 | Fetches real-time, version-specific library docs | "Like having every library's maintainer next to you" |
| Sequential Thinking | Structured reasoning for complex problems | Methodical decomposition for architecture decisions |
| Playwright | Web automation via accessibility trees | Testing and scraping without browser-switching |
| Filesystem | Secure local file operations | Sandboxed file access for untrusted contexts |
| PostgreSQL | Natural language database queries | Schema exploration and data inspection |
| Serena | Semantic code retrieval and editing | Finding relevant code across large codebases |

The pain points

MCP configuration has significant friction:

  • Config file confusion: Multiple locations (~/.claude.json vs ~/.claude/mcp.json) with the latter silently ignored in some setups
  • All-or-nothing loading: Every configured server loads everywhere, even when irrelevant. A handful of servers can inject 50+ tools consuming 50,000-100,000 tokens before you start working
  • No profiles: You can't say "use these servers for this project type." The feature request for context-aware tool switching is one of the most upvoted
  • First-time setup: Can take an hour due to environment conflicts, permission issues, and fragmented documentation
  • Silent failures: "Connection closed" with no actionable information

The mitigation: Tool Search (lazy loading) reduces context usage by up to 95%, making multi-server setups practical. The community recommends keeping it on auto.

Skills and hooks

CLAUDE.md tells Claude what to do. Skills extend what Claude can do. Hooks constrain how it does it. They complement each other.

Skills

A skill is a folder containing a SKILL.md file plus optional scripts. Unlike slash commands (manual invocation), skills activate automatically when their description matches the task context. Anthropic recommends keeping SKILL.md under 500 lines.
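For reference, a minimal skill might look like the sketch below. The YAML frontmatter fields (`name`, `description`) follow Anthropic's published skill format; the body content is invented for illustration:

```markdown
---
name: pdf-invoices
description: Generate and validate PDF invoices. Use when a task mentions
  invoices, billing documents, or PDF generation.
---

# PDF invoices

1. Render the invoice with scripts/render.py
2. Validate line-item totals before writing the file
```

Because activation is driven by matching the description against the task context, the description should say when to use the skill, not only what it does.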

Several community skill collections are curated on GitHub.

The pattern maps to what we've described as less code, more skills — push extensibility to modular, reusable skills rather than hardcoding behavior.

Hooks

Hooks are deterministic shell commands that execute at 17 lifecycle events. The most used:

  • PreToolUse — Validate or block before actions. Protect .env, block rm -rf, reject edits to package-lock.json
  • PostToolUse — Cleanup after actions. Auto-format with Prettier/Black after every file Claude modifies
  • Stop — Run on task completion. Auto-commit changes, run the test suite, isolate changes into virtual branches
  • UserPromptSubmit — Inject context before processing. Add current git status, prepend relevant context
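As a sketch of the PreToolUse pattern, here is a hypothetical hook script in Python. It assumes the hook receives a JSON payload on stdin whose `tool_input` carries a `file_path`, and that exit code 2 blocks the action with stderr surfaced back to Claude — verify those details against Anthropic's hooks documentation before relying on them:

```python
import json
import sys

# Hypothetical PreToolUse hook: reject writes to protected files.
# Extend PROTECTED to suit the project.
PROTECTED = (".env", "package-lock.json")

def should_block(payload: dict) -> bool:
    """True when the pending tool call targets a protected file."""
    path = payload.get("tool_input", {}).get("file_path", "")
    return path.endswith(PROTECTED)

def main() -> int:
    # Assumed contract: JSON payload on stdin; exit code 2 blocks the
    # tool call and feeds stderr back to the model.
    payload = json.load(sys.stdin)
    if should_block(payload):
        print("Blocked: edits to protected files are not allowed",
              file=sys.stderr)
        return 2
    return 0
```

The script would be registered as the hook command in `.claude/settings.json` under a PreToolUse matcher for edit/write tools; the exact registration keys are an assumption to check against current docs.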

Real-world examples from the community:

Context rotation hook: A PreToolUse hook detects when context usage hits 65% and triggers automatic /clear with a structured handover document. The agent picks up seamlessly after the reset.

Auto-format on edit: A PostToolUse hook runs Prettier on every file Claude touches, ensuring formatting compliance without relying on Claude to remember the style guide.

GitButler integration: A Stop hook commits changes and isolates them into virtual branches per session, giving you clean rollback points.

The community consensus: "Advisory rules in CLAUDE.md are suggestions. Hooks are enforcement." When something absolutely cannot happen — editing production configs, committing secrets — hooks are more reliable than instructions.

The prompt lifecycle problem

This is the problem described at the top of this post, and the data confirms it's universal.

Context rot

Stanford's "Lost in the Middle" study (TACL 2024) found that when relevant information moves from the beginning or end of context to the middle, model performance can drop by more than 20-30% — even for models that claim long-context support. The degradation is position-dependent, not just length-dependent.

How it shows up in practice:

  • Agents repeat previously completed work — they've forgotten they already did it
  • Contradictory decisions that conflict with earlier analysis
  • Re-reading files already processed because compressed summaries lost the details
  • Multi-step tasks failing mid-process when intermediate state vanishes
  • Auto-compaction reducing "architectural discussions to single sentences"

Prompt rot

Distinct from context rot, prompt rot is the gradual degradation of instruction effectiveness:

  • Agents hedge decisions they previously made confidently
  • They request clarification on previously automatic tasks
  • Output quality drifts while remaining "technically correct"
  • The agent's effective identity gets diluted by accumulated instructions

Root causes: context accumulation without pruning, sequential workarounds creating contradictions, and no reset mechanism.

One HN user's compliance test: he has Claude address him as "Mr Tinkleberry" in every response. When it stops, he knows Claude is ignoring instructions. That's prompt rot in action.

The MVP-to-production shift

[Interactive chart — see original post]
The lifecycle curve looks like this:

MVP phase (low effort, high output): "Vibe coding" works. Minimal CLAUDE.md, maybe just tech stack and conventions. Claude generates code fast. You're building new things every session.

Growth phase (rising effort, declining output): Technical debt from loose structure surfaces. Claude starts generating code that conflicts with what it generated last week. You add more instructions. The file grows. Some instructions contradict others. Output quality wobbles.

Refactoring phase (effort spike): You need Claude to restructure and delete, not generate. But the CLAUDE.md is optimized for generation. Claude resists deleting code it was told to create. It follows conventions from the old architecture. This is where most developers hit the wall — the instructions that worked for building don't work for rebuilding.

Production phase (stabilized effort): If you survive the refactoring gap, you arrive at a CLAUDE.md that's shorter, more focused on constraints than generation patterns, and paired with hooks for enforcement.

What people actually do about it

Weekly review: "Every few weeks, ask Claude to review and optimize your CLAUDE.md. A quick 'review this CLAUDE.md and suggest improvements' surfaces issues." Simple, but most people don't do it.

Nightly curation: Filter daily logs to permanent memory. Remove transient context. Keep only durable patterns.

Per-session hard reload: Start each session from clean source files instead of accumulated context. This prevents instruction drift but loses session continuity.

Automatic rotation: One developer built a system that proactively clears context at 60-65% usage (before quality degrades), using tmux integration, structured handover documents, and session recovery scripts.

Complete rewrite: When the patch count on your CLAUDE.md exceeds 5, start over. This is the nuclear option, but sometimes the accumulated contradictions are worse than rebuilding.

What works

Distilled from power users, community discussions, and Anthropic's own usage:

"Give Claude a way to verify its work — it will 2-3x the quality." Boris Cherny's most cited tip. Whether it's running tests, type-checking, or building — verification loops compound. This connects to plan mode as a verification gate: go back and forth on the plan until you like it, then switch to auto-accept.

Keep it under 200 lines. Use sub-directory CLAUDE.md for subsystem-specific context. The main file should be scannable in 30 seconds. If you need more detail, use @path imports to reference external docs.

Hooks beat advisory rules for critical constraints. "Never edit .env" in CLAUDE.md is a suggestion. A PreToolUse hook that rejects writes to .env is enforcement.

Separate instructions from learned knowledge. CLAUDE.md = "do it this way" (deterministic, team-shared). Auto-memory (MEMORY.md) = "I noticed this about your project" (emergent, personal). Mixing them creates confusion.

Update when your project phase changes. The biggest mistake is treating CLAUDE.md as static. When you shift from building to refactoring, rewrite the file. Different phases need different instructions.

Boris Cherny's workflow: Runs 5 Claude instances in parallel locally, plus 5-10 on claude.ai/code. Uses plan mode extensively. Ships 50-100 PRs/week (he posted 259 PRs in one 30-day stretch). His setup is "surprisingly vanilla" — Claude Code works great out of the box. The most important practice: iterative refinement via CLAUDE.md, not elaborate tooling.

Open questions

Should instruction files evolve automatically as the project evolves? Today, CLAUDE.md is a manual artifact. Could the agent detect that you've shifted from building to refactoring and suggest instruction updates? Auto-memory does this partially, but CLAUDE.md itself remains static.

Is AGENTS.md the right convergence point? 60,000 repos is impressive traction, but the lowest common denominator of "standard Markdown with no special syntax" means giving up features like path scoping, file imports, and hierarchical loading. Will tools keep their proprietary formats for power features and use AGENTS.md as a fallback?

How do you measure CLAUDE.md quality? Beyond "it feels like it's working," there's no metric. Token usage before first meaningful output? Instruction compliance rate? Number of corrections per session? The "Mr Tinkleberry test" is charming but not scalable.

Can skills replace most of what goes in CLAUDE.md? If architectural conventions are a skill, code style is a skill, and deployment procedures are a skill — what's left in CLAUDE.md? Maybe just the project description and the pointers to everything else.

Further reading

