An ETH Zurich study from February 2026 tested something most of us assumed was helping: giving AI coding agents a rules file.
The results were not what anyone expected.
- Baseline: no rules file at all
- LLM-generated rules files: −3% task success rate, +20% cost
- Human-written rules files: +4% success rate, but +19% cost
The researchers' conclusion: "Omit LLM-generated context files entirely and limit human-written instructions to non-inferable details."
In other words — the rules file most developers copy-paste from cursor.directory or generate with ChatGPT is actively making their AI agent slower, more expensive, and less accurate.
I wanted to understand why, so I crawled thousands of rules files from GitHub and analyzed them. Here's what I found.
The four types of rules (and why only one matters)
Every rule in your rules file falls into one of four tiers:
Essential
The agent cannot discover this from your code or config files. Remove it and the agent will make wrong decisions.
- Use pnpm, never npm or yarn
- Auth uses custom middleware chain, not Express standard
- All API responses must use the envelope format from lib/api-response.ts
- Bugs are tracked in Linear project "BACKEND" — reference ticket IDs in commits
The litmus test: run the agent without this rule. Does it make a mistake that no project file could have prevented? If yes — it's essential.
Helpful
The agent could figure this out by reading multiple files, but the rule saves significant exploration time.
- Prefer server actions for mutations, not API routes
- Use the cn() utility (clsx + twMerge) for conditional class merging
- Test files go in __tests__/ directories, not alongside source
These are worth keeping. They save the agent from reading 10 files to infer what one line could tell it.
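The cn() rule above is a good illustration of "one line instead of ten files": the real helper pipes clsx output through twMerge, both third-party packages. Here is a simplified, dependency-free stand-in for the same idea; it only joins truthy class strings and does not resolve conflicting Tailwind classes the way tailwind-merge does:

```typescript
// Simplified stand-in for the cn() utility described above.
// The real helper is `twMerge(clsx(inputs))`; this toy version only
// filters out falsy values and joins the rest with spaces.
type ClassValue = string | false | null | undefined;

export function cn(...inputs: ClassValue[]): string {
  return inputs.filter(Boolean).join(" ");
}

// Example: conditionally apply a class without manual string concatenation.
const isActive = true;
export const buttonClass = cn("btn", isActive && "btn-active", null); // "btn btn-active"
```

Without the rule, an agent has to find this helper, read it, and infer when to use it; with the rule, it just uses it.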
Redundant
The agent detects this automatically from standard project files. These waste context window tokens.
- Use TypeScript ← tsconfig.json exists
- This project uses React ← package.json says so
- Use ESLint for linting ← .eslintrc.json exists
- The project uses 2-space indent ← .prettierrc says so
Every one of these burns tokens that could hold conversation history or code context instead. Delete them.
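The redundancy check can even be mechanized. A hypothetical sketch (the keyword map and helper names are mine, not from the study): flag any rule whose subject is already proven by the mere presence of a standard config file.

```typescript
// Hypothetical sketch: flag rules that restate what a config file already proves.
// Keyword in the rule text -> the standard file whose presence makes it redundant.
const inferableFrom: Record<string, string> = {
  typescript: "tsconfig.json",
  react: "package.json",
  eslint: ".eslintrc.json",
  indent: ".prettierrc",
};

export function redundantRules(
  rules: string[],
  fileExists: (path: string) => boolean
): string[] {
  return rules.filter((rule) =>
    Object.entries(inferableFrom).some(
      ([keyword, configFile]) =>
        rule.toLowerCase().includes(keyword) && fileExists(configFile)
    )
  );
}
```

In a real repo, fileExists would be fs.existsSync; injecting it keeps the sketch self-contained and testable.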
Improve Codebase
This shouldn't be an AI instruction at all. It should be a linter rule, a config flag, or a CI check.
- Don't use var ← Add ESLint no-var rule
- Use strict mode ← Set strict: true in tsconfig.json
- Always run tests before commits ← Set up husky pre-commit hook
- Use meaningful variable names ← Add ESLint naming-convention rule
The key distinction: detection vs. enforcement. "This project uses pnpm" is redundant (the lockfile reveals it). "Always use pnpm, never npm" is essential — it enforces a preference the agent can't infer from files alone.
Real example: before and after
Here's a typical rules file I found on GitHub (anonymized, but representative of hundreds I analyzed):
Before (24 rules)
# Project Rules
## General
- Use TypeScript
- Use React with Next.js
- Write clean, maintainable code
- Follow best practices
- Use functional components
## Code Style
- Use 2-space indentation
- Use single quotes
- Use semicolons
- Use camelCase for variables
- Use PascalCase for components
- Prefer const over let, never use var
## Architecture
- Use the App Router
- Use server actions for mutations, not API routes
- Separate business logic from API routes
- Use the repository pattern for data access
## Error Handling
- Always handle errors properly
- Use try-catch blocks
- Never silently swallow errors
## Testing
- Write tests for all new features
- Use Vitest for unit tests
- Mock external dependencies
## Performance
- Use React.memo for expensive components
- Lazy load routes
- Optimize images with next/image
How does this break down?
- Essential: 3 rules (server actions for mutations, repository pattern, separate business logic)
- Helpful: 7 rules (Vitest, mock externals, React.memo, lazy load, next/image, camelCase, PascalCase)
- Redundant: 7 rules (Use TypeScript, Use React with Next.js, App Router, functional components, write clean code, follow best practices, write tests)
- Improve Codebase: 7 rules (2-space indent, single quotes, semicolons, prefer const/never var, handle errors, try-catch, never swallow errors)
Over half the file is noise. 14 out of 24 rules either duplicate what the agent already knows or belong in tooling config.
After (11 rules)
# Project Rules
- Use pnpm, never npm or yarn
- Use server actions for mutations, not API routes
- Separate business logic from API routes into /lib/services/
- Use the repository pattern — all DB access goes through /lib/repositories/
- API responses use the envelope format: { success, data, error, meta }
- Error handling: use neverthrow Result types, not try-catch
- All new endpoints need Zod schema validation in /lib/validations.ts
- Test with Vitest — mock external deps, never mock the database
- Use next/image for all images, enforce WebP format
- Commits follow conventional format: feat|fix|refactor: description
- Never import from @prisma/client directly — use the db wrapper from /lib/db
Every rule here is something the agent cannot infer from project files, or would waste significant time discovering. The file is less than half its original size, and every line changes agent behavior.
The context window problem
Why does this matter so much? Because of how LLM context windows work.
Your rules file, conversation history, code context, and the AI's own reasoning all share the same fixed-size window. A 200-line rules file with 15 essential lines and 185 lines of noise means:
- Token waste — you're paying for the AI to read instructions it doesn't need
- Attention dilution — research shows LLMs pay less attention to content in the middle of long contexts (the "lost in the middle" effect). Your essential rules get buried between redundant ones.
- Reduced code context — every token spent on "Use TypeScript" is a token that can't hold actual code the agent needs to understand
The ETH Zurich numbers make sense when you think about it this way. A bloated rules file doesn't just fail to help — it actively competes with useful context for the agent's attention.
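To put rough numbers on the waste, assume around 10 tokens per rules-file line (my assumption, not a figure from the study). The 200-line file above then spends the overwhelming share of its budget on noise:

```typescript
// Back-of-envelope token math. The 10-tokens-per-line figure is an assumption.
const tokensPerLine = 10;
const totalLines = 200;
const essentialLines = 15;

const noiseTokens = (totalLines - essentialLines) * tokensPerLine; // 1850
const noiseShare = (totalLines - essentialLines) / totalLines; // 0.925

console.log(
  `${noiseTokens} tokens of noise, ${Math.round(noiseShare * 100)}% of the file`
); // 1850 tokens of noise, 93% of the file
```

And those tokens are not paid once: the rules file is prepended to every request, so the cost recurs on every turn of the conversation.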
How to audit your own rules file
Take your current .cursorrules, CLAUDE.md, or AGENTS.md and ask three questions about each rule:
1. Can the agent see this in a config file?
If tsconfig.json, package.json, .eslintrc, biome.json, .prettierrc, or any other standard config already expresses this — delete the rule.
2. Is this enforceable by tooling?
If a linter rule, formatter config, git hook, or CI check can enforce this deterministically — move it there. Don't ask an AI to enforce what a tool can guarantee.
3. Would removing this rule cause the agent to make a wrong decision?
If the answer is "no, it would just take longer" — it's helpful, keep it but don't stress about the wording. If the answer is "yes, it would produce incorrect code" — that's essential, make it precise and specific.
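The three questions form a decision tree, and it can be written down directly. A sketch (the tier names are the article's; the function itself is illustrative):

```typescript
type Tier = "essential" | "helpful" | "redundant" | "improve-codebase";

// Answers to the three audit questions for a single rule.
interface Audit {
  visibleInConfig: boolean; // Q1: can the agent see this in a config file?
  toolEnforceable: boolean; // Q2: can a linter, hook, or CI check enforce it?
  removalCausesErrors: boolean; // Q3: would removing it cause wrong decisions?
}

export function classify(a: Audit): Tier {
  if (a.visibleInConfig) return "redundant";
  if (a.toolEnforceable) return "improve-codebase";
  return a.removalCausesErrors ? "essential" : "helpful";
}
```

Note the ordering: a rule that is both visible in config and tool-enforceable is deleted, not moved, because the config file already carries the information.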
What actually changes AI behavior
Addy Osmani's research found that specific tool mentions in context files increase agent usage from 0.01 to 1.6 times per task. The pattern is clear: specificity drives behavior change.
Rules that work:
- Use server actions for mutations, not API routes
- All dates stored as UTC in the database, converted to local only in the UI layer
- Never import from @prisma/client directly — use the typed wrapper from /lib/db
Rules that don't:
- Write clean code
- Follow best practices
- Handle errors properly
The difference is degrees of freedom. "Handle errors properly" gives the agent infinite valid interpretations. "Use neverthrow Result types, never try-catch in service functions" gives it exactly one path.
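To make "exactly one path" concrete, here is Result-style error handling in a service function. This is a minimal stand-in, not the actual neverthrow API, but the shape (ok/err wrappers instead of thrown exceptions) is the same idea:

```typescript
// Minimal Result type in the spirit of neverthrow (not its real API).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// The service function has exactly one way to fail: return err().
// No throw, no silent swallowing, and the caller must check the tag
// before it can touch the value.
export function parsePort(raw: string): Result<number, string> {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return err(`invalid port: ${raw}`);
  }
  return ok(n);
}
```

Given the rule, an agent writing a new service function has no interpretation left to make; given "handle errors properly", it could pick any of a dozen patterns.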
I built a tool to automate this
After classifying rules by hand, I got tired and built Agent Rules Builder — a free tool that does this automatically.
The Analyzer is the feature I use most. Paste your existing rules file, and the AI classifies every single rule into the four tiers — with a quality score, reasoning, and concrete suggestions. For "Improve Codebase" rules, it tells you exactly what linter rule or config flag to add instead.
It also catches cross-cutting issues: near-duplicate rules, contradictions, vague rules, and security problems like accidentally exposed API keys or internal paths.
The Builder is for when you're starting from scratch or setting up a new project. Pick your language (67 supported), select your framework, and browse a library of 10,000+ rules organized by category — Architecture, Testing, Security, Performance, Error Handling, Agent Workflow, and more.
What makes it different from template directories: every rule has (or can have) community votes. Other developers upvote or downvote individual rules — not whole files, each rule separately. Bad rules sink, good ones surface on the trending page.
And each rule gets an AI quality assessment — classified as essential, helpful, redundant, or improve-codebase with a score from 1-10 — so you can see at a glance whether a rule justifies its context window cost before you add it.
You can edit rules directly, add your own, choose between three verbosity levels, and export to any format.
The takeaway
Most rules files hurt more than they help because they're full of instructions the AI doesn't need. The fix is simple:
- Delete anything the agent can read from config files
- Move anything enforceable to linter/formatter/CI
- Keep only what the agent genuinely can't infer
- Make every remaining rule specific enough to have one interpretation
A 10-line file of essential rules beats a 200-line file of copy-pasted advice every time.