If your CLAUDE.md, .cursorrules, or agent.md file is longer than a few hundred lines, you are probably making your AI assistant worse, not better.
Every time you start a new chat session, you pay a hidden cost for massive context files—in tokens, performance, and overall accuracy. Many developers tend to over-engineer their context files, stuffing them with endless rules and massive context blocks. Ironically, this usually leads to worse results.
The Shift Most Developers Haven't Fully Realized
Modern Large Language Models (LLMs) are exceptionally capable right out of the box. You no longer need to explain fundamental concepts like how React works, what REST APIs are, or re-teach basic programming architecture. The models already possess this knowledge.
What matters now isn't providing more instructions, but managing the context you provide much more effectively. This emerging practice is known as context engineering—the art of optimizing exactly what goes into the model's context window to produce the best possible results without overwhelming it.
The Hidden Cost of Large Context Files
Every time you start a new coding session or prompt your AI assistant, your context files (CLAUDE.md, .cursorrules, agent.md, system instructions) are all loaded into the context window.
That content immediately converts into tokens, and those tokens have a tangible cost:
- Cost: You pay for every token processed, whether through direct API usage or the usage limits built into subscription plans.
- Attention: LLMs have finite attention spans. Essential project rules get diluted by boilerplate instructions.
- Performance Risk: The larger the context, the slower the response times, and the higher the chance the model hallucinates or ignores specific constraints.
LLMs operate within a finite context window, meaning everything you include competes for attention. When you dump a massive configuration file into every single session, you risk degrading the model's reasoning in every one of them.
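To make the cost concrete, here is a rough back-of-the-envelope sketch in Python. Both the 4-characters-per-token heuristic and the $3-per-million-token input price are assumptions for illustration; real tokenizers and real pricing vary by model.

```python
import pathlib

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # A real tokenizer (e.g. tiktoken) will differ, but this is close
    # enough to spot a bloated context file.
    return len(text) // 4

def context_file_cost(path: str, price_per_million_usd: float = 3.0) -> tuple[int, float]:
    # Estimated tokens and per-session input cost for one context file.
    tokens = estimate_tokens(pathlib.Path(path).read_text())
    return tokens, tokens * price_per_million_usd / 1_000_000
```

Under these assumptions, a 40 KB context file costs roughly 10,000 tokens of attention on every single session, before you have typed a word.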
The Big Mistake: "More Context = Better Results"
It feels logical to assume that giving an AI more background information will yield a better answer. However, research and real-world usage consistently demonstrate a "less-is-more" effect in prompting. Removing non-essential content actually improves the accuracy and relevance of the model's output.
When a context window is bloated:
- The model gets distracted: It might over-index on a minor, irrelevant rule you included "just in case."
- Important instructions get buried: The "needle in a haystack" problem means your critical constraints are lost in a sea of generic best practices.
- Signal-to-noise ratio drops: Meaningful project context is drowned out by unnecessary explanations, leading to generic or confused outputs.
What You Should Do Instead
1. Keep Context Files Minimal
Most developers and teams do not need an enormous configuration file. Your system prompts should be lean and highly specific.
Only include:
- Project-specific rules: Naming conventions, specific directory structures, or custom architectural patterns unique to your repository.
- Constraints the model wouldn't infer: Hard requirements like "Never use external libraries for data fetching" or "Strictly adhere to local timezones."
- Truly required defaults: Formatting preferences or language-specific compiler flags.
Everything else? Remove it.
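Concretely, a lean context file covering those three categories might look like this. Every rule below is invented for illustration; substitute your own project's specifics:

```markdown
# Project rules

- API route handlers live in `src/routes/`; shared logic goes in `src/lib/`.
- Never use external libraries for data fetching; use the wrapper in `src/lib/http.ts`.
- Store all timestamps in UTC; render them in the user's local timezone.
- Run `npm run lint:fix` before proposing a commit.
```

Notice what is absent: no explanation of what REST is, no generic style guidance, nothing the model already knows.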
2. Stop Treating Agents Like They're Dumb
There is no need to include generic instructions like "Write clean code" or "Use best practices." Modern models are aligned to do this by default. Telling an advanced LLM to write good code is like telling a senior engineer not to forget to breathe—it wastes space and adds no value.
3. Use Skills Instead of Static Context
This is where you can drastically improve your workflow. Agent skills allow for progressive disclosure of context:
- Instead of loading a massive document of instructions upfront, only the skill's name and a brief description are loaded initially (consuming perhaps 100 tokens).
- The full, detailed instructions and context are only loaded dynamically when the agent decides it needs to use that specific skill.
By utilizing skills, you ensure lower token usage per request, significantly better focus for the model, and a much more scalable system as your project grows.
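As a sketch of this structure, here is a hypothetical SKILL.md using the YAML-frontmatter convention agent skills typically follow. Only the `name` and `description` in the frontmatter are loaded upfront; the body below is read only when the agent invokes the skill. All details here are invented for illustration:

```markdown
---
name: release-notes
description: Draft release notes from merged PRs. Use when the user asks to prepare a release.
---

1. List the PRs merged since the last git tag.
2. Group them into Features, Fixes, and Internal.
3. Write one line per PR, linking the PR number.
4. Never include PRs labeled `skip-changelog`.
```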
4. Keep Skills Small Too
Even dynamic skills can suffer from bloat if you aren't careful. When building out agent capabilities:
- Only include what the agent wouldn't already know: Do not paste the entirety of a public API's documentation if the model was likely trained on it.
- Keep instructions concise and actionable: Focus on input/output expectations and specific steps.
- Avoid "documentation-style" writing: Be direct. Once a skill activates, its entire payload enters the context window, so every word should earn its keep.
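A before-and-after sketch of the same (hypothetical) deploy instruction shows the difference:

```markdown
<!-- Documentation-style: wastes tokens on narration -->
This skill handles the deployment process. Deployment is a critical part
of the software lifecycle, and it is important to do it correctly...

<!-- Direct and actionable -->
To deploy: run `make deploy ENV=staging`. If it fails, read the last
error line of `deploy.log` and report it. Never target `production`
without explicit user confirmation.
```

The second version is shorter, yet it carries strictly more information the model could not have inferred on its own.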
The Real Insight: Context Is a Budget
It helps to think of your context window like system RAM. In software development, you wouldn't load unnecessary libraries into memory, keep unused data structures active, or duplicate logic everywhere.
You should treat your AI's context with the same level of discipline. Manage it like a strict budget where every token must justify its inclusion.
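One way to enforce that discipline is to audit every piece of context against an explicit budget before a session starts. The sketch below reuses the rough 4-characters-per-token heuristic from earlier; the section names and the budget value are placeholders:

```python
def audit_context_budget(sections: dict[str, str], budget: int) -> dict[str, int]:
    # Estimate token spend per context section (~4 chars per token)
    # and fail loudly if the total exceeds the budget, naming the
    # biggest spender so you know what to trim first.
    spend = {name: len(text) // 4 for name, text in sections.items()}
    total = sum(spend.values())
    if total > budget:
        worst = max(spend, key=spend.get)
        raise ValueError(
            f"Context budget exceeded: {total}/{budget} tokens; "
            f"biggest spender is '{worst}' ({spend[worst]} tokens)"
        )
    return spend
```

Running this in CI against your context files turns "keep it lean" from a vague intention into a hard constraint.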
Why This Matters More Than Ever
We are entering an era of AI development where:
- Foundational models are becoming commoditized.
- Baseline capabilities across different providers are largely similar.
- The true differentiation lies in how you orchestrate and utilize them.
The engineering teams and individual developers who will excel are those who keep their systems lean, rigorously optimize their context, and build modular, reusable workflows—not the ones writing the most exhaustive, monolithic prompts.
Introducing: simplify-markdown
One problem I consistently encountered while refining these workflows is that AI-generated markdown tends to get bloated incredibly fast. It becomes too verbose, contains redundant sections, includes unnecessary explanations, and relies on token-heavy structures.
To solve this, I built a specialized skill: simplify-markdown.
This tool is designed to systematically reduce token usage, clean up unwieldy context files, and simplify agent or skill markdown files so that only the signal remains.
Where to Find It
- GitHub Repository: ai-agent-toolkit
- Skill Source: simplify-markdown/SKILL.md
When to Use It
Consider integrating simplify-markdown into your workflow when:
- Your context files (`.cursorrules`, `CLAUDE.md`, etc.) are growing too large to manage easily.
- Your dynamic skills feel bloated and are slowing down execution.
- Your prompt architecture is becoming difficult to reason about.
- You want to immediately improve response performance and lower your token expenditure.
Final Thought
The future of AI-assisted development isn't about writing more instructions. It is about writing less, but better instructions.
By focusing on smaller context windows, cleaner automated workflows, and smarter loading mechanisms like skills, you empower the AI rather than suffocate it. The models are already highly capable; your job as an engineer is simply to provide the right environment and stay out of their way.