AbdlrahmanSaber
Your CLAUDE.md Is Wasting Tokens (And It's Probably Not Helping)

There's a ritual that's become standard in the AI coding agent world. You spin up Claude Code or a similar agent harness, you create a CLAUDE.md or agent.md, and you start listing things: which framework you use, how you want code structured, what tone to respond in. You've read that the best results come from giving the model more context.

The ritual feels productive. It might even work initially. But it's probably making your agent dumber over time, and almost certainly wasting money.

Here's why — and what to do instead.

The Token Math Nobody Talks About

When you have a CLAUDE.md file, its entire contents get injected into the context window at the start of every single conversation turn. Not just once — every turn. A 1,000-line file is roughly 7,000–10,000 tokens. Multiply that across a long session with 20–30 back-and-forth exchanges, and you've burned a meaningful chunk of your available context on instructions the model already knows.
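As a back-of-the-envelope check (the ~7.5 tokens-per-line figure is my assumption, chosen to be consistent with the 7,000–10,000 range above, not a tokenizer measurement):

```python
# Rough overhead of re-injecting a static CLAUDE.md on every turn.
TOKENS_PER_LINE = 7.5  # assumed average for prose; real tokenizers vary

def session_overhead(file_lines: int, turns: int) -> int:
    """Tokens spent re-sending the same file across a whole session."""
    per_turn = int(file_lines * TOKENS_PER_LINE)
    return per_turn * turns

# A 1,000-line CLAUDE.md over a 25-turn session:
print(session_overhead(1000, 25))  # 187500 tokens on static instructions
```

That's context budget spent before a single line of your actual problem is in the window.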

That last part is key: instructions the model already knows.

When you write "This project uses React" in your CLAUDE.md, you're telling the model something it can figure out by looking at your package.json. When you write "use TypeScript" in a project full of .ts files, you're burning tokens to state the obvious. The model has the codebase in context. It can read.

The 95% case — the overwhelming majority of agent setups — doesn't need a CLAUDE.md at all. The exception is when you have genuinely proprietary information: your company's internal methodology, a specific domain convention that's not derivable from the code, something the model has no way to infer. That's the 5%.

For everything else, there's a better primitive: skills.

Skills and Progressive Disclosure

A skill file looks similar to a CLAUDE.md on the surface — it's a markdown file with a name, a description, and a body of detailed instructions. The difference is how it's loaded into context.

With a CLAUDE.md, the full content is always present. With a skill, only the name and description are injected into context. The body stays on disk until the agent decides it's relevant.

So instead of burning 7,000 tokens on every turn, you're burning maybe 50 — two sentences that tell the agent "this skill exists, and here's roughly what it's for." When the agent encounters a task that matches that description, it reads the rest. Otherwise, it doesn't.

That's progressive disclosure. The agent reaches for what it needs rather than carrying everything all the time.
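A minimal sketch of that mechanism (this is not Claude Code's actual loader; the Skill type and the file path are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    path: str  # the full instructions stay on disk

def context_stub(skill: Skill) -> str:
    # Only this one-liner is injected into the context window each turn.
    return f"{skill.name}: {skill.description}"

def load_body(skill: Skill) -> str:
    # Read the detailed steps only once the agent decides the skill applies.
    with open(skill.path) as f:
        return f.read()

sponsor = Skill(
    name="sponsor-research",
    description="Use when evaluating a new sponsor email.",
    path="skills/sponsor-research.md",  # hypothetical location
)
print(context_stub(sponsor))  # sponsor-research: Use when evaluating a new sponsor email.
```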

A simplified example of what this looks like in practice:

```markdown
# Sponsor Research Skill

**Description**: Use when evaluating a new sponsor email. Researches the company across multiple signals and classifies the opportunity.

---

## Research Steps

1. Search for the company on LinkedIn, Crunchbase, Twitter/X, and YouTube
2. Check Trustpilot for reviews (if fewer than 50 reviews or rating below 3.5, flag as risk)
3. Verify any funding claims on Crunchbase
4. If two or more signals are missing or below threshold → automatic rejection
5. Log result to the sponsors spreadsheet with: company name, date, classification (good/bad), notes

...
```

The name and description live in context. The numbered steps don't — until needed.
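The rejection criteria in those steps are concrete enough to sanity-check in code. A hedged sketch (the function names and the signal set are mine, for illustration):

```python
def trustpilot_ok(review_count: int, rating: float) -> bool:
    # Step 2: flag as risk if fewer than 50 reviews or rating below 3.5.
    return review_count >= 50 and rating >= 3.5

def classify(signals: dict) -> str:
    # Step 4: two or more missing/failing signals means automatic rejection.
    failures = sum(1 for ok in signals.values() if not ok)
    return "bad" if failures >= 2 else "good"

signals = {
    "trustpilot": trustpilot_ok(review_count=30, rating=4.2),  # too few reviews
    "crunchbase": True,   # funding claim verified
    "linkedin": False,    # no presence found
}
print(classify(signals))  # "bad": two signals below threshold
```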

The Biggest Mistake: Writing Skills Before Teaching

Here's where most developers go wrong: they identify a workflow, open a blank file, and start writing the skill. Sometimes they ask the AI to write the skill for them based on a description.

Both approaches miss the point.

A skill written without a recorded successful run is just guesswork. The agent has no reference for what "good" looks like in your specific context. It'll follow the steps mechanically, encounter edge cases you didn't anticipate, and either fail silently or hallucinate its way through gaps in the instructions.

The right sequence is:

1. Identify the workflow.
Something you do repeatedly. Researching sponsors. Generating a weekly analytics report. Reviewing a PR for a specific code quality standard.

2. Walk through it with the agent, step by step.
Don't write the skill yet. Just run the workflow conversationally. Tell the agent what to do first, verify it did it right, tell it what to do next. If it gets something wrong, correct it in natural language. You're building the successful run that will become the source material for the skill.

```text
You: I'm forwarding you a sponsor email. First, search for this company on Twitter.
Agent: [searches, returns results]
You: Good. Now check Trustpilot. If fewer than 50 reviews, flag it.
Agent: [checks, flags appropriately]
You: Right. Now log this to the Google Sheet as "rejected" with these notes.
```

3. Once you have a successful run, ask the agent to create the skill from it.
At this point, the agent has actual context — it watched itself succeed. It knows where things went right, where you corrected it, what the output should look like. The skill it generates from this will be dramatically more useful than anything written from scratch.

```text
Now that we've completed this workflow successfully, review what you just did and write a skill file for it. The skill should capture every step including the rejection criteria.
```

4. Keep iterating.
The first version of the skill will still have gaps. Run it again. When it fails, don't get frustrated — the failure is data. Ask the agent what went wrong, identify the specific error, fix it, and then tell it to update the skill file so the same mistake doesn't happen again.

This is recursive skill building: workflow → successful run → skill → failure → fix → update skill → repeat.

Every iteration tightens the skill. After four or five loops, you have something that runs reliably in ten minutes with minimal hand-holding.

Context Window Hygiene

There's a second reason this matters beyond cost: model quality degrades as the context window fills.

A fresh context window is fast and sharp. As it fills — with file contents, tool calls, multi-turn conversation history — performance degrades. Not dramatically, but noticeably. The model at 85% context utilization is measurably worse than the same model at 30%.

A bloated CLAUDE.md accelerates this. You're filling context with static instructions before the real work even starts. Skills keep that space available for what actually matters: the live conversation, the relevant code, the tool outputs.

The mental model: treat your context window like working memory. Don't fill it with things that can be looked up. Fill it with things that are happening right now.

On Sub-Agents and Scaling

This same logic applies to multi-agent setups. It's tempting to spin up five sub-agents on day one because the architecture looks impressive. But each of those agents needs its own context, its own skills, its own successful runs before it's genuinely useful.

The more productive path:

  1. Start with one agent doing everything
  2. Build skills for your recurring workflows — individually, with real runs
  3. Once a workflow is stable and isolated enough, hand it to a dedicated sub-agent
  4. That sub-agent inherits skills with actual successful runs behind them

Scale for productivity, not for architecture diagrams. A single well-configured agent with five solid skills will outperform a complex multi-agent setup where nobody took the time to teach the agents anything.

What This Looks Like in Practice

For a YouTube channel analytics workflow with eight data sources — Notion, YouTube Analytics, Twitter, and five others — you're not going to prompt the agent once and get a clean report. That's not a reasonable expectation for anything with that much integration complexity.

But after five iterations of the recursive skill-building process? The agent runs the entire thing in ten minutes, hits every data source correctly, and formats the output exactly how you want it. The skill file has grown from a rough draft to a precise specification — written by the agent, from its own successful runs.

That's the bar worth aiming for.

The Short Version

  • The models are good. Stop blaming the model. Start examining the context.
  • Skip CLAUDE.md unless you have genuinely proprietary information the model cannot infer from the codebase.
  • Use skills for recurring workflows. They use progressive disclosure — only the name and description hit context until needed.
  • Teach before you codify. Run the workflow conversationally, get a successful result, then ask the agent to write the skill from that run.
  • Treat failures as data, not dead ends. Fix → update the skill → run again.
  • Keep context lean. The less you load upfront, the sharper the model stays throughout a long session.

The work is less glamorous than configuring a 30-agent pipeline. But the results are more reliable, cheaper to run, and — crucially — they get better over time because you're building on actual successful runs rather than hopeful instructions.


If you found this useful, let me know in the comments what your current CLAUDE.md setup looks like — or if you've already made the switch to skills-first. I'm curious how different teams are approaching this.

