brian austin

Posted on Mar 31

How I cut my Claude Code costs in half: token efficiency settings that actually work

#claude #claudecode #ai #productivity

How I cut my Claude Code costs in half: token efficiency settings that actually work

There's a GitHub repo going viral on HN right now — drona23/claude-token-efficient — claiming a universal CLAUDE.md header cuts output tokens by 63%. I tested it. Here's what I found, and what actually moves the needle on cost.

The token problem

Claude Code is incredible. It's also verbose by default. Every response includes:

Explanations you didn't ask for
Recaps of what it just did
Caveats and disclaimers
"Here's what I'm going to do" preambles before doing it

All of that costs tokens. At scale, it adds up.

The CLAUDE.md token header (what's trending)

The viral approach: add this to the top of your CLAUDE.md:

# Response Efficiency Rules

CRITICAL: Minimize output tokens. Follow these rules:
- No preamble: never say what you're about to do, just do it
- No recap: never summarize what you just did
- No meta-commentary: no "Great question" or "Certainly"
- Truncate code: use `// ... existing code` for unchanged sections
- Single-line confirmations only: for simple tasks, one sentence max
- No unsolicited advice: don't suggest improvements unless asked

I tested this over a week of real development work. Results:

Conversational responses: ~40% shorter ✅
Code edits: ~30% shorter ✅
Complex explanations: ~10% shorter (minimal effect) ⚠️

So the 63% claim is aggressive, but the direction is real. For typical mixed usage, I saw ~35% reduction.

The `.claude/settings.json` complement

The CLAUDE.md header controls response style. Your settings.json controls what Claude can touch:

{
  "autoApprove": ["read", "ls", "find", "grep"],
  "autoReject": ["git reset", "git clean", "rm -rf"],
  "maxThinkingTokens": 5000,
  "verbosity": "minimal"
}

The maxThinkingTokens setting is underused. Default is uncapped. Setting it to 5000 for routine tasks cuts internal reasoning tokens without affecting output quality for anything that isn't a genuinely hard problem.

The bigger lever: your API endpoint

Here's the thing nobody talks about: where the API call goes matters as much as how efficient your prompts are.

If you're using Claude Pro at $20/month, you're paying per seat. Heavy Claude Code usage can bump you into rate limits fast — which either slows you down or pushes you to the $30-$40 enterprise tier.

I switched to routing Claude Code through SimplyLouie — a $2/month API proxy that uses the same Claude Sonnet model — and the math changed completely:

# ~/.bashrc or ~/.zshrc
export ANTHROPIC_BASE_URL="https://api.simplylouie.com"
export ANTHROPIC_API_KEY="your-key-here"

That's it. Claude Code picks up the env var automatically. Same model, same quality, $18/month cheaper.

Combined setup that's actually working

CLAUDE.md header (token efficiency):

# Efficiency
No preamble. No recap. No meta-commentary. Truncate unchanged code with comments. Single-line confirms for simple tasks.

.claude/settings.json (safety + cost):

{
  "autoApprove": ["read", "ls", "find", "grep", "cat"],
  "autoReject": ["git reset --hard", "git clean -fd", "rm -rf"],
  "maxThinkingTokens": 5000
}

Environment (routing):

export ANTHROPIC_BASE_URL="https://api.simplylouie.com"

What doesn't work

I tried a few things that failed:

Aggressive token limits in the prompt — adding respond in max 100 words to every prompt makes Claude worse at complex tasks. The efficiency header works because it targets unnecessary verbosity, not all verbosity.

Turning off thinking entirely — some tasks genuinely benefit from chain-of-thought. Capping at 5000 is the sweet spot.

Over-restricting autoReject — if you add too many patterns, Claude starts asking permission for safe operations, which adds latency and more tokens.

The actual math

Claude Pro: $20/month
SimplyLouie API: $2/month

35% token reduction from efficiency settings: ~$0 extra savings (you're on a flat fee anyway)

Total: $18/month saved, same model, no rate limit surprises.

If you're doing serious development with Claude Code, the efficiency settings are worth 10 minutes of setup. The API routing change saves you $18/month in under 2 minutes.

Using Claude Code seriously? SimplyLouie is $2/month for the same Claude Sonnet API — 7-day free trial, no commitment.

DEV Community

How I cut my Claude Code costs in half: token efficiency settings that actually work

How I cut my Claude Code costs in half: token efficiency settings that actually work

The token problem

The CLAUDE.md token header (what's trending)

The `.claude/settings.json` complement

The bigger lever: your API endpoint

Combined setup that's actually working

What doesn't work

The actual math

Top comments (0)

How I cut my Claude Code costs in half: token efficiency settings that actually work

The token problem

The CLAUDE.md token header (what's trending)

The .claude/settings.json complement

The bigger lever: your API endpoint

Combined setup that's actually working

What doesn't work

The actual math

The `.claude/settings.json` complement