You're wasting tokens. Not a little -a lot.
Here's a prompt I see constantly:
"I have a React app and I'm using the useState hook. My component re-renders every time the parent renders even though the props haven't changed. Why is this happening?"
Claude doesn't need any of that setup. It already knows React. It already knows what useState is. The only thing it needed was:
"Component re-renders on parent render. Props unchanged. Why."
Same answer. 64% fewer tokens.
The delta principle
Most prompts are written for humans. We explain context, name the framework, describe how things work before asking the question. That's how we communicate with each other.
But Claude already knows the context. The only thing it needs is the delta — the new information, the specific problem, the unknown.
Everything else is noise.
What you can safely strip
Claude already knows these — stop re-explaining them:
- Framework names used as context ("I have a React app", "I'm using Python")
- Concept explanations ("hooks are a React feature that...", "a closure is when...")
- Stack introductions ("my app uses Node, Express, and MongoDB")
Social noise that adds zero signal:
- Pleasantries: "hey", "hope you can help", "thanks in advance"
- Permission requests: "could you please", "I was wondering if"
- Hedging: "I think", "I'm not sure but", "maybe", "possibly"
- Filler: "basically", "essentially", "just", "simply", "actually"
What you should never strip:
- The actual error, bug, or problem
- Numbers, thresholds, measurements
- Variable names, function names, file names
- Code blocks and URLs
- Anything Claude could NOT know without being told
Before and after
Debugging:
Before (41 tokens):
"I'm working on a Node.js Express API and I'm getting a 401 unauthorized
error when I try to call the endpoint. I'm passing the JWT token in the
Authorization header."
After (12 tokens):
"401 on endpoint. JWT in Authorization header."
Code review:
Before (29 tokens):
"Could you please review this Python function and tell me if there are
any issues or improvements I could make?"
After (6 tokens):
"Review. Issues + improvements."
Explanation:
Before (19 tokens):
"I was wondering if you could explain how database connection pooling
works in simple terms?"
After (5 tokens):
"Explain connection pooling. Simple."
The compound effect
Single prompt savings look small. But across a real session, it compounds.
Here's a simulated 20-turn dev session — the kind where you're debugging something across multiple back-and-forths:
| Turn | Verbose (tokens) | Delta (tokens) | Saved |
|---|---|---|---|
| 1 | 48 | 18 | 30 |
| 2 | 35 | 12 | 23 |
| 3 | 52 | 14 | 38 |
| 4 | 29 | 8 | 21 |
| 5 | 41 | 11 | 30 |
| 6 | 33 | 9 | 24 |
| 7 | 44 | 15 | 29 |
| 8 | 38 | 10 | 28 |
| 9 | 31 | 8 | 23 |
| 10 | 45 | 13 | 32 |
| 11 | 27 | 7 | 20 |
| 12 | 39 | 11 | 28 |
| 13 | 50 | 16 | 34 |
| 14 | 36 | 10 | 26 |
| 15 | 42 | 12 | 30 |
| 16 | 28 | 8 | 20 |
| 17 | 46 | 14 | 32 |
| 18 | 33 | 9 | 24 |
| 19 | 40 | 11 | 29 |
| 20 | 37 | 10 | 27 |
| Total | 757 | 226 | 531 |
531 tokens saved in a single session. 70% reduction.
On Claude's API at Sonnet pricing, that's a small number in dollars. But if you're building on top of the API and running hundreds of sessions a day, it adds up fast. And even on claude.ai, fewer input tokens means less context noise — Claude processes cleaner signal and responds more precisely.
Three intensity levels
Not every prompt needs ultra-compression. I use three modes depending on the situation:
lite — strip pleasantries only, keep context (~20% reduction)
Use when: onboarding a new topic, first message in a session
full — strip everything Claude knows, keep only the delta (~60% reduction)
Use when: mid-session debugging, iterating on code, quick questions
ultra — compress to bare minimum signal (~70%+ reduction)
Use when: you know exactly what you want and don't care about polish
The skill file
I turned this into a Claude skill — a markdown file that instructs Claude to apply delta compression automatically, with activation/deactivation commands and intensity switching.
The README has the full rule set, intensity examples, and instructions for adding it to your Claude setup.
One thing worth thinking about
This is a small optimization. But the principle behind it is bigger:
We've been writing prompts for humans. We explain, we hedge, we contextualize — because that's how we earn understanding from other people. With LLMs, that overhead is waste. The model doesn't need to be convinced you know what you're talking about. It doesn't need the social scaffolding.
Just send the delta.
Top comments (0)