Spoiler: it's more like 4%.
And honestly? That's still kind of impressive when you understand why.
I've been using Claude Code heavily, and like most of you, I'm always looking for ways to keep costs down. So when Caveman started trending — a tool that makes Claude respond like a caveman to slash token usage — I had to dig into the actual numbers.
Here's what a typical 100K token session actually looks like:
75,000 tokens are inputs. File reads, grep results, tool outputs, your conversation history. Caveman doesn't touch any of this. It can't.
25,000 tokens are outputs. But most of that is tool calls (10K) and code blocks (9K) — both of which have to be written correctly. Caveman leaves those alone too.
What's left? Prose responses. About 6,000 tokens. The conversational fluff — "Sure, I'd be happy to help!", "The reason this is happening is because..." — that's what Caveman compresses.
Cut that by 65% and you save ~4,000 tokens.
4,000 out of 100,000 = 4%.
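The arithmetic above is easy to sanity-check in a few lines. Note the token counts here are the rough estimates from this post, not measured values, and the 65% compression ratio is the assumed figure from above:

```python
# Illustrative arithmetic only -- token counts are rough estimates, not measurements.
session_total = 100_000

input_tokens = 75_000       # file reads, grep results, tool outputs, history (untouched)
tool_call_tokens = 10_000   # must be syntactically correct, so left alone
code_block_tokens = 9_000   # must be correct, so left alone

# What's left is the compressible prose.
prose_tokens = session_total - input_tokens - tool_call_tokens - code_block_tokens

compression = 0.65          # assumed prose compression ratio
saved = prose_tokens * compression
savings_pct = saved / session_total * 100

print(f"prose: {prose_tokens}, saved: {saved:.0f}, savings: {savings_pct:.1f}%")
# prose: 6000, saved: 3900, savings: 3.9%
```

Round the ~3,900 up and you get the "~4,000 tokens, 4%" headline number.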
So is Caveman useless? Honestly, no — and I say that as someone who just spent way too long doing token math to prove a point.
It's oddly satisfying. Responses feel faster. There's something weirdly fun about an AI that just says "bug here, fix this, done."
But 4% is 4%. Not 75%.
Sometimes the hype is the feature. This might be one of those times.
Here's exactly how you can test it yourself:
(https://docs.google.com/document/d/1ssSyVbJ_KPcsv3PZVJrjfYOBg_AHHrww8Ve1VC6CLkw/edit?usp=sharing)
