Introduction
"why use many token when few token do trick"
This is article #114 in the "One Open Source Project a Day" series. Today's project is caveman — a Skill that makes AI coding agents talk like a caveman, cutting 65% of output tokens while keeping code content byte-for-byte exact.
82,947 Stars. This is a meme project and a serious engineering tool.
LLMs have a default verbosity bias trained into them by RLHF: they like saying "Certainly!", "I'd be happy to help!", "The reason you're experiencing this issue is most likely because...", "In summary". These phrases have value in conversation. In a coding workflow they're pure token waste. You want the answer, not the preamble.
Caveman's solution: install a constraint that tells the agent "drop filler, keep substance, use fragments — but never touch code, commands, or error messages." The result: 65% output token reduction, faster answers, smaller bills — and a 2026 paper found that brevity constraints actually improve accuracy by ~26 points on some benchmarks.
What You'll Learn
- How caveman compression works: what it cuts, what it keeps
- Four compression levels: lite / full / ultra / wenyan and their respective tradeoffs
- The honest numbers: 65% is output tokens — why whole-session savings are smaller, and when caveman goes net-negative
- The full toolset:
/caveman-compressfor permanent savings on memory files - The caveman ecosystem: five related projects, each solving a different piece of the token problem
Prerequisites
- Experience with Claude Code or any AI coding tool
- Basic understanding of LLM tokens (output tokens cost money)
Project Background
Overview
caveman is a cross-agent Skill/Plugin. After installation, it tells your AI coding agent: answer without filler words, use fragments instead of full sentences — but leave code, commands, and error messages completely untouched.
It doesn't solve "AI answers are wrong." It solves "AI talks too much." Large language models develop a politeness bias through RLHF training — they like "Of course!", "Great question!", "As you can see", "In summary". These words have value in chat. In a coding workflow they're noise.
The project was created by Julius Brussee. Website: caveman.so.
Author / Team
- Author: Julius Brussee
- Primary Language: JavaScript
- License: MIT
- Website: caveman.so
Project Stats
- ⭐ GitHub Stars: 82,900+
- 🍴 Forks: 4,629+
- 📄 License: MIT
- 📅 Created: April 2026 (82k stars in ~3 months)
Features
Before / After
| Normal agent — 69 tokens | Caveman agent — 19 tokens |
|---|---|
| "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object." | "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo." |
| "Sure! I'd be happy to help you with that. The issue you're experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look and suggest a fix." | "Bug in auth middleware. Token expiry check use < not <=. Fix:" |
Same fix. A third of the words. Nothing technical lost.
┌────────────────────────────────────────────┐
│ output tokens saved █████████ 65% │
│ input tokens saved ░░░░░░░░░ 0% │
│ technical accuracy █████████ 100% │
│ vibes █████████ OOG │
└────────────────────────────────────────────┘
Caveman no make brain smaller. Caveman make mouth smaller. It shrinks what the agent says, not what it knows.
Install
One command. Finds every agent on your machine. Installs for each.
# macOS · Linux · WSL · Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows PowerShell 5.1+
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
~30 seconds. Needs Node ≥18. Skips agents you don't have. Safe to re-run.
Activation: On Claude Code, caveman is active from message one (hook mechanism, no command needed). On other agents, type /caveman or say "talk like caveman" to turn it on, "normal mode" to turn it off.
Install for a specific agent:
# Claude Code plugin
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
# Cursor / Windsurf / Cline / Codex, via skills registry
npx skills add JuliusBrussee/caveman -a cursor
# Gemini CLI
gemini extensions install https://github.com/JuliusBrussee/caveman
Pick Your Grunt (Four Levels)
Switch anytime with /caveman <level>. Level sticks until you change it or the session ends.
| Level | Same sentence, shrunk |
|---|---|
| normal agent | "You should wrap the object in useMemo, since a new reference is created on every render." |
lite |
"Wrap object in useMemo. New ref created every render." |
full (default)
|
"New ref each render. Wrap object in useMemo." |
ultra |
"New ref/render. useMemo it." |
wenyan |
Classical Chinese — packs the most meaning per token |
Language note: Caveman keeps your language. Write in Spanish, caveman grunts Spanish. French, Portuguese — same. It compresses style, it never translates. wenyan is the deliberate exception: classical Chinese packs more meaning per token than any modern language.
Full Toolset
| Command | What it does |
|---|---|
| `/caveman [lite\ | full\ |
{% raw %}/caveman-commit
|
Conventional Commit messages, ≤50-char subject. Why over what. |
/caveman-review |
One-line PR comments: L42: 🔴 bug: user null. Add guard.
|
/caveman-stats |
Real session token usage, lifetime savings, USD. Tweetable line with --share. |
/caveman-compress <file> |
Rewrite a memory file (like CLAUDE.md) into caveman-speak. ~46% input token reduction every session after. Code, URLs, paths preserved byte-for-byte. |
caveman-shrink |
MCP middleware. Wraps any MCP server, compresses its tool descriptions. npm. |
cavecrew-* |
Caveman subagents (investigator, builder, reviewer). ~60% fewer tokens than vanilla, so main context lasts longer. |
Deep Dive
Benchmark Data
Real token counts from the Claude API. 10 typical coding prompts, measured against default verbose replies:
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
The benchmark code is in benchmarks/ and evals/ — reproducible yourself.
The Honest Numbers Warning
The README includes a rare disclaimer worth quoting directly:
Honest number warning. Caveman only shrinks output tokens. Input and reasoning tokens are untouched, and the skill itself adds ~1–1.5k input tokens per turn. So whole-session savings run smaller than the output number, and on already-terse workloads they can go net-negative. The real win is readability and speed. Cost savings are the bonus.
When caveman wins:
- Explanation tasks (architecture explanations, concept walkthroughs) → maximum savings (80%+)
- Debugging explanations → large savings (70-80%)
- Code generation → moderate savings (the code itself doesn't get compressed)
When caveman loses:
- Already-terse tasks ("run this command") → skill overhead > savings
- Short sessions → initial token cost not amortized
This kind of honest framing is uncommon in productivity tools.
/caveman-compress: The Permanent Savings Play
/caveman-compress <file> is different from per-reply compression. It compresses memory files that load every session — CLAUDE.md, AGENTS.md, project notes.
/caveman-compress CLAUDE.md
Real numbers from the README:
| File | Original | Compressed | Saved |
|---|---|---|---|
claude-md-preferences.md |
706 | 285 | 59.6% |
project-notes.md |
1145 | 535 | 53.3% |
claude-md-project.md |
1122 | 636 | 43.3% |
todo-list.md |
627 | 388 | 38.1% |
mixed-with-code.md (with code) |
888 | 560 | 36.9% |
| Average | 898 | 481 | 46% |
Code, URLs, and paths are byte-preserved. Only prose descriptions are compressed. Do it once, every future session starts with a ~46% smaller context — permanent savings, not a one-time win.
How It Works
1. Install
└── Drops a skill file into the agent's rules/config directory
2. Skill instruction (the core)
└── Tells the agent:
✅ Drop filler words ("certainly" / "as you can see" / "in summary")
✅ Use fragments instead of complete sentences
✅ Skip transition sentences and introductory phrases
❌ Never modify code, commands, or error messages
3. Claude Code auto-activation (hook mechanism)
└── Install writes a tiny flag file
└── Hook fires each session; agent talks caveman from message one
4. /caveman-stats
└── Reads local session log, computes tokens saved
└── Writes to statusline: [CAVEMAN] ⛏ 12.4k
5. Zero telemetry
└── No network calls after install
└── Skill is a local prompt, hooks are local scripts, stats read a local log
The Research Paper Backing
A March 2026 paper (arXiv:2604.00025), "Brevity Constraints Reverse Performance Hierarchies in Language Models", tested 31 models and found that constraining large models to brief answers improved accuracy by ~26 points on some benchmarks.
Short isn't just cheaper. Sometimes short is more correct.
The Caveman Ecosystem
The author has built five related projects around a single thesis: agents should do more with less:
| Repo | What it shrinks |
|---|---|
| caveman (this one) | What the agent says |
| caveman-code | The whole agent, end to end (~2× fewer tokens than Codex on identical tasks) |
| cavemem | What the agent remembers across sessions |
| cavekit | The build loop — spec-driven, no guessing |
| cavegemma | Compression baked into weights (Gemma fine-tune) |
Plus five sibling skills (grill-me, interface-kit, junior-to-senior, loop-factory), all installable via:
npx skills@latest add JuliusBrussee/skills
Claude Code Status Line Integration
After installing, the Claude Code statusline shows:
[CAVEMAN] ⛏ 12.4k
That's your lifetime output tokens saved, updated on every /caveman-stats. Silence it with CAVEMAN_STATUSLINE_SAVINGS=0.
Resources
Official Links
- 🌟 GitHub: JuliusBrussee/caveman
- 🌐 Website: caveman.so
- 📊 Benchmarks:
benchmarks/andevals/(reproducible yourself) - 📄 Honest numbers doc:
docs/HONEST-NUMBERS.md
Summary
caveman is a serious tool wearing a meme costume. The caveman gimmick is a joke. The 65% output token reduction is real measured data from reproducible benchmarks.
What makes this project stand out beyond the numbers is its intellectual honesty: the README explicitly tells you "this is output tokens, whole-session savings are smaller, short tasks may go net-negative" — more candid than most productivity tools that lead with only the best-case numbers.
/caveman-compress is the highest long-term leverage feature: compress the memory files that load every session (your CLAUDE.md, project notes), save ~46% input tokens, and that saving compounds across every future session. If your CLAUDE.md has grown long, this is worth trying once.
For developers who spend significant time with Claude Code or any AI coding tool — many coding conversations per day — caveman is currently one of the lowest-friction, lowest-side-effect token optimization tools available. One install command, 30 seconds, and every AI reply afterward is faster, shorter, and cheaper.
Caveman save token. Caveman save money. Star cost zero. Fair trade. ⭐
Explore PrimeSkills — a curated marketplace of AI agents and skills, each validated against real enterprise workflows. No hype, just what actually works.
Visit my personal site for more insights and interesting products.
Top comments (0)