WonderLab

Posted on Jul 4

Open Source Project #114: caveman — Why Use Many Token When Few Token Do Trick

#opensource #ai #claude #promptengineering

Introduction

"why use many token when few token do trick"

This is article #114 in the "One Open Source Project a Day" series. Today's project is caveman, a Skill that makes AI coding agents talk like a caveman. In its benchmarks it cuts output tokens by 65% on average while leaving code byte-for-byte intact.

82,947 stars. It is both a meme project and a serious engineering tool.

LLMs have a default verbosity bias trained into them by RLHF: they like saying "Certainly!", "I'd be happy to help!", "The reason you're experiencing this issue is most likely because...", "In summary". These phrases have value in conversation. In a coding workflow they're pure token waste. You want the answer, not the preamble.

Caveman's solution: install a constraint that tells the agent "drop filler, keep substance, use fragments, but never touch code, commands, or error messages." The result: an average 65% reduction in output tokens on its benchmarks, faster answers, and usually smaller bills. A 2026 paper even found that brevity constraints improved large models' accuracy by ~26 points on some benchmarks.

What You'll Learn

How caveman compression works: what it cuts, what it keeps
Four compression levels: lite / full / ultra / wenyan and their respective tradeoffs
The honest numbers: why the 65% figure covers output tokens only, why whole-session savings are smaller, and when caveman goes net-negative
The full toolset, including /caveman-compress for permanent savings on memory files
The caveman ecosystem: five related projects, each solving a different piece of the token problem

Prerequisites

Experience with Claude Code or any AI coding tool
Basic understanding of LLM tokens (output tokens cost money)

Project Background

Overview

caveman is a cross-agent Skill/Plugin. After installation, it tells your AI coding agent: answer without filler words, use fragments instead of full sentences — but leave code, commands, and error messages completely untouched.

It doesn't solve "AI answers are wrong." It solves "AI talks too much." Large language models develop a politeness bias through RLHF training — they like "Of course!", "Great question!", "As you can see", "In summary". These words have value in chat. In a coding workflow they're noise.

The project was created by Julius Brussee. Website: caveman.so.

Author / Team

Author: Julius Brussee
Primary Language: JavaScript
License: MIT
Website: caveman.so

Project Stats

⭐ GitHub Stars: 82,900+
🍴 Forks: 4,629+
📄 License: MIT
📅 Created: April 2026 (82k stars in ~3 months)

Features

Before / After

Normal agent — 69 tokens	Caveman agent — 19 tokens
"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."	"New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
"Sure! I'd be happy to help you with that. The issue you're experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look and suggest a fix."	"Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"

Same fix. A third of the words. Nothing technical lost.

┌────────────────────────────────────────────┐
│   output tokens saved   █████████     65%  │
│   input tokens saved    ░░░░░░░░░      0%  │
│   technical accuracy    █████████    100%  │
│   vibes                 █████████     OOG  │
└────────────────────────────────────────────┘

Caveman no make brain smaller. Caveman make mouth smaller. It shrinks what the agent says, not what it knows.

Install

One command. Finds every agent on your machine. Installs for each.

# macOS · Linux · WSL · Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows PowerShell 5.1+
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

~30 seconds. Needs Node ≥18. Skips agents you don't have. Safe to re-run.

Activation: On Claude Code, caveman is active from message one (hook mechanism, no command needed). On other agents, type /caveman or say "talk like caveman" to turn it on, "normal mode" to turn it off.

Install for a specific agent:

# Claude Code plugin
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

# Cursor / Windsurf / Cline / Codex, via skills registry
npx skills add JuliusBrussee/caveman -a cursor

# Gemini CLI
gemini extensions install https://github.com/JuliusBrussee/caveman

Pick Your Grunt (Four Levels)

Switch anytime with /caveman <level>. Level sticks until you change it or the session ends.

Level	Same sentence, shrunk
normal agent	"You should wrap the object in `useMemo`, since a new reference is created on every render."
`lite`	"Wrap object in `useMemo`. New ref created every render."
`full` (default)	"New ref each render. Wrap object in `useMemo`."
`ultra`	"New ref/render. `useMemo` it."
`wenyan`	Classical Chinese — packs the most meaning per token

Language note: caveman keeps your language. Write in Spanish and caveman grunts in Spanish; French and Portuguese work the same way. It compresses style; it never translates. wenyan is the deliberate exception: it switches to classical Chinese on the theory that it packs more meaning per token than modern languages.

Full Toolset

Command	What it does
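`/caveman [lite|full|ultra|wenyan]`	Turn caveman on or switch compression level. Level sticks until you change it or the session ends.
`/caveman-commit`	Conventional Commit messages, ≤50-char subject. Why over what.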
`/caveman-review`	One-line PR comments: `L42: 🔴 bug: user null. Add guard.`
`/caveman-stats`	Real session token usage, lifetime savings, USD. Tweetable line with `--share`.
`/caveman-compress <file>`	Rewrite a memory file (like `CLAUDE.md`) into caveman-speak. ~46% input token reduction every session after. Code, URLs, paths preserved byte-for-byte.
`caveman-shrink`	MCP middleware. Wraps any MCP server and compresses its tool descriptions. Published on npm.
`cavecrew-*`	Caveman subagents (investigator, builder, reviewer). ~60% fewer tokens than vanilla, so main context lasts longer.
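To make the caveman-shrink row concrete, here is a minimal sketch of what "compresses its tool descriptions" could look like in middleware. It assumes the middleware sees the JSON-RPC response to an MCP `tools/list` request and rewrites only each tool's `description`. The filler patterns and the function names `shrink_text` and `shrink_tools_list` are hypothetical; caveman-shrink's actual rules and code are not described in this article.

```python
import re

# Hypothetical filler patterns: caveman-shrink's real rules are not described here.
FILLER_PATTERNS = [
    r"\bthis tool\s+(?:can be used to|allows you to)\s+",
    r"\bplease note that\s+",
    r"\bbasically\s+",
]


def shrink_text(text: str) -> str:
    """Toy compressor: drop a few filler phrases and collapse whitespace."""
    for pattern in FILLER_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # Removing a leading phrase can leave a lowercase first letter.
    return text[:1].upper() + text[1:]


def shrink_tools_list(response: dict) -> dict:
    """Shrink each tool description in a JSON-RPC tools/list response, in place.

    Tool names and inputSchema pass through untouched, so clients
    can still call every tool exactly as before.
    """
    for tool in response.get("result", {}).get("tools", []):
        if isinstance(tool.get("description"), str):
            tool["description"] = shrink_text(tool["description"])
    return response


if __name__ == "__main__":
    reply = {
        "jsonrpc": "2.0",
        "id": 1,
        "result": {"tools": [{
            "name": "read_file",
            "description": "This tool allows you to read a file from disk. "
                           "Paths are basically relative to the workspace root.",
            "inputSchema": {"type": "object",
                            "properties": {"path": {"type": "string"}}},
        }]},
    }
    print(shrink_tools_list(reply)["result"]["tools"][0]["description"])
```

Running it prints `Read a file from disk. Paths are relative to the workspace root.` Leaving `name` and `inputSchema` alone is the key design choice in this sketch: the model reads shorter descriptions, but tool calls stay valid.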

Deep Dive

Benchmark Data

Real token counts from the Claude API. 10 typical coding prompts, measured against default verbose replies:

Task	Normal	Caveman	Saved
Explain React re-render bug	1180	159	87%
Fix auth middleware token expiry	704	121	83%
Set up PostgreSQL connection pool	2347	380	84%
Explain git rebase vs merge	702	292	58%
Refactor callback to async/await	387	301	22%
Architecture: microservices vs monolith	446	310	30%
Review PR for security issues	678	398	41%
Docker multi-stage build	1042	290	72%
Debug PostgreSQL race condition	1200	232	81%
Implement React error boundary	3454	456	87%
Average	1214	294	65%

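The benchmark code lives in benchmarks/ and evals/, so you can reproduce the numbers yourself. One caveat when reading the table: the 65% in the Average row is the mean of the ten per-task percentages. Measured on the average token counts themselves (1214 → 294), the reduction is about 76%, because the longest default replies tended to be compressed the most.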
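Both averages can be checked directly. The snippet copies the ten rows from the benchmark table and computes the mean of the per-task savings alongside the savings on the summed token counts. The function names are our own.

```python
# (task, normal tokens, caveman tokens), copied from the benchmark table.
ROWS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix auth middleware token expiry", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Explain git rebase vs merge", 702, 292),
    ("Refactor callback to async/await", 387, 301),
    ("Architecture: microservices vs monolith", 446, 310),
    ("Review PR for security issues", 678, 398),
    ("Docker multi-stage build", 1042, 290),
    ("Debug PostgreSQL race condition", 1200, 232),
    ("Implement React error boundary", 3454, 456),
]


def mean_of_task_savings(rows):
    """Average of per-task savings: every task counts equally."""
    savings = [1 - caveman / normal for _, normal, caveman in rows]
    return sum(savings) / len(savings)


def savings_on_totals(rows):
    """Savings on summed tokens: long replies weigh more."""
    total_normal = sum(normal for _, normal, _ in rows)
    total_caveman = sum(caveman for _, _, caveman in rows)
    return 1 - total_caveman / total_normal


if __name__ == "__main__":
    print(f"mean of per-task savings: {mean_of_task_savings(ROWS):.1%}")
    print(f"savings on total tokens:  {savings_on_totals(ROWS):.1%}")
```

It prints 64.5% and 75.8%. Neither is wrong; they answer different questions. The mean of ratios treats a 387-token refactor and a 3,454-token error boundary as equals, while the ratio of totals is closer to what you would be billed if you ran exactly these ten prompts.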

The Honest Numbers Warning

The README includes a rare disclaimer worth quoting directly:

Honest number warning. Caveman only shrinks output tokens. Input and reasoning tokens are untouched, and the skill itself adds ~1–1.5k input tokens per turn. So whole-session savings run smaller than the output number, and on already-terse workloads they can go net-negative. The real win is readability and speed. Cost savings are the bonus.

When caveman wins:

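Prose-heavy explanation and debugging replies → largest savings (80%+ on the React re-render, auth middleware, and PostgreSQL race-condition tasks)
Open-ended comparisons and reviews → smaller but real savings (30–58% on microservices vs monolith, PR security review, git rebase vs merge)
Code generation → varies with how much prose surrounds the code, since the code itself is never compressed (87% on the error boundary, 22% on the async/await refactor)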

When caveman loses:

Already-terse tasks ("run this command") → skill overhead > savings
Short sessions → little output to shrink, while the skill's ~1–1.5k input tokens are still charged on every turn

This kind of honest framing is uncommon in productivity tools.
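A rough way to see where the break-even point sits: treat each turn's net saving as the output tokens saved, weighted by how much more an output token costs than an input token, minus the skill's per-turn input overhead. This is a minimal sketch under our own assumptions: a 1,250-token overhead (the midpoint of the README's ~1–1.5k) and output tokens priced at 5× input tokens, roughly the ratio in Anthropic's published Claude pricing. It ignores prompt caching and reasoning tokens; replace both parameters with your own numbers.

```python
def net_saving(normal_out: int, caveman_out: int,
               skill_overhead: int = 1250, output_weight: float = 5.0) -> float:
    """Net saving for one turn, in input-token equivalents.

    Assumptions (ours, not the README's):
      skill_overhead: input tokens the skill adds per turn (midpoint of ~1-1.5k)
      output_weight:  price of one output token relative to one input token
    Prompt caching and reasoning tokens are ignored.
    """
    output_saved = normal_out - caveman_out
    return output_weight * output_saved - skill_overhead


if __name__ == "__main__":
    turns = {
        "Explain React re-render bug": (1180, 159),      # from the benchmark table
        "Refactor callback to async/await": (387, 301),  # from the benchmark table
        "hypothetical terse reply": (60, 40),            # made-up example
    }
    for name, (normal_out, caveman_out) in turns.items():
        print(f"{name}: {net_saving(normal_out, caveman_out):+,.0f}")
```

Under these assumptions the React turn nets +3,855 input-token equivalents, while the refactor (-820) and the terse reply (-1,150) go negative, matching the README's warning about already-terse work. With `output_weight=1.0` (raw token counts), even the React turn lands slightly negative at -229, one reason the README calls cost savings a bonus.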

/caveman-compress: The Permanent Savings Play

/caveman-compress <file> is different from per-reply compression. It compresses memory files that load every session — CLAUDE.md, AGENTS.md, project notes.

/caveman-compress CLAUDE.md

Real numbers from the README:

File	Original	Compressed	Saved
`claude-md-preferences.md`	706	285	59.6%
`project-notes.md`	1145	535	53.3%
`claude-md-project.md`	1122	636	43.3%
`todo-list.md`	627	388	38.1%
`mixed-with-code.md` (with code)	888	560	36.9%
Average	898	481	46%

Code, URLs, and paths are preserved byte-for-byte; only prose is compressed. Do it once and every future session loads a memory file that is ~46% smaller (898 → 481 tokens on average in the table above), so the saving repeats every session instead of being a one-time win.
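Byte-for-byte preservation is easiest to reason about as masking: find the protected spans first, rewrite only the prose between them, and copy the protected spans through untouched. The sketch below assumes protected spans are fenced code blocks, inline code, and bare URLs, and uses a trivial regex (`drop_filler`) as a stand-in for the model rewrite that `/caveman-compress` actually performs. All names and patterns are hypothetical, not caveman's implementation.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this snippet can sit inside a Markdown fence

# Protected spans: fenced blocks, inline code, and bare URLs.
# These rules are a simplification chosen for the sketch, not caveman's own.
PROTECTED = re.compile(r"`{3}.*?`{3}|`[^`\n]+`|https?://\S+", re.DOTALL)


def drop_filler(prose: str) -> str:
    """Hypothetical stand-in for the model rewrite /caveman-compress performs."""
    return re.sub(r"\b(?:please make sure to|basically|really)\s+", "",
                  prose, flags=re.IGNORECASE)


def compress_preserving_code(text: str, compress_prose=drop_filler) -> str:
    """Rewrite only the prose between protected spans; copy spans verbatim."""
    pieces, last = [], 0
    for match in PROTECTED.finditer(text):
        pieces.append(compress_prose(text[last:match.start()]))
        pieces.append(match.group(0))  # protected span, byte-for-byte
        last = match.end()
    pieces.append(compress_prose(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    notes = (
        "Please make sure to run `git grep really src` first.\n"
        f"{FENCE}sh\nbasically make build\n{FENCE}\n"
        "It is basically done. Docs: https://example.com/really"
    )
    print(compress_preserving_code(notes))
```

"really" and "basically" survive inside the inline command, the fenced block, and the URL; only their prose occurrences are dropped.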

How It Works

1. Install
   └── Drops a skill file into the agent's rules/config directory

2. Skill instruction (the core)
   └── Tells the agent:
       ✅ Drop filler words ("certainly" / "as you can see" / "in summary")
       ✅ Use fragments instead of complete sentences
       ✅ Skip transition sentences and introductory phrases
       ❌ Never modify code, commands, or error messages

3. Claude Code auto-activation (hook mechanism)
   └── Install writes a tiny flag file
   └── Hook fires each session; agent talks caveman from message one

4. /caveman-stats
   └── Reads local session log, computes tokens saved
   └── Writes to statusline: [CAVEMAN] ⛏ 12.4k

5. Zero telemetry
   └── No network calls after install
   └── Skill is a local prompt, hooks are local scripts, stats read a local log
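Steps 2 and 3 boil down to a small session-start hook: if the flag file written at install time exists, emit the caveman instructions so they enter the session's context. The following is a hypothetical sketch of such a hook, not caveman's script. The flag path, the instruction text, and `session_context` are our own inventions, and the sketch assumes the agent adds the hook's stdout to the context, which is how Claude Code treats output from SessionStart hooks.

```python
from pathlib import Path
from typing import Optional

# Hypothetical flag file and instruction text; caveman's real ones are not shown in this article.
FLAG_FILE = Path.home() / ".caveman" / "active"
CAVEMAN_RULES = (
    "Terse mode. Drop filler, use fragments. "
    "Never modify code, commands, or error messages."
)


def session_context(flag_file: Path = FLAG_FILE) -> Optional[str]:
    """Return instructions to inject at session start, or None when the flag is absent."""
    return CAVEMAN_RULES if flag_file.exists() else None


if __name__ == "__main__":
    text = session_context()
    if text is not None:
        # Assumption: the agent adds this hook's stdout to the session context.
        print(text)
```

Everything here reads a local file and writes to stdout, in line with the zero-telemetry point in step 5.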

The Research Paper Backing

A 2026 paper (arXiv:2604.00025), "Brevity Constraints Reverse Performance Hierarchies in Language Models", tested 31 models and found that constraining large models to brief answers improved their accuracy by ~26 points on some benchmarks.

Short isn't just cheaper. Sometimes short is more correct.

The Caveman Ecosystem

The author has built five related projects around a single thesis: agents should do more with less:

Repo	What it shrinks
caveman (this one)	What the agent says
caveman-code	The whole agent, end to end (about half the tokens of Codex on identical tasks, per the author)
cavemem	What the agent remembers across sessions
cavekit	The build loop — spec-driven, no guessing
cavegemma	Compression baked into weights (Gemma fine-tune)

Plus sibling skills such as grill-me, interface-kit, junior-to-senior, and loop-factory, all installable via:

npx skills@latest add JuliusBrussee/skills

Claude Code Status Line Integration

After installing, the Claude Code statusline shows:

[CAVEMAN] ⛏ 12.4k

That figure is your lifetime output-token savings, updated on every /caveman-stats. Silence it with CAVEMAN_STATUSLINE_SAVINGS=0.
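The status line itself is simple enough to sketch: format the lifetime count compactly and respect the CAVEMAN_STATUSLINE_SAVINGS switch. The compact format is inferred from the 12.4k example. Showing a bare [CAVEMAN] badge when the variable is 0 is our assumption about what "silence" means, and the function names are hypothetical.

```python
import os
from typing import Mapping


def format_savings(tokens: int) -> str:
    """Compact token count: 950 -> '950', 12_400 -> '12.4k', 3_200_000 -> '3.2M'."""
    if tokens < 1_000:
        return str(tokens)
    thousands = round(tokens / 1_000, 1)
    if thousands < 1_000:
        return f"{thousands:.1f}k"
    return f"{tokens / 1_000_000:.1f}M"


def statusline(lifetime_saved: int, env: Mapping[str, str] = os.environ) -> str:
    """Badge with savings; CAVEMAN_STATUSLINE_SAVINGS=0 hides the count (assumed behavior)."""
    if env.get("CAVEMAN_STATUSLINE_SAVINGS") == "0":
        return "[CAVEMAN]"
    return f"[CAVEMAN] ⛏ {format_savings(lifetime_saved)}"
```

Rounding before choosing the suffix keeps values just under a million from rendering as "1000.0k".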

Resources

Official Links

🌟 GitHub: JuliusBrussee/caveman
🌐 Website: caveman.so
📊 Benchmarks: benchmarks/ and evals/ (reproducible yourself)
📄 Honest numbers doc: docs/HONEST-NUMBERS.md

Summary

caveman is a serious tool wearing a meme costume. The caveman gimmick is a joke; the ~65% average output-token reduction comes from measured, reproducible benchmarks.

What makes this project stand out beyond the numbers is its intellectual honesty: the README explicitly tells you "this is output tokens, whole-session savings are smaller, short tasks may go net-negative" — more candid than most productivity tools that lead with only the best-case numbers.

/caveman-compress is the highest-leverage feature over the long term: compress the memory files that load every session (your CLAUDE.md, project notes), cut their token count by ~46% on average, and that saving repeats in every future session. If your CLAUDE.md has grown long, it is worth trying once.

For developers who spend significant time in Claude Code or another AI coding tool, with many coding conversations per day, caveman is currently one of the lowest-friction token optimization tools available, with few side effects. One install command and about 30 seconds later, most replies are shorter and faster, and on prose-heavy work, cheaper.

Caveman save token. Caveman save money. Star cost zero. Fair trade. ⭐
