DEV Community

Andrew


Caveman Review: The Claude Code Skill That Cuts 65% of Tokens

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.

TL;DR

Caveman is a Claude Code skill (and Codex / Gemini CLI plugin) that overrides the agent's default verbosity by instructing it to "talk like caveman" — short fragments, no filler, no "I'd be happy to help" preamble. The bit is that it works: the project's own ten-prompt benchmark suite shows a 65% mean output-token reduction with full technical accuracy preserved, and the repo has rocketed to 54,000+ GitHub stars in under three weeks.

Key facts:

  • Open source on GitHub at JuliusBrussee/caveman — MIT license, 54K+ stars, climbing GitHub Trending
  • One-line install that auto-detects 30+ agents (Claude Code, Codex, Gemini CLI, Cursor, Windsurf, Cline, Copilot, Continue, Goose, Aider, opencode, Roo, Warp, Devin, Replit Agent, Antigravity…)
  • Three intensity levels: lite (drop filler, keep grammar), full (default caveman), ultra (telegraphic abbreviations), plus a 文言文 (Wenyan / classical Chinese) mode for the truly token-pilled
  • Companion skills for terse commits, one-line PR reviews, an MCP middleware (caveman-shrink) that compresses MCP tool descriptions, and a caveman-compress tool that shrinks CLAUDE.md files by ~46%
  • Honest claim: only output tokens are affected; input/context/thinking tokens are untouched
  • Independent benchmarks are mixed — community reproductions land at ~30–50% in normal use, with a 6-line homemade prompt occasionally beating the full skill

If you're paying for Claude Code by the token and find yourself skim-reading walls of "Sure! Let me help you with that…" preamble, Caveman is the most fun way to fix it. If you're chasing maximum context-window utilization, the wins are smaller than the headline number suggests — but they're real, and the install is one line.

Quick Reference

| Field | Value |
| --- | --- |
| Repo | JuliusBrussee/caveman |
| Stars | 54,081 (as of May 2026) |
| License | MIT |
| Install | `curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh \| bash` |
| Supported agents | 30+ (Claude Code, Codex, Gemini CLI, Cursor, Windsurf, Cline, Copilot, …) |
| Modes | lite, full, ultra, wenyan-lite, wenyan-full, wenyan-ultra |
| MCP middleware | `npx caveman-shrink` |
| Average output-token saving (vendor) | 65% (range 22–87%) |
| Average input-token saving from caveman-compress | 46% on CLAUDE.md-style files |
| Trigger | /caveman, $caveman (Codex), or "talk like caveman" |

What Caveman Actually Is

Strip away the meme and Caveman is three things bundled together:

  1. A system-prompt skill that tells the agent to drop articles, contractions, filler, and meta-narration, and to answer in short fragments. It does not change reasoning, code generation, or tool-use — only the style of the natural-language wrapper around them.
  2. An installer that auto-detects 30+ AI coding agents and registers the skill in each one's native format (Claude plugin, Gemini extension, Cursor .mdc rule, Windsurf rule, Copilot instructions, AGENTS.md). One command, every tool you have.
  3. A small ecosystem of companion utilities: caveman-stats for real session token accounting, caveman-compress for shrinking memory files, caveman-shrink (MCP middleware) for compressing tool/prompt descriptions, and cavecrew subagents that emit ~60% fewer tokens than vanilla Claude Code subagents.

The hook — "why use many token when few token do trick" — came from a viral Reddit post by user flatty, who observed that Claude happily produced the same correct answers in caveman-speak. Drona Gangarapu first packaged it as a CLAUDE.md drop-in (the 3.3K-star precursor). Julius Brussee added the multi-agent installer, levels, Wenyan mode, and MCP middleware, and shipped the trending version.

Install

For most readers there is exactly one command:

```shell
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
```

This auto-detects every supported agent on your machine and installs Caveman for each. If you only want it in one place:

```shell
# Claude Code only
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

# Gemini CLI only
gemini extensions install https://github.com/JuliusBrussee/caveman

# Cursor / Windsurf / Cline / Copilot
npx skills add JuliusBrussee/caveman -a cursor   # or windsurf, cline, github-copilot
```

By default the Claude Code install also wires:

  • Hooks + a statusline savings badge ([CAVEMAN] ⛏ 12.4k lifetime tokens saved)
  • caveman-shrink registered as an MCP middleware for npx-style servers
  • caveman-stats reading your real Claude Code session JSONL for honest accounting

Pass --minimal for the plugin only, --all to also drop per-repo .cursor/rules/, .windsurf/rules/, .github/copilot-instructions.md, and AGENTS.md files into the current directory so the rule auto-loads in any IDE that supports the convention.

Usage

After install, trigger with any of:

  • /caveman (Claude Code)
  • $caveman (Codex)
  • "talk like caveman" / "caveman mode" / "less tokens please"

Switch levels with /caveman lite, /caveman full, /caveman ultra, or one of the Wenyan variants. The level sticks until you change it or end the session. Stop with "stop caveman" or "normal mode."

Real Before/After

The repo's own example, from the README:

Normal Claude (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."

Caveman Claude (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Ultra (12 tokens):

"Inline obj prop → new ref → re-render. useMemo."

Same fix. Same correctness. Far less to read. The output is dense enough that a fast reader covers it in one glance.
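A quick sanity check on the README's own numbers (69 / 19 / 12 are the repo's example counts; the percentages are just arithmetic on them, and this one example sits above the suite's 65% mean):

```python
# Fractional output-token reduction for the README's React example.
# Token counts (69, 19, 12) come from the repo; nothing here is measured.
def reduction(before: int, after: int) -> float:
    """1 - after/before: the fraction of output tokens cut."""
    return 1 - after / before

full = reduction(69, 19)   # caveman full vs. normal Claude
ultra = reduction(69, 12)  # caveman ultra vs. normal Claude

print(f"full:  {full:.0%}")   # ~72%
print(f"ultra: {ultra:.0%}")  # ~83%
```

One cherry-picked explanatory prompt, so it lands near the top of the vendor's 22–87% range rather than at the mean.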

Companion Skills

The skills that ship alongside the core mode:

| Skill | What it does |
| --- | --- |
| /caveman-commit | Generates terse Conventional Commits messages, ≤50 char subject line, focused on the why, not the what |
| /caveman-review | One-line PR review comments: `L42: 🔴 bug: user null. Add guard.` No throat-clearing. |
| /caveman-stats | Real per-session and lifetime token usage + estimated savings + USD, read from the Claude Code session JSONL — no model-side guessing |
| /caveman:compress <file> | Rewrites a memory file (e.g. CLAUDE.md) into caveman-speak with <file>.original.md backup. Cuts ~46% of input tokens every session start |
| cavecrew-investigator/builder/reviewer | Caveman subagents that emit ~60% fewer tokens than vanilla Claude Code subagents |

The caveman-compress tool is arguably a bigger long-term win than the runtime mode. Every Claude Code session re-injects your CLAUDE.md into context; cut 46% of those tokens once and you save them on every session for the life of the project.
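Back-of-envelope math makes the compounding obvious. The file size and session count below are made-up illustrative numbers; only the 46% ratio is the vendor's figure:

```python
# Cumulative input-token savings from a one-time CLAUDE.md compression.
CLAUDE_MD_TOKENS = 2_000   # hypothetical memory-file size in tokens
COMPRESSION = 0.46         # ~46% cut (vendor's caveman-compress figure)
SESSIONS = 250             # hypothetical sessions over a project's life

saved_per_session = CLAUDE_MD_TOKENS * COMPRESSION
total_saved = saved_per_session * SESSIONS
print(f"{saved_per_session:.0f} tokens/session, "
      f"{total_saved:,.0f} tokens over {SESSIONS} sessions")
```

A one-time rewrite that keeps paying out, which is why it targets the input side of the bill rather than the output side.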

caveman-shrink: the MCP Middleware

The most technically interesting piece of the project. caveman-shrink is a stdio proxy that wraps any MCP server and intercepts tools/list, prompts/list, and resources/list responses to compress the description fields. Code, URLs, paths, and identifiers stay byte-for-byte identical.

```json
{
  "mcpServers": {
    "fs-shrunk": {
      "command": "npx",
      "args": ["caveman-shrink", "npx", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
    }
  }
}
```

V1 only touches metadata, not request/response bodies. If you have a dozen MCP servers each injecting a few thousand tokens of tool descriptions on session start, this matters more than the runtime mode does.
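The real caveman-shrink code isn't reproduced here, but the metadata-only guarantee is easy to sketch: rewrite just the `description` fields of a `tools/list` result and leave names and schemas byte-identical. `compress_description` below is a toy stand-in for whatever heuristics the actual tool uses:

```python
import copy
import re

def compress_description(text: str) -> str:
    """Toy stand-in for caveman-shrink's compressor: strip filler
    words and collapse whitespace. The real tool is more involved."""
    filler = r"\b(please|simply|basically|very|just|in order to)\b"
    text = re.sub(filler, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

def shrink_tools_list(response: dict) -> dict:
    """Rewrite only the `description` metadata of a tools/list result.
    Names, input schemas, and everything else stay untouched, mirroring
    the V1 'metadata only' guarantee described above."""
    out = copy.deepcopy(response)
    for tool in out.get("result", {}).get("tools", []):
        if "description" in tool:
            tool["description"] = compress_description(tool["description"])
    return out

resp = {"result": {"tools": [{
    "name": "read_file",
    "description": "Please use this tool in order to   simply read a file.",
    "inputSchema": {"type": "object"},
}]}}
shrunk = shrink_tools_list(resp)
print(shrunk["result"]["tools"][0]["description"])
```

The proxy part — sitting between the client and the wrapped server's stdio and applying a transform like this to listing responses as they pass through — is what the `npx caveman-shrink <command> …` invocation above does for you.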

Benchmarks

The vendor's own ten-prompt suite, reproducible with the scripts under benchmarks/, claims a 65% mean output-token reduction with a range of 22–87%. The big wins are on verbose explanatory tasks (Explain React re-render bug: 87%); the small wins are on tasks that are already terse (Refactor callback to async/await: 22%).

The community has reproduced this — and pushed back. Two notable independent benchmarks, both posted to r/ClaudeAI and r/ClaudeCode:

  • "Caveman vs 'be brief'" (1 week ago, 24 dev prompts × 5 arms): caveman lite/full/ultra all beat baseline, but a one-line "be brief." instruction captured most of the savings on its own.
  • "6-line version beat the original" (1 month ago): on structured-output coding tasks, a hand-rolled 6-line prompt outperformed the full Caveman skill on the quality/token tradeoff. The 75% headline number was largely an artifact of comparing against "You are a helpful assistant" baselines that were unusually verbose.

A third post ("Does caveman plugin really help with context usage?") on r/ClaudeCode landed at the most useful nuance: real-world savings are typically 30–50% on output tokens, not 75%, and caveman only affects output — the cheapest part of a Claude Code bill. The expensive part is input/context tokens (CLAUDE.md, files read into context, MCP tool descriptions). For those, you want caveman-compress and caveman-shrink, not the runtime mode.

The vendor is honest about this — it's printed in an [!IMPORTANT] callout in the README:

Caveman only affects output tokens — thinking/reasoning tokens are untouched. Caveman no make brain smaller. Caveman make mouth smaller. Biggest win is readability and speed, cost savings are a bonus.

There is also a March 2026 paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models", that argues constraining models to brief responses can improve accuracy by up to 26 percentage points on certain benchmarks by reducing the surface area for hallucination and contradiction. If that result holds up — and it's still preprint-stage — caveman-style prompting may be doing two useful things at once.

Community Reactions

Across r/ClaudeCode, r/ClaudeAI, r/ChatGPT, and Hacker News, the discussion clusters into three camps:

The converts. "I started talking to Claude like a caveman. My credits lasted 3x longer. I'm not joking." (r/ChatGPT, 2 weeks ago.) Multiple anonymous reports of API bills cut in half.

The skeptics with receipts. The Reddit benchmarks above — caveman is real, the headline number is inflated by adversarial baselines, and a 6-line be brief prompt captures most of the value. "75% is not realistic for normal English in my experience."

The accuracy worriers. A recurring concern that ultra mode degrades quality, not just verbosity. The vendor's own eval suggests this is overstated for lite and full, but ultra does occasionally drop important caveats. Most heavy users settle on full.

What does not show up: complaints about the install. The auto-detect installer is the most-quoted positive surprise.

Honest Limitations

Six things to know before you install:

  1. Output tokens are the cheap part. On Claude Sonnet 4.6, output is $15/M and input is $3/M, but you typically use 5–10× more input than output. Caveman cuts the smaller half of your bill. Use caveman-compress and caveman-shrink if you want the bigger half.
  2. Reasoning/thinking tokens are untouched. Extended thinking traces are on the input side. Caveman does not shrink them.
  3. Quality tradeoff at ultra. Telegraphic responses occasionally drop edge cases. Use full unless you're token-starved.
  4. Some agents ignore it. Claude Code respects the rule consistently. Codex sometimes drifts back to verbose mode mid-session and needs re-prompting. Cursor is hit-or-miss without --with-init writing the per-repo rule files.
  5. Output is harder for non-experts to read. "Inline obj prop → new ref → re-render. useMemo." is great for senior devs and brutal for juniors learning React. If you're using Claude as a teaching tool, leave it off.
  6. It looks unprofessional in screenshots. Caveman-formatted Claude output does not screenshot well into a Slack channel where stakeholders are watching. Toggle off before demos.

Caveman vs Alternatives

| Tool | Approach | Typical output savings | Setup cost |
| --- | --- | --- | --- |
| Caveman (full) | Skill/plugin + level system + companion utilities | 30–50% real-world (65% vendor) | One line |
| "Be brief." prompt | One-line instruction in CLAUDE.md | 25–40% | Manual |
| 6-line community prompt | Hand-tuned brevity rule | 30–55% | Copy/paste |
| Lower max_tokens | API parameter cap | Forces truncation, not compression | Trivial but lossy |
| Custom system prompt | Full DIY | Variable | Hours of iteration |
| Smaller model (Haiku) | Different model entirely | 80%+ on cost, but quality drops | Free |

Caveman's strongest argument over the homemade alternatives is not the prompt itself but the install ergonomics, the stats badge (you can see your savings in real time), and the MCP middleware. If you have one CLAUDE.md and no MCP servers, a 6-line prompt is fine. If you have ten projects, three IDEs, and a dozen MCP servers, the skill ecosystem is worth the install.

FAQ

Will Caveman make my Claude Code bill 75% smaller?
No. Output tokens are typically 10–30% of a Claude Code bill on coding tasks; input/context tokens dominate. Caveman cuts ~30–50% of output in normal use, which is a 5–15% bill cut. The bigger wins come from caveman-compress (one-time CLAUDE.md shrink, 46% saving) and caveman-shrink (per-session MCP description compression).
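The arithmetic behind that answer is worth making explicit. The share and cut ranges are the article's own figures, not new measurements:

```python
# Bill impact = (output's share of the total bill) x (output-token cut).
# Ranges per the article: output is ~10-30% of a coding bill, and
# caveman cuts ~30-50% of output tokens in normal use.
def bill_cut(output_share: float, output_reduction: float) -> float:
    """Fraction of the total bill saved by cutting output tokens only."""
    return output_share * output_reduction

low = bill_cut(0.10, 0.30)   # worst case: 3%
high = bill_cut(0.30, 0.50)  # best case: 15%
print(f"total-bill reduction: {low:.0%} to {high:.0%}")
```

Roughly the single-digit-to-15% range quoted above — which is why the input-side tools, not the runtime mode, carry the cost argument.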

Does Caveman degrade code quality?
Independent quality evals suggest lite and full modes preserve correctness on coding tasks. ultra mode occasionally drops edge cases — community testers saw a small but real regression on tasks requiring nuanced explanations. Use full unless you're explicitly trying to maximize compression.

Does it work outside Claude Code?
Yes. The auto-installer detects 30+ agents (Codex, Gemini CLI, Cursor, Windsurf, Cline, Copilot, Continue, Aider, Goose, Warp, Devin, Replit Agent, Antigravity, opencode, Roo, …) and registers the skill in each one's native format. Quality is most consistent in Claude Code; Codex and Cursor occasionally drift back to verbose mode mid-session.

What is the Wenyan mode?
Classical Chinese (文言文) is one of the most token-efficient written languages humans have ever produced — its grammar omits articles, copulas, and subjects ruthlessly. Caveman's Wenyan modes use it to push compression further than English caveman-speak can. Useful as a curiosity; not recommended for production output unless your team reads classical Chinese.

Is caveman-shrink safe to use with arbitrary MCP servers?
V1 only touches metadata fields (description on tools/prompts/resources). It does not modify request bodies, response bodies, or any content the LLM actually receives at tool-call time. Safe by design — the worst it can do is hide a tool's full documentation from the model, which the model can still introspect via its name and parameters.

Can I uninstall it cleanly?
Yes. claude plugin uninstall caveman, gemini extensions uninstall caveman, or npx skills remove caveman per agent. The standalone Claude Code hooks have their own uninstaller. Per-repo rule files (AGENTS.md, .cursor/rules/caveman.mdc, etc.) are left in place — delete manually if you want a fully clean revert.

Is the 54K-star count real?
Yes, but it should be read in context. The repo went viral on Hacker News and r/ClaudeAI in mid-April 2026 and accumulated stars at an unusual rate. The signal is "people loved the meme and bookmarked it," not necessarily "54,000 developers use this in production." Treat the number as a marketing metric, not a quality metric — and look at the active issue count and benchmark reproductions instead.

Verdict

Caveman is a real tool dressed in a meme. The headline 75% savings number is inflated — independent benchmarks land closer to 30–50% on output tokens, and output tokens are not where most of your Claude Code bill comes from. The runtime mode is a quality-of-life upgrade more than a cost-cutter.

The genuinely valuable pieces of the project are the ones that don't fit on a tweet: caveman-compress for shrinking the CLAUDE.md files Claude Code re-injects on every session start, and caveman-shrink for compressing the MCP tool descriptions that bloat every long-running session. Those target input tokens, which is where the actual money is.

Install the whole thing — it's one command, MIT licensed, and the auto-detect installer is among the cleanest pieces of multi-agent ergonomics shipped in 2026. Use full mode by default, skip ultra unless you're a senior dev who reads code like prose, and treat the runtime savings as a nice side effect of the real product, which is the input-side compression layer underneath the meme.

If nothing else, your daily Claude conversations get more readable. Brain still big. Mouth small. Money stay home.
