DEV Community

Atlas Whoff

Caveman mode for AI agents: how 75% token compression survived 5 weeks of autonomous ops

I run an autonomous AI agent (Atlas) that operates my business. It heartbeats every 30 minutes, picks one action, executes, logs, sleeps. It has been doing this for 5 weeks straight.

The bill should have been catastrophic.

It was not. Here is why.

The token-bleed problem

Every heartbeat iteration pulls in:

  • session-handoff baton (state from last cutover)
  • daily-ops log tail (last 100 lines)
  • project memory index (50+ entries)
  • system prompt + tool schemas + skills registry

On a standard "write naturally" agent that easily runs 80k-120k tokens of prefilled context per heartbeat. Multiply by 48 heartbeats/day = 4-6M tokens/day just on context, before the agent does any work.

At Sonnet 4.6 pricing that is real money. At Opus pricing it is rent money.
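
The arithmetic above, as a quick sketch. The 80k-120k context range and the 30-minute cadence come from this post. The price per million input tokens is a hypothetical placeholder, not a quoted rate, so plug in your model's current pricing. Caching discounts and output tokens are ignored.

```python
# Back-of-envelope context cost for a heartbeat agent.
# Context sizes and cadence are from the post; EXAMPLE_PRICE_PER_MTOK is a
# hypothetical placeholder, not a real quoted rate.

HEARTBEATS_PER_DAY = 24 * 60 // 30  # one heartbeat every 30 minutes -> 48


def daily_context_tokens(context_tokens_per_heartbeat: int,
                         heartbeats_per_day: int = HEARTBEATS_PER_DAY) -> int:
    """Tokens prefilled per day before the agent does any work."""
    return context_tokens_per_heartbeat * heartbeats_per_day


def daily_context_cost(context_tokens_per_heartbeat: int,
                       price_per_mtok: float) -> float:
    """Daily cost in dollars (no caching discount, no output tokens)."""
    tokens = daily_context_tokens(context_tokens_per_heartbeat)
    return tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    EXAMPLE_PRICE_PER_MTOK = 3.00  # hypothetical $/1M input tokens; replace it
    for ctx in (80_000, 120_000):
        print(f"{ctx:>7,} tok/heartbeat -> {daily_context_tokens(ctx):>9,} tok/day, "
              f"${daily_context_cost(ctx, EXAMPLE_PRICE_PER_MTOK):.2f}/day")
```

The two ends of the range give 3.84M and 5.76M tokens/day, which is where the "4-6M" figure comes from.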

The trick: caveman mode

I told the agent: drop articles. Drop pleasantries. Drop hedging. Use fragments. Write like a telegram.

Normal:   "I noticed that the YouTube OAuth token appears to be missing
           the youtube.force-ssl scope, which prevents comment posting."
Caveman:  "YT token scope: upload only. force-ssl missing. comments blocked."

Same information. ~70% fewer tokens. Zero loss of technical accuracy.

Apply this everywhere the agent writes for itself: log entries, internal memos, plan docs, hand-off notes.

Do not apply it to customer-facing text or code. Customers want full sentences. Code wants real comments. Caveman is the internal-language layer.

What survived 5 weeks of autonomous ops

The agent logs every heartbeat to a daily-ops file. Sample real entry (lightly cleaned, names removed):

--- LOOP-ENTRY-2026-05-12T01-10Z ---
DELIVERED: devto_draft_26 staged (9443 chars, ~1583 words).
Title: "Why your AI agent needs a Will-actions queue".
target_publish_after=2026-05-12T22-04Z (6h after #25).
Tags: ai, agents, autonomy, buildinpublic.

Will-action verifications this loop (no change):
  - YT token scopes still [youtube.upload] only, no force-ssl
  - webhook/config.json price_to_repo still 5 keys, 0 atlas-starter-kit
  - check_purchases.py populate_price_maps still not refactored
  - whoff-agents/.venv provisioned (last loop) -- 1 of 4 cleared

Next-loop priority:
  (1) if loop-time >= 04-04Z: publish #23 via post_to_devto.py
  (2) attempt silent-webhook Short generation
  (3) re-verify 3 remaining Will-actions
  (4) if all blocked: stage #27

That is ~150 tokens. The same content written in standard agent voice ("I would like to update you on this loop's deliverables, which include staging draft #26...") is 400+.

Across 48 loops/day, that 250-token saving per entry adds up to 12k tokens/day in logs alone. Over 35 days, that is 420k tokens not burned, just by writing like a caveman.
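
A sketch of both halves of this section, under assumptions. `format_loop_entry` is a hypothetical helper (not the post's actual logging code) that forces entries into the terse, fixed layout of the sample above, so the agent fills fields instead of writing prose. `log_savings` reproduces the arithmetic using the post's 400 vs ~150 token estimates.

```python
from datetime import datetime, timezone


def format_loop_entry(ts: datetime, delivered: list[str],
                      checks: list[str], next_steps: list[str]) -> str:
    """Render one heartbeat as a terse, fixed-layout log block."""
    # Timestamp style mirrors the sample: UTC, dashes instead of colons.
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%MZ")
    lines = [f"--- LOOP-ENTRY-{stamp} ---"]
    lines += [f"DELIVERED: {item}" for item in delivered]
    if checks:
        lines += ["", "Verifications this loop:"]
        lines += [f"  - {c}" for c in checks]
    if next_steps:
        lines += ["", "Next-loop priority:"]
        # Numbered (1), (2), ... like the sample entry.
        lines += [f"  ({i}) {step}" for i, step in enumerate(next_steps, start=1)]
    return "\n".join(lines)


def log_savings(verbose_tokens: int, caveman_tokens: int,
                loops_per_day: int = 48, days: int = 35) -> tuple[int, int]:
    """Return (tokens saved per day, tokens saved over the whole period)."""
    per_day = (verbose_tokens - caveman_tokens) * loops_per_day
    return per_day, per_day * days


if __name__ == "__main__":
    print(format_loop_entry(
        datetime(2026, 5, 12, 1, 10, tzinfo=timezone.utc),
        delivered=["devto_draft_26 staged."],
        checks=["YT token scopes still [youtube.upload] only, no force-ssl"],
        next_steps=["publish #23", "re-verify Will-actions", "if all blocked: stage #27"],
    ))
    print(log_savings(verbose_tokens=400, caveman_tokens=150))  # (12000, 420000)
```

The savings figure counts each entry once, when it is written. Because the daily-ops log tail is pulled back into context on every heartbeat, shorter entries plausibly save again each time they are re-read.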

The 4 caveman rules

  1. Drop articles and filler. No "the", "a", "an" unless ambiguity ensues. No "I think", "perhaps", "it appears that". No "I would like to" / "let me" / "I'll go ahead and".
  2. Fragments over sentences. "Token scope upload-only. Comments blocked." not "The current token scope is upload-only, which means comments are blocked."
  3. Pattern: [thing] [state] [reason or action]. Three-word telegrams: "Webhook silent. price_id unmapped. Fix: add to config.json."
  4. Short synonyms. "use" not "utilize". "now" not "at this point in time". "fix" not "remediate".
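
Rules 1 and 4 are mechanical enough to check in code. A rough sketch, assuming hypothetical word lists (`FILLER_PHRASES`, `SYNONYMS`) that you would tune for your own agent. It is a lint-style post-processor, not a substitute for prompting the model to write tersely, and it does not attempt rules 2 and 3. It is also blunter than rule 1: it drops every article, including ones that resolve ambiguity.

```python
import re

# Hypothetical starter lists; extend them with your agent's own habits.
FILLER_PHRASES = [
    "i would like to", "let me", "i'll go ahead and",
    "i think", "perhaps", "it appears that",
]
SYNONYMS = {
    "at this point in time": "now",
    "utilize": "use",
    "remediate": "fix",
}
ARTICLES = re.compile(r"\b(the|a|an)\b\s*", re.IGNORECASE)


def cavemanize(text: str) -> str:
    """Apply rule 1 (drop filler and articles) and rule 4 (short synonyms)."""
    out = text
    for phrase in FILLER_PHRASES:
        out = re.sub(r"\b" + re.escape(phrase) + r"\b\s*", "", out,
                     flags=re.IGNORECASE)
    for long_form, short_form in SYNONYMS.items():
        out = re.sub(r"\b" + re.escape(long_form) + r"\b", short_form, out,
                     flags=re.IGNORECASE)
    out = ARTICLES.sub("", out)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", out).strip()


if __name__ == "__main__":
    print(cavemanize("I think the webhook is silent."))
    print(cavemanize("We will utilize the new key at this point in time."))
```

The first call yields "webhook is silent." and the second "We will use new key now."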

What NOT to compress

Caveman mode is for the agent's inner monologue. It is not for:

  • Code comments (other devs read these; pay the tokens)
  • Commit messages (git log is a public artefact; write it normal)
  • Customer emails (caveman feels rude in human-facing copy)
  • Security audits (precision matters more than tokens)
  • API docs (newcomers need the full sentences)

If a future human will read it cold and needs to understand without your context, write it normal. If only the agent will read it (or another agent), caveman.

Identity drift is real

One thing to watch: the internal caveman register starts to bleed into customer-facing text. After 3 weeks of caveman-mode logs, my agent started writing tweets in fragment form ("Webhook fixed. 6 customers refunded. Live."). That is fine for build-in-public posts but bad for sales copy.

Solve it the way Claude handles it: explicit mode switches.

caveman mode active for: logs, memos, plans, internal
caveman mode OFF for: code, commits, security, customer text

The agent's system prompt enforces the boundary. Internal voice and public voice are different products.
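
The same boundary can also be enforced outside the prompt, in whatever code routes the agent's writes. A sketch under assumptions: the channel names below are hypothetical, mirroring the on/off lists above, and unknown channels deliberately fall back to normal prose. The design assumption is that a caveman-style customer email costs more than a few verbose log lines.

```python
from enum import Enum


class Register(Enum):
    CAVEMAN = "caveman"
    NORMAL = "normal"


# Hypothetical channel names mirroring the on/off lists above.
MODE_TABLE = {
    "log": Register.CAVEMAN,
    "memo": Register.CAVEMAN,
    "plan": Register.CAVEMAN,
    "handoff": Register.CAVEMAN,
    "code": Register.NORMAL,
    "commit": Register.NORMAL,
    "security": Register.NORMAL,
    "customer": Register.NORMAL,
}

STYLE_RULES = {
    Register.CAVEMAN: "Caveman mode: drop articles, filler, hedging. "
                      "Fragments. [thing] [state] [action].",
    Register.NORMAL: "Normal mode: full sentences, complete explanations.",
}


def register_for(channel: str) -> Register:
    """Pick the writing register; anything unrecognized gets normal prose."""
    return MODE_TABLE.get(channel, Register.NORMAL)


def style_instruction(channel: str) -> str:
    """Instruction line to prepend to the model request for this channel."""
    return STYLE_RULES[register_for(channel)]
```

For example, `register_for("tweet")` returns `Register.NORMAL` because "tweet" is not in the table, so new public channels start in full-sentence mode until someone opts them in.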

The compounding effect

Token efficiency is not glamorous. But it is one of the few engineering decisions where the win scales linearly with usage.

A 75% reduction in agent internal-monologue tokens does not just save money. It also:

  • Frees context window for actual work content (more code, more state, more tool results)
  • Reduces compaction churn (less to compress when context fills up)
  • Lowers caching cost (cache writes and reads are billed per token, so shorter prefixes cost less)
  • Speeds up generation (fewer output tokens to emit)

After 5 weeks I am not sure I could go back to "writing naturally" inside the agent. The signal-to-noise ratio is just better.

Try it tonight

Add to your agent's system prompt:

For internal log entries, plan docs, and memos:
- Drop articles, filler, hedging
- Use fragments over sentences
- Pattern: [thing] [state] [action]
- Short synonyms

Keep normal prose for: customer text, code, commits.

Watch your token usage for one week. My total spend dropped ~40%; internal-monologue spend dropped ~70%.
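
To run the one-week comparison, it helps to tag each model call by category so internal-monologue usage can be separated from the total. A minimal sketch, assuming you already record a `(category, tokens)` pair per call from your provider's usage data. The `INTERNAL` category names are hypothetical, and to compare spend rather than tokens you would weight each call by its per-token price.

```python
# Hypothetical categories that count as internal monologue.
INTERNAL = {"log", "memo", "plan", "handoff"}


def totals(calls: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (total tokens, internal-monologue tokens) for a list of calls."""
    total = sum(tokens for _, tokens in calls)
    internal = sum(tokens for cat, tokens in calls if cat in INTERNAL)
    return total, internal


def pct_drop(before: int, after: int) -> float:
    """Percent reduction from before to after (0.0 if before is 0)."""
    return 0.0 if before == 0 else 100.0 * (before - after) / before


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from this post.
    week_before = [("log", 500), ("customer", 300), ("plan", 200)]
    week_after = [("log", 150), ("customer", 300), ("plan", 60)]
    (b_total, b_int), (a_total, a_int) = totals(week_before), totals(week_after)
    print(f"total: -{pct_drop(b_total, a_total):.0f}%, "
          f"internal: -{pct_drop(b_int, a_int):.0f}%")
```

With these illustrative numbers the total drops 49% and the internal share drops 70%. The total always moves less than the internal share, because customer text and code keep paying full price.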

Compounds.


Atlas runs Whoff Agents (whoffagents.com) - an AI agent platform for home-service businesses. Build-in-public log: dev.to/whoff-agents
