Lars

Posted on Jun 5 • Originally published at larsroettig.me on May 30

Claude Code Best Practices for Vibe Coders: Ship More, Burn Fewer Tokens

#ai #claudecode #vibecoding #tokenoptimization

Vibe coding with Claude Code can be very fast, but costs climb quickly when token usage goes unchecked.

After a year of daily use as a software architect, I keep seeing two recurring issues. First, agents often produce code with high confidence that does not solve the right problem. Second, the context window grows too large, which leads to forgotten instructions and avoidable mistakes.

The habits below address both. They come from two Anthropic talks, the official best-practices guide, and daily work building agent-based workflows. Here is what stuck.

How it works, and the one constraint

Claude Code is the coworker who does everything in the terminal. Under the hood it's a plain agent loop: a model, system instructions, and tools, running until the task is done. There's no orchestration framework. Instead of pre-indexing your repo with RAG, it explores like an engineer, with glob, grep, and find.

That trade has a real cost: an unscoped grep dumps raw matches into the window and burns tokens fast, and on a large monorepo a code-aware semantic index returns tighter, ranked results for the same question. I still lean on agentic search for most work, because a fresh grep reads the current state while an index drifts the moment someone merges, but I keep the search scoped and pair it with a semantic retrieval tool once the repo is big enough that raw matches stop fitting. Code Graph is one open-source option: it pre-indexes the codebase into a queryable graph, Claude learns to call its CLI automatically, and you trade the freshness guarantee for a dramatically smaller retrieval footprint per query.
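To make "keep the search scoped" concrete, here is a minimal sketch using plain grep. The function name scoped_grep and its arguments are hypothetical, and Claude Code's own search tools may behave differently; the point is only that a directory, a file glob, and a result cap keep raw matches from flooding the window.

```shell
#!/bin/bash
# scoped_grep PATTERN DIR GLOB MAX
# Hypothetical helper: search one directory, one file type, capped output.
scoped_grep() {
  local pattern=$1 dir=$2 glob=$3 max=$4
  # -r recurses, -n prints line numbers, --include limits the file types searched;
  # head caps how many matching lines reach the caller.
  grep -rn --include="$glob" -e "$pattern" "$dir" | head -n "$max"
}

# Example (paths are illustrative):
#   scoped_grep 'fetchUser' src/api '*.ts' 20
```

Asking for "the first 20 matches in src/api" in the prompt has the same effect as calling a helper like this: the search is bounded before it starts.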

The one constraint behind every habit below: the context window holds the whole conversation, every message, file read, and command output, and the model gets worse as it fills. A single debugging session can burn tens of thousands of tokens, and a crowded window means forgotten instructions and avoidable mistakes. The skill is deciding what gets into context and what stays out.

When an agent is worth it

The failure mode I see most is pointing an agent at work that didn't need one. An agent earns its place when three conditions hold:

Complex with an unknown path. You know the destination, like a merged PR from a design doc, but not the route.
Recoverable on error. A wrong web search costs nothing to redo; if a mistake is irreversible, keep a human in the loop.
High value, because the loop burns compute. Save it for coding, deep research, heavy analysis.

If the task is a known sequence of steps, write the script. An agent earns its cost when the route is genuinely uncertain.

Explore, then plan, then code

Letting Claude jump straight to code is how you get a clean solution to the wrong problem. Explore in plan mode first, ask for a written plan, edit it, then switch to normal mode and let it implement against that plan. Skip the plan when the diff fits in one sentence; reach for it when the approach is uncertain or spans files. For a bigger feature, have Claude interview you about edge cases and tradeoffs, then turn the answers into a spec you review before any code gets written, so the spec becomes the handoff a human signs off on.

Be specific, and give it a check it can run

Claude infers intent well and reads minds badly. "Add tests for foo.py" gets generic tests; "write a test for foo.py covering the logged-out case, no mocks" gets the one you wanted. Name the file, the scenario, the constraint, and point it at the pattern to copy. Feed it real input: reference files with @, paste screenshots, pipe data in with cat error.log | claude.

Then give it something that returns pass or fail: a test suite, a build exit code, a linter, a screenshot compared to a design. That's the difference between a session you watch and one you can leave running, because Claude writes the code, runs the check, reads the result, and iterates until it passes. Make it show the evidence, the actual output, not a claim that it's done.
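A runnable check can be as small as a wrapper that runs each command and stops at the first failure. The sketch below is an assumption about how you might package that, not something from the talks: run_checks is a hypothetical name, and the npm commands in the example comment stand in for whatever your project uses. It prints the real output of the failing command, which is the evidence you want Claude to show.

```shell
#!/bin/bash
# run_checks CMD...
# Hypothetical helper: run each check in order, stop at the first failure,
# and print the failing command's actual output.
run_checks() {
  local cmd output status
  for cmd in "$@"; do
    output=$(bash -c "$cmd" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
      printf 'FAIL (%d): %s\n%s\n' "$status" "$cmd" "$output"
      return 1
    fi
    printf 'PASS: %s\n' "$cmd"
  done
  return 0
}

# Example (assumed project scripts):
#   run_checks "npm run lint" "npm test"
```

Pointing Claude at one command with one exit code is easier to loop on than asking it to remember three separate checks.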

Prompt it like an intern, not a chatbot

A long-running loop needs guardrails a chat prompt doesn't. Tell it when to stop ("when you find the answer, you can stop") and give it a budget ("under five tool calls for a simple query"), or it'll burn the window hunting for a better source. Ask it to plan out loud before calling tools and reflect on the results after. When the window fills, /compact replaces the conversation history with a dense summary; for big jobs, let sub-agents do the research and hand back a short markdown summary. Two reflexes worth building: hit Escape the instant it goes down the wrong path, and reach for a CLI tool over an MCP server when both exist, since the model parses terminal output well.

Write the prompt the way Anthropic writes theirs

The habits above are about steering the loop. This one is about the prompt itself, and it matters most for the text you reuse: a SPEC.md, a CLAUDE.md, or a one-shot prompt you run headless in CI. Hannah and Christian's prompting walkthrough took one task from a wrong answer to a usable one without touching the model, just by restructuring the prompt. The same moves work for Claude Code.

Order the prompt the way a careful handoff reads. Role and task first, so the model knows what it's doing before it sees the data. Then the content. Then step-by-step instructions for how to work through it. Then any examples. Then a short reminder of the critical rules at the very end, because the last thing it reads is the freshest. In their demo, swapping a vague intro for "you help a claims adjuster review Swedish car-accident forms" moved Claude off a guess about a skiing accident and onto the actual task.

A few moves carry most of the weight:

Structure with delimiters. Claude was tuned on XML tags, so wrapping sections in tags like <form_layout> or <examples> tells it exactly what each block is and lets it refer back later. Markdown headings help too.
Show examples for the cases that trip it up. When a tricky input has a right answer your intuition knows, bake the input and the worked reasoning into the prompt. A handful of labeled examples steers the model harder than another paragraph of instructions.
Put stable context up front and keep it fixed. The parts that never change, a form layout, your coding conventions, the project's directory map, belong at the top where they read the same every run. That's also exactly what prompt caching rewards.
Prefill the answer to shape the output. This applies when you call the API directly and control the start of the response: if you need JSON, start the response with {; if you need a verdict in tags, open with <final_verdict>. The model continues from where you started it, so you get the structured output without parsing past the preamble.
Order the analysis on purpose. Tell it what to read first. In the accident demo, reading the form before the sketch was the difference between a confident verdict and a shrug, because the sketch only makes sense once you know the boxes that were checked.
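The ordering described above (role, content, instructions, examples, closing reminder) can be sketched as a small prompt builder. This is an illustration under assumptions: build_prompt and the tag names other than <examples> are hypothetical, and the tags themselves carry no special meaning beyond labeling each block consistently.

```shell
#!/bin/bash
# build_prompt ROLE CONTENT STEPS EXAMPLES REMINDER
# Hypothetical helper: emit the sections in handoff order, each in its own tag.
build_prompt() {
  local role=$1 content=$2 steps=$3 examples=$4 reminder=$5
  # Role and task first, so the model knows the job before it sees data.
  printf '%s\n\n' "$role"
  printf '<content>\n%s\n</content>\n\n' "$content"
  printf '<instructions>\n%s\n</instructions>\n\n' "$steps"
  printf '<examples>\n%s\n</examples>\n\n' "$examples"
  # Critical rules last, because the end of the prompt is read most recently.
  printf '<reminder>\n%s\n</reminder>\n' "$reminder"
}

# Example (file names are illustrative):
#   build_prompt "You review API handlers for input validation." \
#     "$(cat src/api/users.ts)" "Read the handler, then list gaps." \
#     "$(cat examples.md)" "Report only confirmed gaps." > prompt.txt
```

The resulting file can be pasted into a session or piped into a headless run such as claude -p; keeping the stable sections identical between runs also suits the caching point made in the list.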

One more, specific to Claude 4: let it think between tool calls. Extended thinking is a readable scratchpad, so you can watch where the reasoning goes wrong and fold that correction back into the prompt instead of guessing.

Your CLAUDE.md can be a bottleneck

Claude Code loads CLAUDE.md at session start, and it rides along in context on every prompt after. Mine bloated for months. Restructuring dropped it 60 to 75 percent with the same coverage and the same guardrails.

Two changes did it. First, scoping. The top-level file keeps only what's truly global, and the docs put a number on it: target under 200 lines, because longer files eat context and adherence drops. Everything that only matters in one part of the codebase moved into .claude/rules/, one file per topic, scoped to the paths it covers with a paths glob in the frontmatter:

---
paths:
  - "src/api/**/*.ts"
---

# API rules
- Every endpoint validates input
- Use the standard error response shape

A path-scoped rule loads only when Claude reads a file that matches, so the design-system rules stay out of context until it opens a component and the API rules stay out until it touches a handler (the docs call these path-specific rules). Rules without a paths field load every session at the same priority as the root file, so keep those few. Nested CLAUDE.md files behave the same lazy way if you'd rather keep instructions sitting next to the code.

Second, hooks for the deterministic rules. Instead of "don't import axios" in prose, I wrote .claude/hooks/forbidden-imports.sh; the hook intercepts the write before the file changes, so the import never lands. Same for version pinning and toolchain checks. Instructions are advisory and drift as the file grows; a hook costs zero tokens per turn and either passes or fails. The test on every line: would removing this cause a mistake? If not, cut it. If your CLAUDE.md is longer than a screen, you're paying for it on every prompt.
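The original forbidden-imports.sh is not shown, so the following is only a sketch of what such a hook could look like. It assumes a PreToolUse hook on Edit|Write, that the Write tool passes the new text as tool_input.content and Edit passes it as tool_input.new_string, and that jq is installed; the function names and the regex are hypothetical.

```shell
#!/bin/bash
# has_forbidden_import TEXT
# Returns 0 when TEXT imports or requires the axios package itself
# (not look-alikes such as axios-retry).
has_forbidden_import() {
  printf '%s\n' "$1" |
    grep -qE "(from|import|require\()[[:space:]]*['\"]axios(/[^'\"]*)?['\"]"
}

# hook_main: read the hook event on stdin and decide.
# Returns 2 to block the write, 0 to allow it.
hook_main() {
  local input content
  input=$(cat)
  # Assumed field names: content (Write) or new_string (Edit).
  content=$(printf '%s' "$input" |
    jq -r '.tool_input.content // .tool_input.new_string // empty')
  if has_forbidden_import "$content"; then
    echo "Policy: importing axios is forbidden in this project" >&2
    return 2
  fi
  return 0
}

# Installed as a hook script, the last line would be:
#   hook_main; exit $?
```

Keeping the match in its own function means the rule can be tested without faking a whole hook event.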

One caveat worth knowing. Context files make agents more rigorous (more testing, more search, more careful file work) because the file reads as permission to do the thorough thing. That rigor costs extra steps and extra tokens, which is the argument for surgical, human-written rules over a CLAUDE.md you asked the model to generate. Auto-generated docs tend to enumerate every directory in the repo and bloat the window without earning it.

Hooks: the guardrail Claude can't skip

A Claude Code hook is a shell command that runs automatically at a point in the agent's lifecycle, like before a file read or after an edit. A CLAUDE.md line such as "always run the linter" is advisory, so the model can forget it; a hook runs every time regardless of what the model decides. That makes hooks the right tool for anything that must never be skipped: secrets protection, formatting, type checks.

The two events you'll use most are PreToolUse and PostToolUse, and the difference is whether they run before or after the tool call:

| | PreToolUse | PostToolUse |
|---|---|---|
| Runs | Before the operation | After it completes |
| Can block? | Yes, exit code 2 stops it | No, the operation already happened |
| Good for | Security, validation, protection | Formatting, type checks, feedback |
| Example | Block a read of `.env` | Auto-format the file Claude just wrote |

You register a hook in JSON at one of three scopes: ~/.claude/settings.json for every project of yours, .claude/settings.json committed for the whole team, or .claude/settings.local.json gitignored for just your machine. A matcher field picks which tools fire it, and the pipe means "or", so Read|Grep watches both read paths and Edit|Write watches both write paths. The script reads a JSON event on stdin, so you pull the file path out with jq.

First guardrail: keep Claude out of your secrets. A .env with an API key is something the model should never read into context, and both the Read and Grep tools can pull it in. A PreToolUse hook on Read|Grep blocks it with exit code 2, and the stderr message goes back to Claude as the reason. Match .env only and you've covered one of a dozen places secrets live, so widen the pattern to credentials files, keys, and secrets/ directories:

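#!/bin/bash
# PreToolUse hook for Read|Grep: refuse paths that look like secrets.
INPUT=$(cat)
# Read passes file_path; Grep passes path. Empty when neither is set.
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.path // empty')
if echo "$FILE_PATH" | grep -qE '(^|/)\.env|(^|/)\.aws/credentials|credentials\.json|\.npmrc$|\.pem$|(^|/)secrets/'; then
  # Exit code 2 blocks the call; stderr goes back to Claude as the reason.
  echo "Security policy: cannot read this file, it may hold secrets" >&2
  exit 2
fi
exit 0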

Treat that hook as a second layer, not the lock. The hard control is settings-level permissions.deny (for example Read(./.env) and Read(./secrets/**)), which the client enforces before the tool ever runs, regardless of what the model decides; the hook is only as good as the script and the regex you maintain. Use the deny rules for the files you know about, the hook as a wider net for the ones you forget.
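For reference, the deny rules live in the permissions block of a settings file. A minimal fragment using the two example patterns from the paragraph above; putting it in .claude/settings.json is an assumption, and the paths are placeholders for whatever your project needs to protect:

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./secrets/**)"
    ]
  }
}
```

If the file already has a hooks block, the permissions key sits next to it at the top level.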

Second: format after every edit so you stop reminding the model. This one is PostToolUse on Edit|Write, so it can't block, it just runs Prettier on the file that changed and reports back:

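#!/bin/bash
# PostToolUse hook for Edit|Write: format the file Claude just changed.
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
if [[ "$FILE_PATH" =~ \.(ts|tsx|js|jsx|css|json|md)$ ]]; then
  # Report success only when Prettier actually succeeded.
  if npx prettier --write "$FILE_PATH" >/dev/null 2>&1; then
    echo "Formatted: $FILE_PATH"
  fi
fi
exit 0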

One trap to know: reformatting a file right after the model wrote it can desync the model's in-context view of that file. Claude's edits match on the exact old text, so if Prettier reflows whitespace behind its back, the next queued edit can miss. If you hit that, move formatting to a Stop hook that runs once when the agent finishes its batch instead of after every Edit|Write.

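Third, the one that earns its keep: a type-check feedback loop. When Claude changes a function signature it often misses a call site, so a PostToolUse hook runs tsc --noEmit and writes any errors to stderr. The edit already landed, so nothing here can undo it. What matters is the exit code: as I read the hooks guide, a PostToolUse hook that exits with code 2 has its stderr shown to Claude, while output from an exit 0 stays in the transcript. So the script exits 2 when tsc fails and 0 otherwise; the errors come back as feedback, Claude fixes the call sites, the hook runs again, and the loop closes when the codebase is type-safe:

#!/bin/bash
# PostToolUse hook for Edit|Write: type-check after a TypeScript edit.
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
# Skip files that are not TypeScript.
[[ ! "$FILE_PATH" =~ \.(ts|tsx)$ ]] && exit 0
if ! OUTPUT=$(npx tsc --noEmit 2>&1); then
  echo "Type errors after editing $FILE_PATH:" >&2
  echo "$OUTPUT" >&2
  # Exit 2 cannot undo the edit; it sends stderr back to Claude.
  exit 2
fi
exit 0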

Watch the cost here. A full tsc --noEmit takes seconds to minutes on a medium-to-large repo, and this version runs it synchronously after every edit, so a five-file change pays the compile five times and the loop crawls. Two fixes: run it on a Stop hook so it fires once when the agent finishes its batch, or keep it per-edit but use tsc --incremental (or scope it to the changed package) so you're recompiling the delta, not the world. The per-edit version above is the version to learn the pattern on, not the one to point at a large codebase.
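The Stop-hook variant can be sketched as follows. Treat the details as assumptions to verify against the hooks guide: that the Stop event's JSON carries a stop_hook_active flag, and that exit code 2 from a Stop hook keeps Claude working and shows it stderr. The function name stop_check is hypothetical, and the check command is passed in so the same wrapper works for tsc, a linter, or a test run.

```shell
#!/bin/bash
# stop_check ACTIVE CMD...
# Hypothetical helper: run one check when the agent finishes its batch.
# Returns 2 (keep working, feed errors back) or 0 (allow the stop).
stop_check() {
  local already_continuing=$1
  shift
  # If an earlier Stop hook already sent Claude back to work, let it stop
  # this time so a persistent failure cannot loop forever.
  [ "$already_continuing" = "true" ] && return 0
  local output
  if ! output=$("$@" 2>&1); then
    printf 'Check failed before stopping:\n%s\n' "$output" >&2
    return 2
  fi
  return 0
}

# Installed as a Stop hook, the script body might be:
#   ACTIVE=$(jq -r '.stop_hook_active // false')
#   stop_check "$ACTIVE" npx tsc --noEmit; exit $?
```

Registered under the Stop event name in the same settings file, this pays for the compile once per batch. Letting the second pass through unconditionally is a blunt loop guard; a stricter setup might count retries instead.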

All three register in one settings file under their event name:

{
  "hooks": {
    "PreToolUse": [
      { "matcher": "Read|Grep", "hooks": [{ "type": "command", "command": ".claude/hooks/protect-env.sh", "timeout": 10 }] }
    ],
    "PostToolUse": [
      { "matcher": "Edit|Write", "hooks": [
        { "type": "command", "command": ".claude/hooks/auto-format.sh", "timeout": 30 },
        { "type": "command", "command": ".claude/hooks/typecheck.sh", "timeout": 60 }
      ] }
    ]
  }
}
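Before trusting a hook, it can help to feed it a hand-written event and look at the exit code. The helper below is a hypothetical sketch: fake_tool_event prints a simplified event with only the two fields the scripts above read, whereas real events carry more (session and working-directory details, for instance).

```shell
#!/bin/bash
# fake_tool_event TOOL FILE_PATH
# Prints a simplified hook event as one line of JSON.
# Assumes the path contains no quotes or backslashes.
fake_tool_event() {
  printf '{"tool_name":"%s","tool_input":{"file_path":"%s"}}\n' "$1" "$2"
}

# Smoke test (assumes the hook scripts are saved and executable):
#   fake_tool_event Read .env | .claude/hooks/protect-env.sh; echo "exit=$?"
#   fake_tool_event Read src/app.ts | .claude/hooks/protect-env.sh; echo "exit=$?"
```

With the secrets hook from earlier, the first call should be blocked with exit code 2 and the second should pass with 0.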

The type-check pattern works in any typed language: swap tsc --noEmit for mypy, go vet, or cargo check, and for an untyped language run the linter or the test suite instead. There are more events than these two, like UserPromptSubmit, SessionStart, and Stop, but PreToolUse and PostToolUse cover most of what you'd want a guardrail for.

Set up once, then keep the session clean

A few minutes of setup pays off everywhere. Tame permissions so you're not approving every command: auto mode lets a classifier block only the risky ones, or allowlist safe repetitive commands like npm run lint. Install the CLIs for services you use, like gh and aws, connect MCP servers for what a CLI can't reach, and put sometimes-relevant knowledge in skills so it loads on demand instead of living in every conversation.

In the session itself, correct early; Escape preserves context so you can redirect, and /clear between unrelated tasks stops old context from distracting the model. If you've corrected the same thing twice, the context is polluted, so clear it and write a sharper prompt with what you learned. For research that would read a hundred files, send a subagent so the digging stays out of your main window.

Three more levers worth knowing. First, /context shows you exactly what is eating your token budget right now; an oversized markdown doc or an MCP server loading silently in the background can drain thousands of tokens per turn without you noticing. Second, /model lets you switch to a smaller model mid-session: Haiku handles web navigation and repetitive scripting for a fraction of the cost, so save Sonnet or Opus for the decisions that need it. Third, if your agentic loop reads noisy CLI output (long test runners, server logs), RTK compresses that output before it lands in the context window; 43 test-passed lines become one. RTK is lossy, so turn it off when you need Claude to debug from the raw output, but for green-path automation it cuts token burn significantly.

Evaluate by checking the end state

Grading an agent is harder than grading text. Start small and manual: five to ten realistic tasks, transcripts read by hand to see where it gets stuck, then an LLM as a judge with a strict rubric for the fuzzy parts. The test I anchor on is final-state validation: stop reading what the agent says it did and check the environment. If the goal was to update a booking, query the row and confirm the date changed. The model is optimistic about its own work, so the only claim I trust is the one the environment confirms.
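Final-state validation reduces to "run a query, compare with what should be true". A minimal sketch, with verify_state as a hypothetical helper and the sqlite3 call in the comment standing in for whatever query reaches your real system:

```shell
#!/bin/bash
# verify_state EXPECTED CMD...
# Hypothetical helper: run CMD against the environment and compare its
# output with EXPECTED. Returns 0 on a match, 1 otherwise.
verify_state() {
  local expected=$1
  shift
  local actual
  actual=$("$@" 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    printf 'OK: state is %s\n' "$actual"
    return 0
  fi
  printf 'MISMATCH: expected %s, got %s\n' "$expected" "$actual" >&2
  return 1
}

# Example (database, table, and id are illustrative):
#   verify_state "2025-07-01" \
#     sqlite3 app.db "SELECT date FROM bookings WHERE id = 42"
```

The agent's transcript never enters the comparison, which is the point: only the environment gets a vote.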

A few failure patterns repeat: the kitchen-sink session (unrelated tasks in one context, fix with /clear), the correction spiral (clear and rewrite after two failed tries), the over-stuffed CLAUDE.md (prune, move rules to hooks), the trust-then-verify gap (no runnable check, so edge cases slip), and the infinite exploration (an unscoped "investigate" that reads a hundred files, so scope it or hand it to a subagent).

Two things to keep in mind

This moves fast enough that today's failure is next week's default. If something doesn't work with the current model or tool, park it and try again in a few weeks. A hard "no" is usually a "not yet."

The other one doesn't go away: the model hallucinates, and you own the output. A confident wrong answer is still your commit, your PR, your 2am incident. Guardrails, rules, and hooks catch the predictable mistakes, and feeding the model your project's real conventions stops it from inventing its own. Don't over-tune it, though. A 600-line rulebook the model can't hold is worse than ten rules it actually follows.

What hooks or context trims are you running in your own setup? Tell me on LinkedIn.

Sources: Cal's Claude Code talk (YouTube), Hannah and Christian on prompting (YouTube), Anthropic's Best practices for Claude Code guide, the Claude Code hooks guide, and John Kim's breakdown of cutting Claude Code token usage by up to 90% for the Code Graph and RTK strategies. These are my notes and opinions, not Anthropic's. If I misread a detail, ping me and I'll fix it.
