Harshit Rathod

Posted on Jul 5

Why Large Codebases Drain Your AI Token Budget and How to Fix It

#ai #contextengineering #tokenoptimizations #claude

A technical breakdown of where context budget goes in large codebases, and the structural patterns that reduce consumption while increasing accuracy.

You add a new feature to a large NestJS monorepo with Claude's help. By the end of the conversation, the model has read 40 files, re-derived your module conventions twice, and still got the service placement wrong. You spent thousands of tokens. Claude spent most of them guessing. This guide explains why, and what to do about it.

Key Takeaways

Context windows don't scale with codebases. A large repo has more decisions, more conventions, and more ambiguity — all of which drive up token consumption.
The largest token sinks are re-derivation and uncertainty. Claude reads more files when it doesn't know the architecture; it generates more text when it doesn't know the conventions.
CLAUDE.md is the highest-leverage token saver. A 300-line briefing file can replace thousands of lines of exploratory file reads per session.
Rules save tokens by preventing regeneration. A wrong pattern that gets corrected costs twice — once to generate, once to fix. A prohibition rule prevents it entirely.
Skills load context on demand. Templates and references in skills only enter the context window when invoked, not on every session.

Quick Answer

Large repositories burn through AI tokens because Claude must read files to infer what any experienced team member already knows: which service owns what, what the error-handling pattern is, how responses are shaped, what's been deliberately removed. Every exploratory read, every wrong-service guess, every pattern that violates a convention and gets corrected is wasted context. The fix is to externalize that tribal knowledge into CLAUDE.md (a briefing loaded every session), rules (prohibitions that prevent wrong output), and skills (on-demand workflows with templates). The setup is done once; the savings recur on every feature.

The Problem: Context Budget Is a Finite Resource

Every token Claude reads or writes is budget spent. Context windows are measured in tokens — not files, not lines — and every session starts with the same ceiling. In a small project, that budget is generous. In a large monorepo with several services, dozens of modules, hundreds of DTOs, and cross-cutting conventions, the budget evaporates fast.

There are two places budget goes:

Input tokens — everything Claude reads: your prompt, files it opens, rules and docs it loads, the running conversation history.
Output tokens — everything Claude generates: code, explanations, corrections, re-generations.

Both are expensive. But in a large repo, the input side is where the hemorrhage starts — because Claude reads aggressively to compensate for what it doesn't know.

Where the Tokens Actually Go

1. Exploratory File Reads

When Claude doesn't know where something lives, it goes looking. In a well-indexed codebase, that might mean reading 3–4 files to locate the right module. In a large monorepo with ambiguous service boundaries, it can mean reading 15–20 before it's confident enough to act.

A typical unconfigured session flow for "add a consumer-visible status to a booking package":

Reads services/ directory listing — orientation
Reads operator-service/src/modules/order/ — wrong service, discovers mismatch
Reads api-service/src/modules/order/ — corrects course
Reads order.entity.ts — finds the shape
Reads order.service.ts — finds the pattern
Reads order.repository.ts — finds the query layer
Reads order.controller.ts — finds the response shape
Reads order.dto.ts — finds the DTO convention
Reads transform-response.helper.ts — finds the adapter pattern
Reads 3–4 more files to verify cross-service adapter usage

That's a dozen or more file reads before writing a single line of output. Each read costs input tokens. In a large NestJS monorepo, files are not small: services, repositories, and controllers routinely run 200–400 lines each.

The same task in a configured session: Claude reads CLAUDE.md, knows the service boundary immediately, reads the 3 directly relevant files, and starts generating. The exploratory reads never happen.

2. Convention Re-derivation

Even when Claude lands in the right file, it still needs to infer your team's conventions. Without explicit documentation, it reads examples to reconstruct patterns:

It reads existing controllers to learn how responses are shaped
It reads existing services to find which logger you use
It reads existing guards to understand your permission system
It reads multiple DTOs to infer your validation patterns

This is pattern-matching by example. It works — but it costs tokens for every session, for every engineer, indefinitely. The conventions don't change; the re-derivation does.

Convention re-derivation cost per session (rough estimate for a large NestJS monorepo):

| Convention Claude must re-derive        | Files typically read   | Estimated token cost |
| --------------------------------------- | ---------------------- | -------------------- |
| Service ownership (which belongs where) | 3–5 entry points       | ~2,000–4,000         |
| Response shape and interceptor pattern  | 2–3 controllers        | ~1,500–3,000         |
| Permission system (guards + decorators) | 2–3 guards/controllers | ~1,000–2,000         |
| Error handling (constants + exceptions) | 2–3 services           | ~1,000–2,000         |
| Logger usage                            | 1–2 services           | ~500–1,000           |
| Import style (alias vs relative)        | Several files          | ~300–600             |
| Total per session (estimate)            | ~13–18 files           | ~6,300–12,600 tokens |

That's before the prompt, before the output, and before any task-specific reads. It's overhead — pure re-derivation of knowledge your team already has.
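The total row can be checked by adding the low and high ends of each range separately. The sketch below does that with the figures from the table; the `TokenRange` type and `sumRanges` helper are names made up for this illustration.

```typescript
// Token ranges copied from the table above; type and function names are illustrative.
interface TokenRange {
  low: number;
  high: number;
}

const conventionCosts: Record<string, TokenRange> = {
  "service ownership": { low: 2000, high: 4000 },
  "response shape": { low: 1500, high: 3000 },
  "permission system": { low: 1000, high: 2000 },
  "error handling": { low: 1000, high: 2000 },
  "logger usage": { low: 500, high: 1000 },
  "import style": { low: 300, high: 600 },
};

// Add the low ends and the high ends separately to get the per-session range.
function sumRanges(ranges: TokenRange[]): TokenRange {
  return ranges.reduce(
    (acc, r) => ({ low: acc.low + r.low, high: acc.high + r.high }),
    { low: 0, high: 0 },
  );
}

const reDerivationTotal = sumRanges(Object.values(conventionCosts));
// reDerivationTotal is { low: 6300, high: 12600 }
```

Swapping in your own per-convention estimates gives a quick sense of which convention is worth documenting first.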

3. Wrong-Service Placement and Corrections

When a codebase has multiple services with overlapping domains — say, a package entity that exists in both the operator service (definition) and the consumer API (consumer-visible metadata) — Claude will guess. Sometimes it guesses right. Often it doesn't.

A wrong-service guess costs tokens in three ways:

Exploration in the wrong service — reading module structure, entities, and services in a service that won't receive the change
Generating the wrong output — producing code in the wrong service
Correction — you point out the error; Claude re-reads, re-reasons, and regenerates

In practice, one wrong-service guess can cost more tokens than a correct implementation would have cost from the start.

4. Pattern Violations and Regeneration

Without explicit prohibitions, Claude uses its training defaults. Those defaults are reasonable — but they won't match your team's conventions. Common mismatches in NestJS projects:

console.log instead of a structured logger
if (user.roles.includes('admin')) instead of PermissionCodes
Relative ../../ imports instead of the src/ alias
Hardcoded error strings instead of ErrorMessages constants
Raw DB queries in a service instead of going through a repository

Each violation costs tokens twice: once to generate the wrong pattern, once to correct it. A single code generation that needs three corrections has effectively quadrupled the output token cost for that block.

5. Conversation History Accumulation

As a session grows, the running conversation history is included in every subsequent call. In a long debugging or implementation session:

Early file reads stay in context even when they're no longer relevant
Corrections and re-generations add to history
Claude's reasoning about dead ends accumulates

A session that spans 20 exchanges in a large codebase can accumulate a conversation history that, by itself, is larger than an entire shorter session.
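To see why long sessions get expensive, here is a rough model of history accumulation. It assumes every exchange adds the same number of tokens and that the whole history is processed again on each call; real sessions vary, and caching or context compression can change the picture. The function name and the 1,000-token figure are illustrative.

```typescript
// Simplified model: each exchange adds `tokensPerExchange` to the history,
// and the full history counts as input on every call.
function cumulativeInputTokens(exchanges: number, tokensPerExchange: number): number {
  let history = 0;
  let totalInput = 0;
  for (let i = 0; i < exchanges; i++) {
    history += tokensPerExchange; // this exchange joins the running history
    totalInput += history; // the whole history is read again
  }
  return totalInput;
}

const shortSession = cumulativeInputTokens(5, 1000); // 15,000
const longSession = cumulativeInputTokens(20, 1000); // 210,000
```

Under these assumptions, four times as many exchanges cost fourteen times the input tokens, which is why clearing dead-end context early matters more as a session grows.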

The Fix: Externalize Knowledge, Don't Re-Derive It

The underlying cause of excess token consumption is the same in every case: Claude is spending tokens to learn things your team already knows. The fix is to make that knowledge available upfront, in a form Claude can read efficiently, rather than re-deriving it from source files every session.

Three mechanisms do this, each suited to a different type of knowledge:

| Mechanism   | What it encodes                      | When Claude reads it                 | Token impact                             |
| ----------- | ------------------------------------ | ------------------------------------ | ---------------------------------------- |
| `CLAUDE.md` | Architecture, ownership, conventions | Loaded automatically, every session  | Replaces exploratory reads               |
| Rules       | Hard prohibitions and invariants     | Always active; prevents wrong output | Eliminates regeneration cost             |
| Skills      | Multi-step workflows + templates     | On demand, when invoked              | Keeps templates out of always-on context |

CLAUDE.md: Replace Exploration With Declaration

CLAUDE.md is the single highest-leverage tool for reducing token consumption. It's loaded at the start of every session, before any file reads, and it replaces the exploratory phase entirely for the information it contains.

The key insight is specificity over completeness. A bloated CLAUDE.md that describes everything about NestJS is worse than a lean one that answers the exact questions Claude would otherwise need to read files to answer:

Which service owns what kind of data?
What does the standard response shape look like?
How does cross-service communication work?
What are the project-specific patterns (logger, errors, permissions)?

A 300-line CLAUDE.md that precisely answers those questions saves more tokens than a 1,000-line one that also explains what a repository pattern is.

What to Include

Service Routing Decision Table

This is the single most effective thing you can document. Instead of Claude reading 10 files to figure out which service owns a new feature, it reads one table:

## Which Service Owns What

- operator-service — if the data describes _what an experience is_:
  package definitions, pricing, taxes, availability, operator accounts.
- api-service — if the data describes _a consumer's interaction_:
  bookings, users, auth, trip planning, payments, notifications.

| Task involves...                   | Service                            |
| ---------------------------------- | ---------------------------------- |
| Package categories, pricing, slots | operator-service                   |
| Booking status, user profiles      | api-service                        |
| Stripe payments (consumer side)    | api-service                        |
| Cross-service booking flow         | Start in api-service → IMS adapter |

This table costs ~150 tokens to include in the context. It replaces 2,000–4,000 tokens of exploratory service reads. That's roughly a 13–27× return on every session.

Conventions That Would Otherwise Be Re-Derived

Document only what isn't already obvious from reading a single file:

## What Claude Must Not Do

- `console.log` → use `logMessage()` from `src/common/utils/logger`
- Check `user.role` → use `PermissionCodes` + `@Permissions()`
- DB call inside a loop → use bulk queries (`IN`, `JOIN`)
- Relative `../../` imports → use the `src/` alias
- TypeORM `synchronize: true` → Flyway manages all schema changes

Five prohibitions. ~100 tokens. Replaces all the file reads Claude would do to infer these patterns — and eliminates the regeneration cost when it gets them wrong.
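The same prohibitions can double as a cheap check on generated code before you accept it. The sketch below is a rough heuristic, not a linter: the regular expressions and the `findViolations` helper are assumptions made for this example, and the "DB call inside a loop" rule is left out because it can't be detected reliably with a pattern match.

```typescript
// Patterns mirror the prohibitions listed above. The regexes are rough
// heuristics written for this sketch, not a complete linter.
interface Prohibition {
  name: string;
  pattern: RegExp;
  useInstead: string;
}

const prohibitions: Prohibition[] = [
  { name: "console.log", pattern: /\bconsole\.log\(/, useInstead: "logMessage()" },
  { name: "role check", pattern: /\buser\.roles?\b/, useInstead: "PermissionCodes + @Permissions()" },
  { name: "deep relative import", pattern: /from\s+['"]\.\.\/\.\.\//, useInstead: "the src/ alias" },
  { name: "synchronize: true", pattern: /synchronize\s*:\s*true/, useInstead: "Flyway migrations" },
];

// Returns one message per prohibited pattern found in the given source text.
function findViolations(code: string): string[] {
  return prohibitions
    .filter((p) => p.pattern.test(code))
    .map((p) => `${p.name} -> use ${p.useInstead}`);
}

const generated = [
  "import { OrderDto } from '../../common/order.dto';",
  "console.log(user.roles.includes('admin'));",
].join("\n");

const violations = findViolations(generated);
// violations has three entries: console.log, role check, deep relative import
```

A check like this doesn't replace the rules. It only makes it quicker to spot a violation that slipped through.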

Response Shape

Include the exact shape, not a prose description:

## Standard Response Shape

Controllers return `{ data, meta }`. TransformInterceptor wraps automatically:

return { data: { items }, meta: { path: request.path } };
// → { success: true, data: { items }, meta: { path }, error: null }

Without this, Claude reads 2–3 controllers to infer the pattern. With it, Claude knows immediately — and gets it right the first time.
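If it helps to make the shape concrete, the envelope can also be written down as types. This is a sketch under assumptions: the field names come from the example above, but the `error` type and the `wrapSuccess` function are stand-ins for whatever the project's TransformInterceptor actually does.

```typescript
// What a controller returns, per the convention above.
interface ControllerResult<T> {
  data: T;
  meta: { path: string };
}

// The envelope the client receives. The `error` type is an assumption.
interface ApiResponse<T> {
  success: boolean;
  data: T | null;
  meta: { path: string };
  error: string | null;
}

// Stand-in for the wrapping step the interceptor is described as performing.
function wrapSuccess<T>(result: ControllerResult<T>): ApiResponse<T> {
  return { success: true, data: result.data, meta: result.meta, error: null };
}

const wrapped = wrapSuccess({ data: { items: ["a", "b"] }, meta: { path: "/orders" } });
// → { success: true, data: { items: ["a", "b"] }, meta: { path: "/orders" }, error: null }
```

A few lines of types like these are usually cheaper in context than the controllers Claude would otherwise read to infer them.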

What to Exclude

The token efficiency of CLAUDE.md depends as much on what you leave out as what you put in:

Framework documentation — Claude already knows what a NestJS guard is. Document only your project-specific guard setup.
Code that's derivable from files — if Claude can read one file and learn a pattern, don't repeat it in CLAUDE.md.
Stable, obvious conventions — use TypeScript, use async/await, follow the module pattern. These are defaults; they cost tokens but add no information.
Task-specific templates — templates belong in skills, not CLAUDE.md. They should enter the context on demand, not every session.

Target: under 300 lines. If your CLAUDE.md is longer, audit it for derivable content. Every line that doesn't prevent a file read or a wrong pattern is a line that should be cut.

Rules: Prevent Regeneration at the Source

A wrong pattern that's regenerated after correction costs twice the tokens of getting it right the first time. Rules avoid that cost by steering Claude away from the wrong pattern before it is generated.

Rules live in .claude/rules/ as Markdown files. They're phrased as prohibitions, not preferences, because "NEVER call the DB in a loop" reads as a hard constraint, while "prefer bulk queries" is a suggestion that gets ignored under pressure.

Token Cost of a Rule vs. a Correction

Consider N+1 queries. Without a rule, here's what happens:

Claude generates code with a DB call inside a loop (output tokens)
You read the generated code and identify the N+1 (your time)
You explain the issue to Claude (input tokens)
Claude re-reads the context to understand the correction (context re-processing)
Claude regenerates the correct bulk-query version (output tokens)

Total: roughly 2× the output tokens for that block, plus the input tokens for the correction exchange.

With a rule:

Rule is in context; Claude generates the correct bulk-query pattern (output tokens)

Total: 1× output tokens.

The rule itself costs ~200 tokens to include in the session. It saves 1× output tokens plus correction overhead on every N+1 occurrence. In a session with 3–4 such violations, it pays for itself many times over.

High-Value Rules for NestJS Projects

Rules that deliver the highest token savings are those that prevent patterns Claude's training considers reasonable but that violate your project's conventions:

N+1 Query Prevention — Claude defaults to per-entity lookups in loops; this stops it:

# Database Query Rules — No N+1

NEVER make a database call inside a loop. Use a single bulk query, then
group results in memory. If unsure whether a pattern causes N+1, ask first.
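For readers who want to see the difference the rule is protecting, here is a minimal sketch of both shapes. The repository is an in-memory fake invented for this example, with a `queryCount` field standing in for database round trips; a real repository would issue SQL instead.

```typescript
interface Order {
  id: number;
  userId: number;
}

// In-memory stand-in for a repository; queryCount tracks simulated DB round trips.
class FakeOrderRepository {
  queryCount = 0;
  private readonly rows: Order[];

  constructor(rows: Order[]) {
    this.rows = rows;
  }

  findByUserId(userId: number): Order[] {
    this.queryCount++;
    return this.rows.filter((o) => o.userId === userId);
  }

  findByUserIds(userIds: number[]): Order[] {
    this.queryCount++; // a single query, e.g. WHERE user_id IN (...)
    const wanted = new Set(userIds);
    return this.rows.filter((o) => wanted.has(o.userId));
  }
}

// N+1 shape: one query per user inside the loop.
function ordersPerUserLoop(repo: FakeOrderRepository, userIds: number[]): Map<number, Order[]> {
  const result = new Map<number, Order[]>();
  for (const id of userIds) {
    result.set(id, repo.findByUserId(id));
  }
  return result;
}

// Bulk shape: one query, then group the rows in memory.
function ordersPerUserBulk(repo: FakeOrderRepository, userIds: number[]): Map<number, Order[]> {
  const result = new Map<number, Order[]>();
  for (const id of userIds) {
    result.set(id, []);
  }
  for (const order of repo.findByUserIds(userIds)) {
    result.get(order.userId)?.push(order);
  }
  return result;
}
```

Both functions return the same grouping. For three users the loop version makes three simulated queries and the bulk version makes one.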

Permission System — Claude defaults to role-name checks; this redirects it:

# Authorization — Never Check Role Names

NEVER check `user.role`, `user.roles`, or any role-name string.
Use `@Permissions([PermissionCodes.X])` with `PermissionsGuard` instead.
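The reasoning behind this rule is easier to see in plain TypeScript, without the NestJS decorator machinery. Everything in this sketch is hypothetical: the permission code values, the `AuthUser` shape, and the `hasPermissions` helper only approximate the kind of check a permissions guard might perform.

```typescript
// Hypothetical permission codes; a real project defines its own.
const PermissionCodes = {
  ORDER_READ: "order:read",
  ORDER_UPDATE: "order:update",
} as const;

type PermissionCode = (typeof PermissionCodes)[keyof typeof PermissionCodes];

// Hypothetical user shape for this example.
interface AuthUser {
  roles: string[];
  permissions: PermissionCode[];
}

// Discouraged: the outcome depends on a role-name string.
function canUpdateOrderByRole(user: AuthUser): boolean {
  return user.roles.includes("admin");
}

// Preferred shape: require specific permission codes, whatever the role is called.
function hasPermissions(user: AuthUser, required: PermissionCode[]): boolean {
  return required.every((code) => user.permissions.includes(code));
}

const supportAgent: AuthUser = {
  roles: ["support"],
  permissions: [PermissionCodes.ORDER_READ, PermissionCodes.ORDER_UPDATE],
};

const byRole = canUpdateOrderByRole(supportAgent); // false: "support" is not "admin"
const byCode = hasPermissions(supportAgent, [PermissionCodes.ORDER_UPDATE]); // true
```

The role check wrongly rejects a user who was granted the permission under a different role name, which is the kind of bug the rule is meant to keep out of generated code.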

Import Style — Claude defaults to relative imports; this enforces the alias:

# Imports — Always Use the Path Alias

NEVER use relative imports that traverse more than one directory level.
Use `src/modules/...` alias imports everywhere.

Each of these rules costs ~100–200 tokens per session to include. Each saves the token cost of generating a violation plus the correction exchange — conservatively 500–2,000 tokens per occurrence.
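Those two ranges make a quick break-even check possible. The function below is a back-of-the-envelope sketch using the estimates above; `netRuleSavings` is a name invented for this example.

```typescript
// Net tokens saved by one rule in one session:
// avoided corrections minus the cost of carrying the rule in context.
function netRuleSavings(
  ruleCostTokens: number,
  violationsPrevented: number,
  costPerViolation: number,
): number {
  return violationsPrevented * costPerViolation - ruleCostTokens;
}

const noViolations = netRuleSavings(200, 0, 500); // -200: the rule is pure cost this session
const oneCheapViolation = netRuleSavings(200, 1, 500); // 300
const threeCostlyViolations = netRuleSavings(200, 3, 2000); // 5800
```

Even at the pessimistic end (a 200-token rule, 500 tokens per violation), a single prevented violation covers the rule's cost. A rule only loses tokens in sessions where the violation would never have come up.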

Skills: Keep Templates Out of Always-On Context

Skills are on-demand context. Templates, references, and workflow guides for repeated tasks are exactly the kind of high-token content that should not be in CLAUDE.md — because they're only relevant when you're doing that specific task.

A controller template for your NestJS service is ~50–100 lines. In a CLAUDE.md, that's 50–100 lines of context included in every session, even when you're debugging, writing tests, or reading logs. In a skill, those same lines only enter the context when you run /api-development.

Token Budget Impact of Skills

| Content                        | In CLAUDE.md (always loaded) | In a skill (on demand)          |
| ------------------------------ | ---------------------------- | ------------------------------- |
| Controller template (80 lines) | ~400 tokens × every session  | ~400 tokens × only when invoked |
| Service template (80 lines)    | ~400 tokens × every session  | ~400 tokens × only when invoked |
| Repository template (60 lines) | ~300 tokens × every session  | ~300 tokens × only when invoked |
| DTO examples (40 lines)        | ~200 tokens × every session  | ~200 tokens × only when invoked |
| Total                          | ~1,300 tokens per session    | ~1,300 tokens when you need it  |

If you scaffold endpoints 3 times a week but run 20 sessions per week, templates in CLAUDE.md cost 20 × 1,300 = 26,000 tokens. In a skill, they cost 3 × 1,300 = 3,900 tokens. Same result, 85% fewer tokens for the templates alone.
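The weekly arithmetic is easy to rerun with your own numbers. This sketch uses the figures from the table and the paragraph above; the function name is illustrative.

```typescript
// Weekly cost of carrying the templates: tokens per load × number of loads.
function weeklyTemplateTokens(templateTokens: number, loadsPerWeek: number): number {
  return templateTokens * loadsPerWeek;
}

const templateTokens = 400 + 400 + 300 + 200; // controller + service + repository + DTO

const alwaysOn = weeklyTemplateTokens(templateTokens, 20); // in CLAUDE.md: every session
const onDemand = weeklyTemplateTokens(templateTokens, 3); // in a skill: only when scaffolding

// Share of template tokens avoided by moving them into a skill, as a whole percent.
const savedPercent = Math.round((1 - onDemand / alwaysOn) * 100);
```

The saving depends only on the ratio of scaffolding sessions to total sessions, so the less often you scaffold, the stronger the case for a skill.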

Skill Structure

.claude/skills/api-development/
├── SKILL.md                    # Workflow: trigger, steps, decision points
├── assets/
│   ├── controller-template.md  # Real, compilable code templates
│   ├── service-template.md
│   ├── repository-template.md
│   └── dto-template.md
└── references/
    ├── nestjs-conventions.md
    └── swagger-documentation.md

The SKILL.md workflow guides Claude through the task deterministically — DTO first (it defines the contract), then repository (queries only), then service (business logic), then controller (HTTP routing). Without the workflow, Claude invents its own order and sometimes generates the controller before the DTO exists, requiring re-generation.

Architecture: Per-Service CLAUDE.md Files

A monorepo has a root context problem. Everything in the root CLAUDE.md loads every session, regardless of which service you're working in. Details specific to operator-service's DI patterns are wasted tokens when you're debugging auth in api-service.

The fix is a two-level structure:

.claude/CLAUDE.md                   # Cross-cutting: routing, shared conventions
services/
  api-service/CLAUDE.md             # api-service specific: module patterns, auth, DI
  operator-service/CLAUDE.md        # operator-service specific: IMS adapter, pricing

Claude reads the root CLAUDE.md always, and the service-level file when you work in that directory. Cross-cutting concerns (service ownership, response shape, shared error constants) live at root. Service-specific depth (auth flows, module patterns, DI tokens) lives in the service.

Token impact: If each service file is 150 lines (~750 tokens), splitting them means a session loads only the 750 tokens for the service you're working in rather than every service's details. With two services that avoids 750 tokens per session; with four, it avoids up to 2,250.
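A small sketch makes the comparison between the two layouts explicit. The 750-token figure is the estimate from above; the two extra service names in the four-service example are hypothetical.

```typescript
// Flat layout: every service's details load in every session.
function flatLayoutTokens(files: Record<string, number>): number {
  return Object.values(files).reduce((sum, tokens) => sum + tokens, 0);
}

// Split layout: only the file for the service being worked on is loaded.
function splitLayoutTokens(files: Record<string, number>, activeService: string): number {
  return files[activeService] ?? 0;
}

const twoServices: Record<string, number> = {
  "api-service": 750,
  "operator-service": 750,
};

// Hypothetical four-service monorepo; the last two names are made up.
const fourServices: Record<string, number> = {
  ...twoServices,
  "billing-service": 750,
  "search-service": 750,
};

const avoidedWithTwo =
  flatLayoutTokens(twoServices) - splitLayoutTokens(twoServices, "api-service"); // 750
const avoidedWithFour =
  flatLayoutTokens(fourServices) - splitLayoutTokens(fourServices, "api-service"); // 2250
```

The avoided context grows with each additional service, while the per-session cost of the split layout stays at one service file.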

Measuring the Impact

Here's the before/after for a realistic "add a consumer-visible booking status" task in a large NestJS monorepo:

Without Configuration

| Phase                                            | Token Estimate |
| ------------------------------------------------ | -------------- |
| Exploratory directory reads (service placement)  | ~3,000         |
| Convention re-derivation (logger, perms, errors) | ~6,000         |
| Wrong-service generation and correction          | ~4,000         |
| Pattern violations and regeneration (3×)         | ~3,000         |
| Actual task output (correct code, 4 files)       | ~4,000         |
| Total                                            | ~20,000        |

With CLAUDE.md + Rules + Skills

| Phase                                           | Token Estimate |
| ----------------------------------------------- | -------------- |
| CLAUDE.md load (routing, conventions, shape)    | ~1,500         |
| Rules load (3 rules)                            | ~600           |
| Targeted file reads (3 directly relevant files) | ~2,500         |
| Skill invocation (templates, workflow)          | ~1,300         |
| Actual task output (correct code, 4 files)      | ~4,000         |
| Total                                           | ~9,900         |

~50% reduction in total token consumption for a single task. For a team running dozens of sessions per week, the savings are substantial — and they compound because the ratio stays roughly constant across tasks.
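To rerun this comparison with estimates from your own sessions, the two tables can be expressed as data. The phase figures below are the ones from the tables above; the `Phase` type and `totalTokens` helper are illustrative.

```typescript
interface Phase {
  name: string;
  tokens: number;
}

const withoutConfig: Phase[] = [
  { name: "exploratory directory reads", tokens: 3000 },
  { name: "convention re-derivation", tokens: 6000 },
  { name: "wrong-service generation and correction", tokens: 4000 },
  { name: "pattern violations and regeneration", tokens: 3000 },
  { name: "actual task output", tokens: 4000 },
];

const withConfig: Phase[] = [
  { name: "CLAUDE.md load", tokens: 1500 },
  { name: "rules load", tokens: 600 },
  { name: "targeted file reads", tokens: 2500 },
  { name: "skill invocation", tokens: 1300 },
  { name: "actual task output", tokens: 4000 },
];

function totalTokens(phases: Phase[]): number {
  return phases.reduce((sum, phase) => sum + phase.tokens, 0);
}

const before = totalTokens(withoutConfig); // 20000
const after = totalTokens(withConfig); // 9900
const reduction = 1 - after / before; // about 0.505, i.e. roughly half
```

The task output is the same 4,000 tokens in both lists, so the whole difference comes from the overhead phases.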

Common Mistakes That Maximize Token Waste

Each of these looks harmless and silently burns budget across every session.

Putting templates in CLAUDE.md. Templates are task-specific. Loading them every session is waste. Move them to skills.
Writing CLAUDE.md as a tutorial. Explaining NestJS fundamentals Claude already knows inflates context without replacing any file reads. Document only what's unique to your project.
Omitting the routing decision table. Service placement ambiguity is the single largest driver of exploratory reads. A table makes the decision free.
Writing rules as preferences. "Prefer bulk queries" leaves Claude room to choose. "NEVER call the DB in a loop" doesn't. The second form prevents regeneration; the first doesn't.
One flat CLAUDE.md for a monorepo. All service-specific content loads every session. Split into root + per-service files.
No cross-service communication rules. Without explicit guidance, Claude reaches across service boundaries — reads the wrong DB, imports across services. The correction exchange is expensive. One sentence in CLAUDE.md prevents it.

Real-World Example: Token Budget on a Feature Request

The request: "Add a consumer-visible status to a booking package."

Unconfigured session token trace:

[read] services/                             ~200 tokens
[read] operator-service/src/modules/order/   ~150 tokens
[read] operator-service order.entity.ts      ~600 tokens  ← wrong service
[read] api-service/src/modules/order/        ~150 tokens  ← course correction
[read] api-service order.entity.ts           ~500 tokens
[read] api-service order.service.ts          ~800 tokens
[read] api-service order.repository.ts       ~600 tokens
[read] api-service order.controller.ts       ~700 tokens
[read] api-service order.dto.ts              ~400 tokens
[read] transform-response.helper.ts          ~300 tokens
[read] ims-adapter.ts                        ~700 tokens
...                                          ~3,000 more (permissions, logger, errors)
[generate] wrong permission check            ~400 tokens
[correction] permission explanation          ~200 tokens
[generate] correct permission                ~200 tokens
[generate] actual implementation (4 files)   ~4,000 tokens
Total: ~8,300 input + ~4,600 output ≈ 12,900 tokens

Configured session token trace:

[load] CLAUDE.md (routing, conventions)      ~1,500 tokens
[load] rules (3 rules)                       ~600 tokens
[invoke] /api-development skill              ~1,300 tokens
[read] api-service order.service.ts          ~800 tokens  ← targeted read
[read] api-service order.repository.ts       ~600 tokens  ← targeted read
[read] ims-adapter.ts                        ~700 tokens  ← targeted read
[generate] implementation (4 files, correct) ~4,000 tokens
Total: ~5,500 input + ~4,000 output ≈ 9,500 tokens

The implementation output is nearly identical. The savings come entirely from eliminating exploratory reads, wrong-service generation, and pattern violations.

Summary: The Token Budget Formula

Token consumption in a large repo follows a predictable formula:

Total tokens = (exploration overhead) + (re-derivation overhead) + (correction overhead) + (actual task output)

Configuration attacks the first three terms:

CLAUDE.md eliminates exploration overhead and most re-derivation overhead
Rules eliminate correction overhead
Skills shift task-specific re-derivation from every-session to on-demand

The actual task output — the code you asked for — is roughly constant. Everything else is overhead, and it's reducible.

FAQ

Why does Claude read so many files in a large repo?
Claude compensates for uncertainty with exploration. When it doesn't know which service owns a feature, it reads multiple services to decide. When it doesn't know your response shape, it reads multiple controllers. CLAUDE.md replaces that uncertainty with declared knowledge, eliminating the reads.

Does a larger CLAUDE.md always save more tokens?
No — the relationship inverts past a threshold. A 300-line CLAUDE.md that answers the right questions saves more tokens than a 1,000-line one that also includes tutorials and templates. The goal is maximum information density per line, not maximum coverage.

How much do wrong patterns actually cost?
A single N+1 query correction in a longer session can cost 500–2,000 tokens: input for your correction message plus output for the re-generation. Across a week of development with multiple sessions, prevention rules pay for themselves quickly.

Can I measure token usage per session in Claude Code?
Claude Code's `/cost` command reports token usage for the current session, although how much detail you see depends on your version and plan. Useful proxy signals: session length before context compression kicks in, whether you see "context window approaching limit" warnings, and how many file reads appear in the tool call log.

Are skills just macros?
Skills are more than macros — they bundle a workflow (ordered steps, decision points), templates (compilable code), and references (conventions, Swagger patterns) into a single invocable command. The workflow is what prevents out-of-order generation and the correction overhead that comes with it.

Should CLAUDE.md be the same for every engineer on the team?
Yes — commit it to git so the configuration is shared and versioned. Personal overrides go in .claude/settings.local.json, which is gitignored. The team-level conventions in CLAUDE.md apply to everyone.

Does this approach work for repos that aren't NestJS?
The token-saving logic applies to any large codebase with team-specific conventions. The specific content changes (replace NestJS patterns with your stack's patterns), but the structure — CLAUDE.md for routing and conventions, rules for prohibitions, skills for repeated workflows — is framework-agnostic.

Key Takeaways (Recap)

Token waste in large repos has a root cause: Claude reads files to learn what your team already knows. The fix is to tell it upfront.
CLAUDE.md replaces exploration. A service routing table costs ~150 tokens; it replaces 2,000–4,000 tokens of exploratory reads per session.
Rules eliminate regeneration. A prohibition costs ~200 tokens per session; it saves 500–2,000 tokens every time the wrong pattern would have been generated.
Skills keep templates off the always-on context. Load them only when you're scaffolding, not in every session.
Split by service. Per-service CLAUDE.md files prevent service-specific context from loading in unrelated sessions.
The savings compound. Every session, every engineer, every feature — the overhead reduction is consistent because the root cause is structural.

Conclusion

Token exhaustion in large repositories isn't a limitation of the AI — it's an information architecture problem. Claude reads aggressively because the knowledge it needs to act confidently isn't in any single file. It re-derives conventions that haven't changed. It generates wrong patterns that get corrected. It explores the wrong service before finding the right one.

The fix isn't a larger context window. It's externalizing your team's tribal knowledge into structured, version-controlled form — a CLAUDE.md that answers the questions Claude would otherwise spend tokens asking, rules that prevent the patterns it would otherwise spend tokens regenerating, and skills that load task-specific context only when you need it.

The investment is a few hours. The savings recur on every session, for every engineer, indefinitely.

DEV Community