UC Jung

Chapter 7. Context Management and Token Optimization

2.1 Why an AI Agent Produces Inconsistent Quality

Even when you give the same AI Agent instructions of the same quality, you sometimes get excellent results and sometimes something far off the mark. This isn't because the AI Agent's capabilities fluctuate.

There are two causes.

① It performs exploratory browsing on its own

When an AI Agent receives an instruction, it carries out exploratory browsing before executing. It decides on its own which files to read, which code to reference, and what structure to assume as its starting point.

The problem is that this exploration happens in areas not specified by the instruction.

```
[Same instruction, different exploration outcomes]

> "Implement the authentication API"

[Session A — exploration went well]
  → reads src/auth/ folder first
  → discovers existing auth patterns
  → generates code consistent with existing patterns
  → Result: excellent

[Session B — exploration went off track]
  → reads src/users/ folder first
  → loads code unrelated to authentication into the context
  → generates code on top of the wrong premises
  → Result: not what was intended
```

② The same instruction is interpreted differently depending on context

An AI Agent interprets the same instruction differently depending on what is in the context window at that moment. Previous conversation content, files that were read in, results retrieved via MCP — all of it influences how the next instruction is interpreted.

Inconsistent output quality is not due to variation in the AI Agent's ability — it's due to variation in the context from session to session.


2.2 Context Determines Output Quality

Every judgment an AI Agent makes is shaped by the accumulated context within that session.

```
┌─────────────────────────────────────────────────────────┐
│              Context Window (200K tokens)               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  System prompt + tool descriptions    ← fixed area      │
│  CLAUDE.md guidelines                 ← loaded at       │
│                                         session start   │
│  MCP tool descriptions                ← scales with     │
│                                         active servers  │
│  ─────────────────────────────                          │
│  Files the AI Agent has read          ← exploration     │
│                                         results         │
│  Previous conversation content        ← grows as        │
│                                         conversation    │
│                                         accumulates     │
│  Tool execution results               ← command output  │
│  ─────────────────────────────                          │
│  ▼ The next instruction is interpreted and executed     │
│    based on all of this context ▼                       │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

The key insight: which content occupies the context, and how much of it, is what drives the AI Agent's next judgment.

| Context state | AI Agent's judgment | Output quality |
|---------------|---------------------|----------------|
| Relevant files loaded precisely | Evidence-based judgment | High |
| Large volume of irrelevant files loaded | Judgment driven by noise | Unstable |
| Conversation grown long, guidelines pushed out | Judgment based on recent conversation | Gradually declining |
| Previous failed attempts accumulated | Stuck in failure patterns | Vicious cycle |

2.3 Narrow the AI Agent's Range of Judgment

In Advanced Ch. 1, we covered clear instructions (the 7 elements). That was about narrowing down "what to do."

Here, we go one step further. You also need to narrow down "the range of judgment about how to execute."

When an AI Agent receives an instruction, it internally makes two kinds of judgments:

```
[User instruction]
> Implement the user authentication API

[AI Agent's internal judgment]

  Judgment 1: "What do I do?" (What)
    → Inferred from the user instruction — implement the auth API

  Judgment 2: "How do I execute?" (How to execute)
    → Which files should I read first?
    → Should I explore for existing code patterns?
    → Which libraries should I reference?
    → How should I structure the tests?
    → ← This is the realm of discretionary exploration
```

The instruction method from Advanced Ch. 1 narrows Judgment 1; what this chapter covers is narrowing Judgment 2.

Ways to narrow Judgment 2:

| Method | Description | Example |
|--------|-------------|---------|
| Specify the role | Define precisely what role the AI Agent should play | "Implement this as a senior backend developer" |
| Specify reference files | Designate what to look at, blocking arbitrary exploration | "Follow the patterns in src/auth/auth.service.ts" |
| Specify the procedure | Define the execution order directly | "1) Confirm schema → 2) Implement service → 3) Write tests" |
| Provide decision criteria | Give the principle for ambiguous situations | "If you're not sure, ask before executing" |

2.4 People Can't Specify Everything

Here we hit a practical limit.

It isn't feasible for a person to include every reference file, every execution step, and every decision criterion in every instruction.

```
[Ideal but unrealistic instruction]
> Implement the user authentication API.
> - Follow the patterns in src/auth/auth.service.ts
> - Use the custom decorators in src/common/decorators/
> - Reference the User model in prisma/schema.prisma
> - Follow the response format in docs/api-convention.md
> - Reference the test patterns in tests/auth/
> - ...

→ Only possible if the person has memorized every file relationship in the project
→ Becomes impossible as the project grows
```

If you can't write instructions like that every time, you need to create a structure where the AI Agent can find the right files on its own.

That structure is the output index and the task-type reference guide.


2.5 The Output Index — Build a Map of Your Project

What is an output index?

It's a document that catalogs the structure and entry points of key output files in your project. It serves as a map when the AI Agent needs to judge "which file should I read first?"

It serves the same purpose as the "project structure document" a person looks for first when joining a new project. With this map, an AI Agent navigates with purpose-driven exploration rather than arbitrary browsing.

Without an index vs. with an index

```
[Without an index]
  AI Agent: "I need to build an auth API... I guess I'll scan all the files first"
  → reads 100 files → 50% of context consumed → not enough space to actually work

[With an index]
  AI Agent: "According to PROJECT_INDEX.md, auth lives in src/auth/,
             and API conventions are in docs/api-convention.md"
  → reads 3 files → 5% of context consumed → plenty of space to work
```

PROJECT_INDEX.md — Example

```markdown
# PROJECT_INDEX — Project Output Index

> This file serves as the entry point when an AI Agent explores the project.
> Read this file first when starting a new task.

## Project Overview
- Weekly report system (NestJS 11 + React 18)
- Team of ~9, split into DX and AX sub-groups

## Key Directory Structure

| Directory | Role | Entry file |
|-----------|------|-----------|
| src/api/ | REST API routes | src/api/README.md |
| src/auth/ | Authentication & authorization | src/auth/auth.module.ts |
| src/services/ | Business logic | src/services/README.md |
| src/components/ | React components | src/components/index.ts |
| prisma/ | DB schema & migrations | prisma/schema.prisma |
| docs/ | Design & convention docs | docs/README.md |
| tests/ | Test code | tests/README.md |

## Key Design Documents

| Document | Purpose | When to reference |
|----------|---------|-------------------|
| docs/api-convention.md | API response format, error code conventions | When implementing or modifying APIs |
| docs/auth-design.md | Authentication architecture design | For auth-related work |
| docs/db-schema.md | ERD and table descriptions | When changing DB schema |
| docs/git-workflow.md | Branching strategy, commit conventions | When committing or opening PRs |

## Tech Stack Summary
- Backend: NestJS 11, TypeScript, Prisma, PostgreSQL
- Frontend: React 18, Tailwind CSS
- Test: Jest, Testing Library
- CI: GitHub Actions
```

Core Principles for the Index

| Principle | Description |
|-----------|-------------|
| Specify entry points | Point to a specific file to read, not just a directory |
| Organize by purpose | Connect as "reference this when doing X" rather than just listing files |
| Keep it light | If the index itself is long, it defeats the purpose. Keep it under 50–100 lines |
| Keep it current | When the file structure changes, update the index (check at task end) |
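The "keep it current" principle can be partially automated. Below is a minimal sketch (not part of the original workflow — the table layout and path conventions it assumes are the ones from the example index above) that scans the markdown table rows of PROJECT_INDEX.md and flags referenced files or directories that no longer exist:

```python
import re
from pathlib import Path

def find_stale_entries(index_path: str) -> list[str]:
    """Scan markdown table rows in the index and return referenced
    paths that no longer exist on disk (relative to the index file)."""
    stale = []
    root = Path(index_path).parent
    for line in Path(index_path).read_text(encoding="utf-8").splitlines():
        # Only inspect table rows like: | src/auth/ | ... | src/auth/auth.module.ts |
        if not line.strip().startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        for cell in cells:
            # Treat cells shaped like "path/to/file.ext" or "dir/" as path references;
            # prose cells (with spaces) and separator rows don't match this pattern
            if re.fullmatch(r"[\w./-]+\.\w+|[\w./-]+/", cell):
                if not (root / cell).exists():
                    stale.append(cell)
    return stale

# Example usage (run from the project root):
#   for path in find_stale_entries("PROJECT_INDEX.md"):
#       print(f"stale index entry: {path}")
```

Run as a task-end check, this catches index entries that drifted out of sync with the actual file tree.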

2.6 Task-Type Reference Guides

If the output index is "the map of the project," task-type reference guides are the signposts that say "for this task, look at this part of the map."

Why this is needed

When an AI Agent receives an instruction to "implement an authentication API," it needs to find authentication-related files. Even with an index, which combination of files is needed for an auth task is still a judgment the AI Agent has to make on its own.

Task-type reference guides are a way to pre-define even that judgment.

How to write them in CLAUDE.md

```markdown
## Task-Type Reference Guides

### When implementing a new API
1. docs/api-convention.md — check response format and error codes
2. Similar API files in src/api/ — reference existing patterns
3. prisma/schema.prisma — check relevant models
4. Similar tests in tests/ — reference test patterns

### When modifying an API or fixing a bug
1. The API file + related service files
2. Corresponding tests in tests/ — check existing tests
3. After changes, run npm test to check for regressions

### When changing DB schema
1. docs/db-schema.md — review current ERD
2. prisma/schema.prisma — make schema changes
3. After generating the migration, update docs/db-schema.md
4. Check the impact scope on related service files

### When adding a frontend component
1. src/components/index.ts — check existing component list
2. Similar components in src/components/ — reference patterns
3. Tailwind class ordering: docs/frontend-convention.md

### For authentication/authorization work
1. docs/auth-design.md — review the auth architecture
2. src/auth/ — review existing auth code
3. After changes, run the full auth test suite

### For deployment/infrastructure work
1. docs/deploy-guide.md — review deployment procedure
2. .github/workflows/ — review CI/CD pipelines
3. docker-compose.yml — review container configuration
```

The effect of reference guides

```
[Without a reference guide]
  User: "Implement the payment API"
  AI Agent: (doesn't know where to start — scans everything)
  → wasted tokens + risk of referencing wrong files

[With a reference guide]
  User: "Implement the payment API"
  AI Agent: (consults "When implementing a new API" in CLAUDE.md)
  → 1. Read api-convention.md
  → 2. Check similar API patterns
  → 3. Check schema
  → 4. Begin implementation
  → purpose-driven exploration, tokens saved, consistent quality
```

2.7 The Relationship Between Output Index and Reference Guides

```
┌──────────────────────────────────────────────────────────┐
│  CLAUDE.md                                               │
│  ├── Project overview, tech stack, build commands        │
│  ├── Task-type reference guides  ←── "for X, look at Y"  │
│  └── Core rules                                          │
├──────────────────────────────────────────────────────────┤
│  PROJECT_INDEX.md                                        │
│  ├── Directory structure + entry files  ←── "full map"   │
│  └── List of key design documents                        │
├──────────────────────────────────────────────────────────┤
│  docs/*.md                                               │
│  ├── api-convention.md                                   │
│  ├── auth-design.md        ←── "the actual references"   │
│  └── db-schema.md                                        │
└──────────────────────────────────────────────────────────┘

AI Agent's navigation path:
  ① CLAUDE.md → "This is an API task — check the reference guide"
  ② Reference guide → "It says to look at api-convention.md and similar APIs"
  ③ PROJECT_INDEX.md → "API directory entry point is src/api/README.md"
  ④ Read actual files → load only the minimum necessary for the task
```

Once this structure is in place:

  • Arbitrary exploration decreases → prevents wasted tokens
  • Exploration paths are consistent → output quality variance is reduced
  • People don't have to specify reference files every time → instructions become more concise

2.8 Practical Application Checklist

When setting up a project

```
□ Has PROJECT_INDEX.md been written?
  - Are key directories and entry files specified?
  - Is the list of key design documents included?
□ Does CLAUDE.md include task-type reference guides?
  - Are all frequently performed task types covered?
  - Are the files to reference and their order specified for each type?
□ Does CLAUDE.md specify "read PROJECT_INDEX.md first at the start of a new task"?
```
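The setup portion of this checklist is mechanical enough to script. Here is a minimal sketch, under the assumption that the files are named PROJECT_INDEX.md and CLAUDE.md as in this chapter and that the guide section heading contains the phrase "Reference Guide":

```python
from pathlib import Path

def check_setup(root: str = ".") -> list[str]:
    """Return a list of setup problems; an empty list means the
    project-setup checklist items above all pass."""
    problems = []
    root_path = Path(root)
    if not (root_path / "PROJECT_INDEX.md").exists():
        problems.append("PROJECT_INDEX.md is missing")
    claude = root_path / "CLAUDE.md"
    if not claude.exists():
        problems.append("CLAUDE.md is missing")
        return problems  # nothing more to check without CLAUDE.md
    text = claude.read_text(encoding="utf-8")
    if "Reference Guide" not in text:
        problems.append("CLAUDE.md has no task-type reference guide section")
    if "PROJECT_INDEX.md" not in text:
        problems.append("CLAUDE.md never tells the agent to read PROJECT_INDEX.md")
    return problems

# Example usage:
#   for problem in check_setup("."):
#       print(f"setup issue: {problem}")
```

A check like this can run in CI so the structure doesn't silently rot as the project grows.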

When giving a work instruction

```
□ Is there a reference guide in CLAUDE.md for this task type?
  - If yes: just give the instruction — no need to specify references separately
  - If no: include the reference files directly in the instruction
□ If this is a new task type, was the reference guide updated after completion?
```

When ending a task

```
□ If the file structure changed, was PROJECT_INDEX.md updated?
□ If new design documents were added, were they reflected in the index?
□ If a new task type was encountered, was it added to the reference guide?
```

2.9 Context Isolation — Don't Ask the Developer to Test Their Own Code

"You shouldn't test your own code"

There's a long-standing principle in software development: don't ask the developer to test their own code. Because they know what they built and how, they unconsciously assume "this will obviously work" or "this path is fine" and skip over things. Rather than testing edge cases where problems actually live, they only validate the happy path they intended.

An AI Agent is not immune to this problem.

Problems that arise when implementation and verification happen in the same session

The session in which the AI Agent wrote the code still carries all of the context from the implementation process. What design decisions were made, which files were modified, what direction was taken to solve problems — all of it remains in memory.

If you then say "test this" in that same session:

```
[Implementation + verification in the same session]

  AI Agent's context:
  ├── Design decisions from the implementation (already knows them)
  ├── List of files modified (already knows them)
  ├── Direction taken to solve the problem (already knows it)
  └── Remaining technical debt (can rationalize it)

  Problems that arise during verification:
  · "I built it this way, so this obviously works" → skipped
  · "I designed the error handling with 3 retries, so that's correct"
    → doesn't question the design itself
  · "This path flows as I intended" → unintended paths not verified
  · Doesn't recognize the parts it inferred as potential defects
```

This is exactly the trap human developers fall into when testing their own code. Knowing too much becomes the problem.

Verification must always happen in a new session

When verification is performed in a new session, the AI Agent judges the output purely on its own merits, without the context of how it was built.

```
[Verification in a new session]

  AI Agent's context:
  ├── CLAUDE.md guidelines (loaded clean)
  ├── Implementation result report (documented facts only)
  └── The actual code (as-is)

  How verification differs:
  · Doesn't know the implementation intent → if code can't explain itself, flags it as a defect
  · Doesn't know the design decisions → evaluates the validity of decisions independently
  · Doesn't know the inferred parts → treats undocumented behavior with suspicion
  · No bias from conversational flow → objective, guideline-based verification
```

How to instruct verification

Once implementation is complete, save an implementation result report to a file, then instruct a new session to verify using that report as the reference.

In the implementation session:

```
> Once implementation is complete, save an implementation result report to
  ./reports/auth-impl-report.md with the following:
>
> - Implementation scope (what was done)
> - List of files changed
> - Key design decisions and their rationale
> - Tests run and their results
> - Known limitations or unresolved issues
> - Items to verify during the review
```

In the new session:

```
> Read ./reports/auth-impl-report.md and verify the implementation.
> - Whether the implementation scope matches the requirements (docs/auth-requirements.md)
> - Code quality, security vulnerabilities, performance issues
> - Perform all items listed under "Items to verify" in the report
> - Save any issues found to ./reports/auth-review-report.md
> - Call out any additional items that need follow-up
```

Useful tip: Background execution for automatic session isolation

In Claude Code, once implementation is complete, instructing the verification to run in the background causes it to execute in a new session — completely isolated from the current session's context.

```
> Implementation is complete.
> Now run the following verification tasks in the background.
>
> Background tasks:
>   1. Reference ./reports/auth-impl-report.md and verify the implementation
>   2. Save the results of npm test to ./reports/auth-test-output.log
>   3. Save the verification report to ./reports/auth-review-report.md
>   4. Output the result file paths when complete
>
> I'll continue with other work in this session.
```

Advantages of background execution:

| Advantage | Description |
|-----------|-------------|
| Context isolation | Verifies purely on the output, without the implementation context |
| Time savings | Continue the next task without waiting for verification to finish |
| Parallel work | Implementation (current session) + verification (background) run simultaneously |
| Objectivity | The implementer's bias doesn't influence the verification |

Advanced tip: Run as a separate process and monitor

When more explicit isolation is needed, you can instruct the AI Agent to run as a separate process and monitor the output.

```
> Run the following verification scripts as a separate process,
> and record the output to ./reports/verification-output.log.
> Monitor the log until the process completes,
> and report immediately if any failures are found.
>
> Verifications to run:
>   npm test -- --coverage
>   npm run lint
>   npm run type-check
```
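If you'd rather not rely on the agent's own process handling, the same separate-process pattern can be driven by a plain script. The sketch below is illustrative (the npm command list mirrors the instruction above; the log path is the same hypothetical one): each verification runs as its own subprocess, all output is appended to the log, and failing commands are collected for the report.

```python
import subprocess
from pathlib import Path

# Command list mirroring the verification instruction above
COMMANDS = [
    ["npm", "test", "--", "--coverage"],
    ["npm", "run", "lint"],
    ["npm", "run", "type-check"],
]

def run_verifications(commands, log_path="./reports/verification-output.log"):
    """Run each command as a separate process, append its combined
    stdout/stderr to the log, and return the commands that failed."""
    log = Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    failures = []
    with log.open("a", encoding="utf-8") as f:
        for cmd in commands:
            result = subprocess.run(cmd, capture_output=True, text=True)
            f.write(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}\n")
            if result.returncode != 0:
                failures.append(" ".join(cmd))
    return failures

# Example usage:
#   failed = run_verifications(COMMANDS)
#   if failed:
#       print("verification failures:", failed)
```

Because the script is its own OS process, the isolation is physical: nothing from the implementation session's context can leak into these checks.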

Context Isolation Principles — Summary

```
┌──────────────────────────────────────────────────────────┐
│           3 Principles of Context Isolation              │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  1. Don't verify in the same session that implemented    │
│     → Knowing too much becomes the problem               │
│                                                          │
│  2. Verification references only the implementation      │
│     result report                                        │
│     → Judge by the output, not the implementation process│
│                                                          │
│  3. Use background execution or a separate process       │
│     → Physically separate sessions to guarantee          │
│       isolation                                          │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
