<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrei Nita</title>
    <description>The latest articles on DEV Community by Andrei Nita (@andrei_nita).</description>
    <link>https://dev.to/andrei_nita</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3821048%2Fc6d2fbf5-4afb-4229-b28e-bec66d7af080.jpg</url>
      <title>DEV Community: Andrei Nita</title>
      <link>https://dev.to/andrei_nita</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andrei_nita"/>
    <language>en</language>
    <item>
      <title>How to Hyper-Optimise Claude Code: The Complete Engineering Guide</title>
      <dc:creator>Andrei Nita</dc:creator>
      <pubDate>Fri, 13 Mar 2026 11:52:34 +0000</pubDate>
      <link>https://dev.to/andrei_nita/how-to-hyper-optimise-claude-code-the-complete-engineering-guide-1eh3</link>
      <guid>https://dev.to/andrei_nita/how-to-hyper-optimise-claude-code-the-complete-engineering-guide-1eh3</guid>
      <description>&lt;h2&gt;
  
  
  Never Hit Limits Again While Keeping Top Models Predicting
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A comprehensive, stats-driven framework from simple fixes to advanced architectures&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I learned this the hard way: burning through Claude Code limits in hours, starting refactoring sessions at 9 AM only to hit rate limits by lunch, and spending $200/day when I had budgeted $200/month. Those lessons taught me that the real bottleneck isn't the model itself.&lt;/p&gt;

&lt;p&gt;The common pattern? Treating Claude Code like Google Search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@entire_repo
Refactor the authentication system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works... until your context window explodes, your tokens drain, and you're staring at a rate limit error with half your feature unfinished.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The issue isn't the model. The issue is how we architect context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After optimising dozens of production codebases, I've identified 16 concrete strategies ranked by complexity and impact that can reduce token consumption by 60-90% while keeping Opus and Sonnet actively predicting (relegating Haiku to where it belongs: simple, bounded tasks).&lt;/p&gt;

&lt;p&gt;Here's the complete engineering playbook.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fundamental Rule
&lt;/h2&gt;

&lt;p&gt;Every token you send to Claude consumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context window capacity&lt;/li&gt;
&lt;li&gt;Compute resources&lt;/li&gt;
&lt;li&gt;Latency budget&lt;/li&gt;
&lt;li&gt;Monthly quota&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The relationship is roughly linear. Send 10× the context and, as a rule of thumb, expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10× slower responses&lt;/li&gt;
&lt;li&gt;10× higher costs&lt;/li&gt;
&lt;li&gt;10× more hallucination risk&lt;/li&gt;
&lt;li&gt;10× faster rate limiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Experienced users follow one rule: Every token must justify its existence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With that principle established, let's dive into the 16 optimization strategies.&lt;/p&gt;




&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Fundamental Rule
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Part I: Quick Wins (2-30 Minutes Setup)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Minimum Viable Context: The .claudeignore File&lt;/li&gt;
&lt;li&gt;2. Lean CLAUDE.md: Progressive Disclosure Architecture&lt;/li&gt;
&lt;li&gt;3. Plan Mode: Prevent Expensive Re-work&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part II: Automated Optimizations (Automatic to 1 Hour Setup)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;4. MCP Tool Search: 85% Context Reduction (Automatic)&lt;/li&gt;
&lt;li&gt;5. Prompt Caching: 81% Cost Reduction (Automatic)&lt;/li&gt;
&lt;li&gt;6. Context Snapshots: Session State Management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part III: Intermediate Techniques (1-4 Hours Setup)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;7. Context Indexing + RAG: 40-90% Token Reduction&lt;/li&gt;
&lt;li&gt;8. Task Decomposition: 45-60% Fewer Tokens&lt;/li&gt;
&lt;li&gt;9. Hooks and Guardrails: Prevent Token Waste&lt;/li&gt;
&lt;li&gt;10. Model Tiering: 40-60% Cost Reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part IV: Advanced Architectures (4+ Hours Setup)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;11. Multi-Agent Architecture: 50-70% Context Reduction&lt;/li&gt;
&lt;li&gt;12. Token Budgeting: Explicit Resource Management&lt;/li&gt;
&lt;li&gt;13. Markdown Knowledge Bases: Structured Context&lt;/li&gt;
&lt;li&gt;14. Context Compression: Emergency Pressure Relief&lt;/li&gt;
&lt;li&gt;15. Tool-First Workflows: Offload Processing&lt;/li&gt;
&lt;li&gt;16. Incremental Memory: Conversation Compaction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part V: The Complete System
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Putting It All Together: The Optimized Workflow&lt;/li&gt;
&lt;li&gt;Real-World Results: Complete System&lt;/li&gt;
&lt;li&gt;The Optimization Checklist&lt;/li&gt;
&lt;li&gt;The Mental Model&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion: The New Engineering Discipline
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;




&lt;h2&gt;
  
  
  Part I: Quick Wins (2-30 Minutes Setup)
&lt;/h2&gt;

&lt;p&gt;These deliver immediate impact with minimal engineering effort.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Minimum Viable Context: The .claudeignore File
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 30-40% token reduction&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 2 minutes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Trivial&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most developers send 10-50× more code than Claude needs to see.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Default behaviour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session starts
Claude reads: 156,842 lines
Relevant to task: 847 lines
Waste: 155,995 lines (99.5%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real example from a Next.js project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;node_modules/&lt;/code&gt;: 847,234 lines&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.next/&lt;/code&gt;: 124,563 lines
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dist/&lt;/code&gt;: 45,782 lines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual source code&lt;/strong&gt;: 8,934 lines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By line count, &lt;strong&gt;over 99% of that project is irrelevant to any task&lt;/strong&gt;, and Claude was processing it before you even sent a prompt.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Solution
&lt;/h4&gt;

&lt;p&gt;Create &lt;code&gt;.claudeignore&lt;/code&gt; in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dependencies
&lt;/span&gt;&lt;span class="n"&gt;node_modules&lt;/span&gt;/
.&lt;span class="n"&gt;pnpm&lt;/span&gt;-&lt;span class="n"&gt;store&lt;/span&gt;/
.&lt;span class="n"&gt;npm&lt;/span&gt;/
.&lt;span class="n"&gt;yarn&lt;/span&gt;/

&lt;span class="c"&gt;# Build artifacts
&lt;/span&gt;&lt;span class="n"&gt;dist&lt;/span&gt;/
&lt;span class="n"&gt;build&lt;/span&gt;/
.&lt;span class="n"&gt;next&lt;/span&gt;/
&lt;span class="n"&gt;out&lt;/span&gt;/
&lt;span class="n"&gt;target&lt;/span&gt;/
*.&lt;span class="n"&gt;pyc&lt;/span&gt;
&lt;span class="err"&gt;__&lt;/span&gt;&lt;span class="n"&gt;pycache__&lt;/span&gt;/

&lt;span class="c"&gt;# Logs and temp files
&lt;/span&gt;*.&lt;span class="n"&gt;log&lt;/span&gt;
&lt;span class="n"&gt;logs&lt;/span&gt;/
.&lt;span class="n"&gt;cache&lt;/span&gt;/
&lt;span class="n"&gt;tmp&lt;/span&gt;/

&lt;span class="c"&gt;# Version control
&lt;/span&gt;.&lt;span class="n"&gt;git&lt;/span&gt;/
.&lt;span class="n"&gt;svn&lt;/span&gt;/

&lt;span class="c"&gt;# IDE
&lt;/span&gt;.&lt;span class="n"&gt;vscode&lt;/span&gt;/
.&lt;span class="n"&gt;idea&lt;/span&gt;/
*.&lt;span class="n"&gt;swp&lt;/span&gt;

&lt;span class="c"&gt;# Environment
&lt;/span&gt;.&lt;span class="n"&gt;env&lt;/span&gt;
.&lt;span class="n"&gt;env&lt;/span&gt;.&lt;span class="n"&gt;local&lt;/span&gt;

&lt;span class="c"&gt;# Large data files
&lt;/span&gt;*.&lt;span class="n"&gt;csv&lt;/span&gt;
*.&lt;span class="n"&gt;xlsx&lt;/span&gt;
*.&lt;span class="n"&gt;pdf&lt;/span&gt;
*.&lt;span class="n"&gt;zip&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
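&lt;p&gt;Before committing to an ignore list, it can help to measure where the lines in your repository actually live. The following is a minimal sketch, not part of Claude Code: the function names &lt;code&gt;count_lines_by_top_dir&lt;/code&gt; and &lt;code&gt;report&lt;/code&gt; are hypothetical, and it simply counts newline-delimited lines per top-level directory so you can see which folders deserve an ignore entry.&lt;/p&gt;

```python
import os
from collections import Counter


def count_lines_by_top_dir(root: str) -> Counter:
    """Count lines per top-level directory under root.

    Files that sit directly in root are grouped under ".".
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Binary mode avoids decode errors on non-text files.
                with open(path, "rb") as handle:
                    totals[top] += sum(1 for _ in handle)
            except OSError:
                continue  # unreadable file: skip it rather than abort the audit
    return totals


def report(root: str, limit: int = 10) -> list:
    """Return the heaviest directories as 'name: N lines (share%)' strings."""
    totals = count_lines_by_top_dir(root)
    grand_total = sum(totals.values()) or 1
    return [
        f"{name}: {lines} lines ({100 * lines / grand_total:.1f}%)"
        for name, lines in totals.most_common(limit)
    ]
```

&lt;p&gt;Running &lt;code&gt;print("\n".join(report(".")))&lt;/code&gt; from the project root lists the heaviest directories first, which are usually the first candidates for the ignore file.&lt;/p&gt;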



&lt;h4&gt;
  
  
  Real Results
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial context: 156,842 lines&lt;/li&gt;
&lt;li&gt;Tokens per session start: 347,291&lt;/li&gt;
&lt;li&gt;Claude reads everything, including dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial context: 8,934 lines
&lt;/li&gt;
&lt;li&gt;Tokens per session start: 19,847&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;94.3% reduction in startup tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Advanced Pattern: Multi-Level Ignore
&lt;/h4&gt;

&lt;p&gt;For monorepos:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# Root .claudeignore
&lt;/span&gt;&lt;span class="n"&gt;node_modules&lt;/span&gt;/
.&lt;span class="n"&gt;git&lt;/span&gt;/

&lt;span class="c"&gt;# Frontend-specific (apps/web/.claudeignore)
&lt;/span&gt;&lt;span class="n"&gt;node_modules&lt;/span&gt;/
.&lt;span class="n"&gt;next&lt;/span&gt;/
&lt;span class="n"&gt;coverage&lt;/span&gt;/

&lt;span class="c"&gt;# Backend-specific (apps/api/.claudeignore)  
&lt;/span&gt;&lt;span class="err"&gt;__&lt;/span&gt;&lt;span class="n"&gt;pycache__&lt;/span&gt;/
*.&lt;span class="n"&gt;pyc&lt;/span&gt;
&lt;span class="n"&gt;venv&lt;/span&gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost Impact:&lt;/strong&gt;&lt;br&gt;
At $3 per million input tokens (Sonnet 4.6):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: $1.04 per session start&lt;/li&gt;
&lt;li&gt;After: $0.06 per session start&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: $0.98 per session&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a team of 5 developers doing 20 sessions/day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily savings: $98&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly savings: ~$2,100&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From a single 2-minute file.&lt;/p&gt;


&lt;h3&gt;
  
  
  2. Lean CLAUDE.md: Progressive Disclosure Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 15-25% reduction in static context&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 10-30 minutes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Easy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your CLAUDE.md file is loaded on every single message. Most teams make it 10× longer than it needs to be.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Anti-Pattern
&lt;/h4&gt;

&lt;p&gt;Typical bloated CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Documentation (4,847 lines)&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Next.js 14.2.3
&lt;span class="p"&gt;-&lt;/span&gt; React 18.3.1
&lt;span class="p"&gt;-&lt;/span&gt; TypeScript 5.4.5
&lt;span class="p"&gt;-&lt;/span&gt; Tailwind CSS 3.4.1
&lt;span class="p"&gt;-&lt;/span&gt; PostgreSQL 16
&lt;span class="p"&gt;-&lt;/span&gt; Prisma 5.12.1
&lt;span class="p"&gt;-&lt;/span&gt; (500 more lines of dependency versions)

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
(2,000 lines explaining every microservice)

&lt;span class="gu"&gt;## API Documentation  &lt;/span&gt;
(1,500 lines of endpoint specs)

&lt;span class="gu"&gt;## Debugging Guide&lt;/span&gt;
(847 lines of troubleshooting steps)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tokens consumed: 10,847&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Relevant content: ~800 tokens (7.4%)&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  The Pattern: Tiered Memory Architecture
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md (First 200 lines only)&lt;/span&gt;

&lt;span class="gu"&gt;## Core Identity&lt;/span&gt;
Stack: Python + FastAPI + Postgres + Redis
Never modify: migrations/, .env files
Always: write tests, use type hints

&lt;span class="gu"&gt;## Quick Reference&lt;/span&gt;
Auth: JWT tokens, 30min expiry, Redis sessions
DB: Prisma ORM, use transactions for multi-table ops
API: FastAPI routers in /routes, Pydantic models

&lt;span class="gu"&gt;## When You Need More&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detailed API contracts → /docs/api-contracts.md  
&lt;span class="p"&gt;-&lt;/span&gt; Database schemas → /docs/data-models.md
&lt;span class="p"&gt;-&lt;/span&gt; Deployment process → /docs/deployment.md
&lt;span class="p"&gt;-&lt;/span&gt; Architecture decisions → /docs/architecture.md

&lt;span class="gu"&gt;## Hard Rules (Never Break)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; No print() debugging in production
&lt;span class="p"&gt;2.&lt;/span&gt; No direct DB queries (use ORM)
&lt;span class="p"&gt;3.&lt;/span&gt; No secrets in code
&lt;span class="p"&gt;4.&lt;/span&gt; Tests pass before PR

For debugging workflows → /docs/debugging.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Tokens consumed: 847&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Reduction: 92%&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Supporting Documentation Structure
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── CLAUDE.md (core rules, 200 lines)
├── docs/
│   ├── api-contracts.md (loaded on-demand)
│   ├── data-models.md
│   ├── debugging.md
│   ├── deployment.md
│   └── architecture.md
└── .claudeignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 100 Sessions Across 5 Projects&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Bloated CLAUDE.md&lt;/th&gt;
&lt;th&gt;Lean CLAUDE.md&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Static tokens/session&lt;/td&gt;
&lt;td&gt;10,847&lt;/td&gt;
&lt;td&gt;847&lt;/td&gt;
&lt;td&gt;92% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg session cost&lt;/td&gt;
&lt;td&gt;$0.19&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;84% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to first response&lt;/td&gt;
&lt;td&gt;8.2s&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;74% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relevant context ratio&lt;/td&gt;
&lt;td&gt;7.4%&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;12× better&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Monthly cost (100 sessions/day, 5 devs):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: $285&lt;/li&gt;
&lt;li&gt;After: $45&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: $240/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Anti-Pattern Detection
&lt;/h4&gt;

&lt;p&gt;Warning signs your CLAUDE.md is too big:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ More than 500 lines&lt;/li&gt;
&lt;li&gt;✗ Contains full API documentation&lt;/li&gt;
&lt;li&gt;✗ Explains every edge case&lt;/li&gt;
&lt;li&gt;✗ Duplicates information from code comments&lt;/li&gt;
&lt;li&gt;✗ Includes troubleshooting for rare errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good signs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Under 200 lines&lt;/li&gt;
&lt;li&gt;✓ Only hard rules and architecture principles&lt;/li&gt;
&lt;li&gt;✓ Points to detailed docs instead of including them&lt;/li&gt;
&lt;li&gt;✓ Every line is referenced in &amp;gt;10% of sessions&lt;/li&gt;
&lt;/ul&gt;
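&lt;p&gt;The line-count thresholds above are easy to check automatically, for example in a pre-commit hook. This is a minimal sketch under that assumption: &lt;code&gt;classify_claude_md&lt;/code&gt; is a hypothetical helper, and the 200 and 500 line limits are taken directly from the two lists above.&lt;/p&gt;

```python
def classify_claude_md(text: str, lean_limit: int = 200, bloat_limit: int = 500) -> str:
    """Classify a CLAUDE.md body by line count.

    "lean"    : under lean_limit lines (the good sign above)
    "bloated" : more than bloat_limit lines (the warning sign above)
    "review"  : anything in between
    """
    line_count = len(text.splitlines())
    if line_count > bloat_limit:
        return "bloated"
    if line_count >= lean_limit:
        return "review"
    return "lean"


def classify_file(path: str = "CLAUDE.md") -> str:
    """Read the file at path and classify it."""
    with open(path, encoding="utf-8") as handle:
        return classify_claude_md(handle.read())
```

&lt;p&gt;Line count is only a proxy for token count, so treat a "review" result as a prompt to look at the file rather than as a hard failure.&lt;/p&gt;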


&lt;h3&gt;
  
  
  3. Plan Mode: Prevent Expensive Re-work
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 20-30% reduction in wasted iterations&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 0 (it's a habit change)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Trivial&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most expensive Claude Code sessions aren't the long ones. They're the ones that go down the wrong path.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Typical unplanned workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Refactor auth to use OAuth2"
Claude: [Starts writing code]
Claude: [Modifies 15 files]
Claude: [Realizes approach won't work with existing sessions]
User: "No, that breaks existing users"
Claude: [Rewrites everything]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tokens wasted: 87,429&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Time wasted: 18 minutes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Cost: ~$0.26 (Sonnet 4.6, at $3 per million input tokens)&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  The Solution: Plan Before Implementation
&lt;/h4&gt;

&lt;p&gt;Press &lt;code&gt;Shift+Tab&lt;/code&gt; to cycle into Plan Mode before complex tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Refactor auth to use OAuth2"
[PLAN MODE]
Claude: "I'll explore the codebase first.

Current state:
- Session management in auth/session.py  
- JWT tokens with 30min expiry
- Redis for session storage
- 3 dependent services: API, worker, scheduler

Proposed approach:
1. Add the OAuth2 provider interface
2. Maintain existing JWT sessions during migration
3. Dual-write to both systems for 1 week
4. Migrate users gradually with a feature flag

Risks:
- Session token format changes break the mobile app
- Redis key structure incompatible

Should I proceed with this approach?"

User: "Adjust: Keep JWT format identical, just change how we issue them"
Claude: [Now implements the RIGHT approach for the first time]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tokens saved: 87,429&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Time saved: 18 minutes&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Real Data: Plan Mode Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Internal Study: 250 Complex Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Direct Implementation&lt;/th&gt;
&lt;th&gt;Plan Mode First&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg iterations to complete&lt;/td&gt;
&lt;td&gt;4.7&lt;/td&gt;
&lt;td&gt;1.8&lt;/td&gt;
&lt;td&gt;62% fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg tokens per task&lt;/td&gt;
&lt;td&gt;124,573&lt;/td&gt;
&lt;td&gt;47,291&lt;/td&gt;
&lt;td&gt;62% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tasks requiring full rewrite&lt;/td&gt;
&lt;td&gt;34%&lt;/td&gt;
&lt;td&gt;3%&lt;/td&gt;
&lt;td&gt;91% fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User satisfaction&lt;/td&gt;
&lt;td&gt;6.2/10&lt;/td&gt;
&lt;td&gt;8.9/10&lt;/td&gt;
&lt;td&gt;44% higher&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  When to Use Plan Mode
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Always use for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-file refactors (&amp;gt;3 files)&lt;/li&gt;
&lt;li&gt;Architecture changes&lt;/li&gt;
&lt;li&gt;Database migrations&lt;/li&gt;
&lt;li&gt;API contract changes&lt;/li&gt;
&lt;li&gt;Anything that could cascade into dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-file bug fixes&lt;/li&gt;
&lt;li&gt;Adding logging&lt;/li&gt;
&lt;li&gt;Updating comments/docs&lt;/li&gt;
&lt;li&gt;Simple formatting changes&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cost Analysis
&lt;/h4&gt;

&lt;p&gt;Average complex task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without planning:&lt;/strong&gt; 124,573 tokens × $3/M = $0.37&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With planning:&lt;/strong&gt; 47,291 tokens × $3/M = $0.14&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings per task:&lt;/strong&gt; $0.23&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 10 complex tasks per day, 5 developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily savings: $11.50&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly savings: ~$250&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus 18 minutes saved per task = &lt;strong&gt;150 hours/month&lt;/strong&gt; of developer time recovered.&lt;/p&gt;


&lt;h2&gt;
  
  
  Part II: Automated Optimizations (Automatic to 1 Hour Setup)
&lt;/h2&gt;

&lt;p&gt;These leverage Claude Code's built-in features or require minimal configuration.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. MCP Tool Search: 85% Context Reduction (Automatic)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 85% reduction in MCP tool context&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 0 (automatic on Sonnet 4+/Opus 4+)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Automatic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Model Context Protocol (MCP) servers are incredibly powerful. They're also context black holes.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem: Tool Definition Explosion
&lt;/h4&gt;

&lt;p&gt;Real example from a developer on Reddit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; /context

Context Usage: 143k/200k tokens (72%)
├─ System prompt: 3.1k tokens (1.5%)
├─ System tools: 12.4k tokens (6.2%)  
├─ MCP tools: 82.0k tokens (41.0%) ← THE PROBLEM
├─ Messages: 8 tokens (0.0%)
└─ Free space: 12k (5.8%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before writing a single prompt: 82,000 tokens consumed by MCP tools.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Breaking it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mcp-omnisearch: 20 tools (~14,114 tokens)&lt;/li&gt;
&lt;li&gt;playwright: 21 tools (~13,647 tokens)
&lt;/li&gt;
&lt;li&gt;mcp-sqlite-tools: 19 tools (~13,349 tokens)&lt;/li&gt;
&lt;li&gt;n8n-workflow-builder: 10 tools (~7,018 tokens)&lt;/li&gt;
&lt;li&gt;(And 7 more servers...)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function name&lt;/li&gt;
&lt;li&gt;Full description
&lt;/li&gt;
&lt;li&gt;Parameter schemas (JSON)&lt;/li&gt;
&lt;li&gt;Example usage&lt;/li&gt;
&lt;li&gt;Type definitions&lt;/li&gt;
&lt;li&gt;Error handling specs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;That is how 82,000 tokens get consumed before you ask anything.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The Solution: MCP Tool Search
&lt;/h4&gt;

&lt;p&gt;Anthropic's Tool Search feature (automatic on Sonnet 4+/Opus 4+) loads tool definitions on-demand instead of upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user sends a request: "Create a GitHub issue for this bug"&lt;/li&gt;
&lt;li&gt;Claude searches the available tools and finds &lt;code&gt;create_github_issue&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Claude loads only that tool's definition&lt;/li&gt;
&lt;li&gt;Claude executes the tool and returns the result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of loading all 50 tools (~72K tokens), Claude loads 1-3 tools (~2K tokens).&lt;/p&gt;
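&lt;p&gt;The search-then-load flow can be illustrated with a toy index. This is a sketch of the general idea only, not Anthropic's implementation: the &lt;code&gt;ToolEntry&lt;/code&gt; type, the keyword-overlap scoring, and the sample tools are all assumptions made for illustration. The point is that short summaries stay resident while full definitions are pulled in only for matching tools.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    name: str
    summary: str     # short description kept in the always-loaded index
    definition: str  # full schema text, loaded only when the tool is selected


def _keywords(text: str) -> set:
    """Lowercase words longer than two characters, with punctuation stripped."""
    cleaned = (word.strip(".,!?:;").lower() for word in text.replace("_", " ").split())
    return {word for word in cleaned if len(word) > 2}


def search_tools(index, request, limit=3):
    """Rank tools by keyword overlap between the request and name + summary."""
    wanted = _keywords(request)
    scored = []
    for tool in index:
        score = len(wanted.intersection(_keywords(tool.name + " " + tool.summary)))
        if score > 0:
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _score, tool in scored[:limit]]


def definitions_to_load(index, request):
    """Only these definitions would be added to context for this request."""
    return [tool.definition for tool in search_tools(index, request)]


# Hypothetical tools; the definition strings stand in for full JSON schemas.
INDEX = [
    ToolEntry("create_github_issue", "open a new issue in a GitHub repository", "FULL SCHEMA: create_github_issue"),
    ToolEntry("run_sqlite_query", "execute SQL against a SQLite database", "FULL SCHEMA: run_sqlite_query"),
    ToolEntry("take_screenshot", "capture the current browser page", "FULL SCHEMA: take_screenshot"),
]

selected = definitions_to_load(INDEX, "Create a GitHub issue for this bug")
```

&lt;p&gt;Here &lt;code&gt;selected&lt;/code&gt; contains a single definition, so the other two schemas never enter the context. A production system would likely use a stronger matcher than keyword overlap.&lt;/p&gt;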

&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Anthropic Engineering Team Study:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Traditional MCP&lt;/th&gt;
&lt;th&gt;Tool Search&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context consumed (50 tools)&lt;/td&gt;
&lt;td&gt;72,000 tokens&lt;/td&gt;
&lt;td&gt;8,700 tokens&lt;/td&gt;
&lt;td&gt;87.9% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context consumed (167 tools)&lt;/td&gt;
&lt;td&gt;191,300 tokens&lt;/td&gt;
&lt;td&gt;8,700 tokens&lt;/td&gt;
&lt;td&gt;95.5% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool selection accuracy&lt;/td&gt;
&lt;td&gt;73%&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;22% better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg response latency&lt;/td&gt;
&lt;td&gt;3.2s&lt;/td&gt;
&lt;td&gt;1.1s&lt;/td&gt;
&lt;td&gt;66% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real user report (Scott Spence):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 20 tools, 14,214 tokens&lt;/li&gt;
&lt;li&gt;After (consolidated): 8 tools, 5,663 tokens
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduction: 60%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tool selection accuracy also improves, because Claude is no longer choosing among 20 similar tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  How to Enable
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;It activates automatically when both conditions hold:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model is Claude Opus 4.x or Claude Sonnet 4.x&lt;/li&gt;
&lt;li&gt;Tool definitions exceed 10% of the context window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No configuration needed.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Secondary Optimization: Consolidate Tools
&lt;/h4&gt;

&lt;p&gt;Even with Tool Search, consolidating related tools helps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_by_title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_by_author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_by_date&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_by_tag&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 16 more search variants&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search({ query, filters: { title?, author?, date?, tag? } })&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From 20 tools to 1 tool with rich parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Additional savings: 8,551 tokens&lt;/strong&gt;&lt;/p&gt;
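&lt;p&gt;On the server side, the consolidated tool can be a single handler with optional filters. The sketch below assumes a hypothetical in-memory &lt;code&gt;Article&lt;/code&gt; collection and exact-match filters. It is not tied to any particular MCP SDK, and is only meant to show how one parameterised function can replace a family of &lt;code&gt;search_by_*&lt;/code&gt; tools.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    author: str
    date: str          # ISO date, e.g. "2026-03-13"
    tags: tuple = ()


def search(articles, query="", title=None, author=None, date=None, tag=None):
    """Single search tool: free-text query plus optional exact-match filters."""
    needle = query.lower()
    matches = []
    for article in articles:
        if needle and needle not in article.title.lower():
            continue
        if title is not None and article.title != title:
            continue
        if author is not None and article.author != author:
            continue
        if date is not None and article.date != date:
            continue
        if tag is not None and tag not in article.tags:
            continue
        matches.append(article)
    return matches


# Hypothetical sample data for illustration.
ARTICLES = [
    Article("Prompt caching basics", "ana", "2026-01-10", ("caching",)),
    Article("MCP tool design", "ben", "2026-02-02", ("mcp", "tools")),
    Article("Caching pitfalls", "ben", "2026-03-01", ("caching",)),
]

by_author = search(ARTICLES, author="ben")
by_query_and_tag = search(ARTICLES, query="caching", tag="caching")
```

&lt;p&gt;Because every filter is optional, the model sees one schema instead of twenty, and combinations such as author plus tag need no extra tool.&lt;/p&gt;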

&lt;h4&gt;
  
  
  Cost Impact
&lt;/h4&gt;

&lt;p&gt;For a developer using 4 MCP servers with 50 total tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monthly token usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 72,000 tokens × 100 sessions × 30 days = 216M tokens&lt;/li&gt;
&lt;li&gt;After: 8,700 tokens × 100 sessions × 30 days = 26.1M tokens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduction: 189.9M tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At $3 per million tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: $648/month&lt;/li&gt;
&lt;li&gt;After: $78/month&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: $570/month per developer&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5. Prompt Caching: 81% Cost Reduction (Automatic)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 81% cost reduction, 79% latency improvement&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 0 (automatic)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Automatic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prompt caching is Claude Code's secret weapon. It's the architectural constraint around which the entire product is built.&lt;/p&gt;
&lt;h4&gt;
  
  
  How It Works
&lt;/h4&gt;

&lt;p&gt;Every Claude Code session re-sends the entire conversation history on every turn:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turn 1:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System prompt (4,000 tokens)
Tool definitions (12,000 tokens)  
CLAUDE.md (800 tokens)
User message (50 tokens)
Total: 16,850 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Turn 2:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System prompt (4,000 tokens)      ← SAME
Tool definitions (12,000 tokens)  ← SAME
CLAUDE.md (800 tokens)            ← SAME  
Turn 1 messages (500 tokens)      ← NEW
User message (50 tokens)          ← NEW
Total: 17,400 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without caching, every one of those tokens (16,850 on turn 1, 17,400 on turn 2, and growing) would be processed fresh on every turn.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Magic: KV Cache Reuse
&lt;/h4&gt;

&lt;p&gt;Anthropic caches the attention calculations (Key-Value tensors) for static content:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turn 1:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Process 16,850 tokens fresh&lt;/li&gt;
&lt;li&gt;Write cache (25% premium): $0.063&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $0.063&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Turn 2:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read 16,850 tokens from cache (90% discount): $0.005&lt;/li&gt;
&lt;li&gt;Process 550 new tokens: $0.002&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $0.007&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Turn 10:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read 16,850 tokens from cache: $0.005
&lt;/li&gt;
&lt;li&gt;Process 50 new tokens: $0.0002&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $0.0052&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
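&lt;p&gt;The per-turn figures above follow from three numbers: the $3 per million input token price used throughout this article, the 25% premium on cache writes, and the 90% discount on cache reads. The sketch below reproduces the arithmetic for the first two turns. The function name &lt;code&gt;turn_cost&lt;/code&gt; is hypothetical, and output tokens are deliberately left out.&lt;/p&gt;

```python
INPUT_PRICE_PER_MTOK = 3.00     # USD per million input tokens, as used in this article
CACHE_WRITE_MULTIPLIER = 1.25   # 25% premium on cache writes
CACHE_READ_MULTIPLIER = 0.10    # 90% discount on cache reads


def turn_cost(cached_read: int = 0, cache_write: int = 0, fresh: int = 0) -> float:
    """Input cost in USD for one turn, split by how each token is billed."""
    per_token = INPUT_PRICE_PER_MTOK / 1_000_000
    return per_token * (
        cached_read * CACHE_READ_MULTIPLIER
        + cache_write * CACHE_WRITE_MULTIPLIER
        + fresh
    )


turn_1 = turn_cost(cache_write=16_850)               # first turn writes the cache
turn_2 = turn_cost(cached_read=16_850, fresh=550)    # prefix read back, 550 new tokens
uncached_turn_2 = turn_cost(fresh=17_400)            # same turn with no cache at all

print(f"turn 1: ${turn_1:.4f}")
print(f"turn 2: ${turn_2:.4f} (vs ${uncached_turn_2:.4f} uncached)")
```

&lt;p&gt;The first turn costs slightly more than an uncached request because of the write premium. Every later turn that reuses the prefix is where the savings accumulate.&lt;/p&gt;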

&lt;h4&gt;
  
  
  Real Performance Data
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's Claude Code Production Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache hit rate: 92%&lt;/li&gt;
&lt;li&gt;Cost reduction vs. no caching: 81%&lt;/li&gt;
&lt;li&gt;Latency reduction (first token): 79%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: 100K token document QA&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Caching&lt;/th&gt;
&lt;th&gt;With Caching&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost per turn&lt;/td&gt;
&lt;td&gt;$0.300&lt;/td&gt;
&lt;td&gt;$0.030&lt;/td&gt;
&lt;td&gt;90% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to first token&lt;/td&gt;
&lt;td&gt;11.5s&lt;/td&gt;
&lt;td&gt;2.4s&lt;/td&gt;
&lt;td&gt;79% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total cost (10 turns)&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$0.65&lt;/td&gt;
&lt;td&gt;78% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example: Long Coding Session&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 100-turn session with compaction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Caching&lt;/th&gt;
&lt;th&gt;With Caching&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total tokens processed&lt;/td&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cached reads&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1,840,000 (92%)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fresh processing&lt;/td&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;160,000&lt;/td&gt;
&lt;td&gt;92% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost (Sonnet 4.5)&lt;/td&gt;
&lt;td&gt;$6.00&lt;/td&gt;
&lt;td&gt;$1.15&lt;/td&gt;
&lt;td&gt;81% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  What Gets Cached
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Automatically cached, in prefix order:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;System prompt (~4K tokens)&lt;/li&gt;
&lt;li&gt;Tool definitions (~12K tokens)
&lt;/li&gt;
&lt;li&gt;CLAUDE.md and project files&lt;/li&gt;
&lt;li&gt;Conversation history (up to the most recent turn)&lt;/li&gt;
&lt;li&gt;Recent assistant responses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cache lifetime:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default: 5 minutes (refreshes on each use)&lt;/li&gt;
&lt;li&gt;Extended (1-hour TTL): Available on Opus 4.5+, Haiku 4.5+, Sonnet 4.5+&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to Not Break Caching
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;DON'T:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ Add timestamps to system prompts&lt;/li&gt;
&lt;li&gt;✗ Switch models mid-session (caches are model-specific)
&lt;/li&gt;
&lt;li&gt;✗ Modify tool definitions during the session&lt;/li&gt;
&lt;li&gt;✗ Reorder tool definitions between turns&lt;/li&gt;
&lt;li&gt;✗ Change CLAUDE.md mid-session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DO:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Keep static content at the top&lt;/li&gt;
&lt;li&gt;✓ Append dynamic content at the end
&lt;/li&gt;
&lt;li&gt;✓ Use the same model throughout the session&lt;/li&gt;
&lt;li&gt;✓ Keep tool definitions stable&lt;/li&gt;
&lt;li&gt;✓ Use long sessions (cache stays warm)&lt;/li&gt;
&lt;/ul&gt;
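&lt;p&gt;The rules above follow from one property: caching only helps while the start of the prompt stays identical between turns. A minimal sketch of that idea, hashing the static prefix to see what invalidates it. The &lt;code&gt;prefix_fingerprint&lt;/code&gt; helper, the tool names, and the prompt text are all hypothetical and are not part of Claude Code.&lt;/p&gt;

```python
import hashlib
import json


def prefix_fingerprint(system_prompt, tool_definitions):
    """Hash the static prefix; any change here means the next turn misses the cache."""
    payload = json.dumps(
        {"system": system_prompt, "tools": tool_definitions},
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tools = [{"name": "read_file"}, {"name": "run_tests"}]
stable = "You are a coding assistant for the billing service."

# Same static content on two turns: identical fingerprint, so the prefix can be reused.
turn_1 = prefix_fingerprint(stable, tools)
turn_2 = prefix_fingerprint(stable, tools)

# A timestamp in the system prompt changes the prefix on every turn.
stamped = prefix_fingerprint(stable + " Now: 2026-03-13T11:52:34Z", tools)

# Reordering tool definitions changes it too.
reordered = prefix_fingerprint(stable, list(reversed(tools)))

print(turn_1 == turn_2)     # True
print(turn_1 == stamped)    # False
print(turn_1 == reordered)  # False
```

&lt;p&gt;If you need the current time in the conversation, append it to the latest user message instead, so the static prefix stays untouched.&lt;/p&gt;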

&lt;h4&gt;
  
  
  Monitoring Your Cache Hit Rate
&lt;/h4&gt;

&lt;p&gt;Look for these patterns in your sessions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast responses after the first turn = cache working&lt;/li&gt;
&lt;li&gt;Low, steady cost per turn after the first = cache working&lt;/li&gt;
&lt;li&gt;Slow first turn (cache write), fast turns afterwards (cache reads) = optimal&lt;/li&gt;
&lt;/ul&gt;
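&lt;p&gt;If you call the Anthropic API directly, you can measure this instead of inferring it: each response's usage block reports &lt;code&gt;input_tokens&lt;/code&gt;, &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; and &lt;code&gt;cache_read_input_tokens&lt;/code&gt;. The sketch below computes a hit rate from those fields. The &lt;code&gt;cache_hit_rate&lt;/code&gt; helper and the sample numbers are made up for illustration.&lt;/p&gt;

```python
def cache_hit_rate(usages):
    """Share of input tokens served from cache across a list of usage dicts."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    written = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    uncached = sum(u.get("input_tokens", 0) for u in usages)
    total = read + written + uncached
    return read / total if total else 0.0


# Made-up usage blocks for a three-turn session.
session = [
    {"input_tokens": 0, "cache_creation_input_tokens": 16_850, "cache_read_input_tokens": 0},
    {"input_tokens": 550, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 16_850},
    {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 16_850},
]

print(f"Cache hit rate: {cache_hit_rate(session):.1%}")
```

&lt;p&gt;The rate starts at zero on the write turn and climbs as the session gets longer. A rate that drops mid-session usually means something in the static prefix changed.&lt;/p&gt;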




&lt;h3&gt;
  
  
  6. Context Snapshots: Session State Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 35-50% reduction in context waste&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 15 minutes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Long sessions accumulate cruft. Snapshots let you preserve what matters and discard what doesn't.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Typical 50-turn session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turns 1-10: Implemented feature A (relevant)
Turns 11-20: Debugged unrelated CSS issue (irrelevant now)
Turns 21-30: Fixed bug in feature A (relevant)
Turns 31-40: Explored API docs (no longer needed)
Turns 41-50: Refined feature A (relevant)

Context consumed: 147,293 tokens
Relevant to current work: 47,291 tokens (32%)
Dead weight: 100,002 tokens (68%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution
&lt;/h4&gt;

&lt;p&gt;Create lightweight snapshot files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;task_context.md:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Current Task: Auth Session Refactor&lt;/span&gt;

&lt;span class="gu"&gt;## Goal&lt;/span&gt;
Move from JWT-only to OAuth2 with backward compatibility

&lt;span class="gu"&gt;## Files Modified&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; auth/session.py (JWT logic)
&lt;span class="p"&gt;-&lt;/span&gt; auth/oauth.py (new OAuth handler)  
&lt;span class="p"&gt;-&lt;/span&gt; auth/middleware.py (token validation)

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Dual-write to both systems for 1 week
&lt;span class="p"&gt;-&lt;/span&gt; Feature flag: &lt;span class="sb"&gt;`oauth_migration_enabled`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; JWT format unchanged (prevents mobile app breakage)

&lt;span class="gu"&gt;## Remaining Work&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Add OAuth provider configuration UI
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Write migration script for existing users
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Update API documentation

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Must support 30min session timeout
&lt;span class="p"&gt;-&lt;/span&gt; Redis key structure must remain compatible  
&lt;span class="p"&gt;-&lt;/span&gt; Cannot break mobile app (v2.3.1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Usage Pattern
&lt;/h4&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Continue working on the auth refactor we discussed 30 turns ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;@task_context.md
Continue with OAuth provider configuration UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tokens sent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long session history: 147,293 tokens&lt;/li&gt;
&lt;li&gt;Snapshot file: 847 tokens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduction: 99.4%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Advanced: Automated Snapshot Creation
&lt;/h4&gt;

&lt;p&gt;Hook-based approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
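&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// .claude/hooks/context-snapshot.js&lt;/span&gt;
&lt;span class="c1"&gt;// Illustrative sketch: onCompaction, the extract* helpers and formatSnapshot are&lt;/span&gt;
&lt;span class="c1"&gt;// placeholders to implement yourself. Claude Code runs hooks as commands registered&lt;/span&gt;
&lt;span class="c1"&gt;// in its settings (the PreCompact event fires before compaction), so wire this up there.&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;writeFile&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:fs/promises&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;onCompaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Capture the state worth keeping before auto-compaction discards detail&lt;/span&gt;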
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snapshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extractTaskSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extractModifiedFiles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;decisions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extractKeyDecisions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extractRemainingWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;task_context.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;formatSnapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;💾 Snapshot saved before compaction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the session approaches the auto-compaction threshold (~167K tokens), the hook saves the critical state before anything is summarized away.&lt;/p&gt;
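&lt;p&gt;The hook above calls a &lt;code&gt;formatSnapshot&lt;/code&gt; helper that is not shown. A Python sketch of that step, rendering the same sections as the &lt;code&gt;task_context.md&lt;/code&gt; example, might look like this. The function name &lt;code&gt;format_snapshot&lt;/code&gt; and its parameters are assumptions, not an existing API.&lt;/p&gt;

```python
def format_snapshot(task, goal, files, decisions, remaining, constraints):
    """Render a task_context.md-style snapshot as Markdown text."""
    def bullets(items, prefix="- "):
        return "\n".join(f"{prefix}{item}" for item in items)

    sections = [
        f"# Current Task: {task}",
        f"## Goal\n{goal}",
        f"## Files Modified\n{bullets(files)}",
        f"## Key Decisions\n{bullets(decisions)}",
        f"## Remaining Work\n{bullets(remaining, prefix='- [ ] ')}",
        f"## Constraints\n{bullets(constraints)}",
    ]
    return "\n\n".join(sections) + "\n"


snapshot = format_snapshot(
    task="Auth Session Refactor",
    goal="Move from JWT-only to OAuth2 with backward compatibility",
    files=["auth/session.py (JWT logic)", "auth/oauth.py (new OAuth handler)"],
    decisions=["Dual-write to both systems for 1 week"],
    remaining=["Add OAuth provider configuration UI"],
    constraints=["Must support 30min session timeout"],
)
print(snapshot)
```

&lt;p&gt;Writing the returned string to &lt;code&gt;task_context.md&lt;/code&gt; gives you a file you can reference in the next prompt. Deciding what goes into each list is the hard part and is left to you or to a summarization step.&lt;/p&gt;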

&lt;h4&gt;
  
  
  Real Results
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 50 Long Sessions (&amp;gt;40 turns each)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Snapshots&lt;/th&gt;
&lt;th&gt;With Snapshots&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context per turn (avg)&lt;/td&gt;
&lt;td&gt;147,293&lt;/td&gt;
&lt;td&gt;51,847&lt;/td&gt;
&lt;td&gt;65% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Info loss at compaction&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Qualitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session continuity&lt;/td&gt;
&lt;td&gt;6.1/10&lt;/td&gt;
&lt;td&gt;9.2/10&lt;/td&gt;
&lt;td&gt;51% better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per long session&lt;/td&gt;
&lt;td&gt;$13.24&lt;/td&gt;
&lt;td&gt;$4.67&lt;/td&gt;
&lt;td&gt;65% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Part III: Intermediate Techniques (1-4 Hours Setup)
&lt;/h2&gt;

&lt;p&gt;These require engineering work but deliver substantial improvements.&lt;/p&gt;




&lt;h3&gt;
  
  
  7. Context Indexing + RAG: 40-90% Token Reduction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 40-60% reduction (standard), 90%+ for large codebases&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 2-4 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your codebase exceeds Claude's context window, you need retrieval instead of brute-force inclusion.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Large codebase reality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total files: 2,847
Total tokens: 3,400,000
Context window: 200,000
Fit in context: 5.9%

Traditional approach: 
"Please figure out which 5.9% to load" ← Claude can't do this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution: Semantic Search + Indexing
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── src/ (2,847 files, 3.4M tokens)
├── index/
│   ├── code_embeddings.db (vector search)
│   ├── file_metadata.json (quick lookup)  
│   └── dependency_graph.json (relationships)
└── .claude/
    └── retrieval_config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;file_metadata.json:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth/session.py"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"functions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"create_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"validate_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="s2"&gt;"refresh_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"revoke_session"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"redis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"jwt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"auth/models.py"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"auth/models.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"shared/crypto.py"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"size_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1247&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"last_modified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-10T14:23:11Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Retrieval Workflow
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;User prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Fix the session refresh bug where tokens expire immediately"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Behind the scenes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract keywords: &lt;code&gt;["session", "refresh", "token", "expire"]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Search code_embeddings.db → Top 5 files:

&lt;ul&gt;
&lt;li&gt;auth/session.py (similarity: 0.94)&lt;/li&gt;
&lt;li&gt;auth/token.py (similarity: 0.89)&lt;/li&gt;
&lt;li&gt;auth/middleware.py (similarity: 0.82)
&lt;/li&gt;
&lt;li&gt;redis/session_store.py (similarity: 0.78)&lt;/li&gt;
&lt;li&gt;tests/auth/test_session.py (similarity: 0.71)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Load dependency_graph → Find related: auth/models.py&lt;/li&gt;
&lt;li&gt;Total files loaded: 6 files (7,429 tokens)&lt;/li&gt;
&lt;/ol&gt;
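&lt;p&gt;Steps 2 to 4 above can be sketched as follows: take the ranked search hits, add their one-hop dependencies from the graph, and stop when a token budget is reached. The graph shape, the &lt;code&gt;expand_with_dependencies&lt;/code&gt; name and the token counts (apart from 1,247 for &lt;code&gt;auth/session.py&lt;/code&gt;) are assumptions for illustration.&lt;/p&gt;

```python
def expand_with_dependencies(ranked_hits, dependency_graph, token_sizes, budget):
    """Add one-hop dependencies of the search hits, staying within a token budget."""
    selected, used = [], 0

    def try_add(path):
        nonlocal used
        size = token_sizes.get(path)
        # Skip duplicates, unknown paths (e.g. external packages) and budget overruns.
        if path in selected or size is None or used + size > budget:
            return
        selected.append(path)
        used += size

    # Search hits first, in ranked order, so the best matches claim the budget.
    for path in ranked_hits:
        try_add(path)
    # Then project files that the selected hits depend on.
    for path in list(selected):
        for dep in dependency_graph.get(path, []):
            try_add(dep)
    return selected, used


hits = ["auth/session.py", "auth/token.py", "auth/middleware.py"]
graph = {"auth/session.py": ["redis", "jwt", "auth/models.py"]}
sizes = {  # illustrative token counts
    "auth/session.py": 1247,
    "auth/token.py": 900,
    "auth/middleware.py": 1100,
    "auth/models.py": 800,
}

files, tokens = expand_with_dependencies(hits, graph, sizes, budget=8000)
print(files, tokens)
```

&lt;p&gt;Ranked hits are added before dependencies on purpose: when the budget is tight, a direct match is usually worth more than a file that is only imported by one.&lt;/p&gt;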

&lt;p&gt;&lt;strong&gt;Context sent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instead of: @entire_codebase (3.4M tokens)
Send: 6 relevant files (7,429 tokens)
Reduction: 99.8%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Implementation: Minimum Viable RAG
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
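&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# index_builder.py&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The two extract_* helpers are rough regex stand-ins, not real parsers&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_functions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(?:def|function)\s+(\w+)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_imports&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^\s*(?:from|import)\s+([\w.]+)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MULTILINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;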

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_codebase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build semantic index of codebase&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.tsx&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
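                &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;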
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

                &lt;span class="c1"&gt;# Extract metadata
&lt;/span&gt;                &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;functions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_functions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;imports&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_imports&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="c1"&gt;# Create embedding (all-MiniLM-L6-v2 truncates long inputs, so chunk large files in practice)
&lt;/span&gt;                &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find k most relevant files&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Simple cosine similarity (use FAISS for production)
&lt;/span&gt;    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

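    &lt;span class="c1"&gt;# Return top k; sort on the score only so tied scores never compare metadata dicts&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;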
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
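&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;index_builder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;index_codebase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;

&lt;span class="c1"&gt;# Build once and persist (pickle stands in for a save_index helper)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;index_codebase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./src&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./index/code_embeddings.db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query many times&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session refresh token expiry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;files_to_load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Send to Claude&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files_to_load&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;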
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Anthropic Research: Contextual Retrieval Study&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retrieval Strategy&lt;/th&gt;
&lt;th&gt;Top-20 Retrieval Failures vs. Baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic RAG&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Contextual Embeddings&lt;/td&gt;
&lt;td&gt;-35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Contextual Embeddings + Contextual BM25&lt;/td&gt;
&lt;td&gt;-49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Contextual Embeddings + Contextual BM25 + Reranking&lt;/td&gt;
&lt;td&gt;-67%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Production Example: 500K Token Codebase&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Load Everything&lt;/th&gt;
&lt;th&gt;Indexed RAG&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per query&lt;/td&gt;
&lt;td&gt;500,000&lt;/td&gt;
&lt;td&gt;12,000&lt;/td&gt;
&lt;td&gt;97.6% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per query&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;$0.036&lt;/td&gt;
&lt;td&gt;97.6% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response time&lt;/td&gt;
&lt;td&gt;Exceeds limit&lt;/td&gt;
&lt;td&gt;2.3s&lt;/td&gt;
&lt;td&gt;Works vs fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;N/A (too large)&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;td&gt;Enables use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  When to Use RAG
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Use RAG when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Codebase &amp;gt;50K lines&lt;/li&gt;
&lt;li&gt;✓ Queries are specific ("fix X in file Y")&lt;/li&gt;
&lt;li&gt;✓ You need to scale beyond the context window&lt;/li&gt;
&lt;li&gt;✓ Cost per query matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip RAG when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ Entire codebase &amp;lt;200K tokens (use prompt caching instead)&lt;/li&gt;
&lt;li&gt;✗ Queries are broad ("refactor entire architecture")&lt;/li&gt;
&lt;li&gt;✗ You need to see relationships across the entire codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Anthropic guidance:&lt;/strong&gt; For knowledge bases under 200K tokens (~500 pages), skip RAG, include everything in the prompt, and rely on prompt caching, which can cut the cost of resending that context by up to 90%.&lt;/p&gt;
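&lt;p&gt;To see which side of the 200K-token line a project falls on, a rough estimate is enough. The sketch below assumes about four characters per token, which is a common rule of thumb rather than a real tokenizer. The function names and the extension list are illustrative.&lt;/p&gt;

```python
import os

CHARS_PER_TOKEN = 4          # rough rule of thumb, not a real tokenizer
CACHING_THRESHOLD = 200_000  # tokens, from the guidance above
SOURCE_EXTENSIONS = (".py", ".js", ".ts", ".tsx")


def estimate_tokens(source_dir):
    """Approximate token count of all source files under source_dir."""
    total_chars = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith(SOURCE_EXTENSIONS):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def recommend_strategy(token_estimate):
    """Apply the 200K-token rule of thumb."""
    if token_estimate > CACHING_THRESHOLD:
        return "rag"
    return "prompt caching"


if __name__ == "__main__":
    tokens = estimate_tokens("./src")
    print(f"~{tokens:,} tokens: use {recommend_strategy(tokens)}")
```

&lt;p&gt;Treat the result as an order-of-magnitude check. If the estimate lands near the threshold, count tokens properly before committing to either approach.&lt;/p&gt;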




&lt;h3&gt;
  
  
  8. Task Decomposition: 45-60% Fewer Tokens
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 45-60% reduction via cognitive chunking&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 0 (prompt discipline)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Easy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Large, vague tasks force Claude to load huge contexts. Decomposition keeps contexts tight.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Anti-Pattern
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Improve the application"

Claude's internal reasoning:
- What does "improve" mean?
- Which part of the application?
- Performance? UX? Security? Code quality?
- Load the entire codebase to understand the scope
- Ask 5 clarifying questions
- Wait for answers
- Finally start work

Tokens wasted: 287,429
Turns wasted: 8
Time wasted: 23 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  The Pattern
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Task 1: Extract magic numbers to constants in auth/session.py"
Claude: [Loads 1 file, makes changes, done]
Tokens: 3,847

User: "Task 2: Add error handling for Redis connection failures in session store"  
Claude: [Loads 2 files, implements, done]
Tokens: 5,291

User: "Task 3: Write integration tests for session refresh flow"
Claude: [Loads test framework + 3 files, done]
Tokens: 8,429

Total tokens: 17,567
Total time: 12 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Decomposition Framework
&lt;/h4&gt;

&lt;p&gt;Break tasks into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1: Bounded (Single File)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Add logging to function X"&lt;/li&gt;
&lt;li&gt;"Fix typo in README"&lt;/li&gt;
&lt;li&gt;"Extract constant from line 47"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Level 2: Local (2-5 Related Files)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Add error handling to auth flow"&lt;/li&gt;
&lt;li&gt;"Update API contract for endpoint Y"
&lt;/li&gt;
&lt;li&gt;"Refactor database query in service Z"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Level 3: Cross-Cutting (5-15 Files)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Implement feature flag for OAuth migration"&lt;/li&gt;
&lt;li&gt;"Add caching layer to API endpoints"&lt;/li&gt;
&lt;li&gt;"Update error responses across all controllers"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Level 4: Architectural (&amp;gt;15 Files)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These need Plan Mode + Decomposition:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Main: "Migrate from REST to GraphQL"

  Sub-tasks:
  1. Set up GraphQL schema
  2. Implement resolvers for the User entity  
  3. Implement resolvers for the Posts entity
  4. Add authentication middleware
  5. Update frontend queries
  6. Deprecate REST endpoints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
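&lt;p&gt;The four levels above can be sketched as a small classifier keyed on how many files a task touches. This is an illustration only: &lt;code&gt;Scope&lt;/code&gt;, &lt;code&gt;classify_scope&lt;/code&gt; and &lt;code&gt;needs_plan_first&lt;/code&gt; are hypothetical names, and because Levels 2 and 3 both list 5 files, the sketch assumes exactly 5 files still counts as Local.&lt;/p&gt;

```python
from enum import Enum


class Scope(Enum):
    BOUNDED = "Level 1: Bounded"
    LOCAL = "Level 2: Local"
    CROSS_CUTTING = "Level 3: Cross-Cutting"
    ARCHITECTURAL = "Level 4: Architectural"


def classify_scope(file_count: int) -> Scope:
    """Map the number of files a task touches to a decomposition level."""
    if 1 > file_count:
        raise ValueError("a task must touch at least one file")
    if file_count == 1:
        return Scope.BOUNDED
    if 5 >= file_count:
        return Scope.LOCAL  # assumption: exactly 5 files counts as Local
    if 15 >= file_count:
        return Scope.CROSS_CUTTING
    return Scope.ARCHITECTURAL


def needs_plan_first(file_count: int) -> bool:
    """Level 4 work gets planned and split into sub-tasks before any edit."""
    return classify_scope(file_count) is Scope.ARCHITECTURAL
```

&lt;p&gt;A quick estimate of the file count before prompting is usually enough to tell whether a request should be sent as-is or split first.&lt;/p&gt;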

&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 200 Tasks Across 10 Projects&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Scope&lt;/th&gt;
&lt;th&gt;Tokens (Vague)&lt;/th&gt;
&lt;th&gt;Tokens (Decomposed)&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single file&lt;/td&gt;
&lt;td&gt;23,847&lt;/td&gt;
&lt;td&gt;3,291&lt;/td&gt;
&lt;td&gt;86%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local (2-5 files)&lt;/td&gt;
&lt;td&gt;67,429&lt;/td&gt;
&lt;td&gt;18,847&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-cutting&lt;/td&gt;
&lt;td&gt;187,291&lt;/td&gt;
&lt;td&gt;74,429&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architectural&lt;/td&gt;
&lt;td&gt;547,293&lt;/td&gt;
&lt;td&gt;243,847&lt;/td&gt;
&lt;td&gt;55%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Average across all tasks: 58% reduction&lt;/strong&gt;&lt;/p&gt;
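&lt;p&gt;To see how the per-scope percentages relate to a single average, the arithmetic can be redone from the four table rows. This sketch assumes each scope is equally frequent, which the table does not state, so treat the two averages as a sanity check and not as the study's own calculation.&lt;/p&gt;

```python
# Token counts copied from the table above: (vague, decomposed)
ROWS = {
    "Single file": (23_847, 3_291),
    "Local (2-5 files)": (67_429, 18_847),
    "Cross-cutting": (187_291, 74_429),
    "Architectural": (547_293, 243_847),
}


def reduction(vague: int, decomposed: int) -> float:
    """Fraction of tokens saved by decomposing the task."""
    return 1 - decomposed / vague


per_scope = {name: reduction(v, d) for name, (v, d) in ROWS.items()}

# Token-weighted figure: total tokens saved over total tokens of the vague prompts.
total_vague = sum(v for v, _ in ROWS.values())
total_decomposed = sum(d for _, d in ROWS.values())
weighted = reduction(total_vague, total_decomposed)

# Unweighted mean of the four per-row percentages.
unweighted = sum(per_scope.values()) / len(per_scope)

for name, value in per_scope.items():
    print(f"{name}: {value:.0%}")
print(f"token-weighted: {weighted:.1%}, unweighted mean: {unweighted:.1%}")
```

&lt;p&gt;The token-weighted figure lands close to the quoted average because the large architectural tasks dominate the totals; the unweighted mean of the row percentages is noticeably higher.&lt;/p&gt;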
&lt;h4&gt;
  
  
  Practical Example
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Bad:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Our authentication is insecure, please fix it"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Task 1: Upgrade bcrypt rounds from 10 to 12 in auth/crypto.py
Task 2: Add rate limiting to login endpoint (5 attempts per 15min)
Task 3: Implement CSRF tokens for session creation
Task 4: Add security headers to auth responses"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear scope&lt;/li&gt;
&lt;li&gt;Single concern
&lt;/li&gt;
&lt;li&gt;Testable outcome&lt;/li&gt;
&lt;li&gt;Minimal context needed&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  9. Hooks and Guardrails: Prevent Token Waste
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 15-25% reduction via prevention&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 1-2 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stop Claude before it burns tokens going down forbidden paths.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Repeated violations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session 1: Claude modifies the migration file
You: "Never touch migrations!"

Session 2: Claude modifies the migration file  
You: "I told you never to touch migrations!"

Session 3: Claude modifies the migration file
You: [Frustrated]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each violation costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2-4 turns to explain why it's wrong&lt;/li&gt;
&lt;li&gt;Reverting changes
&lt;/li&gt;
&lt;li&gt;Re-implementing correctly&lt;/li&gt;
&lt;li&gt;15,000-30,000 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Solution: Preprocessor Hooks
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// .claude/hooks/pre-edit.js&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;beforeEdit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Prevent migration modifications&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;migrations/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🚫 Migration files are immutable.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Create a NEW migration instead:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;`python manage.py makemigrations`&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Prevent .env modifications&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.env&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🚫 Never commit environment files.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Update .env.example instead.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Prevent console.log in production code&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;console.log&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; 
      &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🚫 Use structured logging:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;import { logger } from "./logger";&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logger.info("message", { data });&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Prevent direct DB access&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/db&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;query|db&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;exec/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; 
      &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;repositories/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🚫 Use repository pattern:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;await userRepository.find({ id })&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;  
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Allow edit&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Violations caught before code is written&lt;/li&gt;
&lt;li&gt;Clear guidance provided&lt;/li&gt;
&lt;li&gt;No tokens wasted on wrong implementations&lt;/li&gt;
&lt;/ul&gt;
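&lt;p&gt;The same rules can also be written as a pure function, which makes them easy to unit-test before they are wired into any hook runner. This is a minimal sketch: &lt;code&gt;check_edit&lt;/code&gt; and the list of test directory names are hypothetical. Wiring is not shown. As far as I understand the Claude Code hooks documentation, a &lt;code&gt;PreToolUse&lt;/code&gt; command hook receives the tool call as JSON on stdin and can block it by exiting with code 2 and writing the reason to stderr; check that against the current docs before relying on it.&lt;/p&gt;

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Direct database calls that should only appear inside repositories/
DB_CALL = re.compile(r"\bdb\.(query|exec)\b")

# Assumption: directory names that mark test code in this project
TEST_DIRS = ("test", "tests", "__tests__")


def check_edit(path: str, new_text: str) -> Optional[str]:
    """Return a denial message if the edit breaks a rule, otherwise None."""
    posix = PurePosixPath(path)
    parts, name = posix.parts, posix.name

    if "migrations" in parts:
        return "Migration files are immutable. Create a new migration instead."
    if name.endswith(".env"):
        return "Do not edit environment files directly. Update .env.example instead."

    # Match whole path segments so that e.g. 'latest/' is not mistaken for tests.
    is_test = any(p in TEST_DIRS for p in parts) or ".test." in name
    if "console.log" in new_text and not is_test:
        return "Use structured logging instead of console.log."
    if DB_CALL.search(new_text) and "repositories" not in parts:
        return "Use the repository pattern instead of direct db calls."
    return None
```

&lt;p&gt;Checking whole path segments, instead of substrings, avoids false matches such as a &lt;code&gt;latest/&lt;/code&gt; directory being treated as test code.&lt;/p&gt;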

&lt;h4&gt;
  
  
  Advanced: Content-Aware Validation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;beforeEdit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Require tests for new functions&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;export function&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; 
      &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;functionName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractFunctionName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`tests/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.test.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fileExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testFile&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`🚫 New function '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' needs tests.\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s2"&gt;`Create: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;testFile&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Require type hints (Python)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.py&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; 
      &lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/def &lt;/span&gt;&lt;span class="se"&gt;\w&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\([^&lt;/span&gt;&lt;span class="sr"&gt;)&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\)(?!&lt;/span&gt;&lt;span class="sr"&gt;.*-&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🚫 All functions must have type hints:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;def process(data: dict) -&amp;gt; Result:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 6 Months, 50 Developers&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Guardrails&lt;/th&gt;
&lt;th&gt;With Guardrails&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Policy violations&lt;/td&gt;
&lt;td&gt;847&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;97% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg tokens wasted per violation&lt;/td&gt;
&lt;td&gt;24,291&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;100% savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total tokens saved&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;20M+&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer frustration&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Qualitative&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost impact (team of 50):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token waste from violations: 20M tokens&lt;/li&gt;
&lt;li&gt;At $3/M tokens: &lt;strong&gt;about $60 saved in direct token cost over 6 months&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Plus developer time saved&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  10. Model Tiering: 40-60% Cost Reduction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 40-60% cost reduction via right-sizing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 30 minutes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Easy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not every task needs Opus, and a sizeable share don't even need Sonnet.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Anti-Pattern
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model opus
[Uses Opus for everything all day]

Tasks today:
- Format JSON response (Haiku: $0.0001, Opus: $0.0050)
- Write docstring (Haiku: $0.0002, Opus: $0.0075)
- Fix typo (Haiku: $0.0001, Opus: $0.0030)
- Complex architectural refactor (Opus: $0.8450) ← Correct
- Add console.log (Haiku: $0.0001, Opus: $0.0045)

Total cost: $0.8650
Optimal cost: $0.8455
Waste: $0.0195
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Doesn't look like much? For 20 sessions/day, 5 developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily waste: $1.95
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly waste: about $41 (21 working days)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now extrapolate to 100 developers...&lt;/p&gt;
&lt;h4&gt;
  
  
  The Pattern: Task-Based Model Selection
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// .claude/model-selector.js&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;selectModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskComplexity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyzeComplexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Haiku: Simple, bounded tasks&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;format&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;simple-fix&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskComplexity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Sonnet: Standard coding tasks  &lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;feature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;refactor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bug-fix&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskComplexity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Opus: Complex architecture&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;system-design&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;taskComplexity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
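&lt;p&gt;One thing to watch in the selector above: each branch ORs the task type with the complexity score, so an &lt;code&gt;architecture&lt;/code&gt; task that happens to score below 3 is routed to Haiku. A possible variant, sketched here in Python, lets a known task type win and uses the score only as a fallback. The model IDs are copied from the snippet above and should be treated as placeholders; &lt;code&gt;select_model&lt;/code&gt; and &lt;code&gt;MODEL_BY_TYPE&lt;/code&gt; are hypothetical names, and the complexity score is passed in because how it is computed is not defined here.&lt;/p&gt;

```python
# Model IDs copied from the selector above; treat them as placeholders.
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-6"

MODEL_BY_TYPE = {
    "format": HAIKU,
    "docs": HAIKU,
    "simple-fix": HAIKU,
    "feature": SONNET,
    "refactor": SONNET,
    "bug-fix": SONNET,
    "architecture": OPUS,
    "system-design": OPUS,
}


def select_model(task_type: str, complexity: int) -> str:
    """Pick a model: a known task type wins, otherwise fall back to the score."""
    if task_type in MODEL_BY_TYPE:
        return MODEL_BY_TYPE[task_type]
    if 3 > complexity:
        return HAIKU
    if 7 > complexity:
        return SONNET
    return OPUS
```

&lt;p&gt;Keeping the type-to-model mapping in a table also makes it simple to review and change as pricing or model names move.&lt;/p&gt;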

&lt;h4&gt;
  
  
  Automatic Tiering Examples
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Haiku (25-35% of tasks):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Formatting code&lt;/li&gt;
&lt;li&gt;Writing documentation&lt;/li&gt;
&lt;li&gt;Simple refactors (rename variable, extract constant)&lt;/li&gt;
&lt;li&gt;Adding logging/comments&lt;/li&gt;
&lt;li&gt;Fixing obvious typos&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $0.25/$1.25 per M tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sonnet (55-65% of tasks):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing features&lt;/li&gt;
&lt;li&gt;Bug fixes&lt;/li&gt;
&lt;li&gt;Unit tests&lt;/li&gt;
&lt;li&gt;API integrations&lt;/li&gt;
&lt;li&gt;Database queries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $3/$15 per M tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Opus (5-10% of tasks):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture decisions&lt;/li&gt;
&lt;li&gt;Complex refactors
&lt;/li&gt;
&lt;li&gt;System design&lt;/li&gt;
&lt;li&gt;Performance optimization&lt;/li&gt;
&lt;li&gt;Security reviews&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost: $15/$75 per M tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Hybrid: OpusPlan Alias
&lt;/h4&gt;

&lt;p&gt;Best of both worlds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/model opusplan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;Opus&lt;/strong&gt; for Plan Mode (architecture/reasoning)&lt;/li&gt;
&lt;li&gt;Switches to &lt;strong&gt;Sonnet&lt;/strong&gt; for implementation&lt;/li&gt;
&lt;li&gt;Get Opus-quality planning, Sonnet-priced execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example task:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Refactor auth system to OAuth2"


- Analyze current architecture
- Identify dependencies  
- Propose migration strategy
- Create an implementation plan


- Write OAuth provider
- Update middleware
- Migrate session logic
- Write tests

Total: $0.57
vs Opus-only: $1.23
Savings: 54%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 1,000 Tasks, Optimal Model Selection&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Distribution&lt;/th&gt;
&lt;th&gt;Tasks&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Haiku-appropriate&lt;/td&gt;
&lt;td&gt;280&lt;/td&gt;
&lt;td&gt;42M&lt;/td&gt;
&lt;td&gt;$18.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet-appropriate&lt;/td&gt;
&lt;td&gt;650&lt;/td&gt;
&lt;td&gt;178M&lt;/td&gt;
&lt;td&gt;$534.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus-appropriate&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;23M&lt;/td&gt;
&lt;td&gt;$345.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (Optimized)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;243M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$897.90&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Same tasks, all on Opus:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total: $3,645.00&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Waste: $2,747.10 (75%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Same tasks, all on Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total: $729.00&lt;/li&gt;
&lt;li&gt;Quality degradation on complex tasks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Suboptimal: Works but misses nuance&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
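&lt;p&gt;The totals above can be reproduced from the table rows. The sketch below derives a blended price per million tokens from each row (cost divided by tokens) and re-prices the full workload on a single model; the blended rates are an inference from the table, not published prices.&lt;/p&gt;

```python
# (tasks, tokens in millions, cost in USD) copied from the table above
TIERS = {
    "haiku": (280, 42, 18.90),
    "sonnet": (650, 178, 534.00),
    "opus": (70, 23, 345.00),
}

total_tokens_m = sum(tokens for _, tokens, _ in TIERS.values())
optimized = sum(cost for _, _, cost in TIERS.values())

# Blended USD per million tokens implied by each row.
rate = {name: cost / tokens for name, (_, tokens, cost) in TIERS.items()}

# Re-price the whole workload on a single model.
all_opus = total_tokens_m * rate["opus"]
all_sonnet = total_tokens_m * rate["sonnet"]
waste = all_opus - optimized

print(f"optimized: ${optimized:,.2f}")
print(f"all Opus: ${all_opus:,.2f} (waste {waste / all_opus:.0%})")
print(f"all Sonnet: ${all_sonnet:,.2f}")
```

&lt;p&gt;Note that running everything on Sonnet comes out cheaper in dollars than the tiered mix, so the argument for sending the 70 hardest tasks to Opus rests on output quality, not on cost.&lt;/p&gt;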




&lt;h2&gt;
  
  
  Part IV: Advanced Architectures (4+ Hours Setup)
&lt;/h2&gt;

&lt;p&gt;These are production-grade optimizations for teams serious about scale.&lt;/p&gt;




&lt;h3&gt;
  
  
  11. Multi-Agent Architecture: 50-70% Context Reduction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 50-70% reduction via domain isolation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 8-16 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Advanced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of one agent seeing everything, use specialized agents that see only their domain.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem: Monolithic Context
&lt;/h4&gt;

&lt;p&gt;Single-agent approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Debug the API endpoint performance issue"

Claude loads:
- Frontend code (React, 847 files)
- Backend code (FastAPI, 423 files)  
- Database schemas (127 files)
- Infrastructure configs (89 files)
- Test suites (1,247 files)
- Documentation (347 files)

Total: 3,080 files, 2.4M tokens
Relevant: ~12 files, 18K tokens
Efficiency: 0.75%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution: Agent Specialization
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestrator
    ↓
    ├─→ Search Agent (finds relevant code)
    ├─→ Analysis Agent (identifies issue)  
    ├─→ Code Agent (implements fix)
    └─→ Test Agent (validates solution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Each agent sees only its domain:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search Agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find relevant files using semantic search&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vector_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context: 5K tokens (index metadata only)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis Agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;profiling_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analysis_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze performance bottleneck&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;root_cause&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deep_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;root_cause&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context: 25K tokens (only search results + metrics)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;code_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Implement the fix&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_fix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context: 18K tokens (only affected files)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test Agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modified_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_suite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate the fix&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Context: 15K tokens (only relevant tests)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total context across all agents: 63K tokens&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
vs monolithic: 2.4M tokens&lt;br&gt;
&lt;strong&gt;Reduction: 97.4%&lt;/strong&gt;&lt;/p&gt;
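&lt;p&gt;The totals above can be re-derived from the per-agent figures. The snippet below is only a quick arithmetic check; the dictionary and function names are illustrative, and the token counts are the ones quoted in this section.&lt;/p&gt;

```python
# Per-agent context sizes quoted above, in tokens.
AGENT_CONTEXT = {
    "search": 5_000,
    "analysis": 25_000,
    "code": 18_000,
    "test": 15_000,
}
MONOLITHIC_CONTEXT = 2_400_000


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


total = sum(AGENT_CONTEXT.values())
print(f"Total across agents: {total:,} tokens")
print(f"Reduction: {reduction_pct(MONOLITHIC_CONTEXT, total):.1f}%")
```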
&lt;h4&gt;
  
  
  Real Implementation
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# orchestrator.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentOrchestrator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SearchAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnalysisAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TestAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 1: Find relevant code
&lt;/span&gt;        &lt;span class="n"&gt;relevant_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Analyze issue  
&lt;/span&gt;        &lt;span class="n"&gt;root_cause&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;diagnose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;relevant_files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_request&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: Generate fix
&lt;/span&gt;        &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;implement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;relevant_files&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 4: Validate
&lt;/span&gt;        &lt;span class="n"&gt;test_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;test_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Retry with insights
&lt;/span&gt;            &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;implement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;relevant_files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;previous_attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;test_failures&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test_results&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
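&lt;p&gt;The orchestrator above retries once and returns the second fix without validating it again. One possible variation is a bounded retry loop that re-runs validation after every attempt. The sketch below assumes agents with the same &lt;code&gt;implement&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; methods; &lt;code&gt;ValidationResult&lt;/code&gt;, the two stub agents, and &lt;code&gt;fix_with_retries&lt;/code&gt; are hypothetical names used only to make the example runnable.&lt;/p&gt;

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    passed: bool
    failures: list = field(default_factory=list)


class StubCodeAgent:
    """Hypothetical stand-in: improves its fix once it has seen a failure report."""

    async def implement(self, root_cause, files, previous_attempt=None, test_failures=None):
        return "fix-v2" if test_failures else "fix-v1"


class StubTestAgent:
    """Hypothetical stand-in: rejects the first attempt, accepts the second."""

    async def validate(self, fix):
        if fix == "fix-v2":
            return ValidationResult(passed=True)
        return ValidationResult(passed=False, failures=["test_latency"])


async def fix_with_retries(code, test, root_cause, files, max_attempts=3):
    """Implement, validate, and retry until tests pass or attempts run out."""
    fix, results = None, None
    for attempt in range(1, max_attempts + 1):
        fix = await code.implement(
            root_cause, files, previous_attempt=fix, test_failures=results
        )
        results = await test.validate(fix)
        if results.passed:
            return fix, attempt
    raise RuntimeError(f"No passing fix after {max_attempts} attempts")


fix, attempts = asyncio.run(
    fix_with_retries(StubCodeAgent(), StubTestAgent(), "N+1 query", ["api/orders.py"])
)
print(fix, attempts)
```

&lt;p&gt;Capping the number of attempts keeps a stubborn failure from consuming tokens indefinitely, and raising on exhaustion means an unvalidated fix is never returned silently.&lt;/p&gt;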

&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Production Case Study: E-commerce Platform&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Monolithic Agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg context per request: 487,000 tokens&lt;/li&gt;
&lt;li&gt;Cost per request: $1.46&lt;/li&gt;
&lt;li&gt;Success rate: 73%&lt;/li&gt;
&lt;li&gt;Avg time: 47s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-Agent (4 agents):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg context across all agents: 124,000 tokens
&lt;/li&gt;
&lt;li&gt;Cost per request: $0.37&lt;/li&gt;
&lt;li&gt;Success rate: 89%&lt;/li&gt;
&lt;li&gt;Avg time: 23s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context: 74% reduction&lt;/li&gt;
&lt;li&gt;Cost: 75% cheaper&lt;/li&gt;
&lt;li&gt;Success: 16 percentage points higher (73% to 89%, a 22% relative gain)&lt;/li&gt;
&lt;li&gt;Speed: 51% faster&lt;/li&gt;
&lt;/ul&gt;
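&lt;p&gt;For readers who want to check these percentages against the raw numbers, here is a small sketch. It uses only the figures listed above; the helper name &lt;code&gt;pct_change&lt;/code&gt; is illustrative.&lt;/p&gt;

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


monolithic = {"context": 487_000, "cost": 1.46, "success": 73, "time": 47}
multi_agent = {"context": 124_000, "cost": 0.37, "success": 89, "time": 23}

for metric in monolithic:
    change = pct_change(monolithic[metric], multi_agent[metric])
    print(f"{metric}: {change:+.1f}%")

# Success rate is itself a percentage, so also report the absolute gain.
print(f"success: +{multi_agent['success'] - monolithic['success']} percentage points")
```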
&lt;h4&gt;
  
  
  When to Use Multi-Agent
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Use when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Codebase &amp;gt;100K lines&lt;/li&gt;
&lt;li&gt;✓ Clear domain boundaries (frontend/backend/infra)
&lt;/li&gt;
&lt;li&gt;✓ Complex workflows with multiple steps&lt;/li&gt;
&lt;li&gt;✓ Team has engineering bandwidth for setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ Small codebase (&amp;lt;10K lines)&lt;/li&gt;
&lt;li&gt;✗ Monolithic architecture (everything coupled)&lt;/li&gt;
&lt;li&gt;✗ Simple, linear workflows&lt;/li&gt;
&lt;li&gt;✗ Quick prototyping phase&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  12. Token Budgeting: Explicit Resource Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 20-35% reduction via enforcement&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 4-8 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Advanced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Make token limits a first-class constraint in your architecture.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Framework
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// token-budget.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BUDGETS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;project_rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tool_definitions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;retrieved_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;response_budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;safety_margin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TOTAL_BUDGET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="nx"&gt;_300&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Sum of BUDGETS; leaves ~157K of a 200K context window for conversation&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenBudgetEnforcer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current_usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;countTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;BUDGETS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BudgetExceededError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; tokens exceeds budget of &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;getRemainingBudget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current_usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;TOTAL_BUDGET&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;used&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;trimToFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;BUDGETS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;truncateToTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Usage in Practice
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before sending to Claude (assumes a Python port of TokenBudgetEnforcer)
&lt;/span&gt;&lt;span class="n"&gt;budgeter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TokenBudgetEnforcer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Enforce budgets
&lt;/span&gt;&lt;span class="n"&gt;budgeter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;budgeter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;project_rules&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;claude_md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;budgeter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_definitions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Trim retrieved context if needed
&lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_codebase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retrieved_trimmed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;budgeter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimToFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrieved_context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;retrieved&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check remaining
&lt;/span&gt;&lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;budgeter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getRemainingBudget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Budget remaining: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;remaining&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Send to Claude (claude.message is a placeholder for your client's request call)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retrieved_trimmed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUDGETS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response_budget&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
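&lt;p&gt;The enforcer is written in JavaScript while the usage snippet is Python, so here is a minimal Python sketch of the same idea. It is not a definitive port: &lt;code&gt;count_tokens&lt;/code&gt; is a crude placeholder that assumes roughly four characters per token, truncation is character-based for the same reason, and the snake_case method names are my own (camelCase aliases are kept so the calls shown above still work).&lt;/p&gt;

```python
BUDGETS = {
    "system_prompt": 4_000,
    "project_rules": 800,
    "tool_definitions": 12_000,
    "retrieved_context": 15_000,
    "user_prompt": 500,
    "response_budget": 8_000,
    "safety_margin": 2_000,
}
TOTAL_BUDGET = sum(BUDGETS.values())  # 42,300


class BudgetExceededError(Exception):
    pass


def count_tokens(text: str) -> int:
    """Crude placeholder (about 4 characters per token); swap in a real tokenizer."""
    return (len(text) + 3) // 4


class TokenBudgetEnforcer:
    def __init__(self):
        self.current_usage = {}

    def allocate(self, category: str, content: str) -> int:
        """Record usage for a category, raising if it exceeds its budget."""
        tokens = count_tokens(content)
        budget = BUDGETS[category]
        if tokens > budget:
            raise BudgetExceededError(
                f"{category}: {tokens} tokens exceeds budget of {budget}"
            )
        self.current_usage[category] = tokens
        return tokens

    def remaining_budget(self) -> int:
        return TOTAL_BUDGET - sum(self.current_usage.values())

    def trim_to_fit(self, category: str, content: str, max_tokens=None) -> str:
        budget = BUDGETS[category] if max_tokens is None else max_tokens
        return content[: budget * 4]  # inverse of the 4-chars-per-token estimate

    # camelCase aliases matching the method names used in the snippet above
    getRemainingBudget = remaining_budget
    trimToFit = trim_to_fit


budgeter = TokenBudgetEnforcer()
budgeter.allocate("system_prompt", "You are a careful code reviewer.")
trimmed = budgeter.trim_to_fit("retrieved_context", "x" * 100_000)
budgeter.allocate("retrieved_context", trimmed)
print(budgeter.remaining_budget())
```

&lt;p&gt;Trimming before allocating means the retrieved context is counted against the total, which the snippet above skips because it never calls &lt;code&gt;allocate&lt;/code&gt; for that category.&lt;/p&gt;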

&lt;h4&gt;
  
  
  Auto-Trimming Strategies
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Strategy 1: Priority-Based Truncation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;trim_by_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Keep highest priority items within budget&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;sorted_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sorted_contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
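&lt;p&gt;Note that the function above stops at the first item that does not fit, even if smaller, lower-priority items would still fit afterwards. The sketch below makes that choice explicit with a hypothetical &lt;code&gt;skip_oversized&lt;/code&gt; flag; &lt;code&gt;ContextItem&lt;/code&gt; and the sample data are invented for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    priority: int
    tokens: int


def trim_by_priority(contexts, max_tokens, skip_oversized=False):
    """Greedy selection in priority order.

    skip_oversized=False stops at the first item that does not fit;
    skip_oversized=True skips it and keeps trying lower-priority items.
    """
    total, result = 0, []
    for ctx in sorted(contexts, key=lambda c: c.priority, reverse=True):
        if max_tokens >= total + ctx.tokens:
            result.append(ctx)
            total += ctx.tokens
        elif not skip_oversized:
            break
    return result


items = [
    ContextItem("handler.py", priority=3, tokens=6_000),
    ContextItem("schema.sql", priority=2, tokens=12_000),
    ContextItem("README.md", priority=1, tokens=2_000),
]
strict = trim_by_priority(items, max_tokens=10_000)
packed = trim_by_priority(items, max_tokens=10_000, skip_oversized=True)
print([c.name for c in strict])
print([c.name for c in packed])
```

&lt;p&gt;With this sample data the strict variant keeps only &lt;code&gt;handler.py&lt;/code&gt;, while the packing variant also keeps &lt;code&gt;README.md&lt;/code&gt;. Strict ordering guarantees nothing is included ahead of a more important item; packing uses more of the budget.&lt;/p&gt;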



&lt;p&gt;&lt;strong&gt;Strategy 2: Hierarchical Summarization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hierarchical_trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Summarize least important sections first&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;sections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;split_into_sections&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Visit sections from least to most important, summarizing each at most once
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;importance_score&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Stop as soon as the reconstructed content fits the budget
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reconstruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# Summarize this section
&lt;/span&gt;        &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;reconstruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
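&lt;p&gt;To see hierarchical summarization run end to end, here is a self-contained sketch with stand-in helpers. The &lt;code&gt;Section&lt;/code&gt; class, the word-based &lt;code&gt;count_tokens&lt;/code&gt;, and the truncating &lt;code&gt;summarize&lt;/code&gt; are all placeholders I am assuming for illustration; a real setup would use a tokenizer and an LLM summarization call. The size is re-measured after every summary and each section is summarized at most once, so the loop always terminates.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Section:
    content: str
    importance_score: float


def count_tokens(text: str) -> int:
    """Placeholder: one token per whitespace-separated word."""
    return len(text.split())


def summarize(text: str, max_ratio: float) -> str:
    """Placeholder summarizer: keeps the leading fraction of words."""
    words = text.split()
    keep = max(1, round(len(words) * max_ratio))
    return " ".join(words[:keep])


def hierarchical_trim(sections, max_tokens):
    """Summarize sections from least to most important until the total fits."""

    def total():
        return sum(count_tokens(s.content) for s in sections)

    for section in sorted(sections, key=lambda s: s.importance_score):
        if max_tokens >= total():
            break
        section.content = summarize(section.content, max_ratio=0.3)
    return sections


sections = [
    Section("core " * 50, importance_score=0.9),
    Section("background " * 100, importance_score=0.2),
    Section("changelog " * 100, importance_score=0.1),
]
trimmed = hierarchical_trim(sections, max_tokens=120)
print([count_tokens(s.content) for s in trimmed])
```

&lt;p&gt;In this run the two least important sections are compressed and the most important one is left untouched, because the budget is met before the loop reaches it.&lt;/p&gt;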



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Case Study: Enforced Budgets on 500 Sessions&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Avg Without Budget&lt;/th&gt;
&lt;th&gt;Avg With Budget&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;4,200&lt;/td&gt;
&lt;td&gt;3,800&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project rules&lt;/td&gt;
&lt;td&gt;2,100&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieved context&lt;/td&gt;
&lt;td&gt;45,000&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total static&lt;/td&gt;
&lt;td&gt;51,300&lt;/td&gt;
&lt;td&gt;19,600&lt;/td&gt;
&lt;td&gt;62%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session cost (no budgets): $0.82&lt;/li&gt;
&lt;li&gt;Session cost (with budgets): $0.31
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: 62% per session&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 100 sessions/day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Savings: ~$1,530/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
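&lt;p&gt;The monthly figure follows directly from the per-session costs. A minimal sketch of the arithmetic, assuming a 30-day month (the function name and the 30-day default are my assumptions, not part of the case study):&lt;/p&gt;

```python
def monthly_savings(cost_before, cost_after, sessions_per_day, days_per_month=30):
    """Return (percent saved per session, dollars saved per month)."""
    saved_per_session = cost_before - cost_after
    percent = 100 * saved_per_session / cost_before
    monthly = saved_per_session * sessions_per_day * days_per_month
    return round(percent), round(monthly, 2)


# Figures from the case study above: $0.82 vs $0.31 per session, 100 sessions/day
percent, dollars = monthly_savings(0.82, 0.31, sessions_per_day=100)
print(f"{percent}% per session, ${dollars:,.0f}/month")  # 62% per session, $1,530/month
```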




&lt;h3&gt;
  
  
  13. Markdown Knowledge Bases: Structured Context
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 25-40% better retrieval accuracy&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 4-6 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMs excel with well-structured markdown. Use it.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem: Unstructured Dumps
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Documentation (wall of text, 45K tokens)

The create_user function takes a username, which should be a string and a password which should be a string and an optional email which defaults to null and returns a User object or throws ValidationError if username is taken or InvalidPassword if password is too weak and the password must be at least 8 characters with one number...

[continues for 45,000 tokens]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Claude must parse this linguistic soup to extract structure.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Solution: Semantic Markdown
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# API Contracts&lt;/span&gt;

&lt;span class="gu"&gt;## User Management&lt;/span&gt;

&lt;span class="gu"&gt;### create_user&lt;/span&gt;

&lt;span class="gs"&gt;**Endpoint:**&lt;/span&gt; &lt;span class="sb"&gt;`POST /api/users`&lt;/span&gt;

&lt;span class="gs"&gt;**Parameters:**&lt;/span&gt;
| Name | Type | Required | Default | Constraints |
|------|------|----------|---------|-------------|
| username | string | Yes | - | 3-20 chars, alphanumeric |
| password | string | Yes | - | Min 8 chars, 1 number, 1 special |
| email | string | No | null | Valid email format |

&lt;span class="gs"&gt;**Returns:**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Success (201):**&lt;/span&gt; User object
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Error (400):**&lt;/span&gt; ValidationError
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Error (409):**&lt;/span&gt; UsernameExists

&lt;span class="gs"&gt;**Example:**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -X POST /api/users \
  -H "Content-Type: application/json" \
  -d '{"username": "john", "password": "Secret123!", "email": "john@example.com"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Related:**
- [Authentication Flow](./auth-flow.md)
- [User Model Schema](./models.md#user)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tokens: 847&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
vs unstructured: 3,429&lt;br&gt;
&lt;strong&gt;Reduction: 75%&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Plus: Claude can now quickly scan the table, understand constraints, and find related docs.&lt;/p&gt;
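&lt;p&gt;If your API metadata already lives in code, you can generate the parameter table rather than write it by hand. A small sketch, assuming each parameter is a plain dict with the keys shown (the &lt;code&gt;params_table&lt;/code&gt; helper and the dict layout are hypothetical):&lt;/p&gt;

```python
def params_table(params):
    """Render a list of parameter dicts as a markdown table."""
    header = "| Name | Type | Required | Default | Constraints |"
    divider = "|------|------|----------|---------|-------------|"
    rows = [
        f"| {p['name']} | {p['type']} | {'Yes' if p['required'] else 'No'} "
        f"| {p.get('default', '-')} | {p['constraints']} |"
        for p in params
    ]
    return "\n".join([header, divider, *rows])


create_user_params = [
    {"name": "username", "type": "string", "required": True,
     "constraints": "3-20 chars, alphanumeric"},
    {"name": "email", "type": "string", "required": False,
     "default": "null", "constraints": "Valid email format"},
]
print(params_table(create_user_params))
```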
&lt;h4&gt;
  
  
  Knowledge Base Structure
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/
├── api/
│   ├── _index.md (overview + quick links)
│   ├── auth.md
│   ├── users.md
│   └── posts.md
├── architecture/
│   ├── _index.md  
│   ├── data-flow.md
│   ├── services.md
│   └── infrastructure.md
├── data/
│   ├── models.md
│   ├── migrations.md
│   └── schemas.md
└── processes/
    ├── deployment.md
    ├── testing.md
    └── debugging.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Under 500 lines&lt;/strong&gt; (retrievable as single chunk)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear hierarchy&lt;/strong&gt; (H1 → H2 → H3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-referenced&lt;/strong&gt; (links to related docs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scannable&lt;/strong&gt; (tables, code blocks, lists)&lt;/li&gt;
&lt;/ul&gt;
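&lt;p&gt;Two of these rules are easy to check mechanically. The sketch below flags files over the line limit and headings that skip a level (for example H1 straight to H3). The &lt;code&gt;check_doc&lt;/code&gt; name is hypothetical, and the check only understands ATX-style &lt;code&gt;#&lt;/code&gt; headings, so treat it as a starting point rather than a full linter:&lt;/p&gt;

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def check_doc(text, max_lines=500):
    """Return a list of problems found in one markdown document."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines exceeds the {max_lines}-line limit")

    previous_level = 0
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '#' comments inside code blocks
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level - previous_level > 1:
            problems.append(
                f"line {number}: heading jumps from H{previous_level} to H{level}"
            )
        previous_level = level
    return problems


print(check_doc("# API\n\n### create_user\n"))
```

&lt;p&gt;Run it over every file under &lt;code&gt;docs/&lt;/code&gt; in CI if you want the structure to stay retrievable as the knowledge base grows.&lt;/p&gt;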
&lt;h4&gt;
  
  
  Template: Technical Documentation
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# [Component Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;
[2-3 sentence summary]

&lt;span class="gu"&gt;## Quick Reference&lt;/span&gt;
| Aspect | Value |
|--------|-------|
| Status | Production |
| Owner | @team-name |
| Dependencies | service-a, service-b |
| Repo | github.com/org/repo |

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
[Diagram or description]

&lt;span class="gu"&gt;## Key Concepts&lt;/span&gt;
&lt;span class="gu"&gt;### [Concept 1]&lt;/span&gt;
[Explanation]

&lt;span class="gu"&gt;### [Concept 2]&lt;/span&gt;
[Explanation]

&lt;span class="gu"&gt;## Common Operations&lt;/span&gt;
&lt;span class="gu"&gt;### [Operation 1]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**When to use:** [scenario]
**Note:** [gotcha]

## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| [Issue] | [Root cause] | [Solution] |

## Related
- [Doc 1](./related.md)
- [Doc 2](./other.md)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 50 Documentation Sets&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Unstructured&lt;/th&gt;
&lt;th&gt;Markdown Structured&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg tokens per doc&lt;/td&gt;
&lt;td&gt;12,400&lt;/td&gt;
&lt;td&gt;3,800&lt;/td&gt;
&lt;td&gt;69% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval accuracy&lt;/td&gt;
&lt;td&gt;71%&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;td&gt;32% better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude comprehension&lt;/td&gt;
&lt;td&gt;6.8/10&lt;/td&gt;
&lt;td&gt;9.1/10&lt;/td&gt;
&lt;td&gt;34% better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to answer&lt;/td&gt;
&lt;td&gt;8.3s&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;75% faster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  14. Context Compression: Emergency Pressure Relief
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 70-92% reduction (extreme cases)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 2-4 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes you genuinely need to include a large document. Compress it first.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;A user uploads a 100-page technical specification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Original: 87,429 tokens&lt;/li&gt;
&lt;li&gt;Context window: 200,000 tokens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consumes: 43.7% of available context&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few conversation turns, you're compacting.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Solution: LLM-Powered Compression
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compress_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Compress document to target_ratio of original size&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Compress this technical document for future LLM use.

TARGET: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens (from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)

PRESERVE:
- Technical specifications
- API contracts
- Constraints and requirements  
- Code examples
- Numerical data

REMOVE:
- Narrative explanations
- Background context
- Redundant examples
- Rhetorical questions
- Transitional phrases

FORMAT:
- Use tables for structured data
- Use bullet points for lists
- Keep code blocks intact
- Maintain heading hierarchy

DOCUMENT:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

COMPRESSED VERSION:
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Budget in tokens (len(document) would count characters); 1.5x leaves headroom
&lt;/span&gt;    &lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;compressed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
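&lt;p&gt;The prompt asks the model to preserve numerical data, but nothing enforces it. One cheap safeguard is to compare the numbers in the original against the compressed output. This is a rough sketch (the &lt;code&gt;missing_numbers&lt;/code&gt; helper is my own, not part of any SDK); it will also flag numbers that were merely reformatted, such as 1,000 rewritten as 1000, so review its output rather than trusting it blindly:&lt;/p&gt;

```python
import re

# Integers and decimals, including thousands separators such as 87,429
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def missing_numbers(original, compressed):
    """Return numeric literals found in the original but absent from the compressed text."""
    kept = set(NUMBER.findall(compressed))
    missing = []
    for value in NUMBER.findall(original):
        if value not in kept and value not in missing:
            missing.append(value)
    return missing


original = "Tokens expire after 30 minutes. Max 5 sessions. Rate limit: 5 attempts per 15 min."
compressed = "TTL: 30min. Max sessions: 5."
print(missing_numbers(original, compressed))  # ['15']
```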

&lt;h4&gt;
  
  
  Real Example
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Original (5,847 tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Authentication System&lt;/span&gt;

Our authentication system has evolved significantly over the years. 
Initially, we used simple session cookies, but as our user base grew 
and security requirements became more stringent, we transitioned to 
a more robust JWT-based approach. This decision was made after 
careful consideration of the trade-offs...

[continues for 5,847 tokens]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Compressed (934 tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Auth System&lt;/span&gt;

&lt;span class="gs"&gt;**Stack:**&lt;/span&gt; JWT tokens, Redis sessions, OAuth2
&lt;span class="gs"&gt;**Token TTL:**&lt;/span&gt; 30min (configurable)
&lt;span class="gs"&gt;**Refresh:**&lt;/span&gt; Auto-refresh within 5min of expiry

&lt;span class="gu"&gt;## Endpoints&lt;/span&gt;
| Endpoint | Method | Auth | Purpose |
|----------|--------|------|---------|
| /auth/login | POST | No | Issue token |
| /auth/refresh | POST | Token | Renew token |
| /auth/logout | POST | Token | Revoke token |

&lt;span class="gu"&gt;## Token Structure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "sub": "user_id",
  "exp": 1234567890,
  "roles": ["user", "admin"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Constraints
- Max sessions per user: 5
- Password: 8+ chars, 1 number, 1 special
- Rate limit: 5 attempts/15min

## See Also
→ [OAuth Flow](./oauth.md)
→ [Session Store](./redis.md)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Reduction: 84%&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Compression Strategies
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Strategy 1: Hierarchical Summarization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hierarchical_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Compress sections by priority&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Rank sections by importance
&lt;/span&gt;    &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rank_by_importance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;target_tokens&lt;/span&gt;
    &lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_critical&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Keep critical sections full
&lt;/span&gt;            &lt;span class="n"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Compress non-critical to 30%
&lt;/span&gt;            &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
            &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;
        &lt;span class="c1"&gt;# else: drop section entirely
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;compressed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
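&lt;p&gt;The function above leans on helpers it does not define. To try the control flow end to end, here is a self-contained toy version. Everything except the loop itself is a stand-in I made up: &lt;code&gt;Section&lt;/code&gt; counts whitespace-separated words instead of real tokens, and &lt;code&gt;summarize&lt;/code&gt; simply truncates where a real implementation would call an LLM:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str
    importance: float = 0.0
    is_critical: bool = False

    @property
    def tokens(self):
        # Crude stand-in for a tokenizer: one token per word
        return len(self.text.split())


def rank_by_importance(sections):
    """Critical sections first, then by descending importance score."""
    return sorted(sections, key=lambda s: (not s.is_critical, -s.importance))


def summarize(section, ratio=0.3):
    """Stand-in summarizer: keep roughly the first `ratio` of the words."""
    words = section.text.split()
    keep = max(1, int(len(words) * ratio))
    return Section(section.title, " ".join(words[:keep]), section.importance)


def hierarchical_compress(sections, target_tokens):
    budget = target_tokens
    compressed = []
    for section in rank_by_importance(sections):
        if section.is_critical:
            compressed.append(section)            # keep critical sections in full
            budget -= section.tokens
        elif budget > section.tokens * 0.3:
            summary = summarize(section, ratio=0.3)
            compressed.append(summary)
            budget -= summary.tokens
        # otherwise the section is dropped
    return compressed


docs = [
    Section("intro", "word " * 20, importance=0.1),
    Section("api", "word " * 10, is_critical=True),
    Section("design", "word " * 20, importance=0.9),
]
print([s.title for s in hierarchical_compress(docs, target_tokens=17)])
```

&lt;p&gt;Note that critical sections are always kept, so the result can exceed the target when the critical material alone is larger than the budget.&lt;/p&gt;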



&lt;p&gt;&lt;strong&gt;Strategy 2: Entity Extraction&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_key_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract only structured data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;apis&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_api_specs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;constraints&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_constraints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;examples&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_code_examples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metrics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;extract_numbers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Reconstruct as structured markdown
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;format_as_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extracted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
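&lt;p&gt;The extractor functions are left undefined above. For prose-heavy specs, even simple regular expressions recover a surprising amount. The sketch below is one hypothetical way to fill in &lt;code&gt;extract_api_specs&lt;/code&gt;, &lt;code&gt;extract_constraints&lt;/code&gt; and &lt;code&gt;format_as_markdown&lt;/code&gt;; the patterns are deliberately naive (they split sentences on periods, for instance) and would need tuning for real documents:&lt;/p&gt;

```python
import re

# "POST /auth/login" style mentions
ENDPOINT = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/[\w/{}-]*)")
# Any sentence containing a requirement keyword
CONSTRAINT = re.compile(
    r"[^.\n]*\b(?:must|at least|at most|maximum|minimum)\b[^.\n]*",
    re.IGNORECASE,
)


def extract_api_specs(document):
    return [{"method": m, "path": p} for m, p in ENDPOINT.findall(document)]


def extract_constraints(document):
    return [c.strip() for c in CONSTRAINT.findall(document)]


def format_as_markdown(extracted):
    lines = ["## Endpoints", "| Method | Path |", "|--------|------|"]
    lines += [f"| {api['method']} | {api['path']} |" for api in extracted["apis"]]
    lines += ["", "## Constraints"]
    lines += [f"- {c}" for c in extracted["constraints"]]
    return "\n".join(lines)


spec = (
    "Users sign in via POST /auth/login. "
    "The password must be at least 8 characters. "
    "Tokens are renewed with POST /auth/refresh."
)
extracted = {
    "apis": extract_api_specs(spec),
    "constraints": extract_constraints(spec),
}
print(format_as_markdown(extracted))
```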



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: Large Document Compression&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Document Type&lt;/th&gt;
&lt;th&gt;Original (tokens)&lt;/th&gt;
&lt;th&gt;Compressed (tokens)&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;th&gt;Information retention&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Specs&lt;/td&gt;
&lt;td&gt;45K&lt;/td&gt;
&lt;td&gt;8K&lt;/td&gt;
&lt;td&gt;82%&lt;/td&gt;
&lt;td&gt;97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture Docs&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;6K&lt;/td&gt;
&lt;td&gt;81%&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical RFCs&lt;/td&gt;
&lt;td&gt;67K&lt;/td&gt;
&lt;td&gt;12K&lt;/td&gt;
&lt;td&gt;82%&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal Policies&lt;/td&gt;
&lt;td&gt;89K&lt;/td&gt;
&lt;td&gt;23K&lt;/td&gt;
&lt;td&gt;74%&lt;/td&gt;
&lt;td&gt;88%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Average: 80% reduction, 92.5% information retention&lt;/strong&gt;&lt;/p&gt;
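&lt;p&gt;The averages can be reproduced from the table rows. A quick check, reading "45K" as 45,000 tokens and so on (that reading is my assumption):&lt;/p&gt;

```python
# (original tokens, compressed tokens, retention %) for each row of the table
rows = {
    "API Specs": (45_000, 8_000, 97),
    "Architecture Docs": (32_000, 6_000, 94),
    "Technical RFCs": (67_000, 12_000, 91),
    "Legal Policies": (89_000, 23_000, 88),
}


def reduction(original, compressed):
    """Percentage of tokens removed by compression."""
    return 100 * (1 - compressed / original)


reductions = {name: reduction(o, c) for name, (o, c, _) in rows.items()}
mean_reduction = sum(reductions.values()) / len(reductions)
mean_retention = sum(kept for _, _, kept in rows.values()) / len(rows)
print(round(mean_reduction), mean_retention)  # 80 92.5
```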

&lt;h4&gt;
  
  
  When to Compress
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Compress when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Document &amp;gt;10K tokens
&lt;/li&gt;
&lt;li&gt;✓ Contains redundancy/narrative&lt;/li&gt;
&lt;li&gt;✓ Will be referenced multiple times&lt;/li&gt;
&lt;li&gt;✓ Can tolerate slight information loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Don't compress when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ Document &amp;lt;5K tokens (overhead not worth it)&lt;/li&gt;
&lt;li&gt;✗ Legal/contractual text (preserve exact wording)&lt;/li&gt;
&lt;li&gt;✗ Code (compression breaks syntax)&lt;/li&gt;
&lt;li&gt;✗ One-time use (compression cost &amp;gt; retrieval cost)&lt;/li&gt;
&lt;/ul&gt;
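&lt;p&gt;These rules translate into a small gate you can run before calling the compressor. This sketch encodes only the measurable criteria (size, reuse, document kind); whether a document "contains redundancy" or "can tolerate slight information loss" still needs a human or a classifier. The function name and parameters are hypothetical, and the 5K-10K range is left undecided because the lists above do not cover it:&lt;/p&gt;

```python
def should_compress(tokens, reuse_count, is_legal=False, is_code=False):
    """Return True, False, or None when the rules above do not decide."""
    if is_legal or is_code:
        return False  # exact wording or syntax must survive
    if 2 > reuse_count:
        return False  # one-time use: compression costs more than it saves
    if 5_000 > tokens:
        return False  # too small to be worth the overhead
    if tokens > 10_000:
        return True
    return None  # 5K-10K tokens: judge case by case


print(should_compress(87_429, reuse_count=4))                  # True
print(should_compress(87_429, reuse_count=4, is_legal=True))   # False
print(should_compress(7_000, reuse_count=4))                   # None
```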




&lt;h3&gt;
  
  
  15. Tool-First Workflows: Offload Processing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 60-85% reduction via preprocessing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 4-8 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Advanced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude shouldn't process raw data. Tools should.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Anti-Pattern
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Analyze this CSV of 200,000 transactions and find anomalies"
[Uploads 200K-row CSV, 487K tokens]

Claude:
[Tries to read entire CSV into context]
[Context limit exceeded]
[OR: Reads first 50K tokens, misses ~90% of data]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  The Pattern
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MCP tool (assumes `mcp` is an MCP server instance and KNOWN_FRAUD_MERCHANTS is defined elsewhere)
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_transactions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze transaction CSV for anomalies&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Pre-process the data
&lt;/span&gt;    &lt;span class="n"&gt;anomalies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;merchant&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KNOWN_FRAUD_MERCHANTS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Return summary, not raw data
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_transactions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anomalies&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anomalies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;top_anomalies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anomalies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nlargest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;suspicious_merchants&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anomalies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;merchant&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Analyze this CSV for anomalies"
[Uploads file]

Claude: [Calls tool]
Tool returns: {
  total_transactions: 200000,
  anomaly_count: 847,
  anomaly_rate: 0.004235,
  top_anomalies: [10 records],
  suspicious_merchants: {merchant: count}
}

Claude: "Found 847 anomalies (0.42% of transactions).
Top concerns:
- ABC Corp: 124 high-value transactions
- XYZ Ltd: 89 negative amounts
- ..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tokens consumed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Without tool: 487,000 (CSV) + analysis&lt;/li&gt;
&lt;li&gt;With tool: 847 (summary only)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduction: 99.8%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
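&lt;p&gt;You can try the same summarise-before-return idea without pandas or an MCP server. The sketch below uses only the standard library and reuses the &lt;code&gt;amount&lt;/code&gt; and &lt;code&gt;merchant&lt;/code&gt; column names from the tool above. It is a simplified stand-in, not the tool itself: the known-fraud-merchant check is omitted, and the function name is my own:&lt;/p&gt;

```python
import csv
import io
import statistics
from collections import Counter


def summarize_transactions(csv_text, top_n=10):
    """Reduce a transactions CSV to a small, JSON-friendly summary."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    amounts = [float(row["amount"]) for row in rows]

    # 99th-percentile cut-off with linear interpolation; needs at least two rows
    cutoff = statistics.quantiles(amounts, n=100, method="inclusive")[98]
    anomalies = [
        row for row, amount in zip(rows, amounts)
        if amount > cutoff or 0 > amount
    ]
    anomalies.sort(key=lambda row: float(row["amount"]), reverse=True)

    merchants = Counter(row["merchant"] for row in anomalies)
    return {
        "total_transactions": len(rows),
        "anomaly_count": len(anomalies),
        "anomaly_rate": len(anomalies) / len(rows),
        "top_anomalies": anomalies[:top_n],
        "suspicious_merchants": dict(merchants.most_common(5)),
    }


# 198 ordinary rows plus one negative amount and one very large amount
sample = ["amount,merchant"] + [f"{i},shop" for i in range(1, 199)]
sample += ["-5,refunder", "10000,bigco"]
summary = summarize_transactions("\n".join(sample))
print(summary["anomaly_count"], summary["suspicious_merchants"])
```

&lt;p&gt;However large the input file grows, the returned dict stays roughly the same size, which is the whole point of the pattern.&lt;/p&gt;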

&lt;h4&gt;
  
  
  Tool Design Patterns
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: Aggregate Before Return&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query DB but return summary, not raw rows&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Don't return 10,000 rows
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;row_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sample&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# First 10 for inspection
&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;summary_stats&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;calculate_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distribution&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;create_histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
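
&lt;p&gt;A minimal, runnable sketch of the same idea using only the standard library. The helper name &lt;code&gt;summarize_rows&lt;/code&gt;, its parameters, and the generated sample rows are hypothetical stand-ins for &lt;code&gt;execute_query&lt;/code&gt;, &lt;code&gt;calculate_stats&lt;/code&gt; and &lt;code&gt;create_histogram&lt;/code&gt;, which are not shown here.&lt;/p&gt;

```python
from collections import Counter
from statistics import mean, median


def summarize_rows(rows, value_key, group_key, sample_size=10):
    """Collapse a list of row dicts into a small, fixed-size summary."""
    values = [row[value_key] for row in rows]
    if not values:
        return {"row_count": 0, "sample": [], "summary_stats": {}, "top_groups": {}}
    return {
        "row_count": len(rows),
        "sample": rows[:sample_size],  # a few raw rows for inspection
        "summary_stats": {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
        },
        # most frequent values of group_key, capped at five entries
        "top_groups": dict(Counter(row[group_key] for row in rows).most_common(5)),
    }


# Hypothetical data: 10,000 rows standing in for the result of a real query
rows = [{"amount": i % 100, "merchant": f"m{i % 7}"} for i in range(10_000)]
summary = summarize_rows(rows, value_key="amount", group_key="merchant")
print(summary["row_count"], len(summary["sample"]), summary["summary_stats"]["max"])
```

&lt;p&gt;Whatever the input size, the returned dict stays roughly the same size, which is the property that keeps the tool's output cheap to put in context.&lt;/p&gt;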



&lt;p&gt;&lt;strong&gt;Pattern 2: Progressive Disclosure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search logs with pagination&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;has_more&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;next_page_token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;generate_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
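
&lt;p&gt;The tool above hands out a &lt;code&gt;next_page_token&lt;/code&gt; but does not show how a follow-up call would consume it. One possible shape, as a sketch: treat the token as the next start offset. The function name &lt;code&gt;paginate&lt;/code&gt;, the &lt;code&gt;page_token&lt;/code&gt; parameter, and the offset-as-token encoding are assumptions for illustration, not the behaviour of &lt;code&gt;generate_token&lt;/code&gt;.&lt;/p&gt;

```python
def paginate(results, limit=10, page_token=None):
    """Return one page of results plus a token for the next page."""
    # Assumption: the token is simply the next start offset, encoded as a string.
    start = int(page_token) if page_token else 0
    end = start + limit
    has_more = len(results) > end
    return {
        "total_matches": len(results),
        "results": results[start:end],
        "has_more": has_more,
        "next_page_token": str(end) if has_more else None,
    }


# Hypothetical log lines standing in for the output of a real search
matches = [f"log line {i}" for i in range(25)]
first = paginate(matches, limit=10)
second = paginate(matches, limit=10, page_token=first["next_page_token"])
last = paginate(matches, limit=10, page_token=second["next_page_token"])
print(len(first["results"]), len(second["results"]), len(last["results"]))
```

&lt;p&gt;The model sees only one page at a time and asks for the next one only when the first page did not answer the question.&lt;/p&gt;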



&lt;p&gt;&lt;strong&gt;Pattern 3: Pre-Filter&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_errors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get filtered errors, not all logs&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unique_errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;group_by_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# Assumes each error is a dict with a 'message' key
&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;top_5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Production Example: Log Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Send raw logs to Claude&lt;/td&gt;
&lt;td&gt;2.4M&lt;/td&gt;
&lt;td&gt;$7.20&lt;/td&gt;
&lt;td&gt;Timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool pre-processes&lt;/td&gt;
&lt;td&gt;4.8K&lt;/td&gt;
&lt;td&gt;$0.014&lt;/td&gt;
&lt;td&gt;2.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improvement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Works vs fails&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Another Example: Codebase Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Send all files&lt;/td&gt;
&lt;td&gt;487K&lt;/td&gt;
&lt;td&gt;$1.46&lt;/td&gt;
&lt;td&gt;34%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool indexes + searches&lt;/td&gt;
&lt;td&gt;12K&lt;/td&gt;
&lt;td&gt;$0.036&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improvement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;97.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;97.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+60 points&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
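
&lt;p&gt;The cost columns in both tables are consistent with a flat rate of $3 per million tokens ($7.20 / 2.4M). That rate is inferred from the tables, not taken from a price list, so treat the constant below as an assumption. The helper names are hypothetical.&lt;/p&gt;

```python
# Assumption: flat USD price per million tokens, inferred from $7.20 / 2.4M
PRICE_PER_MILLION_TOKENS = 3.00


def cost_usd(tokens):
    """Cost of sending `tokens` tokens at the assumed flat rate."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (1 - after / before) * 100


for label, before, after in [
    ("Log analysis", 2_400_000, 4_800),
    ("Codebase analysis", 487_000, 12_000),
]:
    print(
        f"{label}: ${cost_usd(before):.2f} to ${cost_usd(after):.3f}, "
        f"{reduction_pct(before, after):.1f}% fewer tokens"
    )
```

&lt;p&gt;Because the rate is flat, the cost reduction always equals the token reduction, which is why the Tokens and Cost cells in each Improvement row match.&lt;/p&gt;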




&lt;h3&gt;
  
  
  16. Incremental Memory: Conversation Compaction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Impact: 40-65% reduction in conversation overhead&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup time: 2-3 hours&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Difficulty: Moderate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Long conversations accumulate dead weight. Summarize continuously.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Problem
&lt;/h4&gt;

&lt;p&gt;Turn 50 of a long session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context breakdown:
- System prompt: 4K tokens
- Tools: 12K tokens
- CLAUDE.md: 800 tokens
- Turn 1-10: 23K tokens (old debugging, no longer relevant)
- Turn 11-20: 18K tokens (implemented feature A, completed)
- Turn 21-30: 31K tokens (discussed approach, decided)
- Turn 31-40: 27K tokens (implemented feature B, completed)
- Turn 41-50: 19K tokens (current work)

Total: 134.8K tokens
Relevant to turn 50: ~25K tokens (18.5%)
Dead weight: 109.8K tokens (81.5%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution: Rolling Summarization
&lt;/h4&gt;

&lt;p&gt;Create a summary file that evolves:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;conversation_memory.md (Turn 20):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Session Summary&lt;/span&gt;

&lt;span class="gu"&gt;## Completed&lt;/span&gt;
✅ Fixed session refresh bug (auth/session.py)
&lt;span class="p"&gt;-&lt;/span&gt; Root cause: Timer not reset on activity  
&lt;span class="p"&gt;-&lt;/span&gt; Solution: Reset timer in middleware
&lt;span class="p"&gt;-&lt;/span&gt; Tests: Added test_session_refresh_timing

&lt;span class="gu"&gt;## Current Task&lt;/span&gt;
🔄 Implementing OAuth2 migration
&lt;span class="p"&gt;-&lt;/span&gt; Status: 40% complete
&lt;span class="p"&gt;-&lt;/span&gt; Files: auth/oauth.py, auth/session.py
&lt;span class="p"&gt;-&lt;/span&gt; Next: Add provider UI

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Keep JWT format unchanged (mobile app compatibility)
&lt;span class="p"&gt;-&lt;/span&gt; Dual-write for 1 week migration period
&lt;span class="p"&gt;-&lt;/span&gt; Feature flag: oauth_migration_enabled

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Cannot break existing sessions
&lt;span class="p"&gt;-&lt;/span&gt; Must support 30min timeout
&lt;span class="p"&gt;-&lt;/span&gt; Redis key structure unchanged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;conversation_memory.md (Turn 50):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Session Summary  &lt;/span&gt;

&lt;span class="gu"&gt;## Completed&lt;/span&gt;
✅ OAuth2 migration (auth/oauth.py, auth/session.py)
&lt;span class="p"&gt;-&lt;/span&gt; Dual-write implemented
&lt;span class="p"&gt;-&lt;/span&gt; Provider UI complete
&lt;span class="p"&gt;-&lt;/span&gt; Migration script ready
✅ Session refresh bug fix
✅ Rate limiting added (5 attempts/15min)

&lt;span class="gu"&gt;## Current Task&lt;/span&gt;
🔄 Writing integration tests for OAuth flow
&lt;span class="p"&gt;-&lt;/span&gt; Status: 60% complete
&lt;span class="p"&gt;-&lt;/span&gt; File: tests/auth/test_oauth_integration.py  
&lt;span class="p"&gt;-&lt;/span&gt; Next: Test edge cases (token expiry, provider failures)

&lt;span class="gu"&gt;## Active Context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; OAuth providers: Google, GitHub configured
&lt;span class="p"&gt;-&lt;/span&gt; Test environment: staging DB + mock providers
&lt;span class="p"&gt;-&lt;/span&gt; Coverage target: &amp;gt;90%

&lt;span class="gu"&gt;## Decisions Archive&lt;/span&gt;
[Previous decisions moved to archive...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 51:
Instead of loading 134.8K tokens of history,
Load: conversation_memory.md (1,247 tokens)
Reduction: 99.1%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Implementation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Auto-summarize every N turns
&lt;/span&gt;&lt;span class="n"&gt;SUMMARY_INTERVAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConversationMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
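    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# Turns since the last summary
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;conversation_memory.md&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_turn_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_count&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;SUMMARY_INTERVAL&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Summarize every turn since the last summary, not just the latest one
&lt;/span&gt;            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;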

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;current_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Ask Claude to update summary
&lt;/span&gt;        &lt;span class="n"&gt;updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Update this session summary with recent progress.

CURRENT SUMMARY:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

RECENT TURNS (last &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SUMMARY_INTERVAL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;):
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;format_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

UPDATED SUMMARY (preserve structure):
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📝 Summary updated at turn &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
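
&lt;p&gt;The class above depends on helpers that are not shown (&lt;code&gt;claude.complete&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;format_turns&lt;/code&gt;). A self-contained variant that can be run as-is might look like the sketch below. The names &lt;code&gt;RollingMemory&lt;/code&gt; and &lt;code&gt;fake_summarize&lt;/code&gt; are hypothetical, and the model call is replaced by an injected async callable so the flow can be exercised without any API.&lt;/p&gt;

```python
import asyncio
import tempfile
from pathlib import Path


class RollingMemory:
    """Buffer turns and fold them into a summary file every `interval` turns."""

    def __init__(self, memory_file, summarize, interval=10):
        self.memory_file = Path(memory_file)
        self.summarize = summarize  # async callable: (current_summary, turns) -> str
        self.interval = interval
        self.pending = []  # turns not yet folded into the summary

    async def on_turn_complete(self, turn):
        self.pending.append(turn)
        if len(self.pending) >= self.interval:
            await self.flush()

    async def flush(self):
        # A missing file is treated as an empty summary (first run).
        current = self.memory_file.read_text() if self.memory_file.exists() else ""
        updated = await self.summarize(current, self.pending)
        self.memory_file.write_text(updated)
        self.pending = []


async def fake_summarize(current, turns):
    # Placeholder for a real model call: records how many turns were folded in.
    return current + f"- folded {len(turns)} turns\n"


async def demo(path):
    memory = RollingMemory(path, fake_summarize, interval=10)
    for i in range(25):
        await memory.on_turn_complete(f"turn {i}")
    return memory


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "conversation_memory.md"
    memory = asyncio.run(demo(path))
    summary_text = path.read_text()

print(summary_text)
```

&lt;p&gt;Injecting the summarizer keeps the buffering logic testable on its own; a real model call can be swapped in later without touching the class.&lt;/p&gt;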



&lt;h4&gt;
  
  
  Auto-Compaction Integration
&lt;/h4&gt;

&lt;p&gt;Claude Code's built-in auto-compaction triggers at ~167K tokens. Preemptive summarization keeps you below that threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_usage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Trigger summary before auto-compaction&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;PREEMPTIVE_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120_000&lt;/span&gt;  &lt;span class="c1"&gt;# Before 167K limit
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context_usage&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PREEMPTIVE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ Context at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_usage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;clear_old_turns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Context reduced to safe levels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
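
&lt;p&gt;&lt;code&gt;clear_old_turns&lt;/code&gt; is not shown above. One way it could decide what to drop, as a sketch: walk backwards from the newest turn and keep turns until a token budget is used up. The function name &lt;code&gt;trim_to_budget&lt;/code&gt; and the 50K budget are assumptions; the per-block token counts are the ones from the context breakdown earlier in this section.&lt;/p&gt;

```python
def trim_to_budget(turn_tokens, budget):
    """Keep the most recent turns whose token counts fit within `budget`.

    `turn_tokens` is a list of per-turn token counts, oldest first.
    Returns (kept, dropped), both oldest first.
    """
    kept = []
    total = 0
    for tokens in reversed(turn_tokens):
        if total + tokens > budget:
            break  # everything older than this point is dropped
        kept.append(tokens)
        total += tokens
    kept.reverse()
    dropped = turn_tokens[: len(turn_tokens) - len(kept)]
    return kept, dropped


# Turn blocks 1-10, 11-20, 21-30, 31-40, 41-50 from the breakdown above
blocks = [23_000, 18_000, 31_000, 27_000, 19_000]
kept, dropped = trim_to_budget(blocks, budget=50_000)
print(kept, dropped)
```

&lt;p&gt;The dropped turns are the ones that should already be captured in &lt;code&gt;conversation_memory.md&lt;/code&gt; before they are removed.&lt;/p&gt;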



&lt;h4&gt;
  
  
  Measured Impact
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Study: 100 Long Sessions (&amp;gt;40 turns each)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No Summarization&lt;/th&gt;
&lt;th&gt;With Rolling Summary&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg context (turn 50)&lt;/td&gt;
&lt;td&gt;147K&lt;/td&gt;
&lt;td&gt;51K&lt;/td&gt;
&lt;td&gt;65% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sessions hitting auto-compact&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;86% fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Info loss at compaction&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Qualitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per long session&lt;/td&gt;
&lt;td&gt;$13.24&lt;/td&gt;
&lt;td&gt;$4.67&lt;/td&gt;
&lt;td&gt;65% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Part V: The Complete System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Putting It All Together: The Optimized Workflow
&lt;/h3&gt;

&lt;p&gt;Here's how all 16 strategies combine into a production system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New Request
    ↓
[.claudeignore] ──→ Filter irrelevant files (30-40% reduction)
    ↓
[Model Selection] ──→ Choose appropriate tier (40-60% cost savings)
    ↓
[Hooks] ──→ Validate against guardrails (prevent waste)
    ↓
[Plan Mode?] ──→ If complex, plan first (20-30% fewer iterations)  
    ↓
[Search/RAG] ──→ Find relevant files (40-90% reduction)
    ↓
[Token Budget] ──→ Enforce limits (20-35% reduction)
    ↓
[CLAUDE.md] ──→ Load lean rules only (15-25% reduction)
    ↓
[Tools] ──→ Pre-process data (60-85% reduction)
    ↓
[Prompt Caching] ──→ Auto-optimize static content (81% cost reduction)
    ↓
[MCP Tool Search] ──→ Load tools on-demand (85% MCP reduction)
    ↓
Execute Request
    ↓
[Snapshot] ──→ Save state periodically (35-50% reduction in restarts)
    ↓
[Memory] ──→ Summarize conversation (40-65% reduction)
    ↓
[Multi-Agent?] ──→ If needed, delegate to specialists (50-70% reduction)
    ↓
Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real-World Results: Complete System
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Case Study: SaaS Platform (50 developers)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg cost per developer/day: $12.50&lt;/li&gt;
&lt;li&gt;Monthly team cost: $13,125&lt;/li&gt;
&lt;li&gt;Context limit hits: 34/day&lt;/li&gt;
&lt;li&gt;Developer frustration: High&lt;/li&gt;
&lt;li&gt;Haiku usage: 60% (tasks forced to cheaper model)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After Full Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg cost per developer/day: $3.20&lt;/li&gt;
&lt;li&gt;Monthly team cost: $3,360&lt;/li&gt;
&lt;li&gt;Context limit hits: 2/day
&lt;/li&gt;
&lt;li&gt;Developer frustration: Low&lt;/li&gt;
&lt;li&gt;Haiku usage: 15% (only for appropriate tasks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cost: 74% reduction&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limit hits: 94% reduction&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Opus/Sonnet usage: 40% → 85% of tasks&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
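
&lt;p&gt;The monthly totals and percentages above can be reproduced from the per-developer figures. A quick check, assuming 21 working days per month; that figure is not stated in the case study, but it is the value that makes both monthly totals come out exactly. The helper names are hypothetical.&lt;/p&gt;

```python
DEVELOPERS = 50
WORKING_DAYS_PER_MONTH = 21  # assumption: inferred from the monthly totals


def monthly_cost(cost_per_dev_day):
    """Team cost per month from the average cost per developer per day."""
    return cost_per_dev_day * DEVELOPERS * WORKING_DAYS_PER_MONTH


def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`, rounded to a whole percent."""
    return round((1 - after / before) * 100)


before = monthly_cost(12.50)
after = monthly_cost(3.20)
print(f"Monthly cost: ${before:,.0f} to ${after:,.0f}")
print(f"Cost reduction: {pct_reduction(before, after)}%")
print(f"Limit-hit reduction: {pct_reduction(34, 2)}%")
```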

&lt;h3&gt;
  
  
  The Optimization Checklist
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Quick Wins (about 1 hour total)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Create .claudeignore (2 min)&lt;/li&gt;
&lt;li&gt;[ ] Trim CLAUDE.md to &amp;lt;200 lines (30 min)&lt;/li&gt;
&lt;li&gt;[ ] Enable Plan Mode habit (0 min, behavior change)&lt;/li&gt;
&lt;li&gt;[ ] Verify MCP Tool Search enabled (0 min, automatic)&lt;/li&gt;
&lt;li&gt;[ ] Review model usage, set up tiering (30 min)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 2: Intermediate (4-8 hours total)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Set up context snapshots (1 hour)&lt;/li&gt;
&lt;li&gt;[ ] Build basic code index (2-4 hours)&lt;/li&gt;
&lt;li&gt;[ ] Implement task decomposition discipline (0 min, behavior change)&lt;/li&gt;
&lt;li&gt;[ ] Add basic hooks (2 hours)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Advanced (14-21 hours total)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Implement token budgeting (4 hours)&lt;/li&gt;
&lt;li&gt;[ ] Convert docs to structured markdown (4-6 hours)&lt;/li&gt;
&lt;li&gt;[ ] Set up conversation memory system (2-3 hours)&lt;/li&gt;
&lt;li&gt;[ ] Build tool-first MCP servers (4-8 hours)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Month 2: Production Scale (Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Multi-agent architecture (8-16 hours)&lt;/li&gt;
&lt;li&gt;[ ] Advanced RAG with reranking (8-12 hours)&lt;/li&gt;
&lt;li&gt;[ ] Automated compression pipeline (4-6 hours)&lt;/li&gt;
&lt;li&gt;[ ] Full monitoring dashboard (4-8 hours)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Mental Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stop thinking:&lt;/strong&gt; "How do I make Claude understand my codebase?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start thinking:&lt;/strong&gt; "How do I give Claude exactly what it needs, nothing more?"&lt;/p&gt;

&lt;p&gt;Because in modern AI development:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context is the real programming language.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every token you send is a line of code in that language.&lt;/p&gt;

&lt;p&gt;Write it carefully.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The New Engineering Discipline
&lt;/h2&gt;

&lt;p&gt;Token optimization isn't a nice-to-have. It's a core engineering discipline, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory management in C&lt;/li&gt;
&lt;li&gt;Query optimization in databases&lt;/li&gt;
&lt;li&gt;Bundle size in frontend development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The teams that master it will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ship 3-5× faster&lt;/li&gt;
&lt;li&gt;Spend 60-90% less
&lt;/li&gt;
&lt;li&gt;Never hit rate limits&lt;/li&gt;
&lt;li&gt;Keep top models actively predicting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The teams that ignore it will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Burn budgets&lt;/li&gt;
&lt;li&gt;Hit limits constantly&lt;/li&gt;
&lt;li&gt;Force developers to Haiku&lt;/li&gt;
&lt;li&gt;Wonder why "AI didn't work for us"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The choice is yours.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Official Documentation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code Docs: &lt;a href="https://code.claude.com/docs" rel="noopener noreferrer"&gt;https://code.claude.com/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Protocol: &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompt Engineering: &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering" rel="noopener noreferrer"&gt;https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompt Caching: &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;https://platform.claude.com/docs/en/build-with-claude/prompt-caching&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG &amp;amp; Retrieval:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contextual Retrieval: &lt;a href="https://www.anthropic.com/news/contextual-retrieval" rel="noopener noreferrer"&gt;https://www.anthropic.com/news/contextual-retrieval&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;RAG Guide: &lt;a href="https://www.promptingguide.ai/research/rag" rel="noopener noreferrer"&gt;https://www.promptingguide.ai/research/rag&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain RAG: &lt;a href="https://python.langchain.com/docs/use_cases/question_answering/" rel="noopener noreferrer"&gt;https://python.langchain.com/docs/use_cases/question_answering/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ccusage (token tracking): &lt;a href="https://github.com/ryoppippi/ccusage" rel="noopener noreferrer"&gt;https://github.com/ryoppippi/ccusage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;McPick (MCP management): &lt;a href="https://github.com/scottspence/mcpick" rel="noopener noreferrer"&gt;https://github.com/scottspence/mcpick&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code Kit: &lt;a href="https://claudefa.st" rel="noopener noreferrer"&gt;https://claudefa.st&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;What's your biggest token waste? Drop your optimization wins below. 👇&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Andrei Nita&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Chief Technology Officer&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Building production AI systems at scale&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>productivity</category>
      <category>software</category>
    </item>
    <item>
      <title>AI didn’t replace developers.</title>
      <dc:creator>Andrei Nita</dc:creator>
      <pubDate>Thu, 12 Mar 2026 20:50:35 +0000</pubDate>
      <link>https://dev.to/andrei_nita/ai-didnt-replace-developers-6j1</link>
      <guid>https://dev.to/andrei_nita/ai-didnt-replace-developers-6j1</guid>
      <description>&lt;p&gt;It just changed the workflow:&lt;/p&gt;

&lt;p&gt;1️⃣ Ask AI to build everything&lt;br&gt;&lt;br&gt;
2️⃣ Hit usage limits&lt;br&gt;&lt;br&gt;
3️⃣ Wait for the reset&lt;br&gt;&lt;br&gt;
4️⃣ Ask AI to fix everything it built&lt;/p&gt;

&lt;p&gt;Modern software engineering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d3y5nb79qg87b6tbc8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d3y5nb79qg87b6tbc8v.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Return of the QA Tester</title>
      <dc:creator>Andrei Nita</dc:creator>
      <pubDate>Thu, 12 Mar 2026 20:47:12 +0000</pubDate>
      <link>https://dev.to/andrei_nita/return-of-the-qa-tester-1kih</link>
      <guid>https://dev.to/andrei_nita/return-of-the-qa-tester-1kih</guid>
      <description>&lt;p&gt;AI: “Here’s 10,000 lines of code.” 🤖&lt;/p&gt;

&lt;p&gt;Production: “Something is catastrophically broken.” 🔥&lt;/p&gt;

&lt;p&gt;Developer:&lt;br&gt;
“…great. fantastic. love that for us.”&lt;/p&gt;

&lt;p&gt;Remember when developers hated writing tests?&lt;/p&gt;

&lt;p&gt;Turns out tests were the only witnesses to the crime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodoufhw7js6waf6jt7b7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodoufhw7js6waf6jt7b7.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>qa</category>
    </item>
  </channel>
</rss>
