Creatman
The context problem nobody talks about: why AI coding agents waste 80% of tokens on files they already read yesterday

Every AI coding agent — Claude Code, Cursor, Codex, Gemini CLI — starts every session completely blind. It doesn't know your projects. It doesn't know your servers. It doesn't remember that you spent three hours yesterday debugging the payment system.

So it greps. It reads file after file. It SSHs into your server to check what's running. It asks you "which project?" for the hundredth time. By the time it's oriented, you've burned half your context window on reconnaissance.

I manage 15 projects across 4 VPS servers. That reconnaissance was burning hours of my time and most of the context window every day. So I built a fix.

The pattern: hierarchical context

The idea is dead simple. Instead of the agent searching bottom-up (grep everything → read files → build understanding), give it a top-down map:

Level 0: Project Map    — knows ALL your projects       (~2KB, always loaded)
Level 1: Project Detail — architecture of one project   (~5KB, on demand)
Level 2: Source Files   — actual code                   (only when needed)

That's it. Three files instead of fifty. The agent reads the map, knows where to look, and goes straight to the answer.

What this looks like in practice

Without hierarchy

You: "What payment methods does Project A support?"

Agent:

  1. Greps C:\Users\ for anything payment-related (3 tool calls)
  2. Finds 6 candidate files, reads them all (6 tool calls)
  3. Realizes it's the wrong project, searches more (4 tool calls)
  4. SSHs into your server to read the production config (2 tool calls)
  5. Finally answers — 15+ tool calls, 80K+ tokens, 8 minutes

With hierarchy

You: "What payment methods does Project A support?"

Agent:

  1. Reads Level 0 — sees Project A is at ~/projects/a/ (already loaded, 0 calls)
  2. Reads ~/projects/a/CLAUDE.md — sees "Payments: Stars + CryptoCloud" (1 call)
  3. Answers immediately — 1 tool call, ~15K tokens, 10 seconds

Same question. Same agent. Same model. The only difference is a 2KB file that says "here's where everything is."

Setting it up (10 minutes)

Step 1: Create your project map

Add this to your agent's global instruction file (~/.claude/CLAUDE.md for Claude Code, .cursorrules for Cursor, AGENTS.md for Codex):

## Project Map

| Project | Local path | Server |
|---------|-----------|--------|
| **Auth Service** | `~/projects/auth/` | prod-1:/root/auth/ |
| **Landing Page** | `~/projects/landing/` | Cloudflare Pages |
| **Mobile App** | `~/projects/mobile/` | — |
| **Admin Panel** | `~/projects/admin/` | prod-1 (Docker) |

### Servers
| Name | IP | Key |
|------|-----|-----|
| prod-1 | x.x.x.x | ~/.ssh/prod |
| staging | y.y.y.y | ~/.ssh/staging |

### Rule
Read project CLAUDE.md before reading source files.

This is your Level 0. It's ~2KB. The agent loads it automatically at the start of every session.
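If you have many projects, you can generate the first draft of this table instead of typing it. The sketch below scans a projects directory and emits the markdown rows; the directory layout and function name are assumptions, and the Server column can't be discovered locally, so it's left as a placeholder to fill in by hand.

```python
from pathlib import Path

def build_project_map(projects_root: str) -> str:
    """Emit a markdown table of projects found under projects_root.

    A rough starting point for the Level 0 map: one row per
    subdirectory, with the Server column left for manual editing.
    """
    rows = ["| Project | Local path | Server |",
            "|---------|-----------|--------|"]
    root = Path(projects_root).expanduser()
    if root.is_dir():
        for p in sorted(root.iterdir()):
            if p.is_dir() and not p.name.startswith("."):
                rows.append(f"| **{p.name}** | `{p}/` | — |")
    return "\n".join(rows)

# Usage: print(build_project_map("~/projects")), then paste the
# output into your global instruction file and fill in the servers.
```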

Step 2: Add CLAUDE.md to each project

In each project root, create a context file:

# Auth Service — CLAUDE.md

## Status: LIVE
API for user authentication. Handles OAuth, JWT, rate limiting.

## Tech Stack
Python 3.12, FastAPI, PostgreSQL, Redis

## Key Files
- main.py — entry point, route registration
- auth/jwt.py — token generation and validation  
- auth/oauth.py — Google/GitHub OAuth providers
- models/user.py — SQLAlchemy user model

## Deployment
- Server: prod-1 (x.x.x.x)
- Service: auth-service.service
- Logs: journalctl -u auth-service -f

This is Level 1. ~3-5KB per project. The agent reads it when you mention the project and immediately knows the architecture.
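Writing these by hand is fine for a couple of projects; for fifteen, a stub generator helps. This is a minimal sketch (the helper name and template are mine, not part of any tool) that drops an empty CLAUDE.md skeleton into a project for you to fill in:

```python
from pathlib import Path

# Skeleton mirroring the Level 1 sections above; every value is a
# TODO because only you know the real architecture.
TEMPLATE = """\
# {name} — CLAUDE.md

## Status: TODO
One-line description of what this service does.

## Tech Stack
TODO

## Key Files
- TODO

## Deployment
- Server: TODO
- Service: TODO
- Logs: TODO
"""

def scaffold_claude_md(project_dir: str) -> Path:
    """Create a CLAUDE.md stub in project_dir unless one already exists."""
    root = Path(project_dir).expanduser()
    target = root / "CLAUDE.md"
    if not target.exists():
        target.write_text(TEMPLATE.format(name=root.name.title()))
    return target
```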

Step 3 (optional): Add Graphify for code navigation

Graphify turns your codebase into a knowledge graph. Run it once per project:

pip install graphify
cd ~/projects/auth

In your AI agent:

/graphify .
graphify claude install

Now the agent has Level 1.5 — a structural map of your code. Before grepping, it consults the graph and knows exactly which file to read.
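To make "consults the graph" concrete: a code graph is just nodes (functions, classes) with file locations, so answering "where is X defined?" becomes a lookup instead of a grep. Graphify's actual output format isn't shown here, so this sketch assumes a simple hypothetical `graph.json` shape; adapt the keys to whatever your indexer emits.

```python
import json
from pathlib import Path

def find_definition(graph_path: str, symbol: str) -> list[str]:
    """Return the files that define `symbol`, according to the graph.

    Assumes a graph.json shaped like:
      {"nodes": [{"name": "...", "kind": "function", "file": "..."}]}
    This is a hypothetical format for illustration, not Graphify's
    documented schema.
    """
    graph = json.loads(Path(graph_path).read_text())
    return [n["file"] for n in graph.get("nodes", [])
            if n.get("name") == symbol]
```

One lookup like this replaces the grep-everything step: the agent reads one small JSON file instead of walking the source tree.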

Step 4 (optional): Connect Claude Desktop via MCP

If you use Claude Desktop, add Graphify as an MCP server:

{
  "mcpServers": {
    "graphify": {
      "command": "python",
      "args": ["-m", "graphify.serve", "/path/to/graphify-out/graph.json"]
    }
  }
}

Claude Desktop then calls query_graph automatically when you ask about your projects. No prompting needed — it just works.

Real test results

I ran the same questions with and without the hierarchy. Same model (Haiku — the cheapest), same machine, same projects.

"What is the architecture of Project A?"

| | Blind agent | With hierarchy |
|---|---|---|
| Tool calls | 12 | 1 |
| Behavior | Grep → read 4 files → build answer | Read CLAUDE.md → answer |
| Accuracy | Correct | Correct |

"Which of my projects use library X?"

| | Blind agent | With hierarchy |
|---|---|---|
| Tool calls | 44 | 2 |
| Behavior | Scan entire disk | Targeted grep in known paths |
| Accuracy | Missed 1 of 3 projects | Found all 3 |

"Where is Project B deployed? Service name? Logs?"

| | Blind agent | With hierarchy |
|---|---|---|
| Tool calls | 9 | 0 |
| Behavior | Read configs + SSH into server | Answered from context |
| SSH needed | Yes | No |

In the second test, the blind agent actually missed a project that the hierarchy-equipped agent found. More context didn't just save tokens — it produced better answers.

Why this works

AI coding agents are fundamentally search engines. When you ask a question, they search for the answer. The quality of the answer depends on the quality of the search.

Without context, the agent searches blind: grep everything, read everything, hope to find the right files. With a hierarchy, the search is directed: check the map, go to the right project, read the right file.

This isn't a new idea. It's how humans navigate codebases — you don't grep -r your company's entire monorepo every time someone asks about a service. You know which repo, which module, which file. The hierarchy gives the agent the same knowledge.

What this is NOT

  • Not a framework. It's a pattern — three markdown files.
  • Not a token compression tool. The savings come from not reading files, not from compressing them.
  • Not a replacement for Graphify. Graphify handles code-level navigation. This handles project-level navigation. They complement each other.
  • Not magic. If your project doesn't have a CLAUDE.md, the agent still greps. You have to write the context files.

The full setup

Templates, scripts, and multi-platform guides:

github.com/CreatmanCEO/ai-context-hierarchy

Includes:

  • Level 0 and Level 1 templates
  • Conversation indexing scripts (Claude Code sessions + Desktop export → searchable markdown)
  • VPS sync command template
  • Platform-specific setup for Claude Code, Cursor, Codex, Gemini CLI

Bonus: conversation indexing

Your past conversations with the AI contain architectural decisions, debugging sessions, deployment notes. But the agent can't search them.

The repo includes parsers that convert Claude Code session logs (.jsonl) and Claude Desktop exported chats into markdown files with YAML frontmatter:

---
title: "Fixed payment webhook"
date: 2026-04-14
project: auth-service
topics: ["webhook", "cryptocloud", "cloudflare"]
files_touched: ["payments.py", "webhook.py"]
---

Index these with Graphify and the agent can find "what did we decide about the payment flow last week" without you re-explaining it.
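The conversion itself is straightforward. This is a simplified sketch of the idea, not the repo's actual parser: it assumes each `.jsonl` line is an object with `role` and `content` keys, which is a stand-in for the real session log schema (that schema varies between tools and versions).

```python
import json
from pathlib import Path

def session_to_markdown(jsonl_path: str, title: str, project: str) -> str:
    """Render a session log as markdown with YAML frontmatter.

    Assumes each .jsonl line is an object with "role" and "content"
    keys; a simplified stand-in for real Claude Code session logs.
    """
    turns = [json.loads(line)
             for line in Path(jsonl_path).read_text().splitlines()
             if line.strip()]
    front = f'---\ntitle: "{title}"\nproject: {project}\n---\n\n'
    body = "\n\n".join(f"**{t['role']}:** {t['content']}" for t in turns)
    return front + body
```

The frontmatter fields are what make the files searchable later: an indexer can filter by `project` or `title` without parsing the conversation body.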

Start here

  1. Write a project map (5 minutes)
  2. Add CLAUDE.md to your main project (5 minutes)
  3. Ask the agent about your project in a new session
  4. Watch it answer without grepping

That's the whole thing. No dependencies, no installation, no configuration. Just three markdown files that turn your blind agent into one that knows where to look.


Built with Graphify for code-level navigation. Source and templates: ai-context-hierarchy.
