DEV Community

houseofmvps
Slash 90% of Tokens Per Session With This Pre-Compiled Wiki (Karpathy Inspired Workflow)

Last June, Karpathy posted something that got 2.3 million views. He said context engineering matters more than prompt engineering: "the delicate art and science of filling the context window with just the right information for the next step."

Then last week he posted about building structured markdown knowledge bases that LLMs can reason over. Also went viral.

Both ideas point at the same problem: your AI is only as good as the context you give it. And right now, most of us are giving it terrible context.

the problem nobody's measuring

Every time you start a Claude Code session, it spends the first chunk of time just figuring out your project. Reading files. Grepping for routes. Opening package.json. Exploring the import graph. Finding your schema. Checking your env vars.

I started measuring how many tokens this costs. On a real 92-file monorepo (Hono + Drizzle, 4 workspaces): ~66,000 tokens. Every session. Not cached between sessions.

On a 53-file project: ~46,000 tokens. On a 40-file project: ~26,000.

That's your AI burning through your context window (the "RAM" in Karpathy's analogy) just to understand the project before it does anything you actually asked for.

what context engineering looks like in practice

If you follow Karpathy's framing, the solution is obvious: don't let your AI waste context exploring. Pre-compile the context it needs and hand it over at session start.

That's what I built. `npx codesight` scans your codebase via AST parsing and generates a structured context map (routes, schema, components, dependency graph, env vars, middleware, hot files) in one markdown file your AI reads immediately.

```bash
npx codesight
```

One command. Zero dependencies. It borrows TypeScript from your own node_modules for the compiler API. Falls back to regex for non-TS projects.

the numbers

Real production codebases. Not toy demos.

| Project | Files | codesight output | Manual exploration | Reduction |
|---|---|---|---|---|
| SaaS A (Hono + Drizzle monorepo) | 92 | 5,129 tokens | ~66,040 tokens | 12.9x |
| SaaS B (raw HTTP + Drizzle) | 53 | 3,945 tokens | ~46,020 tokens | 11.7x |
| SaaS C (Hono + Drizzle, 3 workspaces) | 40 | 2,865 tokens | ~26,130 tokens | 9.1x |

Average: 11.2x. Your AI reads 3-5K tokens of structured context instead of burning 26-66K tokens exploring.

why AST matters

Regex-based tools guess at your code structure. AST parsing actually understands it.

When TypeScript is in your project, codesight uses the real TypeScript compiler API. This means:

- Follows `router.use('/prefix', subRouter)` chains (regex misses nested routers)
- Combines NestJS `@Controller('users')` + `@Get(':id')` into `/users/:id`
- Parses tRPC `router({ users: userRouter })` nesting correctly
- Extracts Drizzle field types from `.primaryKey().notNull()` chains
- Detects middleware in route handler chains: `app.get('/path', auth, handler)`
- Filters out false positives like `c.get('userId')` that regex would match as routes

Zero false positives across all three benchmark projects.
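
To see why the last point matters, here's a toy illustration (not codesight's actual implementation, just a sketch of the failure mode): a naive regex scanner flags `c.get('userId')` as a route, while even a minimal receiver check rules it out.

```typescript
// Toy illustration of the regex false-positive problem.
// NOT codesight's implementation -- a hypothetical sketch only.

const source = `
app.get('/users/:id', auth, handler);
const userId = c.get('userId'); // context lookup, not a route
`;

// Naive approach: anything that looks like ".get('...')" is a route.
const naive = Array.from(source.matchAll(/\.get\(\s*['"]([^'"]+)['"]/g))
  .map((m) => m[1]);

// Slightly smarter: require a known app/router receiver and a
// path-shaped string (starts with "/").
const stricter = Array.from(
  source.matchAll(/\b(app|router)\.get\(\s*['"](\/[^'"]*)['"]/g)
).map((m) => m[2]);

console.log(naive);    // ['/users/:id', 'userId']  <- false positive
console.log(stricter); // ['/users/:id']
```

A real AST walk goes further still: it resolves what `app` actually is, rather than trusting the variable name.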

25+ frameworks detected. 8 ORMs parsed. React/Vue/Svelte components with props.

blast radius — context engineering for changes

Karpathy's framing isn't just about initial context. It's about giving your AI the right information for "the next step." When the next step is changing a file, your AI needs to know what breaks.

```bash
npx codesight --blast src/db/index.ts
```

BFS through the import graph. Shows every transitively affected file, route, and model.
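
The idea is simple enough to sketch. This is a minimal, hypothetical version (codesight builds the real graph from AST-parsed imports): BFS over reverse dependency edges, collecting every file that transitively imports the one you're changing.

```typescript
// Sketch of a blast-radius query as BFS over a reversed import graph.
// The graph below is hypothetical example data.

// edges: module -> modules that import it (reverse dependency edges)
const importedBy: Record<string, string[]> = {
  'src/db/index.ts': ['src/routes/users.ts', 'src/routes/posts.ts'],
  'src/routes/users.ts': ['src/app.ts'],
  'src/routes/posts.ts': ['src/app.ts'],
  'src/app.ts': [],
};

function blastRadius(start: string): string[] {
  const affected = new Set<string>();
  const queue = [start];
  while (queue.length > 0) {
    const file = queue.shift()!;
    for (const dependent of importedBy[file] ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent); // transitively affected
        queue.push(dependent);
      }
    }
  }
  return Array.from(affected);
}

console.log(blastRadius('src/db/index.ts'));
// app.ts is reached two hops out via either route module
```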

On BuildRadar, a blast query on the database module correctly identified 10 affected files, 33 routes, and all 12 models. Three hops deep.

Your AI reads this before touching the file. That's context engineering applied to refactoring.

the wiki layer (Karpathy's latest idea, automated)

Karpathy's April 3rd post was about structured markdown wikis that LLMs can reason over. codesight v1.6.2 added `--wiki`, which does exactly this for your codebase:

```bash
npx codesight --wiki
```

It generates a wiki knowledge base in `.codesight/wiki/` — an `index.md` (~200 tokens) plus individual articles per topic. Your AI reads the index at session start, then pulls the one relevant article for each question.

Without codesight: AI reads 26-66K tokens exploring.
With codesight: AI reads 3-5K tokens (the full map).
With --wiki: AI reads ~200 tokens at start, then ~160-350 per question.

Combined reduction: ~91x.
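
The factors follow directly from the numbers above. A quick sanity check using the SaaS A figures (assuming two wiki lookups per session, at the midpoint of the quoted ~160-350 per-question range — the exact factor shifts with how many questions a session asks):

```typescript
// Back-of-the-envelope check of the quoted reductions (SaaS A numbers).
const exploration = 66_040; // manual exploration tokens per session
const fullMap = 5_129;      // full codesight context map
const wikiIndex = 200;      // wiki index read at session start
const perQuestion = 255;    // midpoint of the quoted ~160-350 range

console.log((exploration / fullMap).toFixed(1)); // "12.9"

const wikiSession = wikiIndex + 2 * perQuestion; // two lookups: 710 tokens
console.log(Math.round(exploration / wikiSession)); // 93, near the ~91x claim
```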

it generates context for everything

One command creates context files for every major AI tool:

```bash
npx codesight --init
```
- `CLAUDE.md` for Claude Code
- `.cursorrules` for Cursor
- `codex.md` for OpenAI Codex
- `AGENTS.md` for Codex agents
- `.github/copilot-instructions.md` for GitHub Copilot

Each pre-filled with your actual project structure.

MCP server mode

```bash
npx codesight --mcp
```

Runs as a Model Context Protocol server. Your AI queries specific context on demand instead of loading everything. Session caching — first call scans, subsequent calls return instantly.
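
In Claude Code, a stdio MCP server like this is typically registered in a `.mcp.json` at the project root. A sketch (the server name is arbitrary; check codesight's README for the exact invocation it expects):

```json
{
  "mcpServers": {
    "codesight": {
      "command": "npx",
      "args": ["codesight", "--mcp"]
    }
  }
}
```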

the relationship to caveman mode

The caveman prompt trick reduces output tokens (what the AI says back).

codesight reduces input/exploration tokens (what the AI reads to understand your project).

Caveman = make the AI talk less.
codesight = give the AI exactly what it needs to know.

They're complementary. Use both.

try it

```bash
npx codesight
```

Zero deps. MIT. ~200ms scan time. Works with any Node.js project (and has regex fallback for Python, Go, Ruby, Rust, Java, Kotlin, Elixir, PHP).

[github.com/Houseofmvps/codesight](https://github.com/Houseofmvps/codesight)

If it saves you tokens, a star helps others find it too.


Karpathy defined the skill. This tool automates it.

Built by Kailesk Khumar, solo founder of houseofmvps.com.
