DEV Community

Selma Guedidi

Your AI Coding Agent Is Wasting 70x More Tokens Than It Needs To. Graphify Fixes That

Every AI coding assistant has the same blind spot: it answers questions about your codebase by grepping and reading files one at a time. That works for "where is this function defined?" It falls apart for "what connects the auth module to the database pool?", the kind of question whose answer is spread across ten files, three markdown docs, and a design PDF nobody has opened since 2024.

There is a second cost hiding in that workflow: tokens. Every file your assistant opens gets pasted into its context window. Ask three architecture questions in a Claude Code session on a large repo and you can burn through a significant share of the 200K context on raw file contents alone, then hit the limit mid-conversation and lose your thread.

Graphify fixes both problems by building a knowledge graph of your entire project (code, docs, PDFs, images, even videos) that your assistant queries instead of re-reading files. You trigger it with one command inside your AI coding assistant:

/graphify .

It works in Claude Code, Codex, OpenCode, Kilo Code, Cursor, Gemini CLI, GitHub Copilot CLI, VS Code Copilot Chat, Aider, Amp, OpenClaw, Factory Droid, Trae, Hermes, Kimi Code, Kiro, Pi, Devin CLI, and Google Antigravity.

How it works: three passes

When you run /graphify ., files take different paths depending on their type. The design principle: use the cheapest, most private path available for each file.

Figure: Graphify extraction pipeline. Code goes through local tree-sitter AST extraction, docs and PDFs through semantic extraction on your assistant's model API, and video through local faster-whisper transcription; the results are merged, deduplicated, clustered, and written to graphify-out/.

Pass 1: code structure (free, no API calls). Tree-sitter parses your code files locally and extracts classes, functions, imports, call graphs, and inline comments. Coverage spans 36 grammars: Python, TypeScript, Go, Rust, Java, C/C++, C#, Swift, SQL, Terraform and more. SQL files get special treatment: tables, views, foreign keys, and JOIN relationships are extracted deterministically. Code never goes to an LLM in the normal pipeline; a code-only corpus builds fully offline with zero API cost.
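
To make Pass 1 concrete, here is a minimal sketch of deterministic structure extraction. It uses Python's built-in ast module as a stand-in for tree-sitter and handles Python source only; the function name extract_structure and the output shape are assumptions for illustration, not Graphify's actual schema.

```python
import ast

def extract_structure(source: str) -> dict:
    """Collect classes, functions, imports, and call targets from Python source."""
    tree = ast.parse(source)
    found = {"classes": [], "functions": [], "imports": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or ".")
        elif isinstance(node, ast.Call):
            # Only plain-name and attribute calls are recorded in this sketch.
            if isinstance(node.func, ast.Name):
                found["calls"].append(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                found["calls"].append(node.func.attr)
    return found

SAMPLE = '''
import hashlib
from pathlib import Path

class Cache:
    def key(self, path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
'''

structure = extract_structure(SAMPLE)
print(structure)
```

Nothing here needs a model call: the parser either finds a class, import, or call site or it does not, which is why edges from this kind of pass can be treated as certain.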

Pass 2: video and audio (local, no API calls). Media files are transcribed with faster-whisper on your machine. A nice touch: the transcription prompt is seeded with the most-connected concepts from your code graph so far, so domain terms come out spelled correctly. Transcripts are cached, and re-runs skip already-processed files.
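
The prompt-seeding idea can be sketched in a few lines. This is an illustration under assumptions: seed_prompt, the "Glossary:" wording, and the edge format are hypothetical, and how Graphify actually ranks and formats the terms is not documented here. The commented call at the end uses faster-whisper's initial_prompt parameter.

```python
from collections import Counter

def seed_prompt(edges, top_n=5):
    """Build a transcription prompt from the most-connected node labels."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by degree (descending), then by name, so the result is deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    terms = [name for name, _ in ranked[:top_n]]
    return "Glossary: " + ", ".join(terms)

edges = [
    ("DigestAuth", "Response"),
    ("DigestAuth", "Request"),
    ("Client", "DigestAuth"),
    ("Client", "Response"),
]
prompt = seed_prompt(edges, top_n=3)
print(prompt)  # Glossary: DigestAuth, Client, Response

# With faster-whisper installed, the prompt could be passed like this:
#   from faster_whisper import WhisperModel
#   segments, info = WhisperModel("base").transcribe("talk.wav", initial_prompt=prompt)
```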

Pass 3: docs, papers, images (model API, costs tokens once). Your assistant's model runs in parallel subagents over markdown, PDFs, images, and transcripts. Each subagent outputs a JSON fragment of nodes and edges, and the fragments merge into one graph. Inside your IDE this uses the session you are already paying for, so no extra API key is needed.
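
Merging subagent output can be pictured as a set union. The sketch below assumes each fragment is a JSON object with "nodes" (keyed by "id") and "edges" (keyed by source, relation, target); those field names and the merge_fragments helper are hypothetical, chosen only to show the shape of the step.

```python
import json

def merge_fragments(fragments):
    """Union nodes by id and edges by (source, relation, target)."""
    nodes, edges = {}, {}
    for raw in fragments:
        fragment = json.loads(raw)
        for node in fragment.get("nodes", []):
            nodes.setdefault(node["id"], node)  # first definition wins
        for edge in fragment.get("edges", []):
            key = (edge["source"], edge["relation"], edge["target"])
            edges.setdefault(key, edge)
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}

fragment_a = json.dumps({
    "nodes": [{"id": "auth", "label": "Auth module"},
              {"id": "pool", "label": "DatabasePool"}],
    "edges": [{"source": "auth", "relation": "uses", "target": "pool"}],
})
fragment_b = json.dumps({
    "nodes": [{"id": "pool", "label": "DatabasePool"},
              {"id": "design", "label": "Design PDF"}],
    "edges": [
        {"source": "auth", "relation": "uses", "target": "pool"},
        {"source": "design", "relation": "describes", "target": "auth"},
    ],
})

graph = merge_fragments([fragment_a, fragment_b])
print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "edges")
```

Because the merge is keyed rather than positional, the order in which parallel subagents finish does not change which nodes and edges end up in the graph.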

After extraction, entities are deduplicated (including "ghost duplicates", the same symbol found once by AST parsing and once by semantic extraction), and communities are detected with the Leiden algorithm, which clusters nodes by edge density. No embeddings, no vector database: the graph structure itself is the similarity signal.
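
Ghost-duplicate removal might look like the following sketch. It assumes duplicates can be recognized by a shared (label, file) pair and that the AST-derived copy should survive; the "origin" field, the id prefixes, and collapse_ghosts are all hypothetical, and Graphify's real matching rules may differ.

```python
def collapse_ghosts(nodes, edges):
    """Merge nodes that share (label, file), preferring AST-derived copies."""
    canonical = {}  # (label, file) -> surviving node id
    remap = {}      # every node id -> surviving node id
    # Visit AST nodes first so they become the canonical copy.
    for node in sorted(nodes, key=lambda n: n["origin"] != "ast"):
        key = (node["label"].lower(), node["file"])
        survivor = canonical.setdefault(key, node["id"])
        remap[node["id"]] = survivor
    kept = [n for n in nodes if remap[n["id"]] == n["id"]]
    # Rewire edges onto survivors; the set drops edges that become identical.
    rewired = {(remap[s], rel, remap[t]) for s, rel, t in edges}
    return kept, sorted(rewired)

nodes = [
    {"id": "sem:RateLimiter", "label": "RateLimiter", "file": "limits.py", "origin": "semantic"},
    {"id": "ast:RateLimiter", "label": "RateLimiter", "file": "limits.py", "origin": "ast"},
    {"id": "doc:design", "label": "Design notes", "file": "DESIGN.md", "origin": "semantic"},
]
edges = [
    ("doc:design", "describes", "sem:RateLimiter"),
    ("doc:design", "describes", "ast:RateLimiter"),
]

kept, rewired = collapse_ghosts(nodes, edges)
print([n["id"] for n in kept])
print(rewired)
```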

Every relationship also carries a confidence tag, so you always know what was found versus guessed:

| Tag | Meaning |
| --- | --- |
| EXTRACTED | Found directly in the source (a function call, an import). Always confidence 1.0 |
| INFERRED | A reasonable model inference, scored 0.55 (speculative) to 0.95 (near-certain) |
| AMBIGUOUS | Uncertain, flagged in the report for manual review |
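
One way to picture the three tags above is as a validated edge record. This is a sketch, not Graphify's data model: the Edge class and needs_review helper are hypothetical, and treating AMBIGUOUS edges as not requiring a score is an assumption, since the article gives no range for them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    tag: str                       # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    confidence: Optional[float] = None

    def __post_init__(self):
        if self.tag == "EXTRACTED":
            ok = self.confidence == 1.0
        elif self.tag == "INFERRED":
            ok = self.confidence is not None and 0.55 <= self.confidence <= 0.95
        elif self.tag == "AMBIGUOUS":
            ok = True  # assumption: flagged edges carry no required score
        else:
            ok = False
        if not ok:
            raise ValueError(f"invalid tag/confidence: {self.tag} {self.confidence}")

def needs_review(edges):
    """Return the edges a human should look at."""
    return [edge for edge in edges if edge.tag == "AMBIGUOUS"]

edges = [
    Edge("UserService", "DatabasePool", "calls", "EXTRACTED", 1.0),
    Edge("auth.md", "UserService", "describes", "INFERRED", 0.8),
    Edge("diagram.png", "RateLimiter", "depicts", "AMBIGUOUS"),
]
print([edge.source for edge in needs_review(edges)])
```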

There is also a SHA256 content cache in graphify-out/cache/: re-runs fingerprint every file and only new or modified files go through extraction again. Parallel AST extraction (via ProcessPoolExecutor, bypassing Python's GIL) ran about 1.66x faster than sequential on an 84-file corpus.
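
The caching idea is easy to sketch with the standard library. The manifest layout (a JSON map of relative path to SHA256 digest) and the function names are assumptions for illustration; the real cache format in graphify-out/cache/ may differ.

```python
import hashlib
import json
from pathlib import Path

def fingerprint(path: Path) -> str:
    """SHA256 of the file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(root: Path, manifest_path: Path) -> list:
    """Return relative paths that are new or modified, then update the manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path != manifest_path:
            new[path.relative_to(root).as_posix()] = fingerprint(path)
    manifest_path.write_text(json.dumps(new, indent=2))
    return [rel for rel, digest in new.items() if old.get(rel) != digest]
```

On a first run every file is reported; on an unchanged second run the list is empty, so nothing would be sent through extraction again. Storing relative paths keeps the manifest valid when the repository is checked out somewhere else.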

What you get

Everything lands in a single graphify-out/ directory:

The three outputs of /graphify: graph.html for interactive browsing, GRAPH_REPORT.md for highlights including god nodes and surprising connections, graph.json for queries, MCP serving and exports

The report is the part that surprises people. It surfaces god nodes (the most-connected concepts everything flows through), surprising connections (links between things living in different modules, ranked by how unexpected they are), and design rationale mined from # NOTE:, # WHY: and # HACK: comments and docstrings, extracted as separate nodes linked to the code they explain.
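
Rationale mining from tagged comments can be approximated with a regular expression. This sketch is an assumption about the mechanics, not Graphify's implementation: mine_rationale is a hypothetical helper, it ignores docstrings, and a plain regex would also match a tag that happens to sit inside a string literal.

```python
import re

RATIONALE = re.compile(r"#\s*(NOTE|WHY|HACK):\s*(.+)")

def mine_rationale(source: str):
    """Return (line_number, kind, text) for each tagged comment."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = RATIONALE.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip()))
    return hits

SAMPLE = """\
def retry(call):
    # WHY: the upstream API occasionally drops requests
    for attempt in range(3):
        try:
            return call()
        except TimeoutError:
            pass  # HACK: swallow timeouts until the client exposes retries
"""

for number, kind, text in mine_rationale(SAMPLE):
    print(f"line {number} [{kind}] {text}")
```

Each hit would become its own node, linked to the function or class that contains the line, which is what lets a later query surface the "why" next to the code.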

Token usage: the benchmark

This is the part that changes your day-to-day. Every file your assistant reads costs tokens twice: once against the context window, and once against your plan's usage limits.

Take Claude Code as the concrete case. The context window is 200K tokens, and Pro or Max plans meter your usage per session. When Claude answers a codebase question by reading files, each opened file stays in context for the rest of the conversation. A few architecture questions in, the window fills up, Claude compacts the conversation (losing detail), responses slow down, and you hit your usage limit hours earlier than you needed to. Most of those tokens were spent re-reading files that had not changed since the last question.

A knowledge graph flips the economics. Building the graph costs tokens once (Pass 3 only, and zero for pure code). After that, every question reads the compact graph instead of raw files, and the savings compound with every query. Here are the measured numbers from the project's benchmark corpora (each worked/ folder in the repo contains the raw inputs and actual outputs so you can reproduce them):

Figure: Token benchmark. 71.5x fewer tokens per query on a 52-file mixed corpus, 5.4x on a 4-file corpus, and roughly 1x on a tiny 6-file library.

| Corpus | Files | Tokens per query vs raw files |
| --- | --- | --- |
| Karpathy repos + 5 papers + 4 images | 52 | 71.5x fewer |
| graphify source + Transformer paper | 4 | 5.4x fewer |
| httpx (synthetic Python library) | 6 | ~1x |

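The pattern: the reduction grows with the amount of content in the corpus, not simply with the file count (the 4-file corpus includes a full paper and still compresses 5.4x, while the 6-file library barely compresses at all). Six small files already fit in any context window, so the graph's value there is structural clarity rather than compression. At 52 files the savings are dramatic, and real projects are far bigger than 52 files.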

What this means in a Claude Code session, concretely: suppose you ask "how does authentication flow through this service?" on a mid-size repo. Without a graph, Claude greps, then reads 20 or 30 files at a thousand-plus tokens each: easily 30K to 50K tokens for one question, or 15 to 25 percent of the 200K context window. Ask four questions like that and the session compacts. With a graph, graphify query "how does authentication flow through this service?" returns a scoped subgraph of the relevant nodes and edges, typically a few hundred tokens. Same answer quality, roughly 50x cheaper. Over a working day that is the difference between hitting your Claude usage limit at 11am and not hitting it at all, and your context window stays free for what actually matters: the code you are writing.
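
The arithmetic behind that estimate is worth spelling out. The per-file and per-answer figures below are assumptions picked from inside the ranges the paragraph gives ("a thousand-plus tokens each", "a few hundred tokens"); change them and the ratio moves accordingly.

```python
CONTEXT_WINDOW = 200_000     # tokens, as stated for Claude Code above
TOKENS_PER_FILE = 1_500      # assumption: "a thousand-plus tokens each"
GRAPH_ANSWER_TOKENS = 700    # assumption: upper end of "a few hundred"

def raw_cost(files_read: int) -> int:
    """Tokens spent when the assistant reads files directly."""
    return files_read * TOKENS_PER_FILE

for files in (20, 30):
    cost = raw_cost(files)
    share = cost / CONTEXT_WINDOW
    ratio = cost / GRAPH_ANSWER_TOKENS
    print(f"{files} files: {cost:,} tokens, {share:.1%} of context, {ratio:.0f}x a graph query")

# 20 files: 30,000 tokens, 15.0% of context, 43x a graph query
# 30 files: 45,000 tokens, 22.5% of context, 64x a graph query
```

Under these assumptions a single question costs between about 40x and 65x more without the graph, which brackets the "roughly 50x" figure.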

Concrete examples

Once the graph exists, you and your assistant can interrogate it directly from the terminal:

# trace a relationship across modules
graphify query "what connects DigestAuth to Response?"

# shortest path between two entities
graphify path "UserService" "DatabasePool"

# everything the graph knows about one concept
graphify explain "RateLimiter"

query returns the scoped subgraph relevant to the question. path walks the edges between two nodes and shows each hop with its relation (calls, imports, implements, semantically_similar_to) and its confidence tag. explain gathers a node's neighbors, its source file, and any design-rationale nodes linked to it.
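
What path does can be approximated with a breadth-first search. This is a sketch under assumptions: edges are (source, relation, target) tuples, traversal ignores edge direction, and shortest_path here is a hypothetical helper rather than the CLI's implementation.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns hops with relations."""
    neighbors = {}
    for source, relation, target in edges:
        neighbors.setdefault(source, []).append((target, relation))
        neighbors.setdefault(target, []).append((source, relation))
    queue = deque([start])
    came_from = {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, relation in neighbors.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, relation)
                queue.append(nxt)
    if goal not in came_from:
        return None  # the two entities are not connected
    hops = []
    node = goal
    while came_from[node] is not None:
        prev, relation = came_from[node]
        # Hops are listed in walking order, which may reverse an edge's direction.
        hops.append((prev, relation, node))
        node = prev
    return hops[::-1]

edges = [
    ("UserService", "calls", "SessionStore"),
    ("SessionStore", "imports", "DatabasePool"),
    ("UserService", "calls", "Logger"),
]
for source, relation, target in shortest_path(edges, "UserService", "DatabasePool"):
    print(f"{source} --{relation}--> {target}")
```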

The graph is not limited to your repo, either. Add a paper your architecture is based on, or a conference talk:

/graphify add https://arxiv.org/abs/1706.03762   # fetch a paper and link it in
/graphify add <youtube-url>                       # transcribe and add a video

Now graphify query "which parts of our code implement ideas from the attention paper?" has an actual basis to answer from.

To make your assistant prefer the graph over grepping, run the per-platform install once in your project (for example graphify claude install or graphify cursor install). On platforms with payload-bearing hooks (Claude Code, Gemini CLI), a hook fires before search-style tool calls and nudges the assistant toward graphify query. On the others, persistent instruction files (AGENTS.md, .cursor/rules/ with alwaysApply: true) provide the same query-first guidance.

Installation

You need Python 3.10+ and, ideally, uv or pipx.

⚠️ Naming gotcha: the PyPI package is graphifyy (double-y); other graphify* packages on PyPI are not affiliated. The CLI command is still graphify.

Two steps:

# Step 1: install the package (isolated env, recommended)
uv tool install graphifyy
# alternatives: pipx install graphifyy  |  pip install graphifyy

# Step 2: register the skill with your AI assistant
graphify install

Open your assistant, type /graphify ., and you are done. For other platforms, pass the platform name: graphify install --platform codex, graphify cursor install, or graphify install --platform agents for the cross-framework ~/.agents/skills/ location. Add --project to install into the current repository instead of your user profile.

A few platform quirks worth knowing:

  • PowerShell: use graphify ., not /graphify .; the leading slash is a path separator on Windows.
  • Codex: uses $graphify instead of /graphify, and needs multi_agent = true under [features] in ~/.codex/config.toml for parallel extraction.
  • command not found after install: the tool bin dir (~/.local/bin) is not on your PATH yet. Run uv tool update-shell (or pipx ensurepath) and open a new terminal.
  • uvx graphify fails: uv tool run reads the first word as a package name. Use uvx --from graphifyy graphify install.

Heavy dependencies are opt-in extras: uv tool install "graphifyy[pdf]" for PDF extraction, [office] for .docx/.xlsx, [video] for transcription, [mcp] for the MCP server, [neo4j] for graph database push, or [all] for everything.

Team setup: commit the map

graphify-out/ is meant to be committed. One person builds the graph, and everyone else's assistant reads it immediately after pulling. Nobody else spends a single token on extraction:

Figure: Team workflow. Dev A commits graphify-out/; teammates pull and their assistants read the graph instantly. A post-commit hook rebuilds AST-only at no API cost, and a union-merge driver keeps graph.json free of conflict markers.

The workflow:

  1. One person runs /graphify . and commits graphify-out/.
  2. Everyone pulls; their assistant reads the graph immediately.
  3. graphify hook install adds a post-commit hook that rebuilds the graph automatically (AST only, no API cost). It also registers a git merge driver that union-merges graph.json, so two devs committing in parallel never end up with conflict markers.
  4. When docs or papers change, /graphify --update re-extracts only the changed files, thanks to the SHA256 cache.

manifest.json stores keys as relative paths and re-anchors them on load, so committing it is safe and avoids a full rebuild on first checkout. Add graphify-out/cost.json to .gitignore (it is local-only). Ignore rules work like git: .gitignore is respected automatically, and a .graphifyignore (same syntax, including ! negation) is merged on top. The merged file can only exclude more: it never re-includes a path that .gitignore already excludes.

Serving the graph over MCP

For repeated, structured tool-call access, expose the graph as an MCP (Model Context Protocol) server:

# one local server per developer (stdio)
python -m graphify.serve graphify-out/graph.json

# register it with your client, e.g. Kimi Code:
kimi mcp add --transport stdio graphify -- python -m graphify.serve graphify-out/graph.json

# or one shared HTTP server for the whole team
python -m graphify.serve graphify-out/graph.json --transport http --host 0.0.0.0 --api-key "$SECRET"

Figure: MCP architecture. graph.json is served by python -m graphify.serve, either over stdio per developer or over shared HTTP, exposing the query_graph, get_node, get_neighbors, shortest_path, list_prs, get_pr_impact and triage_prs tools.

The server exposes query_graph, get_node, get_neighbors, shortest_path, plus PR-oriented tools (list_prs, get_pr_impact, triage_prs). With HTTP transport, teammates point their IDE's MCP config at http://<host>:8080/mcp and need no local graphify install at all. The default bind is loopback-only; pair --host 0.0.0.0 with --api-key when exposing it on a shared host, or run it in the provided Docker container.

If you use the Docker MCP Toolkit, graphify's docs also include a tested runbook for adding a SQLite MCP server (mcp/sqlite) alongside the graph tools, giving every connected client (Claude Code, Claude Desktop, Cursor, VS Code and others) a persistent SQL workspace with read_query, write_query, create_table, list_tables, describe_table and append_insight. Handy when your assistant needs somewhere durable to stash analysis results between sessions.

Speaking of PRs: graphify prs gives you a dashboard of CI state, review status and worktree mapping; graphify prs --conflicts flags PRs touching the same graph communities (merge-order risk); graphify prs --triage has an LLM rank your review queue.

Headless and CI usage

Outside an IDE, graphify extract runs the same pipeline with an explicit backend: Gemini, Claude, OpenAI, DeepSeek, Azure, Bedrock (IAM, no API key), or fully local Ollama:

graphify extract ./docs --backend claude       # needs ANTHROPIC_API_KEY
graphify extract ./docs --backend ollama       # fully local, no key
graphify extract --postgres "postgresql://user:pass@host/db"  # live schema introspection

For code with data-residency requirements, --backend ollama keeps everything on your machine. There is no telemetry or usage tracking; local query logging to ~/.cache/graphify-queries.log can be disabled with GRAPHIFY_QUERY_LOG_DISABLE=1.

The bottom line

Grep answers "where is this string?" A knowledge graph answers "how does this system fit together?", and it answers it for your docs, diagrams and design PDFs, not just your code. On the 52-file mixed benchmark corpus it cuts per-question token usage by a factor of 71.5, which in practice means your Claude Code session stops flirting with the context limit every time you ask an architecture question.

The two-step install (uv tool install graphifyy, then graphify install) takes under a minute, and a code-only graph costs zero API calls to build. Run /graphify . on the messiest repo you own and read the surprising connections section of GRAPH_REPORT.md. That is usually the moment it clicks.

