DEV Community

Cover image for Your AI Agent Wastes 87% of Its Tokens Just Finding Code. I Fixed That.
Marjo
Marjo

Posted on

Your AI Agent Wastes 87% of Its Tokens Just Finding Code. I Fixed That.

Or: How I Stopped Worrying and Learned to Love the Trigram

You know that feeling when you ask an AI agent to fix a simple bug, and it spends 45 seconds reading your entire codebase before changing 3 lines?

I do. I watched it happen. Repeatedly. So I decided to count exactly how bad it was.

Turns out, 60-80% of the tokens your AI agent consumes go to navigation -- searching for code, reading files, searching again, reading more files. Not reasoning. Not writing code. Just finding things.

It's like hiring a plumber who spends 4 hours opening every door in your house before fixing the one leaking pipe in the bathroom.

So I built Hypergrep. And the plumber now has a floor plan.


Table of Contents


The Problem Nobody Talks About

Every AI coding agent -- Claude Code, Cursor, Copilot, Cody, aider -- uses the same approach to understand your code:

1. grep for something
2. get raw text lines back
3. have no idea what those lines mean
4. read the full file to understand context
5. repeat 50-200 times per session
Enter fullscreen mode Exit fullscreen mode

This is the equivalent of navigating a city by reading street signs one at a time. It works. It's just painfully inefficient.

Jake Nesler measured this in early 2026 and found that a single question consumed ~12,000 tokens when the actual answer required ~800. The agent read 25 files to locate 3 functions. That's a 93% waste ratio.

And this isn't a bug. It's the architecture. grep was built for humans who want to see matching lines. AI agents don't want lines -- they want understanding.


The Experiment

I picked a real codebase (ripgrep's own source code -- 208 files, 52K lines, because irony is the best testing methodology) and measured what happens when an agent investigates how the Matcher system works.

Approach A: Agent with ripgrep

Step 1: rg "Matcher"
         -> 376 matching lines across dozens of files
         -> 10,174 tokens consumed
         -> Agent's internal monologue: "cool, 376 lines. I understand nothing."

Step 2: Read 5 files to understand context
         -> auth.rs, session.rs, middleware.rs...
         -> 9,284 tokens consumed
         -> Agent: "OK, I think I'm starting to get it..."

Step 3: rg "impl.*Matcher" (refine search)
         -> 43 lines
         -> 1,122 tokens consumed
         -> Agent: "Now I can actually answer the question"
Enter fullscreen mode Exit fullscreen mode

Total: 20,580 tokens. 5+ tool calls. Agent spent 45% of budget reading files.

Approach B: Agent with Hypergrep

Step 1: hypergrep --model
         -> Codebase overview: directory structure, key abstractions, entry points
         -> 1,413 tokens
         -> Agent: "I know this codebase now."

Step 2: hypergrep --layer 1 --budget 1000 "Matcher"
         -> Top results with function signatures + who calls what
         -> 1,400 tokens
         -> Agent: "I see the Matcher trait, its implementations, and the call chain."

Step 3: hypergrep --impact "Matcher"
         -> What breaks if Matcher changes
         -> 1 token
         -> Agent: "And I know the blast radius."
Enter fullscreen mode Exit fullscreen mode

Total: 2,814 tokens. 3 tool calls. Zero file reads.

Same understanding. 87% fewer tokens. Not estimated. Measured.

Tokens consumed:

ripgrep:    ==================================================  20,580
hypergrep:  ======                                               2,814
                                                         87% reduction
Enter fullscreen mode Exit fullscreen mode

How Hypergrep Works

Hypergrep is not a faster grep. Let me be clear about this because I wasted a lot of time thinking the goal was speed.

ripgrep is faster for single text searches: 23ms vs 40ms. If you need to grep once and leave, use ripgrep. Seriously.

Hypergrep is a different tool entirely. It combines three capabilities that no other single tool provides:

1. Trigram-Indexed Text Search

When you search for "authenticate", Hypergrep doesn't scan every file. It looks up which files contain the trigrams "aut", "uth", "the", "hen", "ent", "nti", "tic", "ica", "cat", "ate" -- and only checks those files.

All 208 files
     |
     | trigram filter (which files contain these 3-char sequences?)
     v
12 candidate files
     |
     | regex verification (does the regex actually match?)
     v
3 true matches
Enter fullscreen mode Exit fullscreen mode

This is the same technique behind Google Code Search (Russ Cox, 2012). Zero false negatives -- if a file matches, it's guaranteed to be in the candidate set. The filter can only remove files that provably cannot match.

First query builds the index (~40ms for 200 files, cached to disk). After that, searches are 4.4ms. Seven times faster than ripgrep's 31ms.

2. Tree-sitter Structural Awareness

Here's where it gets interesting.

ripgrep returns this:

src/auth.rs:47:    let hashed = hash_password(pass);
Enter fullscreen mode Exit fullscreen mode

Line 47. Great. Which function is this in? What are the arguments? What does the function return? The agent doesn't know. It has to read the file.

Hypergrep returns this:

$ hypergrep -s "hash_password" src/

src/auth.rs:1-8 function authenticate
fn authenticate(user: &str, pass: &str) -> bool {
    let hashed = hash_password(pass);
    check_db(user, hashed)
}
Enter fullscreen mode Exit fullscreen mode

The complete function body. The agent immediately knows: this is the authenticate function, it takes a user and password, it calls hash_password and check_db, and it returns a bool.

No file read needed. The search result is the understanding.

This works because Hypergrep parses every file with tree-sitter during indexing. It knows where every function, class, struct, method, and trait boundary is. When a line matches, it expands to the smallest enclosing symbol.

16 languages supported: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Ruby, PHP, Swift, C#, Scala, Lua, Zig, Bash. Plus tree-sitter parsing for HTML, CSS, JSON, TOML, YAML, and HCL.

If a file is in a language without a grammar, it falls back to regular line-level search. Same as ripgrep. No worse.

3. Live Call Graph

During indexing, Hypergrep also extracts function call relationships from the AST. It builds a call graph: function A calls function B, function C calls function A.

This enables two commands that no grep tool can answer:

# Who calls this function?
$ hypergrep --callers "authenticate" src/
  src/api.rs:login_handler
  src/api.rs:api_key_verify
  src/middleware.rs:auth_middleware

# What does this function call?
$ hypergrep --callees "authenticate" src/
  src/auth.rs:hash_password
  src/auth.rs:check_db
  src/session.rs:create_session
Enter fullscreen mode Exit fullscreen mode

Response time: 2.5 microseconds. Not milliseconds. Microseconds. It's a hash table lookup.


The Secret Sauce: Semantic Compression

Speed is nice, but the real innovation is information density per token.

I built three output layers:

Layer What you get Tokens When to use
--layer 0 File + function name + kind ~15 "Which files are relevant?"
--layer 1 Signature + calls + called_by ~80-120 "What does this do?" (sweet spot)
--layer 2 Full source code ~200-800 "I need to edit this function"

Layer 1 is the sweet spot. Here's what the agent actually receives:

[
  {
    "name": "search",
    "kind": "function",
    "file": "crates/core/main.rs",
    "line_range": [107, 151],
    "signature": "fn search(args: &HiArgs, mode: SearchMode) -> Result<bool>",
    "calls": ["search_path", "matcher", "printer", "walk_builder"],
    "called_by": ["search_parallel", "run", "try_main"],
    "tokens": 85
  }
]
Enter fullscreen mode Exit fullscreen mode

85 tokens. The agent knows:

  • The function name, file, and line range
  • The full type signature
  • Everything it calls (dependencies going down)
  • Everything that calls it (dependents going up)

Without Hypergrep, getting this understanding means reading main.rs (~2,000 tokens) and probably search.rs too (~3,000 tokens). That's 5,000 tokens for what Hypergrep delivers in 85.

And then there's the budget parameter:

hypergrep --layer 1 --budget 500 --json "authenticate" src/
Enter fullscreen mode Exit fullscreen mode

"Give me the best results that fit in 500 tokens." Hypergrep ranks by relevance and fills the budget with the top results. The agent gets maximum information density within its context constraints.

No other search tool has this concept. grep returns everything. You deal with the overflow. Hypergrep optimizes for the consumer.


Real Benchmarks (No Hand-Waving)

Every number here is from a real run. I benchmarked against ripgrep's own source code (208 Rust files, 52K lines) because I wanted a real project, and also because benchmarking a search tool against the search tool's own code felt appropriately meta.

Speed

Query latency (warm index, median of 20 runs):

"fn search"        hypergrep  4.5ms  ||||
                   ripgrep   31.0ms  |||||||||||||||||||||||||||||||

"Searcher"         hypergrep  3.7ms  |||
                   ripgrep   31.0ms  |||||||||||||||||||||||||||||||

"TODO"             hypergrep  0.5ms  |
                   ripgrep   31.0ms  |||||||||||||||||||||||||||||||

"Result<"          hypergrep  4.9ms  ||||
                   ripgrep   31.0ms  |||||||||||||||||||||||||||||||
Enter fullscreen mode Exit fullscreen mode

Hypergrep is 7x faster for warm queries. ripgrep is constant at ~31ms because it scans every file every time. Hypergrep varies from 0.5ms to 7.5ms depending on how many candidates the trigram filter produces.

50-Query Agent Session

Cumulative time:

Queries:     1     5    10    20    50   100
ripgrep:    31   155   310   620  1550  3100 ms
hypergrep:   4    22    44    88   220   440 ms
                                   ^^^
                                  7x faster
Enter fullscreen mode Exit fullscreen mode

Over a 50-query session: ripgrep takes 1,550ms. Hypergrep takes 220ms. The gap widens linearly because Hypergrep pays the index cost once.

Token Savings

Investigation task: "How does Matcher work?"

                          Tokens
ripgrep + file reads:     ████████████████████░  20,580
hypergrep (L1 + budget):  ██░                     2,814

                          87% reduction
Enter fullscreen mode Exit fullscreen mode

The Full Numbers Table

Metric ripgrep Hypergrep
Warm text search 31ms 4.4ms 7x faster
50-query session 1,550ms 220ms 7x faster
Tokens (3-query task) 20,580 2,814 87% less
Callers query impossible 2.5us new capability
Existence check 31ms 291ns 100,000x faster
Codebase summary N/A 699 tokens new capability
Correctness baseline 5/5 match zero false negatives

All numbers from warm index. Cold start (first-ever run, no cache) is 40ms for text search, 800ms including tree-sitter + graph. The index is cached to disk (.hypergrep/index.bin, ~581 KB) and loads in 25ms on subsequent runs.


The Feature That Changes Everything

Impact analysis.

$ hypergrep --impact "hash_password" src/
Enter fullscreen mode Exit fullscreen mode
Impact analysis for 'hash_password' (depth 3):

  [depth 1] WILL BREAK   src/auth.rs:authenticate
  [depth 2] MAY BREAK    src/api.rs:login_handler
  [depth 3] REVIEW        src/main.rs:setup_routes
Enter fullscreen mode Exit fullscreen mode
Call graph:

  setup_routes -----> login_handler -----> authenticate -----> hash_password
   [REVIEW]            [MAY BREAK]         [WILL BREAK]        [you change this]
Enter fullscreen mode Exit fullscreen mode

No AI agent today checks blast radius before editing. They just edit and hope. "I changed the return type of hash_password from String to Vec<u8>" -- and three files break.

With --impact, the agent sees the damage before it happens. In 2.5 microseconds.

This alone is worth the install. Everything else is a bonus.


Everything Else It Does

Codebase Mental Model

$ hypergrep --model "" src/
Enter fullscreen mode Exit fullscreen mode

Generates a ~700 token structural summary of your entire codebase:

# Codebase Mental Model

## Languages
- Rust: 100+ files, TypeScript: 8 files

## Structure
- src/auth/ (3 files) -- 5 functions, 2 structs
- src/api/ (6 files) -- 12 functions, 3 structs

## Key Abstractions
- function search (main.rs) -- 16 callers, 8 callees
- struct Searcher (searcher/mod.rs) -- 12 callers

## Entry Points
- src/main.rs

## Hot Spots
- src/api/handlers.rs (15 symbols, 340 lines)
Enter fullscreen mode Exit fullscreen mode

Load this once at session start. The agent immediately knows where everything is. Replaces 10-20 exploratory searches.

Bloom Filter Existence Checks

"Does this project use Redis?"

Every agent eventually asks this. With ripgrep, it's a full codebase scan that returns zero results. 31ms to learn nothing.

With Hypergrep:

$ hypergrep --exists "redis" src/
NO: 'redis' is definitely not in this codebase    # 291 nanoseconds
Enter fullscreen mode Exit fullscreen mode

291 nanoseconds. That's not a typo. It's a bloom filter lookup.

The bloom filter parses Cargo.toml, package.json, go.mod, requirements.txt, and pyproject.toml for real dependency names. "NO" is guaranteed correct (zero false negatives). "YES" means "probably" (~2% false positive rate).

Daemon Mode

For heavy sessions (50+ queries), the daemon keeps the index in memory:

$ hypergrep-daemon --background src/
Daemon started (PID 18067)

$ hypergrep-daemon --status src/
Running
  PID:    18067
  Memory: 8.5 MB
  Socket: /tmp/hypergrep-f983e88f.sock
Enter fullscreen mode Exit fullscreen mode

Safety features (because nobody wants a rogue process eating their RAM):

  • Auto-stops after 30 min idle
  • Hard memory cap at 500 MB
  • 0% CPU when idle
  • PID file prevents duplicates
  • --stop to kill it manually

8.5 MB for 100 files. Less than a Chrome tab.


What I Got Wrong

I want to be honest about the limitations because too many project READMEs aren't.

Cold start is slower than ripgrep. On first run with no cache, Hypergrep takes 40ms for text search vs ripgrep's 23ms. With tree-sitter + graph, it's 800ms. The index pays for itself after ~40 queries. If you're doing one search and leaving, use ripgrep.

The call graph is incomplete. Static analysis can't see dynamic dispatch (getattr(obj, "method")()), reflection, callbacks passed as arguments, or macro-generated code. The --impact results will miss things. It's useful for orientation, not for guarantees. If you need 100% accuracy, use your language's type checker.

Binary is 29 MB. 16 tree-sitter grammars embedded. It's a chunky boy. ripgrep is 5 MB. The tradeoff is structural understanding for 16 languages vs text matching.

Large codebases (>10K files) need daemon mode. The CLI cold start with tree-sitter parsing is too slow. The daemon builds once, serves forever (well, for 30 minutes of idle time, then politely exits).

Bloom filter has ~2% false positives. If it says "YES, this project uses graphql", it might be because "graphql" appears in a test fixture string, not because GraphQL is actually used. "NO" is always correct though.


The Prior Art

I didn't invent most of the techniques. Here's what existed before and what Hypergrep actually adds:

System Year Text search Code graph Structural Token compression
Google Code Search 2006 Trigram index - - -
livegrep 2015 Suffix array - - -
Zoekt 2016 Positional trigram - ctags -
GitHub Blackbird 2023 Sparse n-grams - - -
ast-grep 2023 Pattern (no index) - Tree-sitter -
Axon 2025 - Call graph Tree-sitter -
codebase-memory-mcp 2025 - Call graph Tree-sitter -
Cursor indexed search 2025 Client-side n-gram - - -
Hypergrep 2026 Trigram index Call graph Tree-sitter L0/L1/L2 + budget

Text search tools can't do graph queries. Graph tools can't do regex search. Nobody does semantic compression with token budgets. Hypergrep is the first to combine all four.

The full research document (42 references, including the theoretical foundations from Cox 2012, GitHub Blackbird 2023, and the speculative execution papers): RESEARCH.md


Install and Try It

30 seconds:

# macOS / Linux (downloads pre-built binary)
curl -sSfL https://github.com/marjoballabani/hypergrep/releases/latest/download/hypergrep-installer.sh | sh

# Or from source
git clone https://github.com/marjoballabani/hypergrep.git
cd hypergrep && ./install.sh
Enter fullscreen mode Exit fullscreen mode

Try it on your project:

# See the codebase overview
hypergrep --model "" src/

# Search with semantic compression
hypergrep --layer 1 --budget 500 --json "your_function" src/

# Check blast radius before refactoring
hypergrep --impact "your_function" src/

# Does your project use something?
hypergrep --exists "redis" src/
Enter fullscreen mode Exit fullscreen mode

Set Up Your AI Agent

One command configures Claude Code, Cursor, Copilot, and Windsurf:

./hypergrep-setup.sh /path/to/your/project
Enter fullscreen mode Exit fullscreen mode

Your agents will automatically use Hypergrep for code search. No other setup needed.

Uninstall

If it's not for you:

rm -f $(which hypergrep) $(which hypergrep-daemon)
find ~ -name ".hypergrep" -type d -exec rm -rf {} + 2>/dev/null
Enter fullscreen mode Exit fullscreen mode

Clean. Nothing left behind.


What's Next

The daemon needs real-world battle testing with actual agent sessions. The call graph needs type-aware resolution (not just name matching). And I want to build an MCP server so agents can use Hypergrep as a native tool, not just a Bash command.

But the core thesis is proven: agents don't need faster text search. They need smarter results. Return function signatures instead of lines. Return call graphs instead of file contents. Return impact analysis instead of making the agent guess.

87% token reduction. Measured, not projected. On a real codebase. With 120 tests.


Links:

MIT License. 120 tests. 16 languages. Built with Rust.

If you use AI coding agents and burn tokens on navigation, give it a try. Star the repo if it saves you money. Open an issue if it doesn't.


Tags: #rust #cli #ai #devtools #opensource #codesearch #programming #rustlang

Top comments (0)