DEV Community

Egor Fedorov

We optimize SQL queries, bundle sizes, API calls... but not how we talk to AI. Why?

Here's a weird thing I noticed.

We spend hours shaving 200ms off a database query. We obsess over tree-shaking to save 12KB in a bundle. We cache API responses, debounce inputs, lazy-load images.

But when it comes to AI coding tools — the thing that's literally billing us per token — we just... let it rip?

I tracked everything for 107 sessions

I built a plugin for Claude Code that silently records every file read, every edit, every search. After 107 sessions, here's what I found:

  • 37% of all tokens went to files that were never edited or meaningfully used
  • Claude re-read page.tsx 189 times across my sessions. 60 of those were pure duplicates
  • A single package-lock.json read? 45,000 tokens. Gone.
  • Total waste: ~1.9M tokens. At Opus pricing, that's roughly $28 I lit on fire in two weeks
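If you're curious how numbers like these can be derived, here's a minimal sketch of the waste calculation. This is not the plugin's actual code: it assumes a hypothetical log of read events (`file`, `chars`, `edited`) and uses the rough 4-characters-per-token heuristic, so the results are estimates rather than exact counts.

```javascript
// Sketch: classify each logged read as useful or wasted.
// A read counts as waste if it's a duplicate of an earlier read,
// or if the file was never edited during the session.
function summarizeWaste(reads) {
  const seen = new Map(); // file path -> number of times read
  let totalTokens = 0;
  let wastedTokens = 0;

  for (const r of reads) {
    const tokens = Math.ceil(r.chars / 4); // ~4 chars per token heuristic
    totalTokens += tokens;

    const count = (seen.get(r.file) ?? 0) + 1;
    seen.set(r.file, count);

    if (count > 1 || !r.edited) wastedTokens += tokens;
  }

  return { totalTokens, wastedTokens, wasteRatio: wastedTokens / totalTokens };
}
```

Run that over a session log and you get a waste ratio per session; the ~37% figure above is the same idea applied across all 107 sessions.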

And I consider myself a fairly intentional Claude Code user.

The question I keep going back to

Is this even a problem worth solving?

On one hand — $60/month in waste adds up. Especially if you're on a team of 10. That's $7,200/year just on files your AI read and forgot.

On the other hand — maybe the cognitive overhead of "optimizing" your AI workflow is worse than the waste itself. Maybe we should just let the model read whatever it wants and focus on the actual work.

I genuinely don't know. I built https://github.com/egorfedorov/claude-context-optimizer and I use it daily, but I catch myself wondering: am I solving a real problem or am I just scratching a developer's optimization itch?

What the tool actually does (30-second version)

It's a Claude Code plugin. Zero config. Runs silently via hooks.

The killer feature: Read Cache — a PreToolUse hook that blocks Claude from re-reading files it already has in context. Same file, same range, no changes on disk? Blocked. Claude adapts and works with what it has.

Already loaded tracker.js this session (983 lines, ~9.3K tokens saved).

File unchanged — no need to re-read!

It also does .contextignore (like .gitignore but for AI), token budgets with auto-compact, session replay, and a heatmap that shows where your tokens actually went.
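For a sense of what goes in a `.contextignore`, here's a hypothetical one for a typical Node project (the entries are my illustration, not defaults shipped with the plugin):

```gitignore
# Lockfiles: huge, machine-generated, rarely worth a full read
package-lock.json
yarn.lock
pnpm-lock.yaml

# Build output and vendored dependencies
dist/
node_modules/

# Large generated assets
*.min.js
*.map
```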

Result: 30-60% fewer tokens per session from read deduplication alone.

But here's what I actually want to discuss

Three questions for the community:

  1. Do you track your AI spending at all?

    I'm genuinely curious. Do you know how much you spend per session? Per week? Or is it just a monthly credit card charge you don't think about?

  2. Is "token efficiency" going to matter in 6 months?

    Prices are dropping. Context windows are growing. Maybe optimizing tokens today is like optimizing assembly in the age of high-level languages — technically correct but practically pointless.

  3. Who should solve this — the user or the model?

    Should AI tools be smarter about what they read? Or is it on us to curate context? My plugin takes the "intercept and block" approach, but maybe the right answer is that models should just... stop being wasteful on their own.

The cynical take

Someone will say: "you built a tool to save $60/month and spent 3 weeks building it." And yeah, fair. The ROI on my time is probably negative.

But I've learned more about how AI coding actually works by building this than from any blog post or documentation. Watching the token flow in real time changes how you think about human-AI collaboration.

And maybe that's the real value — not the $60, but understanding what's actually happening under the hood.


https://github.com/egorfedorov/claude-context-optimizer | MIT | Zero telemetry | All data local

Install: npx skills add https://github.com/egorfedorov/claude-context-optimizer

Drop your answers in the comments. Especially #2 — I go back and forth on this daily.
