<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Naman Batish</title>
    <description>The latest articles on DEV Community by Naman Batish (@noahbatish).</description>
    <link>https://dev.to/noahbatish</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3888953%2Fae2a8292-cce9-499a-bc8d-ec82b1523b63.jpg</url>
      <title>DEV Community: Naman Batish</title>
      <link>https://dev.to/noahbatish</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/noahbatish"/>
    <language>en</language>
    <item>
      <title>I Tracked Every LLM API Call For a Week — 65% Were Unnecessary</title>
      <dc:creator>Naman Batish</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:23:31 +0000</pubDate>
      <link>https://dev.to/noahbatish/i-tracked-every-llm-api-call-for-a-week-65-were-unnecessary-5h9f</link>
      <guid>https://dev.to/noahbatish/i-tracked-every-llm-api-call-for-a-week-65-were-unnecessary-5h9f</guid>
      <description>&lt;p&gt;I've been using GPT-5 and Claude via API for coding tasks — refactoring, code review, architecture questions, debugging. The bill was creeping past $150/month and I had no idea which calls were actually worth the money.&lt;/p&gt;

&lt;p&gt;Provider dashboards show you totals. Tokens used, dollars spent, done. But they don't tell you &lt;em&gt;which specific calls&lt;/em&gt; were unnecessary. Was that $2.80 request for "where is the auth middleware" really worth sending to GPT-4o?&lt;/p&gt;

&lt;p&gt;So I built a tracker to find out.&lt;/p&gt;

&lt;h2&gt;The experiment&lt;/h2&gt;

&lt;p&gt;I wrote a small Python library called &lt;a href="https://pypi.org/project/llm-costlog/" rel="noopener noreferrer"&gt;llm-costlog&lt;/a&gt; that wraps around any LLM API call and records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens used (prompt + completion)&lt;/li&gt;
&lt;li&gt;Cost in USD (built-in pricing for 40+ models)&lt;/li&gt;
&lt;li&gt;Route — did this go to the API, or was it handled locally?&lt;/li&gt;
&lt;li&gt;Intent — what kind of request was this? (code lookup, architecture question, debugging, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five lines to integrate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_tracker&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CostTracker&lt;/span&gt;

&lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CostTracker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./costs.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;completion_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
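Under the hood, pricing a call like this is just a table lookup and a multiply. A minimal sketch of the idea — the per-token prices below are illustrative assumptions, not the library's built-in table:

```python
# Sketch: deriving per-call cost from a pricing table.
# Prices are illustrative (USD per 1M tokens), not llm-costlog's actual values.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def call_cost(model, prompt_tokens, completion_tokens):
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

cost = call_cost("gpt-4o-mini", 847, 234)   # fractions of a cent per call
```

The interesting part is not any single call's cost but the aggregate across intents, which is what the ledger enables.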



&lt;p&gt;After a week of tracking everything, I ran the waste analysis.&lt;/p&gt;
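The analysis itself is conceptually simple: flag external calls whose intent a local tool could have satisfied, then report the ratio and the dollars attached. A sketch — the field names and the intent taxonomy here are my own assumptions, not llm-costlog's actual schema:

```python
# Sketch of a waste analysis over recorded calls (field/intent names assumed).
LOCAL_ANSWERABLE = {"code_lookup", "config_check", "file_search"}

def waste_report(calls):
    external = [c for c in calls if c["route"] == "api"]
    avoidable = [c for c in external if c["intent"] in LOCAL_ANSWERABLE]
    return {
        "external": len(external),
        "avoidable": len(avoidable),
        "avoidable_pct": 100.0 * len(avoidable) / max(len(external), 1),
        "wasted_usd": sum(c["cost"] for c in avoidable),
    }

calls = [
    {"route": "api", "intent": "code_lookup", "cost": 0.002},
    {"route": "api", "intent": "architecture", "cost": 0.011},
    {"route": "local", "intent": "file_search", "cost": 0.0},
]
report = waste_report(calls)   # 1 of 2 external calls avoidable
```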

&lt;h2&gt;The results&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Total cost: $0.2604&lt;/li&gt;
&lt;li&gt;Avoidable: 23 of 35 external calls (65.7%)&lt;/li&gt;
&lt;li&gt;Wasted: $0.0204&lt;/li&gt;
&lt;li&gt;Model downgrade savings: $0.2448&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;65% of my external API calls were for things that didn't need an LLM at all.&lt;/strong&gt; Symbol lookups, config checks, "where is this function defined," file searches. Stuff that can be answered by searching the codebase directly.&lt;/p&gt;

&lt;p&gt;This was from a small test run. The dollar amounts are tiny because the test used short prompts. But the ratio is what matters — at real-world usage with large contexts (2K-8K tokens per request, which is typical for code work), that 65% avoidable rate translates to serious money. If you're spending $150/month on LLM APIs and 65% of calls are avoidable, that's ~$100/month in waste.&lt;/p&gt;
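The back-of-the-envelope math behind that claim, as a runnable check (the monthly figure is my own bill, the rate is from the test run above):

```python
# Extrapolating the avoidable-call ratio from the test run to a real monthly bill.
monthly_bill = 150.0      # USD/month spent on LLM APIs
avoidable_rate = 0.657    # 23 of 35 external calls in the test run

estimated_waste = monthly_bill * avoidable_rate   # roughly 98.55 USD/month
```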
&lt;h2&gt;What I did about it&lt;/h2&gt;

&lt;p&gt;Knowing the waste exists is step one. Fixing it automatically is step two.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://pypi.org/project/promptrouter/" rel="noopener noreferrer"&gt;promptrouter&lt;/a&gt; — a gateway that sits between your code and the LLM API. For every prompt, it decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Can this be answered locally?&lt;/strong&gt; Symbol lookups, config checks, file searches → handled instantly, $0 cost. It has an AST parser that builds a call graph of your codebase, so "what calls this function" is answered from the parse tree, not by asking an LLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Does this actually need an LLM?&lt;/strong&gt; Architecture questions, code review, complex debugging → sent to the API, but with compacted context. Instead of sending the whole repo, it packs only the 3-5 most relevant files into a token budget you control.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
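The local-vs-LLM decision above doesn't need anything fancy to get started. A minimal keyword-based sketch — the phrase lists and labels here are my own assumptions, not promptrouter's actual rules:

```python
# Minimal keyword router: decide whether a prompt can stay local.
# Phrase list is a hypothetical example, not promptrouter's real classifier.
LOCAL_PHRASES = ("where is", "what calls", "find the file", "which config")

def route(prompt):
    p = prompt.lower()
    if any(phrase in p for phrase in LOCAL_PHRASES):
        return "local"    # answerable from the codebase, $0
    return "llm"          # needs reasoning; send with compacted context

route("Where is the auth middleware defined?")    # stays local
route("Review this caching design for races")     # goes to the API
```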

&lt;p&gt;The result: the calls that stay local cost nothing. The calls that go external use 40-80% fewer input tokens.&lt;/p&gt;
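Context compaction can be as simple as greedy packing: take files in relevance order and stop adding when the token budget runs out. A sketch, using a crude 4-characters-per-token estimate (the real tool presumably tokenizes properly):

```python
# Greedy context packing: fit the most relevant files into a token budget.
def estimate_tokens(text):
    return len(text) // 4    # rough heuristic, not a real tokenizer

def pack_context(ranked_files, budget_tokens):
    """ranked_files: list of (path, content) pairs, most relevant first."""
    packed, used = [], 0
    for path, content in ranked_files:
        cost = estimate_tokens(content)
        if used + cost > budget_tokens:
            continue         # skip files that do not fit the remaining budget
        packed.append(path)
        used += cost
    return packed

files = [("auth.py", "x" * 4000), ("db.py", "x" * 8000), ("util.py", "x" * 2000)]
pack_context(files, budget_tokens=1600)   # db.py is too big and gets skipped
```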
&lt;h2&gt;Watching the waste score drop&lt;/h2&gt;

&lt;p&gt;The tracker now has a &lt;code&gt;waste_score_trend&lt;/code&gt; feature that shows your efficiency improving over time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waste_score_trend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;pre&gt;&lt;code&gt;Apr 12  waste=75.0%  (12/16 avoidable)
Apr 14  waste=66.7%  (8/12 avoidable)
Apr 16  waste=50.0%  (4/8 avoidable)
Apr 18  waste=20.0%  (1/5 avoidable)
Direction: improving ↓
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Watching that number drop from 75% to 20% over a week was the most satisfying part. Every prompt that gets rerouted locally is money that stays in your pocket.&lt;/p&gt;
&lt;h2&gt;The technical bits&lt;/h2&gt;

&lt;p&gt;For anyone curious about the internals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt;: keyword classification + phrase detection. Not ML-based (yet), but 100% accurate on my test suite of 22 prompt types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code search&lt;/strong&gt;: BM25 text matching + optional semantic search (sentence-transformers, all-MiniLM-L6-v2). Blended scoring: 60% BM25 + 40% semantic similarity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AST analysis&lt;/strong&gt;: Full call graph and import dependency tracing for Python and TypeScript/JavaScript. Regex-based for TS/JS, stdlib &lt;code&gt;ast&lt;/code&gt; module for Python. Zero external dependencies for either.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git integration&lt;/strong&gt;: Recent commits, blame, diffs as context — so "who changed this and when" doesn't burn tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost tracking&lt;/strong&gt;: SQLite-backed ledger with real token counts from the provider's usage block, priced against a built-in table of 40+ models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM client&lt;/strong&gt;: Speaks OpenAI, Anthropic, Ollama, and any OpenAI-compatible endpoint over plain HTTP. No SDK dependency.&lt;/li&gt;
&lt;/ul&gt;
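The 60/40 blend from the code-search bullet is just a normalized weighted sum. A sketch — min-max normalization is my assumption about how the two score scales are reconciled, not necessarily what the tool does:

```python
# Blend lexical (BM25) and semantic relevance scores into one ranking.
def _normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0    # avoid division by zero on flat scores
    return {k: (v - lo) / span for k, v in scores.items()}

def blend(bm25, semantic, w_bm25=0.6, w_sem=0.4):
    b, s = _normalize(bm25), _normalize(semantic)
    return {k: w_bm25 * b[k] + w_sem * s[k] for k in bm25}

bm25 = {"auth.py": 12.3, "db.py": 4.1, "util.py": 0.5}
semantic = {"auth.py": 0.81, "db.py": 0.88, "util.py": 0.12}
scores = blend(bm25, semantic)
ranked = sorted(scores, key=scores.get, reverse=True)   # auth.py ranks first
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities live in a fixed range; blending them unnormalized would let one signal swamp the other.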

&lt;p&gt;Both tools are zero-dependency (stdlib only) for the core functionality. Embeddings and precise tokenization are optional extras.&lt;/p&gt;
&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Just want to see where your money goes?&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-costlog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/batish52/llm-cost-tracker" rel="noopener noreferrer"&gt;github.com/batish52/llm-cost-tracker&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to fix the waste automatically?&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;promptrouter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/batish52/codecontext" rel="noopener noreferrer"&gt;github.com/batish52/codecontext&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both are MIT licensed. Feedback, issues, and stars welcome — these are my first open source releases and I'm iterating fast. A Reddit commenter asked for TypeScript support and a waste score trend feature; both shipped within 24 hours.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>openai</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
