<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vikrant Shukla</title>
    <description>The latest articles on DEV Community by Vikrant Shukla (@haltonlabs).</description>
    <link>https://dev.to/haltonlabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920482%2F5bf7a95e-c7ed-4301-b55e-e60c1f995c2f.png</url>
      <title>DEV Community: Vikrant Shukla</title>
      <link>https://dev.to/haltonlabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/haltonlabs"/>
    <language>en</language>
    <item>
      <title>I built a local proxy to track exact LLM API costs per project</title>
      <dc:creator>Vikrant Shukla</dc:creator>
      <pubDate>Fri, 08 May 2026 17:16:11 +0000</pubDate>
      <link>https://dev.to/haltonlabs/i-built-a-local-proxy-to-track-exact-llm-api-costs-per-project-1j82</link>
      <guid>https://dev.to/haltonlabs/i-built-a-local-proxy-to-track-exact-llm-api-costs-per-project-1j82</guid>
      <description>&lt;p&gt;The problem was simple: I run a small software studio that builds &lt;br&gt;
client work heavily using Claude. Every project ended with the same &lt;br&gt;
awkward conversation — "what did the AI actually cost?"&lt;/p&gt;

&lt;p&gt;Token estimates drift from the real bill, and nothing attributed costs to individual projects. I was either undercharging or handwaving, and neither builds client trust.&lt;/p&gt;

&lt;p&gt;So I built Halton Meter.&lt;/p&gt;
&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;Halton Meter is a local mitmproxy-based daemon that intercepts &lt;br&gt;
outbound LLM API traffic, attributes each request to a project, &lt;br&gt;
computes exact cost from published pricing, and writes everything &lt;br&gt;
to a local SQLite database. Nothing about how you call the API changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;halton-meter
halton-meter init &lt;span class="nt"&gt;--apps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three processes come up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edge listener on &lt;code&gt;127.0.0.1:8081&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Proxy interceptor on &lt;code&gt;127.0.0.1:8090&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Loopback API on &lt;code&gt;127.0.0.1:8765&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
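
&lt;p&gt;For a one-off script that wasn't configured by &lt;code&gt;init --apps&lt;/code&gt;, standard proxy environment variables should route its traffic through the edge listener. A sketch only: the port comes from the list above, but whether Halton Meter relies on these variables (and how it handles trusting the mitmproxy CA) is my assumption, not something this post documents.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Assumption: the edge listener accepts standard HTTP(S) proxying.
export HTTPS_PROXY=http://127.0.0.1:8081
export HTTP_PROXY=http://127.0.0.1:8081

# my_llm_script.py is a hypothetical script; its outbound API calls
# would now pass through the interceptor without any code changes.
python my_llm_script.py
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;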

&lt;p&gt;Every LLM API call you make — via the Anthropic SDK, OpenAI SDK, &lt;br&gt;
raw HTTP, whatever — gets intercepted, tagged, costed, and logged.&lt;/p&gt;

&lt;h2&gt;Why a proxy and not an SDK wrapper?&lt;/h2&gt;

&lt;p&gt;SDK wrappers only catch calls you make directly. They miss ChatGPT, &lt;br&gt;
Gemini Code Assist, and anything going through a tool you don't &lt;br&gt;
control. A proxy captures everything on the wire without touching &lt;br&gt;
your code.&lt;/p&gt;

&lt;h2&gt;Project attribution&lt;/h2&gt;

&lt;p&gt;The daemon attributes each request using a three-step chain.&lt;/p&gt;

&lt;p&gt;Claude Code sessions, scripts, notebooks, and direct SDK calls all &lt;br&gt;
get attributed correctly with zero code changes.&lt;/p&gt;
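
&lt;p&gt;The steps themselves aren't spelled out above, so purely as an illustration of what a first-match-wins chain can look like, here is a hypothetical resolver. Every source in it (an override variable, a per-directory marker file, a catch-all bucket) is invented for this sketch and is not Halton Meter's actual logic.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Entirely hypothetical: all three sources are invented for illustration.
resolve_project() {
  if [ -n "$HALTON_PROJECT" ]; then   # 1. explicit override variable (invented)
    printf '%s\n' "$HALTON_PROJECT"
  elif [ -f .halton-project ]; then   # 2. marker file in the working directory (invented)
    cat .halton-project
  else                                # 3. fallback bucket (invented)
    printf 'unattributed\n'
  fi
}

HALTON_PROJECT=acme-site resolve_project
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;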

&lt;h2&gt;Terminal report&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;halton-meter report
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30j2q6zxes17avn2k4tj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30j2q6zxes17avn2k4tj.png" alt="Halton Meter terminal cost report" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Breakdown by project, model, and date. Numbers come directly from &lt;br&gt;
the provider's published pricing — no estimates, no hidden margins.&lt;/p&gt;

&lt;h2&gt;What's supported&lt;/h2&gt;

&lt;p&gt;Six adapters across four providers: Claude, OpenAI, Gemini, and Grok. Both direct API surfaces and OAuth surfaces (ChatGPT, Gemini Code Assist) are intercepted.&lt;/p&gt;

&lt;h2&gt;Local only&lt;/h2&gt;

&lt;p&gt;No cloud. No tracking. API keys never leave your machine. Everything &lt;br&gt;
lives in &lt;code&gt;~/.halton-meter/db.sqlite&lt;/code&gt;. The bundled dashboard is open &lt;br&gt;
source (Apache 2.0) and runs locally.&lt;/p&gt;




&lt;p&gt;Docs and full architecture at &lt;a href="https://haltonmeter.com" rel="noopener noreferrer"&gt;haltonmeter.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy to answer questions on the proxy architecture, the attribution &lt;br&gt;
chain, or the cost calculation in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>openai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
