Roshan Mayengbam

Posted on Jun 4

I built a self-hosted AI cost tracker after our team's bill exploded

#ai #llm #monitoring #showdev

Last month something embarrassing happened.

Our team's OpenAI bill doubled.

Nobody knew why. Nobody knew who caused it.
We just got one giant number at the end
of the month with zero breakdown.

I spent a week digging through logs manually
trying to figure out which engineer or
workflow was responsible.

That week I decided to build something.

The Problem

Every company using OpenAI, Claude, or
Gemini has the same issue.

You get one bill. One number. Nothing else.

No breakdown by engineer.
No breakdown by team.
No breakdown by feature.

You cannot set limits per person.
You cannot get warned before it explodes.
You cannot automatically stop runaway spend.

Microsoft felt this at scale when they
cancelled Claude Code licenses company wide.
Uber felt this when they burned their entire
2026 AI budget in 4 months.

But this is not just a big company problem.
Any team with 5+ engineers using AI APIs
faces this every month.

What I Built

TokenGuard is a self-hosted proxy that
sits between your engineers and
OpenAI/Claude/Gemini.

Engineers change 2 lines of code:

Before:
api_key = "sk-openai-real-key"
base_url = "https://api.openai.com"

After:
api_key = "tg_live_yourkey"
base_url = "http://tokenguard.yourserver.com/proxy/openai"

That is the entire integration.
Nothing else changes for engineers.

What You Get

Real-time dashboard showing:
→ Exactly who spent what this month
→ Which team is approaching their limit
→ Which AI model is costing the most
→ Automatic alerts at 80% budget
→ Auto-block or reroute when limit hit

The smart routing part is my favourite feature.

Instead of hard blocking an engineer
when they hit their limit — TokenGuard
silently switches their request to a
cheaper model automatically.

Engineer keeps working.
Company stops overspending.
Nobody notices anything changed.

Why Self-Hosted Matters

Every competitor routes your prompts
through their servers.

That means your code, your data, your
business logic goes through a third
party system.

No enterprise company will accept that.

TokenGuard runs entirely on your own
server. One Docker command to deploy.
Your data never leaves your network.

Current Status

Core proxy is working and tested
with real API calls.

Dashboard is complete with:
→ Usage tracking per employee
→ Team budget management
→ Smart routing rules
→ Alerts console
→ Reports and CSV export

Still finishing some parts but
the core works.

Looking For

I am looking for 3-5 engineering teams
who want to beta test this for free
in exchange for honest feedback.

If your team uses OpenAI, Claude, or
Gemini APIs and gets surprised by
monthly bills — I would love to talk.

Not a sales pitch.
Just want real feedback from teams
dealing with this problem.

Drop a comment or email me:
[your email here]

Built solo over 2 months.
Still learning. Still building.
Honest feedback welcome.

DEV Community