I gave my team Claude Code, Codex, and Kimi. A month later I opened four separate billing pages and still couldn't answer three basic questions: who spent what, on which models, and whether anyone was even still using the tools after the first-week excitement wore off.
Every provider has its own billing page, and none of them line up with my teams and users. So I built a small, self-hostable reference implementation that puts one gateway in front of everything.
Repo (MIT): https://github.com/0xkaz/llm-governance-dashboard
The shape of it
Coding agents (Codex / Claude Code / Kimi)
│ per-user key, base_url = the proxy
▼
LiteLLM proxy ── model name picks the upstream ──► OpenAI / Anthropic / OpenRouter / Kimi
│ logs every request (callback)
▼
BigQuery ──► FastAPI dashboard (cost, budgets, adoption)
- LiteLLM is the gateway: one OpenAI/Anthropic-compatible endpoint for every provider, with per-user virtual keys and budgets.
- A callback streams every request into BigQuery — cost, tokens, latency, team, user, model.
- A small FastAPI app renders the dashboard server-side (no JS build), plus budget alerts to Slack/email and self-service key issue/revoke.
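To make the "one row per request" idea concrete, here is a minimal sketch of what such a log record could look like. The class name, field names, and units are illustrative assumptions, not the schema the repo actually uses.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestLog:
    """One gateway request flattened into a warehouse row (hypothetical schema)."""
    request_id: str
    ts: datetime
    user_id: str
    team: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: int

    def to_row(self) -> dict:
        """Return a plain dict with the timestamp rendered as ISO 8601 text."""
        row = asdict(self)
        row["ts"] = self.ts.isoformat()
        return row


# Example values are made up for illustration.
example = RequestLog(
    request_id="req-001",
    ts=datetime(2025, 1, 6, 9, 30, tzinfo=timezone.utc),
    user_id="alice",
    team="platform",
    model="example-model",
    prompt_tokens=1000,
    completion_tokens=500,
    cost_usd=0.0105,
    latency_ms=840,
)
row = example.to_row()
```

Every dashboard view (cost, budgets, adoption) can then be a query over rows of this shape, which is why the warehouse only needs to hold per-request logs.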
Everything runs on your laptop via Docker; I verified the whole path on real GCP.
Two things it gives you
- Cost governance — spend/latency by user, team and model; budget alerts at 80/100%; virtual keys people can issue and revoke themselves (no one touches the master key).
- Adoption tracking (the more interesting half): is the team actually using the agents, and does it stick? It reports weekly-active rate, repeat usage, and how many different tools each person uses. Metrics are aggregated by team; individual usage stays on each person's own page. I deliberately track adoption rather than a per-developer output number: usage that sticks is a signal you can act on, while a leaderboard is a number people just game.
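Both halves reduce to small calculations over the request log. The sketch below is one way to express them; the function names and the metric definitions (for example, "repeat usage" meaning activity in at least two distinct weeks) are my assumptions for illustration, not necessarily how the repo defines them.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def budget_alert_level(spend_usd: float, budget_usd: float) -> Optional[str]:
    """Return "100%" or "80%" once spend reaches that share of the budget, else None."""
    if budget_usd <= 0:
        return None
    ratio = spend_usd / budget_usd
    if ratio >= 1.0:
        return "100%"
    if ratio >= 0.8:
        return "80%"
    return None


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def adoption_metrics(events: Iterable[Tuple[str, str, date]], team_size: int, week: date) -> dict:
    """Compute team-level adoption from (user_id, tool, day) events.

    `week` is the Monday of the week being reported on.
    """
    active_this_week = set()
    weeks_by_user = defaultdict(set)
    tools_by_user = defaultdict(set)
    for user, tool, day in events:
        wk = week_start(day)
        weeks_by_user[user].add(wk)
        tools_by_user[user].add(tool)
        if wk == week:
            active_this_week.add(user)
    # Assumed definition: a repeat user was active in two or more distinct weeks.
    repeat_users = sum(1 for wks in weeks_by_user.values() if len(wks) >= 2)
    users = len(tools_by_user)
    return {
        "weekly_active_rate": len(active_this_week) / team_size if team_size else 0.0,
        "repeat_users": repeat_users,
        "avg_tools_per_user": sum(len(t) for t in tools_by_user.values()) / users if users else 0.0,
    }


# Made-up events for a four-person team.
events = [
    ("alice", "claude-code", date(2025, 1, 6)),
    ("alice", "codex", date(2025, 1, 14)),
    ("bob", "codex", date(2025, 1, 15)),
]
metrics = adoption_metrics(events, team_size=4, week=date(2025, 1, 13))
```

Note that the output is a team aggregate only; nothing here ranks individuals, which matches the no-leaderboard choice.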
Three things that surprised me
Streaming hides the cost. Providers often omit token usage on streamed responses, so the gateway's own per-key spend under-counts — sometimes down to $0 for a request that clearly cost money. Two fixes: I inject stream_options.include_usage=true at the proxy, and I backfill cost from tokens into BigQuery so the dashboard is accurate regardless. The lesson that stuck: treat the warehouse as the source of truth for cost, and the gateway's hard budget-block as best-effort. They'll disagree; that's fine once you know which to trust for which job.
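The two fixes can be sketched in a few lines. `stream_options.include_usage` is the real OpenAI-style request field; everything else here (the function names, the price table, and its numbers) is a hypothetical stand-in, so treat this as an outline rather than the repo's implementation.

```python
import copy

# Hypothetical per-million-token prices in USD; substitute your providers' real rates.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def with_usage_reporting(body: dict) -> dict:
    """Return a copy of a chat request body that asks for token usage on streams."""
    out = copy.deepcopy(body)
    if out.get("stream"):
        opts = dict(out.get("stream_options") or {})
        opts["include_usage"] = True
        out["stream_options"] = opts
    return out


def backfill_cost(model: str, prompt_tokens: int, completion_tokens: int, logged_cost: float) -> float:
    """Keep a non-zero logged cost; otherwise recompute it from token counts."""
    if logged_cost > 0:
        return logged_cost
    price = PRICES_PER_MTOK.get(model)
    if price is None:
        return logged_cost  # unknown model: leave the row alone rather than guess
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


patched = with_usage_reporting({"model": "example-model", "stream": True})
cost = backfill_cost("example-model", prompt_tokens=1000, completion_tokens=500, logged_cost=0.0)
```

A real backfill would also need to skip rows that are legitimately free, such as flat-rate subscription traffic, so it does not invent spend that never happened.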
Subscription CLIs bypass the gateway — but that's a routing problem, not a measurement limit. A ChatGPT-logged-in Codex, the claude.ai app, and Moonshot's Kimi CLI talk to their vendor directly, so in that mode nothing is logged. But anything that goes through the gateway is measured — including subscription / flat-rate usage (tokens and latency are recorded; cost just shows 0). The fix is routing: force traffic through the proxy (API-key auth, or a generic OpenAI-compatible client for Kimi) and block direct egress. That's org policy, not a tool limitation — worth being honest about both halves.
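On the client side, "force traffic through the proxy" mostly means handing each tool the proxy URL and a virtual key instead of vendor credentials. A rough sketch, assuming the tool honors the environment variables that the official OpenAI and Anthropic SDKs read; whether a given CLI respects them, and the exact URL paths your proxy expects, are things to verify against its documentation.

```python
import os
from typing import Mapping, Optional


def gateway_env(proxy_url: str, virtual_key: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Build an environment that points SDK-based clients at the gateway.

    The variable names are the ones the official OpenAI and Anthropic SDKs read.
    The "/v1" suffix on the OpenAI URL is an assumption about the proxy's routes.
    """
    env = dict(os.environ if base_env is None else base_env)
    url = proxy_url.rstrip("/")
    env.update({
        "OPENAI_BASE_URL": f"{url}/v1",
        "OPENAI_API_KEY": virtual_key,
        "ANTHROPIC_BASE_URL": url,
        "ANTHROPIC_API_KEY": virtual_key,
    })
    return env


# Placeholder URL and key for illustration.
env = gateway_env("http://localhost:4000/", "sk-user-example", base_env={"PATH": "/usr/bin"})
```

Passing the result as `env=` to `subprocess.run` launches a tool with gateway routing only. This covers the client half; blocking direct egress to the vendors still has to happen at the network level.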
Per-user attribution needs virtual keys. With the raw master key, everything shows up as one user. Hand each person a scoped key and user_id / team land on every row in BigQuery — and suddenly the dashboard is actually useful.
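Issuing a scoped key is a single call to the proxy's key-management API. The sketch below only builds the request and does not send it. It assumes LiteLLM's `/key/generate` endpoint with `user_id`, `team_id`, and `max_budget` fields, so check the field names against the LiteLLM version you run. The helper name, URL, and key values are placeholders.

```python
import json
import urllib.request


def build_key_request(proxy_url: str, master_key: str, user_id: str,
                      team_id: str, max_budget_usd: float) -> urllib.request.Request:
    """Build (but do not send) a request asking the proxy for a per-user virtual key."""
    payload = {
        "user_id": user_id,            # becomes the user on every logged row
        "team_id": team_id,            # becomes the team on every logged row
        "max_budget": max_budget_usd,  # spend cap attached to this key
    }
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; only the admin side ever handles the master key.
req = build_key_request("http://localhost:4000", "sk-master-example", "alice", "platform", 50.0)
```

With the proxy running, `urllib.request.urlopen(req)` would send it, and the response would carry the new key to hand to that user.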
What it is (and isn't)
It's a demo / reference implementation, not a production product. No TLS, single node, no multi-tenant hardening. BigQuery and Looker are swappable for any warehouse/BI — the real requirement is just one place that holds every per-request log.
It's also deliberately different from the observability tools you'll reach for first. Helicone, Langfuse, and Portkey are great at tracing and debugging calls; this is thinner and pointed at a different question — governance and adoption: who's allowed to spend what, and is the team actually using the tools. Complementary, not a competitor.
If that fits how you'd approach the problem, fork it.
Curious how others are governing AI coding spend across a team — gateway-first like this, or something else?

