Tracking LLM costs across an entire app is easy. Finding out which customer is actually burning through your OpenAI bill? That's a nightmare.
For a while, we were just eating the cost. You look at the Stripe dashboard, look at the OpenAI invoice, and pray the margins make sense. But when a single power user decides to process 10k documents through Claude on a Saturday night, averages stop mattering real fast.
tbh, most billing dashboards are useless for this. They tell you you spent $400 yesterday, but not who spent it.
Here is what actually works in production, without over-engineering your entire stack.
The metadata trick
If you're using OpenAI or Anthropic, you can pass metadata with every request. Don't just log it in your db. Pass the user_id or tenant_id directly to the provider.
OpenAI supports this natively via the `user` field. Anthropic accepts a `metadata.user_id` on the Messages API. OpenRouter makes it trivial.
```javascript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [...],
  user: `tenant_${customer_id}`, // <--- Do this. Always.
});
```
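The Anthropic side works the same way through the Messages API `metadata.user_id` field. A minimal sketch (the `tenantTag` helper and the params object shape are just illustrations, not a full SDK call):

```typescript
// Hypothetical helper: one consistent tag for every provider call.
function tenantTag(customerId: string): string {
  return `tenant_${customerId}`;
}

// Request params for Anthropic's Messages API. The metadata.user_id
// field is what shows up against the request on Anthropic's side.
const anthropicParams = {
  model: "claude-3-5-sonnet-latest", // model name is illustrative
  max_tokens: 1024,
  messages: [{ role: "user" as const, content: "..." }],
  metadata: { user_id: tenantTag("cus_123") },
};
```

Same idea, same payoff: the tenant travels with the request instead of living only in your own logs.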
This simple line changes everything. Now you can at least export CSVs from your provider and run a pivot table. But fwiw, doing that manually every week gets old fast.
Building a real ingestion pipeline
We needed real-time budget alerts. If a user on a $19/mo plan burns $5 in API costs, I want a Slack ping before they hit $20.
We ended up building a pipeline for this: Next.js edge functions for ingestion, Supabase for fast time-series queries, and Inngest for async budget checking.
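Capturing usage post-request is only a few lines, since OpenAI returns a `usage` block on every completion. The event name and payload shape below are a sketch of what we send, not LLMeter's actual schema:

```typescript
// Shape of the usage block OpenAI returns on a completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Hypothetical event payload for the ingestion pipeline.
function usageEvent(tenantId: string, model: string, usage: Usage) {
  return {
    name: "llm/usage.recorded", // event name is illustrative
    data: {
      tenant_id: tenantId,
      model,
      tokens: usage.prompt_tokens + usage.completion_tokens,
    },
  };
}
```

Fire that at your event queue (Inngest, in our case) and the hot path never blocks on a database write.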
The flow:
- Proxy the request (or capture the token usage post-request).
- Fire an event to Inngest with `tenant_id`, `model`, and `tokens`.
- Inngest writes to a Supabase table.
- If `sum(cost) > budget_limit` -> alert.
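The budget check at the end of the flow is plain arithmetic. A minimal sketch, assuming hypothetical per-million-token prices (always pull real numbers from the provider's pricing page):

```typescript
// Hypothetical USD prices per 1M tokens; check the provider's pricing page.
const PRICE_PER_M_INPUT: Record<string, number> = { "gpt-4o": 2.5 };
const PRICE_PER_M_OUTPUT: Record<string, number> = { "gpt-4o": 10 };

// Cost of a single request, given its token counts.
function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const inPrice = PRICE_PER_M_INPUT[model] ?? 0;
  const outPrice = PRICE_PER_M_OUTPUT[model] ?? 0;
  return (inputTokens / 1_000_000) * inPrice + (outputTokens / 1_000_000) * outPrice;
}

// The sum(cost) > budget_limit check from the flow above.
function overBudget(spentSoFar: number, newCost: number, budgetLimit: number): boolean {
  return spentSoFar + newCost > budgetLimit;
}
```

Run this inside the async worker, not the request path, so a slow alert never slows a user's completion.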
No Stripe integration needed. Just raw usage limits. Ymmv, but decoupling billing from cost-tracking saved us a lot of headaches.
Open sourcing the dashboard
I got tired of rebuilding this for every project, so I built LLMeter. It's an open-source dashboard specifically for this: per-tenant LLM cost tracking.
It tracks costs per model, per user, per day. Handles budget alerts. Supports OpenAI, Anthropic, DeepSeek, and OpenRouter out of the box.
Stack is Next.js, Supabase, Inngest. It's AGPL-3.0 licensed and completely free if you self-host.
We use it internally to keep margins in check.
Repo is up at llmeter.org.
How are you all handling multi-tenant API costs right now? Anyone found a better way to enforce per-user limits?