You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.
This is the single most underrated metric in production LLM apps — cost per end-user — and it's surprisingly easy to instrument if you know what to do.
Here are the three approaches I've found work in practice, ranked by setup time.
Approach 1: Wrap your provider client (5 minutes)
Works for Express, Next.js Route Handlers, Fastify — anything that has a single OpenAI or Anthropic client instance.
import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'
// Wrap the client once at startup; every call through it is recorded.
const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})
// `app` is your existing Express app; `req.user` is set by your auth middleware.
app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const r = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: r.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      // Tags set here are attached to every LLM call made inside the callback.
      tags: {
        userId: req.user.id,
        plan: req.user.plan,
      },
    },
  )
})
The trick is calling withTrace with tags: { userId } at the request boundary. Every LLM call inside the callback, direct or nested, inherits those tags automatically via AsyncLocalStorage. You don't have to thread userId through every function.
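To make the propagation less magical, here is a minimal sketch of the underlying mechanism using Node's built-in AsyncLocalStorage. It is not the wrapper's actual implementation; runWithTags, currentTags, fakeLlmCall, and summarize are hypothetical names used only for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical tag shape; the real wrapper's internals may differ.
type Tags = Record<string, string>

const tagStore = new AsyncLocalStorage<Tags>()

// Run `fn` with `tags` attached to the current async context.
function runWithTags<T>(tags: Tags, fn: () => Promise<T>): Promise<T> {
  return tagStore.run(tags, fn)
}

// Read the tags from anywhere down the call stack; empty outside a context.
function currentTags(): Tags {
  return tagStore.getStore() ?? {}
}

// Stand-in for an instrumented LLM call. It never receives userId as an argument.
async function fakeLlmCall(prompt: string): Promise<{ prompt: string; tags: Tags }> {
  await new Promise((resolve) => setTimeout(resolve, 1)) // cross an async boundary
  return { prompt, tags: currentTags() }
}

// A nested helper that also knows nothing about the user.
async function summarize(text: string) {
  return fakeLlmCall(`Summarize: ${text}`)
}

// Usage: tags are set once at the boundary and picked up two levels down.
async function demo() {
  return runWithTags({ userId: 'user_a3f9c2', plan: 'free' }, () => summarize('hello'))
}
```

Because each call to runWithTags gets its own store, concurrent requests do not see each other's tags, which is what makes this safe on a server handling many users at once.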
Pros: the simplest setup, and it works the same way with OpenAI and Anthropic.
Cons: requires you to use the dedicated wrapper SDKs.
Approach 2: OpenTelemetry metadata (Vercel AI SDK)
If you're on the Vercel AI SDK, experimental_telemetry.metadata is the equivalent hook:
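import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST(req: Request) {
  // getSession is a hypothetical stand-in for your own auth lookup.
  const session = await getSession(req)
  const { prompt } = await req.json()
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt,
    experimental_telemetry: {
      isEnabled: true,
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // AI SDK 4.x name. 3.x called this toAIStreamResponse(); 5.x renames it to toUIMessageStreamResponse().
  return result.toDataStreamResponse()
}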
The SDK records these values as ai.telemetry.metadata.<key> span attributes, which any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) can pick up.
Pros: zero coupling — pure OTel, swap exporters whenever.
Cons: only works if your SDK emits OTel spans. AI SDK does. Many others don't yet.
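If you process exported spans yourself, pulling the tags back out is a prefix match on the attribute names. The sketch below assumes span attributes arrive as a flat key/value object; extractTags and the sample attribute values are illustrative, not part of any SDK.

```typescript
// Attribute values as they might appear on an exported span (simplified).
type AttributeValue = string | number | boolean
type SpanAttributes = Record<string, AttributeValue>

const METADATA_PREFIX = 'ai.telemetry.metadata.'

// Collect every `ai.telemetry.metadata.<key>` attribute into a flat tags object.
function extractTags(attributes: SpanAttributes): Record<string, AttributeValue> {
  const tags: Record<string, AttributeValue> = {}
  for (const [name, value] of Object.entries(attributes)) {
    if (name.startsWith(METADATA_PREFIX)) {
      tags[name.slice(METADATA_PREFIX.length)] = value
    }
  }
  return tags
}

// Example span attributes (made-up values for illustration).
const exampleAttributes: SpanAttributes = {
  'ai.model.id': 'gpt-4o-mini',
  'ai.telemetry.metadata.userId': 'user_a3f9c2',
  'ai.telemetry.metadata.plan': 'free',
}

const tags = extractTags(exampleAttributes)
```

Most hosted tools do this mapping for you; the sketch is mainly useful if you write your own exporter or post-process spans in a pipeline.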
Approach 3: Raw event emission (autonomous bots / non-HTTP)
For background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary — emit events manually:
import { Voight } from '@voightxyz/sdk'
const voight = new Voight({ agentId: 'my-bot' })
// `job` is the background job being processed; it carries the user and tenant IDs.
const t0 = Date.now()
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'content-type': 'application/json', // required for a JSON request body
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [...],
  }),
}).then((r) => r.json())
// Emit one event per call, with token usage and the same tags as the other approaches.
voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - t0,
  outcome: 'success',
  metadata: {
    tokens: {
      input: res.usage.prompt_tokens,
      output: res.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})
This is more code per call, but you control everything. Useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).
Pros: full control over what gets emitted.
Cons: more boilerplate. You're responsible for recording token usage on every call yourself.
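Since you own the token numbers in this approach, you may also want to turn them into a dollar figure yourself. Here is a rough sketch, assuming you maintain your own price table; the prices below are placeholders rather than real rates, and readUsage and estimateCostUsd are hypothetical helper names.

```typescript
// USD per one million tokens. Placeholder numbers for illustration, NOT real prices.
interface ModelPrice {
  inputPerMillion: number
  outputPerMillion: number
}

const PRICES: Record<string, ModelPrice> = {
  'example-model': { inputPerMillion: 1, outputPerMillion: 4 },
}

interface TokenUsage {
  input: number
  output: number
}

// The two `usage` fields of a Chat Completions response that this sketch reads.
interface OpenAiUsage {
  prompt_tokens: number
  completion_tokens: number
}

// Normalize provider usage; an error response may not include any.
function readUsage(response: { usage?: OpenAiUsage }): TokenUsage | undefined {
  if (!response.usage) return undefined
  return { input: response.usage.prompt_tokens, output: response.usage.completion_tokens }
}

// Multiply token counts by the configured per-million prices.
function estimateCostUsd(model: string, usage: TokenUsage): number {
  const price = PRICES[model]
  if (!price) throw new Error(`No price configured for model "${model}"`)
  return (usage.input * price.inputPerMillion + usage.output * price.outputPerMillion) / 1e6
}

const usage = readUsage({ usage: { prompt_tokens: 1000, completion_tokens: 500 } })
const costUsd = usage ? estimateCostUsd('example-model', usage) : 0
```

Throwing on an unknown model is a deliberate choice here: silently pricing a new model at zero would hide exactly the spend you are trying to see.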
What you can answer once userId is in your tags
Once tags.userId (or whatever you name it) is on every event, the questions you can answer change shape:
| Question | How |
|---|---|
| Which user costs me the most this month? | Group by tags.userId, sum cost |
| Is my free tier subsidising power users? | Filter by tags.plan: 'free' + sort by cost |
| Did our last release explode anyone's bill? | Filter by tags.userId + date range |
| What's our cost-to-revenue ratio per customer? | Join with your Stripe data on userId |
You don't need a separate analytics SDK on the client. You don't need to copy userId into LLM messages. You don't need anything custom on top — the tags propagate from the request boundary down to every span.
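If your events end up somewhere you can query from code, the first two rows of the table reduce to a group-by and a filter. A small sketch, assuming each event already carries a costUsd field and the tags from earlier; CostEvent and costByUser are illustrative names, and the sample data is made up.

```typescript
interface CostEvent {
  costUsd: number
  tags: { userId: string; plan?: string }
}

// Sum cost per user and return the most expensive users first.
function costByUser(events: CostEvent[]): Array<{ userId: string; totalUsd: number }> {
  const totals = new Map<string, number>()
  for (const event of events) {
    const previous = totals.get(event.tags.userId) ?? 0
    totals.set(event.tags.userId, previous + event.costUsd)
  }
  return Array.from(totals, ([userId, totalUsd]) => ({ userId, totalUsd })).sort(
    (a, b) => b.totalUsd - a.totalUsd,
  )
}

// Made-up sample events.
const events: CostEvent[] = [
  { costUsd: 0.5, tags: { userId: 'user_a', plan: 'free' } },
  { costUsd: 0.25, tags: { userId: 'user_b', plan: 'pro' } },
  { costUsd: 1.5, tags: { userId: 'user_a', plan: 'free' } },
  { costUsd: 0.75, tags: { userId: 'user_c', plan: 'free' } },
]

// "Which user costs me the most?"
const ranking = costByUser(events)

// "Is my free tier subsidising power users?"
const freeTierRanking = costByUser(events.filter((e) => e.tags.plan === 'free'))
```

In practice your observability tool or a SQL query does the same thing; the point is that nothing beyond the tag is needed to make it possible.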
A note on GDPR / multi-tenant safety
userId here means your internal stable identifier (user_a3f9c2 or similar), not the user's email or wallet address. Never put PII into telemetry metadata. Some observability tools can scrub PII on ingest, but don't rely on that: anything you send has already left your system.
For multi-tenant SaaS, add a second tag: tags: { userId, tenantId }. That way you can ask both "which customer is this?" and "which of their users?".
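One way to enforce both points is to build tags through a single helper that refuses values that look like PII. The sketch below is one possible guard, not a complete PII detector: buildTags is a hypothetical helper, and the email check is a deliberately rough heuristic.

```typescript
interface TagInput {
  userId: string
  tenantId?: string
}

// Rough heuristic for "looks like an email"; not a full validator.
const EMAIL_LIKE = /\S+@\S+\.\S+/

// Build the tags object, rejecting values that look like email addresses.
function buildTags(input: TagInput): Record<string, string> {
  for (const [key, value] of Object.entries(input)) {
    if (typeof value === 'string' && EMAIL_LIKE.test(value)) {
      throw new Error(`Tag "${key}" looks like an email address; use an internal ID instead`)
    }
  }
  const tags: Record<string, string> = { userId: input.userId }
  if (input.tenantId !== undefined) tags.tenantId = input.tenantId
  return tags
}

// Usage: internal IDs pass through unchanged.
const safeTags = buildTags({ userId: 'user_a3f9c2', tenantId: 'tenant_42' })
```

Failing loudly in development is usually preferable to discovering an email address in a third-party dashboard later.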
Wrapping up
Three approaches, one mental model: stamp userId at the boundary, let it propagate to every LLM call inside the request.
The wrappers I used here are Apache 2.0:
- @voightxyz/openai for OpenAI
- @voightxyz/anthropic for Anthropic
- @voightxyz/vercel-ai for the Vercel AI SDK
- @voightxyz/sdk for library mode
Same approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline — the metadata.userId pattern is the universal part.
How do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?