Dangel Jesus Rodríguez

Posted on May 21

Per-user cost attribution for your AI app

#ai #saas #opensource #agents

You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.

This is the single most underrated metric in production LLM apps — cost per end-user — and it's surprisingly easy to instrument if you know what to do.

Here are the three approaches I've found work in practice, ranked by setup time.

Approach 1: Wrap your provider client (5 minutes)

Works for Express, Next.js Route Handlers, Fastify — anything that has a single OpenAI or Anthropic client instance.

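import express from 'express'
import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'

const app = express()
app.use(express.json()) // parse JSON bodies so req.body.messages is available

// Wrap the client once; every call made through it is recorded.
const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})

// Assumes your auth middleware has already populated req.user.
app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const r = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: r.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      // Stamped once here, inherited by every LLM call inside the callback.
      tags: {
        userId: req.user.id,
        plan: req.user.plan,
      },
    },
  )
})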

The trick is withTrace({ tags: { userId } }) at the request boundary. Every LLM call inside the block — direct or nested — inherits those tags automatically via AsyncLocalStorage. You don't have to thread userId through every function.
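If you're curious how that inheritance can work, here is a minimal sketch of the AsyncLocalStorage pattern. It is not the wrapper's actual implementation: `traceStore`, `withTags`, `currentTags`, and `fakeLlmCall` are names I made up for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

type TraceTags = Record<string, string>

// Hypothetical store; a real wrapper would keep something like this internally.
const traceStore = new AsyncLocalStorage<TraceTags>()

// Run `fn` with `tags` attached to the current async context.
function withTags<T>(tags: TraceTags, fn: () => Promise<T>): Promise<T> {
  // Merge with any outer tags so nested scopes add to the context instead of replacing it.
  const merged = { ...(traceStore.getStore() ?? {}), ...tags }
  return traceStore.run(merged, fn)
}

// Read the tags from anywhere down the call stack, no parameter threading needed.
function currentTags(): TraceTags {
  return traceStore.getStore() ?? {}
}

// Stand-in for an instrumented LLM call: it records whatever tags are in scope.
async function fakeLlmCall(prompt: string): Promise<{ prompt: string; tags: TraceTags }> {
  await new Promise((resolve) => setTimeout(resolve, 1)) // the context survives awaits
  return { prompt, tags: currentTags() }
}

// A nested helper that knows nothing about users.
async function summarize(text: string) {
  return fakeLlmCall(`Summarize: ${text}`)
}
```

Calling `withTags({ userId: 'user_a3f9c2' }, () => summarize('hello'))` resolves to a record whose `tags.userId` is set, even though `summarize` never received a user ID. Outside of `withTags`, `currentTags()` returns an empty object.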

Pros: simplest setup; works the same way with both OpenAI and Anthropic.
Cons: requires you to use the dedicated wrapper SDKs.

Approach 2: OpenTelemetry telemetry metadata (Vercel AI SDK)

If you're on the Vercel AI SDK, experimental_telemetry.metadata is the equivalent hook:

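import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { getSession } from './auth' // hypothetical helper: use your own auth lookup

export async function POST(req: Request) {
  const session = await getSession(req) // hypothetical: resolves the signed-in user
  const { prompt } = await req.json()

  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt,
    experimental_telemetry: {
      isEnabled: true,
      // Each key here ends up as a span attribute.
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // toAIStreamResponse() was removed in AI SDK 4. toTextStreamResponse() streams plain text;
  // use the protocol-specific response helper for your SDK version if the client needs it.
  return result.toTextStreamResponse()
}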

Each metadata key is recorded as an ai.telemetry.metadata.<key> span attribute, which any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) can pick up.
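To make the key mapping concrete, here is a small sketch of how metadata turns into prefixed attribute keys and how a consumer could read it back. It works on plain objects rather than real OTel spans, and `toSpanAttributes` / `fromSpanAttributes` are hypothetical helpers, not part of any SDK.

```typescript
type SpanAttributes = Record<string, string | number | boolean>

const METADATA_PREFIX = 'ai.telemetry.metadata.'

// Flatten a metadata object into prefixed span-attribute keys.
function toSpanAttributes(metadata: SpanAttributes): SpanAttributes {
  const out: SpanAttributes = {}
  for (const [key, value] of Object.entries(metadata)) {
    out[`${METADATA_PREFIX}${key}`] = value
  }
  return out
}

// Recover the metadata from a span's attributes, ignoring unrelated keys.
function fromSpanAttributes(attributes: SpanAttributes): SpanAttributes {
  const out: SpanAttributes = {}
  for (const [key, value] of Object.entries(attributes)) {
    if (key.startsWith(METADATA_PREFIX)) {
      out[key.slice(METADATA_PREFIX.length)] = value
    }
  }
  return out
}
```

So `{ userId: 'user_a3f9c2', plan: 'free' }` becomes `ai.telemetry.metadata.userId` and `ai.telemetry.metadata.plan`, and those are the attribute names you would filter or group on in your observability tool.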

Pros: zero coupling — pure OTel, swap exporters whenever.
Cons: only works if your SDK emits OTel spans. AI SDK does. Many others don't yet.

Approach 3: Raw event emission (autonomous bots / non-HTTP)

For background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary — emit events manually:

import { Voight } from '@voightxyz/sdk'

const voight = new Voight({ agentId: 'my-bot' })

// `job` is the unit of work being processed; it is assumed to carry userId and tenantId.
const t0 = Date.now()
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'content-type': 'application/json', // required for a JSON request body
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [...],
  }),
}).then((r) => r.json())

voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - t0,
  outcome: 'success',
  metadata: {
    // Token counts come straight from the provider's usage block.
    tokens: {
      input: res.usage.prompt_tokens,
      output: res.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})

This is more code per call, but you control everything. Useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).

Pros: full control over what gets emitted.
Cons: more boilerplate. You're responsible for token counting.
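If you count tokens yourself, turning them into dollars is simple arithmetic. Here's a rough sketch, assuming a per-model price table you maintain yourself. The prices below are placeholders for illustration, so check your provider's current price list, and `estimateCostUsd` is a hypothetical helper.

```typescript
type TokenUsage = { input: number; output: number }
type ModelPrice = { inputPerMillion: number; outputPerMillion: number }

// Placeholder prices in USD per 1M tokens. These are assumptions, not authoritative figures.
const PRICES: Record<string, ModelPrice> = {
  'gpt-4o-mini': { inputPerMillion: 0.15, outputPerMillion: 0.6 },
}

// Estimate the cost of one call from its token counts.
function estimateCostUsd(model: string, usage: TokenUsage): number {
  const price = PRICES[model]
  if (!price) throw new Error(`No price configured for model: ${model}`)
  return (
    (usage.input * price.inputPerMillion + usage.output * price.outputPerMillion) / 1_000_000
  )
}
```

At those placeholder prices, a call with 1,200 input tokens and 300 output tokens comes to roughly $0.00036. Store that number next to tags.userId and the per-user sums fall out of a simple group-by.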

What you can answer once `userId` is in your tags

Once tags.userId (or whatever you name it) is on every event, the questions you can answer change shape:

| Question | How |
| --- | --- |
| Which user costs me the most this month? | Group by `tags.userId`, sum cost |
| Is my free tier subsidising power users? | Filter by `tags.plan = 'free'`, sort by cost |
| Did our last release explode anyone's bill? | Group by `tags.userId`, compare cost before and after the release date |
| What's our cost-to-revenue ratio per customer? | Join with your Stripe data on `userId` |
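Most observability tools run these queries for you, but the logic is small enough to sketch. This assumes you have exported events as plain objects with a precomputed `costUsd` field; the `CostEvent` shape and `costByUser` are hypothetical, not any tool's API.

```typescript
type CostEvent = {
  costUsd: number
  tags: { userId: string; plan?: string }
}

// Sum cost per user, optionally restricted to one plan, most expensive first.
function costByUser(events: CostEvent[], plan?: string): Array<[string, number]> {
  const totals = new Map<string, number>()
  for (const event of events) {
    if (plan !== undefined && event.tags.plan !== plan) continue
    const previous = totals.get(event.tags.userId) ?? 0
    totals.set(event.tags.userId, previous + event.costUsd)
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1])
}
```

`costByUser(events)` answers the first row of the table, and `costByUser(events, 'free')` answers the second.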

You don't need a separate analytics SDK on the client. You don't need to copy userId into LLM messages. You don't need anything custom on top — the tags propagate from the request boundary down to every span.

A note on GDPR / multi-tenant safety

userId here means your internal stable identifier, something like user_a3f9c2, not the user's email or wallet address. Never put PII into telemetry metadata. Some observability tools scrub PII on ingest, but don't rely on that: keep it out at the source.

For multi-tenant SaaS, add a second tag: tags: { userId, tenantId }. That way you can ask both "which customer is this?" and "which of their users?".
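One cheap safeguard is to check tags before they leave your process. This is a minimal sketch that only catches email-shaped values; the regex is deliberately loose and `assertNoEmailInTags` is a hypothetical helper, so treat it as a tripwire rather than real PII detection.

```typescript
// Loose pattern: something@something.tld with no whitespace.
const EMAIL_LIKE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/

// Throw if any tag value looks like an email address; otherwise return the tags unchanged.
function assertNoEmailInTags(tags: Record<string, string>): Record<string, string> {
  for (const [key, value] of Object.entries(tags)) {
    if (EMAIL_LIKE.test(value)) {
      throw new Error(`Tag "${key}" looks like an email address; use an internal ID instead`)
    }
  }
  return tags
}
```

You would call it at the request boundary, for example `tags: assertNoEmailInTags({ userId, tenantId })`, so a mistake fails loudly in development instead of quietly landing in your telemetry.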

Wrapping up

Three approaches, one mental model: stamp userId at the boundary, let it propagate to every LLM call inside the request.

The wrappers I used here are Apache 2.0:

@voightxyz/openai for OpenAI
@voightxyz/anthropic for Anthropic
@voightxyz/vercel-ai for the Vercel AI SDK
@voightxyz/sdk for library mode

Same approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline — the metadata.userId pattern is the universal part.

How do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?
