DEV Community: Dangel Jesus Rodríguez

Per-user cost attribution for your AI app

Dangel Jesus Rodríguez — Thu, 21 May 2026 23:39:13 +0000

You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.

This is the single most underrated metric in production LLM apps — cost per end-user — and it's surprisingly easy to instrument if you know what to do.

Here are the three approaches I've found work in practice, ranked by setup time.

Approach 1: Wrap your provider client (5 minutes)

Works for Express, Next.js Route Handlers, Fastify — anything that has a single OpenAI or Anthropic client instance.

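import express from 'express'
import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'

const app = express()
app.use(express.json()) // parse JSON bodies so req.body.messages is populated

// Wrap the client once; every call made through it is traced.
const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})

app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const r = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: r.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      tags: {
        // req.user is assumed to be set by your auth middleware (not shown)
        userId: req.user.id,
        plan: req.user.plan,
      },
    },
  )
})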

The trick is passing tags: { userId } to withTrace at the request boundary. Every LLM call inside the block, direct or nested, inherits those tags automatically via AsyncLocalStorage. You don't have to thread userId through every function.

Pros: simplest option; works the same way with both OpenAI and Anthropic.
Cons: requires you to use the dedicated wrapper SDKs.
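
To make the AsyncLocalStorage propagation concrete, here is a minimal sketch of the mechanism using only Node's built-in node:async_hooks module. It is not the wrapper's actual implementation: withTags, currentTags, and fakeLlmCall are hypothetical names made up for this illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

type Tags = Record<string, string>

// One store per process; each withTags() call gets its own isolated context.
const tagStore = new AsyncLocalStorage<Tags>()

// Hypothetical helper: run `fn` with `tags` visible to everything it awaits.
function withTags<T>(tags: Tags, fn: () => Promise<T>): Promise<T> {
  // Merge with tags from an enclosing scope so nested calls keep the outer tags.
  const merged: Tags = { ...(tagStore.getStore() ?? {}), ...tags }
  return tagStore.run(merged, fn)
}

// Hypothetical helper: what a wrapped client could read before emitting an event.
function currentTags(): Tags {
  return tagStore.getStore() ?? {}
}

// Stand-in for a wrapped LLM call; note that it takes no userId parameter.
async function fakeLlmCall(prompt: string): Promise<{ prompt: string; tags: Tags }> {
  await new Promise((resolve) => setTimeout(resolve, 1)) // simulate network I/O
  return { prompt, tags: currentTags() }
}

// Business logic a few layers down; still no userId in the signature.
async function summarize(text: string) {
  return fakeLlmCall(`Summarize: ${text}`)
}

// Request boundary: the only place that knows who the user is.
async function handleRequest(userId: string, plan: string) {
  return withTags({ userId, plan }, () => summarize('hello'))
}
```

Two concurrent handleRequest calls each see only their own tags, even though they interleave on the same event loop. That isolation is what makes it safe to stamp the user once at the boundary.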

Approach 2: OpenTelemetry telemetry metadata (Vercel AI SDK)

If you're on the Vercel AI SDK, experimental_telemetry.metadata is the equivalent hook:

import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

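export async function POST(req: Request) {
  // `session` is assumed to come from your auth layer (lookup not shown).
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt: (await req.json()).prompt,
    experimental_telemetry: {
      isEnabled: true,
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // toAIStreamResponse() was removed in AI SDK 4.0; toTextStreamResponse() streams plain text.
  return result.toTextStreamResponse()
}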

Each metadata key is recorded as an ai.telemetry.metadata.<key> span attribute, which any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) can pick up.

Pros: zero coupling — pure OTel, swap exporters whenever.
Cons: only works if your SDK emits OTel spans. AI SDK does. Many others don't yet.
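
If you ever consume these spans yourself, pulling the tags back out is a prefix match on the attribute names. A minimal sketch, assuming the span attributes arrive as a flat key/value object; extractTelemetryMetadata is a hypothetical helper and the example values are made up.

```typescript
type SpanAttributes = Record<string, string | number | boolean | undefined>

const METADATA_PREFIX = 'ai.telemetry.metadata.'

// Collect every `ai.telemetry.metadata.<key>` attribute into a plain { key: value } object.
function extractTelemetryMetadata(attributes: SpanAttributes): Record<string, string> {
  const tags: Record<string, string> = {}
  for (const name of Object.keys(attributes)) {
    const value = attributes[name]
    if (name.startsWith(METADATA_PREFIX) && value !== undefined) {
      tags[name.slice(METADATA_PREFIX.length)] = String(value)
    }
  }
  return tags
}

// Example attributes as they might appear on a span (values are made up).
const exampleAttributes: SpanAttributes = {
  'ai.model.id': 'gpt-4o-mini',
  'ai.telemetry.metadata.userId': 'user_a3f9c2',
  'ai.telemetry.metadata.plan': 'free',
}

console.log(extractTelemetryMetadata(exampleAttributes))
// → { userId: 'user_a3f9c2', plan: 'free' }
```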

Approach 3: Raw event emission (autonomous bots / non-HTTP)

For background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary — emit events manually:

import { Voight } from '@voightxyz/sdk'

const voight = new Voight({ agentId: 'my-bot' })

const t0 = Date.now()
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'content-type': 'application/json', // required when sending a JSON body
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [...],
  }),
}).then((r) => r.json())

voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - t0,
  outcome: 'success',
  metadata: {
    tokens: {
      input: res.usage.prompt_tokens,
      output: res.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})

This is more code per call, but you control everything. Useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).

Pros: full control over what gets emitted.
Cons: more boilerplate. You're responsible for reading token usage off each response and reporting it yourself.
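
Since this approach leaves the token numbers in your hands, you may also want to turn them into dollars yourself. A rough sketch follows; the price table is an assumption for illustration only (check your provider's current pricing page), and estimateCostUsd is a hypothetical helper, not part of any SDK.

```typescript
// Assumed prices in USD per 1M tokens. Illustrative only; verify before relying on them.
type Price = { inputPerMillion: number; outputPerMillion: number }

const PRICES: Record<string, Price> = {
  'gpt-4o-mini': { inputPerMillion: 0.15, outputPerMillion: 0.6 },
}

// Convert the token counts from a response's `usage` block into an estimated cost.
function estimateCostUsd(model: string, tokens: { input: number; output: number }): number {
  const price = PRICES[model]
  if (!price) throw new Error(`No price configured for model: ${model}`)
  return (tokens.input * price.inputPerMillion + tokens.output * price.outputPerMillion) / 1_000_000
}

// 1,200 prompt tokens and 300 completion tokens at the assumed prices.
const cost = estimateCostUsd('gpt-4o-mini', { input: 1200, output: 300 })
console.log(cost.toFixed(5))
```

At the assumed prices that call comes to roughly $0.00036. Throwing on an unknown model is deliberate: a silent zero would make an unpriced model look free in your per-user totals.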

What you can answer once `userId` is in your tags

Once tags.userId (or whatever you name it) is on every event, the questions you can answer change shape:

Question	How
Which user costs me the most this month?	Group by `tags.userId`, sum cost
Is my free tier subsidising power users?	Filter by `tags.plan: 'free'` + sort by cost
Did our last release explode anyone's bill?	Filter by `tags.userId` + date range
What's our cost-to-revenue ratio per customer?	Join with your Stripe data on `userId`

You don't need a separate analytics SDK on the client. You don't need to copy userId into LLM messages. You don't need anything custom on top — the tags propagate from the request boundary down to every span.
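
The first two rows of the table boil down to a group-by and a filter. A minimal in-memory sketch, assuming each event already carries a costUsd number next to its tags; the CostEvent shape and costByUser are hypothetical, and in practice your observability backend or a SQL query would do this for you.

```typescript
type CostEvent = { costUsd: number; tags: { userId: string; plan?: string } }

// Sum cost per userId and return the users sorted from most to least expensive.
function costByUser(events: CostEvent[]): Array<{ userId: string; costUsd: number }> {
  const totals = new Map<string, number>()
  for (const event of events) {
    totals.set(event.tags.userId, (totals.get(event.tags.userId) ?? 0) + event.costUsd)
  }
  return Array.from(totals.entries())
    .map(([userId, costUsd]) => ({ userId, costUsd }))
    .sort((a, b) => b.costUsd - a.costUsd)
}

// Made-up events for illustration.
const events: CostEvent[] = [
  { costUsd: 0.5, tags: { userId: 'user_a', plan: 'free' } },
  { costUsd: 0.25, tags: { userId: 'user_b', plan: 'pro' } },
  { costUsd: 0.25, tags: { userId: 'user_a', plan: 'free' } },
  { costUsd: 0.125, tags: { userId: 'user_c', plan: 'free' } },
]

// "Which user costs me the most?"
const topSpenders = costByUser(events)

// "Is my free tier subsidising power users?"
const freeTierSpenders = costByUser(events.filter((e) => e.tags.plan === 'free'))

console.log(topSpenders[0], freeTierSpenders)
```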

A note on GDPR / multi-tenant safety

userId here means your internal stable identifier (user_a3f9c2 or similar), not the user's email or wallet address. Never put PII into telemetry metadata. Good observability tools can scrub PII, but don't rely on that: data you never send can't leak.
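
One cheap safeguard is to check tag values before they leave your process. The sketch below is a heuristic, not a complete PII detector: it only flags values that look like an email address or an Ethereum-style wallet address, and findPiiLikeTags is a hypothetical helper.

```typescript
// Heuristic patterns; they catch obvious cases only.
const EMAIL_LIKE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/
const WALLET_LIKE = /^0x[0-9a-fA-F]{40}$/

// Return the names of tags whose values look like an email or wallet address.
function findPiiLikeTags(tags: Record<string, string>): string[] {
  return Object.keys(tags).filter((key) => {
    const value = tags[key]
    return EMAIL_LIKE.test(value) || WALLET_LIKE.test(value)
  })
}

console.log(findPiiLikeTags({ userId: 'user_a3f9c2', tenantId: 'tenant_42' })) // → []
console.log(findPiiLikeTags({ userId: 'jane@example.com' })) // → [ 'userId' ]
```

You could run a check like this in development or CI and fail loudly, rather than scrubbing silently in production.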

For multi-tenant SaaS, add a second tag: tags: { userId, tenantId }. That way you can ask both "which customer is this?" and "which of their users?".

Wrapping up

Three approaches, one mental model: stamp userId at the boundary, let it propagate to every LLM call inside the request.

The wrappers I used here are Apache 2.0:

@voightxyz/openai for OpenAI
@voightxyz/anthropic for Anthropic
@voightxyz/vercel-ai for the Vercel AI SDK
@voightxyz/sdk for library mode

Same approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline — the metadata.userId pattern is the universal part.

How do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?

Adding observability to your Vercel AI SDK app in 30 seconds

Dangel Jesus Rodríguez — Thu, 21 May 2026 21:33:06 +0000

Last week I was debugging a streamText call in my Next.js chatbot and realized I had no idea how many tokens it actually used, what the latency was, or how much it cost — three things I really should know in production.

The Vercel AI SDK emits OpenTelemetry spans natively the moment you flip experimental_telemetry: { isEnabled: true }. The wire is there. You just need something to listen on the other end.

This is the 30-second setup I ended up with.

Install

npm install @voightxyz/vercel-ai @vercel/otel @ai-sdk/otel

Three small packages:

@vercel/otel — Vercel's OpenTelemetry bootstrap for Next.js
@ai-sdk/otel — bridges the AI SDK's telemetry into OpenTelemetry
@voightxyz/vercel-ai — the SpanExporter that sends the captured spans somewhere you can read them (in my case, the Voight dashboard, but this is just a standard OTel SpanExporter — pair it with whatever)
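
To show how little "a standard OTel SpanExporter" involves, here is a sketch of the contract: an export method that receives finished spans and reports a result through a callback, plus a shutdown method. To stay self-contained it uses local stand-in types (SpanLike, ExportResultLike) instead of importing the real OpenTelemetry packages, and ConsoleSummaryExporter is a hypothetical example, not the Voight exporter.

```typescript
// Local stand-ins mirroring the parts of OpenTelemetry's exporter contract used here.
type HrTime = [number, number] // [seconds, nanoseconds]

interface SpanLike {
  name: string
  duration: HrTime
  attributes: Record<string, unknown>
}

interface ExportResultLike {
  code: 0 | 1 // 0 = success, 1 = failed
}

// Hypothetical exporter that prints one summary line per finished span.
class ConsoleSummaryExporter {
  readonly lines: string[] = []

  export(spans: SpanLike[], resultCallback: (result: ExportResultLike) => void): void {
    for (const span of spans) {
      // Convert the [seconds, nanoseconds] pair into milliseconds.
      const ms = span.duration[0] * 1000 + span.duration[1] / 1e6
      const model = String(span.attributes['ai.model.id'] ?? 'unknown')
      const line = `${span.name} ${ms.toFixed(1)}ms model=${model}`
      this.lines.push(line)
      console.log(line)
    }
    resultCallback({ code: 0 })
  }

  shutdown(): Promise<void> {
    return Promise.resolve()
  }
}

// Feed it one made-up span to see the output format.
const exporter = new ConsoleSummaryExporter()
exporter.export(
  [{ name: 'ai.streamText', duration: [1, 250_000_000], attributes: { 'ai.model.id': 'gpt-4o-mini' } }],
  () => {},
)
// prints "ai.streamText 1250.0ms model=gpt-4o-mini"
```

A real exporter would batch the spans and send them over the network, but the shape stays the same, which is why backends are interchangeable.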

Register the exporter

In instrumentation.ts (Next.js convention — root of the project, or src/instrumentation.ts if you use src/):

import { registerTelemetry } from 'ai'
import { LegacyOpenTelemetry } from '@ai-sdk/otel'
import { registerOTel } from '@vercel/otel'
import { VoightExporter } from '@voightxyz/vercel-ai'

registerTelemetry(new LegacyOpenTelemetry())

export function register() {
  registerOTel({
    serviceName: 'my-chatbot',
    traceExporter: new VoightExporter({
      agent: 'production-chat-api',
      privacy: 'standard',
    }),
  })
}

That's the whole instrumentation step. Done.

Enable telemetry on your LLM calls

In your route handler:

import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST(req: Request) {
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt: (await req.json()).prompt,
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'stream-text',
    },
  })
  // toAIStreamResponse() was removed in AI SDK 4.0; toTextStreamResponse() streams plain text.
  return result.toTextStreamResponse()
}

Now every streamText / generateText / streamObject / generateObject call carries token counts, model id, prompt, response text, tool calls, finish reason, and latency — all the way to the exporter.

Bonus: per-user cost attribution in one line

This is the part that surprised me. You can attach arbitrary metadata to the telemetry block, and a good exporter will surface it as searchable tags:

experimental_telemetry: {
  isEnabled: true,
  metadata: {
    userId: session.user.id,
    plan: session.user.plan,
    org: session.user.org,
  },
}

In the Voight dashboard this populates a "Users" sub-tab automatically — you get cost-per-end-user without writing any analytics code. The metadata.userId key is the one that triggers the per-user aggregation; everything else becomes a filterable tag.

If you've ever needed to answer "which of my users is costing me the most?" — this is how. Same pattern works regardless of which observability backend you wire up; it's just ai.telemetry.metadata.<key> span attributes under the hood.

What you actually get

After the 3 steps above, every LLM call carries:

Signal	Where
Model ID	`model`
Provider (`openai`, `anthropic`, …)	`metadata.provider`
Prompts and response text	with optional PII scrubbing
Token counts (input, output, cache reads)	`metadata.tokens`
Tool calls with arguments	`metadata.toolCalls`
Streaming flag	`metadata.streaming`
Latency	`durationMs`
Errors	`outcome: 'failed'` + `errorMessage`

Why this approach (not a middleware)

The Vercel AI SDK docs list a long row of observability providers (Langfuse, Phoenix, Braintrust, Datadog, Sentry, W&B). They all consume the same OpenTelemetry wire — none of them require a custom middleware that wraps your streamText call.

That's by design. If you commit to a middleware, you've coupled your code to one vendor's API. If you commit to OpenTelemetry, you can swap exporters whenever your needs change, run several exporters side by side by registering more than one span processor, or feed the spans into your existing OTel pipeline without touching your LLM calls.

Wrapping up

If you've been putting off adding observability to your AI app because the existing tutorials are 50 lines of setup — try the 30-second version. The Vercel AI SDK gives you the telemetry for free; all you're really doing is pointing it at a backend.

If you're curious about Voight specifically (the exporter I used here), it's Apache 2.0:

npm: @voightxyz/vercel-ai
Docs: docs.voight.xyz/ai-apps/vercel-ai

What observability backend are you using for your AI SDK apps? Did you go with one of the big OTel-friendly options or roll your own?
