Last week I was debugging a streamText call in my Next.js chatbot and realized I had no idea how many tokens it actually used, what the latency was, or how much it cost — three things I really should know in production.
The Vercel AI SDK emits OpenTelemetry spans natively the moment you flip experimental_telemetry: { isEnabled: true }. The wire is there. You just need something to listen on the other end.
This is the 30-second setup I ended up with.
Install
```bash
npm install @voightxyz/vercel-ai @vercel/otel @ai-sdk/otel
```
Three small packages:
- @vercel/otel: Vercel's OpenTelemetry bootstrap for Next.js
- @ai-sdk/otel: bridges the AI SDK's telemetry into OpenTelemetry
- @voightxyz/vercel-ai: the SpanExporter that sends the captured spans somewhere you can read them (in my case, the Voight dashboard, but this is just a standard OTel SpanExporter, so pair it with whatever)
Register the exporter
In instrumentation.ts (Next.js convention — root of the project, or src/instrumentation.ts if you use src/):
```typescript
import { registerTelemetry } from 'ai'
import { LegacyOpenTelemetry } from '@ai-sdk/otel'
import { registerOTel } from '@vercel/otel'
import { VoightExporter } from '@voightxyz/vercel-ai'

// Route the AI SDK's telemetry into OpenTelemetry.
registerTelemetry(new LegacyOpenTelemetry())

// Next.js calls register() once when the server starts.
export function register() {
  registerOTel({
    serviceName: 'my-chatbot',
    // Any OTel SpanExporter can go here.
    traceExporter: new VoightExporter({
      agent: 'production-chat-api',
      privacy: 'standard',
    }),
  })
}
```
That's the whole instrumentation step. Done.
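Since the exporter slot takes any OpenTelemetry SpanExporter, a quick way to check that spans are flowing before you pick a backend is a throwaway exporter that logs them. This is a minimal sketch, assuming the usual exporter shape (an export method plus shutdown). SpanLike and ExportResult are trimmed local stand-ins so the snippet runs on its own, and LoggingExporter is a hypothetical name, not something shipped by any of the packages above.

```typescript
// Local stand-ins for the OpenTelemetry types (assumption: trimmed to the
// fields used here; the real span type carries much more).
type SpanLike = {
  name: string;
  attributes: Record<string, unknown>;
  duration: [number, number]; // [seconds, nanoseconds]
};
type ExportResult = { code: 0 | 1; error?: Error }; // 0 = success, 1 = failed

// Hypothetical debugging exporter: formats each span as one line and logs it.
class LoggingExporter {
  readonly lines: string[] = [];

  export(spans: SpanLike[], resultCallback: (result: ExportResult) => void): void {
    for (const span of spans) {
      const [seconds, nanos] = span.duration;
      const ms = seconds * 1000 + nanos / 1e6;
      const line = `${span.name} ${ms.toFixed(1)}ms ${JSON.stringify(span.attributes)}`;
      this.lines.push(line); // kept in memory so the output can be inspected
      console.log(line);
    }
    // Tell the pipeline this batch was handled.
    resultCallback({ code: 0 });
  }

  shutdown(): Promise<void> {
    return Promise.resolve();
  }
}
```

If the real types line up in your SDK versions, an instance of this would sit where traceExporter is set above, and every LLM call should then print one line per span in the server logs.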
Enable telemetry on your LLM calls
In your route handler:
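```typescript
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt,
    // Opt this call into telemetry; functionId labels its spans.
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'stream-text',
    },
  })

  // toAIStreamResponse() is gone from recent AI SDK releases;
  // toTextStreamResponse() streams the generated text as plain text.
  return result.toTextStreamResponse()
}
```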
Now every streamText / generateText / streamObject / generateObject call carries token counts, model id, prompt, response text, tool calls, finish reason, and latency — all the way to the exporter.
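To keep that block from being copy-pasted into every call, the settings can come from one small helper. This is a sketch that assumes only the fields used in this post (isEnabled, functionId, and an optional metadata object); telemetryFor is a hypothetical name of my own, not part of the AI SDK.

```typescript
// Metadata values kept to simple scalars (assumption for this sketch).
type Metadata = Record<string, string | number | boolean>;

// Shape of the settings object as used in this post; the SDK's own type
// has more options than are listed here.
type TelemetrySettings = {
  isEnabled: boolean;
  functionId: string;
  metadata: Metadata;
};

// Hypothetical helper: one place to decide whether telemetry is on and
// how each call is labelled.
function telemetryFor(
  functionId: string,
  metadata: Metadata = {},
  enabled: boolean = true,
): TelemetrySettings {
  return { isEnabled: enabled, functionId, metadata };
}
```

A call site would then read experimental_telemetry: telemetryFor('stream-text'), and switching telemetry off for a noisy route becomes a single argument rather than an edit to the settings object.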
Bonus: per-user cost attribution in one line
This is the part that surprised me. You can attach arbitrary metadata to the telemetry block, and a good exporter will surface it as searchable tags:
```typescript
experimental_telemetry: {
  isEnabled: true,
  metadata: {
    userId: session.user.id,
    plan: session.user.plan,
    org: session.user.org,
  },
}
```
In the Voight dashboard this populates a "Users" sub-tab automatically — you get cost-per-end-user without writing any analytics code. The metadata.userId key is the one that triggers the per-user aggregation; everything else becomes a filterable tag.
If you've ever needed to answer "which of my users is costing me the most?" — this is how. Same pattern works regardless of which observability backend you wire up; it's just ai.telemetry.metadata.<key> span attributes under the hood.
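For a backend without a built-in per-user view, the same roll-up can be computed from exported span attributes. This is a rough sketch, assuming each span carries ai.telemetry.metadata.userId plus token counts under gen_ai.usage.input_tokens and gen_ai.usage.output_tokens (check which attribute names your SDK version actually emits). The prices are caller-supplied, and costPerUser and rankUsers are hypothetical helper names.

```typescript
type SpanAttributes = Record<string, string | number | boolean | undefined>;

// Prices in USD per one million tokens, supplied by the caller.
type Pricing = { inputPerMillion: number; outputPerMillion: number };

// Attribute names are assumptions; adjust them to what your spans contain.
const USER_KEY = "ai.telemetry.metadata.userId";
const INPUT_KEY = "gen_ai.usage.input_tokens";
const OUTPUT_KEY = "gen_ai.usage.output_tokens";

// Treat anything that is not a number as zero tokens.
const tokens = (value: unknown): number => (typeof value === "number" ? value : 0);

// Sum the estimated cost of every span that is tagged with a user id.
function costPerUser(spans: SpanAttributes[], pricing: Pricing): Map<string, number> {
  const totals = new Map<string, number>();
  for (const attrs of spans) {
    const userId = attrs[USER_KEY];
    if (typeof userId !== "string") continue; // skip spans without a user tag
    const cost =
      (tokens(attrs[INPUT_KEY]) * pricing.inputPerMillion +
        tokens(attrs[OUTPUT_KEY]) * pricing.outputPerMillion) /
      1_000_000;
    totals.set(userId, (totals.get(userId) ?? 0) + cost);
  }
  return totals;
}

// Order users from most to least expensive.
function rankUsers(totals: Map<string, number>): [string, number][] {
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}
```

One thing to watch: if both an outer call span and an inner provider span report usage for the same request, feed only one of them into the sum, otherwise every user looks twice as expensive.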
What you actually get
After the 3 steps above, every LLM call carries:
| Signal | Where |
|---|---|
| Model ID | model |
| Provider (openai, anthropic, …) | metadata.provider |
| Prompts and response text | with optional PII scrubbing |
| Token counts (input, output, cache reads) | metadata.tokens |
| Tool calls with arguments | metadata.toolCalls |
| Streaming flag | metadata.streaming |
| Latency | durationMs |
| Errors | outcome: 'failed' + errorMessage |
Why this approach (not a middleware)
The Vercel AI SDK docs list a long roster of observability providers (Langfuse, Phoenix, Braintrust, Datadog, Sentry, W&B). They all consume the same OpenTelemetry spans, and none of them requires a custom middleware that wraps your streamText call.
That's by design. If you commit to a middleware, you've coupled to one vendor's API. If you commit to OpenTelemetry, you can swap exporters whenever your needs change, pair multiple exporters via MultiSpanProcessor, or wire into your existing OTel pipeline at zero cost.
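To make the multi-backend point concrete, fan-out can be as small as a composite that forwards each batch to every exporter and reports failure if any of them fails. This is a sketch with trimmed local types standing in for the OpenTelemetry ones, and FanOutExporter is a hypothetical name. In a real pipeline you would more likely register one span processor per exporter and let the SDK do the forwarding.

```typescript
type ExportResult = { code: 0 | 1; error?: Error }; // 0 = success, 1 = failed

// Minimal exporter shape (assumption: mirrors the export/shutdown pair).
interface ExporterLike<Span> {
  export(spans: Span[], resultCallback: (result: ExportResult) => void): void;
  shutdown(): Promise<void>;
}

// Hypothetical composite: sends every batch to all targets.
class FanOutExporter<Span> implements ExporterLike<Span> {
  private readonly targets: ExporterLike<Span>[];

  constructor(targets: ExporterLike<Span>[]) {
    this.targets = targets;
  }

  export(spans: Span[], resultCallback: (result: ExportResult) => void): void {
    if (this.targets.length === 0) {
      resultCallback({ code: 0 });
      return;
    }
    let pending = this.targets.length;
    let firstFailure: ExportResult | undefined;
    for (const target of this.targets) {
      target.export(spans, (result) => {
        // Remember the first failure but keep waiting for the other targets.
        if (result.code !== 0 && !firstFailure) firstFailure = result;
        pending -= 1;
        if (pending === 0) resultCallback(firstFailure ?? { code: 0 });
      });
    }
  }

  async shutdown(): Promise<void> {
    await Promise.all(this.targets.map((target) => target.shutdown()));
  }
}
```

The call sites never change in this arrangement: adding or dropping a backend only touches the list of targets.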
Wrapping up
If you've been putting off adding observability to your AI app because the existing tutorials are 50 lines of setup — try the 30-second version. The Vercel AI SDK gives you the telemetry for free; all you're really doing is pointing it at a backend.
If you're curious about Voight specifically (the exporter I used here), it's licensed under Apache 2.0.
What observability backend are you using for your AI SDK apps? Did you go with one of the big OTel-friendly options or roll your own?