Most observability stories for LLM agents end the same way. You wire up an SDK. The dashboard fills with full prompts, full completions, tool arguments, retrieved documents. Beautiful for debugging. A nightmare for any system where someone outside your team is supposed to trust the data, because every byte of user content is now sitting somewhere a security review has to argue about.
@agentlair/vercel-ai shipped to npm yesterday at v0.1.1. It plugs into the Vercel AI SDK's experimental_telemetry hook and forwards behavioral signal to AgentLair. Zero runtime dependencies. Three lines to wire up. The part that took the most time to get right is what it doesn't capture.
## Three lines
```bash
bun add @agentlair/vercel-ai ai @ai-sdk/openai
```
```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { createAgentLairExporter } from "@agentlair/vercel-ai";

const agentlair = createAgentLairExporter({
  apiKey: process.env.AGENTLAIR_API_KEY!,
  agentId: "my-agent",
});

const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Summarise last week's sales",
  tools: { /* ... */ },
  experimental_telemetry: { isEnabled: true, tracer: agentlair },
});

await agentlair.shutdown();
```
That's the integration. The exporter implements the AI SDK's TelemetryIntegration shape with five hooks: onStart, onToolCallStart, onToolCallFinish, onStepFinish, onFinish. Each one fires from inside the SDK's generation loop. The exporter pushes structured events into a buffer, and a background timer flushes them every five seconds to AgentLair's /v1/events endpoint.
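For intuition, here's a minimal sketch of that buffer-and-flush loop. The event shape, field names, and API host are assumptions for illustration; only the five-second cadence and the /v1/events path come from the package description:

```typescript
// Illustrative sketch of the exporter's buffering internals, not the package source.
type AgentLairEvent = {
  timestamp: string;
  action: string; // e.g. "generation_start", "web_search_complete"
  category: "session" | "tool";
  data: Record<string, unknown>;
};

class EventBuffer {
  private buffer: AgentLairEvent[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private apiKey: string,
    private agentId: string,
    flushIntervalMs = 5_000, // the five-second cadence from the post
  ) {
    this.timer = setInterval(() => void this.flush(), flushIntervalMs);
  }

  push(event: AgentLairEvent): void {
    this.buffer.push(event);
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0); // drain the buffer atomically
    await fetch("https://agentlair.dev/v1/events", { // host assumed; only the path is documented
      method: "POST",
      headers: {
        authorization: `Bearer ${this.apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({ agentId: this.agentId, events: batch }),
    });
  }

  async shutdown(): Promise<void> {
    clearInterval(this.timer); // stop the timer, then drain what's left
    await this.flush();
  }
}
```

This also explains the await agentlair.shutdown() call in the snippet above: without a final flush, anything buffered in the last five seconds would be dropped when the process exits.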
## What gets captured
| AI SDK event | Action | Category | Data |
|---|---|---|---|
| onStart | generation_start | session | model id, provider, temperature, tool count |
| onToolCallStart | {toolName} | tool | tool name, call id |
| onToolCallFinish | {toolName}_complete | tool | duration, success/failure, error string |
| onStepFinish | step_complete | session | input/output/total tokens, finish reason, tool call count |
| onFinish | generation_complete | session | total token usage, finish reason, tool call count |
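To make the mapping concrete, here's roughly how two of those hooks could assemble their events. This is reconstructed from the table, not taken from the package source; the hook payload fields (toolCallId, durationMs) are assumptions:

```typescript
// Hypothetical handlers for two of the five hooks, reconstructed from the table above.
type ToolStartInfo = { toolName: string; toolCallId: string; args: unknown };
type ToolFinishInfo = { toolName: string; toolCallId: string; durationMs: number; error?: Error };

class SketchExporter {
  onToolCallStart(info: ToolStartInfo) {
    // info.args is available here but is never serialized into the event
    this.push({
      action: info.toolName, // e.g. "web_search"
      category: "tool",
      data: { toolName: info.toolName, callId: info.toolCallId },
    });
  }

  onToolCallFinish(info: ToolFinishInfo) {
    this.push({
      action: `${info.toolName}_complete`, // e.g. "web_search_complete"
      category: "tool",
      data: {
        durationMs: info.durationMs,
        success: !info.error,
        error: info.error?.message, // error string only, never the tool's output
      },
    });
  }

  private push(event: { action: string; category: "session" | "tool"; data: Record<string, unknown> }) {
    // buffered and flushed every five seconds, as sketched earlier
  }
}
```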
For an agent run that makes two tool calls in a single step, AgentLair sees something like:
```
generation_start model=gpt-4o-mini tools=4 temperature=0.7
web_search call_id=tc_1
web_search_complete duration_ms=412 success
calculate call_id=tc_2
calculate_complete duration_ms=18 success
step_complete finish_reason=stop total_tokens=842 tool_calls=2
generation_complete finish_reason=stop total_tokens=842
```
Seven events. Enough to know the agent ran, what it called, in what order, how long each call took, and whether anything broke. Not enough to know what the user asked, or what the model said back.
## What's deliberately not captured
The shape of this got argued over on a call two weeks ago. Capture more, and the dashboard becomes more useful; that pull is obvious. Whether you want every user prompt ending up in a third-party event log is less obvious, but for everyone in the room the answer was no.
Prompt and response text never leave your process. The system prompt stays in your container. The user message stays. The model's completion stays. None of those values get serialized into events at any point in the default config.
Tool arguments stay too. When web_search({"query": "internal doc title"}) fires, AgentLair sees the string web_search and the duration. The query argument is not in the payload. The query may be public, may be private. The exporter has no way to tell, so it doesn't try.
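Concretely, the event for that web_search call might look like this on the wire (a hypothetical payload; note the absence of any query field):

```typescript
// Hypothetical wire payload for the web_search call above; the exact shape is assumed.
const event = {
  timestamp: "2025-11-03T10:32:04.118Z",
  action: "web_search",
  category: "tool",
  data: { callId: "tc_1" }, // tool name and call id only; arguments are never serialized
};
```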
There's a switch for the cases where you do want minimal input/output framing:
```typescript
const agentlair = createAgentLairExporter({
  apiKey: process.env.AGENTLAIR_API_KEY!,
  agentId: "my-agent",
  captureInputs: true,  // logs system_prompt_length, message_count
  captureOutputs: true, // logs output_length
});
```
Lengths, not contents. If your model returns a 2,400-character response after a 180-character system prompt and a five-message history, those numbers travel. The text doesn't.
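Under the hood, that amounts to reducing inputs and outputs to counts before they ever reach the buffer. A sketch of what that reduction might look like (function names are illustrative):

```typescript
// Illustrative reduction of inputs/outputs to length-only metadata.
function inputMetadata(systemPrompt: string, messages: unknown[]) {
  return {
    system_prompt_length: systemPrompt.length, // 180 in the example above
    message_count: messages.length,            // 5 in the example above
  };
}

function outputMetadata(text: string) {
  return { output_length: text.length };       // 2,400 in the example above
}
```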
This is the design intent, not an oversight. AgentLair's job is behavioral attestation: does this agent do what it says it does, consistently, over time? That question is answerable from a stream of (timestamp, action, duration, success) tuples. Storing every prompt to answer it would be wrong on two axes: privacy load on the integrator, and signal-to-noise for the trust score.
If you want full trace dumps for debugging, experimental_telemetry already supports OTel tracers. Wire one up alongside the AgentLair exporter and you get both. Different layers, different storage, different retention policies.
## Why this matters for agent-as-customer
When a human signs up for a service, the service runs identity proofing. KYC, behavioral fraud signals, account history. None of that infrastructure exists for agents yet. If your agent calls somebody else's API in production, that other party has nothing to verify against.
The trust profile AgentLair builds from these events is meant to be that signal: a score from 0 to 100, a level (intern through principal), and a per-dimension breakdown across consistency, restraint, and transparency. It's visible at https://agentlair.dev/badge/{agentId}/score.json. The verifier on the other side reads it and decides whether to extend capability.
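A hedged sketch of what that verifier check might look like. The JSON shape here is inferred from the description above, not a documented schema, and the threshold is whatever policy the verifier sets:

```typescript
// Hypothetical capability gate built on the published trust profile.
type TrustProfile = {
  score: number; // 0 to 100
  level: string; // "intern" through "principal"
  dimensions: { consistency: number; restraint: number; transparency: number };
};

async function shouldExtendCapability(agentId: string, minScore = 70): Promise<boolean> {
  const res = await fetch(`https://agentlair.dev/badge/${agentId}/score.json`);
  if (!res.ok) return false; // no profile, no capability
  const profile = (await res.json()) as TrustProfile;
  return profile.score >= minScore; // threshold is the verifier's policy, not AgentLair's
}
```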
You can't build that score from prompts. You can build it from (timestamp, action, duration, success). So that's what this exporter ships.
## Try it
Get an API key at agentlair.dev/quickstart. About 90 seconds, no card.
```bash
bun add @agentlair/vercel-ai
```
Source at github.com/piiiico/agentlair-vercel-ai. LangChain.js users: the parallel package is @agentlair/langchain.
Issues welcome. Especially if you find a path where prompt content leaks somewhere it shouldn't.