Most observability stories for LLM agents end the same way. You wire up an SDK. The dashboard fills with full prompts, full completions, tool arguments, retrieved documents. Beautiful for debugging. A nightmare for any system where someone outside your team is supposed to trust the data, because every byte of user content is now sitting somewhere a security review has to argue about.
@agentlair/vercel-ai shipped to npm yesterday at v0.1.1. It plugs into the Vercel AI SDK's experimental_telemetry hook and forwards behavioral signal to AgentLair. Zero runtime dependencies. Three lines to wire up. The part that took the most time to get right is what it doesn't capture.
## Three lines
```bash
bun add @agentlair/vercel-ai ai @ai-sdk/openai
```
```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { createAgentLairExporter } from "@agentlair/vercel-ai";

const agentlair = createAgentLairExporter({
  apiKey: process.env.AGENTLAIR_API_KEY!,
  agentId: "my-agent",
});

const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Summarise last week's sales",
  tools: { /* ... */ },
  experimental_telemetry: { isEnabled: true, tracer: agentlair },
});

await agentlair.shutdown();
```
That's the integration. The exporter implements the AI SDK's TelemetryIntegration shape with five hooks: onStart, onToolCallStart, onToolCallFinish, onStepFinish, onFinish. Each one fires from inside the SDK's generation loop. The exporter pushes structured events into a buffer, and a background timer flushes them every five seconds to AgentLair's /v1/events endpoint.
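For intuition, here's a minimal sketch of that buffer-and-flush loop. The event shape, field names, and API host are assumptions for illustration; only the five-second cadence and the /v1/events path come from the package description:

```typescript
// Illustrative sketch of the exporter's buffering internals, not the package source.
type AgentLairEvent = {
  timestamp: string;
  action: string; // e.g. "generation_start", "web_search_complete"
  category: "session" | "tool";
  data: Record<string, unknown>;
};

class EventBuffer {
  private buffer: AgentLairEvent[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private apiKey: string,
    private agentId: string,
    flushIntervalMs = 5_000, // the five-second cadence from the post
  ) {
    this.timer = setInterval(() => void this.flush(), flushIntervalMs);
  }

  push(event: AgentLairEvent): void {
    this.buffer.push(event);
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0); // drain the buffer atomically
    await fetch("https://agentlair.dev/v1/events", { // host assumed; only the path is documented
      method: "POST",
      headers: {
        authorization: `Bearer ${this.apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({ agentId: this.agentId, events: batch }),
    });
  }

  async shutdown(): Promise<void> {
    clearInterval(this.timer); // stop the timer, then drain what's left
    await this.flush();
  }
}
```

This also explains the await agentlair.shutdown() call in the snippet above: without a final flush, anything buffered in the last five seconds would be dropped when the process exits.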
## What gets captured
| AI SDK event | Action | Category | Data |
|---|---|---|---|
| onStart | generation_start | session | model id, provider, temperature, tool count |
| onToolCallStart | {toolName} | tool | tool name, call id |
| onToolCallFinish | {toolName}_complete | tool | duration, success/failure, error string |
| onStepFinish | step_complete | session | input/output/total tokens, finish reason, tool call count |
| onFinish | generation_complete | session | total token usage, finish reason, tool call count |
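To make the mapping concrete, here's roughly how two of those hooks could assemble their events. This is reconstructed from the table, not taken from the package source; the hook payload fields (toolCallId, durationMs) are assumptions:

```typescript
// Hypothetical handlers for two of the five hooks, reconstructed from the table above.
type ToolStartInfo = { toolName: string; toolCallId: string; args: unknown };
type ToolFinishInfo = { toolName: string; toolCallId: string; durationMs: number; error?: Error };

class SketchExporter {
  onToolCallStart(info: ToolStartInfo) {
    // info.args is available here but is never serialized into the event
    this.push({
      action: info.toolName, // e.g. "web_search"
      category: "tool",
      data: { toolName: info.toolName, callId: info.toolCallId },
    });
  }

  onToolCallFinish(info: ToolFinishInfo) {
    this.push({
      action: `${info.toolName}_complete`, // e.g. "web_search_complete"
      category: "tool",
      data: {
        durationMs: info.durationMs,
        success: !info.error,
        error: info.error?.message, // error string only, never the tool's output
      },
    });
  }

  private push(event: { action: string; category: "session" | "tool"; data: Record<string, unknown> }) {
    // buffered and flushed every five seconds, as sketched earlier
  }
}
```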
For an agent run that makes two tool calls in a single step, AgentLair sees something like:
```
generation_start model=gpt-4o-mini tools=4 temperature=0.7
web_search call_id=tc_1
web_search_complete duration_ms=412 success
calculate call_id=tc_2
calculate_complete duration_ms=18 success
step_complete finish_reason=stop total_tokens=842 tool_calls=2
generation_complete finish_reason=stop total_tokens=842
```
Seven events. Enough to know the agent ran, what it called, in what order, how long each call took, and whether anything broke. Not enough to know what the user asked, or what the model said back.
## What's deliberately not captured
The shape of this got argued over on a call two weeks ago. Capture more, and the dashboard becomes more useful; that pull is obvious. Whether you want every user prompt ending up in a third-party event log is less obvious, but for everyone in the room the answer was no.
Prompt and response text never leave your process. The system prompt stays in your container. The user message stays. The model's completion stays. None of those values get serialized into events at any point in the default config.
Tool arguments stay too. When web_search({"query": "internal doc title"}) fires, AgentLair sees the string web_search and the duration. The query argument is not in the payload. The query may be public, may be private. The exporter has no way to tell, so it doesn't try.
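Concretely, the event for that web_search call might look like this on the wire (a hypothetical payload; note the absence of any query field):

```typescript
// Hypothetical wire payload for the web_search call above; the exact shape is assumed.
const event = {
  timestamp: "2025-11-03T10:32:04.118Z",
  action: "web_search",
  category: "tool",
  data: { callId: "tc_1" }, // tool name and call id only; arguments are never serialized
};
```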
There's a switch for the cases where you do want minimal input/output framing:
```typescript
const agentlair = createAgentLairExporter({
  apiKey: process.env.AGENTLAIR_API_KEY!,
  agentId: "my-agent",
  captureInputs: true,  // logs system_prompt_length, message_count
  captureOutputs: true, // logs output_length
});
```
Lengths, not contents. If your model returns a 2,400-character response after a 180-character system prompt and a five-message history, those numbers travel. The text doesn't.
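Under the hood, that amounts to reducing inputs and outputs to counts before they ever reach the buffer. A sketch of what that reduction might look like (function names are illustrative):

```typescript
// Illustrative reduction of inputs/outputs to length-only metadata.
function inputMetadata(systemPrompt: string, messages: unknown[]) {
  return {
    system_prompt_length: systemPrompt.length, // 180 in the example above
    message_count: messages.length,            // 5 in the example above
  };
}

function outputMetadata(text: string) {
  return { output_length: text.length };       // 2,400 in the example above
}
```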
This is the design intent, not an oversight. AgentLair's job is behavioral attestation: does this agent do what it says it does, consistently, over time? That question is answerable from a stream of (timestamp, action, duration, success) tuples. Storing every prompt to answer it would be wrong on two axes: privacy load on the integrator, and signal-to-noise for the trust score.
If you want full trace dumps for debugging, experimental_telemetry already supports OTel tracers. Wire one up alongside the AgentLair exporter and you get both. Different layers, different storage, different retention policies.
## Why this matters for agent-as-customer
When a human signs up for a service, the service runs identity proofing. KYC, behavioral fraud signals, account history. None of that infrastructure exists for agents yet. If your agent calls somebody else's API in production, that other party has nothing to verify against.
The trust profile AgentLair builds from these events is meant to be that signal: a score from 0 to 100, a level (intern through principal), and a per-dimension breakdown across consistency, restraint, and transparency. It's visible at https://agentlair.dev/badge/{agentId}/score.json. The verifier on the other side reads it and decides whether to extend capability.
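A hedged sketch of what that verifier check might look like. The JSON shape here is inferred from the description above, not a documented schema, and the threshold is whatever policy the verifier sets:

```typescript
// Hypothetical capability gate built on the published trust profile.
type TrustProfile = {
  score: number; // 0 to 100
  level: string; // "intern" through "principal"
  dimensions: { consistency: number; restraint: number; transparency: number };
};

async function shouldExtendCapability(agentId: string, minScore = 70): Promise<boolean> {
  const res = await fetch(`https://agentlair.dev/badge/${agentId}/score.json`);
  if (!res.ok) return false; // no profile, no capability
  const profile = (await res.json()) as TrustProfile;
  return profile.score >= minScore; // threshold is the verifier's policy, not AgentLair's
}
```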
You can't build that score from prompts. You can build it from (timestamp, action, duration, success). So that's what this exporter ships.
## Try it
Get an API key at agentlair.dev/quickstart. About 90 seconds, no card.
```bash
bun add @agentlair/vercel-ai
```
Source at github.com/piiiico/agentlair-vercel-ai. LangChain.js users: the parallel package is @agentlair/langchain.
Issues welcome. Especially if you find a path where prompt content leaks somewhere it shouldn't.