Have you ever stared at your terminal wondering why your multi-agent workflow failed? You know a task errored, but you can't tell which agent was responsible, how much it cost, or what the LLM actually received as input. Sound familiar?
If you're building AI agent systems, you need observability. But adding it shouldn't require rewriting your code or instrumenting every LLM call manually. Let me show you how to add production-grade observability using OpenTelemetry in just a few lines of code.
The Problem: Debugging Multi-Agent Systems is Hard
When you're running a workflow with multiple agents, each making LLM calls, you're dealing with:
- Long execution chains: Task A → Agent 1 → LLM → Task B → Agent 2 → LLM → Task C
- Unclear failure points: Which agent failed? Was it the LLM call, the parsing, or the logic?
- Hidden costs: No visibility into token usage per agent or task
- Nested execution: Agents can spawn sub-tasks, making traces complex
Traditional logging doesn't cut it. You need structured traces that show the entire workflow execution with timing, costs, and context.
What is OpenTelemetry?
OpenTelemetry (often abbreviated as OTel) is an open standard for observability. Think of it as a universal language for describing what your application is doing. It's supported by virtually every monitoring tool, from open-source solutions like Jaeger to commercial platforms like Datadog, and specialized AI tools like Langfuse and Phoenix.
The key concept: Traces. A trace is a record of a request's journey through your system, broken down into spans (individual operations) that have parent-child relationships. This creates a timeline view of your entire workflow.
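To make that concrete, here's roughly what creating spans looks like with the raw OpenTelemetry API in TypeScript. You won't write this by hand in this tutorial (the KaibanJS integration creates the spans for you), and the API is a no-op until an SDK provider is registered, but it shows what a parent-child span pair actually is:
import { trace } from '@opentelemetry/api';
// A tracer produces spans. Without a registered SDK provider these calls
// are no-ops, which is why libraries can use the API safely.
const tracer = trace.getTracer('demo');
tracer.startActiveSpan('workflow', (workflowSpan) => {
  // Child span created inside the parent's context, so exporters can
  // reconstruct the parent-child hierarchy and timeline later.
  tracer.startActiveSpan('llm-call', (llmSpan) => {
    llmSpan.setAttribute('model', 'gpt-4');
    llmSpan.end(); // child duration = end - start
  });
  workflowSpan.end(); // parent ends after its children
});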
Introducing KaibanJS
For this tutorial, we'll use KaibanJS, a TypeScript framework for building multi-agent workflows. KaibanJS makes it easy to define agents with specific roles and orchestrate them through task dependencies.
Here's why it's great for production:
- Task dependencies: Declare what needs to run before what
- Agent specialization: Each agent has a clear role and responsibility
- Automatic orchestration: The framework handles coordination and queuing
- Event-driven: Built-in events make it observable by design
- Type-safe: TypeScript everywhere for fewer runtime surprises
The best part? You can add observability without modifying your workflow code at all.
The Solution: @kaibanjs/opentelemetry
The @kaibanjs/opentelemetry package automatically instruments your KaibanJS workflows and exports traces to any OpenTelemetry-compatible service. It works by subscribing to workflow events, so no changes to your own code are needed.
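Conceptually, the integration looks something like the sketch below: listen for workflow events, open a span when a task starts, close it when the task finishes. The event names and emitter here are hypothetical, purely to illustrate the pattern rather than the package's real internals.
import { trace, Span } from '@opentelemetry/api';
// Illustrative sketch only: 'taskStarted' / 'taskFinished' are made-up
// event names, not the real KaibanJS event API.
const tracer = trace.getTracer('kaiban-otel-sketch');
const activeSpans = new Map<string, Span>();
function instrument(emitter: { on(event: string, cb: (e: any) => void): void }) {
  emitter.on('taskStarted', (e) => {
    activeSpans.set(e.taskId, tracer.startSpan(`Task: ${e.title}`));
  });
  emitter.on('taskFinished', (e) => {
    activeSpans.get(e.taskId)?.setAttribute('kaiban.task.status', e.status);
    activeSpans.get(e.taskId)?.end();
    activeSpans.delete(e.taskId);
  });
}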
Let's see it in action:
Quick Start: Adding Observability to Your Workflow
First, install the package:
npm install @kaibanjs/opentelemetry
Now, let's build a practical example: a content processing workflow that extracts, analyzes, and synthesizes information.
Step 1: Define Your Agents and Tasks
import { Team, Agent, Task } from 'kaibanjs';
// Define specialized agents
const extractor = new Agent({
name: 'ContentExtractor',
role: 'Extract structured data',
goal: 'Parse unstructured content into JSON',
background: 'Expert in NLP and data extraction',
});
const analyzer = new Agent({
name: 'ContentAnalyzer',
role: 'Analyze content',
goal: 'Identify patterns and insights',
background: 'Expert in content analysis',
});
const synthesizer = new Agent({
name: 'ContentSynthesizer',
role: 'Synthesize findings',
goal: 'Create coherent summaries',
background: 'Expert in summarization',
});
// Define tasks with dependencies
const extractTask = new Task({
title: 'Extract Content',
description: 'Extract structured data from: {input}',
expectedOutput: 'JSON with key information',
agent: extractor,
});
const analyzeTask = new Task({
title: 'Analyze Content',
description: 'Analyze the extracted content',
expectedOutput: 'Analysis report',
agent: analyzer,
dependencies: [extractTask], // Runs after extraction
});
const synthesizeTask = new Task({
title: 'Synthesize Findings',
description: 'Create a summary',
expectedOutput: 'Executive summary',
agent: synthesizer,
dependencies: [analyzeTask], // Runs after analysis
});
const team = new Team({
name: 'Content Processing Team',
agents: [extractor, analyzer, synthesizer],
tasks: [extractTask, analyzeTask, synthesizeTask],
});
Step 2: Add Observability
Now, here's where the magic happens. Add just a few lines:
import { enableOpenTelemetry } from '@kaibanjs/opentelemetry';
const config = {
enabled: true,
sampling: {
rate: 1.0, // Sample all traces (use 0.1-0.3 in production)
strategy: 'always',
},
attributes: {
includeSensitiveData: false,
customAttributes: {
'service.name': 'content-processor',
'service.version': '1.0.0',
},
},
exporters: {
console: true, // See traces in your terminal
},
};
enableOpenTelemetry(team, config);
That's it! Your workflow is now fully observable. When you run:
await team.start({ input: 'Your content here...' });
You'll see structured traces in your console showing every task execution, agent thinking phase, LLM call, token usage, and cost.
Step 3: Send Traces to a Monitoring Service
For production, you'll want to send traces to a proper observability platform. Here's how to export to Langfuse (great for LLM observability):
import * as dotenv from 'dotenv';
dotenv.config();
const config = {
enabled: true,
sampling: { rate: 0.1, strategy: 'probabilistic' }, // Sample 10% in production
attributes: {
includeSensitiveData: false,
customAttributes: {
'service.name': 'content-processor',
'service.environment': process.env.NODE_ENV || 'development',
},
},
exporters: {
console: process.env.NODE_ENV === 'development', // Only in dev
otlp: {
endpoint: 'https://cloud.langfuse.com/api/public/otel',
protocol: 'http',
headers: {
Authorization: `Basic ${Buffer.from(
`${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString('base64')}`,
},
serviceName: 'content-processor',
},
},
};
enableOpenTelemetry(team, config);
Or send to multiple services simultaneously:
exporters: {
otlp: [
// Langfuse for LLM-specific insights
{
endpoint: 'https://cloud.langfuse.com/api/public/otel',
protocol: 'http',
headers: { /* ... */ },
serviceName: 'content-processor-langfuse',
},
// SigNoz for infrastructure monitoring
{
endpoint: 'https://ingest.us.signoz.cloud:443',
protocol: 'grpc',
headers: { 'signoz-access-token': process.env.SIGNOZ_TOKEN },
serviceName: 'content-processor-signoz',
},
],
}
What You Get: Understanding Your Traces
When your workflow runs, you get hierarchical traces like this:
Task: Extract Content (2.5s)
├── Agent Thinking (1.2s)
│ ├── Model: gpt-4
│ ├── Input tokens: 245
│ ├── Output tokens: 312
│ └── Cost: $0.012
└── Status: DONE
Task: Analyze Content (3.1s)
├── Agent Thinking (2.8s)
│ └── Cost: $0.018
└── Status: DONE
Task: Synthesize Findings (1.9s)
└── Agent Thinking (1.7s)
└── Cost: $0.008
This immediately tells you:
- Which tasks took longest: Analyze Content is your bottleneck
- Cost breakdown: You spent $0.038 total, mostly in analysis
- Token usage: You can optimize the extraction step
- Failure points: If something fails, you see exactly where
LLM-Specific Attributes
The package uses semantic conventions that LLM observability platforms automatically recognize:
{
// Request info
'kaiban.llm.request.model': 'gpt-4',
'kaiban.llm.request.provider': 'openai',
'kaiban.llm.request.input_length': 1524,
// Usage metrics
'kaiban.llm.usage.input_tokens': 245,
'kaiban.llm.usage.output_tokens': 312,
'kaiban.llm.usage.total_tokens': 557,
'kaiban.llm.usage.cost': 0.012,
// Response info
'kaiban.llm.response.duration': 1200,
'kaiban.llm.response.status': 'completed',
}
Platforms like Langfuse and Phoenix automatically display these in specialized LLM views, giving you token trends, cost analysis, and latency monitoring out of the box.
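And because these are ordinary span attributes, you can act on them yourself too. As a rough sketch, assuming you run your own OpenTelemetry Node SDK and can register a custom span processor alongside the package's exporters, you could flag unusually expensive LLM calls before they ever reach a dashboard:
import { Context } from '@opentelemetry/api';
import { ReadableSpan, Span, SpanProcessor } from '@opentelemetry/sdk-trace-base';
// Sketch: warn whenever a single LLM call crosses a cost threshold.
// Assumes you control the SDK setup and register this processor yourself.
class CostAlertProcessor implements SpanProcessor {
  onStart(_span: Span, _parentContext: Context): void {}
  onEnd(span: ReadableSpan): void {
    const cost = span.attributes['kaiban.llm.usage.cost'];
    if (typeof cost === 'number' && cost > 0.05) {
      console.warn(`Expensive LLM call: ${span.name} cost $${cost}`);
    }
  }
  shutdown(): Promise<void> { return Promise.resolve(); }
  forceFlush(): Promise<void> { return Promise.resolve(); }
}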
Production Best Practices
1. Use Sampling
Don't trace everything in production; it's expensive:
sampling: {
rate: 0.1, // Sample 10% of workflows
strategy: 'probabilistic',
}
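If you're wondering what 'probabilistic' means here: conceptually, each workflow run is kept or dropped up front with probability equal to the rate, roughly like the sketch below (a simplification, not the package's actual implementation):
// Conceptual sketch of head-based probabilistic sampling: decide once,
// at the start of a run, whether its spans get recorded at all.
function shouldSample(rate: number): boolean {
  return Math.random() < rate; // rate = 0.1 keeps roughly 10% of runs
}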
2. Environment-Based Configuration
Use environment variables for secrets:
# .env
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
SIGNOZ_TOKEN=your-token
exporters: {
otlp: {
endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
headers: {
Authorization: `Basic ${Buffer.from(
`${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString('base64')}`,
},
serviceName: 'my-service',
},
}
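A missing key silently produces a broken Authorization header, so it's worth failing fast at startup. A small guard over the variables from the .env above does the trick:
// Fail fast if observability secrets are missing, rather than exporting
// traces with a malformed Authorization header.
for (const name of ['LANGFUSE_PUBLIC_KEY', 'LANGFUSE_SECRET_KEY']) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}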
3. Disable Console in Production
exporters: {
console: process.env.NODE_ENV === 'development',
otlp: { /* ... */ },
}
4. Handle Shutdown Gracefully
If you're using the advanced API:
import { createOpenTelemetryIntegration } from '@kaibanjs/opentelemetry';
const integration = createOpenTelemetryIntegration(config);
integration.integrateWithTeam(team);
// Your workflow code
await team.start({ input: 'data' });
// Cleanup when done
await integration.shutdown();
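In a long-running service, it's also worth flushing traces when the process is asked to stop, otherwise the last few spans can be lost. One way to wire that up with standard Node signal handling (a suggestion, not something the package requires):
// Clean up the integration before the process exits (e.g., on container stop).
process.on('SIGTERM', async () => {
  await integration.shutdown();
  process.exit(0);
});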
Real-World Use Cases
With this setup, you can now:
- Debug failures: See exactly which agent failed and what it received
- Optimize costs: Identify expensive tasks and agents
- Monitor performance: Track task durations and spot bottlenecks
- Analyze patterns: Understand how agents iterate and refine outputs
Supported Services
The OTLP exporter works with any OpenTelemetry-compatible service:
- Langfuse: LLM observability (HTTP)
- Phoenix: AI observability by Arize (HTTP)
- SigNoz: Full-stack observability (gRPC/HTTP)
- Braintrust: AI experiment tracking (HTTP/gRPC)
- Dash0: Observability platform (HTTP)
- Any OTLP collector: Your own infrastructure
Try It Yourself
Want to see it in action? Check out the package examples:
npm install @kaibanjs/opentelemetry
npm run dev # Runs a basic example with console output
Or explore the full documentation for more advanced use cases.
Wrapping Up
Adding observability to AI agent workflows doesn't have to be painful. With @kaibanjs/opentelemetry, you get production-ready tracing in minutes, not hours. The best part? It's completely non-invasive: you can add it to existing workflows without modifying a single line of your business logic.
Once you have traces flowing to your observability platform, you'll wonder how you ever debugged these systems without them. Trust me.
Questions or feedback? Drop a comment below or check out the package repository for more examples and documentation.