Dariel Vila for KaibanJS


Add Production-Ready Observability to Your AI Agent Workflows in 5 Minutes

Have you ever stared at your terminal wondering why your multi-agent workflow failed? You know a task errored, but you can't tell which agent was responsible, how much it cost, or what the LLM actually received as input. Sound familiar?

If you're building AI agent systems, you need observability. But adding it shouldn't require rewriting your code or instrumenting every LLM call manually. Let me show you how to add production-grade observability using OpenTelemetry in just a few lines of code.

The Problem: Debugging Multi-Agent Systems is Hard

When you're running a workflow with multiple agents, each making LLM calls, you're dealing with:

  • Long execution chains: Task A → Agent 1 → LLM → Task B → Agent 2 → LLM → Task C
  • Unclear failure points: Which agent failed? Was it the LLM call, the parsing, or the logic?
  • Hidden costs: No visibility into token usage per agent or task
  • Nested execution: Agents can spawn sub-tasks, making traces complex

Traditional logging doesn't cut it. You need structured traces that show the entire workflow execution with timing, costs, and context.

What is OpenTelemetry?

OpenTelemetry (often abbreviated as OTel) is an open standard for observability. Think of it as a universal language for describing what your application is doing. It's supported by virtually every monitoring tool, from open-source solutions like Jaeger to commercial platforms like Datadog to specialized AI tools like Langfuse and Phoenix.

The key concept: Traces. A trace is a record of a request's journey through your system, broken down into spans (individual operations) that have parent-child relationships. This creates a timeline view of your entire workflow.
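
If you haven't worked with the raw OpenTelemetry API before, here's a minimal sketch of the trace/span model using the standard @opentelemetry/api package. This is purely illustrative; the KaibanJS integration below creates spans like these for you:

import { trace } from '@opentelemetry/api';

// Without an SDK configured this tracer is a no-op, which is fine for illustrating the model
const tracer = trace.getTracer('content-processor');

tracer.startActiveSpan('workflow', (workflowSpan) => {
  // Spans started inside startActiveSpan are recorded as children of the active span
  tracer.startActiveSpan('task: Extract Content', (taskSpan) => {
    taskSpan.setAttribute('agent.name', 'ContentExtractor');
    taskSpan.end();
  });
  workflowSpan.end();
});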

Introducing KaibanJS

For this tutorial, we'll use KaibanJS, a TypeScript framework for building multi-agent workflows. KaibanJS makes it easy to define agents with specific roles and orchestrate them through task dependencies.

Here's why it's great for production:

  • Task dependencies: Declare what needs to run before what
  • Agent specialization: Each agent has a clear role and responsibility
  • Automatic orchestration: The framework handles coordination and queuing
  • Event-driven: Built-in events make it observable by design
  • Type-safe: TypeScript everywhere for fewer runtime surprises

The best part? You can add observability without modifying your workflow code at all.

The Solution: @kaibanjs/opentelemetry

The @kaibanjs/opentelemetry package automatically instruments your KaibanJS workflows and exports traces to any OpenTelemetry-compatible service. It works by subscribing to workflow events, so no changes to your workflow code are needed.

Let's see it in action:

Quick Start: Adding Observability to Your Workflow

First, install the package:

npm install @kaibanjs/opentelemetry

Now, let's build a practical example: a content processing workflow that extracts, analyzes, and synthesizes information.

Step 1: Define Your Agents and Tasks

import { Team, Agent, Task } from 'kaibanjs';

// Define specialized agents
const extractor = new Agent({
  name: 'ContentExtractor',
  role: 'Extract structured data',
  goal: 'Parse unstructured content into JSON',
  background: 'Expert in NLP and data extraction',
});

const analyzer = new Agent({
  name: 'ContentAnalyzer',
  role: 'Analyze content',
  goal: 'Identify patterns and insights',
  background: 'Expert in content analysis',
});

const synthesizer = new Agent({
  name: 'ContentSynthesizer',
  role: 'Synthesize findings',
  goal: 'Create coherent summaries',
  background: 'Expert in summarization',
});

// Define tasks with dependencies
const extractTask = new Task({
  title: 'Extract Content',
  description: 'Extract structured data from: {input}',
  expectedOutput: 'JSON with key information',
  agent: extractor,
});

const analyzeTask = new Task({
  title: 'Analyze Content',
  description: 'Analyze the extracted content',
  expectedOutput: 'Analysis report',
  agent: analyzer,
  dependencies: [extractTask], // Runs after extraction
});

const synthesizeTask = new Task({
  title: 'Synthesize Findings',
  description: 'Create a summary',
  expectedOutput: 'Executive summary',
  agent: synthesizer,
  dependencies: [analyzeTask], // Runs after analysis
});

const team = new Team({
  name: 'Content Processing Team',
  agents: [extractor, analyzer, synthesizer],
  tasks: [extractTask, analyzeTask, synthesizeTask],
});

Step 2: Add Observability

Now, here's where the magic happens. Add just a few lines:

import { enableOpenTelemetry } from '@kaibanjs/opentelemetry';

const config = {
  enabled: true,
  sampling: {
    rate: 1.0, // Sample all traces (use 0.1-0.3 in production)
    strategy: 'always',
  },
  attributes: {
    includeSensitiveData: false,
    customAttributes: {
      'service.name': 'content-processor',
      'service.version': '1.0.0',
    },
  },
  exporters: {
    console: true, // See traces in your terminal
  },
};

enableOpenTelemetry(team, config);

That's it! Your workflow is now fully observable. When you run:

await team.start({ input: 'Your content here...' });

You'll see structured traces in your console showing every task execution, agent thinking phase, LLM call, token usage, and cost.

Step 3: Send Traces to a Monitoring Service

For production, you'll want to send traces to a proper observability platform. Here's how to export to Langfuse (great for LLM observability):

import * as dotenv from 'dotenv';
dotenv.config();

const config = {
  enabled: true,
  sampling: { rate: 0.1, strategy: 'probabilistic' }, // Sample 10% in production
  attributes: {
    includeSensitiveData: false,
    customAttributes: {
      'service.name': 'content-processor',
      'service.environment': process.env.NODE_ENV || 'development',
    },
  },
  exporters: {
    console: process.env.NODE_ENV === 'development', // Only in dev
    otlp: {
      endpoint: 'https://cloud.langfuse.com/api/public/otel',
      protocol: 'http',
      headers: {
        Authorization: `Basic ${Buffer.from(
          `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
        ).toString('base64')}`,
      },
      serviceName: 'content-processor',
    },
  },
};

enableOpenTelemetry(team, config);

Or send to multiple services simultaneously:

exporters: {
  otlp: [
    // Langfuse for LLM-specific insights
    {
      endpoint: 'https://cloud.langfuse.com/api/public/otel',
      protocol: 'http',
      headers: { /* ... */ },
      serviceName: 'content-processor-langfuse',
    },
    // SigNoz for infrastructure monitoring
    {
      endpoint: 'https://ingest.us.signoz.cloud:443',
      protocol: 'grpc',
      headers: { 'signoz-access-token': process.env.SIGNOZ_TOKEN },
      serviceName: 'content-processor-signoz',
    },
  ],
}

What You Get: Understanding Your Traces

When your workflow runs, you get hierarchical traces like this:

Task: Extract Content (2.5s)
├── Agent Thinking (1.2s)
│   ├── Model: gpt-4
│   ├── Input tokens: 245
│   ├── Output tokens: 312
│   └── Cost: $0.012
└── Status: DONE

Task: Analyze Content (3.1s)
├── Agent Thinking (2.8s)
│   └── Cost: $0.018
└── Status: DONE

Task: Synthesize Findings (1.9s)
└── Agent Thinking (1.7s)
    └── Cost: $0.008

This immediately tells you:

  • Which tasks took longest: Analyze Content is your bottleneck
  • Cost breakdown: You spent $0.038 total, mostly in analysis
  • Token usage: You can optimize the extraction step
  • Failure points: If something fails, you see exactly where

LLM-Specific Attributes

The package uses semantic conventions that LLM observability platforms automatically recognize:

{
  // Request info
  'kaiban.llm.request.model': 'gpt-4',
  'kaiban.llm.request.provider': 'openai',
  'kaiban.llm.request.input_length': 1524,

  // Usage metrics
  'kaiban.llm.usage.input_tokens': 245,
  'kaiban.llm.usage.output_tokens': 312,
  'kaiban.llm.usage.total_tokens': 557,
  'kaiban.llm.usage.cost': 0.012,

  // Response info
  'kaiban.llm.response.duration': 1200,
  'kaiban.llm.response.status': 'completed',
}

Platforms like Langfuse and Phoenix automatically display these in specialized LLM views, giving you token trends, cost analysis, and latency monitoring out of the box.

Production Best Practices

1. Use Sampling

Don't trace everything in production; it's expensive:

sampling: {
  rate: 0.1, // Sample 10% of workflows
  strategy: 'probabilistic',
}
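
One pattern that works well (a sketch, assuming a standard NODE_ENV setup) is deriving the sampling config from the environment so local runs keep full traces while production stays cheap:

const isProd = process.env.NODE_ENV === 'production';

const sampling = {
  rate: isProd ? 0.1 : 1.0, // full traces locally, 10% in production
  strategy: isProd ? 'probabilistic' : 'always',
};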

2. Environment-Based Configuration

Use environment variables for secrets:

# .env
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
SIGNOZ_TOKEN=your-token
Then reference them in your exporter configuration:
exporters: {
  otlp: {
    endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    headers: {
      Authorization: `Basic ${Buffer.from(
        `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
      ).toString('base64')}`,
    },
    serviceName: 'my-service',
  },
}
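
Since a missing variable would silently produce unauthenticated exports, it can help to fail fast at startup. Here's a small sketch with a hypothetical requireEnv helper (not part of the package):

// Hypothetical helper: throw at startup instead of exporting with broken credentials
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

const langfuseAuth = Buffer.from(
  `${requireEnv('LANGFUSE_PUBLIC_KEY')}:${requireEnv('LANGFUSE_SECRET_KEY')}`
).toString('base64');
// then pass `Basic ${langfuseAuth}` as the Authorization header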

3. Disable Console in Production

exporters: {
  console: process.env.NODE_ENV === 'development',
  otlp: { /* ... */ },
}

4. Handle Shutdown Gracefully

If you're using the advanced API:

import { createOpenTelemetryIntegration } from '@kaibanjs/opentelemetry';

const integration = createOpenTelemetryIntegration(config);
integration.integrateWithTeam(team);

// Your workflow code
await team.start({ input: 'data' });

// Cleanup when done
await integration.shutdown();
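
In long-running services, you'll typically want to call shutdown from your process signal handlers as well, so buffered spans get flushed before the process exits. A sketch (adjust to your runtime):

// Flush buffered spans before exiting, e.g. when a container receives SIGTERM
const shutdownGracefully = async () => {
  await integration.shutdown();
  process.exit(0);
};

process.on('SIGTERM', shutdownGracefully);
process.on('SIGINT', shutdownGracefully);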

Real-World Use Cases

With this setup, you can now:

  1. Debug failures: See exactly which agent failed and what it received
  2. Optimize costs: Identify expensive tasks and agents
  3. Monitor performance: Track task durations and spot bottlenecks
  4. Analyze patterns: Understand how agents iterate and refine outputs

Supported Services

The OTLP exporter works with any OpenTelemetry-compatible service:

  • Langfuse: LLM observability (HTTP)
  • Phoenix: AI observability by Arize (HTTP)
  • SigNoz: Full-stack observability (gRPC/HTTP)
  • Braintrust: AI experiment tracking (HTTP/gRPC)
  • Dash0: Observability platform (HTTP)
  • Any OTLP collector: Your own infrastructure
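
For the self-hosted route, the config looks the same as the hosted examples above; you just point the exporter at your collector. A sketch using the standard OTLP ports (4318 for HTTP, 4317 for gRPC); whether your collector expects the /v1/traces path depends on its setup, so check its docs:

exporters: {
  otlp: {
    endpoint: 'http://localhost:4318/v1/traces',
    protocol: 'http',
    serviceName: 'content-processor',
  },
}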

Try It Yourself

Want to see it in action? Check out the package examples:

npm install @kaibanjs/opentelemetry
npm run dev # Runs a basic example with console output

Or explore the full documentation for more advanced use cases.

Wrapping Up

Adding observability to AI agent workflows doesn't have to be painful. With @kaibanjs/opentelemetry, you get production-ready tracing in minutes, not hours. The best part? It's completely non-invasive: you can add it to existing workflows without modifying a single line of your business logic.

Once you have traces flowing to your observability platform, you'll wonder how you ever debugged these systems without them. Trust me.


Questions or feedback? Drop a comment below or check out the package repository for more examples and documentation.
