LangChain applications fail differently than traditional web apps. A single user request can trigger 15+ LLM calls, cost $5 in tokens, and fail silently without throwing errors. One team discovered a $12,000 OpenAI bill caused by a recursive chain with no monitoring.
This guide shows how to implement observability for LangChain applications, giving you complete visibility into performance, costs, and errors before they impact your users or budget.
Why Monitor LangChain Apps?
Traditional web application monitoring focuses on HTTP requests, database queries, and server resources. But LangChain applications introduce entirely new failure modes and cost structures that standard monitoring tools miss.
Unique Monitoring Challenges
Cost Explosion Risk
Unlike regular web apps, where compute costs are predictable, LangChain applications have variable costs tied directly to usage. Each LLM call costs between $0.01 and $0.06 per 1,000 tokens depending on the model, and costs can spiral quickly (a rough cost estimate is sketched just after this list):
- A recursive chain can generate thousands of tokens in seconds
- Long conversation contexts inflate token usage with every turn, since the full history is resent on each call
- Failed requests still consume tokens before timing out
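To make the risk concrete, here is a minimal sketch of how token counts translate into dollars; the per-token prices are illustrative, not authoritative:

# Rough cost estimator - prices are illustrative and change frequently
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}

def estimate_call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    price = PRICE_PER_1K_TOKENS.get(model, 0.003)  # mid-range fallback price
    return (prompt_tokens + completion_tokens) / 1000 * price

# A runaway chain making 50 calls of ~4,000 tokens each adds up quickly
total = sum(estimate_call_cost("gpt-4o", 3000, 1000) for _ in range(50))
print(f"Estimated cost for one request: ${total:.2f}")  # ~$1.00, before retries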
Complex Execution Paths
A single user request in a LangChain app might:
- Call multiple LLM models in sequence
- Query vector databases for context
- Execute external API calls through tools
- Chain together multiple reasoning steps
When something breaks, you need to trace the failure across all these components to understand what went wrong.
Silent Failure Patterns
LangChain applications often fail gracefully, returning partial results without throwing exceptions:
- Vector searches return empty results when indexes are corrupted
- LLM calls succeed but return unhelpful responses
- Tool executions timeout but chains continue with missing data
Without proper monitoring, these silent failures go unnoticed until users complain.
Variable Performance
LLM response times vary dramatically based on:
- Model load and availability
- Prompt complexity and length
- Context window utilization
- Rate limiting and queuing
Standard application performance monitoring doesn't account for these AI-specific factors.
OpenTelemetry as a Solution
To properly monitor LangChain applications, we'll use OpenTelemetry - an open-source framework that captures detailed traces, metrics, and logs from your application. Think of it as a sophisticated data collection system that records everything happening in your LangChain chains.
Here's how the monitoring works:
- OpenTelemetry instruments your code - capturing every chain execution, LLM call, and tool usage
- Data gets exported to an observability platform - like Uptrace, Jaeger, or DataDog
- You get dashboards and alerts - showing performance, costs, and errors in real-time
The beauty of this solution is that you can see exactly what's happening inside your LangChain application, from the initial user request to the final response.
Essential Metrics for LangChain Apps
Let's understand what metrics actually matter for LangChain applications. These aren't your typical web app metrics - they're specifically designed to catch the unique failure modes of AI applications.
Core Performance Metrics
These metrics help you understand how your chains are performing and where bottlenecks occur:
Metric | Purpose | Alert Threshold |
---|---|---|
Chain Execution Time | Detect slow operations that frustrate users | >10 seconds |
Token Usage per Request | Prevent runaway costs | >5000 tokens |
Error Rate by Chain | Identify unreliable components | >5% |
Model Availability | Monitor LLM service health | <95% success |
Cost Tracking Metrics
Since every LLM call costs money, you need specific metrics to control spending:
# Key cost metrics that prevent bill shock
cost_metrics = {
'tokens_per_hour': 'Identify usage spikes before they impact billing',
'cost_per_user': 'Find users generating excessive costs',
'tokens_by_model': 'Optimize expensive vs cheap model usage',
'failed_token_spend': 'Track money wasted on failed requests'
}
These metrics let you set up alerts like "notify me if hourly token usage exceeds $100" or "alert if any user generates more than $50 in costs per session."
Quick Start: 5-Minute Setup
Now let's implement this monitoring system. We'll start with a basic setup that captures traces and exports them to Uptrace (an OpenTelemetry-native observability platform). This same approach works with any OTLP-compatible backend like Jaeger, Grafana, or commercial solutions.
The setup process has three steps: install the monitoring libraries, configure where to send the data, and add monitoring to your LangChain code.
Step 1: Install Dependencies
First, install the OpenTelemetry libraries that will capture monitoring data from your application:
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-exporter-otlp
pip install langchain langchain-openai
These packages provide the core instrumentation (opentelemetry-api), the data processing engine (opentelemetry-sdk), and the exporter that sends data to your monitoring platform (opentelemetry-exporter-otlp).
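The FastAPI example and Dockerfile later in this guide install from a requirements.txt; a matching file (unpinned here, pin versions as you see fit) looks like this:

# requirements.txt
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp
langchain
langchain-openai
fastapi
uvicorn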
Step 2: Basic OpenTelemetry Configuration
Next, configure OpenTelemetry to send monitoring data to your observability platform. This code sets up the "plumbing" that captures traces and exports them:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
import os
def setup_tracing():
    trace.set_tracer_provider(TracerProvider())

    otlp_exporter = OTLPSpanExporter(
        # The HTTP exporter needs the full signal path when the endpoint is set explicitly
        endpoint="https://api.uptrace.dev:4318/v1/traces",
        headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
    )

    span_processor = BatchSpanProcessor(otlp_exporter)
    trace.get_tracer_provider().add_span_processor(span_processor)

setup_tracing()
The UPTRACE_DSN environment variable contains your project credentials: the DSN shown in your Uptrace dashboard after creating a free account. Other platforms have a similar setup - just change the endpoint and headers.
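Before wiring up LangChain, you can sanity-check the pipeline with a throwaway span; if it shows up in your backend, the DSN and endpoint are correct (the span name is arbitrary):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("smoke-test") as span:
    span.set_attribute("check", "tracing pipeline works")

# BatchSpanProcessor exports in the background, so flush before a short script exits
trace.get_tracer_provider().force_flush()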
Step 3: Custom Callback Handler
Now comes the LangChain-specific part. We need to create a callback handler that captures detailed information about every chain execution, LLM call, and error:
from langchain_core.callbacks.base import BaseCallbackHandler
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import time
class ObservabilityCallback(BaseCallbackHandler):
"""
Custom callback that captures comprehensive monitoring data
from LangChain operations and exports it via OpenTelemetry
"""
def __init__(self):
self.tracer = trace.get_tracer(__name__)
self.spans = {} # Track active spans by run_id
    def on_chain_start(self, serialized, inputs, **kwargs):
        """Called when any chain starts executing"""
        run_id = kwargs.get('run_id')
        # serialized can be None for some runnables, so guard the name lookup
        chain_name = (serialized or {}).get('name') or 'unknown'
        span = self.tracer.start_span(f"chain_{chain_name}")

        # Add context attributes that help with debugging
        span.set_attribute("chain.type", chain_name)
        span.set_attribute("chain.inputs", str(inputs)[:500])  # Truncate long inputs
        span.set_attribute("chain.run_id", str(run_id))

        # Store span data for later use
        self.spans[run_id] = {
            'span': span,
            'start_time': time.time(),
            'tokens': 0
        }
    def on_llm_end(self, response, **kwargs):
        """Called when LLM completes - captures token usage and costs"""
        # The LLM call has its own run_id; attribute token usage to the parent chain's span
        run_id = kwargs.get('parent_run_id') or kwargs.get('run_id')
        if run_id in self.spans:
            # Extract token usage from the LLM response
            if hasattr(response, 'llm_output') and response.llm_output:
                usage = response.llm_output.get('token_usage', {})
                tokens = usage.get('total_tokens', 0)
                self.spans[run_id]['tokens'] += tokens

                # Calculate estimated cost (using GPT-4 pricing as example)
                cost = tokens * 0.00003  # $0.03 per 1K tokens
                self.spans[run_id]['span'].set_attribute("llm.tokens", tokens)
                self.spans[run_id]['span'].set_attribute("llm.estimated_cost", cost)
def on_chain_end(self, outputs, **kwargs):
"""Called when chain completes successfully"""
run_id = kwargs.get('run_id')
if run_id in self.spans:
span_data = self.spans[run_id]
span = span_data['span']
# Calculate total execution time
duration = time.time() - span_data['start_time']
span.set_attribute("chain.duration_seconds", duration)
span.set_attribute("chain.total_tokens", span_data['tokens'])
# Add output preview for debugging
output_preview = str(outputs)[:200] if outputs else "No output"
span.set_attribute("chain.output_preview", output_preview)
span.end()
del self.spans[run_id]
def on_chain_error(self, error, **kwargs):
"""Called when chain fails - captures error details"""
run_id = kwargs.get('run_id')
if run_id in self.spans:
span = self.spans[run_id]['span']
span.record_exception(error) # Capture full stack trace
span.set_status(Status(StatusCode.ERROR, str(error)))
span.end()
del self.spans[run_id]
This callback handler captures everything you need to monitor: execution time, token usage, estimated costs, errors, and output previews. The data automatically flows to your observability platform, where you can create dashboards and alerts.
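Attaching the handler is just a matter of passing it in the config of an invocation; a minimal sketch, assuming OPENAI_API_KEY is set and setup_tracing() has already run:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

callbacks = [ObservabilityCallback()]

chain = (
    ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# Every chain and LLM call inside this invocation is traced by the callback
result = chain.invoke({"text": "OpenTelemetry is an open standard for telemetry data."},
                      config={"callbacks": callbacks})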
Implementation with FastAPI
For production applications, you'll typically want to wrap your LangChain logic in a web API. Here's a complete example that shows how to integrate monitoring into a FastAPI application that serves LangChain-powered endpoints.
For more details on FastAPI observability setup, see our FastAPI OpenTelemetry integration guide.
Complete Application Setup
This example creates a content generation API with full observability. Every request is traced from the HTTP layer down to individual LLM calls:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import uvicorn
app = FastAPI(title="LangChain Observability Demo")
# Initialize observability
setup_tracing()
callback_handler = ObservabilityCallback()
# Setup LangChain components
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
prompt = ChatPromptTemplate.from_template(
"Generate a {content_type} about {topic}. Keep it {length}."
)
chain = prompt | llm | StrOutputParser()
class GenerationRequest(BaseModel):
topic: str
content_type: str = "summary"
length: str = "concise"
@app.post("/generate")
async def generate_content(request: GenerationRequest):
try:
with trace.get_tracer(__name__).start_as_current_span("api_request") as span:
span.set_attribute("api.endpoint", "/generate")
span.set_attribute("request.topic", request.topic)
result = chain.invoke(
{
"topic": request.topic,
"content_type": request.content_type,
"length": request.length
},
config={"callbacks": [callback_handler]}
)
return {
"content": result,
"status": "success"
}
except Exception as e:
span.set_status(Status(StatusCode.ERROR, str(e)))
raise HTTPException(status_code=500, detail=f"Generation failed: {str(e)}")
@app.get("/health")
async def health_check():
return {"status": "healthy"}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
When you run this application and make requests to /generate, you'll see detailed traces in your observability platform showing:
- HTTP request details (endpoint, parameters)
- Chain execution steps
- LLM calls with token usage and costs
- Total execution time
- Any errors that occurred
Advanced Monitoring Patterns
As your LangChain application grows more complex, you'll need more sophisticated monitoring. Here are patterns for handling multi-step chains, tool usage, and complex agent workflows.
Multi-Chain Tracing
Complex LangChain applications often involve multiple chains calling each other. This enhanced callback handler tracks the relationships between parent and child chains:
from opentelemetry.trace import get_current_span
import time
class AdvancedObservabilityCallback(ObservabilityCallback):
def __init__(self):
super().__init__()
self.chain_hierarchy = {}
def on_chain_start(self, serialized, inputs, **kwargs):
super().on_chain_start(serialized, inputs, **kwargs)
run_id = kwargs.get('run_id')
parent_run_id = kwargs.get('parent_run_id')
if parent_run_id:
self.chain_hierarchy[run_id] = parent_run_id
if run_id in self.spans:
self.spans[run_id]['span'].set_attribute(
"chain.parent_id", str(parent_run_id)
)
    def on_tool_start(self, serialized, input_str, **kwargs):
        run_id = kwargs.get('run_id')
        tool_name = (serialized or {}).get('name') or 'unknown'
        span = self.tracer.start_span(f"tool_{tool_name}")
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.input", input_str[:300])

        # Store tool spans under a separate key so they don't collide with chain run_ids
        tool_key = f"tool_{run_id}"
        self.spans[tool_key] = {
            'span': span,
            'start_time': time.time(),
            'type': 'tool'
        }
def on_tool_end(self, output, **kwargs):
run_id = kwargs.get('run_id')
tool_key = f"tool_{run_id}"
if tool_key in self.spans:
span_data = self.spans[tool_key]
span = span_data['span']
duration = time.time() - span_data['start_time']
span.set_attribute("tool.duration_seconds", duration)
span.set_attribute("tool.output", str(output)[:200])
span.end()
del self.spans[tool_key]
This enhanced callback gives you visibility into agent workflows where chains call tools, which might call other chains, creating complex execution trees.
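A quick way to exercise the tool hooks without a full agent is to invoke a tool directly with the callback attached; this sketch uses a hypothetical search_docs tool:

from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Toy documentation search used only to demonstrate tool tracing."""
    return f"Top result for: {query}"

advanced_callbacks = [AdvancedObservabilityCallback()]

# Tools are runnables, so they accept the same callbacks config; this fires
# on_tool_start / on_tool_end and produces a tool_* span in your backend
print(search_docs.invoke("how do I cap token usage?",
                         config={"callbacks": advanced_callbacks}))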
Cost Monitoring with Metrics
While traces show individual executions, metrics track trends over time. Here's how to add comprehensive cost and performance metrics to your monitoring setup:
import os
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
def setup_metrics():
"""
Configure metrics collection and export.
Metrics complement traces by showing trends and aggregate data.
"""
    # Configure metrics exporter; the HTTP exporter needs the full /v1/metrics
    # path when the endpoint is passed explicitly
    metric_exporter = OTLPMetricExporter(
        endpoint=os.getenv("OTLP_ENDPOINT", "https://api.uptrace.dev:4318/v1/metrics"),
        headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
    )
# Export metrics every 30 seconds
metric_reader = PeriodicExportingMetricReader(
metric_exporter, export_interval_millis=30000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
# Initialize metrics alongside tracing
setup_metrics()
meter = metrics.get_meter(__name__)
# Define the metrics you want to track
token_counter = meter.create_counter(
"langchain_tokens_total",
description="Total tokens consumed by LangChain operations"
)
cost_counter = meter.create_counter(
"langchain_cost_usd_total",
description="Total estimated cost in USD"
)
request_duration = meter.create_histogram(
"langchain_request_duration_seconds",
description="Request processing time distribution"
)
class MetricsCallback(AdvancedObservabilityCallback):
"""
Callback that emits both traces and metrics for comprehensive monitoring
"""
    def on_llm_end(self, response, **kwargs):
        super().on_llm_end(response, **kwargs)

        if hasattr(response, 'llm_output') and response.llm_output:
            usage = response.llm_output.get('token_usage', {})
            tokens = usage.get('total_tokens', 0)
            # OpenAI chat models report the model under 'model_name' in llm_output
            model = response.llm_output.get('model_name', 'unknown')

            # Record metrics with labels for filtering
            token_counter.add(tokens, {"model": model})

            # Calculate and record cost
            cost_per_token = self._get_model_cost(model)
            estimated_cost = tokens * cost_per_token
            cost_counter.add(estimated_cost, {"model": model})
def _get_model_cost(self, model):
"""
Get cost per token for different models.
Update these values based on current pricing.
"""
costs = {
'gpt-4o': 0.000005, # $5 per 1M tokens
'gpt-4o-mini': 0.00000015, # $0.15 per 1M tokens
'gpt-3.5-turbo': 0.000001, # $1 per 1M tokens
}
return costs.get(model, 0.000003) # Default cost
With these metrics, you can create dashboards showing token usage over time, cost trends by model, and performance percentiles across your application.
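Note that the request_duration histogram defined above is never recorded by the callbacks; one way to populate it, sketched below with a hypothetical TimedMetricsCallback, is to record the duration when a top-level chain finishes:

import time

class TimedMetricsCallback(MetricsCallback):
    """Sketch: also records chain duration into the request_duration histogram."""

    def on_chain_end(self, outputs, **kwargs):
        run_id = kwargs.get('run_id')
        # Grab the start time before the parent class ends the span and drops the entry
        start_time = self.spans.get(run_id, {}).get('start_time')
        super().on_chain_end(outputs, **kwargs)

        # Only top-level chains (no parent run) count as a user-facing request
        if start_time is not None and not kwargs.get('parent_run_id'):
            request_duration.record(time.time() - start_time)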
Troubleshooting
When problems arise in production LangChain applications, your observability data becomes crucial for quick diagnosis. Here are the most common issues and how to identify them using your monitoring setup.
Performance Problems
Performance issues in LangChain apps often stem from specific components that aren't obvious without detailed tracing:
Symptom | Likely Cause | How to Diagnose | Solution |
---|---|---|---|
Requests timing out | Large prompts overwhelming models | Check span attributes for prompt length and model response times | • Use faster models (gpt-4o-mini) • Implement prompt truncation • Add request timeouts |
High latency spikes | Vector store queries taking too long | Look for spans labeled "vector_search" with >2s duration | • Optimize embedding dimensions • Add query result caching • Monitor vector DB performance |
Memory usage growing | Chain state accumulating between requests | Monitor memory metrics alongside request volume | • Clear conversation memory • Implement session cleanup • Add memory usage alerts |
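Two of the fixes above, request timeouts and cheaper models, are plain constructor arguments on the chat model; a minimal sketch (the values are placeholders to tune for your workload):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",  # cheaper, faster model for latency-sensitive paths
    timeout=30,           # fail fast instead of letting requests hang
    max_retries=2,        # bound retries so slow failures don't pile up
)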
Cost Control Issues
Cost surprises are common in production LangChain apps. Your monitoring data helps identify and prevent them:
Symptom | Root Cause | How to Detect | Fix |
---|---|---|---|
Unexpected high bills | Recursive chains or unusually long contexts | Alert when langchain_cost_usd_total spikes or langchain_tokens_total exceeds thresholds | • Set token limits per request • Monitor cost metrics • Add circuit breakers |
Token waste from failures | Failed requests still consuming tokens | Compare successful vs failed spans, check for high token usage in error spans | • Track failed vs successful requests • Implement retry logic • Use cheaper models for retries |
Inefficient model selection | Using expensive models for simple tasks | Analyze cost metrics by model type and request complexity | • Route simple queries to cheaper models • Implement model selection logic • Monitor cost efficiency ratios |
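For the "set token limits per request" row, a simple guard caps completion size on the model and rejects oversized prompts before any tokens are spent; a rough sketch (the character-per-token heuristic and limits are illustrative, and guarded_invoke is a hypothetical helper):

from langchain_openai import ChatOpenAI

MAX_PROMPT_CHARS = 20_000  # roughly 5,000 tokens at ~4 characters per token

llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=1000)  # hard cap on completion length

def guarded_invoke(chain, inputs: dict):
    """Refuse requests whose inputs are large enough to blow the token budget."""
    prompt_size = sum(len(str(value)) for value in inputs.values())
    if prompt_size > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt too large ({prompt_size} chars); refusing to spend tokens")
    return chain.invoke(inputs)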
Error Detection Patterns
To catch and categorize different types of errors, extend your callback handler with error classification:
class ErrorTrackingCallback(MetricsCallback):
"""
Enhanced callback that categorizes and tracks different error types
for better debugging and alerting
"""
def __init__(self):
super().__init__()
self.error_counter = meter.create_counter(
"langchain_errors_total",
description="Total errors by category and chain type"
)
def on_chain_error(self, error, **kwargs):
super().on_chain_error(error, **kwargs)
# Categorize error for better monitoring
error_type = self._categorize_error(error)
chain_name = kwargs.get('serialized', {}).get('name', 'unknown')
# Record error metric with useful labels
self.error_counter.add(1, {
"error_type": error_type,
"chain_name": chain_name
})
def _categorize_error(self, error):
"""
Categorize errors into actionable types for alerting and debugging
"""
error_str = str(error).lower()
if 'timeout' in error_str:
return 'timeout'
elif 'rate limit' in error_str:
return 'rate_limit'
elif 'token' in error_str and 'limit' in error_str:
return 'token_limit'
elif 'authentication' in error_str or 'api key' in error_str:
return 'auth_error'
elif 'connection' in error_str:
return 'connection_error'
else:
return 'unknown'
This error categorization helps you set up specific alerts for different problem types and understand which issues are most common in your application.
Production Deployment
Moving from development to production requires additional considerations around containerization, environment management, and scaling.
Docker Configuration
Here's a production-ready Docker setup that includes proper health checks and resource limits:
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies including curl for health checks
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create non-root user with home directory
RUN useradd --create-home --shell /bin/bash app && \
chown -R app:app /app
USER app
# Set environment variables
ENV ENVIRONMENT=production
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Start application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Environment Variables
Organize your configuration using environment variables that work across development and production:
# OpenTelemetry Configuration
export UPTRACE_DSN="https://your-token@api.uptrace.dev/project-id"
export OTEL_SERVICE_NAME="langchain-app"
export OTEL_SERVICE_VERSION="1.0.0"
export OTEL_ENVIRONMENT="production"
export OTEL_RESOURCE_ATTRIBUTES="service.name=langchain-app,service.version=1.0.0"
# Alternative: Using separate OTLP endpoint (for other platforms)
# export OTLP_ENDPOINT="https://api.uptrace.dev:4317"
# export OTLP_HEADERS="uptrace-dsn=your-dsn-here"
# LangChain Configuration
export OPENAI_API_KEY="sk-your-openai-key-here"
export LANGCHAIN_TRACING_V2="false" # Disabled - using custom OpenTelemetry callbacks
export LANGCHAIN_API_KEY="" # Not needed when using custom tracing
# Application Settings
export ENVIRONMENT="production"
export LOG_LEVEL="INFO"
export MAX_TOKENS_PER_REQUEST="10000"
export REQUEST_TIMEOUT_SECONDS="30"
# Optional: Cost Control
export DAILY_COST_LIMIT_USD="500"
export HOURLY_TOKEN_LIMIT="100000"
# Security (for production)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export RATE_LIMIT_REQUESTS_PER_MINUTE="100"
Kubernetes Deployment
For scalable production deployment, here's a Kubernetes configuration with proper resource management and observability:
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-app
labels:
app: langchain-app
spec:
replicas: 3
selector:
matchLabels:
app: langchain-app
template:
metadata:
labels:
app: langchain-app
annotations:
# Annotations for observability platform integration
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
spec:
containers:
- name: app
image: langchain-app:latest
ports:
- containerPort: 8000
name: http
env:
- name: UPTRACE_DSN
valueFrom:
secretKeyRef:
name: observability-secrets
key: uptrace-dsn
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: api-secrets
key: openai-key
- name: OTEL_SERVICE_NAME
value: "langchain-app"
- name: ENVIRONMENT
value: "production"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
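The deployment references two secrets (observability-secrets and api-secrets) that must exist before the pods start; one way to create them, with placeholder values taken from your environment:

kubectl create secret generic observability-secrets \
  --from-literal=uptrace-dsn="$UPTRACE_DSN"

kubectl create secret generic api-secrets \
  --from-literal=openai-key="$OPENAI_API_KEY"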
Key Monitoring Dashboards
Once your observability data is flowing, you'll need dashboards to visualize and understand your application's behavior. Here's what to monitor and why each metric matters.
Essential Metrics Dashboard
Your primary dashboard should focus on the metrics that indicate overall application health and cost control:
Request Volume & Performance
These metrics help you understand load patterns and identify performance regressions:
- Requests per minute - Shows traffic patterns and helps with capacity planning
- Average response time - Indicates overall user experience
- 95th percentile latency - Catches performance issues affecting some users
- Error rate percentage - Critical for reliability monitoring
Cost Tracking
Since LangChain applications have variable costs, these metrics prevent budget surprises:
- Tokens consumed per hour - Early warning for cost spikes
- Cost per request - Efficiency metric for optimization
- Daily/monthly spend trends - Budget tracking and forecasting
- Cost by model type - Optimization opportunities
Chain Performance
LangChain-specific metrics that help optimize your AI workflows:
- Chain execution time by type - Identify slow chains for optimization
- Success rate by chain - Reliability monitoring for individual components
- Token usage distribution - Understand resource consumption patterns
- Failed chain analysis - Error pattern identification
Alerting Rules
Set up proactive alerts to catch issues before they impact users or budgets:
# Example alerting rules (syntax varies by platform)
# Adjust thresholds based on your application scale and budget
cost_alerts:
high_token_usage:
description: "Alert when token consumption exceeds expected levels"
condition: "langchain_tokens_total > 50000" # Adjust per your usage
window: "1h"
severity: "warning"
action: "slack_notification"
daily_cost_breach:
description: "Critical alert for budget protection"
condition: "sum_over_time(langchain_cost_usd_total[24h]) > 200" # $200/day
window: "24h"
severity: "critical"
action: "email_alert"
performance_alerts:
error_rate_spike:
description: "High error rate indicates system issues"
condition: "rate(langchain_errors_total[5m]) > 0.05" # 5% error rate
window: "5m"
severity: "critical"
action: "pagerduty"
high_latency:
description: "Slow responses affecting user experience"
condition: "histogram_quantile(0.95, langchain_request_duration_seconds) > 15"
window: "10m"
severity: "warning"
action: "slack_notification"
reliability_alerts:
chain_failures:
description: "Specific chain type failing frequently"
condition: "increase(langchain_errors_total{chain_name=\"summarization\"}[30m]) > 10"
window: "30m"
severity: "warning"
action: "email_alert"
model_unavailability:
description: "LLM service experiencing issues"
condition: "rate(langchain_errors_total{error_type=\"connection_error\"}[15m]) > 0.1"
window: "15m"
severity: "critical"
action: "pagerduty"
Migration from Other Solutions
If you're currently using other monitoring solutions, here's how to migrate to the OpenTelemetry-based solution.
From LangSmith
LangSmith provides built-in tracing for LangChain, but OpenTelemetry offers more flexibility and control. Replace the LangSmith environment variables with an OpenTelemetry setup:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Disable LangSmith
os.environ["LANGCHAIN_TRACING_V2"] = "false"
# Setup OpenTelemetry
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(
endpoint="https://api.uptrace.dev:4317",
headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)
# Use with chains
callback_handler = ObservabilityCallback()
result = chain.invoke(input_data, config={"callbacks": [callback_handler]})
Benefits of OpenTelemetry Setup
Here's why you might want to migrate from LangSmith or other solutions:
Feature | LangSmith | OpenTelemetry + Uptrace |
---|---|---|
Setup Complexity | Simple environment variables | Moderate - custom callbacks |
Customization | Limited to built-in features | Full control over data collection |
Cost | Usage-based pricing | Fixed pricing, predictable costs |
Integration | LangChain applications only | Works with any application framework |
Self-hosting | Not available | Available for data privacy |
Metrics & Logs | Traces only | Unified traces, metrics, and logs |
Vendor Lock-in | Tied to LangChain ecosystem | Open standard, portable |
The OpenTelemetry approach gives you more control and flexibility, especially as your application grows beyond simple LangChain chains.
Conclusion
Implementing comprehensive observability for LangChain applications requires monitoring three critical areas: performance, costs, and errors. Unlike traditional web applications, LangChain apps have unique failure modes and cost structures that demand specialized monitoring approaches.
The key to success is starting simple with basic tracing, then expanding to include cost monitoring and advanced metrics as your application scales. Remember that observability overhead should be minimal—typically less than 2% of your application's total latency.
The patterns and code examples in this guide provide a solid foundation for monitoring LangChain applications in both development and production environments. With proper observability in place, you'll catch cost spikes before they impact your budget, identify performance issues before they frustrate users, and debug complex chain failures with confidence.
Start with the basic setup, customize the monitoring to fit your specific use cases, and gradually expand your observability coverage as your LangChain application grows in complexity and scale.