LangChain applications fail differently than traditional web apps. A single user request can trigger 15+ LLM calls, cost $5 in tokens, and fail silently without throwing errors. One team discovered a $12,000 OpenAI bill caused by a recursive chain with no monitoring.
This guide shows how to implement observability for LangChain applications, giving you complete visibility into performance, costs, and errors before they impact your users or budget.
Why Monitor LangChain Apps?
Traditional web application monitoring focuses on HTTP requests, database queries, and server resources. But LangChain applications introduce entirely new failure modes and cost structures that standard monitoring tools miss.
Unique Monitoring Challenges
Cost Explosion Risk
Unlike regular web apps, where compute costs are predictable, LangChain applications have variable costs tied directly to usage. Each LLM call costs between $0.01 and $0.06 per 1,000 tokens depending on the model, and costs can spiral quickly (a rough cost estimate is sketched just after this list):
- A recursive chain can generate thousands of tokens in seconds
- Long conversation contexts inflate token usage with every turn, since the full history is resent on each call
- Failed requests still consume tokens before timing out
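To make the risk concrete, here is a minimal sketch of how token counts translate into dollars; the per-token prices are illustrative, not authoritative:

# Rough cost estimator - prices are illustrative and change frequently
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}

def estimate_call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    price = PRICE_PER_1K_TOKENS.get(model, 0.003)  # mid-range fallback price
    return (prompt_tokens + completion_tokens) / 1000 * price

# A runaway chain making 50 calls of ~4,000 tokens each adds up quickly
total = sum(estimate_call_cost("gpt-4o", 3000, 1000) for _ in range(50))
print(f"Estimated cost for one request: ${total:.2f}")  # ~$1.00, before retries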
Complex Execution Paths
A single user request in a LangChain app might:
- Call multiple LLM models in sequence
- Query vector databases for context
- Execute external API calls through tools
- Chain together multiple reasoning steps
When something breaks, you need to trace the failure across all these components to understand what went wrong.
Silent Failure Patterns
LangChain applications often fail gracefully, returning partial results without throwing exceptions:
- Vector searches return empty results when indexes are corrupted
- LLM calls succeed but return unhelpful responses
- Tool executions timeout but chains continue with missing data
Without proper monitoring, these silent failures go unnoticed until users complain.
Variable Performance
LLM response times vary dramatically based on:
- Model load and availability
- Prompt complexity and length
- Context window utilization
- Rate limiting and queuing
Standard application performance monitoring doesn't account for these AI-specific factors.
OpenTelemetry as a Solution
To properly monitor LangChain applications, we'll use OpenTelemetry - an open-source framework that captures detailed traces, metrics, and logs from your application. Think of it as a sophisticated data collection system that records everything happening in your LangChain chains.
Here's how the monitoring works:
- OpenTelemetry instruments your code - capturing every chain execution, LLM call, and tool usage
- Data gets exported to an observability platform - like Uptrace, Jaeger, or DataDog
- You get dashboards and alerts - showing performance, costs, and errors in real-time
The beauty of this solution is that you can see exactly what's happening inside your LangChain application, from the initial user request to the final response.
Essential Metrics for LangChain Apps
Let's understand what metrics actually matter for LangChain applications. These aren't your typical web app metrics - they're specifically designed to catch the unique failure modes of AI applications.
Core Performance Metrics
These metrics help you understand how your chains are performing and where bottlenecks occur:
Metric | Purpose | Alert Threshold |
---|---|---|
Chain Execution Time | Detect slow operations that frustrate users | >10 seconds |
Token Usage per Request | Prevent runaway costs | >5000 tokens |
Error Rate by Chain | Identify unreliable components | >5% |
Model Availability | Monitor LLM service health | <95% success |
Cost Tracking Metrics
Since every LLM call costs money, you need specific metrics to control spending:
# Key cost metrics that prevent bill shock
cost_metrics = {
'tokens_per_hour': 'Identify usage spikes before they impact billing',
'cost_per_user': 'Find users generating excessive costs',
'tokens_by_model': 'Optimize expensive vs cheap model usage',
'failed_token_spend': 'Track money wasted on failed requests'
}
These metrics let you set up alerts like "notify me if hourly token usage exceeds $100" or "alert if any user generates more than $50 in costs per session."
Quick Start: 5-Minute Setup
Now let's implement this monitoring system. We'll start with a basic setup that captures traces and exports them to Uptrace (an OpenTelemetry-native observability platform). This same approach works with any OTLP-compatible backend like Jaeger, Grafana, or commercial solutions.
The setup process has three steps: install the monitoring libraries, configure where to send the data, and add monitoring to your LangChain code.
Step 1: Install Dependencies
First, install the OpenTelemetry libraries that will capture monitoring data from your application:
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-exporter-otlp
pip install langchain langchain-openai
These packages provide the core instrumentation (opentelemetry-api), the data processing engine (opentelemetry-sdk), and the exporter that sends data to your monitoring platform (opentelemetry-exporter-otlp).
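The FastAPI example and Dockerfile later in this guide install from a requirements.txt; a matching file (unpinned here, pin versions as you see fit) looks like this:

# requirements.txt
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp
langchain
langchain-openai
fastapi
uvicorn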
Step 2: Basic OpenTelemetry Configuration
Next, configure OpenTelemetry to send monitoring data to your observability platform. This code sets up the "plumbing" that captures traces and exports them:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
import os
def setup_tracing():
    trace.set_tracer_provider(TracerProvider())

    otlp_exporter = OTLPSpanExporter(
        # The HTTP exporter needs the full signal path when the endpoint is set explicitly
        endpoint="https://api.uptrace.dev:4318/v1/traces",
        headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
    )

    span_processor = BatchSpanProcessor(otlp_exporter)
    trace.get_tracer_provider().add_span_processor(span_processor)

setup_tracing()
The UPTRACE_DSN environment variable contains your project credentials: the DSN shown in your Uptrace dashboard after creating a free account. Other platforms have a similar setup - just change the endpoint and headers.
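Before wiring up LangChain, you can sanity-check the pipeline with a throwaway span; if it shows up in your backend, the DSN and endpoint are correct (the span name is arbitrary):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("smoke-test") as span:
    span.set_attribute("check", "tracing pipeline works")

# BatchSpanProcessor exports in the background, so flush before a short script exits
trace.get_tracer_provider().force_flush()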
Step 3: Custom Callback Handler
Now comes the LangChain-specific part. We need to create a callback handler that captures detailed information about every chain execution, LLM call, and error:
from langchain_core.callbacks.base import BaseCallbackHandler
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import time
class ObservabilityCallback(BaseCallbackHandler):
"""
Custom callback that captures comprehensive monitoring data
from LangChain operations and exports it via OpenTelemetry
"""
def __init__(self):
self.tracer = trace.get_tracer(__name__)
self.spans = {} # Track active spans by run_id
    def on_chain_start(self, serialized, inputs, **kwargs):
        """Called when any chain starts executing"""
        run_id = kwargs.get('run_id')
        # serialized can be None for some runnables, so guard the name lookup
        chain_name = (serialized or {}).get('name') or 'unknown'
        span = self.tracer.start_span(f"chain_{chain_name}")

        # Add context attributes that help with debugging
        span.set_attribute("chain.type", chain_name)
        span.set_attribute("chain.inputs", str(inputs)[:500])  # Truncate long inputs
        span.set_attribute("chain.run_id", str(run_id))

        # Store span data for later use
        self.spans[run_id] = {
            'span': span,
            'start_time': time.time(),
            'tokens': 0
        }
    def on_llm_end(self, response, **kwargs):
        """Called when LLM completes - captures token usage and costs"""
        # The LLM call has its own run_id; attribute token usage to the parent chain's span
        run_id = kwargs.get('parent_run_id') or kwargs.get('run_id')
        if run_id in self.spans:
            # Extract token usage from the LLM response
            if hasattr(response, 'llm_output') and response.llm_output:
                usage = response.llm_output.get('token_usage', {})
                tokens = usage.get('total_tokens', 0)
                self.spans[run_id]['tokens'] += tokens

                # Calculate estimated cost (using GPT-4 pricing as example)
                cost = tokens * 0.00003  # $0.03 per 1K tokens
                self.spans[run_id]['span'].set_attribute("llm.tokens", tokens)
                self.spans[run_id]['span'].set_attribute("llm.estimated_cost", cost)
def on_chain_end(self, outputs, **kwargs):
"""Called when chain completes successfully"""
run_id = kwargs.get('run_id')
if run_id in self.spans:
span_data = self.spans[run_id]
span = span_data['span']
# Calculate total execution time
duration = time.time() - span_data['start_time']
span.set_attribute("chain.duration_seconds", duration)
span.set_attribute("chain.total_tokens", span_data['tokens'])
# Add output preview for debugging
output_preview = str(outputs)[:200] if outputs else "No output"
span.set_attribute("chain.output_preview", output_preview)
span.end()
del self.spans[run_id]
def on_chain_error(self, error, **kwargs):
"""Called when chain fails - captures error details"""
run_id = kwargs.get('run_id')
if run_id in self.spans:
span = self.spans[run_id]['span']
span.record_exception(error) # Capture full stack trace
span.set_status(Status(StatusCode.ERROR, str(error)))
span.end()
del self.spans[run_id]
This callback handler captures everything you need to monitor: execution time, token usage, estimated costs, errors, and output previews. The data automatically flows to your observability platform, where you can create dashboards and alerts.
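Attaching the handler is just a matter of passing it in the config of an invocation; a minimal sketch, assuming OPENAI_API_KEY is set and setup_tracing() has already run:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

callbacks = [ObservabilityCallback()]

chain = (
    ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# Every chain and LLM call inside this invocation is traced by the callback
result = chain.invoke({"text": "OpenTelemetry is an open standard for telemetry data."},
                      config={"callbacks": callbacks})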
Implementation with FastAPI
For production applications, you'll typically want to wrap your LangChain logic in a web API. Here's a complete example that shows how to integrate monitoring into a FastAPI application that serves LangChain-powered endpoints.
For more details on FastAPI observability setup, see our FastAPI OpenTelemetry integration guide.
Complete Application Setup
This example creates a content generation API with full observability. Every request is traced from the HTTP layer down to individual LLM calls:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import uvicorn
app = FastAPI(title="LangChain Observability Demo")
# Initialize observability
setup_tracing()
callback_handler = ObservabilityCallback()
# Setup LangChain components
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
prompt = ChatPromptTemplate.from_template(
"Generate a {content_type} about {topic}. Keep it {length}."
)
chain = prompt | llm | StrOutputParser()
class GenerationRequest(BaseModel):
topic: str
content_type: str = "summary"
length: str = "concise"
@app.post("/generate")
async def generate_content(request: GenerationRequest):
try:
with trace.get_tracer(__name__).start_as_current_span("api_request") as span:
span.set_attribute("api.endpoint", "/generate")
span.set_attribute("request.topic", request.topic)
result = chain.invoke(
{
"topic": request.topic,
"content_type": request.content_type,
"length": request.length
},
config={"callbacks": [callback_handler]}
)
return {
"content": result,
"status": "success"
}
except Exception as e:
span.set_status(Status(StatusCode.ERROR, str(e)))
raise HTTPException(status_code=500, detail=f"Generation failed: {str(e)}")
@app.get("/health")
async def health_check():
return {"status": "healthy"}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
When you run this application and make requests to /generate, you'll see detailed traces in your observability platform showing:
- HTTP request details (endpoint, parameters)
- Chain execution steps
- LLM calls with token usage and costs
- Total execution time
- Any errors that occurred
Advanced Monitoring Patterns
As your LangChain application grows more complex, you'll need more sophisticated monitoring. Here are patterns for handling multi-step chains, tool usage, and complex agent workflows.
Multi-Chain Tracing
Complex LangChain applications often involve multiple chains calling each other. This enhanced callback handler tracks the relationships between parent and child chains:
from opentelemetry.trace import get_current_span
import time
class AdvancedObservabilityCallback(ObservabilityCallback):
def __init__(self):
super().__init__()
self.chain_hierarchy = {}
def on_chain_start(self, serialized, inputs, **kwargs):
super().on_chain_start(serialized, inputs, **kwargs)
run_id = kwargs.get('run_id')
parent_run_id = kwargs.get('parent_run_id')
if parent_run_id:
self.chain_hierarchy[run_id] = parent_run_id
if run_id in self.spans:
self.spans[run_id]['span'].set_attribute(
"chain.parent_id", str(parent_run_id)
)
    def on_tool_start(self, serialized, input_str, **kwargs):
        run_id = kwargs.get('run_id')
        tool_name = (serialized or {}).get('name') or 'unknown'
        span = self.tracer.start_span(f"tool_{tool_name}")
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.input", input_str[:300])

        # Store tool spans under a separate key so they don't collide with chain run_ids
        tool_key = f"tool_{run_id}"
        self.spans[tool_key] = {
            'span': span,
            'start_time': time.time(),
            'type': 'tool'
        }
def on_tool_end(self, output, **kwargs):
run_id = kwargs.get('run_id')
tool_key = f"tool_{run_id}"
if tool_key in self.spans:
span_data = self.spans[tool_key]
span = span_data['span']
duration = time.time() - span_data['start_time']
span.set_attribute("tool.duration_seconds", duration)
span.set_attribute("tool.output", str(output)[:200])
span.end()
del self.spans[tool_key]
This enhanced callback gives you visibility into agent workflows where chains call tools, which might call other chains, creating complex execution trees.
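A quick way to exercise the tool hooks without a full agent is to invoke a tool directly with the callback attached; this sketch uses a hypothetical search_docs tool:

from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Toy documentation search used only to demonstrate tool tracing."""
    return f"Top result for: {query}"

advanced_callbacks = [AdvancedObservabilityCallback()]

# Tools are runnables, so they accept the same callbacks config; this fires
# on_tool_start / on_tool_end and produces a tool_* span in your backend
print(search_docs.invoke("how do I cap token usage?",
                         config={"callbacks": advanced_callbacks}))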
Cost Monitoring with Metrics
While traces show individual executions, metrics track trends over time. Here's how to add comprehensive cost and performance metrics to your monitoring setup:
import os
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
def setup_metrics():
"""
Configure metrics collection and export.
Metrics complement traces by showing trends and aggregate data.
"""
    # Configure metrics exporter; the HTTP exporter needs the full /v1/metrics
    # path when the endpoint is passed explicitly
    metric_exporter = OTLPMetricExporter(
        endpoint=os.getenv("OTLP_ENDPOINT", "https://api.uptrace.dev:4318/v1/metrics"),
        headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
    )
# Export metrics every 30 seconds
metric_reader = PeriodicExportingMetricReader(
metric_exporter, export_interval_millis=30000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
# Initialize metrics alongside tracing
setup_metrics()
meter = metrics.get_meter(__name__)
# Define the metrics you want to track
token_counter = meter.create_counter(
"langchain_tokens_total",
description="Total tokens consumed by LangChain operations"
)
cost_counter = meter.create_counter(
"langchain_cost_usd_total",
description="Total estimated cost in USD"
)
request_duration = meter.create_histogram(
"langchain_request_duration_seconds",
description="Request processing time distribution"
)
class MetricsCallback(AdvancedObservabilityCallback):
"""
Callback that emits both traces and metrics for comprehensive monitoring
"""
    def on_llm_end(self, response, **kwargs):
        super().on_llm_end(response, **kwargs)

        if hasattr(response, 'llm_output') and response.llm_output:
            usage = response.llm_output.get('token_usage', {})
            tokens = usage.get('total_tokens', 0)
            # OpenAI chat models report the model under 'model_name' in llm_output
            model = response.llm_output.get('model_name', 'unknown')

            # Record metrics with labels for filtering
            token_counter.add(tokens, {"model": model})

            # Calculate and record cost
            cost_per_token = self._get_model_cost(model)
            estimated_cost = tokens * cost_per_token
            cost_counter.add(estimated_cost, {"model": model})
def _get_model_cost(self, model):
"""
Get cost per token for different models.
Update these values based on current pricing.
"""
costs = {
'gpt-4o': 0.000005, # $5 per 1M tokens
'gpt-4o-mini': 0.00000015, # $0.15 per 1M tokens
'gpt-3.5-turbo': 0.000001, # $1 per 1M tokens
}
return costs.get(model, 0.000003) # Default cost
With these metrics, you can create dashboards showing token usage over time, cost trends by model, and performance percentiles across your application.
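Note that the request_duration histogram defined above is never recorded by the callbacks; one way to populate it, sketched below with a hypothetical TimedMetricsCallback, is to record the duration when a top-level chain finishes:

import time

class TimedMetricsCallback(MetricsCallback):
    """Sketch: also records chain duration into the request_duration histogram."""

    def on_chain_end(self, outputs, **kwargs):
        run_id = kwargs.get('run_id')
        # Grab the start time before the parent class ends the span and drops the entry
        start_time = self.spans.get(run_id, {}).get('start_time')
        super().on_chain_end(outputs, **kwargs)

        # Only top-level chains (no parent run) count as a user-facing request
        if start_time is not None and not kwargs.get('parent_run_id'):
            request_duration.record(time.time() - start_time)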
Troubleshooting
When problems arise in production LangChain applications, your observability data becomes crucial for quick diagnosis. Here are the most common issues and how to identify them using your monitoring setup.
Performance Problems
Performance issues in LangChain apps often stem from specific components that aren't obvious without detailed tracing:
Symptom | Likely Cause | How to Diagnose | Solution |
---|---|---|---|
Requests timing out | Large prompts overwhelming models | Check span attributes for prompt length and model response times | • Use faster models (gpt-4o-mini) • Implement prompt truncation • Add request timeouts |
High latency spikes | Vector store queries taking too long | Look for spans labeled "vector_search" with >2s duration | • Optimize embedding dimensions • Add query result caching • Monitor vector DB performance |
Memory usage growing | Chain state accumulating between requests | Monitor memory metrics alongside request volume | • Clear conversation memory • Implement session cleanup • Add memory usage alerts |
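Two of the fixes above, request timeouts and cheaper models, are plain constructor arguments on the chat model; a minimal sketch (the values are placeholders to tune for your workload):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",  # cheaper, faster model for latency-sensitive paths
    timeout=30,           # fail fast instead of letting requests hang
    max_retries=2,        # bound retries so slow failures don't pile up
)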
Cost Control Issues
Cost surprises are common in production LangChain apps. Your monitoring data helps identify and prevent them:
Symptom | Root Cause | How to Detect | Fix |
---|---|---|---|
Unexpected high bills | Recursive chains or unusually long contexts | Alert when langchain_cost_usd_total spikes or langchain_tokens_total exceeds thresholds | • Set token limits per request • Monitor cost metrics • Add circuit breakers |
Token waste from failures | Failed requests still consuming tokens | Compare successful vs failed spans, check for high token usage in error spans | • Track failed vs successful requests • Implement retry logic • Use cheaper models for retries |
Inefficient model selection | Using expensive models for simple tasks | Analyze cost metrics by model type and request complexity | • Route simple queries to cheaper models • Implement model selection logic • Monitor cost efficiency ratios |
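For the "set token limits per request" row, a simple guard caps completion size on the model and rejects oversized prompts before any tokens are spent; a rough sketch (the character-per-token heuristic and limits are illustrative, and guarded_invoke is a hypothetical helper):

from langchain_openai import ChatOpenAI

MAX_PROMPT_CHARS = 20_000  # roughly 5,000 tokens at ~4 characters per token

llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=1000)  # hard cap on completion length

def guarded_invoke(chain, inputs: dict):
    """Refuse requests whose inputs are large enough to blow the token budget."""
    prompt_size = sum(len(str(value)) for value in inputs.values())
    if prompt_size > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt too large ({prompt_size} chars); refusing to spend tokens")
    return chain.invoke(inputs)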
Error Detection Patterns
To catch and categorize different types of errors, extend your callback handler with error classification:
class ErrorTrackingCallback(MetricsCallback):
"""
Enhanced callback that categorizes and tracks different error types
for better debugging and alerting
"""
def __init__(self):
super().__init__()
self.error_counter = meter.create_counter(
"langchain_errors_total",
description="Total errors by category and chain type"
)
def on_chain_error(self, error, **kwargs):
super().on_chain_error(error, **kwargs)
# Categorize error for better monitoring
error_type = self._categorize_error(error)
chain_name = kwargs.get('serialized', {}).get('name', 'unknown')
# Record error metric with useful labels
self.error_counter.add(1, {
"error_type": error_type,
"chain_name": chain_name
})
def _categorize_error(self, error):
"""
Categorize errors into actionable types for alerting and debugging
"""
error_str = str(error).lower()
if 'timeout' in error_str:
return 'timeout'
elif 'rate limit' in error_str:
return 'rate_limit'
elif 'token' in error_str and 'limit' in error_str:
return 'token_limit'
elif 'authentication' in error_str or 'api key' in error_str:
return 'auth_error'
elif 'connection' in error_str:
return 'connection_error'
else:
return 'unknown'
This error categorization helps you set up specific alerts for different problem types and understand which issues are most common in your application.
Production Deployment
Moving from development to production requires additional considerations around containerization, environment management, and scaling.
Docker Configuration
Here's a production-ready Docker setup that includes proper health checks and resource limits:
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies including curl for health checks
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create non-root user with home directory
RUN useradd --create-home --shell /bin/bash app && \
chown -R app:app /app
USER app
# Set environment variables
ENV ENVIRONMENT=production
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1
# Health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Start application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Environment Variables
Organize your configuration using environment variables that work across development and production:
# OpenTelemetry Configuration
export UPTRACE_DSN="https://your-token@api.uptrace.dev/project-id"
export OTEL_SERVICE_NAME="langchain-app"
export OTEL_SERVICE_VERSION="1.0.0"
export OTEL_ENVIRONMENT="production"
export OTEL_RESOURCE_ATTRIBUTES="service.name=langchain-app,service.version=1.0.0"
# Alternative: Using separate OTLP endpoint (for other platforms)
# export OTLP_ENDPOINT="https://api.uptrace.dev:4317"
# export OTLP_HEADERS="uptrace-dsn=your-dsn-here"
# LangChain Configuration
export OPENAI_API_KEY="sk-your-openai-key-here"
export LANGCHAIN_TRACING_V2="false" # Disabled - using custom OpenTelemetry callbacks
export LANGCHAIN_API_KEY="" # Not needed when using custom tracing
# Application Settings
export ENVIRONMENT="production"
export LOG_LEVEL="INFO"
export MAX_TOKENS_PER_REQUEST="10000"
export REQUEST_TIMEOUT_SECONDS="30"
# Optional: Cost Control
export DAILY_COST_LIMIT_USD="500"
export HOURLY_TOKEN_LIMIT="100000"
# Security (for production)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export RATE_LIMIT_REQUESTS_PER_MINUTE="100"
Kubernetes Deployment
For scalable production deployment, here's a Kubernetes configuration with proper resource management and observability:
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-app
labels:
app: langchain-app
spec:
replicas: 3
selector:
matchLabels:
app: langchain-app
template:
metadata:
labels:
app: langchain-app
annotations:
# Annotations for observability platform integration
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
spec:
containers:
- name: app
image: langchain-app:latest
ports:
- containerPort: 8000
name: http
env:
- name: UPTRACE_DSN
valueFrom:
secretKeyRef:
name: observability-secrets
key: uptrace-dsn
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: api-secrets
key: openai-key
- name: OTEL_SERVICE_NAME
value: "langchain-app"
- name: ENVIRONMENT
value: "production"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
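The deployment references two secrets (observability-secrets and api-secrets) that must exist before the pods start; one way to create them, with placeholder values taken from your environment:

kubectl create secret generic observability-secrets \
  --from-literal=uptrace-dsn="$UPTRACE_DSN"

kubectl create secret generic api-secrets \
  --from-literal=openai-key="$OPENAI_API_KEY"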
Key Monitoring Dashboards
Once your observability data is flowing, you'll need dashboards to visualize and understand your application's behavior. Here's what to monitor and why each metric matters.
Essential Metrics Dashboard
Your primary dashboard should focus on the metrics that indicate overall application health and cost control:
Request Volume & Performance
These metrics help you understand load patterns and identify performance regressions:
- Requests per minute - Shows traffic patterns and helps with capacity planning
- Average response time - Indicates overall user experience
- 95th percentile latency - Catches performance issues affecting some users
- Error rate percentage - Critical for reliability monitoring
Cost Tracking
Since LangChain applications have variable costs, these metrics prevent budget surprises:
- Tokens consumed per hour - Early warning for cost spikes
- Cost per request - Efficiency metric for optimization
- Daily/monthly spend trends - Budget tracking and forecasting
- Cost by model type - Optimization opportunities
Chain Performance
LangChain-specific metrics that help optimize your AI workflows:
- Chain execution time by type - Identify slow chains for optimization
- Success rate by chain - Reliability monitoring for individual components
- Token usage distribution - Understand resource consumption patterns
- Failed chain analysis - Error pattern identification
Alerting Rules
Set up proactive alerts to catch issues before they impact users or budgets:
# Example alerting rules (syntax varies by platform)
# Adjust thresholds based on your application scale and budget
cost_alerts:
high_token_usage:
description: "Alert when token consumption exceeds expected levels"
condition: "langchain_tokens_total > 50000" # Adjust per your usage
window: "1h"
severity: "warning"
action: "slack_notification"
daily_cost_breach:
description: "Critical alert for budget protection"
condition: "sum_over_time(langchain_cost_usd_total[24h]) > 200" # $200/day
window: "24h"
severity: "critical"
action: "email_alert"
performance_alerts:
error_rate_spike:
description: "High error rate indicates system issues"
condition: "rate(langchain_errors_total[5m]) > 0.05" # 5% error rate
window: "5m"
severity: "critical"
action: "pagerduty"
high_latency:
description: "Slow responses affecting user experience"
condition: "histogram_quantile(0.95, langchain_request_duration_seconds) > 15"
window: "10m"
severity: "warning"
action: "slack_notification"
reliability_alerts:
chain_failures:
description: "Specific chain type failing frequently"
condition: "increase(langchain_errors_total{chain_name=\"summarization\"}[30m]) > 10"
window: "30m"
severity: "warning"
action: "email_alert"
model_unavailability:
description: "LLM service experiencing issues"
condition: "rate(langchain_errors_total{error_type=\"connection_error\"}[15m]) > 0.1"
window: "15m"
severity: "critical"
action: "pagerduty"
Migration from Other Solutions
If you're currently using other monitoring solutions, here's how to migrate to the OpenTelemetry-based solution.
From LangSmith
LangSmith provides built-in tracing for LangChain, but OpenTelemetry offers more flexibility and control. Replace the LangSmith environment variables with an OpenTelemetry setup:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Disable LangSmith
os.environ["LANGCHAIN_TRACING_V2"] = "false"
# Setup OpenTelemetry
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(
endpoint="https://api.uptrace.dev:4317",
headers={"uptrace-dsn": os.getenv("UPTRACE_DSN")}
)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)
# Use with chains
callback_handler = ObservabilityCallback()
result = chain.invoke(input_data, config={"callbacks": [callback_handler]})
Benefits of OpenTelemetry Setup
Here's why you might want to migrate from LangSmith or other solutions:
Feature | LangSmith | OpenTelemetry + Uptrace |
---|---|---|
Setup Complexity | Simple environment variables | Moderate - custom callbacks |
Customization | Limited to built-in features | Full control over data collection |
Cost | Usage-based pricing | Fixed pricing, predictable costs |
Integration | LangChain applications only | Works with any application framework |
Self-hosting | Not available | Available for data privacy |
Metrics & Logs | Traces only | Unified traces, metrics, and logs |
Vendor Lock-in | Tied to LangChain ecosystem | Open standard, portable |
The OpenTelemetry approach gives you more control and flexibility, especially as your application grows beyond simple LangChain chains.
Conclusion
Implementing comprehensive observability for LangChain applications requires monitoring three critical areas: performance, costs, and errors. Unlike traditional web applications, LangChain apps have unique failure modes and cost structures that demand specialized monitoring approaches.
The key to success is starting simple with basic tracing, then expanding to include cost monitoring and advanced metrics as your application scales. Remember that observability overhead should be minimal—typically less than 2% of your application's total latency.
The patterns and code examples in this guide provide a solid foundation for monitoring LangChain applications in both development and production environments. With proper observability in place, you'll catch cost spikes before they impact your budget, identify performance issues before they frustrate users, and debug complex chain failures with confidence.
Start with the basic setup, customize the monitoring to fit your specific use cases, and gradually expand your observability coverage as your LangChain application grows in complexity and scale.