Building an OpenLLMetry sidecar using Bob's talents 🤖
TL;DR - What is Traceloop's OpenLLMetry?
Traceloop OpenLLMetry is an open-source observability framework built on top of OpenTelemetry, specifically designed to provide deep visibility into the execution of Large Language Model (LLM) applications. It enables developers to monitor and debug their AI systems by automatically instrumenting popular LLM providers (like OpenAI, Anthropic, and Azure) and vector databases (such as Pinecone, Milvus, or Chroma). By integrating OpenLLMetry, you gain access to high-fidelity distributed tracing, allowing you to visualize the entire lifecycle of a request, from the initial prompt and retrieval-augmented generation (RAG) steps to the final model response, so you can pinpoint bottlenecks, evaluate model performance, and track token usage across your infrastructure.
Second TL;DR - Sidecar pattern
The sidecar design pattern works by deploying a secondary "sidecar" container alongside a primary application container within the same execution environment, such as a Kubernetes pod or a shared network namespace. The core logic relies on separation of concerns: the application container remains focused exclusively on its business logic while the sidecar handles cross-cutting tasks like distributed tracing, logging, or proxying traffic. A fundamental prerequisite for this pattern is the use of container images; both the application and the sidecar must be packaged as independent images so they can be "plugged" together. This modularity enables the sidecar to intercept requests (such as LLM API calls) and add OpenTelemetry instrumentation without requiring any code changes to the primary application image.
Evolving Observability: Moving to a Sidecar Pattern
In one of my previous explorations, I demonstrated a standalone implementation of OpenLLMetry. We saw how straightforward it is to integrate into a Python application, requiring just a few lines of code to unlock deep visibility into LLM calls. While powerful, that approach requires modifying the core application code.
Basically doing this (excerpt from the Traceloop documentation):
#######################
pip install traceloop-sdk
#######################
#...
import os
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
Traceloop.init(app_name="joke_generation_service")
@workflow(name="joke_creation")
def create_joke():
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
)
return completion.choices[0].message.content
The Vision: Pluggable Observability
This time, I wanted to push the architecture further by decoupling the monitoring logic from the business logic. The goal was to implement OpenLLMetry using a "sidecar" design pattern. This approach makes the observability layer almost entirely pluggable, allowing it to be attached to virtually any application container without cluttering the primary codebase.
Building with Bob
To bring this modular architecture to life, I teamed up with my new AI partner, IBM Bob. I tasked Bob with building a robust application that could serve as the primary service, while I focused on engineering the sidecar to capture traces, monitor performance, and manage the OpenTelemetry export pipeline.
The Logic and Implementation
The project demonstrates a sidecar design pattern for implementing LLM observability with OpenLLMetry, making it possible to add distributed tracing to applications without modifying any core code. By deploying an independent TraceLoop sidecar proxy alongside a primary service (such as the one built by IBM Bob), all HTTP traffic to the LLM engine (e.g., Ollama) is intercepted and instrumented with OpenTelemetry spans. This architecture ensures a clean separation of concerns: the application remains focused on business logic while the sidecar captures high-fidelity metadata, including prompts, responses, and token usage, before forwarding traces to a collector and a visualization tool like Jaeger.
The main idea of this logic is the separation of concerns!
Application Code:             Observability Infrastructure:
┌────────────────┐            ┌────────────────────┐
│ Business       │            │ TraceLoop          │
│ Logic Only     │───────────▶│ Sidecar            │
│                │            │ (Tracing Proxy)    │
└────────────────┘            └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │  OTel Collector    │
                              └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │ Jaeger (Storage)   │
                              └────────────────────┘
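In practice, the only wiring this requires is pointing the application's OLLAMA_HOST at the sidecar instead of at Ollama itself; the sidecar then forwards each call to the real engine. Here is a minimal sketch of that contract from the application's point of view (the host names and ports below are illustrative, not the project's actual values):
import os
import requests

# The app only ever knows one endpoint. Whether that endpoint is Ollama itself
# (http://ollama:11434) or the TraceLoop sidecar sitting in front of it
# (e.g. http://traceloop-sidecar:11434) is purely a deployment decision.
LLM_ENDPOINT = os.getenv("OLLAMA_HOST", "http://ollama:11434")

def ask(prompt: str) -> str:
    # Identical call in both cases - the business logic never changes.
    resp = requests.post(
        f"{LLM_ENDPOINT}/api/chat",
        json={
            "model": "granite3:latest",
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=300,
    )
    return resp.json()["message"]["content"]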
Let's jump into a practical example: the main sample application, a very basic chat application using Ollama and Granite.
#!/usr/bin/env python3
"""
Simple Ollama Application - No Tracing Code
This application uses Ollama for LLM inference without any built-in tracing.
Tracing will be handled by the TraceLoop sidecar.
"""
import os
import sys
import time
import logging
from datetime import datetime
from pathlib import Path
import ollama
from flask import Flask, request, jsonify
app = Flask(__name__)
# Configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
# Setup logging directory
LOG_DIR = Path("./logs")
LOG_DIR.mkdir(exist_ok=True)
# Create timestamped log file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = LOG_DIR / f"ollama_app_{timestamp}.log"
# Configure logging
logging.basicConfig(
level=getattr(logging, LOG_LEVEL.upper()),
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
logging.StreamHandler(sys.stdout)
]
)
logger = logging.getLogger(__name__)
# Log startup information
logger.info("=" * 60)
logger.info("Simple Ollama Application Starting")
logger.info("=" * 60)
logger.info(f"Ollama Host: {OLLAMA_HOST}")
logger.info(f"Model: {OLLAMA_MODEL}")
logger.info(f"Log Level: {LOG_LEVEL}")
logger.info(f"Log File: {log_file}")
logger.info("=" * 60)
@app.route('/health', methods=['GET'])
def health():
"""Health check endpoint"""
logger.debug("Health check requested")
return jsonify({
"status": "healthy",
"model": OLLAMA_MODEL,
"ollama_host": OLLAMA_HOST,
"log_file": str(log_file)
}), 200
@app.route('/chat', methods=['POST'])
def chat():
"""
Chat endpoint - accepts a prompt and returns a response
Request body:
{
"prompt": "Your question here",
"model": "optional-model-override"
}
"""
request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
logger.info(f"[{request_id}] Chat request received")
try:
data = request.get_json()
if not data or 'prompt' not in data:
logger.warning(f"[{request_id}] Missing 'prompt' in request body")
return jsonify({"error": "Missing 'prompt' in request body"}), 400
prompt = data['prompt']
model = data.get('model', OLLAMA_MODEL)
logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
logger.info(f"[{request_id}] Model: {model}")
logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}")
# Create Ollama client
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.debug(f"[{request_id}] Ollama client created successfully")
except Exception as e:
logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
raise
# Send the prompt
start_time = time.time()
logger.info(f"[{request_id}] Sending request to Ollama...")
try:
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
logger.debug(f"[{request_id}] Received response from Ollama")
except Exception as e:
logger.error(f"[{request_id}] Ollama request failed: {e}")
raise
response_text = response['message']['content']
duration = time.time() - start_time
logger.info(f"[{request_id}] Response received in {duration:.2f}s")
logger.info(f"[{request_id}] Response length: {len(response_text)} characters")
logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...")
return jsonify({
"prompt": prompt,
"response": response_text,
"model": model,
"duration_seconds": duration,
"request_id": request_id
}), 200
except Exception as e:
logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True)
return jsonify({
"error": str(e),
"request_id": request_id
}), 500
@app.route('/batch', methods=['POST'])
def batch_chat():
"""
Batch chat endpoint - accepts multiple prompts
Request body:
{
"prompts": ["Question 1", "Question 2", ...],
"model": "optional-model-override"
}
"""
request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
logger.info(f"[{request_id}] Batch request received")
try:
data = request.get_json()
if not data or 'prompts' not in data:
logger.warning(f"[{request_id}] Missing 'prompts' in request body")
return jsonify({"error": "Missing 'prompts' in request body"}), 400
prompts = data['prompts']
model = data.get('model', OLLAMA_MODEL)
if not isinstance(prompts, list):
logger.warning(f"[{request_id}] 'prompts' is not a list")
return jsonify({"error": "'prompts' must be a list"}), 400
logger.info(f"[{request_id}] Processing {len(prompts)} prompts")
logger.info(f"[{request_id}] Model: {model}")
# Create Ollama client
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.debug(f"[{request_id}] Ollama client created successfully")
except Exception as e:
logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
raise
results = []
total_start = time.time()
for i, prompt in enumerate(prompts, 1):
logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...")
try:
start_time = time.time()
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
response_text = response['message']['content']
duration = time.time() - start_time
results.append({
"prompt": prompt,
"response": response_text,
"duration_seconds": duration
})
logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s")
except Exception as e:
logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}")
results.append({
"prompt": prompt,
"error": str(e),
"duration_seconds": 0
})
total_duration = time.time() - total_start
logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s")
return jsonify({
"results": results,
"model": model,
"total_duration_seconds": total_duration,
"count": len(results),
"request_id": request_id
}), 200
except Exception as e:
logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True)
return jsonify({
"error": str(e),
"request_id": request_id
}), 500
def run_sample_queries():
"""Run some sample queries on startup"""
logger.info("=" * 60)
logger.info("Running sample queries...")
logger.info("=" * 60)
sample_prompts = [
"What is OpenTelemetry?",
"Explain distributed tracing in one sentence.",
"What are the benefits of observability?",
]
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.info(f"Connected to Ollama at {OLLAMA_HOST}")
except Exception as e:
logger.error(f"Failed to connect to Ollama: {e}")
return
for i, prompt in enumerate(sample_prompts, 1):
logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}")
try:
start_time = time.time()
response = client.chat(
model=OLLAMA_MODEL,
messages=[{'role': 'user', 'content': prompt}],
)
duration = time.time() - start_time
response_text = response['message']['content']
logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s")
logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...")
except Exception as e:
logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True)
time.sleep(1)
logger.info("=" * 60)
logger.info("Sample queries completed!")
logger.info("=" * 60)
if __name__ == "__main__":
# Run sample queries if in standalone mode
if os.getenv("RUN_SAMPLES", "true").lower() == "true":
try:
run_sample_queries()
except Exception as e:
logger.error(f"Sample queries failed: {e}", exc_info=True)
# Start Flask server
port = int(os.getenv("PORT", "8080"))
logger.info(f"Starting Flask server on port {port}...")
logger.info(f"Logs are being written to: {log_file}")
app.run(host='0.0.0.0', port=port, debug=False)
# Made with Bob
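With the container running (the app listens on port 8080 by default), both endpoints can be exercised with a quick smoke test; a small sketch, assuming the port is published on localhost:
import requests

BASE_URL = "http://localhost:8080"  # adjust to wherever the app's port is published

# Single-prompt endpoint
reply = requests.post(f"{BASE_URL}/chat",
                      json={"prompt": "What is a sidecar container?"},
                      timeout=300).json()
print(reply["request_id"], f"{reply['duration_seconds']:.2f}s")
print(reply["response"][:200])

# Batch endpoint
batch = requests.post(f"{BASE_URL}/batch",
                      json={"prompts": ["What is OpenTelemetry?", "What is Jaeger?"]},
                      timeout=600).json()
for item in batch["results"]:
    print(f"{item['prompt']} -> {item.get('duration_seconds', 0):.2f}s")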
Now, we build a "proxy" application, which will be the sidecar implementation.
#!/usr/bin/env python3
"""
TraceLoop Sidecar - OpenLLMetry Tracing Proxy
This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls.
It acts as a transparent proxy that adds LLM-specific tracing.
"""
import os
import json
import requests
from flask import Flask, request, Response
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from opentelemetry import trace
app = Flask(__name__)
# Configuration
OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434")
OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar")
TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app")
print("=" * 70)
print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy")
print("=" * 70)
print(f"Upstream Ollama: {OLLAMA_UPSTREAM}")
print(f"OTEL Endpoint: {OTEL_ENDPOINT}")
print(f"Service Name: {SERVICE_NAME}")
print(f"Traced Service: {TRACED_SERVICE_NAME}")
print("=" * 70)
def init_tracing():
"""Initialize OpenLLMetry (Traceloop SDK)"""
Traceloop.init(
app_name=TRACED_SERVICE_NAME, # Use the application name, not sidecar name
disable_batch=False,
exporter_otlp_endpoint=OTEL_ENDPOINT,
# Enable LLM-specific instrumentation
should_enrich_metrics=True,
)
print("β OpenLLMetry (Traceloop SDK) initialized successfully")
# Initialize tracing on startup
init_tracing()
tracer = trace.get_tracer(__name__)
@app.route('/health', methods=['GET'])
def health():
"""Health check endpoint"""
return {"status": "healthy", "service": SERVICE_NAME}, 200
@task(name="ollama_api_call")
def proxy_ollama_request(method, path, headers, data, query_string):
"""
Proxy request to Ollama with OpenLLMetry tracing.
The @task decorator automatically creates spans and adds LLM attributes.
"""
# Build upstream URL
upstream_url = f"{OLLAMA_UPSTREAM}/{path}"
if query_string:
upstream_url += f"?{query_string.decode()}"
# Parse request data for logging
request_data = None
if data:
try:
request_data = json.loads(data)
except:
request_data = data.decode('utf-8', errors='ignore')
# Get current span to add custom attributes
current_span = trace.get_current_span()
# Add custom attributes
current_span.set_attribute("http.method", method)
current_span.set_attribute("http.url", upstream_url)
current_span.set_attribute("http.target", f"/{path}")
current_span.set_attribute("llm.system", "ollama")
# Extract and add LLM-specific attributes
if request_data and isinstance(request_data, dict):
if "model" in request_data:
current_span.set_attribute("llm.model", request_data["model"])
# For chat API
if "messages" in request_data:
messages = request_data["messages"]
if messages and len(messages) > 0:
last_message = messages[-1]
if "content" in last_message:
prompt = last_message["content"]
current_span.set_attribute("llm.prompts", prompt[:1000])
current_span.set_attribute("llm.request.type", "chat")
# For generate API
if "prompt" in request_data:
current_span.set_attribute("llm.prompts", request_data["prompt"][:1000])
current_span.set_attribute("llm.request.type", "completion")
# Log the request
print(f"\n[PROXY] {method} /{path}")
if request_data and isinstance(request_data, dict):
if "model" in request_data:
print(f"[PROXY] Model: {request_data['model']}")
if "messages" in request_data and request_data["messages"]:
print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...")
elif "prompt" in request_data:
print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...")
try:
# Forward request to upstream Ollama
upstream_response = requests.request(
method=method,
url=upstream_url,
headers={k: v for k, v in headers if k.lower() != 'host'},
data=data,
allow_redirects=False,
timeout=300 # 5 minutes for model operations
)
# Parse response
response_data = None
try:
response_data = upstream_response.json()
except:
response_data = upstream_response.text
# Add response attributes
current_span.set_attribute("http.status_code", upstream_response.status_code)
# Extract response content
if response_data and isinstance(response_data, dict):
# For chat API
if "message" in response_data:
message = response_data["message"]
if "content" in message:
response_text = message["content"]
current_span.set_attribute("llm.responses", response_text[:1000])
current_span.set_attribute("llm.response_length", len(response_text))
print(f"[PROXY] Response: {response_text[:100]}...")
# For generate API
elif "response" in response_data:
response_text = response_data["response"]
current_span.set_attribute("llm.responses", response_text[:1000])
current_span.set_attribute("llm.response_length", len(response_text))
print(f"[PROXY] Response: {response_text[:100]}...")
print(f"[PROXY] Status: {upstream_response.status_code}")
# Return response
return Response(
upstream_response.content,
status=upstream_response.status_code,
headers=dict(upstream_response.headers)
)
except Exception as e:
current_span.set_attribute("error", True)
current_span.set_attribute("error.message", str(e))
current_span.record_exception(e)
print(f"[PROXY ERROR] {str(e)}")
raise
@app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@workflow(name="ollama_proxy")  # innermost, so the workflow span actually wraps each proxied request
def proxy(path):
"""
Main proxy endpoint. The @workflow decorator creates a parent span for the entire request.
"""
try:
return proxy_ollama_request(
method=request.method,
path=path,
headers=request.headers,
data=request.data,
query_string=request.query_string
)
except Exception as e:
return {"error": str(e)}, 500
if __name__ == "__main__":
port = int(os.getenv("PORT", "11434"))
print(f"\nStarting TraceLoop Sidecar on port {port}...")
print(f"Proxying to: {OLLAMA_UPSTREAM}")
print(f"Using OpenLLMetry for automatic LLM tracing")
print("=" * 70 + "\n")
app.run(host='0.0.0.0', port=port, debug=False)
# Made with Bob
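Because the sidecar speaks Ollama's own HTTP API, you can verify that it really is transparent by pointing a plain Ollama client at the sidecar's port and making the same call the application makes; a hedged sketch, assuming the proxy is published on localhost:11434 and forwards to a running Ollama instance:
import ollama

# Talk to the sidecar exactly as if it were Ollama itself.
client = ollama.Client(host="http://localhost:11434")  # the sidecar's port, not Ollama's
reply = client.chat(
    model="granite3:latest",
    messages=[{"role": "user", "content": "Say hello through the tracing proxy."}],
)
print(reply["message"]["content"])
# If the wiring is correct, an "ollama_proxy" workflow span with llm.* attributes
# should show up in Jaeger shortly afterwards.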
Architecting for Flexibility: Project Structure and Multi-Platform Support
The project is engineered with a modular structure to support various deployment strategies, ensuring that observability remains "pluggable" regardless of the environment. Because my personal development workflow relies on Podman and Minikube, I collaborated with Bob to design several implementation types.
Bob helped architect a structure that separates the pure business logic of the application from the tracing infrastructure. This resulted in a comprehensive setup where OpenLLMetry operates as a transparent proxy, intercepting traffic between the application and the LLM engine. Whether deploying via Docker Compose for quick local testing or using Kubernetes (Minikube) for a production-grade simulation, the sidecar pattern remains consistent.
Project Overview
- ollama-simple-app/: The core application built by Bob, containing pure business logic with zero tracing code or OpenTelemetry dependencies.
- traceloop-sidecar/: The independent tracing proxy that provides the "sidecar" functionality.
- k8s/ & docker-compose/: Deployment manifests specifically tailored for different container engines, including specialized support for Podman users.
- Utility Scripts: A suite of automated tools (like deploy-podman.sh) to streamline building, loading, and deploying images across these diverse environments.
By leveraging independent container images for both the application and the sidecar, the tracing layer can be updated, scaled, or swapped out without ever needing to modify the "original" application code.
.
├── ollama-simple-app/              # Application WITHOUT tracing code
│   ├── app.py                      # Pure Flask app using Ollama
│   ├── requirements.txt            # No OpenTelemetry dependencies!
│   └── Dockerfile
├── traceloop-sidecar/              # Independent tracing sidecar
│   ├── proxy.py                    # Transparent tracing proxy
│   ├── requirements.txt            # OpenTelemetry dependencies here
│   └── Dockerfile
├── ollama-app/                     # (Optional) App with built-in tracing
│   └── ...                         # For comparison purposes
├── collector/                      # OpenTelemetry Collector
│   ├── otel-collector-config.yaml
│   └── Dockerfile
├── k8s/                            # Kubernetes manifests
│   ├── 00-namespace.yaml
│   ├── 01-otel-collector.yaml
│   ├── 02-ollama.yaml
│   ├── 04-jaeger.yaml
│   └── 05-ollama-simple-app.yaml   # Sidecar deployment
├── docker-compose/
│   └── docker-compose.yaml
├── start-all.sh                    # Utility: Start all services
├── stop-all.sh                     # Utility: Stop all services
├── push-to-github.sh               # Utility: Push to GitHub
└── README.md
The application can be deployed and tested using Docker/Podman or Minikube, and it is adaptable to other Kubernetes flavors! Several scripts are provided to start and stop the applications and to trace logs in case of errors. Thorough Podman documentation was also generated by Bob at my request.
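Once a deployment is up, a quick way to confirm the whole pipeline (sidecar, collector, Jaeger) is to ask Jaeger whether it has stored spans for the traced service. A small sketch using Jaeger's HTTP query API, assuming port 16686 is forwarded to localhost and the service name is the sidecar's default, ollama-app:
import requests

JAEGER_URL = "http://localhost:16686"  # e.g. via `kubectl port-forward` or a published port
SERVICE = "ollama-app"                 # matches the sidecar's TRACED_SERVICE_NAME default

resp = requests.get(f"{JAEGER_URL}/api/traces",
                    params={"service": SERVICE, "limit": 5},
                    timeout=10)
traces = resp.json().get("data", [])
print(f"Found {len(traces)} recent trace(s) for service '{SERVICE}'")
for t in traces:
    print(f"  trace {t['traceID']} with {len(t.get('spans', []))} span(s)")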
Last but not least, as a bonus, a sample Python application that uses OpenLLMetry directly is provided as well 👨‍💻, with the objective of weighing the pros and cons of each approach!
#!/usr/bin/env python3
"""
Sample Ollama Application with OpenLLMetry Tracing
This application demonstrates how to use Ollama with OpenTelemetry tracing
"""
import os
import time
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from traceloop.sdk import Traceloop
import ollama
# Initialize OpenTelemetry with Traceloop
def init_tracing():
"""Initialize OpenTelemetry tracing with OTLP exporter"""
# Get configuration from environment variables
otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
service_name = os.getenv("OTEL_SERVICE_NAME", "ollama-app")
print(f"Initializing tracing with endpoint: {otlp_endpoint}")
print(f"Service name: {service_name}")
# Initialize Traceloop SDK
Traceloop.init(
app_name=service_name,
disable_batch=False,
exporter_otlp_endpoint=otlp_endpoint
)
print("Tracing initialized successfully")
def chat_with_ollama(model: str, prompt: str) -> str:
"""
Send a prompt to Ollama and get a response
Args:
model: The model to use (e.g., 'granite3:latest')
prompt: The prompt to send to the model
Returns:
The model's response
"""
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("ollama_chat") as span:
span.set_attribute("llm.model", model)
span.set_attribute("llm.prompt", prompt)
try:
# Get Ollama host from environment
ollama_host = os.getenv("OLLAMA_HOST", "http://ollama:11434")
print(f"\nSending prompt to Ollama ({model})...")
print(f"Prompt: {prompt}")
# Create Ollama client
client = ollama.Client(host=ollama_host)
# Send the prompt
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
response_text = response['message']['content']
span.set_attribute("llm.response", response_text)
span.set_attribute("llm.response_length", len(response_text))
print(f"Response: {response_text}\n")
return response_text
except Exception as e:
span.set_attribute("error", True)
span.set_attribute("error.message", str(e))
print(f"Error: {e}")
raise
def main():
"""Main application loop"""
print("=" * 60)
print("Ollama Application with OpenLLMetry Tracing")
print("=" * 60)
# Initialize tracing
init_tracing()
# Get model from environment
model = os.getenv("OLLAMA_MODEL", "granite3:latest")
# Sample prompts to demonstrate tracing
prompts = [
"What is OpenTelemetry?",
"Explain distributed tracing in one sentence.",
"What are the benefits of observability?",
]
print(f"\nUsing model: {model}")
print(f"Running {len(prompts)} sample queries...\n")
# Run sample queries
for i, prompt in enumerate(prompts, 1):
print(f"Query {i}/{len(prompts)}")
try:
chat_with_ollama(model, prompt)
time.sleep(2) # Small delay between requests
except Exception as e:
print(f"Failed to process query: {e}")
print("\n" + "=" * 60)
print("All queries completed. Check your tracing backend for traces!")
print("=" * 60)
# Keep the application running to allow traces to be exported
print("\nKeeping application alive for trace export...")
time.sleep(10)
if __name__ == "__main__":
main()
# Made with Bob
Comparison: Sidecar vs Built-in Tracing
| Aspect           | Sidecar (This Project) | Built-in Tracing      |
| ---------------- | ---------------------- | --------------------- |
| Code changes     | ✅ None                | ❌ Required           |
| Dependencies     | ✅ None in app         | ❌ OpenTelemetry libs |
| Language support | ✅ Any                 | ⚠️ Language-specific  |
| Maintenance      | ✅ Centralized         | ⚠️ Per application    |
| Performance      | ⚠️ Extra hop           | ✅ Direct             |
| Flexibility      | ⚠️ HTTP only           | ✅ Any protocol       |
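The "extra hop" in the Performance row is easy to quantify for your own setup: time the same request against Ollama directly and against the sidecar, then compare. A rough sketch, assuming both are reachable locally on the hypothetical port mappings below (run it a few times, since model latency varies far more than the proxy overhead):
import time
import requests

PAYLOAD = {
    "model": "granite3:latest",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
    "stream": False,
}

def timed_chat(base_url: str) -> float:
    start = time.time()
    requests.post(f"{base_url}/api/chat", json=PAYLOAD, timeout=300)
    return time.time() - start

direct = timed_chat("http://localhost:11434")   # Ollama published directly (hypothetical mapping)
proxied = timed_chat("http://localhost:11435")  # the TraceLoop sidecar (hypothetical mapping)
print(f"direct: {direct:.2f}s  via sidecar: {proxied:.2f}s  overhead: {proxied - direct:+.2f}s")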
Final Thoughts: Seamless Deployment Across the Ecosystem
To wrap up, this implementation demonstrates that advanced LLM observability doesn't require complex code changes. By leveraging the sidecar pattern, the application is ready for immediate deployment and testing across a wide range of environments. Whether you are using Docker or Podman for local development, or orchestrating via Minikube, this setup is designed to be effortlessly adaptable to any Kubernetes flavor. Thanks to Bob's help in structuring the project, you can now plug high-fidelity tracing into your AI workflows with a single command, regardless of your infrastructure.
Thanks for reading 💻
Links
- GitHub Code Repository: https://github.com/aairom/OpenLLMetry-SideCar
- Traceloop OpenLLMetry: https://www.traceloop.com/docs/openllmetry/introduction
- IBM Project Bob: https://www.ibm.com/products/bob
- Sidecar Pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/sidecar
- Sidecar containers: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/