Building an OpenLLMetry sidecar using Bob's talents 🤖
TL;DR - What is Traceloop's OpenLLMetry?
Traceloop OpenLLMetry is an open-source observability framework built on top of OpenTelemetry, specifically designed to provide deep visibility into the execution of Large Language Model (LLM) applications. It enables developers to monitor and debug their AI systems by automatically instrumenting popular LLM providers (like OpenAI, Anthropic, and Azure) and vector databases (such as Pinecone, Milvus, or Chroma). By integrating OpenLLMetry, you gain access to high-fidelity distributed tracing, allowing you to visualize the entire lifecycle of a request, from the initial prompt and retrieval-augmented generation (RAG) steps to the final model response, so you can pinpoint bottlenecks, evaluate model performance, and track token usage across your infrastructure.
Second TL;DR - Sidecar pattern
The sidecar design pattern works by deploying a secondary "sidecar" container alongside a primary application container within the same execution environment, such as a Kubernetes pod or a shared network namespace. The core logic relies on separation of concerns: the application container remains focused exclusively on its business logic while the sidecar handles cross-cutting tasks like distributed tracing, logging, or proxying traffic. A fundamental prerequisite for this pattern is the use of container images; both the application and the sidecar must be packaged as independent images so they can be "plugged" together. This modularity enables the sidecar to intercept requests (such as LLM API calls) and add OpenTelemetry instrumentation without requiring any code changes to the primary application image.
Evolving Observability: Moving to a Sidecar Pattern
In one of my previous explorations, I demonstrated a standalone implementation of OpenLLMetry. We saw how straightforward it is to integrate into a Python application, requiring just a few lines of code to unlock deep visibility into LLM calls. While powerful, that approach requires modifying the core application code.
Basically doing this (excerpt from the Traceloop documentation):
#######################
pip install traceloop-sdk
#######################
#...
import os
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
Traceloop.init(app_name="joke_generation_service")
@workflow(name="joke_creation")
def create_joke():
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
)
return completion.choices[0].message.content
The Vision: Pluggable Observability
This time, I wanted to push the architecture further by decoupling the monitoring logic from the business logic. The goal was to implement OpenLLMetry using a "sidecar" design pattern. This approach makes the observability layer almost entirely pluggable, allowing it to be attached to virtually any application container without cluttering the primary codebase.
Building with Bob
To bring this modular architecture to life, I teamed up with my new AI partner, IBM Bob. I tasked Bob with building a robust application that could serve as the primary service, while I focused on engineering the sidecar to capture traces, monitor performance, and manage the OpenTelemetry export pipeline.
The Logic and Implementation
The project demonstrates a sidecar design pattern for implementing LLM observability with OpenLLMetry, making it possible to add distributed tracing to applications without modifying any core code. By deploying an independent TraceLoop sidecar proxy alongside a primary service (such as the one built by IBM Bob), all HTTP traffic to the LLM engine (e.g., Ollama) is intercepted and instrumented with OpenTelemetry spans. This architecture ensures a clean separation of concerns: the application remains focused on business logic while the sidecar captures high-fidelity metadata, including prompts, responses, and token usage, before forwarding traces to a collector and a visualization tool like Jaeger.
The main idea of this logic is the separation of concerns!
Application Code:             Observability Infrastructure:
┌────────────────┐            ┌────────────────────┐
│ Business       │            │ TraceLoop          │
│ Logic Only     │───────────▶│ Sidecar            │
│                │            │ (Tracing Proxy)    │
└────────────────┘            └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │  OTel Collector    │
                              └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │ Jaeger (Storage)   │
                              └────────────────────┘
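In practice, the only wiring this requires is pointing the application's OLLAMA_HOST at the sidecar instead of at Ollama itself; the sidecar then forwards each call to the real engine. Here is a minimal sketch of that contract from the application's point of view (the host names and ports below are illustrative, not the project's actual values):
import os
import requests

# The app only ever knows one endpoint. Whether that endpoint is Ollama itself
# (http://ollama:11434) or the TraceLoop sidecar sitting in front of it
# (e.g. http://traceloop-sidecar:11434) is purely a deployment decision.
LLM_ENDPOINT = os.getenv("OLLAMA_HOST", "http://ollama:11434")

def ask(prompt: str) -> str:
    # Identical call in both cases - the business logic never changes.
    resp = requests.post(
        f"{LLM_ENDPOINT}/api/chat",
        json={
            "model": "granite3:latest",
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=300,
    )
    return resp.json()["message"]["content"]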
Let's jump into a practical example: the main sample application, a very basic chat application using Ollama and Granite.
#!/usr/bin/env python3
"""
Simple Ollama Application - No Tracing Code
This application uses Ollama for LLM inference without any built-in tracing.
Tracing will be handled by the TraceLoop sidecar.
"""
import os
import sys
import time
import logging
from datetime import datetime
from pathlib import Path
import ollama
from flask import Flask, request, jsonify
app = Flask(__name__)
# Configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
# Setup logging directory
LOG_DIR = Path("./logs")
LOG_DIR.mkdir(exist_ok=True)
# Create timestamped log file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = LOG_DIR / f"ollama_app_{timestamp}.log"
# Configure logging
logging.basicConfig(
level=getattr(logging, LOG_LEVEL.upper()),
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
logging.StreamHandler(sys.stdout)
]
)
logger = logging.getLogger(__name__)
# Log startup information
logger.info("=" * 60)
logger.info("Simple Ollama Application Starting")
logger.info("=" * 60)
logger.info(f"Ollama Host: {OLLAMA_HOST}")
logger.info(f"Model: {OLLAMA_MODEL}")
logger.info(f"Log Level: {LOG_LEVEL}")
logger.info(f"Log File: {log_file}")
logger.info("=" * 60)
@app.route('/health', methods=['GET'])
def health():
"""Health check endpoint"""
logger.debug("Health check requested")
return jsonify({
"status": "healthy",
"model": OLLAMA_MODEL,
"ollama_host": OLLAMA_HOST,
"log_file": str(log_file)
}), 200
@app.route('/chat', methods=['POST'])
def chat():
"""
Chat endpoint - accepts a prompt and returns a response
Request body:
{
"prompt": "Your question here",
"model": "optional-model-override"
}
"""
request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
logger.info(f"[{request_id}] Chat request received")
try:
data = request.get_json()
if not data or 'prompt' not in data:
logger.warning(f"[{request_id}] Missing 'prompt' in request body")
return jsonify({"error": "Missing 'prompt' in request body"}), 400
prompt = data['prompt']
model = data.get('model', OLLAMA_MODEL)
logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
logger.info(f"[{request_id}] Model: {model}")
logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}")
# Create Ollama client
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.debug(f"[{request_id}] Ollama client created successfully")
except Exception as e:
logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
raise
# Send the prompt
start_time = time.time()
logger.info(f"[{request_id}] Sending request to Ollama...")
try:
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
logger.debug(f"[{request_id}] Received response from Ollama")
except Exception as e:
logger.error(f"[{request_id}] Ollama request failed: {e}")
raise
response_text = response['message']['content']
duration = time.time() - start_time
logger.info(f"[{request_id}] Response received in {duration:.2f}s")
logger.info(f"[{request_id}] Response length: {len(response_text)} characters")
logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...")
return jsonify({
"prompt": prompt,
"response": response_text,
"model": model,
"duration_seconds": duration,
"request_id": request_id
}), 200
except Exception as e:
logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True)
return jsonify({
"error": str(e),
"request_id": request_id
}), 500
@app.route('/batch', methods=['POST'])
def batch_chat():
"""
Batch chat endpoint - accepts multiple prompts
Request body:
{
"prompts": ["Question 1", "Question 2", ...],
"model": "optional-model-override"
}
"""
request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
logger.info(f"[{request_id}] Batch request received")
try:
data = request.get_json()
if not data or 'prompts' not in data:
logger.warning(f"[{request_id}] Missing 'prompts' in request body")
return jsonify({"error": "Missing 'prompts' in request body"}), 400
prompts = data['prompts']
model = data.get('model', OLLAMA_MODEL)
if not isinstance(prompts, list):
logger.warning(f"[{request_id}] 'prompts' is not a list")
return jsonify({"error": "'prompts' must be a list"}), 400
logger.info(f"[{request_id}] Processing {len(prompts)} prompts")
logger.info(f"[{request_id}] Model: {model}")
# Create Ollama client
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.debug(f"[{request_id}] Ollama client created successfully")
except Exception as e:
logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
raise
results = []
total_start = time.time()
for i, prompt in enumerate(prompts, 1):
logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...")
try:
start_time = time.time()
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
response_text = response['message']['content']
duration = time.time() - start_time
results.append({
"prompt": prompt,
"response": response_text,
"duration_seconds": duration
})
logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s")
except Exception as e:
logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}")
results.append({
"prompt": prompt,
"error": str(e),
"duration_seconds": 0
})
total_duration = time.time() - total_start
logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s")
return jsonify({
"results": results,
"model": model,
"total_duration_seconds": total_duration,
"count": len(results),
"request_id": request_id
}), 200
except Exception as e:
logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True)
return jsonify({
"error": str(e),
"request_id": request_id
}), 500
def run_sample_queries():
"""Run some sample queries on startup"""
logger.info("=" * 60)
logger.info("Running sample queries...")
logger.info("=" * 60)
sample_prompts = [
"What is OpenTelemetry?",
"Explain distributed tracing in one sentence.",
"What are the benefits of observability?",
]
try:
client = ollama.Client(host=OLLAMA_HOST)
logger.info(f"Connected to Ollama at {OLLAMA_HOST}")
except Exception as e:
logger.error(f"Failed to connect to Ollama: {e}")
return
for i, prompt in enumerate(sample_prompts, 1):
logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}")
try:
start_time = time.time()
response = client.chat(
model=OLLAMA_MODEL,
messages=[{'role': 'user', 'content': prompt}],
)
duration = time.time() - start_time
response_text = response['message']['content']
logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s")
logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...")
except Exception as e:
logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True)
time.sleep(1)
logger.info("=" * 60)
logger.info("Sample queries completed!")
logger.info("=" * 60)
if __name__ == "__main__":
# Run sample queries if in standalone mode
if os.getenv("RUN_SAMPLES", "true").lower() == "true":
try:
run_sample_queries()
except Exception as e:
logger.error(f"Sample queries failed: {e}", exc_info=True)
# Start Flask server
port = int(os.getenv("PORT", "8080"))
logger.info(f"Starting Flask server on port {port}...")
logger.info(f"Logs are being written to: {log_file}")
app.run(host='0.0.0.0', port=port, debug=False)
# Made with Bob
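With the container running (the app listens on port 8080 by default), both endpoints can be exercised with a quick smoke test; a small sketch, assuming the port is published on localhost:
import requests

BASE_URL = "http://localhost:8080"  # adjust to wherever the app's port is published

# Single-prompt endpoint
reply = requests.post(f"{BASE_URL}/chat",
                      json={"prompt": "What is a sidecar container?"},
                      timeout=300).json()
print(reply["request_id"], f"{reply['duration_seconds']:.2f}s")
print(reply["response"][:200])

# Batch endpoint
batch = requests.post(f"{BASE_URL}/batch",
                      json={"prompts": ["What is OpenTelemetry?", "What is Jaeger?"]},
                      timeout=600).json()
for item in batch["results"]:
    print(f"{item['prompt']} -> {item.get('duration_seconds', 0):.2f}s")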
Now, we build a "proxy" application, which will be the sidecar implementation.
#!/usr/bin/env python3
"""
TraceLoop Sidecar - OpenLLMetry Tracing Proxy
This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls.
It acts as a transparent proxy that adds LLM-specific tracing.
"""
import os
import json
import requests
from flask import Flask, request, Response
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from opentelemetry import trace
app = Flask(__name__)
# Configuration
OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434")
OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar")
TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app")
print("=" * 70)
print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy")
print("=" * 70)
print(f"Upstream Ollama: {OLLAMA_UPSTREAM}")
print(f"OTEL Endpoint: {OTEL_ENDPOINT}")
print(f"Service Name: {SERVICE_NAME}")
print(f"Traced Service: {TRACED_SERVICE_NAME}")
print("=" * 70)
def init_tracing():
"""Initialize OpenLLMetry (Traceloop SDK)"""
Traceloop.init(
app_name=TRACED_SERVICE_NAME, # Use the application name, not sidecar name
disable_batch=False,
exporter_otlp_endpoint=OTEL_ENDPOINT,
# Enable LLM-specific instrumentation
should_enrich_metrics=True,
)
print("β OpenLLMetry (Traceloop SDK) initialized successfully")
# Initialize tracing on startup
init_tracing()
tracer = trace.get_tracer(__name__)
@app.route('/health', methods=['GET'])
def health():
"""Health check endpoint"""
return {"status": "healthy", "service": SERVICE_NAME}, 200
@task(name="ollama_api_call")
def proxy_ollama_request(method, path, headers, data, query_string):
"""
Proxy request to Ollama with OpenLLMetry tracing.
The @task decorator automatically creates spans and adds LLM attributes.
"""
# Build upstream URL
upstream_url = f"{OLLAMA_UPSTREAM}/{path}"
if query_string:
upstream_url += f"?{query_string.decode()}"
# Parse request data for logging
request_data = None
if data:
try:
request_data = json.loads(data)
except:
request_data = data.decode('utf-8', errors='ignore')
# Get current span to add custom attributes
current_span = trace.get_current_span()
# Add custom attributes
current_span.set_attribute("http.method", method)
current_span.set_attribute("http.url", upstream_url)
current_span.set_attribute("http.target", f"/{path}")
current_span.set_attribute("llm.system", "ollama")
# Extract and add LLM-specific attributes
if request_data and isinstance(request_data, dict):
if "model" in request_data:
current_span.set_attribute("llm.model", request_data["model"])
# For chat API
if "messages" in request_data:
messages = request_data["messages"]
if messages and len(messages) > 0:
last_message = messages[-1]
if "content" in last_message:
prompt = last_message["content"]
current_span.set_attribute("llm.prompts", prompt[:1000])
current_span.set_attribute("llm.request.type", "chat")
# For generate API
if "prompt" in request_data:
current_span.set_attribute("llm.prompts", request_data["prompt"][:1000])
current_span.set_attribute("llm.request.type", "completion")
# Log the request
print(f"\n[PROXY] {method} /{path}")
if request_data and isinstance(request_data, dict):
if "model" in request_data:
print(f"[PROXY] Model: {request_data['model']}")
if "messages" in request_data and request_data["messages"]:
print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...")
elif "prompt" in request_data:
print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...")
try:
# Forward request to upstream Ollama
upstream_response = requests.request(
method=method,
url=upstream_url,
headers={k: v for k, v in headers if k.lower() != 'host'},
data=data,
allow_redirects=False,
timeout=300 # 5 minutes for model operations
)
# Parse response
response_data = None
try:
response_data = upstream_response.json()
except:
response_data = upstream_response.text
# Add response attributes
current_span.set_attribute("http.status_code", upstream_response.status_code)
# Extract response content
if response_data and isinstance(response_data, dict):
# For chat API
if "message" in response_data:
message = response_data["message"]
if "content" in message:
response_text = message["content"]
current_span.set_attribute("llm.responses", response_text[:1000])
current_span.set_attribute("llm.response_length", len(response_text))
print(f"[PROXY] Response: {response_text[:100]}...")
# For generate API
elif "response" in response_data:
response_text = response_data["response"]
current_span.set_attribute("llm.responses", response_text[:1000])
current_span.set_attribute("llm.response_length", len(response_text))
print(f"[PROXY] Response: {response_text[:100]}...")
print(f"[PROXY] Status: {upstream_response.status_code}")
# Return response
return Response(
upstream_response.content,
status=upstream_response.status_code,
headers=dict(upstream_response.headers)
)
except Exception as e:
current_span.set_attribute("error", True)
current_span.set_attribute("error.message", str(e))
current_span.record_exception(e)
print(f"[PROXY ERROR] {str(e)}")
raise
@app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@workflow(name="ollama_proxy")  # innermost, so the workflow span actually wraps each proxied request
def proxy(path):
"""
Main proxy endpoint. The @workflow decorator creates a parent span for the entire request.
"""
try:
return proxy_ollama_request(
method=request.method,
path=path,
headers=request.headers,
data=request.data,
query_string=request.query_string
)
except Exception as e:
return {"error": str(e)}, 500
if __name__ == "__main__":
port = int(os.getenv("PORT", "11434"))
print(f"\nStarting TraceLoop Sidecar on port {port}...")
print(f"Proxying to: {OLLAMA_UPSTREAM}")
print(f"Using OpenLLMetry for automatic LLM tracing")
print("=" * 70 + "\n")
app.run(host='0.0.0.0', port=port, debug=False)
# Made with Bob
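Because the sidecar speaks Ollama's own HTTP API, you can verify that it really is transparent by pointing a plain Ollama client at the sidecar's port and making the same call the application makes; a hedged sketch, assuming the proxy is published on localhost:11434 and forwards to a running Ollama instance:
import ollama

# Talk to the sidecar exactly as if it were Ollama itself.
client = ollama.Client(host="http://localhost:11434")  # the sidecar's port, not Ollama's
reply = client.chat(
    model="granite3:latest",
    messages=[{"role": "user", "content": "Say hello through the tracing proxy."}],
)
print(reply["message"]["content"])
# If the wiring is correct, an "ollama_proxy" workflow span with llm.* attributes
# should show up in Jaeger shortly afterwards.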
Architecting for Flexibility: Project Structure and Multi-Platform Support
The project is engineered with a modular structure to support various deployment strategies, ensuring that observability remains "pluggable" regardless of the environment. Because my personal development workflow relies on Podman and Minikube, I collaborated with Bob to design several implementation types.
Bob helped architect a structure that separates the pure business logic of the application from the tracing infrastructure. This resulted in a comprehensive setup where OpenLLMetry operates as a transparent proxy, intercepting traffic between the application and the LLM engine. Whether deploying via Docker Compose for quick local testing or using Kubernetes (Minikube) for a production-grade simulation, the sidecar pattern remains consistent.
Project Overview
- ollama-simple-app/: The core application built by Bob, containing pure business logic with zero tracing code or OpenTelemetry dependencies.
- traceloop-sidecar/: The independent tracing proxy that provides the "sidecar" functionality.
- k8s/ & docker-compose/: Deployment manifests specifically tailored for different container engines, including specialized support for Podman users.
- Utility Scripts: A suite of automated tools (like deploy-podman.sh) to streamline building, loading, and deploying images across these diverse environments.
By leveraging independent container images for both the application and the sidecar, the tracing layer can be updated, scaled, or swapped out without ever needing to modify the "original" application code.
.
├── ollama-simple-app/              # Application WITHOUT tracing code
│   ├── app.py                      # Pure Flask app using Ollama
│   ├── requirements.txt            # No OpenTelemetry dependencies!
│   └── Dockerfile
├── traceloop-sidecar/              # Independent tracing sidecar
│   ├── proxy.py                    # Transparent tracing proxy
│   ├── requirements.txt            # OpenTelemetry dependencies here
│   └── Dockerfile
├── ollama-app/                     # (Optional) App with built-in tracing
│   └── ...                         # For comparison purposes
├── collector/                      # OpenTelemetry Collector
│   ├── otel-collector-config.yaml
│   └── Dockerfile
├── k8s/                            # Kubernetes manifests
│   ├── 00-namespace.yaml
│   ├── 01-otel-collector.yaml
│   ├── 02-ollama.yaml
│   ├── 04-jaeger.yaml
│   └── 05-ollama-simple-app.yaml   # Sidecar deployment
├── docker-compose/
│   └── docker-compose.yaml
├── start-all.sh                    # Utility: Start all services
├── stop-all.sh                     # Utility: Stop all services
├── push-to-github.sh               # Utility: Push to GitHub
└── README.md
The application can be deployed and tested using Docker/Podman or Minikube, and it is adaptable to other Kubernetes flavors! Several scripts are provided to start and stop the applications and to trace logs in case of errors. Thorough Podman documentation was also generated by Bob at my request.
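Once a deployment is up, a quick way to confirm the whole pipeline (sidecar, collector, Jaeger) is to ask Jaeger whether it has stored spans for the traced service. A small sketch using Jaeger's HTTP query API, assuming port 16686 is forwarded to localhost and the service name is the sidecar's default, ollama-app:
import requests

JAEGER_URL = "http://localhost:16686"  # e.g. via `kubectl port-forward` or a published port
SERVICE = "ollama-app"                 # matches the sidecar's TRACED_SERVICE_NAME default

resp = requests.get(f"{JAEGER_URL}/api/traces",
                    params={"service": SERVICE, "limit": 5},
                    timeout=10)
traces = resp.json().get("data", [])
print(f"Found {len(traces)} recent trace(s) for service '{SERVICE}'")
for t in traces:
    print(f"  trace {t['traceID']} with {len(t.get('spans', []))} span(s)")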
Last but not least, as a bonus, a sample Python application that uses OpenLLMetry directly is provided as well 👨‍💻, with the objective of weighing the pros and cons of each approach!
#!/usr/bin/env python3
"""
Sample Ollama Application with OpenLLMetry Tracing
This application demonstrates how to use Ollama with OpenTelemetry tracing
"""
import os
import time
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from traceloop.sdk import Traceloop
import ollama
# Initialize OpenTelemetry with Traceloop
def init_tracing():
"""Initialize OpenTelemetry tracing with OTLP exporter"""
# Get configuration from environment variables
otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
service_name = os.getenv("OTEL_SERVICE_NAME", "ollama-app")
print(f"Initializing tracing with endpoint: {otlp_endpoint}")
print(f"Service name: {service_name}")
# Initialize Traceloop SDK
Traceloop.init(
app_name=service_name,
disable_batch=False,
exporter_otlp_endpoint=otlp_endpoint
)
print("Tracing initialized successfully")
def chat_with_ollama(model: str, prompt: str) -> str:
"""
Send a prompt to Ollama and get a response
Args:
model: The model to use (e.g., 'granite3:latest')
prompt: The prompt to send to the model
Returns:
The model's response
"""
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("ollama_chat") as span:
span.set_attribute("llm.model", model)
span.set_attribute("llm.prompt", prompt)
try:
# Get Ollama host from environment
ollama_host = os.getenv("OLLAMA_HOST", "http://ollama:11434")
print(f"\nSending prompt to Ollama ({model})...")
print(f"Prompt: {prompt}")
# Create Ollama client
client = ollama.Client(host=ollama_host)
# Send the prompt
response = client.chat(
model=model,
messages=[
{
'role': 'user',
'content': prompt,
},
],
)
response_text = response['message']['content']
span.set_attribute("llm.response", response_text)
span.set_attribute("llm.response_length", len(response_text))
print(f"Response: {response_text}\n")
return response_text
except Exception as e:
span.set_attribute("error", True)
span.set_attribute("error.message", str(e))
print(f"Error: {e}")
raise
def main():
"""Main application loop"""
print("=" * 60)
print("Ollama Application with OpenLLMetry Tracing")
print("=" * 60)
# Initialize tracing
init_tracing()
# Get model from environment
model = os.getenv("OLLAMA_MODEL", "granite3:latest")
# Sample prompts to demonstrate tracing
prompts = [
"What is OpenTelemetry?",
"Explain distributed tracing in one sentence.",
"What are the benefits of observability?",
]
print(f"\nUsing model: {model}")
print(f"Running {len(prompts)} sample queries...\n")
# Run sample queries
for i, prompt in enumerate(prompts, 1):
print(f"Query {i}/{len(prompts)}")
try:
chat_with_ollama(model, prompt)
time.sleep(2) # Small delay between requests
except Exception as e:
print(f"Failed to process query: {e}")
print("\n" + "=" * 60)
print("All queries completed. Check your tracing backend for traces!")
print("=" * 60)
# Keep the application running to allow traces to be exported
print("\nKeeping application alive for trace export...")
time.sleep(10)
if __name__ == "__main__":
main()
# Made with Bob
Comparison: Sidecar vs Built-in Tracing
| Aspect           | Sidecar (This Project) | Built-in Tracing      |
| ---------------- | ---------------------- | --------------------- |
| Code changes     | ✅ None                | ❌ Required           |
| Dependencies     | ✅ None in app         | ❌ OpenTelemetry libs |
| Language support | ✅ Any                 | ⚠️ Language-specific  |
| Maintenance      | ✅ Centralized         | ⚠️ Per application    |
| Performance      | ⚠️ Extra hop           | ✅ Direct             |
| Flexibility      | ⚠️ HTTP only           | ✅ Any protocol       |
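The "extra hop" in the Performance row is easy to quantify for your own setup: time the same request against Ollama directly and against the sidecar, then compare. A rough sketch, assuming both are reachable locally on the hypothetical port mappings below (run it a few times, since model latency varies far more than the proxy overhead):
import time
import requests

PAYLOAD = {
    "model": "granite3:latest",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
    "stream": False,
}

def timed_chat(base_url: str) -> float:
    start = time.time()
    requests.post(f"{base_url}/api/chat", json=PAYLOAD, timeout=300)
    return time.time() - start

direct = timed_chat("http://localhost:11434")   # Ollama published directly (hypothetical mapping)
proxied = timed_chat("http://localhost:11435")  # the TraceLoop sidecar (hypothetical mapping)
print(f"direct: {direct:.2f}s  via sidecar: {proxied:.2f}s  overhead: {proxied - direct:+.2f}s")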
Final Thoughts: Seamless Deployment Across the Ecosystem
To wrap up, this implementation demonstrates that advanced LLM observability doesn't require complex code changes. By leveraging the sidecar pattern, the application is ready for immediate deployment and testing across a wide range of environments. Whether you are using Docker or Podman for local development, or orchestrating via Minikube, this setup is designed to be effortlessly adaptable to any Kubernetes flavor. Thanks to Bob's help in structuring the project, you can now plug high-fidelity tracing into your AI workflows with a single command, regardless of your infrastructure.
Thanks for reading 💻
Links
- GitHub Code Repository: https://github.com/aairom/OpenLLMetry-SideCar
- Traceloop OpenLLMetry: https://www.traceloop.com/docs/openllmetry/introduction
- IBM Project Bob: https://www.ibm.com/products/bob
- Sidecar Pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/sidecar
- Sidecar containers: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/