
Alain Airom (Ayrom)


Decouple Your Telemetry: Implementing OpenTelemetry Traces via Declarative Config

Step-by-Step: Implementing OpenTelemetry Declarative Configuration with Traces

Introduction

It was quite some time ago that I first stumbled upon the article “How to Write Your First OpenTelemetry Declarative Config File with Trace” (by Nawaz Dhandala). As is so often the case with compelling technical readouts, it immediately went onto my “to-read” stack. Recently, I finally dug into it — with a little help from Bob, naturally.

Historically, keeping all three observability signals (traces, metrics, and logs) configured in harmony has felt like a high-wire juggling act involving a messy mix of environment variables, programmatic SDK initialization boilerplate, and complex collector configurations. The OpenTelemetry declarative configuration format completely changes the game. By allowing you to define trace, metric, and log pipelines within a single YAML file that the SDK parses at startup, it drastically streamlines how we approach system visibility.

Excerpt from the original article:
Getting all three observability signals (traces, metrics, and logs) configured in a single place has historically been a juggling act of environment variables, programmatic SDK initialization code, and collector configs. The OpenTelemetry declarative configuration format changes that by letting you define trace, metric, and log pipelines in one YAML file that the SDK reads at startup.

Intrigued by the article’s premise, I tasked Bob with taking the concept from theory to reality by building a complete, end-to-end working application. What follows is a practical breakdown of that implementation, demonstrating exactly how this declarative approach functions under the hood.


Implementation: System Architecture & Component Mapping
To showcase OpenTelemetry’s code-free initialization, Bob and I designed a lightweight microservice (order-service) that ships its telemetry signals without relying on complex SDK setup code inside the application logic.

Here is how the components interact in our target architecture:

+--------------------------------------------------------------+
|                        Local Machine                         |
|                                                              |
|   +-----------------------+      +-----------------------+   |
|   |   order-service       |      |   otel-collector      |   |
|   |   (Python/Flask)      |      |   (OTel Collector)    |   |
|   |                       |      |                       |   |
|   |   +---------------+   |      |   +---------------+   |   |
|   |   | otel-config   |   |      |   | otel-collector|   |   |
|   |   | .yaml         |   |      |   | -config.yaml  |   |   |
|   |   +-------+-------+   |      |   +-------+-------+   |   |
|   |           |           |      |           |           |   |
|   |           v           |      |           |           |   |
|   |   [OTel SDK Engine]   |      |           |           |   |
|   |           |           |      |           |           |   |
|   +-----------|-----------+      +-----------|-----------+   |
|               | (OTLP/gRPC)                  |               |
|               v                              |               |
|       +-------+-------+                      |               |
|       | otel-collector| <--------------------+               |
|       | :4317         |                                      |
|       +-------+-------+                                      |
|               |                                              |
|               +---------------+---------------+              |
|               | (OTLP)        | (OTLP)        | (Prometheus) |
|               v               v               v              |
|         +-----------+   +-----------+   +-----------+        |
|         |  jaeger   |   |  loki     |   |prometheus |        |
|         |  :4317    |   |  :3100    |   |  :9090    |        |
|         +-----------+   +-----------+   +-----+-----+        |
|                                               |              |
|                                               v              |
|                                         +-----------+        |
|                                         |  grafana  |        |
|                                         |  :3000    |        |
|                                         +-----------+        |
+--------------------------------------------------------------+

In practice, the project is structured as follows:


otel-declarative/
├── app/
│   └── main.py                      # Flask application with OTel instrumentation
├── config/
│   ├── otel-config.yaml             # OpenTelemetry declarative configuration
│   ├── otel-collector-config.yaml   # Collector configuration
│   ├── prometheus.yml               # Prometheus configuration
│   └── grafana-datasources.yml      # Grafana datasources
├── dashboard/
│   ├── index.html                   # Dashboard HTML
│   ├── styles.css                   # Dashboard styles
│   └── dashboard.js                 # Dashboard JavaScript
├── scripts/
│   ├── setup.sh                     # Setup script
│   ├── start-backend.sh             # Start backend services
│   ├── run-app.sh                   # Run application
│   ├── stop-app.sh                  # Stop application
│   ├── stop-backend.sh              # Stop backend services
│   └── open-dashboard.sh            # Open dashboard
├── Docs/
│   ├── Architecture.md              # Detailed architecture documentation
│   ├── Configuration.md             # Configuration reference guide
│   ├── QuickStart.md                # Quick start guide
│   ├── Podman-Support.md            # Podman setup and support
│   ├── Verification-Guide.md        # Implementation verification guide
│   ├── Jaeger-Guide.md              # Complete Jaeger tracing guide
│   ├── Prometheus-Guide.md          # Comprehensive Prometheus metrics guide
│   └── Grafana-Guide.md             # Complete Grafana visualization guide
├── Blog/
│   ├── building-observability-stack-with-otel.md  # Comprehensive blog post
│   ├── README.md                    # Blog post overview
│   └── images/                      # Supporting images
├── output/
│   └── 2026-05-31_OTEL_DEMO_COMPLETION_SUMMARY.md  # Project summary
├── docker-compose.yml               # Docker Compose configuration
├── requirements.txt                 # Python dependencies
├── .gitignore                       # Git ignore rules
└── README.md                        # This file

Component Breakdown

  • order-service: A Python-based Flask mock server exposing endpoints for users and orders, plus routes that simulate errors and slow responses.
  • OpenTelemetry SDK (In-App): Reads the single otel-config.yaml file upon startup to define and handle telemetry pipelines dynamically.
  • OpenTelemetry Collector: Receives the standardized OTLP signals over gRPC on port 4317, sorting and routing them to the specialized backends.
  • Backend Observability Engines: Jaeger captures distributed trace contexts, Loki aggregates system and app logs, and Prometheus scrapes metric counters, all unified visually under Grafana dashboards.

Setting Up the Declarative Configuration (otel-config.yaml)

This is the file that replaces lines of boilerplate setup logic. By feeding this schema straight to the OpenTelemetry SDK environment, we establish a clean separation of concerns.

# OpenTelemetry Declarative Configuration
# File format version 0.3
# This configuration file defines the complete OpenTelemetry setup for the application

file_format: "0.3"

# Disabled flag - set to true to disable the SDK entirely
disabled: false

# Resource attributes shared by all signals (traces, metrics, logs)
# These attributes identify the service and its environment
resource:
  attributes:
    - name: service.name
      value: "order-service"
    - name: service.version
      value: "1.0.0"
    - name: deployment.environment
      value: "production"
    - name: service.namespace
      value: "ecommerce"
    - name: service.instance.id
      value: "${HOSTNAME}"
    - name: host.name
      value: "${HOSTNAME}"

# Attribute limits to protect against runaway instrumentation
attribute_limits:
  attribute_value_length_limit: 4096
  attribute_count_limit: 128

# Trace pipeline configuration
tracer_provider:
  # Span processors define how spans are processed before export
  processors:
    # Batch processor groups spans before export for efficiency
    - batch:
        schedule_delay: 5000        # milliseconds between exports
        max_queue_size: 2048        # max spans queued in memory
        max_export_batch_size: 512  # max spans per export call
        export_timeout: 30000       # export timeout in ms

        # OTLP exporter sends spans to the OpenTelemetry Collector
        exporter:
          otlp:
            protocol: grpc
            endpoint: "http://otel-collector:4317"
            timeout: 10000           # export timeout in ms
            compression: gzip
            headers: {}

  # Sampler controls what percentage of traces are recorded
  # This helps manage data volume in high-traffic scenarios
  sampler:
    parent_based:
      root:
        trace_id_ratio_based:
          ratio: 0.25  # sample 25% of root spans

  # Limits to protect against runaway instrumentation
  limits:
    attribute_count_limit: 128
    attribute_value_length_limit: 4096
    event_count_limit: 128
    link_count_limit: 128
    event_attribute_count_limit: 128
    link_attribute_count_limit: 128

# Metric pipeline configuration
meter_provider:
  # Metric readers define how metrics are collected and exported
  readers:
    # Periodic reader exports metrics at regular intervals
    - periodic:
        interval: 30000  # export every 30 seconds
        timeout: 5000    # export timeout in ms

        # OTLP exporter sends metrics to the OpenTelemetry Collector
        exporter:
          otlp:
            protocol: grpc
            endpoint: "http://otel-collector:4317"
            timeout: 10000
            compression: gzip
            headers: {}
            temporality_preference: cumulative

  # Views allow customization of metric aggregation
  views:
    - selector:
        instrument_name: "*"
        instrument_type: histogram
      stream:
        aggregation:
          explicit_bucket_histogram:
            boundaries: [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]

# Logger pipeline configuration
logger_provider:
  # Log record processors define how logs are processed before export
  processors:
    # Batch processor groups log records before export
    - batch:
        schedule_delay: 5000
        max_queue_size: 2048
        max_export_batch_size: 512
        export_timeout: 30000

        # OTLP exporter sends logs to the OpenTelemetry Collector
        exporter:
          otlp:
            protocol: grpc
            endpoint: "http://otel-collector:4317"
            timeout: 10000
            compression: gzip
            headers: {}

  # Limits for log records
  limits:
    attribute_count_limit: 128
    attribute_value_length_limit: 4096

# Context propagation configuration
# Defines how trace context is propagated across service boundaries
propagator:
  composite:
    - tracecontext: {}    # W3C Trace Context
    - baggage: {}         # W3C Baggage
    - b3: {}              # Zipkin B3 (for compatibility)

# Made with Bob

Key Takeaway: Notice that we aren’t writing any explicit backend exporters for Prometheus or Jaeger here. The application only knows how to talk to the OTel Collector over a single OTLP/gRPC endpoint (http://otel-collector:4317 in this file; use http://localhost:4317 if the app runs directly on the host, outside the compose network).
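To make the sampler block in the config a little more concrete, here is a minimal sketch of how a parent-based, ratio-based decision can work. It is a simplified illustration, not the SDK's code: the function names ratio_decision and should_sample are hypothetical, and comparing the low 64 bits of the trace ID against a threshold is an assumption modeled on how ratio samplers are commonly implemented.

```python
import random
from typing import Optional

RATIO = 0.25  # mirrors `ratio: 0.25` in otel-config.yaml


def ratio_decision(trace_id: int, ratio: float = RATIO) -> bool:
    """Deterministically keep roughly `ratio` of all trace IDs."""
    bound = round(ratio * (1 << 64))
    # Only the low 64 bits of the 128-bit trace ID are compared (assumption).
    return (trace_id & ((1 << 64) - 1)) < bound


def should_sample(trace_id: int, parent_sampled: Optional[bool]) -> bool:
    """parent_based: follow the parent's decision, else ask the root sampler."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_decision(trace_id)


if __name__ == "__main__":
    rng = random.Random(42)
    trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
    kept = sum(should_sample(t, None) for t in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} root traces")
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict, and child spans simply inherit it from their parent.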

Core Application Implementation (main.py)

With our configuration defined in YAML, the application script stays free of SDK initialization code. We simply write idiomatic Python with Flask, relying on the Flask instrumentation and the OpenTelemetry API (trace.get_tracer, metrics.get_meter) for custom spans and measurements.

"""
OpenTelemetry Declarative Config Demo Application
A Flask-based REST API demonstrating OpenTelemetry instrumentation
using declarative configuration.
"""

import os
import time
import random
from flask import Flask, jsonify, request
from flask_cors import CORS
from opentelemetry import trace, metrics
from opentelemetry.instrumentation.flask import FlaskInstrumentor

# Initialize Flask application
app = Flask(__name__)

# Enable CORS for all routes
CORS(app, resources={r"/*": {"origins": "*"}})

# Get tracer and meter for manual instrumentation
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Create custom metrics
request_counter = meter.create_counter(
    name="http_requests_total",
    description="Total number of HTTP requests",
    unit="1"
)

request_duration = meter.create_histogram(
    name="http_request_duration_seconds",
    description="HTTP request duration in seconds",
    unit="s"
)

order_counter = meter.create_counter(
    name="orders_created_total",
    description="Total number of orders created",
    unit="1"
)

error_counter = meter.create_counter(
    name="errors_total",
    description="Total number of errors",
    unit="1"
)

# Simulate in-memory data store
users_db = [
    {"id": 1, "name": "Alice Johnson", "email": "alice@example.com"},
    {"id": 2, "name": "Bob Smith", "email": "bob@example.com"},
    {"id": 3, "name": "Charlie Brown", "email": "charlie@example.com"}
]

orders_db = []
order_id_counter = 1


@app.before_request
def before_request():
    """Record request start time for duration calculation"""
    request.start_time = time.time()


@app.after_request
def after_request(response):
    """Record metrics after each request"""
    if hasattr(request, 'start_time'):
        duration = time.time() - request.start_time

        # Record request counter
        request_counter.add(
            1,
            {
                "method": request.method,
                "endpoint": request.endpoint or "unknown",
                "status": response.status_code
            }
        )

        # Record request duration
        request_duration.record(
            duration,
            {
                "method": request.method,
                "endpoint": request.endpoint or "unknown",
                "status": response.status_code
            }
        )

    return response


@app.route('/')
def home():
    """Home endpoint with basic service information"""
    with tracer.start_as_current_span("home_handler") as span:
        span.set_attribute("http.route", "/")

        info = {
            "service": "order-service",
            "version": "1.0.0",
            "status": "healthy",
            "endpoints": [
                "/",
                "/api/users",
                "/api/users/<id>",
                "/api/orders",
                "/api/orders/<id>",
                "/api/error",
                "/api/slow",
                "/metrics"
            ]
        }

        return jsonify(info)


@app.route('/api/users', methods=['GET'])
def get_users():
    """Get all users"""
    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("http.route", "/api/users")
        span.set_attribute("user.count", len(users_db))

        # Simulate database query delay
        time.sleep(random.uniform(0.01, 0.05))

        return jsonify({"users": users_db, "count": len(users_db)})


@app.route('/api/users/<int:user_id>', methods=['GET'])
def get_user(user_id):
    """Get a specific user by ID"""
    with tracer.start_as_current_span("get_user") as span:
        span.set_attribute("http.route", "/api/users/<id>")
        span.set_attribute("user.id", user_id)

        # Simulate database query delay
        time.sleep(random.uniform(0.01, 0.03))

        user = next((u for u in users_db if u["id"] == user_id), None)

        if user:
            span.set_attribute("user.found", True)
            return jsonify(user)
        else:
            span.set_attribute("user.found", False)
            span.add_event("User not found", {"user.id": user_id})
            return jsonify({"error": "User not found"}), 404


@app.route('/api/orders', methods=['GET', 'POST'])
def orders():
    """Get all orders or create a new order"""
    if request.method == 'GET':
        with tracer.start_as_current_span("get_orders") as span:
            span.set_attribute("http.route", "/api/orders")
            span.set_attribute("order.count", len(orders_db))

            # Simulate database query delay
            time.sleep(random.uniform(0.02, 0.08))

            return jsonify({"orders": orders_db, "count": len(orders_db)})

    else:  # POST
        with tracer.start_as_current_span("create_order") as span:
            global order_id_counter

            span.set_attribute("http.route", "/api/orders")

            data = request.get_json()

            # Validate request
            if not data or 'user_id' not in data or 'items' not in data:
                span.set_attribute("order.validation", "failed")
                error_counter.add(1, {"type": "validation_error"})
                return jsonify({"error": "Invalid request"}), 400

            # Simulate user lookup
            with tracer.start_as_current_span("lookup_user") as user_span:
                user_span.set_attribute("user.id", data['user_id'])
                time.sleep(random.uniform(0.01, 0.03))

                user = next((u for u in users_db if u["id"] == data['user_id']), None)
                if not user:
                    user_span.set_attribute("user.found", False)
                    error_counter.add(1, {"type": "user_not_found"})
                    return jsonify({"error": "User not found"}), 404

                user_span.set_attribute("user.found", True)

            # Calculate total
            with tracer.start_as_current_span("calculate_total") as calc_span:
                total = sum(item.get('price', 0) * item.get('quantity', 1) 
                           for item in data['items'])
                calc_span.set_attribute("order.total", total)
                calc_span.set_attribute("order.item_count", len(data['items']))

            # Create order
            order = {
                "id": order_id_counter,
                "user_id": data['user_id'],
                "user_name": user['name'],
                "items": data['items'],
                "total": total,
                "status": "pending",
                "created_at": time.time()
            }

            orders_db.append(order)
            order_id_counter += 1

            # Record metrics
            order_counter.add(1, {"user_id": str(data['user_id'])})

            span.set_attribute("order.id", order['id'])
            span.set_attribute("order.total", total)
            span.add_event("Order created", {
                "order.id": order['id'],
                "user.id": data['user_id']
            })

            # Simulate order processing delay
            time.sleep(random.uniform(0.05, 0.15))

            return jsonify(order), 201


@app.route('/api/orders/<int:order_id>', methods=['GET'])
def get_order(order_id):
    """Get a specific order by ID"""
    with tracer.start_as_current_span("get_order") as span:
        span.set_attribute("http.route", "/api/orders/<id>")
        span.set_attribute("order.id", order_id)

        # Simulate database query delay
        time.sleep(random.uniform(0.02, 0.05))

        order = next((o for o in orders_db if o["id"] == order_id), None)

        if order:
            span.set_attribute("order.found", True)
            return jsonify(order)
        else:
            span.set_attribute("order.found", False)
            span.add_event("Order not found", {"order.id": order_id})
            return jsonify({"error": "Order not found"}), 404


@app.route('/api/error')
def simulate_error():
    """Endpoint to simulate errors for testing"""
    with tracer.start_as_current_span("simulate_error") as span:
        span.set_attribute("http.route", "/api/error")

        error_type = random.choice(['validation', 'database', 'timeout', 'internal'])
        span.set_attribute("error.type", error_type)

        error_counter.add(1, {"type": error_type})

        if error_type == 'validation':
            span.add_event("Validation error occurred")
            return jsonify({"error": "Validation failed"}), 400
        elif error_type == 'database':
            span.add_event("Database error occurred")
            return jsonify({"error": "Database connection failed"}), 500
        elif error_type == 'timeout':
            span.add_event("Timeout error occurred")
            time.sleep(2)
            return jsonify({"error": "Request timeout"}), 504
        else:
            span.add_event("Internal error occurred")
            span.record_exception(Exception("Simulated internal error"))
            return jsonify({"error": "Internal server error"}), 500


@app.route('/api/slow')
def slow_endpoint():
    """Endpoint with intentional delay for testing"""
    with tracer.start_as_current_span("slow_endpoint") as span:
        span.set_attribute("http.route", "/api/slow")

        delay = random.uniform(1.0, 3.0)
        span.set_attribute("delay.seconds", delay)

        time.sleep(delay)

        return jsonify({
            "message": "Slow operation completed",
            "delay_seconds": delay
        })


@app.route('/metrics')
def metrics_endpoint():
    """Simple metrics endpoint showing current stats"""
    with tracer.start_as_current_span("metrics_endpoint") as span:
        span.set_attribute("http.route", "/metrics")

        stats = {
            "users_count": len(users_db),
            "orders_count": len(orders_db),
            "total_order_value": sum(o.get('total', 0) for o in orders_db)
        }

        return jsonify(stats)


@app.errorhandler(404)
def not_found(error):
    """Handle 404 errors"""
    error_counter.add(1, {"type": "not_found"})
    return jsonify({"error": "Not found"}), 404


@app.errorhandler(500)
def internal_error(error):
    """Handle 500 errors"""
    error_counter.add(1, {"type": "internal_error"})
    return jsonify({"error": "Internal server error"}), 500


if __name__ == '__main__':
    # Instrument Flask application
    FlaskInstrumentor().instrument_app(app)

    # Run the application
    port = int(os.environ.get('PORT', 8080))
    app.run(host='0.0.0.0', port=port, debug=False)

# Made with Bob
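One detail worth checking when pairing this code with the config: main.py records http_request_duration_seconds in seconds, while the histogram view in otel-config.yaml uses boundaries running from 0 to 10000, which look sized for milliseconds. The sketch below, assuming upper-inclusive bucket bounds, shows where typical values from this demo would land. The helper names bucket_index and bucket_label are hypothetical.

```python
from bisect import bisect_left

# Boundaries copied from the `views` section of otel-config.yaml.
BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000,
              2500, 5000, 7500, 10000]


def bucket_index(value: float, boundaries=BOUNDARIES) -> int:
    """Index of the bucket a value falls into (upper bounds inclusive)."""
    return bisect_left(boundaries, value)


def bucket_label(index: int, boundaries=BOUNDARIES) -> str:
    """Human-readable range for a bucket index."""
    lower = "-inf" if index == 0 else str(boundaries[index - 1])
    if index == len(boundaries):
        return f"({lower}, +inf)"
    return f"({lower}, {boundaries[index]}]"


if __name__ == "__main__":
    for seconds in (0.02, 0.15, 3.0):
        as_seconds = bucket_label(bucket_index(seconds))
        as_millis = bucket_label(bucket_index(seconds * 1000))
        print(f"{seconds}s -> {as_seconds} as seconds, {as_millis} as ms")
```

If that reading is right, every request in this demo (including the 1 to 3 second /api/slow calls) falls into the same bucket when recorded in seconds, so it may be worth either recording milliseconds or switching the view to sub-second boundaries.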

Bootstrapping the Run Environment

To make the OpenTelemetry SDK load our declarative file instead of expecting configuration in code, we set an environment variable at runtime. Note that OTEL_EXPERIMENTAL_CONFIG_FILE is, as the name says, experimental, so support varies by language SDK and version.

  • Running the application locally: point the OTEL_EXPERIMENTAL_CONFIG_FILE variable at the YAML configuration file:
export OTEL_EXPERIMENTAL_CONFIG_FILE="otel-config.yaml"
python main.py  # Flask app from the listing above; listens on $PORT (default 8080)

  • Running in a container (Podman/Docker): if you prefer building and running in rootless container environments like Podman, make sure your compose file mounts the configuration file into the container:
# podman-compose.yaml fragment
services:
  order-service:
    build: .
    ports:
      - "8080:8080"   # main.py listens on PORT, default 8080
    environment:
      - OTEL_EXPERIMENTAL_CONFIG_FILE=/app/otel-config.yaml
    volumes:
      - ./otel-config.yaml:/app/otel-config.yaml:ro
    depends_on:
      - otel-collector
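The environment matters for more than the file path: otel-config.yaml also contains ${HOSTNAME} placeholders, which are resolved from the process environment when the file is read. The following is a simplified sketch of that substitution step, covering the ${VAR}, ${env:VAR}, and ${VAR:-default} forms. The function substitute_env is hypothetical, and the SDK's own parser may differ in edge cases such as escaping.

```python
import os
import re

# Matches ${VAR}, ${env:VAR} and ${VAR:-default}.
_PATTERN = re.compile(
    r"\$\{(?:env:)?([A-Za-z_][A-Za-z0-9_]*)(?::-([^}\n]*))?\}"
)


def substitute_env(text, env=None):
    """Replace placeholders with environment values.

    Unset or empty variables fall back to the default if one is given,
    otherwise to an empty string.
    """
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:
            return value
        return default if default is not None else ""

    return _PATTERN.sub(_replace, text)


if __name__ == "__main__":
    line = 'value: "${HOSTNAME:-unknown-host}"'
    print(substitute_env(line, {"HOSTNAME": "order-service-1"}))
    print(substitute_env(line, {}))
```

In a container, HOSTNAME is usually set by the runtime; when running locally it may be unset in some shells, in which case a default in the config keeps service.instance.id from ending up empty.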

Correlating Telemetry Signals in Visual Dashboards

Once you query the application endpoints, each signal shows up in its own backend, and the shared resource attributes (such as service.name) and trace IDs let you tie them together in the visualization layer.

  • Jaeger Traces View: Navigating to http://localhost:16686 lets you pick order-service and follow the nested hierarchy of a request, for example create_order down to child spans such as lookup_user and calculate_total.

  • Prometheus Metrics: The counters the app creates (for example orders_created_total) can be queried with PromQL expressions.
  • Unified Grafana Panel: With Loki and Prometheus datasources defined, you can line up error logs against metric spikes and trace IDs, and track down performance issues without combing through disconnected data dumps.
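Correlation across services depends on every hop carrying the same trace ID. With the tracecontext propagator enabled in the config, that ID travels in the W3C traceparent HTTP header. Here is a minimal sketch of parsing the version 00 form of that header; the names TraceParent and parse_traceparent are hypothetical, and in practice the SDK's propagator handles this for you.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per W3C Trace Context.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, span_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(header))
```

The trace_id field is the same value you would search for in Jaeger or look for in a log line when jumping between signals.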

Data Flow

The data flow begins within the application layer, where incoming HTTP traffic hits the Python service’s API routes. Instead of relying on manual code setups, the internal OpenTelemetry SDK bootstraps itself instantly by parsing the declarative otel-config.yaml file at launch, establishing trace, metric, and log providers. As runtime events execute, these providers automatically package structural instrumentation details into standard OpenTelemetry Protocol (OTLP) signal streams. The application pushes these telemetry batches over high-throughput OTLP/gRPC channels to an independent OpenTelemetry Collector instance listening on port 4317. Once received, the Collector processes, filters, and distributes the data to its respective domain backends: distributed trace contexts land in Jaeger, scraped metric dimensions map into Prometheus, and application execution logs route directly into Loki—all eventually correlated and visualized under unified Grafana dashboards.
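The batching step in that flow can be pictured with a toy model. The sketch below is not the SDK's batch processor (it has no background thread, timer, or export timeout); it only illustrates how max_queue_size and max_export_batch_size from otel-config.yaml interact. The class name BatchBuffer and its methods are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class BatchBuffer:
    """Toy model of a batch processor's queue (no threads, no timers)."""

    def __init__(self, export: Callable[[List[str]], None],
                 max_queue_size: int = 2048,
                 max_export_batch_size: int = 512) -> None:
        self._export = export
        self._queue: Deque[str] = deque()
        self._max_queue_size = max_queue_size
        self._max_export_batch_size = max_export_batch_size
        self.dropped = 0

    def on_end(self, span: str) -> None:
        """Queue a finished span; drop it if the queue is already full."""
        if len(self._queue) >= self._max_queue_size:
            self.dropped += 1
            return
        self._queue.append(span)

    def flush(self) -> int:
        """Export everything queued, one batch at a time; return batch count."""
        batches = 0
        while self._queue:
            size = min(self._max_export_batch_size, len(self._queue))
            self._export([self._queue.popleft() for _ in range(size)])
            batches += 1
        return batches


if __name__ == "__main__":
    sizes: List[int] = []
    buffer = BatchBuffer(export=lambda batch: sizes.append(len(batch)))
    for i in range(2500):
        buffer.on_end(f"span-{i}")
    print("batches:", buffer.flush(), "sizes:", sizes, "dropped:", buffer.dropped)
```

The point of the model is the trade-off: a larger queue absorbs bursts at the cost of memory, while anything beyond it is dropped rather than blocking the request path.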


Deployment Flow

The infrastructure is deployed using a local container network managed via Docker Compose or rootless Podman environments. Initialization begins by firing a setup script that sets up a clean Python virtual environment, installs runtime dependencies, and checks container socket availability. Next, a background script spins up the core observability stack inside an isolated virtual network, provisioning individual container images for the OTel Collector, Jaeger, Prometheus, Loki, and Grafana. When the core application container or local process initializes, the host orchestrator maps the local otel-config.yaml file into the runtime environment as a read-only volume, passing its path through the OTEL_EXPERIMENTAL_CONFIG_FILE variable. This decoupled deployment topology ensures that modifying telemetry processing rules, adjusting metric sampling frequencies, or switching destination backends can be handled entirely within configuration files without forcing a rebuild or modification of the application code.

services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.91.0
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./config/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
    networks:
      - otel-network
    depends_on:
      - jaeger
      - prometheus

  # Jaeger - Distributed Tracing
  jaeger:
    image: jaegertracing/all-in-one:1.52
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # Jaeger UI
      - "14250:14250" # gRPC
    networks:
      - otel-network

  # Prometheus - Metrics Storage
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - otel-network

  # Loki - Log Aggregation
  loki:
    image: grafana/loki:2.9.3
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - otel-network

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - ./config/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - grafana-data:/var/lib/grafana
    ports:
      - "3001:3000"
    networks:
      - otel-network
    depends_on:
      - prometheus
      - loki
      - jaeger

networks:
  otel-network:
    driver: bridge

volumes:
  prometheus-data:
  grafana-data:

# Made with Bob
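Keep in mind that depends_on in its short form only orders container startup; it does not wait for a service to accept connections. A small readiness check before launching the app can save some head-scratching. This is a sketch using only the standard library; port_is_open and wait_for_ports are hypothetical helpers, and the host ports in the usage note are taken from the compose file above.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(targets, attempts: int = 30, delay: float = 1.0):
    """Poll until every (host, port) accepts connections.

    Returns the list of targets still unreachable after all attempts
    (an empty list means everything is up).
    """
    pending = list(targets)
    for _ in range(attempts):
        pending = [(h, p) for h, p in pending if not port_is_open(h, p)]
        if not pending:
            break
        time.sleep(delay)
    return pending
```

For this stack you might call wait_for_ports([("localhost", 4317), ("localhost", 16686), ("localhost", 9090), ("localhost", 3100), ("localhost", 3001)]) and only start the application once it returns an empty list. An open port is a weak signal (the service may still be initializing), so treat this as a convenience rather than a health check.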

Real-Time Visualization: The Frontend Metrics Dashboard

While backend telemetry storage layers like Jaeger, Prometheus, and Grafana are vital for deep-dive diagnostics, Bob and I wanted immediate feedback during local development. To achieve this, the project features a lightweight, real-time frontend dashboard (index.html and dashboard.js) powered by Chart.js.

This dashboard sits directly on top of our instrumented application, letting us execute actions with a single click and watch our OpenTelemetry metric pipelines react instantly.

Dashboard Architecture & Flow

 +------------------+     Click Action     +-----------------------+
 |  HTML UI Button  | -------------------> |    Flask App Route    |
 |  (index.html)    |                      |       (main.py)       |
 +------------------+                      +-----------------------+
          ^                                            |
          | Polls Metrics (/metrics)                   | Emits OTLP
          | Every 5 Seconds                            v
 +------------------+                      +-----------------------+
 |   dashboard.js   |                      |  OpenTelemetry Engine |
 |  (Chart.js UI)   |                      |  (Reads otel-config)  |
 +------------------+                      +-----------------------+

Frontend Implementation (index.html)

The user interface splits your workspace into active experiment buttons and live-updating operational cards. It includes direct links to your backend triage tooling for an integrated debugging loop.

<div class="container">
    <header>
        <h1>🔭 OpenTelemetry Declarative Config Demo</h1>
        <p class="subtitle">Real-time Metrics Dashboard</p>
    </header>

    <div class="metrics-grid">
        <div class="card">
            <h2>📈 Request Traffic Rate</h2>
            <canvas id="requestRateChart"></canvas>
        </div>
        <div class="card">
            <h2>⏱️ Mean Response Latency</h2>
            <canvas id="responseTimeChart"></canvas>
        </div>
    </div>

    <div class="actions-section">
        <h2>🎯 Quick Actions</h2>
        <div class="button-group">
            <button onclick="testEndpoint('/api/users')" class="btn btn-primary">Test Users API</button>
            <button onclick="testEndpoint('/api/orders')" class="btn btn-primary">Test Orders API</button>
            <button onclick="createTestOrder()" class="btn btn-success">Create Test Order</button>
            <button onclick="testEndpoint('/api/slow')" class="btn btn-warning">Test Slow Endpoint</button>
            <button onclick="testEndpoint('/api/error')" class="btn btn-danger">Trigger Error</button>
        </div>
    </div>

    <div class="links-section">
        <h2>🔗 Production Observability Tools</h2>
        <div class="button-group">
            <a href="http://localhost:16686" target="_blank" class="btn btn-link">Jaeger UI (Traces)</a>
            <a href="http://localhost:9090" target="_blank" class="btn btn-link">Prometheus (Metrics)</a>
            <a href="http://localhost:3001" target="_blank" class="btn btn-link">Grafana (Dashboards)</a>
        </div>
    </div>
</div>
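The Quick Actions buttons call testEndpoint() and createTestOrder(), which the markup excerpt does not show. A minimal sketch of what such handlers could look like follows. The function bodies, the POST payload, and the injectable fetchImpl parameter are assumptions made for illustration, not the project's actual code:

```javascript
// Hypothetical handlers for the Quick Actions buttons. The names match the
// onclick attributes in the markup, but the bodies are assumptions.
const API_BASE_URL = 'http://localhost:8080';

// Calls one backend endpoint and reports status and elapsed time.
// fetchImpl is injectable so the function can be exercised without a server.
async function testEndpoint(path, options = {}, fetchImpl = fetch) {
    const started = Date.now();
    try {
        const response = await fetchImpl(`${API_BASE_URL}${path}`, options);
        return {
            path,
            status: response.status,
            ok: response.ok,
            durationMs: Date.now() - started
        };
    } catch (error) {
        // Network failure: no HTTP status is available
        return {
            path,
            status: null,
            ok: false,
            durationMs: Date.now() - started,
            error: error.message
        };
    }
}

// Sends a sample order. The payload shape is a placeholder; the real
// /api/orders contract is not shown in this excerpt.
function createTestOrder(fetchImpl = fetch) {
    return testEndpoint('/api/orders', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ item: 'demo-widget', quantity: 1 })
    }, fetchImpl);
}
```

Returning a plain result object instead of throwing keeps the button handlers simple: the caller can render the status and duration whether the request succeeded, failed with a 500, or never reached the service.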

State Coordination Engine (dashboard.js)

To capture and render data points without overcomplicating the setup, dashboard.js polls the service's /metrics endpoint every five seconds and appends each reading to the Chart.js line charts.

// dashboard.js excerpt
const API_BASE_URL = 'http://localhost:8080';
const UPDATE_INTERVAL = 5000; // Poll metrics every 5 seconds

let state = {
    requestCount: 0,
    errorCount: 0,
    requestHistory: [],
    responseTimeHistory: []
};

// Periodic background metric gathering loop
async function updateDashboard() {
    try {
        const response = await fetch(`${API_BASE_URL}/metrics`);
        if (!response.ok) throw new Error('Metrics unreachable');

        const stats = await response.json();

        // Update UI info cards dynamically
        document.getElementById('service-status').innerText = "Healthy";
        document.getElementById('service-status').className = "status-value success";

        // Push fresh structural values to Chart data lists
        const now = new Date().toLocaleTimeString();
        updateChartData(requestRateChart, now, stats.orders_count);

    } catch (error) {
        document.getElementById('service-status').innerText = "Unreachable";
        document.getElementById('service-status').className = "status-value danger";
    }
}

function startAutoUpdate() {
    setInterval(updateDashboard, UPDATE_INTERVAL);
}
document.addEventListener('DOMContentLoaded', () => {
    initializeCharts();
    startAutoUpdate();
});
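The excerpt calls updateChartData() without showing it. One way such a helper might work is sketched below, assuming a standard Chart.js line chart with a single dataset. The 20-point window and the ratePerSecond() helper are illustrative additions rather than part of the original project. The latter is included because orders_count reads like a cumulative counter, so a "rate" chart would need the difference between consecutive polls:

```javascript
// Hypothetical helpers: the original excerpt references updateChartData()
// but does not include its body.
const MAX_POINTS = 20; // assumed rolling-window size

// Appends one point to a Chart.js line chart and trims old points.
function updateChartData(chart, label, value, maxPoints = MAX_POINTS) {
    const labels = chart.data.labels;
    const series = chart.data.datasets[0].data;

    labels.push(label);
    series.push(value);

    // Drop the oldest points so the chart shows a fixed-size window
    while (labels.length > maxPoints) {
        labels.shift();
        series.shift();
    }

    chart.update(); // Chart.js re-renders from the mutated data arrays
}

// Converts two readings of a cumulative counter into a per-second rate.
function ratePerSecond(previous, current, intervalMs) {
    if (previous === null || intervalMs <= 0) return 0;
    // A counter that went backwards usually means the service restarted,
    // so treat the current value as the whole delta
    const delta = current >= previous ? current - previous : current;
    return delta / (intervalMs / 1000);
}
```

With these in place, the polling loop could remember the previous orders_count and plot ratePerSecond(previous, stats.orders_count, UPDATE_INTERVAL) instead of the raw total.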

Interacting with the Loop

When you run this architecture locally and load the web dashboard, you get an immediate visual playing field for testing the telemetry pipelines:

  • Simulating Traffic Spikes: Clicking “Create Test Order” repeatedly sends a burst of order requests to the backend. The OpenTelemetry meter provider records each counter increment, and the polling chart picks up the change within a few seconds.
  • Diagnosing Anomalies: Clicking “Test Slow Endpoint” triggers an artificial processing delay, which shows up as a distinct jump on the Mean Response Latency graph.
  • Correlating Errors: Clicking “Trigger Error” makes the service return an HTTP 500 response. The dashboard's status panels update, and the quick links take you to Jaeger or Grafana, where you can find the matching trace and see which function failed.
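If clicking a button many times gets tedious, the same experiments can be scripted. The sketch below is a hypothetical helper, not part of the original project. It fires a batch of requests concurrently through any function that resolves to an object with ok and durationMs fields, then summarizes the outcome:

```javascript
// Hypothetical load helper. sendRequest is any function that returns a
// Promise resolving to { ok: boolean, durationMs: number }.
async function simulateBurst(count, sendRequest) {
    // Start all requests at once to imitate a traffic spike
    const results = await Promise.all(
        Array.from({ length: count }, () => sendRequest())
    );

    const failures = results.filter((r) => !r.ok).length;
    const totalMs = results.reduce((sum, r) => sum + r.durationMs, 0);

    return {
        sent: count,
        failures,
        meanLatencyMs: count > 0 ? totalMs / count : 0
    };
}
```

In the browser console, something like simulateBurst(25, timedOrderRequest) would work, where timedOrderRequest is a function you supply that posts a test order and resolves to that shape. The resulting spike should then be visible on the dashboard after the next poll.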

Bob Can Provide More Than Code

Beyond handling complex architectures, configuration schemas, and frontend wiring, Bob proved to be just as capable when it came to content creation — generating a comprehensive, production-ready blog post draft (building-observability-stack-with-otel.md) alongside the code. He effortlessly translated the technical implementation details into a structured, deep-dive narrative, mapping out everything from the core philosophy of programmatic-free initialization to granular troubleshooting tips and telemetry validation steps. Whether you need a rapid prototype or the documentation to explain it to the community, Bob bridges the gap between engineering and technical writing seamlessly, shrinking a task that would normally take days into a matter of minutes.

Maybe next time I’ll just copy and paste Bob’s content 😂


Conclusion

Ultimately, this practical implementation shows that application source code no longer has to be cluttered with verbose, programmatic SDK initialization boilerplate. By moving the configuration of traces, metrics, and logs into a single, structured otel-config.yaml file, we get a clean separation of concerns: the application stays focused on business logic, while the OpenTelemetry engine builds the telemetry pipelines from that file at startup. From mapping out a decoupled architecture to wiring up a real-time, interactive Chart.js frontend, every layer of this project shows how accessible observability can be. With Bob bridging the gap between rapid code prototyping and technical documentation, this end-to-end demonstration offers a practical starting point for declarative, developer-friendly system visibility.

Thanks Bob 🤗 and thanks for reading!

Links
