Observability Part 2 – Metrics & Dashboards with Prometheus and Grafana

Introduction

A quick recap of the previous article:

  • Showed how to use OpenTelemetry & Jaeger (Docker Tutorial) to trace requests in a Spring Boot app.
  • Focused on tracing to understand where requests spend their time.

Motivation for this article:

  • Tracing is powerful, but observability is not complete without metrics.
  • We need dashboards and alerts to monitor the health of our applications.

Goal: By the end, you will have a Spring Boot app instrumented with Prometheus + Grafana running alongside Jaeger.

Why Metrics, If We Already Have Traces?

Difference between metrics and traces:

  • Metrics = system health over time (e.g., request latency, error rate).
  • Traces = detailed view of a single request journey.

A real-world example:

In the Jaeger UI, we see a slow DB query in a trace. That’s useful, but what if we want to know how often it happens and whether it’s getting worse over time?
Metrics + dashboards provide that context.

Together, Jaeger (traces) + Prometheus/Grafana (metrics) give a holistic observability setup.


Setting Up Jaeger

Jaeger is an open-source distributed tracing system (originally built at Uber, now a CNCF project). For setup details, please check the previous article, OpenTelemetry & Jaeger (Docker Tutorial).

Setting Up Prometheus

Prometheus scrapes metrics from HTTP endpoints at regular intervals. You configure targets in prometheus.yml:

  • Define scrape intervals (typically 15-30 seconds)
  • Specify target endpoints where metrics are exposed
  • Configure alerting rules for threshold violations
  • Set up service discovery for dynamic environments (see the file-based discovery sketch below)

Applications expose metrics on a /metrics endpoint using client libraries. Common metrics include request counts, response times, error rates, and custom business metrics.

Spring Boot exposes metrics out-of-the-box using Micrometer. By enabling the prometheus actuator endpoint, the app automatically provides a /actuator/prometheus endpoint. However, in this tutorial we will use OpenTelemetry instead of Micrometer for exposing Prometheus metrics. OpenTelemetry provides a more unified approach to observability (metrics, traces, and logs) and is becoming the industry standard.
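
For reference, if you did go the Micrometer route instead, exposing the Prometheus endpoint is mostly configuration. A minimal sketch, assuming the micrometer-registry-prometheus dependency is on the classpath:

# application.yml (Micrometer alternative; not used in the rest of this tutorial)
management:
  endpoints:
    web:
      exposure:
        include: "health,prometheus"   # makes metrics available at /actuator/prometheus

In that case Prometheus would scrape /actuator/prometheus instead of the OTel exporter's /metrics endpoint used below.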

Prometheus Server Configuration (prometheus.yml)

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "spring-boot-app"
    static_configs:
      - targets: ["docker-demo:9464"]   
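The static_configs above are enough for this fixed Docker Compose setup. For the dynamic-environment case mentioned earlier, Prometheus also supports service discovery. A minimal file-based sketch (the targets/ directory and file names here are just examples):

# prometheus.yml (file-based service discovery sketch)
scrape_configs:
  - job_name: "dynamic-services"
    file_sd_configs:
      - files:
          - "targets/*.json"        # each file contains [{"targets": ["host:port"], "labels": {...}}]
        refresh_interval: 30s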

You will see Prometheus in action at http://localhost:9090.

With the OpenTelemetry Prometheus exporter configured through these environment variables:

- OTEL_EXPORTER_PROMETHEUS_PORT=9464
- OTEL_EXPORTER_PROMETHEUS_HOST=0.0.0.0

your Spring Boot app also exposes its own metrics on port 9464 → /metrics.

Setting Up Grafana

Grafana is our visualization layer. It doesn’t store data itself — instead, it connects to Prometheus and queries metrics.

  • Install Grafana server
  • Add Prometheus data source with connection URL
  • Import or create dashboards with panels showing metrics over time
  • Set up alerting based on query results
  • Configure user authentication and permissions

Steps:

  • Run Grafana with Docker.
  • Open http://localhost:3000 (default login: admin/admin).
  • Add Prometheus as a data source with the URL http://prometheus:9090. Because Grafana and Prometheus are on the same Docker network, we use the container name, not localhost. (Alternatively, provision the data source automatically; see the sketch after this list.)
  • Import a prebuilt dashboard or create your own panels for latency, throughput, and errors.
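
If you prefer configuration over clicking through the UI, Grafana can also provision data sources at startup from a file mounted into /etc/grafana/provisioning/datasources/. A minimal sketch (file name and mount path are up to you), covering both Prometheus and the Jaeger data source used later:

# grafana-datasources.yml (mount into /etc/grafana/provisioning/datasources/)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090      # container name, same Docker network
    isDefault: true
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger:16686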

Docker Compose: All-in-One Setup

When used together, these tools provide comprehensive observability:

  • Metrics + Traces: Grafana dashboards show high-level trends, while Jaeger provides detailed trace analysis when issues occur.
  • Alerting Workflow: Prometheus alerts fire when metrics exceed thresholds; teams can then use Jaeger to investigate specific problematic requests.
  • Root Cause Analysis: Start with Grafana dashboards to identify when a problem occurred, use Prometheus queries to narrow down the affected services, then examine detailed traces in Jaeger.

Here’s a simple stack with a sample Spring Boot app, Prometheus, Grafana, and Jaeger:

docker-compose.yml

version: '3.8'
services:
  # Jaeger - Tracing Backend
  jaeger:
    image: jaegertracing/all-in-one:1.51
    container_name: jaeger
    ports:
      - "16686:16686"    # Jaeger UI
      - "14250:14250"    # Jaeger gRPC
      - "4318:4318"      # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - app-network

  grafana:
    image: grafana/grafana-oss:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
    networks:
      - app-network

  # Your Spring Boot Application
  docker-demo:
    image: docker-demo:latest
    container_name: docker-demo
    ports:
      - "8080:8080"
      - "9464:9464"   # OTel Prometheus exporter
    environment:
      # OpenTelemetry Configuration
      - OTEL_SERVICE_NAME=docker-demo
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_TRACES_EXPORTER=otlp
      - OTEL_METRICS_EXPORTER=prometheus
      - OTEL_EXPORTER_PROMETHEUS_PORT=9464
      - OTEL_EXPORTER_PROMETHEUS_HOST=0.0.0.0
      - OTEL_LOGS_EXPORTER=none
      - OTEL_TRACES_SAMPLER=always_on  # Sample ALL traces
      - OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED=true
      - OTEL_INSTRUMENTATION_HTTP_ENABLED=true
      - OTEL_INSTRUMENTATION_SPRING_WEB_ENABLED=true
      - OTEL_LOG_LEVEL=DEBUG
      # Application Configuration
      - JAVA_OPTS=-javaagent:/app/opentelemetry-javaagent.jar
    networks:
      - app-network
volumes:
  grafana-data:

networks:
  app-network:
    driver: bridge

Start the stack

docker-compose up -d

Jaeger → http://localhost:16686

Grafana → http://localhost:3000

(login: admin / admin)

Prometheus → http://localhost:9090

Since you’re using the OTel Prometheus exporter, your app exposes its metrics at:

http://localhost:9464/metrics

Prometheus has a built-in UI that shows scrape status. Go to:

http://localhost:9090/targets

You should see a list of jobs (e.g., spring-boot-app) with:

  • State = UP → Prometheus is successfully scraping your app.
  • Last Scrape / Last Scrape Duration → confirms timing.
  • An error message if scraping failed (bad host/port/path).

Find Available Metrics

Add Jaeger as a data source in Grafana:

Explore Traces in Grafana

  • Open Explore → Select Jaeger as data source
  • Query traces by service name (e.g., docker-demo)
  • You’ll see the same traces as in Jaeger UI, but inside Grafana.

Add Prometheus as a data source in Grafana:

Import a Prometheus dashboard from Grafana’s dashboard library (e.g., ID: 3662 for Prometheus 2.0 overview).

Now you’ve got a working Grafana + Prometheus local setup!


Create a Dashboard

There are two simple ways to create this dashboard in Grafana:

Method 1: Import via JSON (Recommended)

  • Copy the dashboard JSON (provided below)
  • Open Grafana (http://localhost:3000)
  • Go to Dashboards → Import
  • Paste the JSON in the "Import via panel json" text box
  • Click "Load"
  • Configure data source: Select your Prometheus data source
  • Click "Import"

Method 2: Manual Creation (Step-by-Step)

  • Create New Dashboard
  • Go to Dashboards → New Dashboard
  • Click "Add visualization"
  • Add panels accordingly

In this post, I’ll be using Method 1: Import via JSON (recommended) to set up the Grafana dashboard.

Generate some traffic:


# Make requests to your app
for i in {1..100}; do
  curl http://localhost:8080/api/customers
done

You will see the request count in the Prometheus UI by querying the http_server_request_duration_seconds_count metric. You can also try queries like:

  • http_server_request_duration_seconds_count → monotonically increasing counter (total requests since start).
  • rate(http_server_request_duration_seconds_count[1m]) → request rate per second (much better for dashboards).

Requests per second, grouped by http_route:

sum(rate(http_server_request_duration_seconds_count[5m])) by (http_route)

Average load time (latency):

rate(http_server_request_duration_seconds_sum[5m])
/
rate(http_server_request_duration_seconds_count[5m])

Important: The dashboard uses common OpenTelemetry metric names. If your metrics have different names, update the queries.

Find Your Actual Metrics

  • Go to Prometheus (http://localhost:9090).
  • Run this query: {job="spring-boot-app"}
  • Note the actual metric names.

Let’s create a simple Grafana dashboard JSON for your Spring Boot app that includes:

  • API Request Count (Throughput)
  • Average Latency (Load Time)
  • Error Rate
  • P95 Latency (optional but very useful)

Steps to use:

  • Copy the JSON into a file (e.g. otel-dashboard.json).
  • In Grafana → Dashboards → Import → Upload JSON file.
  • Select your Prometheus data source when asked.
{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ],
  "id": null,
  "title": "OpenTelemetry API Monitoring Dashboard",
  "tags": ["api", "opentelemetry", "prometheus"],
  "timezone": "browser",
  "schemaVersion": 36,
  "version": 1,
  "panels": [
    {
      "type": "timeseries",
      "title": "API Request Count",
      "targets": [
        {
          "expr": "sum(rate(http_server_request_duration_seconds_count[5m])) by (http_route)",
          "legendFormat": "{{http_route}}",
          "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }
        }
      ],
      "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
    },
    {
      "type": "timeseries",
      "title": "Average Response Time (Latency)",
      "targets": [
        {
          "expr": "rate(http_server_request_duration_seconds_sum[5m]) / rate(http_server_request_duration_seconds_count[5m])",
          "legendFormat": "avg_latency",
          "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "s",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "orange", "value": 1 },
              { "color": "red", "value": 2 }
            ]
          }
        }
      },
      "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
    },
    {
      "type": "timeseries",
      "title": "Error Rate",
      "targets": [
        {
          "expr": "sum(rate(http_server_request_duration_seconds_count{http_response_status_code!~\"2..\"}[5m])) / sum(rate(http_server_request_duration_seconds_count[5m]))",
          "legendFormat": "error_rate",
          "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percentunit",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "orange", "value": 0.05 },
              { "color": "red", "value": 0.1 }
            ]
          }
        }
      },
      "gridPos": { "x": 0, "y": 8, "w": 12, "h": 8 }
    },
    {
      "type": "timeseries",
      "title": "95th Percentile Latency (p95)",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le))",
          "legendFormat": "p95_latency",
          "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }
        }
      ],
      "fieldConfig": {
        "defaults": { "unit": "s" }
      },
      "gridPos": { "x": 12, "y": 8, "w": 12, "h": 8 }
    }
  ],
  "templating": { "list": [] },
  "annotations": { "list": [] },
  "time": { "from": "now-30m", "to": "now" }
}

🎉 Congratulations! You’ve successfully built an end-to-end monitoring workflow by integrating Spring Boot with Prometheus and Grafana. Next, you can link Prometheus → Jaeger for more powerful trace correlation; Tempo (Grafana’s tracing backend) integrates even more natively with Prometheus. You can also explore alerting with Prometheus Alertmanager and Grafana to get notified about errors, high latency, or unusual traffic patterns.
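
As a starting point for alerting, here is a minimal sketch of a Prometheus alerting rule for a high error rate, reusing the metric names from this tutorial. It assumes the file is listed under rule_files in prometheus.yml and that an Alertmanager is configured separately:

# alert-rules.yml (sketch; reference it from prometheus.yml under rule_files)
groups:
  - name: spring-boot-app
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_server_request_duration_seconds_count{http_response_status_code!~"2.."}[5m]))
            / sum(rate(http_server_request_duration_seconds_count[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests are failing"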

Cleanup

Stop all containers

docker stop $(docker ps -q)

Remove all containers

docker rm $(docker ps -aq)

Remove all stopped containers

docker rm $(docker ps -q -f status=exited)

References & Credits

AI tools were used to assist in research and writing, but the final content was reviewed and verified by the author.
