Docker containers are lightweight and portable, but their dynamic nature makes monitoring challenging. Containers start, stop, and move between hosts constantly—you need visibility into their resource usage and performance.
This guide explains how to monitor Docker containers using the OpenTelemetry Collector with the Docker Stats receiver. It covers setting up the Collector (either directly or with Docker Compose), collecting container metrics, and exporting them to your monitoring backend.
Why Monitor Docker Containers?
Monitoring Docker containers is a key part of ensuring that containerized applications run reliably and perform efficiently. Here's why it matters:
Performance Optimization: Identify resource bottlenecks and optimize container performance before they impact users. By tracking CPU, memory, and I/O metrics, you can fine-tune resource allocation and application configurations.
Resource Management: Ensure efficient allocation of CPU, memory, and network resources across containers. Docker containers share the host system's resources, so monitoring helps prevent resource contention and ensures fair distribution.
Troubleshooting: Quickly diagnose issues by tracking container metrics and identifying anomalies. When a container crashes or performs poorly, historical metrics provide invaluable insights into what went wrong and why.
Cost Control: In cloud environments, efficient resource monitoring can lead to significant cost savings. By right-sizing containers and eliminating waste, you can optimize your cloud spending.
Security and Compliance: Monitor container behavior to detect unusual activity that might indicate security breaches or compliance violations.
What is OpenTelemetry Collector?
OpenTelemetry Collector acts as a central hub for receiving, processing, and exporting telemetry data to various backends or observability tools.
The Collector can receive telemetry data (traces, metrics, and logs) from multiple sources, including:
- Applications instrumented with OpenTelemetry SDKs.
- Other telemetry agents or collectors.
- Legacy systems using protocols like Jaeger, Zipkin, Prometheus, etc.
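Conceptually, the Collector wires receivers, processors, and exporters together into pipelines. As a rough sketch of that structure (the backend endpoint below is a placeholder, not a real address):

receivers:
  otlp:              # accept OTLP data from instrumented applications
    protocols:
      grpc:
      http:
processors:
  batch:             # group telemetry before sending to reduce overhead
exporters:
  otlp:
    endpoint: your-backend:4317   # replace with your backend's OTLP endpoint
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

We'll build on this structure throughout the guide.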
Why Use OpenTelemetry Collector with Docker?
Docker provides an ideal environment for running the OpenTelemetry Collector, offering several key advantages:
Portability and Consistency
Running the OpenTelemetry Collector in Docker ensures it behaves identically across development, staging, and production environments. Whether you're on a local machine, cloud VM, or Kubernetes cluster, the containerized collector provides consistent telemetry collection.
Simplified Deployment
Docker eliminates dependency management complexities. All required libraries and configurations are packaged within the container image, making deployment as simple as pulling an image and starting a container.
Easy Scaling
Docker makes it straightforward to run multiple collector instances for high-throughput environments. With Docker Compose or orchestration tools like Kubernetes, you can scale collectors horizontally to handle increased telemetry loads.
Isolation and Resource Control
Containers provide process isolation and resource limits, preventing the collector from impacting host system performance. You can set CPU and memory limits to ensure predictable resource usage.
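For example, you could cap the collector container itself with standard Docker flags; the values below are illustrative only, and the command omits the config and socket mounts covered later:

# Illustrative resource limits for a collector container
# (config file and Docker socket mounts from later sections omitted for brevity)
docker run -d \
  --name otel-collector \
  --memory=512m \
  --cpus=0.5 \
  otel/opentelemetry-collector-contrib:0.137.0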
Version Management
Docker images are versioned, allowing you to roll back to previous collector versions if issues arise. This makes upgrades safer and more manageable.
OpenTelemetry Docker Stats Receiver
The OpenTelemetry Docker Stats receiver allows you to collect container-level resource metrics from Docker. It retrieves metrics such as CPU usage, memory usage, network statistics, and disk I/O from Docker containers and exposes them as OpenTelemetry metrics.
The receiver communicates with the Docker daemon through its API, querying container statistics at regular intervals. This approach is non-intrusive and doesn't require any modifications to your container images or applications.
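If you're curious, you can query that same Docker Engine API yourself over the socket; for example, listing running containers with curl (assuming the default Linux socket path and read access to it):

# List running containers through the same Docker Engine API the receiver polls
curl --unix-socket /var/run/docker.sock http://localhost/v1.25/containers/json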
CPU Metrics
The Docker Stats receiver collects comprehensive CPU metrics for monitoring container processor usage:
| Metric | Description |
|---|---|
| container.cpu.usage.system | System CPU usage, as reported by Docker. |
| container.cpu.usage.total | Total CPU time consumed. |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux). |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux). |
| container.cpu.usage.percpu | Per-core CPU usage by the container. |
| container.cpu.throttling_data.periods | Number of periods with throttling active. |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit. |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled. |
| container.cpu.percent | Percent of CPU used by the container. |
Memory Metrics
Memory metrics help you understand RAM usage and identify memory leaks or pressure:
| Metric | Description |
|---|---|
| container.memory.usage.limit | Memory limit of the container. |
| container.memory.usage.total | Memory usage of the container. This excludes the cache. |
| container.memory.usage.max | Maximum memory usage. |
| container.memory.percent | Percentage of memory used. |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. |
| container.memory.rss | The amount of memory that doesn't correspond to anything on disk: stacks, heaps, and anonymous memory maps. |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup. |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup. |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup. |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group. |
| container.memory.swap | The amount of swap currently used by the processes in this cgroup. |
Block I/O Metrics
Block I/O metrics track disk read and write operations for your containers:
| Metric | Description |
|---|---|
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups. |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups. |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups. |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups. |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups. |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds. |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service. |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups. |
Network Metrics
Network metrics provide visibility into container network traffic and errors:
| Metric | Description |
|---|---|
| container.network.io.usage.rx_bytes | Bytes received by the container. |
| container.network.io.usage.tx_bytes | Bytes sent. |
| container.network.io.usage.rx_dropped | Incoming packets dropped. |
| container.network.io.usage.tx_dropped | Outgoing packets dropped. |
| container.network.io.usage.rx_errors | Received errors. |
| container.network.io.usage.tx_errors | Sent errors. |
| container.network.io.usage.rx_packets | Packets received. |
| container.network.io.usage.tx_packets | Packets sent. |
Prerequisites
Before getting started with Docker container monitoring, ensure you have:
- Docker installed and running on your system (Docker API version 1.25 or higher)
- Running Docker containers to monitor (or follow our setup below)
- Basic understanding of YAML configuration files
Choose Your Monitoring Backend
In this guide, we use Uptrace for examples, but you can use any OTLP-compatible backend.
For Uptrace:
- Sign up at uptrace.dev
- Get your DSN from Project → Data Source Name
For other backends:
- Configure the OTLP exporter endpoint in your collector config
- See the OpenTelemetry documentation for other exporters
Verify Your Docker Installation
Check your Docker version and API compatibility:
docker --version
# Docker version 24.0.0 or higher
docker version --format '{{.Server.APIVersion}}'
# Should return 1.25 or higher
Verify Docker is running:
docker ps
Start Sample Containers (Optional)
If you don't have containers running, start a few for testing:
# Start an Nginx web server
docker run -d --name nginx-test -p 8080:80 nginx:latest
# Start a MySQL database
docker run -d --name mysql-test \
-e MYSQL_ROOT_PASSWORD=mysecretpassword \
-p 3306:3306 mysql:latest
# Start Redis
docker run -d --name redis-test -p 6379:6379 redis:latest
Verify containers are running:
docker ps
You should see your containers listed with their status as "Up".
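You can also preview the kind of numbers the collector will gather by using Docker's built-in stats command:

# One-off snapshot of per-container CPU, memory, network, and block I/O
docker stats --no-stream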
Setting Up OpenTelemetry Collector
You have several options for running the OpenTelemetry Collector. Choose the one that best fits your environment.
Option 1: Run Collector Natively
Running the collector as a native binary is ideal for development and testing environments.
Step 1: Download OpenTelemetry Collector
Download the appropriate binary for your operating system from the OpenTelemetry Collector releases page. We'll use the contrib distribution which includes the Docker Stats receiver.
For Linux (amd64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_linux_amd64.tar.gz
For macOS (Apple Silicon/arm64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_darwin_arm64.tar.gz
For macOS (Intel/amd64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_darwin_amd64.tar.gz
Step 2: Extract and Install
# Create directory
mkdir otelcol-contrib
# Extract the archive (adjust filename for your OS)
tar xvzf otelcol-contrib_0.137.0_*.tar.gz -C otelcol-contrib
# Navigate to the directory
cd otelcol-contrib
# Make binary executable
chmod +x otelcol-contrib
Step 3: Verify Installation
./otelcol-contrib --version
# Output: otelcol-contrib version v0.137.0
Optionally, move the binary to your system path for easier access:
sudo mv otelcol-contrib /usr/local/bin/
Option 2: Run Collector in Docker
Running the collector as a Docker container is often the preferred method for containerized environments. The OpenTelemetry project provides official Docker images on Docker Hub under otel/opentelemetry-collector-contrib.
Pull the OpenTelemetry Docker Image
OpenTelemetry provides official Docker images on Docker Hub and GitHub Container Registry. The opentelemetry-collector-contrib distribution includes all community-contributed receivers, processors, and exporters, including the Docker Stats receiver.
# From Docker Hub (recommended)
docker pull otel/opentelemetry-collector-contrib:0.137.0
# Or from GitHub Container Registry
docker pull ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.137.0
You can find all available versions on Docker Hub: otel/opentelemetry-collector-contrib.
Download and Verify the Image
After pulling the image, verify it's available:
# List Docker images
docker images | grep opentelemetry-collector
# Check image details
docker inspect otel/opentelemetry-collector-contrib:0.137.0
Basic Docker Run Command
Run the collector with a custom configuration:
docker run -d \
--name otel-collector \
-v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-p 4317:4317 \
-p 4318:4318 \
--restart unless-stopped \
otel/opentelemetry-collector-contrib:0.137.0
Important: The -v /var/run/docker.sock:/var/run/docker.sock:ro mount is required for Docker Stats receiver to access container metrics.
Understanding the Docker Run Parameters
- -d: Run the container in detached mode (background)
- --name: Assign a name for easy reference
- -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml: Mount your configuration file
- -v /var/run/docker.sock:/var/run/docker.sock:ro: Mount the Docker socket (read-only)
- -p 4317:4317: Expose the OTLP gRPC receiver port
- -p 4318:4318: Expose the OTLP HTTP receiver port
- --restart unless-stopped: Automatically restart the container on failure
Docker Socket Access on Different OS
Linux:
-v /var/run/docker.sock:/var/run/docker.sock:ro
macOS:
-v /var/run/docker.sock:/var/run/docker.sock:ro
Windows (Docker Desktop):
-v //var/run/docker.sock:/var/run/docker.sock:ro
Viewing Logs
Monitor collector logs to ensure it's working correctly:
# Follow logs in real-time
docker logs -f otel-collector
# View last 100 lines
docker logs --tail 100 otel-collector
Stopping and Removing
# Stop the container
docker stop otel-collector
# Remove the container
docker rm otel-collector
Option 3: Docker Compose Setup
For production-like environments or when running multiple services together, Docker Compose provides an elegant solution. This Docker Compose example shows how to set up the OpenTelemetry Collector for monitoring your Docker containers.
Basic OpenTelemetry Docker Compose Configuration
Create a docker-compose.yml file for the OpenTelemetry Collector:
version: '3.8'
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
container_name: otel-collector
command: ["--config=/etc/otelcol-contrib/config.yaml"]
volumes:
- ./config.yaml:/etc/otelcol-contrib/config.yaml
- /var/run/docker.sock:/var/run/docker.sock:ro
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "8888:8888" # Prometheus metrics (collector's own metrics)
- "13133:13133" # Health check
restart: unless-stopped
networks:
- monitoring
healthcheck:
test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
interval: 30s
timeout: 10s
retries: 3
networks:
monitoring:
driver: bridge
This OpenTelemetry Docker Compose example provides a production-ready setup with health checks and proper networking.
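Before starting the stack, you can have Docker Compose parse and validate the file; it prints the resolved configuration, or an error if the YAML is invalid:

# Validate and print the resolved Compose configuration
docker-compose config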
Complete OpenTelemetry Collector Docker Compose Example
For a complete monitoring stack, the following OpenTelemetry Docker Compose example includes the Collector and sample applications to demonstrate an end-to-end setup.
version: '3.8'
services:
# OpenTelemetry Collector
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
container_name: otel-collector
command: ["--config=/etc/otelcol-contrib/config.yaml"]
volumes:
- ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
- /var/run/docker.sock:/var/run/docker.sock:ro
ports:
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
environment:
- UPTRACE_DSN=${UPTRACE_DSN}
networks:
- monitoring
restart: unless-stopped
healthcheck:
test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
interval: 30s
timeout: 10s
retries: 3
# Sample Application (to generate telemetry)
sample-app:
image: nginx:latest
container_name: sample-app
ports:
- "8080:80"
networks:
- monitoring
labels:
app: "sample-nginx"
environment: "production"
# Another sample service
redis:
image: redis:latest
container_name: sample-redis
ports:
- "6379:6379"
networks:
- monitoring
labels:
app: "redis"
environment: "production"
networks:
monitoring:
driver: bridge
This Docker Compose example demonstrates a complete setup for OpenTelemetry monitoring with multiple services.
Starting the Stack
# Start all services
docker-compose up -d
# View logs from all services
docker-compose logs -f
# View logs from specific service
docker-compose logs -f otel-collector
# Check service status
docker-compose ps
Stopping the Stack
# Stop all services
docker-compose down
# Stop and remove volumes
docker-compose down -v
Environment Variables in Docker Compose
Use environment variables for sensitive data:
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
environment:
- UPTRACE_DSN=${UPTRACE_DSN}
env_file:
- .env
Create a .env file:
UPTRACE_DSN=your_dsn_here
Important: Add .env to your .gitignore to avoid committing secrets.
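Inside the collector configuration, the variable can then be referenced with the collector's ${env:...} syntax, so the DSN never appears in the YAML itself (Uptrace endpoint shown; adjust for your backend):

exporters:
  otlp:
    endpoint: api.uptrace.dev:4317
    headers:
      # resolved from the UPTRACE_DSN environment variable at startup
      uptrace-dsn: ${env:UPTRACE_DSN}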
Configuring Docker Stats Receiver
The Docker Stats receiver connects to the Docker daemon to collect container metrics. Create a configuration file to define how metrics are collected and exported.
Basic Configuration
For a minimal setup, create a file named config.yaml. This example exports to Uptrace, but you can replace the exporter with any OTLP-compatible backend:
receivers:
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 30s
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
processors:
batch:
timeout: 10s
service:
pipelines:
metrics:
receivers: [docker_stats]
processors: [batch]
exporters: [otlp]
Replace <YOUR_DSN> with your Uptrace DSN, or configure a different exporter for your monitoring backend.
This basic configuration:
- Collects Docker container metrics every 30 seconds
- Uses the batch processor to optimize data transmission
- Exports metrics to Uptrace via OTLP protocol
Production Configuration
For production environments, use this enhanced configuration with additional features:
receivers:
otlp:
protocols:
grpc:
http:
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 15s
timeout: 10s
api_version: "1.25"  # the receiver requires Docker API 1.25 or newer
# Enable additional metrics
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
container.network.io.usage.rx_errors:
enabled: true
container.network.io.usage.tx_errors:
enabled: true
container.network.io.usage.rx_packets:
enabled: true
container.network.io.usage.tx_packets:
enabled: true
container.cpu.usage.percpu:
enabled: true
# Filter out unwanted containers
excluded_images:
- undesired-container
- /.*undesired.*/
- another-*-container
# Map container labels to metric labels
container_labels_to_metric_labels:
my.container.label: my-metric-label
my.other.container.label: my-other-metric-label
com.docker.compose.service: service_name
com.docker.compose.project: project_name
# Map environment variables to metric labels
env_vars_to_metric_labels:
MY_ENVIRONMENT_VARIABLE: my-metric-label
MY_OTHER_ENVIRONMENT_VARIABLE: my-other-metric-label
exporters:
otlp:
endpoint: api.uptrace.dev:4317 # Replace with your backend endpoint
headers:
uptrace-dsn: '<YOUR_DSN>' # Or use your backend's auth method
processors:
resourcedetection:
detectors: [env, system]
cumulativetodelta:
batch:
timeout: 10s
send_batch_size: 1000
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
metrics:
receivers: [otlp, docker_stats]
processors: [resourcedetection, cumulativetodelta, batch]
exporters: [otlp]
This production configuration includes:
- Additional metrics for better observability (uptime, restarts, network errors)
- Container filtering to exclude test/development containers
- Label mapping to enrich metrics with custom labels
- Resource detection to add host information to metrics
- Both OTLP receivers for collecting traces from applications
Docker Socket Configuration
The Docker socket path varies by operating system:
For Linux:
The default socket path is /var/run/docker.sock
docker_stats:
endpoint: unix:///var/run/docker.sock
For macOS with Docker Desktop:
Check the socket location:
ls -la /var/run/docker.sock
# or
ls -la ~/.docker/run/docker.sock
Update your configuration accordingly if the path differs.
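For example, if the socket lives under your home directory, the receiver endpoint might look like this (the username is a placeholder):

docker_stats:
  # example macOS Docker Desktop socket path; replace "yourname" with your user
  endpoint: unix:///Users/yourname/.docker/run/docker.sock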
For Windows with Docker Desktop:
Docker uses a named pipe:
docker_stats:
endpoint: npipe:////./pipe/docker_engine
Alternative: Docker Daemon TCP Endpoint
If you prefer using a TCP endpoint instead of Unix socket, you can configure the Docker daemon to listen on a TCP port. This is useful for remote monitoring.
First, configure Docker daemon to listen on 0.0.0.0:2375, then adjust the OpenTelemetry Collector config:
receivers:
docker_stats:
endpoint: http://localhost:2375
Note: The Unix socket approach is more secure and should be preferred for local monitoring.
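As a sketch, one way to expose the daemon on TCP is to start dockerd with an additional -H flag; on systemd-managed hosts this normally goes into the docker.service unit or daemon.json rather than being run by hand, and an unauthenticated TCP listener should only be used on trusted networks:

# Keep the local socket and additionally listen on TCP port 2375 (unencrypted)
dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375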
Combining Multiple Receivers
The OpenTelemetry Collector can receive telemetry from multiple sources simultaneously, making it a central hub for all your observability data.
Docker Stats + OTLP Configuration
Monitor both Docker containers and application traces/metrics:
receivers:
# Collect Docker container metrics
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 15s
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
# Receive application telemetry
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 10s
send_batch_size: 1024
resourcedetection:
detectors: [env, system, docker]
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
service:
pipelines:
# Pipeline for application traces
traces:
receivers: [otlp]
processors: [batch, resourcedetection]
exporters: [otlp]
# Pipeline for all metrics (Docker + Application)
metrics:
receivers: [otlp, docker_stats]
processors: [batch, resourcedetection]
exporters: [otlp]
# Pipeline for application logs
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
This configuration creates a complete observability pipeline:
- Docker Stats receiver collects container resource metrics
- OTLP receivers collect application traces, metrics, and logs
- All data is enriched with resource information
- Everything flows to a single backend (Uptrace)
Benefits of Combined Collection
- Unified View: Correlate application performance with container resource usage
- Simplified Architecture: One collector handles all telemetry types
- Consistent Processing: Apply the same processors to all data
- Reduced Overhead: Single agent instead of multiple monitoring tools
Running the Collector
Run in Foreground (for testing)
From the directory containing your config.yaml file:
./otelcol-contrib --config ./config.yaml
Or if you installed it to your system path:
otelcol-contrib --config ./config.yaml
You should see output indicating the collector has started:
2024-10-09T10:00:00.000Z info service/telemetry.go:84 Setting up own telemetry...
2024-10-09T10:00:00.100Z info service/service.go:143 Starting otelcol-contrib...
2024-10-09T10:00:00.200Z info extensions/extensions.go:42 Starting extensions...
2024-10-09T10:00:01.000Z info dockerstatsreceiver@v0.137.0/receiver.go:123 Starting docker_stats receiver
Press Ctrl+C to stop the collector.
Run in Background
To run the collector in the background:
# Start in background and redirect output to log file
./otelcol-contrib --config ./config.yaml &> otelcol-output.log & echo "$!" > otel-pid
# View logs in real-time
tail -f otelcol-output.log
# View last 50 lines of logs
tail -n 50 otelcol-output.log
# Stop the collector
kill "$(< otel-pid)"
Run as systemd Service (Linux)
For production deployments on Linux, create a systemd service for automatic startup and management.
Create a service file /etc/systemd/system/otelcol.service:
[Unit]
Description=OpenTelemetry Collector
After=network.target docker.service
Requires=docker.service
[Service]
Type=simple
User=otel
Group=docker
ExecStart=/usr/local/bin/otelcol-contrib --config /etc/otelcol/config.yaml
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.target
Create the otel user and copy your configuration:
# Create otel user
sudo useradd -r -s /bin/false otel
# Add otel user to docker group
sudo usermod -aG docker otel
# Create config directory
sudo mkdir -p /etc/otelcol
# Copy your config file
sudo cp config.yaml /etc/otelcol/
# Set permissions
sudo chown -R otel:otel /etc/otelcol
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable otelcol
sudo systemctl start otelcol
# Check status
sudo systemctl status otelcol
# View logs
sudo journalctl -u otelcol -f
Verify Data Collection
Within 30 seconds of starting the collector, you should see metrics appearing in your Uptrace dashboard:
- Navigate to your Uptrace instance
- Go to the Metrics section
- You should see Docker container metrics like:
  - container.cpu.usage.total
  - container.memory.usage.total
  - container.network.io.usage.rx_bytes
  - And many more
If metrics don't appear, check the Troubleshooting section below.
Monitoring with Uptrace
Once the metrics are collected and exported, you can visualize them using Uptrace dashboards. Uptrace provides powerful querying capabilities and customizable dashboards for analyzing your Docker container metrics.
Creating Dashboards
In the Uptrace dashboard:
- Navigate to Dashboards tab
- Click New Dashboard
- Add panels to visualize different metrics
You can create various types of visualizations:
- Time series charts for CPU and memory usage over time
- Gauges for current resource utilization
- Tables for listing containers and their current state
- Heatmaps for distribution analysis
Example Queries
Here are some useful queries to get started:
Average CPU usage per container:
avg(container.cpu.utilization) by container.name
Memory usage percentage:
(container.memory.usage.total / container.memory.usage.limit) * 100
Network throughput:
rate(container.network.io.usage.rx_bytes[5m]) + rate(container.network.io.usage.tx_bytes[5m])
Number of containers reporting metrics:
count(container.cpu.utilization) by container.name
Setting Up Alerts
Configure alerts to be notified of potential issues:
- In your dashboard panel, select Set Up Monitors, then Create Alerts
- Set conditions, for example:
- CPU usage > 80% for 5 minutes
- Memory usage > 90% for 2 minutes
- Container restarts > 3 in 10 minutes
- Configure notification channels (email, Slack, PagerDuty, etc.)
Common alert rules:
- High CPU utilization
- Memory pressure
- Excessive container restarts
- Network errors
- Disk I/O saturation
OpenTelemetry Backend
The OpenTelemetry Collector exports metrics to any OTLP-compatible backend. This guide uses Uptrace in its examples, but alternatives include:
- Prometheus + Grafana - Self-hosted metrics and visualization
- Grafana Cloud - Managed observability platform
- Datadog, New Relic - Commercial APM solutions
To switch backends, update the exporter configuration in your config.yaml.
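For example, to expose the collected metrics for a Prometheus server to scrape instead of pushing them over OTLP, you could switch to the contrib distribution's Prometheus exporter; a minimal sketch (the port is an example):

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # Prometheus scrapes the collector on this port

service:
  pipelines:
    metrics:
      receivers: [docker_stats]
      processors: [batch]
      exporters: [prometheus]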
Troubleshooting
Permission Denied: Cannot Access Docker Socket
Error:
Error: Get "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
Solution:
Add your user to the docker group:
sudo usermod -aG docker $USER
Then log out and log back in, or run:
newgrp docker
Alternatively, if running the collector in Docker, ensure the container has access to the socket.
For systemd service, ensure the service user is in the docker group (as shown in the systemd setup above).
Docker Socket Not Found
Error:
Error: dial unix /var/run/docker.sock: connect: no such file or directory
Solution:
Verify Docker is running:
docker ps
Check if the socket exists:
# Linux
ls -la /var/run/docker.sock
# macOS
ls -la /var/run/docker.sock
ls -la ~/.docker/run/docker.sock
Update your config.yaml with the correct socket path if it differs from the default.
No Metrics in Uptrace
If metrics aren't appearing in Uptrace after a few minutes:
1. Verify DSN is correct:
- Check that you've correctly copied your DSN from Uptrace Settings → Ingestion Settings
- Ensure there are no extra spaces or quotes around the DSN value
2. Check network connectivity:
# Test connection to Uptrace
curl -v https://api.uptrace.dev:4317
3. Review collector logs:
# If running in foreground, check the console output
# If running in background:
tail -f otelcol-output.log
# If running in Docker:
docker logs otel-collector
# If running as systemd:
sudo journalctl -u otelcol -f
Look for error messages related to exporters or authentication.
4. Enable debug logging:
Add to your config.yaml:
service:
telemetry:
logs:
level: debug
pipelines:
# ... your existing pipelines
Restart the collector and check logs for detailed output.
5. Verify firewall rules:
Ensure outbound connections to api.uptrace.dev:4317 are allowed.
Docker API Version Incompatibility
Error:
Error response from daemon: client version 1.41 is too new. Maximum supported API version is 1.40
Solution:
Check your Docker API version:
docker version --format '{{.Server.APIVersion}}'
Specify the API version in your config.yaml:
receivers:
docker_stats:
endpoint: unix:///var/run/docker.sock
api_version: "1.40" # Use your Docker's API version
High CPU Usage by Collector
If the OpenTelemetry Collector is consuming excessive CPU:
1. Increase collection interval:
docker_stats:
collection_interval: 60s # increase from the 15s-30s used in earlier examples
2. Disable expensive metrics:
docker_stats:
metrics:
container.cpu.usage.percpu:
enabled: false
container.blockio.io_service_bytes_recursive:
enabled: false
3. Optimize batch processing:
processors:
batch:
timeout: 30s
send_batch_size: 2000 # Increase batch size
4. Filter containers:
Use excluded_images to reduce the number of monitored containers:
docker_stats:
excluded_images:
- /.*test.*/
- /.*dev.*/
Collector Crashes or Restarts Frequently
Common causes and solutions:
1. Memory issues:
- Reduce collection frequency
- Decrease batch size
- Enable memory_limiter processor:
processors:
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
service:
pipelines:
metrics:
receivers: [docker_stats]
processors: [memory_limiter, batch]
exporters: [otlp]
2. Configuration errors:
- Validate your YAML syntax
- Check collector logs for specific error messages
- Test with minimal configuration first
3. Network issues:
Add retry configuration to exporter:
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
max_elapsed_time: 300s
Missing Expected Metrics
If some metrics are not appearing:
1. Check if metrics are enabled:
Many metrics are disabled by default. Enable them explicitly:
docker_stats:
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
# ... enable other metrics as needed
2. Verify container is running:
The receiver only collects metrics from running containers.
3. Check cgroup version:
Some metrics are only available on cgroup v1 or v2. Check your system:
# Check cgroup version
mount | grep cgroup
If you're using cgroup v2, some v1-specific metrics won't be available.
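You can also check the filesystem type mounted at /sys/fs/cgroup, which identifies the version directly:

# Prints "cgroup2fs" on cgroup v2 hosts, "tmpfs" on cgroup v1
stat -fc %T /sys/fs/cgroup/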
Additional Resources
Documentation
Docker Images & Downloads
- Docker Hub
- GitHub Repo
- Latest image (not recommended for production): docker pull otel/opentelemetry-collector-contrib:latest
- Specific version (recommended): docker pull otel/opentelemetry-collector-contrib:0.137.0
- Binary downloads
- Package managers:
  - Homebrew (macOS): brew install opentelemetry-collector-contrib
  - APT (Debian/Ubuntu): see official docs
What's next?
With Docker container monitoring configured, you can track resource usage, container health, and application performance within your containerized environments. Scale up to Kubernetes monitoring for orchestrated deployments, or explore database monitoring with PostgreSQL and MySQL. For APM capabilities, compare top APM tools for container monitoring.