Docker containers are lightweight and portable, but their dynamic nature makes monitoring challenging. Containers start, stop, and move between hosts constantly—you need visibility into their resource usage and performance.
This guide explains how to monitor Docker containers using the OpenTelemetry Collector with the Docker Stats receiver. It covers setting up the Collector (either directly or with Docker Compose), collecting container metrics, and exporting them to your monitoring backend.
Why Monitor Docker Containers?
Monitoring Docker containers is a key part of ensuring that containerized applications run reliably and perform efficiently. Here's why it matters:
Performance Optimization: Identify resource bottlenecks and optimize container performance before they impact users. By tracking CPU, memory, and I/O metrics, you can fine-tune resource allocation and application configurations.
Resource Management: Ensure efficient allocation of CPU, memory, and network resources across containers. Docker containers share the host system's resources, so monitoring helps prevent resource contention and ensures fair distribution.
Troubleshooting: Quickly diagnose issues by tracking container metrics and identifying anomalies. When a container crashes or performs poorly, historical metrics provide invaluable insights into what went wrong and why.
Cost Control: In cloud environments, efficient resource monitoring can lead to significant cost savings. By right-sizing containers and eliminating waste, you can optimize your cloud spending.
Security and Compliance: Monitor container behavior to detect unusual activity that might indicate security breaches or compliance violations.
What is OpenTelemetry Collector?
OpenTelemetry Collector acts as a central hub for receiving, processing, and exporting telemetry data to various backends or observability tools.
The Collector can receive telemetry data (traces, metrics, and logs) from multiple sources, including:
- Applications instrumented with OpenTelemetry SDKs.
- Other telemetry agents or collectors.
- Legacy systems using protocols like Jaeger, Zipkin, Prometheus, etc.
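Conceptually, the Collector wires receivers, processors, and exporters together into pipelines. As a rough sketch of that structure (the backend endpoint below is a placeholder, not a real address):

receivers:
  otlp:              # accept OTLP data from instrumented applications
    protocols:
      grpc:
      http:
processors:
  batch:             # group telemetry before sending to reduce overhead
exporters:
  otlp:
    endpoint: your-backend:4317   # replace with your backend's OTLP endpoint
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

We'll build on this structure throughout the guide.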
Why Use OpenTelemetry Collector with Docker?
Docker provides an ideal environment for running the OpenTelemetry Collector, offering several key advantages:
Portability and Consistency
Running the OpenTelemetry Collector in Docker ensures it behaves identically across development, staging, and production environments. Whether you're on a local machine, cloud VM, or Kubernetes cluster, the containerized collector provides consistent telemetry collection.
Simplified Deployment
Docker eliminates dependency management complexities. All required libraries and configurations are packaged within the container image, making deployment as simple as pulling an image and starting a container.
Easy Scaling
Docker makes it straightforward to run multiple collector instances for high-throughput environments. With Docker Compose or orchestration tools like Kubernetes, you can scale collectors horizontally to handle increased telemetry loads.
Isolation and Resource Control
Containers provide process isolation and resource limits, preventing the collector from impacting host system performance. You can set CPU and memory limits to ensure predictable resource usage.
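For example, you could cap the collector container itself with standard Docker flags; the values below are illustrative only, and the command omits the config and socket mounts covered later:

# Illustrative resource limits for a collector container
# (config file and Docker socket mounts from later sections omitted for brevity)
docker run -d \
  --name otel-collector \
  --memory=512m \
  --cpus=0.5 \
  otel/opentelemetry-collector-contrib:0.137.0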
Version Management
Docker images are versioned, allowing you to roll back to previous collector versions if issues arise. This makes upgrades safer and more manageable.
OpenTelemetry Docker Stats Receiver
The OpenTelemetry Docker Stats receiver allows you to collect container-level resource metrics from Docker. It retrieves metrics such as CPU usage, memory usage, network statistics, and disk I/O from Docker containers and exposes them as OpenTelemetry metrics.
The receiver communicates with the Docker daemon through its API, querying container statistics at regular intervals. This approach is non-intrusive and doesn't require any modifications to your container images or applications.
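If you're curious, you can query that same Docker Engine API yourself over the socket; for example, listing running containers with curl (assuming the default Linux socket path and read access to it):

# List running containers through the same Docker Engine API the receiver polls
curl --unix-socket /var/run/docker.sock http://localhost/v1.25/containers/json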
CPU Metrics
The Docker Stats receiver collects comprehensive CPU metrics for monitoring container processor usage:
| Metric | Description |
|---|---|
| container.cpu.usage.system | System CPU usage, as reported by Docker. |
| container.cpu.usage.total | Total CPU time consumed. |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux). |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux). |
| container.cpu.usage.percpu | Per-core CPU usage by the container. |
| container.cpu.throttling_data.periods | Number of periods with throttling active. |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit. |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled. |
| container.cpu.percent | Percent of CPU used by the container. |
Memory Metrics
Memory metrics help you understand RAM usage and identify memory leaks or pressure:
| Metric | Description |
|---|---|
| container.memory.usage.limit | Memory limit of the container. |
| container.memory.usage.total | Memory usage of the container. This excludes the cache. |
| container.memory.usage.max | Maximum memory usage. |
| container.memory.percent | Percentage of memory used. |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. |
| container.memory.rss | The amount of memory that doesn't correspond to anything on disk: stacks, heaps, and anonymous memory maps. |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup. |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup. |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup. |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group. |
| container.memory.swap | The amount of swap currently used by the processes in this cgroup. |
Block I/O Metrics
Block I/O metrics track disk read and write operations for your containers:
| Metric | Description |
|---|---|
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups. |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups. |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups. |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups. |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups. |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds. |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service. |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups. |
Network Metrics
Network metrics provide visibility into container network traffic and errors:
| Metric | Description |
|---|---|
| container.network.io.usage.rx_bytes | Bytes received by the container. |
| container.network.io.usage.tx_bytes | Bytes sent. |
| container.network.io.usage.rx_dropped | Incoming packets dropped. |
| container.network.io.usage.tx_dropped | Outgoing packets dropped. |
| container.network.io.usage.rx_errors | Received errors. |
| container.network.io.usage.tx_errors | Sent errors. |
| container.network.io.usage.rx_packets | Packets received. |
| container.network.io.usage.tx_packets | Packets sent. |
Prerequisites
Before getting started with Docker container monitoring, ensure you have:
- Docker installed and running on your system (Docker API version 1.25 or higher)
- Running Docker containers to monitor (or follow our setup below)
- Basic understanding of YAML configuration files
Choose Your Monitoring Backend
In this guide, we use Uptrace for examples, but you can use any OTLP-compatible backend.
For Uptrace:
- Sign up at uptrace.dev
- Get your DSN from Project → Data Source Name
For other backends:
- Configure the OTLP exporter endpoint in your collector config
- See the OpenTelemetry documentation for other exporters
Verify Your Docker Installation
Check your Docker version and API compatibility:
docker --version
# Docker version 24.0.0 or higher
docker version --format '{{.Server.APIVersion}}'
# Should return 1.25 or higher
Verify Docker is running:
docker ps
Start Sample Containers (Optional)
If you don't have containers running, start a few for testing:
# Start an Nginx web server
docker run -d --name nginx-test -p 8080:80 nginx:latest
# Start a MySQL database
docker run -d --name mysql-test \
-e MYSQL_ROOT_PASSWORD=mysecretpassword \
-p 3306:3306 mysql:latest
# Start Redis
docker run -d --name redis-test -p 6379:6379 redis:latest
Verify containers are running:
docker ps
You should see your containers listed with their status as "Up".
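You can also preview the kind of numbers the collector will gather by using Docker's built-in stats command:

# One-off snapshot of per-container CPU, memory, network, and block I/O
docker stats --no-stream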
Setting Up OpenTelemetry Collector
You have several options for running the OpenTelemetry Collector. Choose the one that best fits your environment.
Option 1: Run Collector Natively
Running the collector as a native binary is ideal for development and testing environments.
Step 1: Download OpenTelemetry Collector
Download the appropriate binary for your operating system from the OpenTelemetry Collector releases page. We'll use the contrib distribution which includes the Docker Stats receiver.
For Linux (amd64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_linux_amd64.tar.gz
For macOS (Apple Silicon/arm64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_darwin_arm64.tar.gz
For macOS (Intel/amd64):
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.137.0/otelcol-contrib_0.137.0_darwin_amd64.tar.gz
Step 2: Extract and Install
# Create directory
mkdir otelcol-contrib
# Extract the archive (adjust filename for your OS)
tar xvzf otelcol-contrib_0.137.0_*.tar.gz -C otelcol-contrib
# Navigate to the directory
cd otelcol-contrib
# Make binary executable
chmod +x otelcol-contrib
Step 3: Verify Installation
./otelcol-contrib --version
# Output: otelcol-contrib version v0.137.0
Optionally, move the binary to your system path for easier access:
sudo mv otelcol-contrib /usr/local/bin/
Option 2: Run Collector in Docker
Running the collector as a Docker container is often the preferred method for containerized environments. The OpenTelemetry project provides official Docker images on Docker Hub under otel/opentelemetry-collector-contrib.
Pull the OpenTelemetry Docker Image
OpenTelemetry provides official Docker images on Docker Hub and GitHub Container Registry. The opentelemetry-collector-contrib distribution includes all community-contributed receivers, processors, and exporters, including the Docker Stats receiver.
# From Docker Hub (recommended)
docker pull otel/opentelemetry-collector-contrib:0.137.0
# Or from GitHub Container Registry
docker pull ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.137.0
You can find all available versions on Docker Hub: otel/opentelemetry-collector-contrib.
Download and Verify the Image
After pulling the image, verify it's available:
# List Docker images
docker images | grep opentelemetry-collector
# Check image details
docker inspect otel/opentelemetry-collector-contrib:0.137.0
Basic Docker Run Command
Run the collector with a custom configuration:
docker run -d \
--name otel-collector \
-v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-p 4317:4317 \
-p 4318:4318 \
--restart unless-stopped \
otel/opentelemetry-collector-contrib:0.137.0
Important: The -v /var/run/docker.sock:/var/run/docker.sock:ro mount is required for Docker Stats receiver to access container metrics.
Understanding the Docker Run Parameters
- -d: Run the container in detached mode (background)
- --name: Assign a name for easy reference
- -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml: Mount your configuration file
- -v /var/run/docker.sock:/var/run/docker.sock:ro: Mount the Docker socket (read-only)
- -p 4317:4317: Expose the OTLP gRPC receiver port
- -p 4318:4318: Expose the OTLP HTTP receiver port
- --restart unless-stopped: Automatically restart the container on failure
Docker Socket Access on Different OS
Linux:
-v /var/run/docker.sock:/var/run/docker.sock:ro
macOS:
-v /var/run/docker.sock:/var/run/docker.sock:ro
Windows (Docker Desktop):
-v //var/run/docker.sock:/var/run/docker.sock:ro
Viewing Logs
Monitor collector logs to ensure it's working correctly:
# Follow logs in real-time
docker logs -f otel-collector
# View last 100 lines
docker logs --tail 100 otel-collector
Stopping and Removing
# Stop the container
docker stop otel-collector
# Remove the container
docker rm otel-collector
Option 3: Docker Compose Setup
For production-like environments or when running multiple services together, Docker Compose provides an elegant solution. This Docker Compose example shows how to set up the OpenTelemetry Collector for monitoring your Docker containers.
Basic OpenTelemetry Docker Compose Configuration
Create a docker-compose.yml file for the OpenTelemetry Collector:
version: '3.8'
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
container_name: otel-collector
command: ["--config=/etc/otelcol-contrib/config.yaml"]
volumes:
- ./config.yaml:/etc/otelcol-contrib/config.yaml
- /var/run/docker.sock:/var/run/docker.sock:ro
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "8888:8888" # Prometheus metrics (collector's own metrics)
- "13133:13133" # Health check
restart: unless-stopped
networks:
- monitoring
healthcheck:
test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
interval: 30s
timeout: 10s
retries: 3
networks:
monitoring:
driver: bridge
This OpenTelemetry Docker Compose example provides a production-ready setup with health checks and proper networking.
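Before starting the stack, you can have Docker Compose parse and validate the file; it prints the resolved configuration, or an error if the YAML is invalid:

# Validate and print the resolved Compose configuration
docker-compose config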
Complete OpenTelemetry Collector Docker Compose Example
For a complete monitoring stack, the following OpenTelemetry Docker Compose example includes the Collector and sample applications to demonstrate an end-to-end setup.
version: '3.8'
services:
# OpenTelemetry Collector
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
container_name: otel-collector
command: ["--config=/etc/otelcol-contrib/config.yaml"]
volumes:
- ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
- /var/run/docker.sock:/var/run/docker.sock:ro
ports:
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
environment:
- UPTRACE_DSN=${UPTRACE_DSN}
networks:
- monitoring
restart: unless-stopped
healthcheck:
test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
interval: 30s
timeout: 10s
retries: 3
# Sample Application (to generate telemetry)
sample-app:
image: nginx:latest
container_name: sample-app
ports:
- "8080:80"
networks:
- monitoring
labels:
app: "sample-nginx"
environment: "production"
# Another sample service
redis:
image: redis:latest
container_name: sample-redis
ports:
- "6379:6379"
networks:
- monitoring
labels:
app: "redis"
environment: "production"
networks:
monitoring:
driver: bridge
This Docker Compose example demonstrates a complete setup for OpenTelemetry monitoring with multiple services.
Starting the Stack
# Start all services
docker-compose up -d
# View logs from all services
docker-compose logs -f
# View logs from specific service
docker-compose logs -f otel-collector
# Check service status
docker-compose ps
Stopping the Stack
# Stop all services
docker-compose down
# Stop and remove volumes
docker-compose down -v
Environment Variables in Docker Compose
Use environment variables for sensitive data:
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:0.137.0
environment:
- UPTRACE_DSN=${UPTRACE_DSN}
env_file:
- .env
Create a .env file:
UPTRACE_DSN=your_dsn_here
Important: Add .env to your .gitignore to avoid committing secrets.
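Inside the collector configuration, the variable can then be referenced with the collector's ${env:...} syntax, so the DSN never appears in the YAML itself (Uptrace endpoint shown; adjust for your backend):

exporters:
  otlp:
    endpoint: api.uptrace.dev:4317
    headers:
      # resolved from the UPTRACE_DSN environment variable at startup
      uptrace-dsn: ${env:UPTRACE_DSN}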
Configuring Docker Stats Receiver
The Docker Stats receiver connects to the Docker daemon to collect container metrics. Create a configuration file to define how metrics are collected and exported.
Basic Configuration
For a minimal setup, create a file named config.yaml. This example exports to Uptrace, but you can replace the exporter with any OTLP-compatible backend:
receivers:
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 30s
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
processors:
batch:
timeout: 10s
service:
pipelines:
metrics:
receivers: [docker_stats]
processors: [batch]
exporters: [otlp]
Replace <YOUR_DSN> with your Uptrace DSN, or configure a different exporter for your monitoring backend.
This basic configuration:
- Collects Docker container metrics every 30 seconds
- Uses the batch processor to optimize data transmission
- Exports metrics to Uptrace via OTLP protocol
Production Configuration
For production environments, use this enhanced configuration with additional features:
receivers:
otlp:
protocols:
grpc:
http:
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 15s
timeout: 10s
api_version: "1.25"  # the receiver requires Docker API 1.25 or newer
# Enable additional metrics
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
container.network.io.usage.rx_errors:
enabled: true
container.network.io.usage.tx_errors:
enabled: true
container.network.io.usage.rx_packets:
enabled: true
container.network.io.usage.tx_packets:
enabled: true
container.cpu.usage.percpu:
enabled: true
# Filter out unwanted containers
excluded_images:
- undesired-container
- /.*undesired.*/
- another-*-container
# Map container labels to metric labels
container_labels_to_metric_labels:
my.container.label: my-metric-label
my.other.container.label: my-other-metric-label
com.docker.compose.service: service_name
com.docker.compose.project: project_name
# Map environment variables to metric labels
env_vars_to_metric_labels:
MY_ENVIRONMENT_VARIABLE: my-metric-label
MY_OTHER_ENVIRONMENT_VARIABLE: my-other-metric-label
exporters:
otlp:
endpoint: api.uptrace.dev:4317 # Replace with your backend endpoint
headers:
uptrace-dsn: '<YOUR_DSN>' # Or use your backend's auth method
processors:
resourcedetection:
detectors: [env, system]
cumulativetodelta:
batch:
timeout: 10s
send_batch_size: 1000
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
metrics:
receivers: [otlp, docker_stats]
processors: [resourcedetection, cumulativetodelta, batch]
exporters: [otlp]
This production configuration includes:
- Additional metrics for better observability (uptime, restarts, network errors)
- Container filtering to exclude test/development containers
- Label mapping to enrich metrics with custom labels
- Resource detection to add host information to metrics
- Both OTLP receivers for collecting traces from applications
Docker Socket Configuration
The Docker socket path varies by operating system:
For Linux:
The default socket path is /var/run/docker.sock
docker_stats:
endpoint: unix:///var/run/docker.sock
For macOS with Docker Desktop:
Check the socket location:
ls -la /var/run/docker.sock
# or
ls -la ~/.docker/run/docker.sock
Update your configuration accordingly if the path differs.
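For example, if the socket lives under your home directory, the receiver endpoint might look like this (the username is a placeholder):

docker_stats:
  # example macOS Docker Desktop socket path; replace "yourname" with your user
  endpoint: unix:///Users/yourname/.docker/run/docker.sock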
For Windows with Docker Desktop:
Docker uses a named pipe:
docker_stats:
endpoint: npipe:////./pipe/docker_engine
Alternative: Docker Daemon TCP Endpoint
If you prefer using a TCP endpoint instead of Unix socket, you can configure the Docker daemon to listen on a TCP port. This is useful for remote monitoring.
First, configure Docker daemon to listen on 0.0.0.0:2375, then adjust the OpenTelemetry Collector config:
receivers:
docker_stats:
endpoint: http://localhost:2375
Note: The Unix socket approach is more secure and should be preferred for local monitoring.
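As a sketch, one way to expose the daemon on TCP is to start dockerd with an additional -H flag; on systemd-managed hosts this normally goes into the docker.service unit or daemon.json rather than being run by hand, and an unauthenticated TCP listener should only be used on trusted networks:

# Keep the local socket and additionally listen on TCP port 2375 (unencrypted)
dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375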
Combining Multiple Receivers
The OpenTelemetry Collector can receive telemetry from multiple sources simultaneously, making it a central hub for all your observability data.
Docker Stats + OTLP Configuration
Monitor both Docker containers and application traces/metrics:
receivers:
# Collect Docker container metrics
docker_stats:
endpoint: unix:///var/run/docker.sock
collection_interval: 15s
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
# Receive application telemetry
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 10s
send_batch_size: 1024
resourcedetection:
detectors: [env, system, docker]
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
service:
pipelines:
# Pipeline for application traces
traces:
receivers: [otlp]
processors: [batch, resourcedetection]
exporters: [otlp]
# Pipeline for all metrics (Docker + Application)
metrics:
receivers: [otlp, docker_stats]
processors: [batch, resourcedetection]
exporters: [otlp]
# Pipeline for application logs
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
This configuration creates a complete observability pipeline:
- Docker Stats receiver collects container resource metrics
- OTLP receivers collect application traces, metrics, and logs
- All data is enriched with resource information
- Everything flows to a single backend (Uptrace)
Benefits of Combined Collection
- Unified View: Correlate application performance with container resource usage
- Simplified Architecture: One collector handles all telemetry types
- Consistent Processing: Apply the same processors to all data
- Reduced Overhead: Single agent instead of multiple monitoring tools
Running the Collector
Run in Foreground (for testing)
From the directory containing your config.yaml file:
./otelcol-contrib --config ./config.yaml
Or if you installed it to your system path:
otelcol-contrib --config ./config.yaml
You should see output indicating the collector has started:
2024-10-09T10:00:00.000Z info service/telemetry.go:84 Setting up own telemetry...
2024-10-09T10:00:00.100Z info service/service.go:143 Starting otelcol-contrib...
2024-10-09T10:00:00.200Z info extensions/extensions.go:42 Starting extensions...
2024-10-09T10:00:01.000Z info dockerstatsreceiver@v0.137.0/receiver.go:123 Starting docker_stats receiver
Press Ctrl+C to stop the collector.
Run in Background
To run the collector in the background:
# Start in background and redirect output to log file
./otelcol-contrib --config ./config.yaml &> otelcol-output.log & echo "$!" > otel-pid
# View logs in real-time
tail -f otelcol-output.log
# View last 50 lines of logs
tail -n 50 otelcol-output.log
# Stop the collector
kill "$(< otel-pid)"
Run as systemd Service (Linux)
For production deployments on Linux, create a systemd service for automatic startup and management.
Create a service file /etc/systemd/system/otelcol.service:
[Unit]
Description=OpenTelemetry Collector
After=network.target docker.service
Requires=docker.service
[Service]
Type=simple
User=otel
Group=docker
ExecStart=/usr/local/bin/otelcol-contrib --config /etc/otelcol/config.yaml
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.target
Create the otel user and copy your configuration:
# Create otel user
sudo useradd -r -s /bin/false otel
# Add otel user to docker group
sudo usermod -aG docker otel
# Create config directory
sudo mkdir -p /etc/otelcol
# Copy your config file
sudo cp config.yaml /etc/otelcol/
# Set permissions
sudo chown -R otel:otel /etc/otelcol
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable otelcol
sudo systemctl start otelcol
# Check status
sudo systemctl status otelcol
# View logs
sudo journalctl -u otelcol -f
Verify Data Collection
Within 30 seconds of starting the collector, you should see metrics appearing in your Uptrace dashboard:
- Navigate to your Uptrace instance
- Go to the Metrics section
- You should see Docker container metrics like:
  - container.cpu.usage.total
  - container.memory.usage.total
  - container.network.io.usage.rx_bytes
  - And many more
If metrics don't appear, check the Troubleshooting section below.
Monitoring with Uptrace
Once the metrics are collected and exported, you can visualize them using Uptrace dashboards. Uptrace provides powerful querying capabilities and customizable dashboards for analyzing your Docker container metrics.
Creating Dashboards
In the Uptrace dashboard:
- Navigate to Dashboards tab
- Click New Dashboard
- Add panels to visualize different metrics
You can create various types of visualizations:
- Time series charts for CPU and memory usage over time
- Gauges for current resource utilization
- Tables for listing containers and their current state
- Heatmaps for distribution analysis
Example Queries
Here are some useful queries to get started:
Average CPU usage per container:
avg(container.cpu.utilization) by container.name
Memory usage percentage:
(container.memory.usage.total / container.memory.usage.limit) * 100
Network throughput:
rate(container.network.io.usage.rx_bytes[5m]) + rate(container.network.io.usage.tx_bytes[5m])
Number of containers reporting metrics:
count(container.cpu.utilization) by container.name
Setting Up Alerts
Configure alerts to be notified of potential issues:
- In your dashboard panel, select Set Up Monitors, then Create Alerts
- Set conditions, for example:
- CPU usage > 80% for 5 minutes
- Memory usage > 90% for 2 minutes
- Container restarts > 3 in 10 minutes
- Configure notification channels (email, Slack, PagerDuty, etc.)
Common alert rules:
- High CPU utilization
- Memory pressure
- Excessive container restarts
- Network errors
- Disk I/O saturation
OpenTelemetry Backend
The OpenTelemetry Collector exports metrics to any OTLP-compatible backend. This guide uses Uptrace in its examples, but alternatives include:
- Prometheus + Grafana - Self-hosted metrics and visualization
- Grafana Cloud - Managed observability platform
- Datadog, New Relic - Commercial APM solutions
To switch backends, update the exporter configuration in your config.yaml.
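For example, to expose the collected metrics for a Prometheus server to scrape instead of pushing them over OTLP, you could switch to the contrib distribution's Prometheus exporter; a minimal sketch (the port is an example):

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # Prometheus scrapes the collector on this port

service:
  pipelines:
    metrics:
      receivers: [docker_stats]
      processors: [batch]
      exporters: [prometheus]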
Troubleshooting
Permission Denied: Cannot Access Docker Socket
Error:
Error: Get "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
Solution:
Add your user to the docker group:
sudo usermod -aG docker $USER
Then log out and log back in, or run:
newgrp docker
Alternatively, if running the collector in Docker, ensure the container has access to the socket.
For systemd service, ensure the service user is in the docker group (as shown in the systemd setup above).
Docker Socket Not Found
Error:
Error: dial unix /var/run/docker.sock: connect: no such file or directory
Solution:
Verify Docker is running:
docker ps
Check if the socket exists:
# Linux
ls -la /var/run/docker.sock
# macOS
ls -la /var/run/docker.sock
ls -la ~/.docker/run/docker.sock
Update your config.yaml with the correct socket path if it differs from the default.
No Metrics in Uptrace
If metrics aren't appearing in Uptrace after a few minutes:
1. Verify DSN is correct:
- Check that you've correctly copied your DSN from Uptrace Settings → Ingestion Settings
- Ensure there are no extra spaces or quotes around the DSN value
2. Check network connectivity:
# Test connection to Uptrace
curl -v https://api.uptrace.dev:4317
3. Review collector logs:
# If running in foreground, check the console output
# If running in background:
tail -f otelcol-output.log
# If running in Docker:
docker logs otel-collector
# If running as systemd:
sudo journalctl -u otelcol -f
Look for error messages related to exporters or authentication.
4. Enable debug logging:
Add to your config.yaml:
service:
telemetry:
logs:
level: debug
pipelines:
# ... your existing pipelines
Restart the collector and check logs for detailed output.
5. Verify firewall rules:
Ensure outbound connections to api.uptrace.dev:4317 are allowed.
Docker API Version Incompatibility
Error:
Error response from daemon: client version 1.41 is too new. Maximum supported API version is 1.40
Solution:
Check your Docker API version:
docker version --format '{{.Server.APIVersion}}'
Specify the API version in your config.yaml:
receivers:
docker_stats:
endpoint: unix:///var/run/docker.sock
api_version: "1.40" # Use your Docker's API version
High CPU Usage by Collector
If the OpenTelemetry Collector is consuming excessive CPU:
1. Increase collection interval:
docker_stats:
collection_interval: 60s # increase from the 15s-30s used in earlier examples
2. Disable expensive metrics:
docker_stats:
metrics:
container.cpu.usage.percpu:
enabled: false
container.blockio.io_service_bytes_recursive:
enabled: false
3. Optimize batch processing:
processors:
batch:
timeout: 30s
send_batch_size: 2000 # Increase batch size
4. Filter containers:
Use excluded_images to reduce the number of monitored containers:
docker_stats:
excluded_images:
- /.*test.*/
- /.*dev.*/
Collector Crashes or Restarts Frequently
Common causes and solutions:
1. Memory issues:
- Reduce collection frequency
- Decrease batch size
- Enable memory_limiter processor:
processors:
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
service:
pipelines:
metrics:
receivers: [docker_stats]
processors: [memory_limiter, batch]
exporters: [otlp]
2. Configuration errors:
- Validate your YAML syntax
- Check collector logs for specific error messages
- Test with minimal configuration first
3. Network issues:
Add retry configuration to exporter:
exporters:
otlp:
endpoint: api.uptrace.dev:4317
headers:
uptrace-dsn: '<YOUR_DSN>'
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
max_elapsed_time: 300s
Missing Expected Metrics
If some metrics are not appearing:
1. Check if metrics are enabled:
Many metrics are disabled by default. Enable them explicitly:
docker_stats:
metrics:
container.uptime:
enabled: true
container.restarts:
enabled: true
# ... enable other metrics as needed
2. Verify container is running:
The receiver only collects metrics from running containers.
3. Check cgroup version:
Some metrics are only available on cgroup v1 or v2. Check your system:
# Check cgroup version
mount | grep cgroup
If you're using cgroup v2, some v1-specific metrics won't be available.
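You can also check the filesystem type mounted at /sys/fs/cgroup, which identifies the version directly:

# Prints "cgroup2fs" on cgroup v2 hosts, "tmpfs" on cgroup v1
stat -fc %T /sys/fs/cgroup/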
Additional Resources
Documentation
Docker Images & Downloads
- Docker Hub
- GitHub Repo
- Latest image (not recommended for production): docker pull otel/opentelemetry-collector-contrib:latest
- Specific version (recommended): docker pull otel/opentelemetry-collector-contrib:0.137.0
- Binary downloads
- Package managers:
  - Homebrew (macOS): brew install opentelemetry-collector-contrib
  - APT (Debian/Ubuntu): see official docs
What's next?
With Docker container monitoring configured, you can track resource usage, container health, and application performance within your containerized environments. Scale up to Kubernetes monitoring for orchestrated deployments, or explore database monitoring with PostgreSQL and MySQL. For APM capabilities, compare top APM tools for container monitoring.