Building a monitoring environment on a local machine is a great weekend project, but scaling it up to look after a live fleet of remote servers requires shifts in how you handle configuration stability, dashboard variables, and persistence across reboots.
In this post, I want to walk through how I configured and optimized a multi-node monitoring stack using Prometheus, Node Exporter, and Grafana, deployed entirely via Docker Compose.
The Deployment Architecture

To keep things clean and modular, the entire monitoring core runs as separate containerized services. The telemetry layer relies on bind-mounts to guarantee that if a container is wiped or updated, the custom target definitions stay safe on disk. Here is the structural framework of the docker-compose.yml used to spin it up:

```yaml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    volumes:
      - ./prometheus:/etc/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
```
Solving the High-Availability Problem
A common issue with basic Docker deployments is that if the physical or virtual host undergoes a sudden reboot or power failure, your container instances drop offline into an Exited state.
By applying the restart: always policy to each of our services, the Docker daemon automatically relaunches the infrastructure as soon as the system initializes. No manual SSH intervention required.
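To verify the policy actually took effect, you can ask the Docker daemon what it recorded for a container. A quick sketch using the standard Docker CLI:

```sh
# Bring the stack up in the background
docker compose up -d

# Confirm the restart policy attached to the Prometheus container
docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' prometheus
# Expected output: always
```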
Scraping Multiple Remote Targets

Inside the prometheus.yml target profile, I pooled our infrastructure assets into distinct target blocks. Rather than hardcoding a separate job for every server, grouping identical server profiles under a single targets array makes filtering dramatically cleaner:

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'remote_ubuntu_nodes'
    static_configs:
      - targets:
          - '192.168.23.87:9100'
          - '192.168.23.88:9100'
          - '192.168.23.89:9100'
          - '192.168.23.90:9100'
```
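Before pointing Prometheus at a new target list, it's worth validating the file; the prom/prometheus image ships with promtool, so a throwaway container run against the bind-mounted directory catches YAML mistakes before they take the scraper down:

```sh
# Validate the config using the promtool bundled in the Prometheus image
docker run --rm -v "$(pwd)/prometheus:/etc/prometheus" \
  --entrypoint promtool prom/prometheus:latest \
  check config /etc/prometheus/prometheus.yml

# Apply the change without recreating the container (Prometheus reloads on SIGHUP)
docker kill --signal=HUP prometheus
```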
Transitioning to a Fleet View in Grafana
Standard configurations for public dashboards (like the classic Node Exporter Full) default to strict single-select filters. When you're checking on multiple nodes, like load balancers or app servers, clicking through an endless dropdown one host at a time isn't sustainable.
To move to a comprehensive fleet view, we can open Dashboard Settings in Grafana and adjust the query variables:
- Multi-value: Enabled
- Include All option: Enabled
To prevent the panels from blending the metrics into a confusing average, you can open the row options for your graphs and set Repeat for to the instance variable. Grafana will then dynamically duplicate that entire row of health metrics for every machine checking into the cluster.
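For reference, the variable backing that repeat looks roughly like this; the variable name instance is an assumption on my part, while label_values() is Grafana's standard templating function for Prometheus data sources:

```text
Name:               instance
Data source:        Prometheus
Query:              label_values(node_uname_info, instance)
Multi-value:        Enabled
Include All option: Enabled
```

With that in place, selecting several hosts from the dropdown and pointing the row's Repeat for option at instance stamps out one copy of the row per selected machine instead of averaging them together.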