From Zero to Observability: Building a Production-Grade Monitoring Stack with Prometheus & Grafana
Introduction
In today’s cloud-native world, monitoring isn’t optional — it’s essential. Whether you’re running a small side project or managing enterprise infrastructure, you need visibility into your systems. But setting up monitoring shouldn’t require weeks of configuration and a PhD in DevOps.
In this comprehensive guide, I’ll walk you through building a production-ready monitoring stack using three powerful open-source tools:
- Docker for containerization
- Prometheus for metrics collection
- Grafana for visualization
By the end of this tutorial, you’ll have:
- A fully functional monitoring stack running in containers
- Real-time system metrics from your infrastructure
- Beautiful, interactive dashboards
- Knowledge to extend and customize for your needs
Time to complete: 30 minutes
Skill level: Beginner to Intermediate
Prerequisites: Basic command-line knowledge, Docker installed
Why This Stack?
The Problem
Traditional monitoring setups are often:
- Complex: multiple services, complicated configurations
- Expensive: enterprise solutions cost thousands per month
- Inflexible: vendor lock-in limits customization
- Hard to scale: difficult to add new metrics or exporters
The Solution
Our stack solves these problems:
- Simple: deploy everything with one command
- Free & open source: no licensing costs
- Highly customizable: full control over metrics and dashboards
- Scalable: easy to add exporters and federate Prometheus
Architecture Overview
Here's what we're building:

**Components:**

- Prometheus: collects and stores time-series metrics
- Grafana: creates beautiful dashboards and visualizations
- Node Exporter: exposes system-level metrics (CPU, RAM, disk)
- Application Exporter: custom metrics from your applications
Part 1: Setting Up the Foundation
Step 1: Prepare Your Environment
First, ensure you have Docker and Docker Compose installed:
```bash
# Check Docker version
docker --version
# Docker version 20.10.0 or higher required

# Check Docker Compose version
docker-compose --version
# Docker Compose version 2.20.0 or higher recommended
```
Step 2: Create Project Structure
```bash
# Create project directory
mkdir monitoring-stack && cd monitoring-stack

# Create necessary directories
mkdir -p prometheus grafana src
```
Step 3: Configure Prometheus
Create prometheus/prometheus.yml:
```yaml
global:
  scrape_interval: 15s      # Scrape targets every 15 seconds
  evaluation_interval: 15s  # Evaluate rules every 15 seconds

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter: system metrics
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['host.docker.internal:9100']
    scrape_interval: 15s

  # Custom application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['host.docker.internal:8000']
    metrics_path: '/metrics'
    scrape_interval: 5s
```
## What’s happening here?
- `scrape_interval`: how often Prometheus collects metrics
- `job_name`: logical grouping for targets
- `targets`: where to find metrics endpoints
- `host.docker.internal`: lets containers reach the host machine
## Part 2: Docker Compose Configuration
Create `docker-compose.yml` in your project root:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9091:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    restart: unless-stopped
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      - prometheus

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
```
## Key Configuration Details
1. Ports: Prometheus on 9091, Grafana on 3000
2. Volumes: Persist data even if containers restart
3. Networks: Isolated bridge network for service communication
4. Retention: Keep metrics for 30 days
5. Restart Policy: Automatically restart on failure
## Part 3: Installing Node Exporter
Node Exporter provides system-level metrics. Install it on your host machine:
```bash
# Create a dedicated user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download Node Exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

# Extract and install
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
```
Create a systemd service at `/etc/systemd/system/node_exporter.service`:
```ini
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
```
Start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Verify it's running
curl http://localhost:9100/metrics | head -20
```
You should see metrics output like:
```
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 890.12
```
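The exposition format above is plain text, which makes it easy to sanity-check with a few lines of code. As an illustration, here is a minimal, simplified parser for a single sample line; it ignores label-value escaping, timestamps, and the `HELP`/`TYPE` metadata:

```python
import re

def parse_metric_line(line):
    """Parse one sample line of the Prometheus text exposition format
    into (name, labels, value). Simplified: skips comments and blank
    lines, and ignores escaping and optional timestamps.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = re.match(r'([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)', line)
    if m is None:
        return None
    name, labelstr, value = m.groups()
    labels = dict(re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"', labelstr or ''))
    return name, labels, float(value)

name, labels, value = parse_metric_line(
    'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67')
```

This is handy for quick smoke tests of your own exporters before wiring them into Prometheus.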
Part 4: Creating a Custom Metrics Exporter
Let’s create a simple Python application that exposes custom metrics.
Create src/metrics_exporter.py:
```python
#!/usr/bin/env python3
"""
Simple Prometheus Metrics Exporter

Demonstrates how to instrument your applications.
"""
from prometheus_client import start_http_server, Counter, Gauge, Histogram
import psutil
import random
import time

# Define application metrics
request_count = Counter(
    'app_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

active_users = Gauge(
    'app_active_users',
    'Number of active users'
)

response_time = Histogram(
    'app_response_time_seconds',
    'Response time in seconds',
    buckets=[0.1, 0.5, 1.0, 2.0, 5.0]
)

# System metrics
cpu_gauge = Gauge('system_cpu_percent', 'CPU usage percentage')
memory_gauge = Gauge('system_memory_percent', 'Memory usage percentage')
disk_gauge = Gauge('system_disk_percent', 'Disk usage percentage')


def collect_system_metrics():
    """Collect system metrics using psutil."""
    cpu_gauge.set(psutil.cpu_percent(interval=1))
    memory_gauge.set(psutil.virtual_memory().percent)
    disk_gauge.set(psutil.disk_usage('/').percent)


def simulate_application_activity():
    """Simulate application metrics for demo purposes."""
    methods = ['GET', 'POST', 'PUT', 'DELETE']
    endpoints = ['/api/users', '/api/orders', '/api/products']
    statuses = [200, 201, 400, 404, 500]

    # Simulate a request
    method = random.choice(methods)
    endpoint = random.choice(endpoints)
    status = random.choices(statuses, weights=[85, 10, 3, 1, 1])[0]
    request_count.labels(method=method, endpoint=endpoint, status=status).inc()

    # Simulate response time
    response_time.observe(random.uniform(0.05, 2.0))

    # Update active users
    active_users.set(random.randint(10, 100))


def main():
    """Main exporter loop."""
    # Start metrics server on port 8000
    PORT = 8000
    start_http_server(PORT)
    print(f"Metrics server started on port {PORT}")
    print(f"Metrics available at http://localhost:{PORT}/metrics")

    # Refresh metrics until interrupted
    while True:
        collect_system_metrics()
        simulate_application_activity()
        time.sleep(5)


if __name__ == '__main__':
    main()
```
Create `requirements.txt`:
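The original file is shown only as an image; from the script's imports, it needs at least these two packages (pin versions as appropriate for your environment):

```
prometheus-client
psutil
```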
Create `start_exporter.sh`:

```bash
#!/usr/bin/env bash

# Check if Python is installed
if ! command -v python3 &> /dev/null; then
    echo "Python 3 is not installed"
    exit 1
fi

# Install dependencies
pip3 install -r requirements.txt

# Start the exporter
python3 src/metrics_exporter.py
```
Part 5: Launching the Stack
Now we're ready to start everything:

```bash
# Start Prometheus and Grafana
docker-compose up -d

# Check if containers are running
docker-compose ps
```

Expected output:

```
NAME         IMAGE                     STATUS
grafana      grafana/grafana:latest    Up
prometheus   prom/prometheus:latest    Up
```
## Access your services
- Prometheus: http://localhost:9091
- Grafana: http://localhost:3000 (admin / admin)
- Node Exporter metrics: http://localhost:9100/metrics
- Application metrics: http://localhost:8000/metrics
## Part 6: Configuring Grafana
**Step 1: Add Prometheus as a Data Source**
1. Open Grafana at http://localhost:3000
2. Log in with `admin` / `admin` (change the password when prompted)
3. Go to Configuration → Data Sources
4. Click Add data source
5. Select Prometheus
6. Set URL: `http://prometheus:9090`
7. Click Save & Test

You should see: "Data source is working"
**Step 2: Import a Dashboard**
1. Go to Dashboards → Import
2. Enter dashboard ID: 1860 (Node Exporter Full)
3. Click Load
4. Select Prometheus as the data source
5. Click Import
You now have a beautiful dashboard showing:
- CPU usage across all cores
- Memory utilization
- Disk space and I/O
- Network traffic
- System load
## Part 7: Creating Custom Dashboards
Let’s create a custom dashboard for our application metrics.
**Step 1: Create a New Dashboard**
Click **+** → **Create Dashboard**
Click **Add new panel**
Step 2: Add a Request Rate Panel
Query:

```promql
rate(app_requests_total[5m])
```

Panel settings:
- Title: "HTTP Request Rate"
- Visualization: Time series
- Legend: `{{method}} {{endpoint}}`
Step 3: Add Active Users Panel
Query:

```promql
app_active_users
```

Panel settings:
- Title: "Active Users"
- Visualization: Stat
- Color: based on thresholds (green < 50, yellow < 80, red >= 80)
Step 4: Add Response Time Panel
Query:

```promql
histogram_quantile(0.95, rate(app_response_time_seconds_bucket[5m]))
```

Panel settings:
- Title: "95th Percentile Response Time"
- Visualization: Gauge
- Unit: seconds
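To build intuition for what `histogram_quantile` computes, here is a small Python sketch of the same idea: find the first cumulative bucket that covers the target rank, then interpolate linearly inside it. The real PromQL function handles edge cases (such as the `+Inf` bucket) more carefully, so treat this as a simplified model:

```python
def histogram_quantile(q, buckets):
    """Approximate quantile q from cumulative histogram buckets.

    buckets: sorted list of (upper_bound, cumulative_count) pairs,
    conventionally ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    target = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= target:
            if bound == float('inf'):
                return prev_bound  # quantile falls in the open-ended bucket
            # Linear interpolation within the bucket
            return prev_bound + (bound - prev_bound) * (target - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Buckets matching the exporter above: 95 of 100 requests finish within 1.0 s
buckets = [(0.1, 50), (0.5, 80), (1.0, 95), (2.0, 99), (5.0, 100), (float('inf'), 100)]
p95 = histogram_quantile(0.95, buckets)
```

This also explains why bucket boundaries matter: the estimate can only be as precise as the buckets you configured in the exporter.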
Step 5: Add CPU Usage Panel
Query:

```promql
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

Panel settings:
- Title: "CPU Usage %"
- Visualization: Graph
- Thresholds: yellow at 60%, red at 80%
Click Save dashboard and give it a name like “Application Monitoring”.
Part 8: Understanding PromQL
Prometheus Query Language (PromQL) is powerful. Here are essential queries:
Basic Queries

```promql
# Get current value
node_memory_MemTotal_bytes

# Rate of change over 5 minutes
rate(node_cpu_seconds_total[5m])

# Average across all instances
avg(node_load1)
```
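`rate()` is worth understanding precisely: it computes the per-second increase of a counter over the window, treating any decrease as a counter reset. A simplified Python model of that logic (the real implementation also extrapolates to the window boundaries):

```python
def simple_rate(samples, window_seconds):
    """Per-second increase of a counter from (timestamp, value) samples,
    handling counter resets the way rate() does: a drop in value means
    the counter restarted from zero, so the new value is pure increase.
    """
    increase = 0.0
    for (_, v0), (_, v1) in zip(samples, samples[1:]):
        increase += (v1 - v0) if v1 >= v0 else v1
    return increase / window_seconds

# A counter growing by 30 every 15 s yields a rate of 2 per second
two_per_sec = simple_rate([(0, 0), (15, 30), (30, 60)], 30)
```

This is also why you apply `rate()` to counters but never to gauges: a gauge going down is real data, not a reset.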
Part 9: Setting Up Alerts
Alerts notify you when things go wrong. Let’s configure some.
Create prometheus/alerts.yml:
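The original rules file is shown only as an image; here is a representative example (alert names and thresholds are illustrative, adjust them to your environment):

```yaml
groups:
  - name: system_alerts
    rules:
      # Fire when average CPU usage stays above 80% for 5 minutes
      - alert: HighCPUUsage
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      # Fire when a scrape target has been down for 2 minutes
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} is down"
```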
Update prometheus/prometheus.yml to include alerts:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load alert rules
rule_files:
  - '/etc/prometheus/alerts.yml'

scrape_configs:
  # ... (existing scrape configs)
```
Update docker-compose.yml to mount the alerts file:
```yaml
prometheus:
  # ... (existing config)
  volumes:
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml
    - prometheus_data:/prometheus
```
Restart Prometheus:
```bash
docker-compose restart prometheus
```
Check alerts at http://localhost:9091/alerts
## Part 10: Production Best Practices
**Security**
**1. Change Default Passwords**
Update `docker-compose.yml`:
```yaml
grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
```

Create a `.env` file:

```
GRAFANA_PASSWORD=your_secure_password_here
```
**2. Use Read-Only Volumes**
```yaml
volumes:
  - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
```
**3. Run as Non-Root User**
Resource Limits
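The original limits snippet is not shown; recent Docker Compose versions honor the `deploy.resources` section even outside Swarm, so a sketch might look like this (the limit values are illustrative):

```yaml
services:
  prometheus:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
```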
Backup Strategy
```bash
# Backup Prometheus data
docker run --rm \
  -v prometheus_data:/data \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/prometheus-$(date +%Y%m%d).tar.gz /data

# Backup Grafana data
docker run --rm \
  -v grafana_data:/data \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/grafana-$(date +%Y%m%d).tar.gz /data
```
High Availability
For production, consider:
Prometheus Federation — Multiple Prometheus instances
Thanos — Long-term storage and global view
Grafana HA — Multiple Grafana instances behind load balancer
Part 11: Troubleshooting Common Issues
**Issue 1: Container Won't Start**

```bash
# Check logs
docker-compose logs prometheus
docker-compose logs grafana
```

Common causes:
- Port already in use
- Configuration file syntax error
- Insufficient permissions
**Issue 2: Grafana Can't Connect to Prometheus**

Problem: the data source test fails.

Solution: from inside the Docker network, use the container name, not localhost:

```
# Correct
URL: http://prometheus:9090

# Wrong
URL: http://localhost:9091
```
**Issue 3: No Metrics Showing**

```bash
# Check Prometheus targets
curl http://localhost:9091/api/v1/targets | jq

# Verify exporters are reachable
curl http://localhost:9100/metrics
curl http://localhost:8000/metrics
```
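You can also check targets programmatically through the Prometheus HTTP API. A small stdlib-only sketch: `query_prometheus` hits the standard `/api/v1/query` endpoint, and `healthy_targets` (a helper name of my own) counts series where `up` equals 1:

```python
import json
import urllib.parse
import urllib.request

def query_prometheus(expr, base_url='http://localhost:9091'):
    """Run an instant query against the Prometheus HTTP API."""
    url = f'{base_url}/api/v1/query?query={urllib.parse.quote(expr)}'
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def healthy_targets(api_response):
    """Count series in an 'up' query result that report the value 1."""
    return sum(1 for series in api_response['data']['result']
               if series['value'][1] == '1')

# With the stack running: healthy_targets(query_prometheus('up'))
```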
**Issue 4: Data Not Persisting**

```bash
# Check volume mounts
docker inspect prometheus | grep -A 10 Mounts

# Fix permissions (Prometheus runs as UID 65534)
sudo chown -R 65534:65534 prometheus_data/
```
## Part 12: Extending Your Stack
Add MySQL Monitoring
Add Nginx Monitoring
Add Redis Monitoring
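Each of these follows the same two-step pattern: run the matching exporter and add a scrape job. For example, Redis monitoring with the community `redis_exporter` image might look like this (the image name and port 9121 are that exporter's conventional defaults; verify against its documentation):

```yaml
# docker-compose.yml addition
redis_exporter:
  image: oliver006/redis_exporter:latest
  environment:
    - REDIS_ADDR=redis://redis:6379
  ports:
    - "9121:9121"
  networks:
    - monitoring
```

```yaml
# prometheus.yml addition
- job_name: 'redis'
  static_configs:
    - targets: ['redis_exporter:9121']
```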
## Part 13: Real-World Use Cases
**Use Case 1: E-commerce Platform**
Metrics to track:
1. Order processing rate
2. Payment gateway latency
3. Inventory stock levels
4. User cart abandonment rate
**Sample custom metrics:**
```python
from prometheus_client import Counter, Gauge, Histogram

orders_total = Counter('orders_total', 'Total orders', ['status'])
payment_duration = Histogram('payment_duration_seconds', 'Payment processing time')
inventory_stock = Gauge('inventory_stock', 'Product stock level', ['product_id'])
```
**Use Case 2: API Service**

Metrics to track:
- Request rate per endpoint
- Response time percentiles
- Error rates by status code
- Rate limiting hits
PromQL Queries:
```promql
# Requests per second by endpoint
sum by (endpoint) (rate(api_requests_total[1m]))

# 99th percentile latency
histogram_quantile(0.99, rate(api_duration_seconds_bucket[5m]))

# Error rate
sum(rate(api_requests_total{status=~"5.."}[5m])) / sum(rate(api_requests_total[5m]))
```
**Use Case 3: Batch Processing Pipeline**

Metrics to track:
- Job completion time
- Records processed per minute
- Failed jobs count
- Queue depth
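Instrumenting such a pipeline with `prometheus_client` takes only a few lines; a sketch (the metric names here are hypothetical, not from the original):

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

registry = CollectorRegistry()

records_processed = Counter(
    'batch_records_processed_total', 'Records processed', registry=registry)
failed_jobs = Counter(
    'batch_failed_jobs_total', 'Jobs that raised an error', registry=registry)
queue_depth = Gauge(
    'batch_queue_depth', 'Jobs waiting in the queue', registry=registry)
job_duration = Histogram(
    'batch_job_duration_seconds', 'Job completion time', registry=registry)

def run_job(records):
    """Process one batch job, recording duration, throughput, and failures."""
    try:
        with job_duration.time():  # observes elapsed seconds on exit
            for record in records:
                records_processed.inc()
    except Exception:
        failed_jobs.inc()
        raise
```

Expose these with `start_http_server` as in Part 4 and add a scrape job pointing at the pipeline host.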
Part 14: Performance Optimization
Optimize Prometheus Storage
Optimize Scrape Intervals
Use Recording Rules for Expensive Queries
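The recording-rule file is shown only as an image; a minimal example that precomputes the request rate under a new metric name:

```yaml
groups:
  - name: api_recording_rules
    interval: 30s
    rules:
      - record: job:api_request_rate:5m
        expr: rate(api_requests_total[5m])
```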
Then use the pre-computed metrics:
```promql
# Instead of this expensive query:
rate(api_requests_total[5m])

# Use this:
job:api_request_rate:5m
```
Conclusion
You’ve built a complete monitoring stack from scratch. Here’s what you’ve accomplished:
- Deployed a containerized monitoring infrastructure
- Configured Prometheus to collect metrics
- Created beautiful Grafana dashboards
- Instrumented a custom application
- Set up alerts for critical issues
- Learned PromQL for advanced queries
- Applied production best practices
Key Takeaways
- Docker makes deployment simple: one command starts everything
- Prometheus is powerful: time-series data with flexible querying
- Grafana is beautiful: create stunning, informative dashboards
- Monitoring is essential: know what's happening in your systems
- Start simple, extend gradually: add exporters as you need them
Next Steps
- Deploy to production: use Docker Swarm or Kubernetes
- Add more exporters: monitor databases, message queues, etc.
- Implement alerting: connect to Slack, PagerDuty, or email
- Long-term storage: integrate Thanos for infinite retention
- Advanced dashboards: create business-specific metrics
Resources
- GitHub repository: https://github.com/abidaslam892/Grafana-Prometheus-Monitoring-Deployment-
- Prometheus docs: https://prometheus.io/docs/
- Grafana dashboards: https://grafana.com/grafana/dashboards/
- PromQL guide: https://prometheus.io/docs/prometheus/latest/querying/basics/
- Docker docs: https://docs.docker.com/
Questions?
Feel free to reach out in the comments below! I’d love to hear:
What are you monitoring?
What challenges did you face?
What metrics matter most to your business?
If this guide helped you, please:
- ⭐ Star the GitHub repository
- 👏 Clap for this article
- 🔗 Share with your team
- 💬 Leave a comment
Happy monitoring! 📊
#Docker #Prometheus #Grafana #DevOps #Monitoring #Kubernetes #CloudNative #SRE #Infrastructure #Tutorial