# Database Monitoring Setup
Production-ready monitoring configurations for PostgreSQL, MySQL, and MongoDB — complete with Grafana dashboard JSON, Prometheus exporter configs, and alert rules that actually tell you why something is wrong, not just that it is. Deploy in under 30 minutes and stop getting woken up by false alarms.
## Key Features
- PostgreSQL monitoring via `pg_stat_statements`, `pg_stat_user_tables`, and custom diagnostic queries with pre-built Grafana panels
- MySQL Performance Schema dashboards tracking query latency histograms, InnoDB buffer pool hit rates, and replication lag
- MongoDB Atlas-compatible metrics collection, plus self-hosted `mongod` monitoring via `mongostat` and `mongotop` wrappers
- 18 Grafana dashboard panels organized into overview, query performance, replication health, and storage dashboards
- Alert rules with runbooks — every alert includes a severity, description, and step-by-step resolution procedure
- Prometheus exporters pre-configured: `postgres_exporter`, `mysqld_exporter`, and `mongodb_exporter` with secure credential handling
- Baseline capture scripts to establish normal operating ranges before setting thresholds
- Slack and PagerDuty notification templates for critical, warning, and informational alerts
## Quick Start

```bash
unzip database-monitoring-setup.zip
cd database-monitoring-setup/

# 1. Configure your database connection
cp config.example.yaml config.yaml
# Edit config.yaml with your database host, port, and credentials

# 2. Deploy the PostgreSQL exporter
docker run -d --name pg_exporter \
  -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://monitor:YOUR_PASSWORD_HERE@db-host:5432/postgres?sslmode=require" \
  prometheuscommunity/postgres-exporter

# 3. Import the Grafana dashboard
# In Grafana UI: Dashboards > Import > Upload JSON
# Use: dashboards/main.json

# 4. Load alert rules into Prometheus
cp alerts/rules.yml /etc/prometheus/rules.d/
systemctl reload prometheus
```

Verify the exporter is scraping:

```bash
curl -s http://localhost:9187/metrics | grep pg_stat_user_tables_n_tup_ins
# pg_stat_user_tables_n_tup_ins{datname="myapp",relname="orders"} 284710
```
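The exporter serves metrics, but Prometheus still needs a scrape job pointing at it before anything reaches the dashboards. A minimal fragment for `prometheus.yml` (the job name and target host are placeholders to adapt to your deployment):

```yaml
# prometheus.yml (fragment) -- register the exporter as a scrape target
scrape_configs:
  - job_name: "postgres"               # arbitrary job label
    scrape_interval: 15s               # matches config.example.yaml
    static_configs:
      - targets: ["localhost:9187"]    # pg_exporter container from step 2
```

Reload Prometheus after adding the job, then confirm the target shows as "UP" under Status > Targets.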
## Architecture / How It Works

```
database-monitoring-setup/
├── configs/
│   ├── development.yaml           # Dev environment settings
│   ├── production.yaml            # Production tuned settings
│   ├── exporters/
│   │   ├── postgres_exporter.yml  # Custom query definitions
│   │   ├── mysqld_exporter.cnf    # MySQL exporter config
│   │   └── mongodb_exporter.env   # MongoDB exporter environment
│   └── collectors/
│       ├── pg_diagnostic.sql      # Queries for pg_stat_*
│       ├── mysql_perf_schema.sql  # Performance Schema queries
│       └── mongo_serverStatus.js  # mongosh diagnostic script
├── dashboards/
│   └── main.json                  # Grafana dashboard panels
├── alerts/
│   ├── rules.yml                  # Prometheus alert rules
│   └── notification_templates/
│       ├── slack.tmpl
│       └── pagerduty.tmpl
└── config.example.yaml
```

Data flows: Database -> Exporter -> Prometheus -> Grafana dashboards + Alertmanager -> Slack/PagerDuty.
## Usage Examples

PostgreSQL: custom exporter query for cache hit ratio:

```yaml
# postgres_exporter custom queries
pg_cache_hit_ratio:
  query: |
    SELECT
      datname,
      ROUND(
        blks_hit::numeric / NULLIF(blks_hit + blks_read, 0) * 100, 2
      ) AS cache_hit_ratio
    FROM pg_stat_database
    WHERE datname NOT IN ('template0', 'template1')
  metrics:
    - datname:
        usage: "LABEL"
    - cache_hit_ratio:
        usage: "GAUGE"
        description: "Buffer cache hit ratio percentage"
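For the custom query to take effect, the exporter has to be told where the query file lives. One way to do that with the Docker deployment from the Quick Start is the exporter's `--extend.query-path` flag (deprecated in newer `postgres_exporter` releases in favor of the collector framework, but still widely used); the mount path below is illustrative:

```bash
# Mount the custom query file into the container and point the exporter at it
docker run -d --name pg_exporter \
  -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://monitor:YOUR_PASSWORD_HERE@db-host:5432/postgres?sslmode=require" \
  -v "$PWD/configs/exporters/postgres_exporter.yml:/queries.yml:ro" \
  prometheuscommunity/postgres-exporter \
  --extend.query-path=/queries.yml

# The new gauge should now appear on the metrics endpoint
curl -s http://localhost:9187/metrics | grep pg_cache_hit_ratio
```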
Prometheus alert rule for replication lag:

```yaml
# alerts/rules.yml
groups:
  - name: postgresql_replication
    rules:
      - alert: PostgreSQLReplicationLagHigh
        expr: pg_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag exceeds 30s on {{ $labels.instance }}"
          runbook: |
            1. Check replica status: SELECT * FROM pg_stat_replication;
            2. Verify network between primary and replica
            3. Check long-running txns: SELECT * FROM pg_stat_activity
               WHERE state = 'active'
               AND age(now(), xact_start) > interval '5 min';
            4. If WAL replay is stuck, check replica logs for errors
      - alert: PostgreSQLCacheHitRatioLow
        expr: pg_cache_hit_ratio < 95
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit ratio {{ $value }}% on {{ $labels.datname }}"
          runbook: |
            1. Check shared_buffers: SHOW shared_buffers;
            2. Compare to DB size: SELECT pg_size_pretty(pg_database_size(current_database()));
            3. Consider increasing shared_buffers (target: ~25% of RAM)
```
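Before copying the rule file into Prometheus, it can be validated offline with `promtool`, which ships with the Prometheus distribution. A syntax error in a rules file would otherwise only surface in the server logs after a reload:

```bash
# Validate rule syntax and expressions without touching the running server
promtool check rules alerts/rules.yml
```

This catches YAML indentation mistakes and malformed PromQL expressions before they reach production.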
MySQL Performance Schema diagnostic query:

```sql
-- Top 10 queries by total execution time
SELECT
  DIGEST_TEXT AS query_pattern,
  COUNT_STAR AS executions,
  ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_seconds,
  ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_ms,
  SUM_ROWS_EXAMINED AS rows_examined,
  SUM_ROWS_SENT AS rows_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```
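Note that the digest summary table accumulates since server start (or the last reset), so stale workload can dominate the ranking. To measure a specific window, the statistics can be reset first — be aware this clears the accumulated history for everyone:

```sql
-- Reset accumulated digest statistics, then re-run the top-10
-- query after a representative workload interval
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
```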
## Configuration

```yaml
# config.example.yaml
monitoring:
  engine: postgresql          # postgresql | mysql | mongodb
  host: db-host.example.com
  port: 5432
  monitor_user: monitor_ro    # dedicated read-only monitoring user
  monitor_password: YOUR_PASSWORD_HERE

prometheus:
  scrape_interval: 15s
  evaluation_interval: 15s
  retention: 30d

alerting:
  slack_webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/HERE
  pagerduty_key: YOUR_PAGERDUTY_KEY_HERE
  alert_cooldown: 10m         # suppress repeat alerts within window

grafana:
  datasource_name: Prometheus
  refresh_interval: 30s
  default_time_range: 6h
```
## Best Practices

- Create a dedicated monitoring user with `SELECT`-only permissions. Never monitor with the application user or a superuser.
- Set alert thresholds from baselines, not guesses. Run baseline capture for 7 days, then set thresholds at p95 + 20%.
- Alert on symptoms, not causes. "Query latency p99 > 500ms" is actionable. "CPU > 80%" often is not.
- Include runbooks in every alert annotation. At 3 AM, the on-call engineer needs steps, not a dashboard link.
- Monitor the monitors. Set a dead-man's switch alert that fires if Prometheus stops receiving metrics for > 5 minutes.
- Rotate exporter credentials on the same schedule as application credentials.
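The dead-man's switch above is commonly implemented as an always-firing alert routed to an external heartbeat service (such as a PagerDuty or healthchecks-style endpoint) that pages when the signal *stops* arriving — if Prometheus or Alertmanager dies, the heartbeat goes silent. A sketch of the rule side:

```yaml
# alerts/rules.yml (fragment) -- intentionally fires at all times;
# the receiving heartbeat service alerts when this goes silent
groups:
  - name: meta_monitoring
    rules:
      - alert: DeadMansSwitch
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: "Heartbeat: Prometheus alerting pipeline is alive"
```

The Alertmanager route that delivers this heartbeat to the external service is deployment-specific and not shown here.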
## Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| Exporter shows 0 for all metrics | Monitor user lacks `pg_monitor` role | `GRANT pg_monitor TO monitor_ro;` |
| Grafana dashboard shows "No Data" | Prometheus datasource misconfigured | Verify datasource URL in Grafana > Configuration > Data Sources |
| Alerts firing immediately after deploy | Thresholds too tight for your workload | Run baseline capture for 7 days, then adjust |
| `pg_stat_statements` empty | Extension not loaded | `CREATE EXTENSION pg_stat_statements;` and add to `shared_preload_libraries` |
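The `pg_monitor` fix assumes the read-only user already exists. Creating it from scratch on PostgreSQL 10+ (where the built-in `pg_monitor` role is available) looks roughly like this — the role name matches `config.example.yaml`:

```sql
-- Dedicated read-only monitoring role (PostgreSQL 10+)
CREATE USER monitor_ro WITH PASSWORD 'YOUR_PASSWORD_HERE';

-- pg_monitor grants read access to pg_stat_* views and related functions
GRANT pg_monitor TO monitor_ro;

-- Allow connecting to the database the exporter targets
GRANT CONNECT ON DATABASE postgres TO monitor_ro;
```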
This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete [Database Monitoring Setup] with all files, templates, and documentation for $39.
Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.