This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.
Database Capacity Planning: Sizing, Growth Forecasting, and Scaling
Database Capacity Planning: Sizing, Growth Forecasting, and Scaling
Capacity planning ensures your database has enough resources to handle current and future workloads without over-provisioning. It is a continuous process that combines monitoring, forecasting, and proactive scaling.
Key Capacity Metrics
Storage
Storage is the most predictable resource to plan. Track:
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Database sizes
SELECT datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Table sizes (top 10)
SELECT relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Growth by day
SELECT date(created_at) AS day,
count(*) AS rows_added,
count(*) * 200 AS estimated_bytes -- rough estimate
FROM orders
WHERE created_at > now() - interval '30 days'
GROUP BY day
ORDER BY day;
Compute (CPU)
CPU usage correlates with query complexity and concurrency:
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Queries with highest total CPU time
SELECT queryid, query,
total_exec_time,
calls,
mean_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Monitor: CPU utilization %, replication CPU usage, autovacuum CPU usage.
Memory
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Shared buffers usage
SELECT name, setting, unit,
current_setting(name)::numeric / pg_size_pretty('') AS ratio
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size',
'work_mem', 'maintenance_work_mem');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Hit ratio (should be >99%)
SELECT 'shared_buffers' AS area,
sum(blks_hit)::float / (sum(blks_hit) + sum(blks_read)) AS hit_ratio
FROM pg_stat_database;
Connections
SELECT max_conn.setting AS max_connections,
used_conn.count AS used_connections,
used_conn.count::float / max_conn.setting::int AS utilization_pct
FROM (SELECT setting FROM pg_settings WHERE name = 'max_connections') max_conn,
(SELECT count(*) AS count FROM pg_stat_activity) used_conn;
Growth Forecasting
Simple Linear Model
import psycopg2
from datetime import datetime, timedelta
import numpy as np
conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
Get daily row counts for last 90 days
cur.execute("""
SELECT date(created_at) AS day, count(*) AS rows
FROM orders
WHERE created_at > now() - interval '90 days'
GROUP BY day
ORDER BY day
""")
data = cur.fetchall()
days = np.array([(row[0] - data[0][0]).days for row in data])
rows = np.array([row[1] for row in data])
Linear regression
coefficients = np.polyfit(days, rows, 1)
daily_growth = coefficients[0]
Forecast: 90 days out
forecast_days = 90
current_total = sum(rows)
forecast_total = current_total + daily_growth * forecast_days
print(f"Daily growth: {daily_growth:.0f} rows")
print(f"Current monthly row count: {current_total}")
print(f"Forecast in 90 days: {forecast_total:.0f} rows")
Projecting Storage
avg_row_size_bytes = 250 # From pgstattuple
bytes_per_day = daily_growth * avg_row_size_bytes
gb_per_month = bytes_per_day * 30 / (1024**3)
current_gb = 10 # Current database size
months_to_full = (50 - current_gb) / gb_per_month # Assuming 50 GB limit
print(f"Growth: {gb_per_month:.1f} GB/month")
print(f"Time to 50 GB: {months_to_full:.0f} months")
Read the full article on AI Study Room for complete code examples, comparison tables, and related resources.
Found this useful? Check out more developer guides and tool comparisons on AI Study Room.
Top comments (0)