PostgreSQL Partitioning for Multi-Tenant Audit Logs: Querying 100M Events Without Table Scans
I'll be direct: if you're running a SaaS with compliance requirements and your audit_logs table is approaching 50M rows, you're three months away from pain. I've watched audit queries go from 200ms to 8 seconds in production at 2am because someone ran a "give me all logs for tenant X" report. Partitioning isn't optimization theater—it's table-stakes infrastructure.
At CitizenApp, we store 9 months of audit logs across 50+ tenants. Without partitioning, a single compliance query would full-table scan 100M+ rows. With it, we hit the same data in <100ms. This post is exactly how we do it.
Why Partitioning Matters (The Reality Check)
Most developers treat audit_logs like any other table. You add an index on tenant_id and created_at, call it done, and move on. Then your compliance officer runs a query like:
SELECT * FROM audit_logs
WHERE tenant_id = 'acme-corp'
AND created_at >= '2024-01-01'
ORDER BY created_at DESC;
At 50M rows, even with a composite index, PostgreSQL has to:
- Index scan → finds millions of matching rows
- Random I/O all over the table
- Spill to disk if sorting is large
- Hope the OS cache is warm
Partitioning solves this by eliminating the data you don't need from day one. Instead of scanning a 100GB table and filtering it down, PostgreSQL can skip entire partitions. A query against January 2024 data simply ignores partitions for February–December.
I prefer partitioning because it's native PostgreSQL—no external caching layer, no read replicas, no Redis gymnastics. It's boring infrastructure that works.
The Partitioning Strategy: Composite Partitioning (Range + List)
I use a two-level partitioning scheme:
-
Range partition by month (
created_at) — keeps each partition to ~5–10GB - List subpartition by tenant — ensures compliance queries are single-partition scans
This is deliberately opinionated. You could do range-only, but then a multi-tenant query still scans the whole partition. You could do list-only, but then you can't efficiently age out old data.
Here's the schema:
-- Main partitioned table
CREATE TABLE audit_logs (
id BIGSERIAL,
tenant_id UUID NOT NULL,
user_id UUID,
action VARCHAR(50) NOT NULL,
resource_type VARCHAR(100),
resource_id VARCHAR(255),
changes JSONB,
ip_address INET,
user_agent TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, tenant_id, created_at)
) PARTITION BY RANGE (created_at);
-- January 2024 range partition
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
PARTITION BY LIST (tenant_id);
-- Subpartitions for each tenant in January
CREATE TABLE audit_logs_2024_01_acme PARTITION OF audit_logs_2024_01
FOR VALUES IN ('acme-corp-id-here');
CREATE TABLE audit_logs_2024_01_widget PARTITION OF audit_logs_2024_01
FOR VALUES IN ('widget-inc-id-here');
-- Repeat for Feb, Mar, etc.
CREATE TABLE audit_logs_2024_02 PARTITION OF audit_logs
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
PARTITION BY LIST (tenant_id);
Why this structure? Because a query like:
SELECT * FROM audit_logs
WHERE tenant_id = 'acme-corp-id-here'
AND created_at >= '2024-01-15'
AND created_at < '2024-02-01';
...only touches audit_logs_2024_01_acme. PostgreSQL's constraint exclusion prunes all other partitions at planning time. No scanning, no guessing.
Automating Partition Creation
Creating partitions manually is insane. I use a stored procedure that runs monthly via a cron job (or GitHub Actions):
CREATE OR REPLACE FUNCTION create_monthly_audit_partitions(
p_year INT,
p_month INT,
p_tenants UUID[]
)
RETURNS void AS $$
DECLARE
v_table_name TEXT;
v_start_date DATE;
v_end_date DATE;
v_tenant_id UUID;
BEGIN
v_start_date := MAKE_DATE(p_year, p_month, 1);
v_end_date := MAKE_DATE(p_year, p_month, 1) + INTERVAL '1 month';
v_table_name := FORMAT('audit_logs_%s_%s', p_year, LPAD(p_month::TEXT, 2, '0'));
-- Create range partition
EXECUTE FORMAT(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF audit_logs
FOR VALUES FROM (%L) TO (%L)
PARTITION BY LIST (tenant_id)',
v_table_name,
v_start_date,
v_end_date
);
-- Create subpartitions for each tenant
FOREACH v_tenant_id IN ARRAY p_tenants
LOOP
EXECUTE FORMAT(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
FOR VALUES IN (%L)',
FORMAT('%s_%s', v_table_name, SUBSTR(v_tenant_id::TEXT, 1, 8)),
v_table_name,
v_tenant_id
);
END LOOP;
END;
$$ LANGUAGE plpgsql;
Run it from your FastAPI worker or cron:
from sqlalchemy import text
from app.db import SessionLocal
def create_partitions_for_next_month():
from datetime import datetime, timedelta
db = SessionLocal()
today = datetime.now()
next_month = today + timedelta(days=32)
# Get all active tenants
tenants = db.execute(
text("SELECT id FROM tenants WHERE active = true")
).scalars().all()
db.execute(
text("""
SELECT create_monthly_audit_partitions(:year, :month, :tenants)
"""),
{
"year": next_month.year,
"month": next_month.month,
"tenants": [str(t) for t in tenants]
}
)
db.commit()
db.close()
# Schedule this monthly
# (via APScheduler, Celery, or GitHub Actions)
Query Example: Compliance Report (Still Fast at 100M Rows)
# FastAPI endpoint: export audit logs for compliance
@app.get("/api/v1/tenants/{tenant_id}/audit-export")
async def export_audit_logs(
tenant_id: str,
start_date: datetime,
end_date: datetime,
db: Session = Depends(get_db)
):
# This query touches only the relevant partitions
logs = db.query(AuditLog).filter(
AuditLog.tenant_id == tenant_id,
AuditLog.created_at >= start_date,
AuditLog.created_at < end_date
).order_by(AuditLog.created_at.desc()).all()
return [log.to_dict() for log in logs]
With partitioning, this runs in ~80ms even for a year of data. Without it, 8–12 seconds.
Gotcha: The Dreaded Default Partition
This burned me hard: if you don't explicitly list every tenant in your subpartitions, inserts fail silently or slow crawl into a DEFAULT partition.
-- DON'T do this:
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
PARTITION BY LIST (tenant_id)
DEFAULT; -- ← This is a trap
New tenants will insert into the DEFAULT partition, which defeats the entire purpose. Instead, always create explicit subpartitions when onboarding a tenant:
async def create_tenant(tenant_name: str, db: Session):
tenant = Tenant(name=tenant_name)
db.add(tenant)
db.flush()
# Create subpartitions for this tenant in all existing month partitions
from datetime import datetime
today = datetime.now()
for month_offset in range(-9, 1): # Last 9 months + current
dt = today.replace(day=1) - relativedelta(months=-month_offset)
partition_name = f"audit_logs_{dt.year}_{dt.month:02d}"
db.execute(text(f"""
CREATE TABLE IF NOT EXISTS {partition_name}_{tenant.id.hex[:8]}
PARTITION OF {partition_name}
FOR VALUES IN ('{tenant.id}')
"""))
db.commit()
Archival and Retention
Partitioning also makes data lifecycle management trivial. Drop old partitions instead of DELETE statements:
-- Drop all audit logs from January 2023 instantly
DROP TABLE audit_logs_2023_01;
This is O(1)—no table scan, no transaction log bloat. Compare to DELETE FROM audit_logs WHERE created_at < '2023-02-01', which locks the table and generates gigabytes of WAL.
Final Take
Partitioning isn't "optimization" in the sense of indexes or caching. It's correctness. A well-partitioned audit table is queryable, maintainable, and compliant. An unpartitioned
Top comments (0)