PostgreSQL Partitioning for Multi-Tenant Audit Logs: Querying 100M Events Without Table Scans

#postgres #multitenancy #performance #compliance

PostgreSQL Partitioning for Multi-Tenant Audit Logs: Querying 100M Events Without Table Scans

I'll be direct: if you're running a SaaS with compliance requirements and your audit_logs table is approaching 50M rows, you're three months away from pain. I've watched audit queries go from 200ms to 8 seconds in production at 2am because someone ran a "give me all logs for tenant X" report. Partitioning isn't optimization theater—it's table-stakes infrastructure.

At CitizenApp, we store 9 months of audit logs across 50+ tenants. Without partitioning, a single compliance query would full-table scan 100M+ rows. With it, we hit the same data in <100ms. This post is exactly how we do it.

Why Partitioning Matters (The Reality Check)

Most developers treat audit_logs like any other table. You add an index on tenant_id and created_at, call it done, and move on. Then your compliance officer runs a query like:

SELECT * FROM audit_logs 
WHERE tenant_id = 'acme-corp' 
  AND created_at >= '2024-01-01'
ORDER BY created_at DESC;

At 50M rows, even with a composite index, PostgreSQL has to:

Index scan → finds millions of matching rows
Random I/O all over the table
Spill to disk if sorting is large
Hope the OS cache is warm

Partitioning solves this by eliminating the data you don't need from day one. Instead of scanning a 100GB table and filtering it down, PostgreSQL can skip entire partitions. A query against January 2024 data simply ignores partitions for February–December.

I prefer partitioning because it's native PostgreSQL—no external caching layer, no read replicas, no Redis gymnastics. It's boring infrastructure that works.

The Partitioning Strategy: Composite Partitioning (Range + List)

I use a two-level partitioning scheme:

Range partition by month (created_at) — keeps each partition to ~5–10GB
List subpartition by tenant — ensures compliance queries are single-partition scans

This is deliberately opinionated. You could do range-only, but then a multi-tenant query still scans the whole partition. You could do list-only, but then you can't efficiently age out old data.

Here's the schema:

-- Main partitioned table
CREATE TABLE audit_logs (
  id BIGSERIAL,
  tenant_id UUID NOT NULL,
  user_id UUID,
  action VARCHAR(50) NOT NULL,
  resource_type VARCHAR(100),
  resource_id VARCHAR(255),
  changes JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id, tenant_id, created_at)
) PARTITION BY RANGE (created_at);

-- January 2024 range partition
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
  PARTITION BY LIST (tenant_id);

-- Subpartitions for each tenant in January
CREATE TABLE audit_logs_2024_01_acme PARTITION OF audit_logs_2024_01
  FOR VALUES IN ('acme-corp-id-here');

CREATE TABLE audit_logs_2024_01_widget PARTITION OF audit_logs_2024_01
  FOR VALUES IN ('widget-inc-id-here');

-- Repeat for Feb, Mar, etc.
CREATE TABLE audit_logs_2024_02 PARTITION OF audit_logs
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
  PARTITION BY LIST (tenant_id);

Why this structure? Because a query like:

SELECT * FROM audit_logs 
WHERE tenant_id = 'acme-corp-id-here' 
  AND created_at >= '2024-01-15' 
  AND created_at < '2024-02-01';

...only touches audit_logs_2024_01_acme. PostgreSQL's constraint exclusion prunes all other partitions at planning time. No scanning, no guessing.

Automating Partition Creation

Creating partitions manually is insane. I use a stored procedure that runs monthly via a cron job (or GitHub Actions):

CREATE OR REPLACE FUNCTION create_monthly_audit_partitions(
  p_year INT,
  p_month INT,
  p_tenants UUID[]
)
RETURNS void AS $$
DECLARE
  v_table_name TEXT;
  v_start_date DATE;
  v_end_date DATE;
  v_tenant_id UUID;
BEGIN
  v_start_date := MAKE_DATE(p_year, p_month, 1);
  v_end_date := MAKE_DATE(p_year, p_month, 1) + INTERVAL '1 month';
  v_table_name := FORMAT('audit_logs_%s_%s', p_year, LPAD(p_month::TEXT, 2, '0'));

  -- Create range partition
  EXECUTE FORMAT(
    'CREATE TABLE IF NOT EXISTS %I PARTITION OF audit_logs
     FOR VALUES FROM (%L) TO (%L)
     PARTITION BY LIST (tenant_id)',
    v_table_name,
    v_start_date,
    v_end_date
  );

  -- Create subpartitions for each tenant
  FOREACH v_tenant_id IN ARRAY p_tenants
  LOOP
    EXECUTE FORMAT(
      'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
       FOR VALUES IN (%L)',
      FORMAT('%s_%s', v_table_name, SUBSTR(v_tenant_id::TEXT, 1, 8)),
      v_table_name,
      v_tenant_id
    );
  END LOOP;

END;
$$ LANGUAGE plpgsql;

Run it from your FastAPI worker or cron:

from sqlalchemy import text
from app.db import SessionLocal

def create_partitions_for_next_month():
    from datetime import datetime, timedelta

    db = SessionLocal()
    today = datetime.now()
    next_month = today + timedelta(days=32)

    # Get all active tenants
    tenants = db.execute(
        text("SELECT id FROM tenants WHERE active = true")
    ).scalars().all()

    db.execute(
        text("""
            SELECT create_monthly_audit_partitions(:year, :month, :tenants)
        """),
        {
            "year": next_month.year,
            "month": next_month.month,
            "tenants": [str(t) for t in tenants]
        }
    )
    db.commit()
    db.close()

# Schedule this monthly
# (via APScheduler, Celery, or GitHub Actions)

Query Example: Compliance Report (Still Fast at 100M Rows)

# FastAPI endpoint: export audit logs for compliance
@app.get("/api/v1/tenants/{tenant_id}/audit-export")
async def export_audit_logs(
    tenant_id: str,
    start_date: datetime,
    end_date: datetime,
    db: Session = Depends(get_db)
):
    # This query touches only the relevant partitions
    logs = db.query(AuditLog).filter(
        AuditLog.tenant_id == tenant_id,
        AuditLog.created_at >= start_date,
        AuditLog.created_at < end_date
    ).order_by(AuditLog.created_at.desc()).all()

    return [log.to_dict() for log in logs]

With partitioning, this runs in ~80ms even for a year of data. Without it, 8–12 seconds.

Gotcha: The Dreaded Default Partition

This burned me hard: if you don't explicitly list every tenant in your subpartitions, inserts fail silently or slow crawl into a DEFAULT partition.

-- DON'T do this:
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
  PARTITION BY LIST (tenant_id)
  DEFAULT;  -- ← This is a trap

New tenants will insert into the DEFAULT partition, which defeats the entire purpose. Instead, always create explicit subpartitions when onboarding a tenant:

async def create_tenant(tenant_name: str, db: Session):
    tenant = Tenant(name=tenant_name)
    db.add(tenant)
    db.flush()

    # Create subpartitions for this tenant in all existing month partitions
    from datetime import datetime
    today = datetime.now()

    for month_offset in range(-9, 1):  # Last 9 months + current
        dt = today.replace(day=1) - relativedelta(months=-month_offset)
        partition_name = f"audit_logs_{dt.year}_{dt.month:02d}"

        db.execute(text(f"""
            CREATE TABLE IF NOT EXISTS {partition_name}_{tenant.id.hex[:8]}
            PARTITION OF {partition_name}
            FOR VALUES IN ('{tenant.id}')
        """))

    db.commit()

Archival and Retention

Partitioning also makes data lifecycle management trivial. Drop old partitions instead of DELETE statements:

-- Drop all audit logs from January 2023 instantly
DROP TABLE audit_logs_2023_01;

This is O(1)—no table scan, no transaction log bloat. Compare to DELETE FROM audit_logs WHERE created_at < '2023-02-01', which locks the table and generates gigabytes of WAL.

Final Take

Partitioning isn't "optimization" in the sense of indexes or caching. It's correctness. A well-partitioned audit table is queryable, maintainable, and compliant. An unpartitioned