ahmet gedik

Posted on Jun 26

Designing a Multi-Tenant Video Platform Schema in PostgreSQL with RLS

#postgres #multitenancy #gdpr #backend

Last quarter we onboarded three white-label partners onto our viral video discovery engine, and within a week I hit the wall every multi-tenant system eventually hits: a partner's analytics query returned 400 rows that belonged to another partner. Nobody got hurt — it was a staging leak caught by an integration test — but it was the kind of bug that ends companies when it happens in production under GDPR. At ViralVidVault we track trending European videos across dozens of branded front-ends, and every one of those front-ends is legally a separate data controller. A row leaking across tenant boundaries isn't a UX glitch; it's a reportable data breach. This article is the schema design I wish I'd had on day one: how to model tenants, videos, and analytics in PostgreSQL so that isolation is enforced by the database, not by hopeful WHERE clauses scattered across an application.

The three isolation models and why I picked shared-schema

Before writing a single CREATE TABLE, you have to commit to an isolation strategy, because it shapes everything downstream. There are three real options:

Database-per-tenant: every tenant gets its own PostgreSQL database. Maximum isolation, trivial per-tenant backup and deletion (great for GDPR erasure), but connection-pool explosion and migration pain. Running 200 migrations across 200 databases is a bad afternoon.
Schema-per-tenant: one database, a PostgreSQL schema per tenant. Decent isolation, but search_path juggling gets ugly and the pg_catalog bloats once you pass a few hundred tenants.
Shared-schema with a tenant discriminator: every table carries a tenant_id column, and you enforce isolation with Row-Level Security. One schema, one migration, one connection pool.

For a viral-video platform the access pattern is read-heavy and cross-tenant in aggregate — I want to compute "what's trending across all of Europe" while still keeping each partner's first-party analytics walled off. Shared-schema fits that, as long as the isolation is enforced at the database layer. The mistake people make is choosing shared-schema and then enforcing isolation in application code. That's the model that leaked 400 rows on me. PostgreSQL's Row-Level Security (RLS) moves the boundary into the engine, where a forgotten WHERE clause can't bypass it.

The core schema

Here's the foundation. Note that tenant_id is the first column of every composite primary key — this matters enormously for index locality, which I'll come back to.

-- Tenants are the data-controller boundary. One row per branded front-end.
CREATE TABLE tenants (
    id           uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    slug         text NOT NULL UNIQUE,
    region       text NOT NULL CHECK (region IN ('EU','UK','EEA')),
    data_region  text NOT NULL DEFAULT 'eu-central-1',
    created_at   timestamptz NOT NULL DEFAULT now()
);

-- Videos discovered from upstream sources, scoped to the tenant that surfaced them.
CREATE TABLE videos (
    tenant_id    uuid NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    id           uuid NOT NULL DEFAULT gen_random_uuid(),
    source_id    text NOT NULL,              -- e.g. the upstream platform's video id
    title        text NOT NULL,
    lang         text NOT NULL,
    published_at timestamptz NOT NULL,
    trend_score  numeric(8,4) NOT NULL DEFAULT 0,
    PRIMARY KEY (tenant_id, id),
    UNIQUE (tenant_id, source_id)
);

-- First-party engagement events. This is the GDPR-sensitive table.
CREATE TABLE view_events (
    tenant_id   uuid NOT NULL,
    video_id    uuid NOT NULL,
    occurred_at timestamptz NOT NULL DEFAULT now(),
    -- A salted daily hash, NOT a raw IP. More on this below.
    visitor_key bytea NOT NULL,
    country     text NOT NULL,
    watch_ms    integer NOT NULL CHECK (watch_ms >= 0),
    FOREIGN KEY (tenant_id, video_id) REFERENCES videos(tenant_id, id) ON DELETE CASCADE
) PARTITION BY RANGE (occurred_at);

A few decisions worth defending:

The videos primary key is (tenant_id, id), not just id. Every lookup is already scoped by tenant, so the composite key gives you a tenant-local B-tree where a single tenant's rows sit physically adjacent in the index. On a 50-million-row table that's the difference between a tight index range scan and a scattered one.
The composite foreign key (tenant_id, video_id) → videos(tenant_id, id) makes it structurally impossible to attach a view event to a video belonging to a different tenant. The referential integrity check spans the discriminator. People forget you can do this, and it's one of the strongest guarantees in the whole design.
view_events is range-partitioned by time. Viral traffic is spiky and recent-heavy; partitioning by week lets old partitions be detached and dropped cheaply, which is also your GDPR retention lever.

Making Row-Level Security the actual boundary

RLS is where the isolation gets teeth. The pattern is: set a session-local variable holding the current tenant, then write policies that compare every row against it. The application connects as a role that is subject to RLS and can never see across the boundary, even with a buggy query.

-- A GUC (runtime parameter) carries the active tenant for the connection.
ALTER DATABASE vvv SET app.current_tenant = '';

ALTER TABLE videos      ENABLE ROW LEVEL SECURITY;
ALTER TABLE view_events ENABLE ROW LEVEL SECURITY;

-- FORCE makes the policy apply even to the table owner. Without this,
-- a superuser-ish role silently bypasses every policy you wrote.
ALTER TABLE videos      FORCE ROW LEVEL SECURITY;
ALTER TABLE view_events FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON videos
    USING (tenant_id = current_setting('app.current_tenant')::uuid)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);

CREATE POLICY tenant_isolation ON view_events
    USING (tenant_id = current_setting('app.current_tenant')::uuid)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);

The two clauses do different jobs and you need both:

USING filters what SELECT, UPDATE, and DELETE can see. A query without any tenant_id filter silently returns only the current tenant's rows.
WITH CHECK validates what INSERT and UPDATE can write. It blocks a tenant from inserting a row stamped with someone else's tenant_id — the move that would otherwise smuggle data across the wall.

One gotcha that cost me an afternoon: current_setting('app.current_tenant') throws if the GUC is unset, which turns a missing-context bug into a hard error. That's actually what you want — fail closed. If you prefer it to return empty instead, use the two-argument form current_setting('app.current_tenant', true), but then an unset context yields NULL = tenant_id, which is NULL, which filters everything out. Either way the failure mode is "see nothing," never "see everything." Design for that.

Setting tenant context from PHP 8.4

Our edge is PHP behind LiteSpeed, and the rule we enforce is: the tenant is resolved once, at the connection boundary, and never trusted from user input again. We derive it from the verified host header (each branded domain maps to exactly one tenant), set the GUC, and from that point RLS does the rest. Here's the connection wrapper, using PHP 8.4 property hooks and an explicit set_config bound as a parameter so there's no string interpolation into SQL.

<?php
declare(strict_types=1);

final class TenantConnection
{
    private \PDO $pdo;

    public string $tenantId {
        get => $this->tenantId;
    }

    public function __construct(\PDO $pdo, string $tenantId)
    {
        if (!self::isUuid($tenantId)) {
            throw new \InvalidArgumentException('tenant id must be a uuid');
        }
        $this->pdo = $pdo;
        $this->tenantId = $tenantId;

        // set_config(name, value, is_local=true) scopes the GUC to the
        // current transaction, so a pooled connection can't leak context
        // into the next request that reuses it.
        $stmt = $this->pdo->prepare(
            'SELECT set_config(:k, :v, true)'
        );
        $stmt->execute([':k' => 'app.current_tenant', ':v' => $tenantId]);
    }

    public function query(string $sql, array $params = []): array
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(\PDO::FETCH_ASSOC);
    }

    private static function isUuid(string $s): bool
    {
        return (bool) preg_match(
            '/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i',
            $s
        );
    }
}

The is_local = true argument is the part to stare at. With pooled connections — and under LiteSpeed you absolutely have them — a connection handed back to the pool with a lingering SET app.current_tenant is a cross-tenant leak waiting for the next request. Binding the GUC to the transaction with set_config(..., true) means it evaporates at COMMIT or ROLLBACK. Every request opens a transaction, sets context, does its work, and commits. No transaction, no context, no data — fail closed again.

If you run a connection pooler like PgBouncer in transaction mode, this is mandatory rather than nice-to-have: session-level SET is meaningless there because you don't own the session between statements. Transaction-scoped set_config is the only thing that works.

Indexing for the cross-tenant trend query

Isolation is half the job; the other half is that the trending feed has to be fast. The flagship query is "top videos in a country over the last 24 hours," and naively it scans a firehose of view_events. The fix is a covering index whose leading column is tenant_id, so the planner walks a tenant-local slice and never touches another tenant's pages.

-- Covering index: filter by tenant + time, aggregate by video, all from the index.
CREATE INDEX idx_events_trend
    ON view_events (tenant_id, occurred_at DESC)
    INCLUDE (video_id, country, watch_ms);

-- A partial index for the hot path: only the last-7-days partition gets
-- a country-specific index, keeping it small and cache-resident.
CREATE INDEX idx_events_country_recent
    ON view_events (tenant_id, country, occurred_at DESC)
    WHERE occurred_at > now() - interval '7 days';

Note the partial index uses a non-immutable predicate (now()), which PostgreSQL won't actually allow in an index WHERE clause — so in practice you pin it to a constant boundary and recreate it as partitions roll, or you index per-partition. I lead with the naive version because half the engineers reading this will copy it, hit the immutability error, and then understand why partition-local indexes exist. Each weekly partition gets its own small index; the planner does partition pruning first, then hits an index that fits in memory. That's how you keep p99 on the trending endpoint under 20ms even as the events table crosses hundreds of millions of rows.

The trend computation itself runs as a periodic rollup rather than on every page load. We materialize scores into the videos.trend_score column and let Cloudflare Workers cache the rendered feed at the edge, so the database only sees the rollup write, not the read storm. The database is the system of record and the boundary enforcer; it is not the thing serving every viral spike directly.

GDPR is a schema concern, not a policy PDF

This is the part most multi-tenant tutorials skip, and it's the part that matters most in Europe. Three schema-level decisions carry the compliance weight:

No raw PII in the analytics table. Look back at view_events: there's no IP address and no user id, just visitor_key bytea — a daily-salted hash of the visitor signal. The salt rotates every 24 hours and the old salt is discarded, so yesterday's hashes can never be correlated to today's. This makes the data effectively pseudonymous-trending-to-anonymous, which dramatically lowers your obligations while still letting you de-duplicate views within a day.
Erasure is a partition drop, not a DELETE crawl. Because view_events is range-partitioned by time and the salt is unrecoverable, retention is enforced by detaching and dropping old partitions on a schedule. There is no expensive DELETE ... WHERE occurred_at < ... churning your indexes; you just drop the partition.
Per-tenant erasure is a cascade. When a partner offboards, DELETE FROM tenants WHERE id = ... cascades through every ON DELETE CASCADE foreign key and the tenant's footprint is gone in one transaction. This is the one place database-per-tenant would have been marginally cleaner, but the cascade gets us 95% of the way at a fraction of the operational cost.

Here's the salted-key derivation, the way our ingest worker computes it before the row ever reaches PostgreSQL. I'll show it in Python since that's where our ingest pipeline lives:

import hashlib
import hmac
from datetime import date

def visitor_key(ip: str, user_agent: str, daily_salt: bytes) -> bytes:
    """Derive a pseudonymous, non-reversible daily visitor key.

    The daily_salt is rotated at 00:00 UTC and the previous day's salt
    is destroyed, so keys cannot be correlated across day boundaries.
    """
    signal = f"{ip}|{user_agent}".encode("utf-8")
    # HMAC, not a bare hash: without the salt the output is unforgeable
    # and the input space (IPv4 + UA) can't be brute-forced into a rainbow table.
    return hmac.new(daily_salt, signal, hashlib.sha256).digest()

# Usage at ingest time
key = visitor_key("203.0.113.7", "Mozilla/5.0 ...", current_daily_salt())
# `key` (32 bytes) is what gets inserted into view_events.visitor_key.
# The raw ip and user_agent are never persisted.

Using HMAC rather than a plain sha256(ip + salt) matters: a bare hash over the small IPv4 space is brute-forceable in seconds if the salt ever leaks, whereas HMAC's construction resists that. The raw IP exists only in memory for the microseconds it takes to compute the key, and then it's gone. When a regulator asks "can you re-identify a visitor from this table," the honest answer is no, and that answer is baked into the schema rather than promised in a policy document.

Testing that the boundary actually holds

A boundary you don't test is a boundary you don't have. The leak that started this whole journey survived because nobody had a test that asserted tenant B cannot see tenant A's rows. Now every CI run executes exactly that, connecting as the RLS-bound application role:

-- In a test transaction, set context to tenant A and insert a row.
SELECT set_config('app.current_tenant', '11111111-1111-1111-1111-111111111111', true);
INSERT INTO videos (tenant_id, id, source_id, title, lang, published_at)
VALUES ('11111111-1111-1111-1111-111111111111', gen_random_uuid(),
        'abc', 'Tenant A video', 'de', now());

-- Switch context to tenant B and assert the row is invisible.
SELECT set_config('app.current_tenant', '22222222-2222-2222-2222-222222222222', true);
-- This MUST return 0. If it returns 1, the build fails.
SELECT count(*) AS leaked FROM videos WHERE source_id = 'abc';

-- Assert tenant B cannot forge a row stamped as tenant A (WITH CHECK).
-- This INSERT MUST raise: new row violates row-level security policy.
INSERT INTO videos (tenant_id, id, source_id, title, lang, published_at)
VALUES ('11111111-1111-1111-1111-111111111111', gen_random_uuid(),
        'forged', 'forged', 'de', now());

We run this on every pull request. The negative assertions — "this returns zero," "this raises an error" — are the ones that earn their keep, because they fail loudly the moment someone disables a policy, adds a BYPASSRLS role by accident, or connects the app as the table owner without FORCE ROW LEVEL SECURITY. Those are precisely the regressions that silent application-layer checks let through.

What I'd tell my past self

The single highest-leverage decision is moving tenant isolation out of application code and into PostgreSQL's RLS, then making tenant_id the leading column of every key and index so isolation and performance reinforce each other instead of fighting. After that, the GDPR posture mostly falls out of the schema for free: no raw PII in the hot table, partition-based retention, cascade-based erasure.

If you're starting a multi-tenant build in Europe, get these four right before you write a feature:

Pick shared-schema only if you commit to RLS with FORCE ROW LEVEL SECURITY and WITH CHECK policies — not application WHERE clauses.
Scope tenant context with transaction-local set_config, never session-level SET, so pooled connections can't leak.
Make tenant_id the first column of composite keys and indexes for both isolation and locality.
Treat erasure and retention as partition operations, and keep raw identifiers out of the analytics table entirely.

Get those in place on day one and the embarrassing staging leak that taught me all this never makes it to your production logs.

DEV Community