Designing a Multi-Tenant Video Aggregator Schema in PostgreSQL

#postgres #database #multitenancy #architecture

Four video sites, one ingestion pipeline, one trending engine. That was the situation I inherited running TopVideoHub and its sibling properties. Each site targets a different slice of the Asia-Pacific market — one leans Japanese and Korean, another Southeast Asian, another a global mix — but they all pull from the same YouTube and regional trending feeds, run the same ranking jobs, and share the same operational team. Cloning the database four times meant cloning every migration, every index tweak, and every backfill four times too. After the third time I forgot to apply a schema change to one of the clones and spent an afternoon debugging a phantom NULL, I tore the whole thing down and rebuilt it as a single multi-tenant PostgreSQL database.

This post is the schema I landed on, the trade-offs that drove it, and the parts that bit me. The reference stack around it is PHP 8.4 on LiteSpeed behind Cloudflare, with SQLite FTS5 (CJK tokenizer) still handling on-site search per property — but the canonical store, the thing every tenant ingests into, is one Postgres cluster.

Three ways to be multi-tenant, and why I picked the middle one

There are really only three honest options, and the marketing around "multi-tenancy" tends to blur them:

Database-per-tenant. Strong isolation, painful operations. Connection pools fragment, migrations fan out, and cross-tenant analytics ("which video is trending in both the JP and KR audiences?") become an ETL job instead of a query.
Schema-per-tenant. One database, N Postgres schemas. Slightly better, but search_path juggling is error-prone and you still run every migration N times. At a few hundred tenants the catalog bloats and pg_dump slows to a crawl.
Shared schema with a tenant_id column. One set of tables, every row tagged with its owner. Cheapest to operate, easiest to analyze across tenants, and — critically — the only model where a single index serves all tenants.

For four tenants with near-identical schemas and a real need for cross-tenant queries, the shared-schema model is the obvious win. The objection people raise is isolation: "what stops a bug in tenant A's code from reading tenant B's rows?" The answer in Postgres is Row-Level Security (RLS), which I'll get to. With RLS the database enforces the boundary, not the application, so a forgotten WHERE tenant_id = ? fails closed instead of leaking data.

The tenant table and the shape of everything else

Start with the tenant registry. Everything hangs off this.

CREATE TABLE tenant (
    id           smallint    PRIMARY KEY,
    slug         text        NOT NULL UNIQUE,   -- 'tvh', 'dwv', 'vvv', 'tvs'
    display_name text        NOT NULL,
    primary_locale text      NOT NULL DEFAULT 'en',
    created_at   timestamptz NOT NULL DEFAULT now()
);

-- A tenant aggregates from a set of regions, weighted.
CREATE TABLE tenant_region (
    tenant_id  smallint NOT NULL REFERENCES tenant(id),
    region     char(2)  NOT NULL,            -- ISO 3166-1 alpha-2: JP, KR, TW...
    weight     numeric(4,3) NOT NULL DEFAULT 1.0,
    PRIMARY KEY (tenant_id, region)
);

I made tenant.id a smallint, not a UUID. With four tenants (and headroom for maybe a few dozen), a 2-byte key keeps every composite index narrow. That matters more than it sounds: tenant_id is the leading column of almost every index in the system, so two bytes versus sixteen is multiplied across millions of rows. UUIDs earn their keep when tenants self-register at scale; mine are provisioned by hand.

Now the core content tables. The key decision: tenant_id is the first column of every primary key and every secondary index. This is the single most important rule in a shared-schema design.

CREATE TABLE video (
    tenant_id    smallint    NOT NULL REFERENCES tenant(id),
    id           bigint      GENERATED ALWAYS AS IDENTITY,
    yt_video_id  text        NOT NULL,        -- YouTube's opaque id
    title        text        NOT NULL,
    channel_id   bigint      NOT NULL,
    region       char(2)     NOT NULL,
    lang         text        NOT NULL DEFAULT 'und',  -- BCP-47, 'ja','ko','zh-Hant'
    duration_s   integer,
    published_at timestamptz NOT NULL,
    ingested_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, id),
    UNIQUE (tenant_id, yt_video_id)
) PARTITION BY LIST (tenant_id);

CREATE TABLE video_tvh PARTITION OF video FOR VALUES IN (1);
CREATE TABLE video_dwv PARTITION OF video FOR VALUES IN (2);
CREATE TABLE video_vvv PARTITION OF video FOR VALUES IN (3);
CREATE TABLE video_tvs PARTITION OF video FOR VALUES IN (4);

A few things are deliberate here:

The same yt_video_id can exist under multiple tenants. A video trending in both Japan and a pan-Asian feed is two rows, one per tenant, because its ranking, region, and editorial context differ. The uniqueness constraint is scoped (tenant_id, yt_video_id), not global. Deduplication, if you want it, happens at the analytics layer, not by collapsing rows.
lang is BCP-47, not a two-letter code. CJK forces this: zh-Hant (Traditional, Taiwan/Hong Kong) and zh-Hans (Simplified) are different audiences and must never be merged. A char(2) zh would throw that distinction away.
List partitioning by tenant_id. Each tenant's videos live in their own physical partition. This gives partition pruning for free — a query filtered to one tenant never touches the others' heap — and lets me TRUNCATE video_vvv to wipe a tenant without a slow DELETE.

Letting the database enforce isolation with RLS

This is the part that makes shared-schema safe. I enable Row-Level Security and write a policy that reads the current tenant from a session variable. The application sets that variable once per request, on checkout from the connection pool, and from then on cannot see other tenants' rows — even with a buggy query.

ALTER TABLE video ENABLE ROW LEVEL SECURITY;
ALTER TABLE video FORCE ROW LEVEL SECURITY;  -- applies even to table owner

CREATE POLICY tenant_isolation ON video
    USING (tenant_id = current_setting('app.tenant_id')::smallint)
    WITH CHECK (tenant_id = current_setting('app.tenant_id')::smallint);

-- The app role the connection pool uses. NOT a superuser, NOT the owner,
-- because superusers and owners bypass RLS unless FORCE is set.
CREATE ROLE app_runtime LOGIN PASSWORD '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON video TO app_runtime;

The USING clause filters reads; the WITH CHECK clause filters writes, so an INSERT with the wrong tenant_id is rejected rather than silently mis-tagged. FORCE ROW LEVEL SECURITY matters because RLS is bypassed by the table owner and superusers by default — forcing it closes that hole, and you should run application traffic under a dedicated non-owner role regardless.

The one sharp edge: current_setting('app.tenant_id') throws if the variable was never set. That's a feature. It means a code path that forgets to scope itself crashes loudly instead of leaking the whole table. I use the two-argument form current_setting('app.tenant_id', true) only in admin tooling that legitimately spans tenants.

Wiring the tenant context from PHP

On the application side, the rule is simple: the very first thing any request does after grabbing a connection is set the tenant. I do it with set_config(..., true) so the setting is transaction-local — it resets automatically when the transaction ends, which means a pooled connection can't carry one request's tenant into the next.

<?php
declare(strict_types=1);

final class TenantConnection
{
    public function __construct(private \PDO $pdo) {}

    /** Resolve tenant from the hostname Cloudflare forwards. */
    public function forHost(string $host): \PDO
    {
        $slugByHost = [
            'topvideohub.com'   => 'tvh',
            'dailywatch.video'  => 'dwv',
            'viralvidvault.com' => 'vvv',
            'trendvidstream.com'=> 'tvs',
        ];
        $slug = $slugByHost[strtolower($host)] ?? throw new \RuntimeException("unknown host $host");

        $this->pdo->beginTransaction();
        // set_config(key, value, is_local=true) -> scoped to this transaction only
        $stmt = $this->pdo->prepare(
            "SELECT set_config('app.tenant_id', (SELECT id::text FROM tenant WHERE slug = ?), true)"
        );
        $stmt->execute([$slug]);

        return $this->pdo;
    }
}

// Usage in the front controller:
$pdo = (new TenantConnection($pdo))->forHost($_SERVER['HTTP_HOST']);
$videos = $pdo->query(
    'SELECT id, title, lang FROM video ORDER BY published_at DESC LIMIT 50'
)->fetchAll(\PDO::FETCH_ASSOC);
// No WHERE tenant_id needed — RLS already constrained it.

Notice the SELECT has no tenant_id filter at all. RLS supplies it. That's the payoff: the isolation logic lives in one policy, not scattered across hundreds of query sites where any one omission is a breach. The PHP layer's only job is to set the variable correctly, and that's a single, auditable code path in the front controller.

A practical note on pooling: if you front Postgres with PgBouncer in transaction mode, set_config(..., true) is exactly right because the setting dies with the transaction. Session-level SET would leak across pooled clients and is a classic multi-tenant data-leak bug — don't use it.

Modeling regions, languages, and CJK reality

Asia-Pacific aggregation is where a naive schema falls apart. "Trending in Asia" is not one feed; it's a weighted blend of per-region feeds, and language is orthogonal to region (Korean content trends in Japan; English trends everywhere). So I keep them as separate dimensions and never conflate them.

Region lives on the video row and in tenant_region for the aggregation weights. Language is BCP-47 on the row. The thing I want to stress is the script distinction: Han characters are shared across Japanese, Simplified Chinese, and Traditional Chinese, so you cannot infer language from the bytes alone. I store the detector's verdict explicitly and index it:

CREATE INDEX idx_video_lang ON video (tenant_id, lang, published_at DESC);

Full-text search is the interesting boundary. PostgreSQL's built-in tsvector parsers do not segment CJK — there are no spaces between words in Japanese or Chinese, so the default parser produces garbage tokens. The pragmatic options are pg_bigm (bigram) or pgroonga (Groonga-backed, proper CJK morphology) as a Postgres extension. On the read side, though, each site already serves search from a local SQLite FTS5 index with a CJK tokenizer — it's embedded, lives next to the LiteSpeed worker, and survives a Postgres outage. So Postgres is the system of record and SQLite FTS5 is the per-tenant query cache, rebuilt from Postgres on each ingest. The schema just needs to expose a clean per-tenant export, which the partitioning already gives me.

Trending as a materialized, partition-aligned table

Ranking is recomputed on a schedule (every 2–7 hours depending on the tenant). I do not compute trending on the fly — the ranking query is heavy and the read traffic is cache-fronted by Cloudflare anyway, so I materialize it. The trending table mirrors the partitioning of video and carries its own tenant_id + RLS policy.

CREATE TABLE trending (
    tenant_id  smallint    NOT NULL REFERENCES tenant(id),
    region     char(2)     NOT NULL,
    rank       integer     NOT NULL,
    video_id   bigint      NOT NULL,
    score      numeric(10,4) NOT NULL,
    computed_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, region, rank),
    FOREIGN KEY (tenant_id, video_id) REFERENCES video(tenant_id, id)
) PARTITION BY LIST (tenant_id);

CREATE TABLE trending_tvh PARTITION OF trending FOR VALUES IN (1);
-- ...one partition per tenant, as before.

ALTER TABLE trending ENABLE ROW LEVEL SECURITY;
ALTER TABLE trending FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON trending
    USING (tenant_id = current_setting('app.tenant_id')::smallint);

The composite foreign key (tenant_id, video_id) is what keeps a trending row from ever pointing at another tenant's video. Without the tenant_id in the FK, you could rank tenant B's video inside tenant A's feed — the database now makes that impossible.

Here's the ingestion side, the Python worker that recomputes one tenant's trending table inside a single transaction. It sets the tenant context the same way the PHP layer does, then does a swap-in-place via DELETE + INSERT scoped to the active partition:

import psycopg

def recompute_trending(dsn: str, tenant_id: int, half_life_h: float = 18.0) -> int:
    """Rebuild one tenant's trending table. Returns rows written."""
    with psycopg.connect(dsn) as conn:
        with conn.transaction():
            # transaction-local tenant context -> RLS scopes every statement below
            conn.execute("SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),))

            # time-decayed popularity score; view_count joined from a stats table
            conn.execute("DELETE FROM trending")  # RLS limits this to our tenant only
            result = conn.execute(
                """
                INSERT INTO trending (tenant_id, region, rank, video_id, score)
                SELECT
                    %(tid)s,
                    v.region,
                    row_number() OVER (
                        PARTITION BY v.region
                        ORDER BY s.view_count
                            * exp(-extract(epoch FROM now() - v.published_at)
                                  / (3600.0 * %(hl)s)) DESC
                    ),
                    v.id,
                    s.view_count
                        * exp(-extract(epoch FROM now() - v.published_at)
                              / (3600.0 * %(hl)s))
                FROM video v
                JOIN video_stat s ON s.tenant_id = v.tenant_id AND s.video_id = v.id
                WHERE v.published_at > now() - interval '7 days'
                """,
                {"tid": tenant_id, "hl": half_life_h},
            )
            return result.rowcount

The whole rebuild is one transaction: either the new ranking is fully visible or the old one is, never a half-written state. Because trending is partitioned by tenant, the DELETE only rewrites that tenant's partition — the other three are untouched and their readers never block.

The performance details that actually mattered

After running this in production for a while, the lessons that earned their keep:

Lead every index with tenant_id. I said it already; I'm saying it again because it's the whole game. A query is always implicitly tenant-scoped, so (tenant_id, published_at DESC) is the index that serves "latest videos" for every tenant out of one B-tree. Drop the tenant_id prefix and Postgres can't prune to the partition cleanly.
List-partition by tenant, then never VACUUM the world. Per-tenant partitions mean autovacuum works one tenant at a time, and a heavy backfill on tvs doesn't bloat tvh's heap. It also makes per-tenant TRUNCATE and bulk re-ingest trivially fast.
Use FORCE ROW LEVEL SECURITY and a non-owner runtime role. Plain ENABLE is silently bypassed by the table owner. If your app connects as the owner (many do, by accident), RLS does nothing. Verify it: connect as app_runtime, SET app.tenant_id to tenant 1, and confirm SELECT count(*) FROM video only counts tenant 1's rows.
Set the tenant variable transaction-local, behind a transaction-mode pooler. Session-scoped settings leak across pooled connections. This is the multi-tenant footgun; the set_config(..., true) form defuses it.
Keep tenant_id a smallint. Narrow keys ripple through every composite index. For a handful of tenants, a UUID is pure overhead.

Conclusion

The shared-schema-plus-RLS model collapsed four databases into one without giving up isolation, and it did so by moving the tenant boundary out of application code and into the database engine where a forgotten WHERE clause can't bypass it. Partitioning by tenant_id gave me physical separation for vacuum and bulk operations while keeping a single logical schema and a single migration to apply. The composite primary keys and foreign keys make cross-tenant references structurally impossible rather than merely discouraged.

If you take three things from this: put tenant_id first in every key and index, enforce isolation with FORCE ROW LEVEL SECURITY under a non-owner role, and set your tenant context transaction-locally behind a transaction-mode pooler. The rest — the CJK language handling, the materialized trending, the SQLite FTS5 read cache — are domain details you'll adapt to your own product. The tenancy skeleton underneath them is the part worth copying exactly.