DEV Community

ahmet gedik
ahmet gedik

Posted on

Database Sharding Strategies for Growing Video Platforms

When Do You Actually Need Sharding?

Most video platforms reach 10 million rows before sharding becomes necessary. Until then, proper indexing and query optimization outperform any sharding strategy. That said, understanding sharding early helps you design schemas that are shardable when the time comes.

TopVideoHub serves 9 Asia-Pacific regions. The natural partition keys — region and fetched_at — emerge directly from the data shape.

Strategy 1: PostgreSQL Native Partitioning

Declarative partitioning splits one logical table into physical child tables with zero application changes.

CREATE TABLE videos (
    id          BIGSERIAL,
    video_id    TEXT NOT NULL,
    title       TEXT NOT NULL,
    region      TEXT NOT NULL,
    category_id INT,
    view_count  BIGINT DEFAULT 0,
    fetched_at  TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

CREATE TABLE videos_jp PARTITION OF videos FOR VALUES IN ('JP');
CREATE TABLE videos_kr PARTITION OF videos FOR VALUES IN ('KR');
CREATE TABLE videos_tw PARTITION OF videos FOR VALUES IN ('TW');
CREATE TABLE videos_hk PARTITION OF videos FOR VALUES IN ('HK');
CREATE TABLE videos_vn PARTITION OF videos FOR VALUES IN ('VN');
CREATE TABLE videos_th PARTITION OF videos FOR VALUES IN ('TH');
CREATE TABLE videos_sg PARTITION OF videos FOR VALUES IN ('SG');
CREATE TABLE videos_us PARTITION OF videos FOR VALUES IN ('US');
CREATE TABLE videos_gb PARTITION OF videos FOR VALUES IN ('GB');
CREATE TABLE videos_other PARTITION OF videos DEFAULT;
Enter fullscreen mode Exit fullscreen mode

Now SELECT * FROM videos WHERE region = 'JP' only scans videos_jp. The planner prunes all other partitions automatically.

Composite Partition: Region + Date Range

CREATE TABLE video_events (
    id        BIGSERIAL,
    video_id  TEXT NOT NULL,
    region    TEXT NOT NULL,
    event     TEXT NOT NULL,
    occurred  TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY LIST (region);

CREATE TABLE video_events_jp PARTITION OF video_events
    FOR VALUES IN ('JP')
    PARTITION BY RANGE (occurred);

CREATE TABLE video_events_jp_2026q1 PARTITION OF video_events_jp
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE video_events_jp_2026q2 PARTITION OF video_events_jp
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
Enter fullscreen mode Exit fullscreen mode

Strategy 2: Application-Level Routing

When you outgrow a single PostgreSQL instance, route queries to different database hosts:

import asyncpg

SHARD_MAP = {
    'JP': 'postgresql://user:pass@db-asia-east/topvideohub',
    'KR': 'postgresql://user:pass@db-asia-east/topvideohub',
    'TW': 'postgresql://user:pass@db-asia-east/topvideohub',
    'HK': 'postgresql://user:pass@db-asia-east/topvideohub',
    'VN': 'postgresql://user:pass@db-asia-south/topvideohub',
    'TH': 'postgresql://user:pass@db-asia-south/topvideohub',
    'SG': 'postgresql://user:pass@db-asia-south/topvideohub',
    'US': 'postgresql://user:pass@db-west/topvideohub',
    'GB': 'postgresql://user:pass@db-eu/topvideohub',
}

_pools: dict[str, asyncpg.Pool] = {}

async def get_pool(region: str) -> asyncpg.Pool:
    dsn = SHARD_MAP.get(region, SHARD_MAP['US'])
    if dsn not in _pools:
        _pools[dsn] = await asyncpg.create_pool(dsn, min_size=2, max_size=10)
    return _pools[dsn]

async def fetch_trending(region: str, limit: int = 50) -> list[dict]:
    pool = await get_pool(region)
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            'SELECT * FROM videos WHERE region=$1 ORDER BY view_count DESC LIMIT $2',
            region, limit,
        )
    return [dict(r) for r in rows]
Enter fullscreen mode Exit fullscreen mode

Strategy 3: Cross-Shard Global Trending

Scatter/Gather:

import asyncio

async def global_trending(limit: int = 20) -> list[dict]:
    tasks = [fetch_trending(r, limit=limit) for r in SHARD_MAP.keys()]
    results = await asyncio.gather(*tasks)
    merged = [v for shard in results for v in shard]
    merged.sort(key=lambda v: v['view_count'], reverse=True)
    return merged[:limit]
Enter fullscreen mode Exit fullscreen mode

Aggregation table (fast, slightly stale):

CREATE TABLE global_trending (
    rank       INT,
    video_id   TEXT,
    region     TEXT,
    view_count BIGINT,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Refresh every 15 minutes
TRUNCATE global_trending;
INSERT INTO global_trending
    SELECT ROW_NUMBER() OVER (ORDER BY view_count DESC),
           video_id, region, view_count, NOW()
    FROM (
        SELECT video_id, region, view_count FROM videos_jp
        UNION ALL SELECT video_id, region, view_count FROM videos_kr
    ) combined
    ORDER BY view_count DESC
    LIMIT 100;
Enter fullscreen mode Exit fullscreen mode

For TopVideoHub, the aggregation table approach reduced global trending query time from 400ms to 3ms.

Migration Path

  1. Start with a single PostgreSQL instance, partitioned by LIST (region).
  2. Move hot partitions to read replicas as read load grows.
  3. Promote read replicas to shards when write load demands it.
  4. Add the application-level router only at step 3.

Premature sharding is the enemy of developer productivity. Partition early, shard late.


This article is part of the Building TopVideoHub series. Check out TopVideoHub to see these techniques in action.

Top comments (0)