When Do You Actually Need Sharding?
Most video platforms reach 10 million rows before sharding becomes necessary. Until then, proper indexing and query optimization outperform any sharding strategy. That said, understanding sharding early helps you design schemas that are shardable when the time comes.
TopVideoHub serves 9 Asia-Pacific regions. The natural partition keys — region and fetched_at — emerge directly from the data shape.
Strategy 1: PostgreSQL Native Partitioning
Declarative partitioning splits one logical table into physical child tables with zero application changes.
CREATE TABLE videos (
id BIGSERIAL,
video_id TEXT NOT NULL,
title TEXT NOT NULL,
region TEXT NOT NULL,
category_id INT,
view_count BIGINT DEFAULT 0,
fetched_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id, region)
) PARTITION BY LIST (region);
CREATE TABLE videos_jp PARTITION OF videos FOR VALUES IN ('JP');
CREATE TABLE videos_kr PARTITION OF videos FOR VALUES IN ('KR');
CREATE TABLE videos_tw PARTITION OF videos FOR VALUES IN ('TW');
CREATE TABLE videos_hk PARTITION OF videos FOR VALUES IN ('HK');
CREATE TABLE videos_vn PARTITION OF videos FOR VALUES IN ('VN');
CREATE TABLE videos_th PARTITION OF videos FOR VALUES IN ('TH');
CREATE TABLE videos_sg PARTITION OF videos FOR VALUES IN ('SG');
CREATE TABLE videos_us PARTITION OF videos FOR VALUES IN ('US');
CREATE TABLE videos_gb PARTITION OF videos FOR VALUES IN ('GB');
CREATE TABLE videos_other PARTITION OF videos DEFAULT;
Now SELECT * FROM videos WHERE region = 'JP' only scans videos_jp. The planner prunes all other partitions automatically.
Composite Partition: Region + Date Range
CREATE TABLE video_events (
id BIGSERIAL,
video_id TEXT NOT NULL,
region TEXT NOT NULL,
event TEXT NOT NULL,
occurred TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY LIST (region);
CREATE TABLE video_events_jp PARTITION OF video_events
FOR VALUES IN ('JP')
PARTITION BY RANGE (occurred);
CREATE TABLE video_events_jp_2026q1 PARTITION OF video_events_jp
FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE video_events_jp_2026q2 PARTITION OF video_events_jp
FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
Strategy 2: Application-Level Routing
When you outgrow a single PostgreSQL instance, route queries to different database hosts:
import asyncpg
SHARD_MAP = {
'JP': 'postgresql://user:pass@db-asia-east/topvideohub',
'KR': 'postgresql://user:pass@db-asia-east/topvideohub',
'TW': 'postgresql://user:pass@db-asia-east/topvideohub',
'HK': 'postgresql://user:pass@db-asia-east/topvideohub',
'VN': 'postgresql://user:pass@db-asia-south/topvideohub',
'TH': 'postgresql://user:pass@db-asia-south/topvideohub',
'SG': 'postgresql://user:pass@db-asia-south/topvideohub',
'US': 'postgresql://user:pass@db-west/topvideohub',
'GB': 'postgresql://user:pass@db-eu/topvideohub',
}
_pools: dict[str, asyncpg.Pool] = {}
async def get_pool(region: str) -> asyncpg.Pool:
dsn = SHARD_MAP.get(region, SHARD_MAP['US'])
if dsn not in _pools:
_pools[dsn] = await asyncpg.create_pool(dsn, min_size=2, max_size=10)
return _pools[dsn]
async def fetch_trending(region: str, limit: int = 50) -> list[dict]:
pool = await get_pool(region)
async with pool.acquire() as conn:
rows = await conn.fetch(
'SELECT * FROM videos WHERE region=$1 ORDER BY view_count DESC LIMIT $2',
region, limit,
)
return [dict(r) for r in rows]
Strategy 3: Cross-Shard Global Trending
Scatter/Gather:
import asyncio
async def global_trending(limit: int = 20) -> list[dict]:
tasks = [fetch_trending(r, limit=limit) for r in SHARD_MAP.keys()]
results = await asyncio.gather(*tasks)
merged = [v for shard in results for v in shard]
merged.sort(key=lambda v: v['view_count'], reverse=True)
return merged[:limit]
Aggregation table (fast, slightly stale):
CREATE TABLE global_trending (
rank INT,
video_id TEXT,
region TEXT,
view_count BIGINT,
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Refresh every 15 minutes
TRUNCATE global_trending;
INSERT INTO global_trending
SELECT ROW_NUMBER() OVER (ORDER BY view_count DESC),
video_id, region, view_count, NOW()
FROM (
SELECT video_id, region, view_count FROM videos_jp
UNION ALL SELECT video_id, region, view_count FROM videos_kr
) combined
ORDER BY view_count DESC
LIMIT 100;
For TopVideoHub, the aggregation table approach reduced global trending query time from 400ms to 3ms.
Migration Path
- Start with a single PostgreSQL instance, partitioned by LIST (region).
- Move hot partitions to read replicas as read load grows.
- Promote read replicas to shards when write load demands it.
- Add the application-level router only at step 3.
Premature sharding is the enemy of developer productivity. Partition early, shard late.
This article is part of the Building TopVideoHub series. Check out TopVideoHub to see these techniques in action.
Top comments (0)