When a Single Database Isn't Enough
At DailyWatch, we aggregate trending videos from 8 regions with a cron pipeline that runs every 2 hours. The videos table grows by thousands of rows a day. Eventually queries slow down, indexes bloat, and backups take too long. That's when you start thinking about sharding.
This article covers three sharding strategies we evaluated, with real PostgreSQL examples.
Strategy 1: Range Partitioning by Date
The most natural fit for time-series video data. PostgreSQL's declarative partitioning handles this natively:
-- Parent table (no data stored directly here)
CREATE TABLE videos (
id BIGSERIAL,
video_id VARCHAR(16) NOT NULL,
title TEXT NOT NULL,
view_count BIGINT DEFAULT 0,
region VARCHAR(4) NOT NULL,
category_id INTEGER,
fetched_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, fetched_at)
) PARTITION BY RANGE (fetched_at);
-- Monthly partitions
CREATE TABLE videos_2026_01 PARTITION OF videos
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE videos_2026_02 PARTITION OF videos
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE videos_2026_03 PARTITION OF videos
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Indexes are per partition: create them on each one (or, on PostgreSQL 11+,
-- create a single index on the parent and it cascades to every partition)
CREATE INDEX ON videos_2026_01 (video_id);
CREATE INDEX ON videos_2026_02 (video_id);
CREATE INDEX ON videos_2026_03 (video_id);
The query planner automatically prunes partitions:
-- Only scans videos_2026_03, skips others entirely
EXPLAIN ANALYZE
SELECT * FROM videos
WHERE fetched_at >= '2026-03-01' AND fetched_at < '2026-03-15';
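The same pruning logic is easy to mirror in application code. A minimal Python sketch (the `monthly_partitions_for_range` helper is illustrative, not part of the pipeline) that computes which monthly partitions a half-open date-range query would touch:

```python
from datetime import date

def monthly_partitions_for_range(start: date, end: date) -> list[str]:
    """Return the videos_YYYY_MM partition names a [start, end) query touches."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # skip the end month entirely when the range stops on its first day
        if (year, month) == (end.year, end.month) and end.day == 1:
            break
        names.append(f"videos_{year:04d}_{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names

# The EXPLAIN query above touches exactly one partition:
# monthly_partitions_for_range(date(2026, 3, 1), date(2026, 3, 15))
# -> ['videos_2026_03']
```

This is the calculation the planner does for you; having it in code is handy for pre-warming caches or validating that a query stays within the partitions you expect.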
Automating Partition Creation
Don't create partitions manually. Use a function:
CREATE OR REPLACE FUNCTION create_monthly_partition(target_date DATE)
RETURNS TEXT AS $$
DECLARE
partition_name TEXT;
start_date DATE;
end_date DATE;
BEGIN
start_date := DATE_TRUNC('month', target_date);
end_date := start_date + INTERVAL '1 month';
partition_name := 'videos_' || TO_CHAR(start_date, 'YYYY_MM');
-- Check if partition already exists
IF NOT EXISTS (
SELECT 1 FROM pg_class WHERE relname = partition_name
) THEN
EXECUTE format(
'CREATE TABLE %I PARTITION OF videos FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
EXECUTE format(
'CREATE INDEX ON %I (video_id)',
partition_name
);
RETURN 'Created: ' || partition_name;
END IF;
RETURN 'Already exists: ' || partition_name;
END;
$$ LANGUAGE plpgsql;
-- Run ahead of each month, e.g. from a scheduled job
SELECT create_monthly_partition('2026-04-01');
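If partition creation runs from a deploy script rather than inside the database, the same naming scheme can be generated in Python. A hedged sketch (the `partition_ddl` helper is hypothetical; real code should compose SQL with a driver's identifier-quoting utilities rather than f-strings):

```python
from datetime import date

def partition_ddl(target: date) -> tuple[str, str]:
    """Build the partition name and CREATE TABLE DDL for target's month."""
    start = target.replace(day=1)
    # first day of the following month (handles the December -> January rollover)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    name = f"videos_{start:%Y_%m}"
    ddl = (
        f"CREATE TABLE {name} PARTITION OF videos "
        f"FOR VALUES FROM ('{start}') TO ('{end}')"
    )
    return name, ddl

# partition_ddl(date(2026, 4, 17)) -> name 'videos_2026_04',
# bounds ['2026-04-01', '2026-05-01')
```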
Strategy 2: List Partitioning by Region
When queries almost always filter by region, partition on that dimension:
CREATE TABLE videos_by_region (
id BIGSERIAL,
video_id VARCHAR(16) NOT NULL,
title TEXT NOT NULL,
view_count BIGINT DEFAULT 0,
region VARCHAR(4) NOT NULL,
fetched_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, region)
) PARTITION BY LIST (region);
CREATE TABLE videos_americas PARTITION OF videos_by_region
FOR VALUES IN ('US', 'BR', 'CA');
CREATE TABLE videos_europe PARTITION OF videos_by_region
FOR VALUES IN ('GB', 'DE', 'FR');
CREATE TABLE videos_apac PARTITION OF videos_by_region
FOR VALUES IN ('IN', 'AU');
This shines when your application logic is already region-aware. At DailyWatch, each region's trending page only queries its own partition.
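Region-aware routing on the application side can be as simple as a lookup table. A sketch that mirrors the `FOR VALUES IN (...)` lists above (the dict and helper are illustrative):

```python
# Mirrors the FOR VALUES IN (...) lists of the three partitions above
REGION_PARTITION = {
    "US": "videos_americas", "BR": "videos_americas", "CA": "videos_americas",
    "GB": "videos_europe", "DE": "videos_europe", "FR": "videos_europe",
    "IN": "videos_apac", "AU": "videos_apac",
}

def partition_for(region: str) -> str:
    """Resolve a region code to its partition. Unknown regions fail loudly,
    just as PostgreSQL rejects them unless a DEFAULT partition exists."""
    try:
        return REGION_PARTITION[region]
    except KeyError:
        raise ValueError(f"no partition for region {region!r}") from None
```

Keeping this map in one place also documents the region-to-partition assignment for anyone adding a ninth region later.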
Strategy 3: Application-Level Sharding
When PostgreSQL partitioning isn't enough (different databases on different servers), shard at the application level:
import asyncio
import hashlib
from typing import Dict, Optional

import asyncpg

class VideoShardRouter:
    """Routes video queries to the correct database shard."""

    def __init__(self, shard_configs: Dict[str, str]):
        # shard_configs: {"shard_0": "postgres://...", "shard_1": "postgres://...", ...}
        self.shard_count = len(shard_configs)
        self.pools: Dict[str, asyncpg.Pool] = {}
        self.configs = shard_configs

    async def initialize(self):
        for name, dsn in self.configs.items():
            self.pools[name] = await asyncpg.create_pool(dsn, min_size=2, max_size=10)

    def get_shard(self, video_id: str) -> str:
        """Hash-based routing. Note: this is plain modulo hashing, not
        consistent hashing -- changing the shard count remaps most keys."""
        hash_val = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
        shard_idx = hash_val % self.shard_count
        return f"shard_{shard_idx}"

    async def insert_video(self, video_id: str, title: str, region: str, views: int):
        shard = self.get_shard(video_id)
        pool = self.pools[shard]
        # Assumes each shard's videos table has a UNIQUE constraint on video_id
        await pool.execute(
            "INSERT INTO videos (video_id, title, region, view_count) VALUES ($1, $2, $3, $4) "
            "ON CONFLICT (video_id) DO UPDATE SET view_count = $4",
            video_id, title, region, views,
        )

    async def get_video(self, video_id: str) -> Optional[dict]:
        shard = self.get_shard(video_id)
        pool = self.pools[shard]
        row = await pool.fetchrow(
            "SELECT * FROM videos WHERE video_id = $1", video_id
        )
        return dict(row) if row else None

    async def search_all_shards(self, query: str, limit: int = 20) -> list:
        """Fan-out query across all shards, merge results."""
        tasks = [
            pool.fetch(
                "SELECT * FROM videos WHERE title ILIKE $1 ORDER BY view_count DESC LIMIT $2",
                f"%{query}%", limit,
            )
            for pool in self.pools.values()
        ]
        results = await asyncio.gather(*tasks)
        merged = [dict(row) for shard_rows in results for row in shard_rows]
        merged.sort(key=lambda v: v["view_count"], reverse=True)
        return merged[:limit]
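The routing step itself needs no database to verify. A standalone sketch of the same MD5-modulo scheme (shard names assumed to follow the router's `shard_{i}` convention):

```python
import hashlib

def shard_for(video_id: str, shard_count: int = 4) -> str:
    """Same MD5-modulo routing as VideoShardRouter.get_shard, without a DB."""
    hash_val = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
    return f"shard_{hash_val % shard_count}"

# Routing is deterministic: the same id always lands on the same shard,
# so single-video reads and writes never need cross-shard coordination.
ids = ["dQw4w9WgXcQ", "9bZkp7q19f0", "kJQP7kiw5Fk"]
placements = {vid: shard_for(vid) for vid in ids}
```

The caveat from the docstring bears repeating: modulo hashing remaps most keys whenever `shard_count` changes, forcing a bulk data migration. If you expect to add shards, a consistent-hashing ring limits how many keys move per resize.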
Which Strategy to Choose?
| Strategy | Best When | Complexity | DailyWatch Fit |
|---|---|---|---|
| Date partitioning | Time-series queries dominate | Low | Good for archiving |
| Region partitioning | Region-scoped queries | Low | Great for trending pages |
| App-level sharding | Multiple DB servers needed | High | Future-proofing |
For most video platforms under 10M rows, PostgreSQL native partitioning is the right call. You get automatic partition pruning, simpler ops, and no application code changes. App-level sharding is the nuclear option for when you've outgrown a single server.
Key Takeaway
Start with date-based partitioning. It's the lowest-effort, highest-impact sharding strategy for content platforms where "recent" is always the most-queried dimension. You can always add region partitioning as a sub-partition later.
This article is part of the Building DailyWatch series. Check out DailyWatch to see these techniques in action.