Scaling a European Video Database
ViralVidVault stores viral video data from 7 European regions. As the database grows, queries that scan the entire videos table get slower. Sharding -- splitting data across multiple tables or databases -- is how you keep query performance flat as data grows linearly.
Here are three strategies, with concrete PostgreSQL implementations.
Strategy 1: List Partitioning by Region
The most natural fit for a multi-region video vault. Each European region gets its own partition:
CREATE TABLE videos (
id BIGSERIAL,
video_id VARCHAR(16) NOT NULL,
title TEXT NOT NULL,
view_count BIGINT DEFAULT 0,
virality_score FLOAT DEFAULT 0,
region VARCHAR(4) NOT NULL,
category_id INTEGER,
fetched_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, region)
) PARTITION BY LIST (region);
-- One partition per tracked region
CREATE TABLE videos_pl PARTITION OF videos FOR VALUES IN ('PL');
CREATE TABLE videos_nl PARTITION OF videos FOR VALUES IN ('NL');
CREATE TABLE videos_se PARTITION OF videos FOR VALUES IN ('SE');
CREATE TABLE videos_no PARTITION OF videos FOR VALUES IN ('NO');
CREATE TABLE videos_at PARTITION OF videos FOR VALUES IN ('AT');
CREATE TABLE videos_gb PARTITION OF videos FOR VALUES IN ('GB');
CREATE TABLE videos_us PARTITION OF videos FOR VALUES IN ('US');
-- Default partition for any unexpected regions
CREATE TABLE videos_other PARTITION OF videos DEFAULT;
-- Each partition gets its own index
CREATE INDEX ON videos_pl (virality_score DESC);
CREATE INDEX ON videos_nl (virality_score DESC);
CREATE INDEX ON videos_se (virality_score DESC);
CREATE INDEX ON videos_no (virality_score DESC);
CREATE INDEX ON videos_at (virality_score DESC);
CREATE INDEX ON videos_gb (virality_score DESC);
CREATE INDEX ON videos_us (virality_score DESC);
Now region-scoped queries are lightning fast:
-- Only scans videos_pl, not the other 6 partitions
SELECT * FROM videos
WHERE region = 'PL' AND virality_score >= 85
ORDER BY virality_score DESC
LIMIT 20;
Strategy 2: Composite Partitioning (Region + Date)
For even finer granularity, partition by region first, then by date:
CREATE TABLE videos_partitioned (
id BIGSERIAL,
video_id VARCHAR(16) NOT NULL,
title TEXT NOT NULL,
virality_score FLOAT DEFAULT 0,
region VARCHAR(4) NOT NULL,
fetched_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, region, fetched_at)
) PARTITION BY LIST (region);
-- Region partition, further sub-partitioned by month
CREATE TABLE videos_gb_part PARTITION OF videos_partitioned
FOR VALUES IN ('GB')
PARTITION BY RANGE (fetched_at);
CREATE TABLE videos_gb_2026_02 PARTITION OF videos_gb_part
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE videos_gb_2026_03 PARTITION OF videos_gb_part
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
This is ideal for queries like "show me viral UK videos from this week."
Strategy 3: Application-Level Sharding
When you outgrow a single PostgreSQL instance, shard at the application layer:
import hashlib
from typing import Dict, List
import asyncpg
class RegionShardRouter:
"""Route queries to region-specific database instances."""
SHARD_MAP = {
"western": ["GB", "NL", "AT"], # Western Europe shard
"nordic": ["SE", "NO"], # Nordic shard
"eastern": ["PL"], # Eastern Europe shard
"global": ["US"], # Global shard
}
def __init__(self, shard_dsns: Dict[str, str]):
self.dsns = shard_dsns
self.pools: Dict[str, asyncpg.Pool] = {}
# Build reverse lookup: region -> shard name
self.region_to_shard = {}
for shard, regions in self.SHARD_MAP.items():
for r in regions:
self.region_to_shard[r] = shard
async def initialize(self):
for name, dsn in self.dsns.items():
self.pools[name] = await asyncpg.create_pool(dsn, min_size=2, max_size=10)
def _get_pool(self, region: str) -> asyncpg.Pool:
shard = self.region_to_shard.get(region, "western")
return self.pools[shard]
async def get_viral_videos(self, region: str, min_score: float = 85.0, limit: int = 20) -> list:
pool = self._get_pool(region)
return await pool.fetch(
"SELECT * FROM videos WHERE region = $1 AND virality_score >= $2 "
"ORDER BY virality_score DESC LIMIT $3",
region, min_score, limit,
)
async def search_all_regions(self, query: str, limit: int = 20) -> list:
"""Fan-out search across all shards, merge by virality score."""
import asyncio
tasks = [
pool.fetch(
"SELECT * FROM videos WHERE title ILIKE $1 ORDER BY virality_score DESC LIMIT $2",
f"%{query}%", limit,
)
for pool in self.pools.values()
]
results = await asyncio.gather(*tasks)
merged = [dict(r) for shard_rows in results for r in shard_rows]
merged.sort(key=lambda v: v["virality_score"], reverse=True)
return merged[:limit]
Choosing the Right Strategy
| Strategy | Best For | Complexity | ViralVidVault Fit |
|---|---|---|---|
| Region list partition | Region-scoped queries | Low | Ideal for trending pages |
| Region + date composite | Time-bounded + region queries | Medium | Great for analytics |
| App-level shard | Multiple DB servers | High | Future scaling |
For ViralVidVault, region-based list partitioning is the clear winner. Every page on the site is implicitly region-scoped -- "viral in Poland," "trending in Sweden" -- so partition pruning eliminates 80%+ of rows from every query automatically.
Partition Maintenance
Automate monthly partition creation:
CREATE OR REPLACE FUNCTION ensure_monthly_partitions(base_region TEXT, months_ahead INT DEFAULT 3)
RETURNS VOID AS $$
DECLARE
m INT;
part_name TEXT;
start_d DATE;
end_d DATE;
BEGIN
FOR m IN 0..months_ahead LOOP
start_d := DATE_TRUNC('month', NOW() + (m || ' months')::INTERVAL);
end_d := start_d + INTERVAL '1 month';
part_name := format('videos_%s_%s', lower(base_region), TO_CHAR(start_d, 'YYYY_MM'));
IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = part_name) THEN
EXECUTE format(
'CREATE TABLE %I PARTITION OF videos_%s_part FOR VALUES FROM (%L) TO (%L)',
part_name, lower(base_region), start_d, end_d
);
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Start simple, partition early. It's far easier to add partitioning from day one than to migrate a bloated table later.
This article is part of the Building ViralVidVault series. Check out ViralVidVault to see these techniques in action.
Top comments (0)