Splitting data across servers
Day 121 of 149
👉 Full deep-dive with code examples
The Library Card Catalog Analogy
One huge card catalog is hard to use:
- Long lines at one catalog
- If it breaks, no one can find books
Split by letters:
- A-F catalog in room 1
- G-M catalog in room 2
- N-Z catalog in room 3
Search is faster, and if one catalog goes down, you can still use the others (but you lose access to that section until it comes back).
Sharding splits your database into pieces across servers!
Why Shard?
One server has limits:
- Storage → Runs out of disk space
- Speed → Too many queries to handle
- Memory → Can't fit all data in RAM
Sharding spreads the load across multiple servers.
How It Works
Split data by some key:
Users A-H → Server 1
Users I-P → Server 2
Users Q-Z → Server 3
Each piece is a "shard." Each shard handles its portion.
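That routing logic can be sketched in a few lines of Python. The server names and letter ranges below are just placeholders, not a real deployment:

```python
# Hypothetical range-based router: pick a shard from the
# first letter of the username.
SHARD_RANGES = {
    ("a", "h"): "server-1",
    ("i", "p"): "server-2",
    ("q", "z"): "server-3",
}

def shard_for(username):
    first = username[0].lower()
    for (lo, hi), server in SHARD_RANGES.items():
        if lo <= first <= hi:
            return server
    raise ValueError(f"no shard covers {username!r}")

print(shard_for("Alice"))  # server-1
print(shard_for("Zara"))   # server-3
```

In a real system this lookup lives in a routing layer (or a library on every app server) so each query lands on the right machine.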
Sharding Strategies
By key range:
- Users 1-1000 → Shard 1
- Users 1001-2000 → Shard 2
By hash:
- hash(user_id) % 3 → determines shard
- Often spreads data more evenly
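A minimal sketch of hash-based shard selection. It uses `hashlib` rather than Python's built-in `hash()`, because `hash()` is randomized per process for strings and a shard key must hash the same way everywhere:

```python
import hashlib

NUM_SHARDS = 3

def shard_for(user_id):
    # Deterministic hash of the key, then modulo the shard count.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The same `user_id` always maps to the same shard, and the hash scatters neighboring IDs across shards instead of clumping them.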
By geography:
- US users → US shard
- EU users → EU shard
The Trade-offs
Complexity:
- Queries across shards are hard
- Need to route requests correctly
Availability:
- Sharding doesn't automatically make you highly available
- If a shard goes down, data on that shard is unavailable unless you also use replication
Joins:
- Can't easily join data across shards
- Might need to restructure queries
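One common restructuring is a scatter-gather query: ask every shard, then merge the results in the application. A toy sketch, with shards simulated as in-memory dicts of `name -> age`:

```python
# Toy shards: in a real system each dict would be a query
# against a separate database server.
shards = [
    {"alice": 30, "bob": 25},
    {"carol": 41},
    {"dave": 19, "erin": 35},
]

def find_users_older_than(age):
    results = []
    for shard in shards:  # fan out to every shard
        results.extend(name for name, a in shard.items() if a > age)
    return sorted(results)  # merge in the application

print(find_users_older_than(28))  # ['alice', 'carol', 'erin']
```

This works, but it is slower than a single-server query and the merge logic (sorting, pagination, aggregation) now lives in your code instead of the database.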
Rebalancing:
- Adding a new shard? You need to move existing data
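A quick way to see why rebalancing hurts: with plain modulo hashing, going from 3 to 4 shards remaps most keys, and every remapped key means moving its data. A small sketch (the `shard_for` helper here is hypothetical):

```python
import hashlib

def shard_for(user_id, num_shards):
    # Deterministic hash of the key, modulo the shard count.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Count how many of 10,000 keys land on a different shard
# when the cluster grows from 3 shards to 4.
moved = sum(
    shard_for(uid, 3) != shard_for(uid, 4)
    for uid in range(10_000)
)
print(f"{moved / 10_000:.0%} of keys move")  # typically around 75%
```

Schemes like consistent hashing exist precisely to shrink that fraction, so adding a shard only moves the keys that shard takes over.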
In One Sentence
Database Sharding splits your data across multiple servers so each handles a smaller piece, allowing your database to scale beyond one machine's limits.
🔗 Enjoying these? Follow for daily ELI5 explanations!
Making complex tech concepts simple, one day at a time.