Mohamed Hussain S

Scaling Databases with ClickHouse Sharding (Hands-On Simulation)

Hey Devs 👋,

When datasets grow, even the most powerful database server eventually hits its limits:

📦 Disk space fills up
⚡ CPU maxes out under query load
🧠 Memory struggles with joins & aggregations

That’s where sharding comes in. Instead of scaling up a single machine, we split the data across multiple nodes.

In this post, I’ll walk you through a hands-on simulation of ClickHouse sharding that I built, so you can try it locally and understand how it works in practice.

🔗 GitHub Repo: Check my profile → GitHub → look for ClickHouse_Sharding_Simulation


📦 What This Project Does

This is a beginner-friendly project that demonstrates:

🗂️ Creating a multi-shard ClickHouse setup with Docker Compose
📊 Distributing data across shards with weights (e.g., one shard can take 10× more data)
🔍 Querying through a Distributed table to merge results across shards
📈 Showing how queries scale horizontally as data grows


🛠️ Tech Stack

  • ClickHouse → high-performance OLAP database
  • Docker Compose → spin up shards + distributed node easily
  • SQL → to define shards, distributed tables, and run queries

⚙️ How To Run It Locally

Step 1. Clone the repo

git clone https://github.com/mohhddhassan/ClickHouse_Sharding_Simulation.git
cd ClickHouse_Sharding_Simulation

Step 2. Start the cluster

docker-compose up -d

Step 3. Enter a shard container and insert sample data

docker exec -it ch1 clickhouse-client
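The actual schema and sample inserts live in the repo's README. As a rough sketch of what this step can look like once you are inside `clickhouse-client`, here is one possible setup. Everything except `distributed_table` is an assumption made for illustration: the cluster name `sharded_cluster` (it must match whatever is defined in `configs/remote_servers.xml`), the `default` database, the local table name `events_local`, and the two columns.

```sql
-- Hypothetical schema for illustration; the real one is in the repo's README.

-- 1) Run on every shard: the local table that physically stores the rows.
CREATE TABLE IF NOT EXISTS events_local
(
    id      UInt64,
    message String
)
ENGINE = MergeTree
ORDER BY id;

-- 2) Run on the node you query from: a Distributed table that stores no data
--    itself and forwards reads and writes to events_local on each shard.
--    'sharded_cluster' and 'default' are assumed names.
CREATE TABLE IF NOT EXISTS distributed_table
(
    id      UInt64,
    message String
)
ENGINE = Distributed(sharded_cluster, default, events_local, rand());

-- 3) Insert sample rows through the Distributed table, so the sharding key
--    (rand() here) and the shard weights decide where each row lands.
INSERT INTO distributed_table
SELECT
    number AS id,
    concat('event-', toString(number)) AS message
FROM numbers(1100);
```

Inserts through a Distributed table are forwarded to the shards in the background by default, so row counts may take a moment to settle before you query them.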

Step 4. Query from the distributed table

SELECT * FROM distributed_table;

Boom 🚀 you’ll see results merged from multiple shards!
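To check that the rows really come from different shards, one option is to group by the host that served each row. This is a small sketch that only assumes the `distributed_table` from Step 4 exists. `hostName()` is evaluated on each shard, and inside Docker it usually returns the container's hostname.

```sql
-- One result row per shard, with the number of rows that shard holds.
SELECT
    hostName() AS shard_host,
    count()    AS row_count
FROM distributed_table
GROUP BY shard_host
ORDER BY shard_host;
```

If the shards carry different weights, the counts should come out roughly in proportion to them.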


🗂️ Project Structure

ClickHouse_Sharding_Simulation/
├── docker-compose.yml
├── configs/
│   └── remote_servers.xml
└── README.md                  # Example queries + schema

🀯 What I Learned

πŸ’‘ How ClickHouse uses Distributed tables to query across shards
πŸ’‘ How shard weights balance load between nodes
πŸ’‘ Why horizontal scaling beats vertical scaling for OLAP workloads
πŸ’‘ How to simulate real-world database scaling locally with Docker
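The weight rule can be checked with plain SQL. ClickHouse takes the sharding key modulo the total weight and assigns each remainder range to a shard. The sketch below assumes two shards with weights 1 and 10 (chosen to match the 10× example earlier; the repo's actual weights may differ), so the total weight is 11: remainder 0 goes to shard 1 and remainders 1 to 10 go to shard 2.

```sql
-- Weights as the server sees them (set with <weight> in remote_servers.xml).
SELECT cluster, shard_num, shard_weight, host_name, port
FROM system.clusters;

-- Routing arithmetic for assumed weights 1 and 10 (total weight 11),
-- using `number` as a stand-in sharding key.
SELECT
    if(number % 11 < 1, 1, 2) AS target_shard,
    count()                   AS row_count
FROM numbers(1100)
GROUP BY target_shard
ORDER BY target_shard;
-- target_shard 1 -> 100 rows, target_shard 2 -> 1000 rows
```

With a random sharding key such as `rand()`, the split is only approximately 1:10 rather than exact.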


🔍 Why You Should Try This

If you’re learning data engineering or databases:

🔹 Understand sharding in a safe, local environment
🔹 Practice setting up a mini ClickHouse cluster with Docker
🔹 See how queries scale across nodes
🔹 Build intuition for horizontal scaling vs vertical scaling


📌 What’s Next?

  • Add replication for fault tolerance
  • Benchmark query speed vs single-node setup
  • Try larger datasets for performance testing

🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub

🧪 Building simple things to understand the logic.

