Hey Devs,
When datasets grow, even the most powerful database server eventually hits its limits:
- Disk space fills up
- CPU maxes out under query load
- Memory struggles with joins & aggregations
That's where sharding comes in. Instead of scaling up a single machine, we split the data across multiple nodes.
In this post, I'll walk you through a hands-on simulation of ClickHouse sharding that I built, so you can try it locally and understand how it works in practice.
GitHub Repo: Check my profile → GitHub → look for ClickHouse_Sharding_Simulation
What This Project Does
This is a beginner-friendly project that demonstrates:
- Creating a multi-shard ClickHouse setup with Docker Compose
- Distributing data across shards with weights (e.g., one shard can take 10× more data)
- Querying through a Distributed table to merge results across shards
- Showing how queries scale horizontally as data grows
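To build some intuition for the weights before touching the cluster, here is a minimal Python sketch of weighted shard routing. It follows the rule as I understand it from the ClickHouse docs: take the sharding expression modulo the total weight, then pick the shard whose half-open range contains the remainder. The weights `[10, 1]` and the names `SHARD_WEIGHTS` and `pick_shard` are made up for illustration; the repo's actual config may use different values.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

# Hypothetical weights: shard 0 should receive about 10x the rows of shard 1.
SHARD_WEIGHTS = [10, 1]


def pick_shard(sharding_key: int, weights: list[int]) -> int:
    """Return the index of the shard that owns `sharding_key`.

    Shard i owns the remainders in [sum(weights[:i]), sum(weights[:i + 1])).
    """
    boundaries = list(accumulate(weights))     # [10, 11] for weights [10, 1]
    remainder = sharding_key % boundaries[-1]  # value in 0..total_weight - 1
    return bisect_right(boundaries, remainder)


if __name__ == "__main__":
    # Route 1100 sequential keys and count how many land on each shard.
    counts = Counter(pick_shard(key, SHARD_WEIGHTS) for key in range(1100))
    for shard, rows in sorted(counts.items()):
        print(f"shard {shard}: {rows} rows")
    # shard 0: 1000 rows
    # shard 1: 100 rows
```

With sequential keys the split is exactly 10:1. With a hashed or random sharding key you would expect roughly the same ratio, not an exact one.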
Tech Stack
- ClickHouse: high-performance OLAP database
- Docker Compose: spin up shards + distributed node easily
- SQL: to define shards, distributed tables, and run queries
How To Run It Locally
Step 1. Clone the repo
git clone https://github.com/mohhddhassan/ClickHouse_Sharding_Simulation.git
cd ClickHouse_Sharding_Simulation
Step 2. Start the cluster
docker-compose up -d
Step 3. Enter a shard container and insert sample data
docker exec -it ch1 clickhouse-client
Step 4. Query from the distributed table
SELECT * FROM distributed_table;
Boom! You'll see results merged from multiple shards.
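If you want a mental model of what "merged from multiple shards" means, here is a small Python sketch of the scatter-gather idea: each shard answers the query for its own slice of the data, and the node you queried combines the partial results. This is only a conceptual model, not how ClickHouse is implemented internally, and the sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical rows held by two shards: (user_id, event) pairs.
SHARDS = [
    [(1, "click"), (2, "view"), (3, "click")],  # rows stored on shard 0
    [(4, "view"), (5, "click")],                # rows stored on shard 1
]


def local_count_by_event(rows):
    """What a single shard computes on its own slice of the data."""
    return Counter(event for _, event in rows)


def distributed_count_by_event(shards):
    """Fan the query out to every shard, then merge the partial counts."""
    merged = Counter()
    for rows in shards:
        merged.update(local_count_by_event(rows))
    return dict(merged)


def distributed_select_all(shards):
    """A SELECT * analogue: concatenate the rows returned by each shard."""
    return [row for rows in shards for row in rows]


if __name__ == "__main__":
    print(distributed_select_all(SHARDS))
    print(distributed_count_by_event(SHARDS))  # {'click': 3, 'view': 2}
```

The sketch returns rows in shard order, but on a real cluster you shouldn't rely on any particular row order unless the query has an ORDER BY.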
Project Structure
ClickHouse_Sharding_Simulation/
├── docker-compose.yml
├── configs/
│   └── remote_servers.xml
└── README.md    # Example queries + schema
What I Learned
- How ClickHouse uses Distributed tables to query across shards
- How shard weights balance load between nodes
- Why horizontal scaling beats vertical scaling for OLAP workloads
- How to simulate real-world database scaling locally with Docker
Why You Should Try This
If you're learning data engineering or databases:
- Understand sharding in a safe, local environment
- Practice setting up a mini ClickHouse cluster with Docker
- See how queries scale across nodes
- Build intuition for horizontal scaling vs vertical scaling
What's Next?
- Add replication for fault tolerance
- Benchmark query speed vs single-node setup
- Try larger datasets for performance testing
About Me
Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub
Building simple projects to understand the logic.