Kanishga Subramani

Posted on Jun 12

Day 14 of #100DaysOfClickHouse: Understanding Nodes, Shards, and Replicas

#clickhouse #analytics #database #devops

When people first start learning ClickHouse®, they usually focus on SQL queries, table engines, and performance optimization.

But as data grows from millions to billions of rows, architecture becomes the real game changer.

Today, I explored three core concepts that make ClickHouse scalable and reliable:

Nodes, Shards, and Replicas.

Let's understand them with a simple example.

Imagine an E-Commerce Platform

Suppose an online shopping platform generates:

500 million orders
Billions of user events
Terabytes of analytical data

Initially, all data is stored on a single ClickHouse server.

Node 1
│
├── Orders
├── Users
├── Events
└── Analytics Queries

Everything works fine.

Until the dataset grows.

Queries become slower.

Storage starts filling up.

CPU utilization increases.

At some point, one server is no longer enough.

What Is a Node?

A node is simply a ClickHouse server.

It contains:

CPU
Memory
Storage
ClickHouse process

For example:

Node 1

is a single ClickHouse server.

If that server crashes, your analytics system becomes unavailable.

This is why larger deployments move beyond a single node.

What Is a Shard?

A shard is a portion of the total dataset.

Instead of storing all orders on one server, ClickHouse can distribute them across multiple servers.

Without Sharding

Node 1

Orders:
1 Billion Rows

With Sharding

Shard 1 → Node 1
Shard 2 → Node 2
Shard 3 → Node 3

Now each server stores only part of the data.

For example:

Node 1 = Orders 1–300M
Node 2 = Orders 301M–600M
Node 3 = Orders 601M–1B

Benefits:

More storage capacity
Faster query execution
Better resource utilization

When a query arrives:

SELECT COUNT(*)
FROM orders;

ClickHouse executes it on all shards simultaneously.

Shard 1 → 300M
Shard 2 → 300M
Shard 3 → 400M

Total → 1 Billion

This parallel execution is one reason ClickHouse performs so well at scale.

What Is a Replica?

Now imagine Node 1 suddenly fails.

Without replication:

Node 1 Down

Data Lost
Queries Fail

This is where replicas help.

A replica is a copy of a shard stored on another node.

Example:

Shard 1

├── Node 1 (Primary)
└── Node 2 (Replica)

Both nodes contain identical data.

If Node 1 goes down:

Node 2 continues serving data

The application continues working.

Benefits:

High availability
Fault tolerance
Safer maintenance
Protection against hardware failures

The Difference That Finally Clicked for Me

Many beginners confuse shards and replicas.

Here's the simplest explanation:

Shards

Used for scaling.

Data A → Node 1
Data B → Node 2
Data C → Node 3

Different data on different servers.

Replicas

Used for reliability.

Data A → Node 1
Data A → Node 2

Same data on multiple servers.

A simple rule to remember:

Sharding distributes data.

Replication duplicates data.

Real Production Example

A typical ClickHouse cluster might look like this:

              Cluster

         /                \
    Shard 1            Shard 2

   /       \          /       \

Node 1   Node 2   Node 3   Node 4
Replica  Replica  Replica  Replica

In this setup:

Data is split between two shards
Each shard has two replicas
Queries run in parallel
The system survives node failures

This architecture provides both performance and reliability.

The Most Interesting Concept

What surprised me most was the role of Distributed Tables.

Applications don't usually insert data directly into shards.

Instead, they use a Distributed table:

INSERT INTO orders VALUES (...)

The Distributed table automatically decides:

User A → Shard 1
User B → Shard 2
User C → Shard 1

Similarly, when you run:

SELECT COUNT(*)
FROM orders;

ClickHouse automatically queries all shards, collects the results, and returns a single answer.

The application never needs to know where the data is actually stored.

Final Thoughts

Learning ClickHouse isn't just about writing faster SQL.

It's about understanding how data is distributed, replicated, and processed across multiple machines.

Today's biggest takeaway:

Shards help ClickHouse scale.

Replicas help ClickHouse stay available.

And together, they form the foundation of every production-grade ClickHouse deployment.

The deeper I dive into ClickHouse, the more I appreciate how elegantly it solves large-scale analytics challenges.

Have you worked with distributed databases before? What concept took the longest for you to fully understand?

DEV Community