DEV Community

Cover image for Day 14 of #100DaysOfClickHouse: Understanding Nodes, Shards, and Replicas
Kanishga Subramani
Kanishga Subramani

Posted on

Day 14 of #100DaysOfClickHouse: Understanding Nodes, Shards, and Replicas

When people first start learning ClickHouse®, they usually focus on SQL queries, table engines, and performance optimization.

But as data grows from millions to billions of rows, architecture becomes the real game changer.

Today, I explored three core concepts that make ClickHouse scalable and reliable:

Nodes, Shards, and Replicas.

Let's understand them with a simple example.

Imagine an E-Commerce Platform

Suppose an online shopping platform generates:

  • 500 million orders
  • Billions of user events
  • Terabytes of analytical data

Initially, all data is stored on a single ClickHouse server.

Node 1
│
├── Orders
├── Users
├── Events
└── Analytics Queries
Enter fullscreen mode Exit fullscreen mode

Everything works fine.

Until the dataset grows.

Queries become slower.

Storage starts filling up.

CPU utilization increases.

At some point, one server is no longer enough.

What Is a Node?

A node is simply a ClickHouse server.

It contains:

  • CPU
  • Memory
  • Storage
  • ClickHouse process

For example:

Node 1
Enter fullscreen mode Exit fullscreen mode

is a single ClickHouse server.

If that server crashes, your analytics system becomes unavailable.

This is why larger deployments move beyond a single node.

What Is a Shard?

A shard is a portion of the total dataset.

Instead of storing all orders on one server, ClickHouse can distribute them across multiple servers.

Without Sharding

Node 1

Orders:
1 Billion Rows
Enter fullscreen mode Exit fullscreen mode

With Sharding

Shard 1 → Node 1
Shard 2 → Node 2
Shard 3 → Node 3
Enter fullscreen mode Exit fullscreen mode

Now each server stores only part of the data.

For example:

Node 1 = Orders 1–300M
Node 2 = Orders 301M–600M
Node 3 = Orders 601M–1B
Enter fullscreen mode Exit fullscreen mode

Benefits:

  • More storage capacity
  • Faster query execution
  • Better resource utilization

When a query arrives:

SELECT COUNT(*)
FROM orders;
Enter fullscreen mode Exit fullscreen mode

ClickHouse executes it on all shards simultaneously.

Shard 1 → 300M
Shard 2 → 300M
Shard 3 → 400M

Total → 1 Billion
Enter fullscreen mode Exit fullscreen mode

This parallel execution is one reason ClickHouse performs so well at scale.

What Is a Replica?

Now imagine Node 1 suddenly fails.

Without replication:

Node 1 Down

Data Lost
Queries Fail
Enter fullscreen mode Exit fullscreen mode

This is where replicas help.

A replica is a copy of a shard stored on another node.

Example:

Shard 1

├── Node 1 (Primary)
└── Node 2 (Replica)
Enter fullscreen mode Exit fullscreen mode

Both nodes contain identical data.

If Node 1 goes down:

Node 2 continues serving data
Enter fullscreen mode Exit fullscreen mode

The application continues working.

Benefits:

  • High availability
  • Fault tolerance
  • Safer maintenance
  • Protection against hardware failures

The Difference That Finally Clicked for Me

Many beginners confuse shards and replicas.

Here's the simplest explanation:

Shards

Used for scaling.

Data A → Node 1
Data B → Node 2
Data C → Node 3
Enter fullscreen mode Exit fullscreen mode

Different data on different servers.

Replicas

Used for reliability.

Data A → Node 1
Data A → Node 2
Enter fullscreen mode Exit fullscreen mode

Same data on multiple servers.

A simple rule to remember:

Sharding distributes data.

Replication duplicates data.

Real Production Example

A typical ClickHouse cluster might look like this:

              Cluster

         /                \
    Shard 1            Shard 2

   /       \          /       \

Node 1   Node 2   Node 3   Node 4
Replica  Replica  Replica  Replica
Enter fullscreen mode Exit fullscreen mode

In this setup:

  • Data is split between two shards
  • Each shard has two replicas
  • Queries run in parallel
  • The system survives node failures

This architecture provides both performance and reliability.

The Most Interesting Concept

What surprised me most was the role of Distributed Tables.

Applications don't usually insert data directly into shards.

Instead, they use a Distributed table:

INSERT INTO orders VALUES (...)
Enter fullscreen mode Exit fullscreen mode

The Distributed table automatically decides:

User A → Shard 1
User B → Shard 2
User C → Shard 1
Enter fullscreen mode Exit fullscreen mode

Similarly, when you run:

SELECT COUNT(*)
FROM orders;
Enter fullscreen mode Exit fullscreen mode

ClickHouse automatically queries all shards, collects the results, and returns a single answer.

The application never needs to know where the data is actually stored.

Final Thoughts

Learning ClickHouse isn't just about writing faster SQL.

It's about understanding how data is distributed, replicated, and processed across multiple machines.

Today's biggest takeaway:

Shards help ClickHouse scale.

Replicas help ClickHouse stay available.

And together, they form the foundation of every production-grade ClickHouse deployment.

The deeper I dive into ClickHouse, the more I appreciate how elegantly it solves large-scale analytics challenges.

Have you worked with distributed databases before? What concept took the longest for you to fully understand?

Read more... https://quantrail-data.com/clickhouse-architecture-basics-nodes-shards-replicas/

Top comments (0)