In the previous article I showed how to run ClickHouse in cluster mode using only sharding. It's enough for load distribution, but we also need to ensure fault tolerance via replication.
ZooKeeper
Apache ZooKeeper is a coordination service which provides distributed synchronization of config information across nodes. It's obligatory to have a cluster or a single node of it (above 3.4.5) if you want to enable replication on your ClickHouse cluster.
It's pretty easy to launch several nodes of ZooKeeper and assemble them into a cluster. Here's the configuration snippet for docker-compose.yml
:
zk-1:
container_name: zk_1
image: bitnami/zookeeper:3.6.1
environment:
- ZOO_SERVER_ID=1
- ALLOW_ANONYMOUS_LOGIN=yes
- ZOO_SERVERS=zk-1:2888:3888,zk-2:2888:3888
zk-2:
container_name: zk_2
image: bitnami/zookeeper:3.6.1
environment:
- ZOO_SERVER_ID=2
- ALLOW_ANONYMOUS_LOGIN=yes
- ZOO_SERVERS=zk-1:2888:3888,zk-2:2888:3888
And here's my configuration for a single node:
ch-zookeeper:
container_name: ch_zookeeper
image: zookeeper:3.6.1
ports:
- 2181:2181
Pay attention that ClickHouse can work with ZooKeeper cluster only if its nodes are based on the Docker image by bitnami.
ClickHouse does about 10 write operations on ZooKeeper for each INSERT
query.
Now you need to tell all the ClickHouse nodes about your ZooKeeper cluster. For this, the config file has a zookeeper
section:
<zookeeper>
<node index="1">
<host>zk-1</host>
<port>2181</port>
</node>
<node index="2">
<host>zk-2</host>
<port>2181</port>
</node>
</zookeeper>
Cluster configuration
I will use 1 master with 2 shards, 2 replicas for each shard. Here's the deployments configuration:
ch-master:
container_name: ch_master
image: yandex/clickhouse-server:19.14.13.4
depends_on:
- ch-zookeeper
- ch-sub-1
- ch-sub-2
- ch-sub-3
- ch-sub-4
ports:
- 9000:9000
- 8123:8123
volumes:
- type: volume
source: ch-master-data
target: /var/lib/clickhouse
- type: volume
source: ch-master-logs
target: /var/log/clickhouse-server
- ./master-config.xml:/etc/clickhouse-server/config.xml
ch-sub-1:
container_name: ch_sub_1
image: yandex/clickhouse-server:19.14.13.4
depends_on:
- ch-zookeeper
ports:
- 9001:9000
- 8124:8123
volumes:
- type: volume
source: ch-sub-1-data
target: /var/lib/clickhouse
- type: volume
source: ch-sub-1-logs
target: /var/log/clickhouse-server
- ./sub-1-config.xml:/etc/clickhouse-server/config.xml
ch-sub-2:
container_name: ch_sub_2
image: yandex/clickhouse-server:19.14.13.4
depends_on:
- ch-zookeeper
ports:
- 9002:9000
- 8125:8123
volumes:
- type: volume
source: ch-sub-2-data
target: /var/lib/clickhouse
- type: volume
source: ch-sub-2-logs
target: /var/log/clickhouse-server
- ./sub-2-config.xml:/etc/clickhouse-server/config.xml
ch-sub-3:
container_name: ch_sub_3
image: yandex/clickhouse-server:19.14.13.4
depends_on:
- ch-zookeeper
ports:
- 9003:9000
- 8126:8123
volumes:
- type: volume
source: ch-sub-3-data
target: /var/lib/clickhouse
- type: volume
source: ch-sub-3-logs
target: /var/log/clickhouse-server
- ./sub-3-config.xml:/etc/clickhouse-server/config.xml
ch-sub-4:
container_name: ch_sub_4
image: yandex/clickhouse-server:19.14.13.4
depends_on:
- ch-zookeeper
ports:
- 9004:9000
- 8127:8123
volumes:
- type: volume
source: ch-sub-4-data
target: /var/lib/clickhouse
- type: volume
source: ch-sub-4-logs
target: /var/log/clickhouse-server
- ./sub-4-config.xml:/etc/clickhouse-server/config.xml
volumes:
ch-master-data:
ch-master-logs:
ch-sub-1-data:
ch-sub-1-logs:
ch-sub-2-data:
ch-sub-2-logs:
ch-sub-3-data:
ch-sub-3-logs:
ch-sub-4-data:
ch-sub-4-logs:
The remote_servers
section in master-config.xml
will now look like this:
<remote_servers>
<example_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>ch-sub-1</host>
<port>9000</port>
</replica>
<replica>
<host>ch-sub-2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>ch-sub-3</host>
<port>9000</port>
</replica>
<replica>
<host>ch-sub-4</host>
<port>9000</port>
</replica>
</shard>
</example_cluster>
</remote_servers>
As you can see, the internal_replication
parameter is now specified for each shard. It's set to true
so the replicas themselves will send all their data to other replicas of the shard instead of doing it through the master node. If the parameter is set to false
, the master node will replicate all the data on it's own which may lead to an inconsistent state when all the replicas contain different data.
As for configuration files for subordinate nodes, a new section must be defined in them in order to enable replication. This section is called macros
.
<macros>
<shard>01</shard>
<replica>ch-sub-1</replica>
</macros>
It contains parameters that will be accessible through SQL queries related to the node. In the above source, shard
and replica
are parameters defined by the user, and "01"
and "ch-sub-1"
are their corresponding values. The first parameter is a cluster identifier and the second one is a replica name which coincides with the domain name specified in docker-compose.yml
.
After all the config files are set up, we can finally use scripts to create a cluster and run it. When all the database nodes are up and running, we should first execute our Python scripts for subordinate nodes. All of them look like this:
from clickhouse_driver import Client
from datetime import datetime
if __name__ == "__main__":
client = Client("127.0.0.1", port="9001")
client.execute("CREATE DATABASE IF NOT EXISTS billing")
client.execute(r'''CREATE TABLE IF NOT EXISTS billing.transactions(
timestamp DateTime,
currency String,
value Float64)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/billing.transactions', '{replica}')
PARTITION BY currency
ORDER BY timestamp''')
As you can see, the subordinate table now uses ReplicatedMergeTree
engine. Its constructor takes the path to the table records in ZooKeeper as the first parameter and the replica name as the second parameter. The path to the table in ZooKeeper should be unique. All the parameters in {}
are taken from the aforementioned macros
section of the replica config file.
When all the subordinate tables are created, it's time to create a master table. There's no difference from the previous case when only sharding was utilized:
from clickhouse_driver import Client
from datetime import datetime
if __name__ == "__main__":
client = Client("127.0.0.1", port="9000")
client.execute("CREATE DATABASE IF NOT EXISTS billing")
client.execute('''CREATE TABLE IF NOT EXISTS billing.transactions(
timestamp DateTime,
currency String,
value Float64)
ENGINE = Distributed(example_cluster, billing, transactions, rand())''')
If you set up all the things properly, you will get a working ClickHouse cluster with replication enabled. The shard is alive if at least one of its replicas is up. Table replication strengthens fault tolerance of the cluster.
All the necessary source code is here.
Top comments (1)
I'm really happy you decided to make this part II. Thanks!