DEV Community

Hannan2910
Hannan2910

Posted on

How to Shard your Database using Citus

Introduction

In previous blog we learned about sharding and its types, todays blog will be rather simple but technical as we will learn how to apply both types of sharding on our databases.

Schema-Based Sharding

By turning on the citus.enable_schema_based_sharding parameter, you can immediately begin using schema-based sharding. Once enabled, any schema you produce will be a "distributed schema".

-- Enable schema-based sharding!
set citus.enable_schema_based_sharding to on;

-- Create distributed schemas for two of my tenants
create schema tenant1;
create schema tenant2;

-- See my distributed schemas
select * from citus_schemas;

schema_name  | colocation_id | schema_size | schema_owner
-------------+---------------+-------------+--------------
 tenant1     |             3 | 0 bytes     | marco
 tenant2     |             4 | 0 bytes     | marco
(1 row)

Enter fullscreen mode Exit fullscreen mode

It is now automatic for any table generated in the tenant1 or tenant2 schema to become a "single shard" table that is "co-located" with all other tables in the schema. Single shard tables in Citus lack a distribution column in contrast to hash-distributed tables. Each table only has one shard, and all shards belonging to the same schema reside on the same node.

-- Create single shard tables in schema tenant1 (automatically co-located)
set search_path to tenant1;
create table note_categories (
   category_id bigserial primary key,
   category_name text not null);

create table notes (
   note_id bigserial primary key,
   category_id bigint references note_categories (category_id),
   message text not null);

-- Create single shard tables in schema tenant2 (automatically co-located)
set search_path to tenant2;
create table note_categories (
   category_id bigserial primary key,
   category_name text not null);

create table notes (
   note_id bigserial primary key,
   category_id bigint references note_categories (category_id),
   message text not null);

Enter fullscreen mode Exit fullscreen mode

You can see the tables where they are in the citus_shards.

select table_name, shardid, colocation_id, nodename, nodeport, shard_size 
from citus_shards where citus_table_type = 'schema';


       table_name       | shardid | colocation_id | nodename  | nodeport | shard_size
------------------------+---------+---------------+-----------+----------+------------
tenant1.note_categories |  102310 |             3 | wrk1.host |     5432 |      16384
tenant1.notes           |  102311 |             3 | wrk1.host |     5432 |      16384
tenant2.note_categories |  102312 |             4 | wrk2.host |     5432 |      16384
tenant2.notes           |  102313 |             4 | wrk2.host |     5432 |      16384
(4 rows)

Enter fullscreen mode Exit fullscreen mode

Row-Based Sharding

In Row-Based Sharding you need to call distinct citus function to disrtibute your tables in your schema. after creating your citus cluster you use the follwing to distribute your table
The create_distributed_table UDF will transparently shard your table locally or across the worker nodes:

CREATE TABLE events (
  device_id bigint,
  event_id bigserial,
  event_time timestamptz default now(),
  data jsonb not null,
  PRIMARY KEY (device_id, event_id)
);

-- distribute the events table across shards placed locally or on the worker nodes
SELECT create_distributed_table('events', 'device_id');

Enter fullscreen mode Exit fullscreen mode

To enable high performance distributed joins and foreign keys between distributed tables, distributed tables with the same distribution column can be co-located. The co-location of distributed tables is determined by the type of the distribution column by default, but you can specify co-location specifically using the colocate_with option in the create_distributed_table function.

CREATE TABLE devices (
  device_id bigint primary key,
  device_name text,
  device_type_id int
);
CREATE INDEX ON devices (device_type_id);

-- co-locate the devices table with the events table
SELECT create_distributed_table('devices', 'device_id', colocate_with := 'events');

Enter fullscreen mode Exit fullscreen mode

You can use create_reference_table to replicate a table across all cluster nodes when you require quick joins or foreign keys without the distribution column.


CREATE TABLE device_types (
  device_type_id int primary key,
  device_type_name text not null unique
);

-- replicate the table across all nodes to enable foreign keys and joins on any column
SELECT create_reference_table('device_types')
Enter fullscreen mode Exit fullscreen mode

Conclusion

Now we have learned how to set up a distributed realational database using citus. I hope this helps in understanding the basic use of the exteansion. Thats all for this time.

Top comments (0)