Sharding in Citus & Its Types

#postgres #citus #apache

Introduction

High-performance, scalable databases are essential in application development. A traditional single-node database often cannot keep up with the demands of modern applications, particularly those that are growing quickly. Database sharding, a technique that distributes data across multiple servers to improve performance and scalability, is one response to this problem. This article discusses two common sharding models, Row-Based Sharding and Schema-Based Sharding, as they apply to Citus, a PostgreSQL extension for distributing data across nodes, and how to choose the best sharding strategy for your application.

Sharding in a Nutshell

Database sharding is a technique for splitting data horizontally to improve database performance. Rather than depending on a single, enormous database server, sharding splits the data into smaller, easier-to-handle chunks known as "shards". Each shard resides on an individual server or node, so the workload is spread across those nodes, which can improve query performance.
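
The idea can be sketched in a few lines of Python. This is a toy model, not how Citus routes rows internally: the node names, the shard count, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration.

```python
import zlib

NODES = ["node-a", "node-b"]   # hypothetical worker nodes
SHARD_COUNT = 4                # hypothetical number of shards


def shard_for(key: str) -> int:
    """Map a key to a shard by hashing it (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % SHARD_COUNT


def node_for(shard_id: int) -> str:
    """Place shards on nodes round-robin."""
    return NODES[shard_id % len(NODES)]


def distribute(keys):
    """Group keys by the node that stores their shard."""
    placement = {node: [] for node in NODES}
    for key in keys:
        placement[node_for(shard_for(key))].append(key)
    return placement


if __name__ == "__main__":
    for node, stored in distribute(f"user-{i}" for i in range(10)).items():
        print(node, stored)
```

The same key always hashes to the same shard, so a lookup by key only needs to contact one node, while a scan over all keys can be spread across every node.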

Row-Based Sharding

Row-Based Sharding, sometimes referred to as Horizontal Sharding, is a sharding method that divides the rows of a table according to the value of a particular column (the distribution column), often the primary key or a tenant ID. The aim is to spread data evenly so that each shard holds about the same amount of data. Row-based sharding is well suited to applications dealing with very large tables, such as IoT or time series data. To ensure effective data distribution, however, extra data modelling steps, such as adding a "tenant ID" column to all tables, may be necessary.
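
To see why the "tenant ID" column matters, here is a minimal Python sketch, assuming two hypothetical tables (`orders` and `order_items`) that both carry a `tenant_id`. Because both tables are sharded on the same column with the same hash, rows for one tenant end up in the same shard and can be joined without moving data between shards. The table layout, shard count, and `zlib.crc32` hash are illustrative assumptions, not Citus internals.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 8  # hypothetical number of shards


def shard_of(tenant_id: int) -> int:
    """Pick a shard from the tenant ID (crc32 is a stand-in hash)."""
    return zlib.crc32(str(tenant_id).encode("utf-8")) % SHARD_COUNT


def distribute_rows(rows):
    """Group rows by shard, using tenant_id as the distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row["tenant_id"])].append(row)
    return shards


def local_join(order_shards, item_shards):
    """Join orders to items shard by shard; no row leaves its shard."""
    joined = []
    for shard_id, shard_orders in order_shards.items():
        for order in shard_orders:
            for item in item_shards.get(shard_id, []):
                same_tenant = item["tenant_id"] == order["tenant_id"]
                same_order = item["order_id"] == order["order_id"]
                if same_tenant and same_order:
                    joined.append((order["order_id"], item["sku"]))
    return sorted(joined)


# Hypothetical sample data: every row carries the tenant_id.
orders = [
    {"tenant_id": 1, "order_id": 10},
    {"tenant_id": 2, "order_id": 11},
    {"tenant_id": 1, "order_id": 12},
]
order_items = [
    {"tenant_id": 1, "order_id": 10, "sku": "A"},
    {"tenant_id": 2, "order_id": 11, "sku": "B"},
    {"tenant_id": 1, "order_id": 12, "sku": "C"},
]

order_shards = distribute_rows(orders)
item_shards = distribute_rows(order_items)

if __name__ == "__main__":
    print(local_join(order_shards, item_shards))
```

If `order_items` lacked the `tenant_id` column, there would be no shared value to shard on, and matching rows could land in different shards.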

Schema-Based Sharding

Schema-Based Sharding differs in that it places all tables from the same schema on the same node. This approach reduces network overhead because a single node can execute queries and transactions that involve a single schema. The system can still scale out because distinct schemas can live on different nodes. Schema-Based Sharding places the fewest restrictions on data modelling and is a good option for applications where distinct parts of the data model can reside in separate schemas. However, joins and foreign keys should only involve tables that are part of the same schema.
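
A small Python model can make the placement rule concrete. The schema names, node names, and helper functions below are hypothetical and only illustrate the idea that a whole schema maps to one node; they are not part of any Citus API.

```python
# Hypothetical placement: each schema lives entirely on one node.
SCHEMA_TO_NODE = {
    "tenant_a": "node-1",
    "tenant_b": "node-2",
    "billing": "node-1",
}


def schema_of(qualified_name: str) -> str:
    """Return the schema part of a 'schema.table' name."""
    return qualified_name.split(".", 1)[0]


def nodes_for_query(tables):
    """Return the set of nodes a query over these tables would touch."""
    return {SCHEMA_TO_NODE[schema_of(name)] for name in tables}


def is_single_schema(tables):
    """True when every table belongs to the same schema."""
    return len({schema_of(name) for name in tables}) == 1


if __name__ == "__main__":
    same = ["tenant_a.users", "tenant_a.orders"]
    cross = ["tenant_a.users", "tenant_b.orders"]
    print(is_single_schema(same), nodes_for_query(same))
    print(is_single_schema(cross), nodes_for_query(cross))
```

A query that stays within `tenant_a` touches a single node, while one that spans `tenant_a` and `tenant_b` would need data from two nodes, which is why joins and foreign keys are best kept inside one schema.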

Choosing the Right Sharding Model

The choice between Row-Based Sharding and Schema-Based Sharding depends on your specific application requirements:

Row-Based Sharding: Opt for this model if you have a smaller number of large tenants (B2B) and are comfortable with additional data modeling steps. It's suitable for scenarios where very large tables, parallel distributed queries, and distributed DML (Data Manipulation Language) are required.
Schema-Based Sharding: Consider this approach if your application naturally divides into distinct groups of tables, such as multi-tenant applications (schema per tenant), microservices (schema per microservice), or vertical partitioning (grouping related tables into schemas). Schema-Based Sharding offers ease of use and is ideal for applications with a very large number of small tenants (B2C).

Conclusion

In conclusion, sharding with Citus offers a way to scale your PostgreSQL database and improve its performance. Whether you opt for Row-Based or Schema-Based Sharding, the key is to align the sharding model with your application's specific needs. With Citus, you can move to a distributed database environment and gain the scalability and performance your application demands.
