Introduction
As databases grow beyond the capabilities of a single server, teams face a critical decision: scale vertically by adding more resources to one machine, or scale horizontally by distributing data across multiple servers. While vertical scaling eventually hits physical and economic limits, horizontal scaling lets capacity grow with the number of servers you add.
In this hands-on guide, we'll explore Citus—an open-source extension that transforms PostgreSQL into a distributed database. We'll build a real Citus cluster from scratch, demonstrate data distribution, implement foreign key relationships across distributed tables, and understand the principles that make horizontal scaling work.
By the end, you'll have practical experience with distributed databases and understand both their power and their limitations.
What is Horizontal Scaling?
Imagine a pizza restaurant with one chef making all the pizzas. During rush hour, orders pile up because one person can only work so fast. This is vertical scaling—you could train the chef to work faster (add more CPU), give them better tools (add more RAM), or expand their workspace (add more storage). But eventually, you hit limits.
Now imagine hiring three chefs, each handling different orders. This is horizontal scaling—you distribute the workload across multiple workers. The more customers you have, the more chefs you can add.
In database terms:
- Vertical scaling: Bigger server (more CPU, RAM, storage)
- Horizontal scaling: More servers working together
Citus brings horizontal scaling to PostgreSQL by:
- Splitting your data across multiple worker nodes (sharding)
- Keeping related data together on the same node (co-location)
- Maintaining PostgreSQL's ACID guarantees and SQL compatibility
Architecture: Understanding Citus Components
A Citus cluster consists of at least three components:
1. Coordinator Node
The coordinator is like a project manager—it doesn't do the heavy lifting but knows where everything is. When you connect to a Citus cluster, you connect to the coordinator. It:
- Routes queries to the appropriate workers
- Combines results from multiple workers
- Manages distributed transactions
- Stores metadata about data distribution
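For the curious, that metadata lives in ordinary catalog tables on the coordinator. Once the cluster below is running, you can peek at it directly; a minimal sketch using the standard Citus catalogs:
-- Worker registry maintained by the coordinator
SELECT nodename, nodeport, isactive FROM pg_dist_node;
-- One row per distributed table: distribution method and co-location group
SELECT logicalrelid, partmethod, colocationid FROM pg_dist_partition;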
2. Worker Nodes
Workers are the muscle of your cluster—they store the actual data and execute queries. Each worker:
- Holds a subset of your data (shards)
- Executes queries locally on its data
- Communicates with other workers when needed
- Functions as a full PostgreSQL instance
3. Shards
Shards are pieces of your distributed tables. Think of them as mini-tables that together form your complete dataset. When you distribute a table:
- Citus creates 32 shards by default (configurable)
- Shards are distributed across workers
- Each row goes to exactly one shard based on a hash function
- Related data can be co-located on the same worker
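Concretely, each hash-distributed shard owns a contiguous range of 32-bit hash values, and a row lands in whichever shard's range covers the hash of its distribution column. Once a table is distributed (we do this below), you can see those ranges in the pg_dist_shard catalog; a quick sketch:
-- Hash-value range owned by each shard of a distributed table
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'citus_demo'::regclass   -- the demo table we create later
ORDER BY shardid
LIMIT 4;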
Setting Up a Citus Cluster on Digital Ocean
For this demonstration, we'll create a three-node cluster:
- citus-01: Coordinator node
- citus-02: Worker 1
- citus-03: Worker 2
Step 1: Install PostgreSQL and Citus Extension
On all three droplets:
# Add PostgreSQL repository
sudo apt-get update
sudo apt-get install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
# Install PostgreSQL 16
sudo apt-get install -y postgresql-16
# Install Citus extension
curl https://install.citusdata.com/community/deb.sh | sudo bash
sudo apt-get install -y postgresql-16-citus-12.1
# Configure PostgreSQL to load Citus
echo "shared_preload_libraries = 'citus'" | sudo tee -a /etc/postgresql/16/main/postgresql.conf
# Restart PostgreSQL
sudo systemctl restart postgresql
Step 2: Configure Network Access
Edit /etc/postgresql/16/main/postgresql.conf on all nodes:
listen_addresses = '*'
Edit /etc/postgresql/16/main/pg_hba.conf on all nodes to allow connections (0.0.0.0/0 is convenient for a demo, but in production restrict this to your private subnet):
host all all 0.0.0.0/0 md5
Note that the coordinator and workers also open connections to each other as the postgres user, so inter-node authentication must work non-interactively, for example via a .pgpass file or trust entries scoped to the private network.
Restart PostgreSQL:
sudo systemctl restart postgresql
Step 3: Initialize the Coordinator
On the coordinator node (citus-01):
-- Connect as postgres user
sudo -u postgres psql
-- Create the Citus extension
CREATE EXTENSION citus;
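-- Depending on your Citus version, you may first need to register the coordinator itself
-- (use citus-01's private IP) before adding workers:
-- SELECT citus_set_coordinator_host('<coordinator-private-ip>', 5432);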
-- Add worker nodes to the cluster
SELECT citus_add_node('10.114.0.11', 5432);
SELECT citus_add_node('10.114.0.12', 5432);
-- Verify worker registration
SELECT * FROM citus_get_active_worker_nodes();
You should see both worker nodes listed:
node_name | node_port
-------------+-----------
10.114.0.11 | 5432
10.114.0.12 | 5432
Step 4: Initialize Workers
On each worker node (citus-02 and citus-03):
sudo -u postgres psql
CREATE EXTENSION citus;
Your Citus cluster is now operational!
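Before moving on, it's worth running a couple of optional sanity checks from the coordinator (exact output and available functions depend on your Citus version):
-- Confirm the extension version in use
SELECT citus_version();
-- Every node should report that it can reach every other node
SELECT * FROM citus_check_cluster_node_health();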
Demonstration 1: Basic Data Distribution
Let's create a simple distributed table and observe how data spreads across workers.
Creating a Distributed Table
-- On the coordinator
CREATE TABLE citus_demo (
id SERIAL,
user_name TEXT,
city TEXT,
age INT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Distribute the table by 'id' column
SELECT create_distributed_table('citus_demo', 'id');
The create_distributed_table function tells Citus to:
- Create 32 shards (mini-tables) by default
- Distribute these shards across available workers
- Route future queries based on the distribution column (id)
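The shard count is fixed at distribution time. It comes from the citus.shard_count setting, and newer Citus versions also accept it as an argument to create_distributed_table; a hedged sketch (shown for a hypothetical new table called events, since citus_demo is already distributed):
-- Applies to tables distributed after this point in the session
SET citus.shard_count = 64;
-- Or pass it explicitly (Citus 11+); 'events' is a hypothetical table
SELECT create_distributed_table('events', 'id', shard_count := 64);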
Inserting Data
INSERT INTO citus_demo (user_name, city, age) VALUES
('Alice', 'New York', 25),
('Bob', 'Chicago', 30),
('Charlie', 'Seattle', 27),
('David', 'Boston', 35),
('Emma', 'Denver', 22),
('Frank', 'Portland', 29),
('Grace', 'Miami', 28),
('Henry', 'Atlanta', 31),
('Iris', 'Phoenix', 26),
('Jack', 'Dallas', 33);
Examining Distribution
Check how shards are distributed:
SELECT * FROM citus_shards WHERE table_name::text = 'citus_demo';
Result (abbreviated):
table_name | shardid | shard_name | nodename | nodeport | shard_size
------------+---------+-------------------+-------------+----------+------------
citus_demo | 102488 | citus_demo_102488 | 10.114.0.11 | 5432 | 16384
citus_demo | 102489 | citus_demo_102489 | 10.114.0.12 | 5432 | 16384
citus_demo | 102490 | citus_demo_102490 | 10.114.0.11 | 5432 | 8192
... (32 shards total)
We see 32 shards split between two workers. Some shards have data (16 KB), others are empty (8 KB).
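To see exactly how many rows landed in each shard, you can ask every shard for a count from the coordinator; a quick sketch using Citus's run_command_on_shards helper:
-- %s is replaced with each shard's table name; empty shards return 0
SELECT shardid, result AS row_count
FROM run_command_on_shards('citus_demo', 'SELECT count(*) FROM %s')
ORDER BY shardid;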
Understanding Shard Assignment
See which shard each record belongs to:
SELECT id, user_name, city,
get_shard_id_for_distribution_column('citus_demo', id) as shard_id
FROM citus_demo
ORDER BY id;
Result:
id | user_name | city | shard_id
----+-----------+----------+----------
1 | Alice | New York | 102489
2 | Bob | Chicago | 102512
3 | Charlie | Seattle | 102503
4 | David | Boston | 102496
5 | Emma | Denver | 102494
6 | Frank | Portland | 102508
7 | Grace | Miami | 102496
8 | Henry | Atlanta | 102488
9 | Iris | Phoenix | 102516
10 | Jack | Dallas | 102492
Notice that David (id=4) and Grace (id=7) share shard 102496—this happens when their IDs hash to the same shard. With only 10 records across 32 shards, collisions and empty shards are normal.
Viewing Data on Individual Workers
Connect directly to Worker 1 (10.114.0.11):
-- Query a specific shard
SELECT * FROM citus_demo_102488;
Result:
id | user_name | city | age | created_at
----+-----------+---------+-----+----------------------------
8 | Henry | Atlanta | 31 | 2026-01-23 22:42:50.523619
Worker 1 only stores the data in its shards. Query other shards on this worker:
SELECT * FROM citus_demo_102512; -- Bob
SELECT * FROM citus_demo_102496; -- David and Grace
Connect to Worker 2 (10.114.0.12) and query:
SELECT * FROM citus_demo_102489; -- Alice
SELECT * FROM citus_demo_102503; -- Charlie
Key insight: Each worker only stores a subset of the data. When you query the main table through the coordinator, Citus automatically fetches data from all workers and combines the results.
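You can watch this routing happen in the query plan. A minimal sketch (the plan text varies by Citus version, but expect a Custom Scan (Citus Adaptive) node):
-- No filter on the distribution column: one task per shard (Task Count: 32)
EXPLAIN SELECT count(*) FROM citus_demo;
-- Filter on the distribution column: pruned down to a single task on one worker
EXPLAIN SELECT * FROM citus_demo WHERE id = 8;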
Demonstration 2: Foreign Keys and Co-location
One of the biggest challenges in distributed databases is maintaining relationships between tables. Let's explore how Citus handles foreign keys through co-location.
The Challenge
Imagine you have users and their orders:
- Users table: Alice, Bob, Charlie
- Orders table: Order 1 (Alice's pizza), Order 2 (Bob's burger), Order 3 (Alice's soda)
In a distributed setup, what if:
- Alice's user record is on Worker 1
- Alice's orders are on Worker 2
When inserting an order for Alice, Worker 2 must verify Alice exists—but Alice is on Worker 1! This requires expensive cross-node communication for every foreign key check.
The Solution: Co-location
Co-location ensures related data lives on the same worker. Citus achieves this by distributing both tables using the same column.
Creating Co-located Tables
-- Create users table
CREATE TABLE users (
user_id INT PRIMARY KEY,
name TEXT
);
-- Create orders table with composite primary key
CREATE TABLE orders (
order_id SERIAL,
user_id INT,
item TEXT,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (order_id, user_id) -- Must include distribution column
);
-- Distribute both tables by user_id
SELECT create_distributed_table('users', 'user_id');
SELECT create_distributed_table('orders', 'user_id');
-- Add foreign key constraint
ALTER TABLE orders
ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(user_id);
Important: The orders primary key includes user_id because Citus requires the distribution column in unique constraints. This ensures uniqueness can be verified locally on each worker.
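To see this requirement in action, try adding a unique constraint that omits the distribution column; Citus should reject it (exact wording varies by version):
-- Expected to fail with an error along the lines of:
--   ERROR: Distributed relations cannot have UNIQUE, EXCLUDE, or PRIMARY KEY constraints
--          that do not include the partition column
ALTER TABLE orders ADD CONSTRAINT orders_order_id_unique UNIQUE (order_id);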
Inserting Related Data
INSERT INTO users (user_id, name) VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David'),
(5, 'Emma');
INSERT INTO orders (user_id, item) VALUES
(1, 'Pizza'),
(1, 'Soda'),
(1, 'Ice Cream'),
(2, 'Burger'),
(2, 'Fries'),
(3, 'Taco'),
(3, 'Burrito'),
(3, 'Nachos'),
(4, 'Pasta'),
(5, 'Salad');
Verifying Co-location
Check which shards contain user and order data:
SELECT
u.user_id,
u.name,
o.item,
get_shard_id_for_distribution_column('users', u.user_id) as user_shard,
get_shard_id_for_distribution_column('orders', o.user_id) as order_shard
FROM users u
JOIN orders o ON u.user_id = o.user_id
ORDER BY u.user_id, o.order_id;
Result:
user_id | name | item | user_shard | order_shard
---------+---------+-----------+------------+-------------
1 | Alice | Pizza | 102521 | 102553
1 | Alice | Soda | 102521 | 102553
1 | Alice | Ice Cream | 102521 | 102553
2 | Bob | Burger | 102544 | 102576
2 | Bob | Fries | 102544 | 102576
3 | Charlie | Taco | 102535 | 102567
3 | Charlie | Burrito | 102535 | 102567
3 | Charlie | Nachos | 102535 | 102567
Wait, the shard IDs are different! This might seem wrong, but it's actually correct.
Understanding Different Shard IDs
The users table has its own set of shards (102521, 102544, 102535...), and the orders table has different shard IDs (102553, 102576, 102567...). However, what matters is that they're on the same physical worker node.
Verify this:
SELECT
table_name,
shardid,
nodename
FROM citus_shards
WHERE shardid IN (102521, 102553) -- Alice's user and order shards
ORDER BY nodename, table_name;
Result:
table_name | shardid | nodename
------------+---------+-------------
orders | 102553 | 10.114.0.11
users | 102521 | 10.114.0.11
Both shards are on Worker 1 (10.114.0.11)! This is co-location—different shard tables, same physical location.
Think of it like two filing cabinets (users and orders) in the same office. Even though they're separate cabinets with different drawer labels, they're in the same room, so finding related information is instant.
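You can also confirm co-location from the metadata: both tables should report the same colocation_id in the citus_tables view.
SELECT table_name, distribution_column, colocation_id
FROM citus_tables
WHERE table_name IN ('users'::regclass, 'orders'::regclass);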
Testing Foreign Key Constraints
Try inserting an order for a non-existent user:
INSERT INTO orders (user_id, item) VALUES (999, 'Ghost Pizza');
Error:
ERROR: insert or update on table "orders" violates foreign key constraint "fk_user"
DETAIL: Key (user_id)=(999) is not present in table "users".
Perfect! The foreign key constraint works: user_id=999 doesn't exist in users, and thanks to co-location the worker holding that shard can verify this locally, without any network calls to other workers.
Performing JOINs
Count orders per user:
SELECT
u.user_id,
u.name,
COUNT(o.order_id) as num_orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id;
Result:
user_id | name | num_orders
---------+---------+------------
1 | Alice | 3
2 | Bob | 2
3 | Charlie | 3
4 | David | 1
5 | Emma | 1
This JOIN is fast because each worker can join its local user and order data without coordinating with other workers. This is the power of co-location.
Key Concepts and Best Practices
1. Distribution Column Selection
Choose your distribution column carefully:
Good choices:
- user_id for multi-tenant applications (each tenant's data together)
- account_id for SaaS applications
- customer_id for e-commerce
Bad choices:
- created_at (causes temporal hotspots)
- Low-cardinality columns (poor distribution)
- Columns that change frequently
Rule of thumb: Pick a column that:
- Appears in most queries (for query efficiency)
- Enables co-location of related tables
- Has high cardinality (many unique values)
- Remains stable (doesn't change)
2. Co-location Requirements
For foreign keys to work across distributed tables:
- Same distribution column: Both parent and child tables must be distributed by the same column
- Include in constraints: The distribution column must be part of PRIMARY KEY and UNIQUE constraints
- Foreign key on distribution column: The foreign key must reference the distribution column
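Tables distributed by a column of the same type (and with the same shard count) end up co-located by default, but you can also state the relationship explicitly. A sketch with a hypothetical payments table:
-- 'payments' is hypothetical; distribute it by user_id and pin it to users' co-location group
SELECT create_distributed_table('payments', 'user_id', colocate_with => 'users');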
3. Shard Count
Default: 32 shards per table
- Too few shards: Limits parallelism and future scalability
- Too many shards: Increases metadata overhead and query planning time
For most applications, 32-128 shards works well. Adjust based on:
- Expected data size
- Number of workers (more workers = more shards beneficial)
- Query patterns
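You can check the current default and, on recent Citus versions, re-shard an existing table with alter_distributed_table (this rewrites and moves the table's data, so plan for it):
SHOW citus.shard_count;
-- Change the shard count of an existing table; co-located tables can follow along
SELECT alter_distributed_table('citus_demo', shard_count := 64, cascade_to_colocated := true);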
4. When to Use Citus
Good use cases:
- Multi-tenant SaaS applications
- Real-time analytics on time-series data
- High-throughput transactional workloads
- Applications that exceed single-server capacity
- Workloads with natural sharding keys
Not ideal for:
- Small databases (< 100GB)
- Workloads requiring many cross-shard JOINs
- Applications with no clear distribution key
- Use cases where a single PostgreSQL instance suffices
Performance Considerations
Query Routing
Citus routes queries differently based on their scope:
1. Single-shard queries (fastest):
-- Routed to one worker because user_id = 1 maps to exactly one shard
SELECT * FROM orders WHERE user_id = 1;
2. Co-located JOINs (fast):
-- Each worker joins its local data, then results are combined
SELECT u.name, COUNT(o.*)
FROM users u
JOIN orders o ON u.user_id = o.user_id
GROUP BY u.name;
3. Multi-shard (fan-out) queries (slower):
-- Must query all shards and combine results
SELECT * FROM orders WHERE item = 'Pizza';
4. Cross-shard (repartition) joins (slowest):
-- Joining on a non-distribution column forces Citus to move data between workers
SELECT o.item, p.product_name
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.user_id = 1;
-- Illustrative only: assumes an orders.product_id column; avoid this pattern by making
-- products a reference table or co-locating it with orders
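By default Citus refuses joins between distributed tables that aren't co-located on the join key. If you genuinely need them, you can opt in to repartition joins and accept the data-shuffling cost:
-- Allows Citus to repartition data between workers to satisfy the join
SET citus.enable_repartition_joins TO on;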
Reference Tables
For small lookup tables (countries, products, categories), use reference tables:
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY,
name TEXT
);
-- Replicate to all workers
SELECT create_reference_table('countries');
Reference tables are fully replicated to every worker, enabling efficient JOINs with distributed tables.
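For example, once countries is replicated to every worker, joining it with a distributed table needs no extra data movement. A sketch (users.country_code is a hypothetical column, not part of the demo schema):
-- Every worker holds the full countries table, so this join runs locally on each worker
SELECT u.name, c.name AS country
FROM users u
JOIN countries c ON c.code = u.country_code;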
Monitoring and Maintenance
Checking Cluster Health
-- View active workers
SELECT * FROM citus_get_active_worker_nodes();
-- Check shard distribution
SELECT nodename, COUNT(*) as shard_count
FROM citus_shards
GROUP BY nodename;
-- Monitor distributed queries
SELECT * FROM citus_stat_statements;
-- Check streaming replication lag on each node (if you run standbys for HA)
SELECT * FROM pg_stat_replication;
Rebalancing Shards
As you add workers, rebalance shards:
-- Move shards to balance load
SELECT citus_rebalance_start();
-- Check rebalance progress
SELECT * FROM citus_rebalance_status();
Conclusion
We've successfully demonstrated horizontal scaling with Citus by:
- Building a distributed cluster with one coordinator and two workers
- Creating distributed tables and observing how data spreads across shards
- Implementing foreign key relationships through co-location
- Verifying co-location by checking that related data resides on the same workers
- Testing foreign key constraints to ensure data integrity
- Performing distributed JOINs that execute efficiently due to co-location
Citus transforms PostgreSQL into a horizontally scalable database while maintaining SQL compatibility and ACID guarantees. By distributing data intelligently and co-locating related tables, it achieves impressive performance even as datasets grow beyond single-server capacity.
The Elephant in the Room: Single Points of Failure
However, our current setup has a critical limitation: every node is a single point of failure.
Consider these scenarios:
- Coordinator failure: If citus-01 goes down, your entire application stops. No queries can be routed, no data can be accessed, even though workers are healthy.
- Worker failure: If citus-02 crashes, all data on that worker becomes unavailable. Half your users suddenly can't access their orders.
- Maintenance downtime: Need to apply a security patch? You'll have to take the coordinator offline, causing application downtime.
In production environments, single points of failure are unacceptable. Your database must survive:
- Hardware failures
- Network issues
- Planned maintenance
- Software crashes
- Data center outages
What's Next: High Availability with Patroni
In our next article, we'll solve these availability challenges by integrating Patroni—an open-source high availability solution for PostgreSQL. We'll transform our vulnerable single-node cluster into a highly available system:
Architecture we'll build:
- Coordinator cluster: Primary + Standby (automatic failover)
- Worker 1 cluster: Primary + Standby
- Worker 2 cluster: Primary + Standby
- etcd cluster: Distributed consensus for leader election
What you'll learn:
- Setting up Patroni with etcd for distributed consensus
- Configuring automatic failover for coordinators and workers
- Testing failure scenarios (simulating crashes)
- Monitoring cluster health and replication lag
- Performing switchovers for maintenance
- Understanding trade-offs between availability and consistency
By combining Citus's horizontal scalability with Patroni's high availability, you'll have a production-ready distributed database that can:
- Scale to billions of rows
- Handle millions of queries per second
- Survive node failures automatically
- Support zero-downtime maintenance
- Provide strong consistency guarantees
Stay tuned for "High Availability for Citus: Implementing Automatic Failover with Patroni" where we'll make this scalable database truly resilient.
Resources
- Citus Documentation: https://docs.citusdata.com
- Citus GitHub: https://github.com/citusdata/citus
- PostgreSQL Documentation: https://www.postgresql.org/docs/
- Digital Ocean Citus Tutorial: https://docs.digitalocean.com/products/databases/postgresql/
Special thanks to the Citus team for building such an elegant distributed database solution.