Jéssica Coelho

Posted on Nov 25

🇺🇸 PostgreSQL Partitioning Strategies for Massive Databases — Part 1

#postgressql #partitioning #performance #database

Every year, companies face an exponential increase in data volume. Solutions that work perfectly for smaller tables often start to fail when the dataset reaches massive scales. That’s why we say:
“What works for 10 million rows does not necessarily scale well for 10 billion.”

When tables reach this level, database administrators must completely rethink their storage and indexing strategies.

And that’s where partitioning becomes a powerful and essential technique.

Before diving into technical details, let’s go through a bit of history.

Before PostgreSQL 10 — Inherited Tables

Partitioning was implemented using table inheritance (INHERITS):
A physical parent table was created.
Multiple child tables represented each partition.
Inserts were routed using triggers or rules, which redirected rows to the correct partition.

Disadvantages:
There was no automatic exclusion of partitions from the execution plan (partition pruning).
The optimizer had to scan all partitions, even those that were not relevant.
Operations like UPDATE and DELETE were costly and impractical.
Managing partitions (creating, dropping, and maintaining them) was labor-intensive.

The Arrival of Declarative Partitioning (PostgreSQL 10, 2017)

PostgreSQL introduced native declarative partitioning, allowing developers to declare partition structures without triggers or manual logic.

CREATE TABLE sales ( id serial, sale_date date, amount numeric ) PARTITION BY RANGE (sale_date);

Partitions are created using CREATE TABLE ... PARTITION OF.
Initially supported RANGE and LIST partitioning.

Syntax defines the method, keys, and value boundaries.

📈 Partitioning Evolution Since PostgreSQL 10

🧩 What Is Partitioning?

Partitioning is the technique of splitting a large table into smaller pieces called partitions.

For the application, it still behaves like one logical table, but internally PostgreSQL distributes the data across multiple physical tables.

Key concepts:

The parent table is logical and stores only metadata.
Child tables are real PostgreSQL tables with schemas and tablespaces.
Child tables inherit from the parent table using declarative partitioning.
Child tables can have their own indexes.
Child tables can be sub-partitioned with different keys. -PostgreSQL automatically routes INSERT/SELECT/UPDATE/DELETE to the correct partition. -You cannot add partitioning to an existing non-partitioned table. -Partitioning is beneficial mainly for huge tables (typically larger than RAM). -PK/UNIQUE constraints must include the partition key. -Indexes are local to each partition. -FKs had limitations before PostgreSQL 15. -Partition boundaries must not overlap. -There is a DEFAULT partition. If you don’t create one, when you try to insert data outside the partition ranges, you will get an error like this:

ERROR: no partition of relation “mother_table” found for row
DETAIL: Failing row contains (coluna_particao = X, …).

-Partition keys cannot be NULL.
-VACUUM/ANALYZE operate on each partition.
-Partitions can be ATTACHED or DETACHED without losing data.
-Partition sizes do not need to be identical.

Partitioning Syntax Example

Base table:

CREATE TABLE payment (
payment_id serial,
payment_date date NOT NULL,
amount numeric(10,2) NOT NULL
);

Partitioned version:

CREATE TABLE payment (
payment_id serial PRIMARY KEY,
payment_date date NOT NULL,
amount numeric(10,2) NOT NULL
) PARTITION BY RANGE (payment_date);

Creating monthly partitions:
CREATE TABLE payment_2024_01
PARTITION OF payment
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE payment_2024_02
PARTITION OF payment
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE payment_2024_03
PARTITION OF payment
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

CREATE TABLE payment_2024_04
PARTITION OF payment
FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');

Inserts remain unchanged
INSERT INTO payment (payment_date, amount)
VALUES ('2024-04-20', 120.00);

PostgreSQL automatically routes the data to the correct partition.

⚡ Partition Pruning

Partition pruning is the PostgreSQL technique that skips irrelevant partitions during query execution.

Instead of scanning every child table, the optimizer reads only the partitions that match the WHERE condition.

Example:
SELECT *
FROM payment
WHERE payment_date = '2024-04-20';

Before:

With partitioning, PostgreSQL only scans the partition payment_2024_04

Smaller indexes → better caching → predictable performance
Reduced I/O and CPU usage

Partitioning transforms one single big index into many smaller, more efficient indexes.

🧰 Partitioning Methods
➤ 1. RANGE Partitioning

Divide by continuous intervals (dates, numbers, sequences).

CREATE TABLE payment (
payment_id serial PRIMARY KEY,
payment_date date NOT NULL,
amount numeric(10,2) NOT NULL
) PARTITION BY RANGE (payment_date);

CREATE TABLE payment_2024_q1
PARTITION OF payment
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE payment_2024_q2
PARTITION OF payment
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

📌 Best for: time-series tables (logs, sales, metrics)

➤ 2. LIST Partitioning

Divide by specific discrete values.

CREATE TABLE payment (
payment_id serial PRIMARY KEY,
country text NOT NULL,
amount numeric(10,2) NOT NULL
) PARTITION BY LIST (country);

CREATE TABLE payment_br
PARTITION OF payment
FOR VALUES IN ('BR');

CREATE TABLE payment_us
PARTITION OF payment
FOR VALUES IN ('US');

📌 Best for: country, product type, category

➤ 3. HASH Partitioning

(Introduced in PostgreSQL 11)

Uniform distribution via hash function.

CREATE TABLE payment (
payment_id serial PRIMARY KEY,
customer_id int NOT NULL,
amount numeric(10,2) NOT NULL
) PARTITION BY HASH (customer_id);

CREATE TABLE payment_hash_0
PARTITION OF payment
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE payment_hash_1
PARTITION OF payment
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

📌 Best for: evenly distributing random data (customer IDs, hashed keys)

➤ 4. Sub-Partitioning (Composite Partitioning)

Combine methods (e.g., RANGE + LIST).

CREATE TABLE payment (
payment_id serial PRIMARY KEY,
payment_date date NOT NULL,
country text NOT NULL,
amount numeric(10,2) NOT NULL
) PARTITION BY RANGE (payment_date);

CREATE TABLE payment_2024
PARTITION OF payment
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY LIST (country);

CREATE TABLE payment_2024_br
PARTITION OF payment_2024
FOR VALUES IN ('BR');

CREATE TABLE payment_2024_us
PARTITION OF payment_2024
FOR VALUES IN ('US');

📌 Best for: combinations like
time + region
time + status
category + product

🆚 Partitioning vs. Sharding
Technique What It Does Scope
Partitioning Splits one table into smaller internal partitions within a single database server Vertical scaling
Sharding Splits data across multiple servers/databases, each storing a portion of the dataset Horizontal scaling

Summary:

Partitioning = optimize performance inside one PostgreSQL instance:
Server 1
└── Parent Table
├── Partition Jan
├── Partition Feb
└── Partition Mar

Sharding = distribute data across multiple servers to scale beyond hardware limits
Server 1
└── Shard A

Server 2
└── Shard B

Server 3
└── Shard C

What’s Coming Next — Part 2

In this first article, we explored how PostgreSQL has evolved, why partitioning is crucial for massive datasets, and which internal mechanisms the database uses to make queries faster and more efficient.

But theory alone doesn’t solve real-world problems.

In Part 2, I’ll demonstrate practical, hands-on examples:

how to partition huge tables,
how to choose between RANGE, LIST, HASH, or sub-partitioning,
how to migrate an existing table into a partitioned structure,
how to measure actual performance gains,

If you want real-world examples that work at the scale of billions of rows, don’t miss the next article.

See you there.