This article is a step by step guide to partition a postgreSQL table by range and would analyze the various query performances comparing with un-partitioned table.
In this article, we will delve into the details of PostgreSQL’s table partitioning capabilities, exploring the built-in partitioning methods it supports. Additionally, we’ll conduct an quick analysis of query performance on partitioned tables to highlight the benefits and limitations of this approach. Finally, we’ll wrap up with some important design considerations while using table partitioning in PostgreSQL.
What is partitioning and why partition a table ?
Database partitioning is a technique where we split a table to multiple smaller tables called partition and the let the database decide which partition to query based on the where clause specified in the query.
As the database size grows, the response time increase even if we have indexes. In such cases partition can help, by creating partitions we are enabling the dB engine to query a smaller subset of data which improves response times.
Dropping a partition can very quickly delete millions of records as Postgres does not have to individually delete every record.
Salient points
A partitioned table is a virtual table which does not store any data on its own. Data gets stored in the partitions which are all attached to the partitioned/virtual table.
Any index or unique constraint created on partitioned table automatically gets created on every partitions.
A partition itself is an ordinary table which stores subset of data based on the partitioning column. On insert or update, Postgres takes care of routing the data to appropriate partition based on the partitioning column.
All the partitions must have same column as their parent. However a partition can have its own indexes, constraints and default values which differ from other partitions.
Partitioning types
Postgres support following partitions:
Range: In this type data is partitioned based on a specified range of values in a column.
-- Create a table partition by range
CREATE TABLE IF NOT EXISTS orders_parts
(
id uuid DEFAULT uuid_generate_v4(),
name character varying(255) NOT NULL,
ordered_time timestamp without time zone NOT NULL,
price numeric,
country character varying(255) not null,
PRIMARY KEY (id, ordered_time)
) PARTITION BY RANGE (ordered_time);
-- Define partition
CREATE TABLE orders_w1 PARTITION OF orders_parts
FOR VALUES FROM ('2024-08-15 00:00:00') TO ('2024-08-22 00:00:00')
CREATE TABLE orders_w2 PARTITION OF orders_parts
FOR VALUES FROM ('2024-08-22 00:00:00') TO ('2024-08-29 00:00:00')
List: In this type data is partitioned based on a discrete values.
-- Create a table partition by list
CREATE TABLE IF NOT EXISTS orders_parts
(
id uuid DEFAULT uuid_generate_v4(),
name character varying(255) NOT NULL,
ordered_time timestamp without time zone NOT NULL,
price numeric,
country character varying(255) not null,
PRIMARY KEY (id, country)
) PARTITION BY LIST (country);
-- Define partition
CREATE TABLE orders_parts_us PARTITION OF orders_parts FOR VALUES IN ('US');
CREATE TABLE orders_parts_uk PARTITION OF orders_parts FOR VALUES IN ('UK');
CREATE TABLE orders_parts_in PARTITION OF orders_parts FOR VALUES IN ('IN');
Hash: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
An error often encountered during the creation of partitioned table is ERROR: unique constraint on partitioned table must include all partitioning columns. This is a limitation with Postgres table partitioning which enforces to have all partition key columns as part of unique constraints.
Sub-partitioning
A partition may istself be defined as a partitioned table which would result in a sub-partition.
-- Create a table partition by range
CREATE TABLE IF NOT EXISTS orders_parts
(
id uuid DEFAULT uuid_generate_v4(),
name character varying(255) NOT NULL,
ordered_time timestamp without time zone NOT NULL,
price numeric,
country character varying(255) not null,
PRIMARY KEY (id, ordered_time,country)
) PARTITION BY RANGE (ordered_time);
-- Make further partition by list
CREATE TABLE orders_w1 PARTITION OF orders_parts
FOR VALUES FROM ('2024-08-15 00:00:00') TO ('2024-08-22 00:00:00') PARTITION BY list (country);
CREATE TABLE orders_w2 PARTITION OF s1.orders_parts
FOR VALUES FROM ('2024-08-22 00:00:00') TO ('2024-08-29 00:00:00') PARTITION BY list (country);
CREATE TABLE orders_w3 PARTITION OF s1.orders_parts
FOR VALUES FROM ('2024-08-29 00:00:00') TO ('2024-09-05 00:00:00') PARTITION BY list (country);
-- Define partitions
CREATE TABLE orders_w1_us PARTITION OF orders_w1 for values in ('us');
CREATE TABLE orders_w1_uk PARTITION OF orders_w1 for values in ('uk');
Let’s partition a table by range
Create an orders table partitioned by range.
CREATE TABLE IF NOT EXISTS orders_parts
(
id uuid DEFAULT uuid_generate_v4(),
name character varying(255) NOT NULL,
ordered_time timestamp without time zone NOT NULL,
price numeric,
country character varying(255) not null,
PRIMARY KEY (id, ordered_time)
) PARTITION BY RANGE (ordered_time);
Now let’s create partitions for orders_parts table. We will be creating weekly partitions on ordered time.
create table orders_w1 partition of orders_parts for values from ('2024-08-15') to ('2024-08-22');
create table orders_w2 partition of orders_parts for values from ('2024-08-22') to ('2024-08-29');
create table orders_w3 partition of orders_parts for values from ('2024-08-29') to ('2024-09-04');
create table orders_w4 partition of orders_parts for values from ('2024-09-04') to ('2024-09-11');
create table orders_w5 partition of orders_parts for values from ('2024-09-11') to ('2024-09-18');
create table orders_w6 partition of orders_parts for values from ('2024-09-18') to ('2024-09-25');
create table orders_w7 partition of orders_parts for values from ('2024-09-25') to ('2024-10-02');
create table orders_w8 partition of orders_parts for values from ('2024-10-02') to ('2024-10-09');
create table orders_w9 partition of orders_parts for values from ('2024-10-09') to ('2024-10-16');
Insert data into orders_parts . Postgres will spread data into respective partition based on the ordered created time of each record. We have approximately 2.8 million records spread across nine partitions.
-- total records
select count(*) from orders;//2848298
-- records in individual partitions
select count(*) from orders_parts //2848298
select count(*) from orders_w1; //79114
select count(*) from orders_w2; //355065
select count(*) from orders_w3; //304873
select count(*) from orders_w4; //356240
select count(*) from orders_w5; //355411
select count(*) from orders_w6; //354667
select count(*) from orders_w7; //355691
select count(*) from orders_w8; //355547
select count(*) from orders_w9; //331690
Query performance after partitioning
Case 1: When the ordered date range lies in a single partition.
#Explain analyze respone for partitioned table
Seq Scan on orders_w5 orders_parts (cost=0.00..9370.17 rows=254560 width=45) (actual time=0.020..64.817 rows=253528 loops=1)
Filter: ((ordered_time >= '2024-09-12 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-17 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 101883
Planning Time: 0.157 ms
Execution Time: 73.112 ms
#Explain analyze respone for un-partitioned table
Seq Scan on orders (cost=0.00..75092.36 rows=253315 width=45) (actual time=0.088..717.725 rows=253528 loops=1)
Filter: ((ordered_time >= '2024-09-12 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-17 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 2594770
Planning Time: 0.145 ms
Execution Time: 729.850 ms
Case 2: When the ordered date range spans across two to three partitions.
#Explain analyze respone for partitioned table
Append (cost=0.00..31648.22 rows=709937 width=45) (actual time=0.029..345.060 rows=710078 loops=1)
-> Seq Scan on orders_w5 orders_parts_1 (cost=0.00..9370.17 rows=355340 width=45) (actual time=0.027..82.652 rows=355411 loops=1)
Filter: ((ordered_time >= '2024-09-11 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-25 00:00:00'::timestamp without time zone))
-> Seq Scan on orders_w6 orders_parts_2 (cost=0.00..9351.01 rows=354596 width=45) (actual time=0.101..118.500 rows=354667 loops=1)
Filter: ((ordered_time >= '2024-09-11 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-25 00:00:00'::timestamp without time zone))
-> Seq Scan on orders_w7 orders_parts_3 (cost=0.00..9377.36 rows=1 width=45) (actual time=95.430..95.430 rows=0 loops=1)
Filter: ((ordered_time >= '2024-09-11 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-25 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 355691
Planning Time: 1.056 ms
Execution Time: 372.142 ms
#Explain analyze respone for un-partitioned table
Seq Scan on orders (cost=0.00..75092.36 rows=715656 width=45) (actual time=0.060..751.562 rows=710078 loops=1)
Filter: ((ordered_time >= '2024-09-11 00:00:00'::timestamp without time zone) AND (ordered_time <= '2024-09-25 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 2138220
Planning Time: 0.639 ms
Execution Time: 780.560 ms
We can conclude that if we query the table based on partitioned column the execution time decreases significantly.
As the date range increases, in this case it spans across 3 partitions. The query planning time increases significantly, however the execution time is still less compared with unpartitioned table.
Design considerations
Most important design consideration is the choice of column or columns by which we want to partition our data. The columns which appears most in WHERE clauses should be the preferred choice.
Planning the optimal number of partitions is important.Having too many partitions can cause longer query planning times and higher memory consumption during both query planning and execution.
Deleting an old partition or archiving a partition that is not used frequently is easy and can we be done very quickly.
If there is an update that results in movement of records across partition, this will cause performance degradation.
PS
pgAdmin 4 (version 8.10)
PostgreSQL 15.8
Top comments (0)