Partitioning is important technique to ensure good performance for databases with large tables. When a table size grows over time each operation cost on the table will increase as well. Partitioning divide a large table into smaller tables. The idea is to allow query engine to scan much smaller tables and indexes to find the needed data. Partitions improve performance on a table, when it's done right. Partitioning a table by range values is commonly used with date fields.
In this blog I will demonstrate how partitioning works in practice.
Assume there is monolithic table my_table.
postgres=# \d my_table
Table "public.my_table"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('my_table_id_seq'::regclass)
a | character varying(100) | | |
b | timestamp without time zone | | not null |
c | smallint | | |
postgres=#
It has 200234 records.
postgres=# select count(1) from my_table;
count
--------
200234
(1 row)
Let's create a new table of the same structure to be partitioned by range by timestamp column b
create table my_table_partitioned(id serial, a varchar(100), b timestamp not null, c smallint) partition by range(b);
Declare partitions
create table my_table_partitioned_before_1900 partition of my_table_partitioned for values from (MINVALUE) to (timestamp '1900-01-01 00:00:00');
create table my_table_partitioned_1900_1920 partition of my_table_partitioned for values from (timestamp '1900-01-01 00:00:00') to (timestamp '1920-01-01 00:00:00');
create table my_table_partitioned_1920_1940 partition of my_table_partitioned for values from (timestamp '1920-01-01 00:00:00') to (timestamp '1940-01-01 00:00:00');
create table my_table_partitioned_1940_1960 partition of my_table_partitioned for values from (timestamp '1940-01-01 00:00:00') to (timestamp '1960-01-01 00:00:00');
create table my_table_partitioned_1960_1980 partition of my_table_partitioned for values from (timestamp '1960-01-01 00:00:00') to (timestamp '1980-01-01 00:00:00');
create table my_table_partitioned_1980_2000 partition of my_table_partitioned for values from (timestamp '1980-01-01 00:00:00') to (timestamp '2000-01-01 00:00:00');
create table my_table_partitioned_2000_2020 partition of my_table_partitioned for values from (timestamp '2000-01-01 00:00:00') to (timestamp '2020-01-01 00:00:00');
create table my_table_partitioned_2020_2040 partition of my_table_partitioned for values from (timestamp '2020-01-01 00:00:00') to (timestamp '2040-01-01 00:00:00');
create table my_table_partitioned_after_2040 partition of my_table_partitioned for values from (timestamp '2040-01-01 00:00:00') to (MAXVALUE);
Check table definition to ensure how partitions are defined
postgres=# \d+ my_table_partitioned
Partitioned table "public.my_table_partitioned"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+--------------------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('my_table_partitioned_id_seq'::regclass) | plain | |
a | character varying(100) | | | | extended | |
b | timestamp without time zone | | not null | | plain | |
c | smallint | | | | plain | |
Partition key: RANGE (b)
Partitions: my_table_partitioned_1900_1920 FOR VALUES FROM ('1900-01-01 00:00:00') TO ('1920-01-01 00:00:00'),
my_table_partitioned_1920_1940 FOR VALUES FROM ('1920-01-01 00:00:00') TO ('1940-01-01 00:00:00'),
my_table_partitioned_1940_1960 FOR VALUES FROM ('1940-01-01 00:00:00') TO ('1960-01-01 00:00:00'),
my_table_partitioned_1960_1980 FOR VALUES FROM ('1960-01-01 00:00:00') TO ('1980-01-01 00:00:00'),
my_table_partitioned_1980_2000 FOR VALUES FROM ('1980-01-01 00:00:00') TO ('2000-01-01 00:00:00'),
my_table_partitioned_2000_2020 FOR VALUES FROM ('2000-01-01 00:00:00') TO ('2020-01-01 00:00:00'),
my_table_partitioned_2020_2040 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2040-01-01 00:00:00'),
my_table_partitioned_after_2040 FOR VALUES FROM ('2040-01-01 00:00:00') TO (MAXVALUE),
my_table_partitioned_before_1900 FOR VALUES FROM (MINVALUE) TO ('1900-01-01 00:00:00')
Populate the newly partitioned table with values from the original table.
insert into my_table_partitioned select * from my_table;
postgres=# insert into my_table_partitioned select * from my_table;
INSERT 0 200234
To list the populated table rows together with corresponding partition name run the query:
select tableoid::regclass, * from my_table_partitioned;
For example:
tableoid | id | a | b | c
--------------------------------+--------+------------------------------------------------------------------------------------------------------+----------------------------+-------
my_table_partitioned_1900_1920 | 20836 | spfgyjksbxxsbfyixbhygzqizeiisldfcjssvwdaxdpkrwjsmhwqvozfkcipissbgi | 1915-06-16 12:08:08.978399 | 6590
my_table_partitioned_1900_1920 | 20840 | qvuvlftidrwitheqywzvcvhvmvmkxelkvxyufjfdkvybeyajnmoldqkuwxioyoiykeoibdvnbdbyyaufdqpjlrdgbbkvsy | 1900-11-18 17:56:54.171598 | 8399
my_table_partitioned_1900_1920 | 20846 | spbmrtqrvwtrlhtcdhdcvxfroqatioolpyvyheeeihghfutxwheoumlnrxwgesjhtyljgjivhvwcrwjnvol | 1904-10-15 00:48:24.337989 | 9747
my_table_partitioned_1900_1920 | 20849 | aviijgbcnbksondtxpojfuxtfqpvq | 1910-11-27 04:56:39.963994 | 5884
my_table_partitioned_1900_1920 | 20857 | dvjkgkjlbtxzzmabjtvkgugtexwxlkeqfqqtjrcochgjufrbxnkcjetwvoiqqfsayjfxnkqzcmdiudtpkyijpuxbfez | 1909-03-16 21:01:03.800415 | 7844
my_table_partitioned_1900_1920 | 20858 | vhgrxithjsoivwnpmufkyftnoiwqryoynwgykwrbbo | 1902-08-10 19:55:57.279215 | 9266
my_table_partitioned_1900_1920 | 20862 | tlhhtrkfxwmkaptfkuqpzgcircqbycjqntxgutbhvfhblijvhlgkindwxylbnrcoolthfkomdxyshiopvkrfncnmfjej | 1913-07-13 01:12:05.635982 | 3081
my_table_partitioned_1900_1920 | 20864 | jnhyohxcrckhlwosccwtkjcwgbycvhzrovjutcxwvfdkdyhmfcaqtjmjfbsavfpukcmxnud | 1900-01-12 09:32:13.167057 | 7161
my_table_partitioned_1900_1920 | 20867 | cozekrpmhgbltwfjhkeiqkzqudf | 1900-08-09 09:17:28.052677 | 8761
my_table_partitioned_1900_1920 | 20877 | owfkhqoowgvbjblnawcjdjfeswauzlmmwupynfzotoerqiycvpeqlhpyacizhdpztuansngevvdpowucoxybjrtmt | 1905-01-16 02:23:07.414942 | 4036
my_table_partitioned_1900_1920 | 20879 | pctajauffzkcosxmoqnpsxqprmu | 1910-07-07 02:47:59.318048 | 2058
my_table_partitioned_1900_1920 | 20886 | r | 1910-04-18 01:59:33.513781 | 6626
my_table_partitioned_1900_1920 | 20892 | jasgnjkeygwmtpacddchndmrxsohefthponnjvlsdrzglfc | 1903-04-08 04:38:25.310612 | 4759
my_table_partitioned_1900_1920 | 20898 | yuhedraxgbmew | 1913-10-27 13:58:09.46713 | 2877
Run an example query on the partitioned table and examine the corresponding explain plans.
analyze my_table_partitioned;
ANALYZE
explain (analyze, buffers) select count(1) from my_table_partitioned where b between timestamp '2022-01-01 01:23:45' and '2024-05-06 07:08:09';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=139.24..139.25 rows=1 width=8) (actual time=1.355..1.356 rows=1 loops=1)
Buffers: shared hit=62
-> Seq Scan on my_table_partitioned_2020_2040 my_table_partitioned (cost=0.00..135.20 rows=1615 width=0) (actual time=0.015..1.157 rows=1616 loops=1)
Filter: ((b >= '2022-01-01 01:23:45'::timestamp without time zone) AND (b <= '2024-05-06 07:08:09'::timestamp without time zone))
Rows Removed by Filter: 3264
Buffers: shared hit=62
Planning:
Buffers: shared hit=23
Planning Time: 0.330 ms
Execution Time: 1.389 ms
(10 rows)
Run the same query on the original not partitioned table and examine how its explain plan is different.
postgres=# analyze my_table;
ANALYZE
postgres=# explain (analyze, buffers) select count(1) from my_table where b between timestamp '2022-01-01 01:23:45' and '2024-05-06 07:08:09';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=5307.29..5307.30 rows=1 width=8) (actual time=20.173..23.885 rows=1 loops=1)
Buffers: shared hit=2538
-> Gather (cost=5307.18..5307.29 rows=1 width=8) (actual time=20.042..23.878 rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=2538
-> Partial Aggregate (cost=4307.18..4307.19 rows=1 width=8) (actual time=15.897..15.898 rows=1 loops=2)
Buffers: shared hit=2538
-> Parallel Seq Scan on my_table (cost=0.00..4304.77 rows=964 width=0) (actual time=0.021..15.779 rows=808 loops=2)
Filter: ((b >= '2022-01-01 01:23:45'::timestamp without time zone) AND (b <= '2024-05-06 07:08:09'::timestamp without time zone))
Rows Removed by Filter: 99309
Buffers: shared hit=2538
Planning:
Buffers: shared hit=3
Planning Time: 0.190 ms
Execution Time: 23.931 ms
(16 rows)
From the output we can see the partitioned table has better cost and better execution time.
Conclusion
In this blog I've demonstrated partitioning a table by range in PostgreSQL database.
Top comments (0)