DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

Partitioning a table by range in PostgreSQL database.

Partitioning is important technique to ensure good performance for databases with large tables. When a table size grows over time each operation cost on the table will increase as well. Partitioning divide a large table into smaller tables. The idea is to allow query engine to scan much smaller tables and indexes to find the needed data. Partitions improve performance on a table, when it's done right. Partitioning a table by range values is commonly used with date fields.

In this blog I will demonstrate how partitioning works in practice.

Assume there is monolithic table my_table.

postgres=# \d my_table
                                      Table "public.my_table"
 Column |            Type             | Collation | Nullable |               Default
--------+-----------------------------+-----------+----------+--------------------------------------
 id     | integer                     |           | not null | nextval('my_table_id_seq'::regclass)
 a      | character varying(100)      |           |          |
 b      | timestamp without time zone |           | not null |
 c      | smallint                    |           |          |

postgres=#
Enter fullscreen mode Exit fullscreen mode

It has 200234 records.

postgres=# select count(1) from my_table;
 count
--------
 200234
(1 row)
Enter fullscreen mode Exit fullscreen mode

Let's create a new table of the same structure to be partitioned by range by timestamp column b

create table my_table_partitioned(id serial, a varchar(100), b timestamp not null, c smallint) partition by range(b);
Enter fullscreen mode Exit fullscreen mode

Declare partitions

create table my_table_partitioned_before_1900 partition of my_table_partitioned for values from (MINVALUE) to (timestamp '1900-01-01 00:00:00');
create table my_table_partitioned_1900_1920 partition of my_table_partitioned for values from (timestamp '1900-01-01 00:00:00') to (timestamp '1920-01-01 00:00:00');
create table my_table_partitioned_1920_1940 partition of my_table_partitioned for values from (timestamp '1920-01-01 00:00:00') to (timestamp '1940-01-01 00:00:00');
create table my_table_partitioned_1940_1960 partition of my_table_partitioned for values from (timestamp '1940-01-01 00:00:00') to (timestamp '1960-01-01 00:00:00');
create table my_table_partitioned_1960_1980 partition of my_table_partitioned for values from (timestamp '1960-01-01 00:00:00') to (timestamp '1980-01-01 00:00:00');
create table my_table_partitioned_1980_2000 partition of my_table_partitioned for values from (timestamp '1980-01-01 00:00:00') to (timestamp '2000-01-01 00:00:00');
create table my_table_partitioned_2000_2020 partition of my_table_partitioned for values from (timestamp '2000-01-01 00:00:00') to (timestamp '2020-01-01 00:00:00');
create table my_table_partitioned_2020_2040 partition of my_table_partitioned for values from (timestamp '2020-01-01 00:00:00') to (timestamp '2040-01-01 00:00:00');
create table my_table_partitioned_after_2040 partition of my_table_partitioned for values from (timestamp '2040-01-01 00:00:00') to (MAXVALUE);
Enter fullscreen mode Exit fullscreen mode

Check table definition to ensure how partitions are defined

postgres=# \d+ my_table_partitioned

                                                    Partitioned table "public.my_table_partitioned"
 Column |            Type             | Collation | Nullable |                     Default                      | Storage  | Stats target | Description
--------+-----------------------------+-----------+----------+--------------------------------------------------+----------+--------------+-------------
 id     | integer                     |           | not null | nextval('my_table_partitioned_id_seq'::regclass) | plain    |              |
 a      | character varying(100)      |           |          |                                                  | extended |              |
 b      | timestamp without time zone |           | not null |                                                  | plain    |              |
 c      | smallint                    |           |          |                                                  | plain    |              |
Partition key: RANGE (b)
Partitions: my_table_partitioned_1900_1920 FOR VALUES FROM ('1900-01-01 00:00:00') TO ('1920-01-01 00:00:00'),
            my_table_partitioned_1920_1940 FOR VALUES FROM ('1920-01-01 00:00:00') TO ('1940-01-01 00:00:00'),
            my_table_partitioned_1940_1960 FOR VALUES FROM ('1940-01-01 00:00:00') TO ('1960-01-01 00:00:00'),
            my_table_partitioned_1960_1980 FOR VALUES FROM ('1960-01-01 00:00:00') TO ('1980-01-01 00:00:00'),
            my_table_partitioned_1980_2000 FOR VALUES FROM ('1980-01-01 00:00:00') TO ('2000-01-01 00:00:00'),
            my_table_partitioned_2000_2020 FOR VALUES FROM ('2000-01-01 00:00:00') TO ('2020-01-01 00:00:00'),
            my_table_partitioned_2020_2040 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2040-01-01 00:00:00'),
            my_table_partitioned_after_2040 FOR VALUES FROM ('2040-01-01 00:00:00') TO (MAXVALUE),
            my_table_partitioned_before_1900 FOR VALUES FROM (MINVALUE) TO ('1900-01-01 00:00:00')
Enter fullscreen mode Exit fullscreen mode

Populate the newly partitioned table with values from the original table.

insert into my_table_partitioned select * from my_table;

postgres=# insert into my_table_partitioned select * from my_table;
INSERT 0 200234
Enter fullscreen mode Exit fullscreen mode

To list the populated table rows together with corresponding partition name run the query:

select tableoid::regclass, * from my_table_partitioned;
Enter fullscreen mode Exit fullscreen mode

For example:

            tableoid            |   id   |                                                  a                                                   |             b              |   c
--------------------------------+--------+------------------------------------------------------------------------------------------------------+----------------------------+-------
 my_table_partitioned_1900_1920 |  20836 | spfgyjksbxxsbfyixbhygzqizeiisldfcjssvwdaxdpkrwjsmhwqvozfkcipissbgi                                   | 1915-06-16 12:08:08.978399 |  6590
 my_table_partitioned_1900_1920 |  20840 | qvuvlftidrwitheqywzvcvhvmvmkxelkvxyufjfdkvybeyajnmoldqkuwxioyoiykeoibdvnbdbyyaufdqpjlrdgbbkvsy       | 1900-11-18 17:56:54.171598 |  8399
 my_table_partitioned_1900_1920 |  20846 | spbmrtqrvwtrlhtcdhdcvxfroqatioolpyvyheeeihghfutxwheoumlnrxwgesjhtyljgjivhvwcrwjnvol                  | 1904-10-15 00:48:24.337989 |  9747
 my_table_partitioned_1900_1920 |  20849 | aviijgbcnbksondtxpojfuxtfqpvq                                                                        | 1910-11-27 04:56:39.963994 |  5884
 my_table_partitioned_1900_1920 |  20857 | dvjkgkjlbtxzzmabjtvkgugtexwxlkeqfqqtjrcochgjufrbxnkcjetwvoiqqfsayjfxnkqzcmdiudtpkyijpuxbfez          | 1909-03-16 21:01:03.800415 |  7844
 my_table_partitioned_1900_1920 |  20858 | vhgrxithjsoivwnpmufkyftnoiwqryoynwgykwrbbo                                                           | 1902-08-10 19:55:57.279215 |  9266
 my_table_partitioned_1900_1920 |  20862 | tlhhtrkfxwmkaptfkuqpzgcircqbycjqntxgutbhvfhblijvhlgkindwxylbnrcoolthfkomdxyshiopvkrfncnmfjej         | 1913-07-13 01:12:05.635982 |  3081
 my_table_partitioned_1900_1920 |  20864 | jnhyohxcrckhlwosccwtkjcwgbycvhzrovjutcxwvfdkdyhmfcaqtjmjfbsavfpukcmxnud                              | 1900-01-12 09:32:13.167057 |  7161
 my_table_partitioned_1900_1920 |  20867 | cozekrpmhgbltwfjhkeiqkzqudf                                                                          | 1900-08-09 09:17:28.052677 |  8761
 my_table_partitioned_1900_1920 |  20877 | owfkhqoowgvbjblnawcjdjfeswauzlmmwupynfzotoerqiycvpeqlhpyacizhdpztuansngevvdpowucoxybjrtmt            | 1905-01-16 02:23:07.414942 |  4036
 my_table_partitioned_1900_1920 |  20879 | pctajauffzkcosxmoqnpsxqprmu                                                                          | 1910-07-07 02:47:59.318048 |  2058
 my_table_partitioned_1900_1920 |  20886 | r                                                                                                    | 1910-04-18 01:59:33.513781 |  6626
 my_table_partitioned_1900_1920 |  20892 | jasgnjkeygwmtpacddchndmrxsohefthponnjvlsdrzglfc                                                      | 1903-04-08 04:38:25.310612 |  4759
 my_table_partitioned_1900_1920 |  20898 | yuhedraxgbmew                                                                                        | 1913-10-27 13:58:09.46713  |  2877
Enter fullscreen mode Exit fullscreen mode

Run an example query on the partitioned table and examine the corresponding explain plans.

analyze my_table_partitioned;
ANALYZE

explain (analyze, buffers) select count(1) from my_table_partitioned where b between timestamp '2022-01-01 01:23:45' and '2024-05-06 07:08:09';

                                                                        QUERY PLAN                                             
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=139.24..139.25 rows=1 width=8) (actual time=1.355..1.356 rows=1 loops=1)
   Buffers: shared hit=62
   ->  Seq Scan on my_table_partitioned_2020_2040 my_table_partitioned  (cost=0.00..135.20 rows=1615 width=0) (actual time=0.015..1.157 rows=1616 loops=1)
         Filter: ((b >= '2022-01-01 01:23:45'::timestamp without time zone) AND (b <= '2024-05-06 07:08:09'::timestamp without time zone))
         Rows Removed by Filter: 3264
         Buffers: shared hit=62
 Planning:
   Buffers: shared hit=23
 Planning Time: 0.330 ms
 Execution Time: 1.389 ms
(10 rows)

Enter fullscreen mode Exit fullscreen mode

Run the same query on the original not partitioned table and examine how its explain plan is different.

postgres=# analyze my_table;
ANALYZE

postgres=# explain (analyze, buffers) select count(1) from my_table where b between timestamp '2022-01-01 01:23:45' and '2024-05-06 07:08:09';

                                                                      QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5307.29..5307.30 rows=1 width=8) (actual time=20.173..23.885 rows=1 loops=1)
   Buffers: shared hit=2538
   ->  Gather  (cost=5307.18..5307.29 rows=1 width=8) (actual time=20.042..23.878 rows=2 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=2538
         ->  Partial Aggregate  (cost=4307.18..4307.19 rows=1 width=8) (actual time=15.897..15.898 rows=1 loops=2)
               Buffers: shared hit=2538
               ->  Parallel Seq Scan on my_table  (cost=0.00..4304.77 rows=964 width=0) (actual time=0.021..15.779 rows=808 loops=2)
                     Filter: ((b >= '2022-01-01 01:23:45'::timestamp without time zone) AND (b <= '2024-05-06 07:08:09'::timestamp without time zone))
                     Rows Removed by Filter: 99309
                     Buffers: shared hit=2538
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.190 ms
 Execution Time: 23.931 ms
(16 rows)
Enter fullscreen mode Exit fullscreen mode

From the output we can see the partitioned table has better cost and better execution time.

Conclusion

In this blog I've demonstrated partitioning a table by range in PostgreSQL database.

Top comments (0)