Franck Pachot for AWS Heroes

PostgreSQL global statistics on partitioned tables require a manual ANALYZE

PostgreSQL's auto-analyze collects statistics on tables that contain rows. For partitioned tables, it skips the parent because the parent stores no rows by itself. So how does the query planner estimate cardinality when a query spans multiple partitions?

Some statistics are easy to derive: knowing each partition's row count, the global count is simply the sum. Column statistics are trickier, especially the number of distinct values (NDV), a key input for estimating cardinalities with predicates or aggregates. Even knowing the NDV per partition, the planner cannot tell how much those values overlap across partitions. The global distinct count therefore lies somewhere between the maximum per-partition NDV and the sum of all per-partition NDVs.

Here is an example:


create table history (
   year int,
   num serial,
   x   int,
   y   int,
   primary key (year, num)
) partition by list (year)
;

create table history_2024 partition of history for values in (2024);
create table history_2025 partition of history for values in (2025);
create table history_2026 partition of history for values in (2026);
create table history_2027 partition of history for values in (2027);

insert into history select 
 extract(year from ( date '2026-01-02' - interval '1 minute' * num ))::int as year
 ,num             -- NDV ≈ rows (unique key, density ≈ 1 / rows)
 ,(num % 2) as x  -- NDV = 2 per partition (very high density: ~50% per value)  
 ,(num / 2) as y  -- NDV ≈ rows / 2 per partition (moderate density: ~2 rows per distinct value) 
 from generate_series(1,1e6) num
;


Here is the real data distribution:

select count(*), year
  , count(distinct x) as "distinct x"
  , min(x) as "min x"
  , max(x) as "max x"
  , count(*)::float / nullif(count(distinct x), 0) as density_x
  , count(distinct y) as "distinct y"
  , min(y) as "min y"
  , max(y) as "max y"
  , count(*)::float / nullif(count(distinct y), 0) as density_y
 from history group by grouping sets ((),year)
;

  count  | year | distinct x | min x | max x | density_x | distinct y | min y  | max y  | density_y
---------+------+------------+-------+-------+-----------+------------+--------+--------+-----------
  472960 | 2024 |          2 |     0 |     1 |    236480 |     236481 | 263520 | 500000 | 1.999991
  525600 | 2025 |          2 |     0 |     1 |    262800 |     262801 |    720 | 263520 | 1.999992
    1440 | 2026 |          2 |     0 |     1 |       720 |        721 |      0 |    720 | 1.997226
 1000000 |      |          2 |     0 |     1 |    500000 |     500001 |      0 | 500000 | 1.999996

(4 rows)

Immediately after the insert, there's no statistics:

select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition  
 from pg_class where relname like 'history%' order by relkind desc, relname
;

      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |        0 |        -1 | r       | f              | t              | t
 history_2025      |        0 |        -1 | r       | f              | t              | t
 history_2026      |        0 |        -1 | r       | f              | t              | t
 history_2027      |        0 |        -1 | r       | f              | t              | t
 history           |        0 |        -1 | p       | t              | t              | f
 history_2024_pkey |        1 |         0 | i       | f              | t              | t
 history_2025_pkey |        1 |         0 | i       | f              | t              | t
 history_2026_pkey |        1 |         0 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |        0 |         0 | I       | t              | t              | f
(11 rows)

After a while, autovacuum’s auto-analyze has gathered statistics for the partitions:

select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition  
 from pg_class where relname like 'history%' order by relkind desc, relname
;

      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |     2557 |    472960 | r       | f              | t              | t
 history_2025      |     2842 |    525600 | r       | f              | t              | t
 history_2026      |        8 |      1440 | r       | f              | t              | t
 history_2027      |        0 |        -1 | r       | f              | t              | t
 history           |        0 |        -1 | p       | t              | t              | f
 history_2024_pkey |     1300 |    472960 | i       | f              | t              | t
 history_2025_pkey |     1443 |    525600 | i       | f              | t              | t
 history_2026_pkey |        6 |      1440 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |        0 |         0 | I       | t              | t              | f
(11 rows)

However, there are still no global statistics (the "history" table itself). The available column statistics only cover partitions and include the number of distinct values for the two columns:

select tablename, attname, n_distinct, null_frac
 from pg_stats where tablename like 'history%'
;
  tablename   | attname | n_distinct  | null_frac
--------------+---------+-------------+-----------
 history_2025 | year    |           1 |         0
 history_2025 | num     |          -1 |         0
 history_2025 | x       |           2 |         0
 history_2025 | y       | -0.49914953 |         0
 history_2024 | year    |           1 |         0
 history_2024 | num     |          -1 |         0
 history_2024 | x       |           2 |         0
 history_2024 | y       |  -0.4907878 |         0
 history_2026 | year    |           1 |         0
 history_2026 | num     |          -1 |         0
 history_2026 | x       |           2 |         0
 history_2026 | y       | -0.50069445 |         0
(12 rows)

For queries spanning multiple partitions, the optimizer gets good cardinality estimates per partition. However, SELECT DISTINCT over multiple partitions cannot be estimated from the statistics.

For column x, it estimates rows=200 while the actual result is 2 distinct values:

explain (analyze, buffers off, summary off)
 select distinct x from history
;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22949.38..22951.38 rows=200 width=4) (actual time=263.739..263.742 rows=2.00 loops=1)
   Group Key: history.x
   Batches: 1  Memory Usage: 32kB
   ->  Append  (cost=0.00..20444.75 rows=1001850 width=4) (actual time=0.010..149.330 rows=1000000.00 loops=1)
         ->  Seq Scan on history_2024 history_1  (cost=0.00..7286.60 rows=472960 width=4) (actual time=0.009..37.586 rows=472960.00 loops=1)
         ->  Seq Scan on history_2025 history_2  (cost=0.00..8098.00 rows=525600 width=4) (actual time=0.016..41.565 rows=525600.00 loops=1)
         ->  Seq Scan on history_2026 history_3  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.020..0.132 rows=1440.00 loops=1)
         ->  Seq Scan on history_2027 history_4  (cost=0.00..28.50 rows=1850 width=4) (actual time=0.002..0.002 rows=0.00 loops=1)
(8 rows)

For column y, it also estimates rows=200 while the actual result is 500,000 distinct values:

explain (analyze, buffers off, summary off)
 select distinct y from history
;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22949.38..22951.38 rows=200 width=4) (actual time=341.476..527.623 rows=500000.00 loops=1)
   Group Key: history.y
   Batches: 5  Memory Usage: 8265kB  Disk Usage: 16048kB
   ->  Append  (cost=0.00..20444.75 rows=1001850 width=4) (actual time=0.009..151.806 rows=1000000.00 loops=1)
         ->  Seq Scan on history_2024 history_1  (cost=0.00..7286.60 rows=472960 width=4) (actual time=0.009..38.840 rows=472960.00 loops=1)
         ->  Seq Scan on history_2025 history_2  (cost=0.00..8098.00 rows=525600 width=4) (actual time=0.011..43.265 rows=525600.00 loops=1)
         ->  Seq Scan on history_2026 history_3  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.011..0.131 rows=1440.00 loops=1)
         ->  Seq Scan on history_2027 history_4  (cost=0.00..28.50 rows=1850 width=4) (actual time=0.003..0.003 rows=0.00 loops=1)
(8 rows)

Here, the misestimate isn't an issue by itself, but it could become catastrophic if the query joined other tables. Even though the two columns have very different value distributions, PostgreSQL estimated both as rows=200. When no statistics are available, PostgreSQL falls back to generic defaults and assumes a small, fixed number of distinct values, even when a single partition is known to contain more.

Autovacuum's auto-analyze collects statistics for the partitions but never for the partitioned parent. To gather global statistics on the parent, you must run ANALYZE manually: ANALYZE ONLY history samples the partitions to build the parent's inherited statistics without refreshing each partition's own statistics, while a plain ANALYZE history also recursively analyzes every partition:


ANALYZE history
;

select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition  
 from pg_class where relname like 'history%' order by relkind desc, relname
;

      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |     2557 |    472960 | r       | f              | t              | t
 history_2025      |     2842 |    525600 | r       | f              | t              | t
 history_2026      |        8 |      1440 | r       | f              | t              | t
 history_2027      |        0 |        -1 | r       | f              | t              | t
 history           |       -1 |     1e+06 | p       | t              | t              | f
 history_2024_pkey |     1300 |    472960 | i       | f              | t              | t
 history_2025_pkey |     1443 |    525600 | i       | f              | t              | t
 history_2026_pkey |        6 |      1440 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |        0 |         0 | I       | t              | t              | f
(11 rows)

select tablename, attname, n_distinct, null_frac
 from pg_stats where tablename like 'history%'
;

  tablename   | attname | n_distinct  | null_frac
--------------+---------+-------------+-----------
 history      | year    |           3 |         0
 history      | num     |          -1 |         0
 history      | x       |           2 |         0
 history      | y       |        -0.5 |         0
 history_2025 | year    |           1 |         0
 history_2025 | num     |          -1 |         0
 history_2025 | x       |           2 |         0
 history_2025 | y       | -0.49914953 |         0
 history_2024 | year    |           1 |         0
 history_2024 | num     |          -1 |         0
 history_2024 | x       |           2 |         0
 history_2024 | y       |  -0.4907878 |         0
 history_2026 | year    |           1 |         0
 history_2026 | num     |          -1 |         0
 history_2026 | x       |           2 |         0
 history_2026 | y       | -0.50069445 |         0
(16 rows)

Now the query planner knows there are one million rows in total (1e+06 in reltuples), with 2 distinct values for x (2 in n_distinct) and 50% distinct values for y (-0.5 in n_distinct: a negative value means a fraction of the row count, so the estimate scales with the table size).
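To see how the planner turns these catalog values into an absolute distinct count, a negative n_distinct can be multiplied by the row count from pg_class. A sketch, assuming table names are unique enough in your schema to join on relname (pg_stats exposes an inherited flag that is true for the parent's global statistics):

```sql
-- Sketch: effective distinct-value estimate per column.
-- Positive n_distinct = absolute count; negative = fraction of reltuples.
select s.tablename, s.attname, s.inherited,
       case
         when s.n_distinct >= 0 then s.n_distinct::numeric
         else round(-s.n_distinct::numeric * c.reltuples::numeric)
       end as effective_ndistinct
  from pg_stats s
  join pg_class c on c.relname = s.tablename
 where s.tablename = 'history'
;
```

With the statistics above, y on the parent resolves to -(-0.5) × 1e+06 = 500,000 distinct values.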

Now the execution plans on the same queries get better estimates:

explain (analyze, buffers off, summary off) select distinct x from history;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22907.01..22907.03 rows=2 width=4) (actual time=265.727..265.729 rows=2.00 loops=1)
   Group Key: history.x
   Batches: 1  Memory Usage: 32kB
   ->  Append  (cost=0.00..20407.01 rows=1000001 width=4) (actual time=0.009..150.276 rows=1000000.00 loops=1)
         ->  Seq Scan on history_2024 history_1  (cost=0.00..7286.60 rows=472960 width=4) (actual time=0.008..37.620 rows=472960.00 loops=1)
         ->  Seq Scan on history_2025 history_2  (cost=0.00..8098.00 rows=525600 width=4) (actual time=0.015..41.585 rows=525600.00 loops=1)
         ->  Seq Scan on history_2026 history_3  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.018..0.132 rows=1440.00 loops=1)
         ->  Seq Scan on history_2027 history_4  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.003 rows=0.00 loops=1)
(8 rows)

explain (analyze, buffers off, summary off) select distinct y from history;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=74157.06..87067.10 rows=509753 width=4) (actual time=344.505..532.590 rows=500000.00 loops=1)
   Group Key: history.y
   Planned Partitions: 4  Batches: 5  Memory Usage: 8249kB  Disk Usage: 16048kB
   ->  Append  (cost=0.00..20407.01 rows=1000001 width=4) (actual time=0.009..153.913 rows=1000000.00 loops=1)
         ->  Seq Scan on history_2024 history_1  (cost=0.00..7286.60 rows=472960 width=4) (actual time=0.008..39.108 rows=472960.00 loops=1)
         ->  Seq Scan on history_2025 history_2  (cost=0.00..8098.00 rows=525600 width=4) (actual time=0.011..43.762 rows=525600.00 loops=1)
         ->  Seq Scan on history_2026 history_3  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.011..0.130 rows=1440.00 loops=1)
         ->  Seq Scan on history_2027 history_4  (cost=0.00..0.00 rows=1 width=4) (actual time=0.003..0.003 rows=0.00 loops=1)
(8 rows)

The estimate for x is exact (rows=2) and the one for y is close to the actual number (rows=509753 vs. 500,000). It is approximate because ANALYZE samples rows rather than reading the whole table.

Global statistics are not collected automatically because auto-analyze triggers only when the number of modified rows crosses a threshold relative to the table's total row count, and the partitioned parent itself holds no rows. You therefore have to maintain the parent's statistics yourself, typically with a scheduled manual ANALYZE.
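One way to monitor this is through pg_stat_all_tables, which records the last manual and automatic analyze times; for the parent, last_autoanalyze stays NULL since autovacuum never touches it:

```sql
select relname, last_analyze, last_autoanalyze
  from pg_stat_all_tables
 where relname like 'history%'
 order by relname
;
```

If last_analyze on the parent is NULL or stale, its global statistics are missing or outdated.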

Another option, when the global cardinality is known and stable, is to explicitly override the inherited estimate:

ALTER TABLE history ALTER COLUMN x SET (n_distinct_inherited = 2);
ALTER TABLE history ALTER COLUMN y SET (n_distinct_inherited = 500000);
ANALYZE ONLY history;

This tells the planner the exact global number of distinct values for columns "x" and "y" when planning queries on the partitioned table. It replaces a probabilistic estimate with deterministic knowledge and is particularly useful for large DISTINCT, GROUP BY, and join operations.

When the number of distinct values grows with the row count, it is preferable to set a relative factor instead, expressed as a negative number:

ALTER TABLE history ALTER COLUMN y SET (n_distinct_inherited = -0.5);
ANALYZE ONLY history;
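The overrides are stored as per-column attribute options and can be checked in pg_attribute, where attoptions is NULL for columns without options:

```sql
select attname, attoptions
  from pg_attribute
 where attrelid = 'history'::regclass
   and attoptions is not null
;
```

To remove an override later, use ALTER TABLE history ALTER COLUMN y RESET (n_distinct_inherited) and re-run ANALYZE.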

In summary, PostgreSQL estimates distinct values via the following rules:

  1. if uniqueness is proven by the query (unique index, DISTINCT, GROUP BY), it uses the number of non‑NULL rows
  2. otherwise pg_statistic is used: stadistinct > 0 is an absolute count, stadistinct < 0 is relative to the total number of tuples (from pg_class)
  3. if no statistics exist, it applies some hard‑coded rules, for example 2 for booleans
  4. finally, it falls back to the default of 200
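Rules 3 and 4 can be observed on a freshly created, never-analyzed table: a boolean column is hard-coded to 2 distinct values while an integer column gets the 200 default. A sketch with a throwaway table name:

```sql
create table demo_defaults (b boolean, i int);
-- With no statistics at all, the planner estimates 2 groups for the
-- boolean column and falls back to 200 for the integer column.
explain select distinct b from demo_defaults;
explain select distinct i from demo_defaults;
```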

Autovacuum skips partitioned tables entirely. It only processes partitions. When you ANALYZE a partitioned table, PostgreSQL skips the non-recursive scan for the parent (since it stores no rows) and performs a recursive inherited scan that samples rows from all partitions to compute inheritance statistics used by the planner. The planner uses the parent’s pg_statistic entries (built from the inherited sample) for estimates when planning queries that treat the partitioned table as a single entity. For distinct values, ANALYZE cannot know overlap across partitions, so the estimate is approximate. You can set a per-column n_distinct_inherited value to provide a better global distinct estimate.
