DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

How do you check a distribution of values that PostgreSQL has collected on some column of a table?

To check the distribution of values that PostgreSQL has collected on some columns of a table, you can use the pg_stats system catalog table. This table contains statistics about columns in tables, including information about the distribution of values.

Here’s a query to check the distribution of values for a specific column (your_column) in a table (your_table):

SELECT
 most_common_vals AS most_common_values,
 most_common_freqs AS most_common_frequencies,
 histogram_bounds AS histogram_bounds
FROM
 pg_stats
WHERE
 tablename = 'your_table'
 AND attname = 'your_column';
Enter fullscreen mode Exit fullscreen mode

This query will return three columns:

most_common_values:
Contains an array of the most common values found in the column.

most_common_frequencies:
Contains an array of the frequencies (number of occurrences) of the most common values.

histogram_bounds:
Contains an array of histogram bounds, which represent the ranges of values used to estimate the distribution of values in the column.

By examining these values, you can get an understanding of how the data is distributed in the column and make informed decisions about query optimization and indexing.

For example:

create table some_table (a bigint);

insert into some_table values(-1);
insert into some_table values(-1);
insert into some_table values(-1);
insert into some_table values(10);
insert into some_table values(10);
insert into some_table values(20);
insert into some_table values(30);
insert into some_table values(100);

analyze some_table;

SELECT
 most_common_vals AS most_common_values,
 most_common_freqs AS most_common_frequencies,
 histogram_bounds AS histogram_bounds
FROM
 pg_stats
WHERE
 tablename = 'some_table'
 AND attname = 'a';

Enter fullscreen mode Exit fullscreen mode

Output:
{-1,10} {0.375,0.25} {20,30,100}

The default value for the number of sampled records for a table in PostgreSQL is determined by the default_statistics_target parameter. This parameter specifies the default statistics target to use when analyzing columns of tables.

By default, default_statistics_target is set to 100, meaning that PostgreSQL will sample approximately 100 rows when analyzing a column for statistics. However, this value can be adjusted at the database or session level.

To view the current value of default_statistics_target, you can use the following query:

SHOW default_statistics_target;

Output:
100

OR

select * from pg_settings where name = 'default_statistics_target';

To increase the number of sampled records for a table in PostgreSQL, you can adjust the default_statistics_target parameter.

You can change the statistics target parameter for a specific table using the ALTER TABLE command. Here’s how you can do it:

ALTER TABLE your_table ALTER COLUMN your_column SET STATISTICS 1000;

Replace your_table with the name of your table and your_column with the name of the column for which you want to increase the sampled records. In this example, 1000 is the new value for the statistics target. Adjust this value according to your needs.

Keep in mind that increasing the statistics target can lead to more accurate query plans but may also increase the time and resources required for analyzing the table. It’s recommended to monitor the performance impact of the PostgreSQL DB Server after making such changes.

ask_dima@yahoo.com

Top comments (0)