How to check the distribution of timestamps in a PostgreSQL table?

#postgres #sql #database #devops

There is the PostgreSQL table my_table that has the following structure:

postgres=# \d my_table
                                      Table "public.my_table"
 Column |            Type             | Collation | Nullable |               Default
--------+-----------------------------+-----------+----------+--------------------------------------
 id     | integer                     |           | not null | nextval('my_table_id_seq'::regclass)
 a      | character varying(100)      |           |          |
 b      | timestamp without time zone |           | not null |
 c      | smallint                    |           |          |

The table is populated with some random data:

select * from my_table limit 10;

id   |                                                  a                                                  |             b              |  c
-------+-----------------------------------------------------------------------------------------------------+----------------------------+------
 20835 | lysurmsbdfwrrasnnlmzujwdtuw                                                                         | 1978-09-05 01:45:25.148229 | 8642
 20836 | spfgyjksbxxsbfyixbhygzqizeiisldfcjssvwdaxdpkrwjsmhwqvozfkcipissbgi                                  | 1915-06-16 12:08:08.978399 | 6590
 20837 | kxibatprlkckkotcckqfmmroflbhvakoxipuqwjmc                                                           | 1992-10-11 08:50:10.587357 |  606
 20838 | ifjekeirgkodhqakselytruphlxsnnlblwdyjkhrjxibpftiexqrsdtomjhuuntozifcwahdsrekhfbaecwbvow             | 1995-06-09 15:44:00.125167 | 8600
 20839 | duwdzmvzwkcygyjqokctszlffcnpclojzvweseaibvemghlgentemvboyhszzlmdmkfrugkxkdkowoyeyogercckpygfuukluth | 2009-06-22 07:26:56.044023 | 3569
 20840 | qvuvlftidrwitheqywzvcvhvmvmkxelkvxyufjfdkvybeyajnmoldqkuwxioyoiykeoibdvnbdbyyaufdqpjlrdgbbkvsy      | 1900-11-18 17:56:54.171598 | 8399
 20841 | bpxdytnu                                                                                            | 1981-11-06 08:09:35.857618 | 9508
 20842 | phwsewlpmerayuovgakjtbzflggeqqqxsqetxufuoe                                                          | 1933-12-02 02:50:20.117185 | 5099
 20843 | nwxjdoksxwsvkjmpfyvayvwqwckyeyqxrlagn                                                               | 1999-11-12 01:56:10.847324 | 7316
 20844 | ksjvvwpisjfpsapwccvdpcfgchjyrhwqof                                                                  | 1958-08-06 07:56:47.420669 | 5673
(10 rows)

Let’s check the distribution of the timestamp column data.

Distribution records in the table by year; let’s find the top years with the table my_table records based on the timestamp column:

select 
    (extract('year' from b)) the_year, 
    count(1) num_of_recs
from 
    my_table 
group by 1 
order by 2 desc, 1;

For example:

the_year | num_of_recs
----------+-------------
     1977 |         205
     2018 |         195
     1929 |         191
     1913 |         188
     1935 |         185
     1969 |         185
     1997 |         185
     1905 |         184
     1964 |         184
     1910 |         183
     1966 |         183
     ...

Percentage of the total records by year in the table my_table based on the timestamp column:

select 
    to_char(b, 'YYYY') yyyy,
    count(1) num_of_recs,
    100 * round((count(1) / (sum(count(1)) over() )), 5)  percentage
from 
    my_table
group by 1 
order by 3 desc, 1;

For example:

yyyy | num_of_recs | percentage
------+-------------+------------
 1977 |         205 |    1.01300
 2018 |         195 |    0.96400
 1929 |         191 |    0.94400
 1913 |         188 |    0.92900
 1935 |         185 |    0.91400
 1969 |         185 |    0.91400
 1997 |         185 |    0.91400
 1905 |         184 |    0.90900
 1964 |         184 |    0.90900
 1910 |         183 |    0.90400
 1966 |         183 |    0.90400
 ...

Distribution records in the table by (year-month); let’s find the top (year-month)-es with the table my_table records based on the timestamp column:

select 
    to_char(b, 'YYYY-MM') the_year_and_month, 
    count(1) num_of_recs
from 
    my_table 
group by 1 
order by 2 desc, 1;

For example:

the_year_and_month | num_of_recs
--------------------+-------------
 1919-08            |          25
 2000-10            |          25
 2004-08            |          25
 1917-06            |          24
 1933-07            |          24
 1949-06            |          24
 1959-08            |          24
 1977-03            |          24
 1977-05            |          24
 1980-08            |          24
 1989-02            |          24
 1997-03            |          24
 1998-11            |          24
 2010-05            |          24
 2020-12            |          24
 1901-10            |          23
 ...

Percentage of the total records by (year-month) in the table my_table based on the timestamp column:

select 
    to_char(b, 'YYYY-MM') yyyy_mm,
    count(1) num_of_recs,
    100 * round((count(1) / (sum(count(1)) over() )), 5)  percentage
from 
    my_table
group by 1 
order by 3 desc, 1;

For example:

yyyy_mm | num_of_recs | percentage
---------+-------------+------------
 1919-08 |          25 |    0.12400
 2000-10 |          25 |    0.12400
 2004-08 |          25 |    0.12400
 1917-06 |          24 |    0.11900
 1933-07 |          24 |    0.11900
 1949-06 |          24 |    0.11900
 1959-08 |          24 |    0.11900
 1977-03 |          24 |    0.11900
 1977-05 |          24 |    0.11900
 1980-08 |          24 |    0.11900
 1989-02 |          24 |    0.11900
 1997-03 |          24 |    0.11900
 1998-11 |          24 |    0.11900
 2010-05 |          24 |    0.11900
 2020-12 |          24 |    0.11900
 1901-10 |          23 |    0.11400
 1903-09 |          23 |    0.11400
 ...

Distribution records in the table by (year-month-day); let’s find the top (year-month-day)s with the table my_table records based on the timestamp column:

select 
    to_char(b, 'YYYY-MM-DD') yyyy_mm_dd, 
    count(1) num_of_recs
from 
    my_table 
group by 1 
order by 2 desc, 1;

For example:

yyyy_mm_dd | num_of_recs
------------+-------------
 1914-10-05 |           5
 1929-01-09 |           5
 1904-01-10 |           4
 1906-04-06 |           4
 1906-09-01 |           4
 1907-02-26 |           4
 1907-10-16 |           4
 1911-06-02 |           4
 1918-06-13 |           4
 1924-07-30 |           4
 1925-06-02 |           4
 1926-03-31 |           4
 1927-02-04 |           4
 1928-08-17 |           4
 1931-01-03 |           4
 1931-04-07 |           4
 1932-03-20 |           4

Conclusion

In this blog, I’ve demonstrated a few approaches how to check the distribution of timestamps in a PostgreSQL table.

ask_dima@yahoo.com