Counting large tables in PostgreSQL

André
Originally published at andreligne.io ・1 min read

Imagine you have a really large table (> 100m rows). How do you figure out how many rows there are in that table within a reasonable time?

If you try to run a SELECT COUNT(id) FROM my_large_table, it will take a long time, since the database has to scan through all the rows to count them.

$ heroku pg:ps
 pid | state | source | running_for | transaction_start | waiting | query
------+--------+------------------+-----------------+-------------------------------+---------+--------------------------------------------------
 6926 | active | psql interactive | 00:10:12.549096 | 2020-10-01 08:22:18.090989+00 | t | SELECT COUNT(id) FROM pageviews;
(1 row)
I gave up after waiting for 10 minutes...

A faster way is to get an approximate value by looking at statistics from the catalog table pg_catalog.pg_class.

DATABASE=> SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'pageviews';
 reltuples
-----------
 136032896
(1 row)
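One caveat: relname is not schema-qualified, so if a table with the same name exists in several schemas the query above can match the wrong one (or return several rows). A variant that resolves the name through the catalog, assuming the table lives in the public schema, looks like this:

```sql
-- Resolve the table name to its OID via ::regclass, which respects
-- schema qualification, instead of matching on the bare relname.
SELECT reltuples::bigint AS estimated_rows
FROM pg_catalog.pg_class
WHERE oid = 'public.pageviews'::regclass;
```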

This took less than a millisecond, compared to the 10+ minutes we waited for the COUNT query before giving up. It may not be exact, but it's useful enough to set expectations before adding new indexes to the table or running other large operations on it.
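Keep in mind that reltuples is only as fresh as the table's statistics, which autovacuum normally keeps up to date. If the estimate looks stale, running ANALYZE refreshes it; ANALYZE samples the table rather than scanning every row, so it stays fast even on very large tables:

```sql
-- ANALYZE samples the table instead of reading every row,
-- so it completes quickly even on 100M+ row tables.
ANALYZE pageviews;

-- The estimate in pg_class now reflects the fresh statistics.
SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'pageviews';
```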
