Counting large tables in PostgreSQL

André
Originally published at andreligne.io ・1 min read

Imagine you have a really large table (> 100m rows). How do you figure out how many rows there are in that table within a reasonable time?

If you try to run a SELECT COUNT(id) FROM my_large_table, it will take a long time, since the database has to scan through all the rows to count them.

$ heroku pg:ps
 pid | state | source | running_for | transaction_start | waiting | query
------+--------+------------------+-----------------+-------------------------------+---------+--------------------------------------------------
 6926 | active | psql interactive | 00:10:12.549096 | 2020-10-01 08:22:18.090989+00 | t | SELECT COUNT(id) FROM pageviews;
(1 row)
I gave up after waiting for 10 minutes...

A faster way is to get an approximate value by looking at statistics from the catalog table pg_catalog.pg_class.

DATABASE=> SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'pageviews';
 reltuples
-----------
 136032896
(1 row)
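One caveat: relname is not schema-qualified, so if a table with the same name exists in several schemas the query above can match the wrong one (or return several rows). A variant that resolves the name through the catalog, assuming the table lives in the public schema, looks like this:

```sql
-- Resolve the table name to its OID via ::regclass, which respects
-- schema qualification, instead of matching on the bare relname.
SELECT reltuples::bigint AS estimated_rows
FROM pg_catalog.pg_class
WHERE oid = 'public.pageviews'::regclass;
```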

This took less than a millisecond, compared to the 10+ minutes we waited for the COUNT query before giving up. It may not be exact, but it's useful enough to set expectations before adding new indexes to the table or running other large operations on it.
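Keep in mind that reltuples is only as fresh as the table's statistics, which autovacuum normally keeps up to date. If the estimate looks stale, running ANALYZE refreshes it; ANALYZE samples the table rather than scanning every row, so it stays fast even on very large tables:

```sql
-- ANALYZE samples the table instead of reading every row,
-- so it completes quickly even on 100M+ row tables.
ANALYZE pageviews;

-- The estimate in pg_class now reflects the fresh statistics.
SELECT reltuples::bigint FROM pg_catalog.pg_class WHERE relname = 'pageviews';
```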
