when you run a query on a large table in postgres, the performance difference is rarely about the query syntax, it’s about how postgres chooses to access the data underneath to understand this better, i tested on my local machine the same query on a table with 5 million rows under different execution strategies and the results were drastically different
What Really Happens When PostgreSQL Scans Your Entire Table
when you do the query
select * from table_name where id = 17
postgres can do a full sequential scan if there is no usable index or if it thinks scanning everything is cheaper. so what do you mean by full sequential scan ?
well you see the data of the table is stored in form of pages, which are 8kb by default in postgres, and these pages are stored in the heap. heap here just means the rows are not stored in any sorted order.
so when it runs the sequential scan it goes page by page through the whole heap and checks every row to find the one you asked for
How PostgreSQL Avoids Scanning Everything With Indexes
but this doesn't happens mostly because the primary key are always indexed, postgres automatically creates an index on primary key. well what is an index ? well as i said the data is stored in heap but for faster retrieval we make indexes, indexes are key value pair arranged in a B+ Tree, where keys are sorted and each key points to the exact location of the row in the heap using a tuple id
so the next time you run query on a indexed value it goes into the b+ tree, finds that id quickly using binary search like traversal, and then jumps directly to the heap location instead of scanning everything. this is called an index scan
Index Only Scan: Eliminating Heap Access Entirely
now here’s where things get interesting....even with an index scan, postgres still has to go to the heap to fetch the actual row data after finding the pointer in the index, but what if the query can be answered entirely from the index itself?
this is where index only scan comes in, in an index only scan, postgres does not jump to the heap at all, it reads everything directly from the index structure this removes one of the most expensive parts of the lookup which is random heap access
however, this is only possible when all required columns are present in the index (covering index) and postgres can safely skip heap visibility checks (via mvcc visibility map)
just a way to eliminate heap access when the conditions allow it, also depending on the workload, index only scan can be faster than index scan but not always, especially for single-row lookups
Let’s See How PostgreSQL Actually Executes These
now enough theory, lets actually see this in action with some real benchmarks
Benchmark 1: Forced Sequential Scan
i created a table with around 5 million rows
CREATE TABLE users (
id serial primary key,
name text,
email text,
age int,
city text
);
and ran the same query under different conditions to see how postgres behaves internally
first i forced postgres to not use any index so that it is forced to do a full scan
SET enable_indexscan = OFF;
SET enable_bitmapscan = OFF;
EXPLAIN ANALYZE
SELECT * FROM users WHERE id = 4000000;
this resulted in a parallel sequential scan where postgres scanned almost the entire table
execution time was around ~277 ms
you can also notice it used multiple workers because my machince can afford it, so even when doing a bad operation postgres tries to optimize it using parallelism, but still it had to go through millions of rows just to find one
now enabling index scan back
Benchmark 2: Index Scan
SET enable_indexscan = ON;
SET enable_bitmapscan = ON;
EXPLAIN ANALYZE
SELECT * FROM users WHERE id = 4000000;
this time postgres used an index scan on the primary key
execution time dropped to around ~0.05 ms
this is a massive difference because instead of scanning the whole table, it directly used the b+ tree to locate the exact row and then fetched it from the heap
now taking it one step further with index only scan
Benchmark 3: Index Only Scan
i created a covering index so that all required columns are already present inside the index
CREATE INDEX idx_users_covering
ON users (id) INCLUDE (name, email, age, city);
then ran:
VACUUM ANALYZE users; -- enables index-only scan by marking pages as all-visible
EXPLAIN ANALYZE
SELECT id, name, email, age, city
FROM users
WHERE id = 4000000;
this resulted in an index only scan
execution time was around ~0.07 ms
and most importantly:
Heap Fetches: 0
which means postgres did not even touch the heap and served the query entirely from the index
but here is something interesting
even though index only scan avoids heap access, in this case it was slightly slower than index scan
why?
because we are fetching a single row using a primary key, and the heap access is extremely cheap (often already cached in memory), so removing heap access does not provide a meaningful gain here
index only scan really is best when you are doing something like reading multiple columns like this
Where Index Only Scan Actually Starts to Matter
run this to see the benefit of index only scan
EXPLAIN ANALYZE
SELECT id, name, email, age, city
FROM users
WHERE id BETWEEN 1000000 AND 2000000;
here postgres is using an index scan on the primary key for a range query (~1M rows), but it still has to fetch each row from the heap after locating it via the index
this results in higher execution time (~161 ms) because of repeated heap accesses, even though the lookup itself is efficient, this is exactly the scenario where an index only scan can outperform index scan by eliminating these heap fetches entirely
Understanding What Really Impacts Query Execution
in the end, what this really shows is that query performance in postgres is not about writing complex sql, but about understanding how the database actually accesses data. whether it scans the entire table, navigates through an index, or avoids the heap altogether, each approach is just a different trade-off based on cost. once you start thinking in terms of how much data is being touched and how it is being accessed, optimizing queries becomes far more intuitive and predictable.






Top comments (0)