DEV Community

Cover image for Millions record query - with and without index
Pham Duc Minh
Pham Duc Minh

Posted on

Millions record query - with and without index

Just found out the movie page imdb database on imdb
It's a large dataset that contains Movie information use PosgresSQL

Quickly import the name_basics table, curious about what it looks like

image

A simple table with People's names and some other information

Count the table return 12.382.531 records, but what makes me surprise is table has no primary key or index

Wow, it will take a long time to search...

Right away try to search for my favorite actor Jean Reno, think about taking a coffee in the waiting time (find one in 12 million)

SELECT *
    FROM name_basics
WHERE "primaryName" = 'Jean Reno'
  AND "primaryProfession" = 'actor';
Enter fullscreen mode Exit fullscreen mode

The result is another surprise

1 row retrieved starting from 1 in 923 ms (execution: 908 ms, fetching: 15 ms)

Is it because the table is simple? Or postgres has some cache?
Query a non-index text field in 12 million just take 1 second

What do we need index for? This is a really good performance

But I also want to see how performance go with index, so I add a simple index on primaryName field

create index "primaryName_index" on name_basics ("primaryName");
Enter fullscreen mode Exit fullscreen mode

And re-try the select query

1 row retrieved starting from 1 in 34 ms (execution: 6 ms, fetching: 28 ms)

image

It's a huge different 900ms vs 6ms (for SQL execute), 150 times faster without index

Conclusion

The Index is magic! but it is not free, when query for index size

SELECT i.relname "Table Name",indexrelname "Index Name",
 pg_size_pretty(pg_total_relation_size(relid)) As "Total Size",
 pg_size_pretty(pg_relation_size(relid)) as "Table Size",
 pg_size_pretty(pg_indexes_size(relid)) as "Index Size",
 reltuples::bigint "Estimated table row count"
 FROM pg_stat_all_indexes i JOIN pg_class c ON i.relid=c.oid
 WHERE i.relname='name_basics'
Enter fullscreen mode Exit fullscreen mode

image

It takes ~ 400MB storage (a single-column index)
and also memory when processing.

Anyway, the performance it brings is no doubt.

[UPDATE]
After some research, I found some tips about performance

What is a good latency for API?
Generally, APIs that are considered high-performing have an average response time between 0.1 and one second. At this speed, end users will likely not experience any interruption. At around one to two seconds, users begin to notice some delay.

So take one second only for query database is a need to improve

My machine: Apple M1, Postgres 11 on docker

Image of Timescale

Timescale – the developer's data platform for modern apps, built on PostgreSQL

Timescale Cloud is PostgreSQL optimized for speed, scale, and performance. Over 3 million IoT, AI, crypto, and dev tool apps are powered by Timescale. Try it free today! No credit card required.

Try free

Top comments (0)

AWS Security LIVE!

Tune in for AWS Security LIVE!

Join AWS Security LIVE! for expert insights and actionable tips to protect your organization and keep security teams prepared.

Learn More

👋 Kindness is contagious

Immerse yourself in a wealth of knowledge with this piece, supported by the inclusive DEV Community—every developer, no matter where they are in their journey, is invited to contribute to our collective wisdom.

A simple “thank you” goes a long way—express your gratitude below in the comments!

Gathering insights enriches our journey on DEV and fortifies our community ties. Did you find this article valuable? Taking a moment to thank the author can have a significant impact.

Okay