Jakub Andrzejewski

Posted on Sep 30, 2024

Improving Performance of PostgreSQL queries

#postgres #sql #performance #webperf

I recently wanted to implement some raw SQL queries and work on the efficiency of them so that returning data from database will be faster. After reading few articles and implementing some of the changes myself I decided to summarize them here so that you could use it in your projects :)

Now, let's talk about PostgreSQL. It's a super cool and flexible open-source relational database management system. While PostgreSQL is already pretty awesome at what it does, there's always room to make those queries zip along even faster! Want to know how? We've got some nifty strategies to supercharge your PostgreSQL queries.

Enjoy!

Limit the number of rows returned by a query

In case of small tables, we may not need to take care about the number of returned rows (while I highy suggest to limit them anyway!) in case of bigger tables, this can be extremely useful for improving performance of queries.

We can achieve that in SQL by using the keyword LIMIT with following query:

SELECT * FROM products LIMIT 10

Use `JOIN` instead of subqueries

If there are many records in the table, running a query with subqueries inside may not be performant - instead we could use the JOIN keyword which can be significantly faster.

Instead of using subqueries like following:

SELECT * FROM orders WHERE product_id IN (SELECT id FROM products WHERE name = 'Phone');

We could use JOIN keyword like this:

SELECT orders.* FROM orders JOIN products ON orders.product_id = products.id WHERE products.name = 'Phone';

This can be much faster if your table has big amount of rows.

Use efficient data types for columns

For rows where there are frequent calculations like average it can be beneficial to check what is the datatype for the column. For example, it will be more efficient to use smallint rather than integer. You can change it like following:

ALTER TABLE products ALTER COLUMN price TYPE smallint;

Use `PREPARE` for frequently executed queries

If you know that some queries will be executed frequently, for example a Fetch_Popular_Products_Query, you can use a keyword PREPARE like following:

PREPARE get_products_by_name (text) AS SELECT * FROM products WHERE name = $1;
EXECUTE get_products_by_name('Phone');

By using this approach, the query will be prepared once and the execution will be reused to improve performance.

Using `INDEX` for frequent queries

If you frequently search for products by their name, you can create an index of the name column like following:

CREATE INDEX products_name_idx ON products (name);

It will be especially effective if you would use multiple WHERE or JOIN keywords in your regular query.

Not use wildcard characters

Let's say that we have a query that will search for all products to find those whose name ends with phone:

SELECT * FROM products WHERE name LIKE '%phone';

This query will perform a sequential scan of the entire products table, which can be slow if the table is large. To optimize this query, you can rewrite it like this:

SELECT * FROM products WHERE name LIKE 'phone%';

This query will use the index created in previous point and perform a much faster index scan.

Debugging queries with `EXPLAIN` keyword

The EXPLAIN command will output the execution plan for the query, which can help you identify any performance issues and you can use it like following:

EXPLAIN SELECT COUNT(*) FROM products JOIN orders ON products.id = orders.products_id;

Use full text search service for faster searching by query/ID

Searching through milions of records to find a specific record by a name, or more difficult case - part of name, can be really slow. To solve this problem, you could implement a full test search service like ElasticSearch that can be much faster.

How this would work is that you will store key/value pairs inside your ElasticSearch database in a form of POSTGRESQL_ID/RECORD_NAME, then you will search through the values to find a full or part of the name and assigned PostgreSQL ID, and finally you will perform a PostgreSQL query WHERE id = 'POSTGRESQL_ID' instead of WHERE name LIKE 'RECORD_NAME' which can be significantly more performant as you will be querying for specific record rather than records that match certain search criteria.

This approach looks amazing for performance and optimizing queries, but you have to take into consideration that it will add additional layer of complexity and another service to your infrastructure that you would need to maintain.

📖 Learn more

If you would like to learn more about Vue, Nuxt, JavaScript or other useful technologies, checkout VueSchool by clicking this link or by clicking the image below:

It covers most important concepts while building modern Vue or Nuxt applications that can help you in your daily work or side projects 😉

✅ Summary

Well done! You have just learned how to check if a slot is empty in Vue.

Take care and see you next time!

And happy coding as always 🖥️

Top comments (15)

Steven Thomson • Sep 30 '24

I appreciate people trying to put out content that will help others learn but feel like I need to provide more details/clarify some of the points made here. I am going to assume that you are not specialized in SQL databases or query optimizations since your profile appears to be JS related (Side note: you might want to look at your summary since it looks like it came from another article). This is not to say I think you are not knowledgeable in tech, but I would recommend spending more time on a topic before posting.

Limit the number of rows returned by a query
Limiting can be helpful but you should only really ever do this while developing or analyzing data. The main case where limiting would be required/helpful would be paginating results for a web application (even then other strategies are much faster, see youtube.com/watch?v=zwDIN04lIpc). Otherwise, limiting your results for performance reasons wouldn't really be feasible.

Use JOIN instead of subqueries
This is almost never an issue with modern RDBMS systems. Your database's query optimizer will likely modify your query to convert it to a JOIN query before executing. You can actually see/check this by running an EXPLAIN on your query to see the plan it generates.

Use efficient data types for columns
I see this mentioned a lot when it comes to performance but it's mostly a myth that will likely never be required by 99.999% of developers. Although there might be a case for storage size with BILLIONS of rows of data, the difference between a 32bit integer and a 16bit integer computational will almost never matter and is generally a sign of premature optimizations.

Use PREPARE for frequently executed queries
This can be true if you are regularly accessing your data directly through a driver that does not provide a prepared statement mechanism or does not implicitly create prepared statements in the database. Chances are, your library will always provide a mechanism to do this and you should not try to prepare statements yourself. Although this can be an area of exploration once all other optimization routes have been considered.

CREATE INDEX products_name_idx ON products (name);
This advice is good and should be your first thought when you have verified your query is sound from a SQL perspective. However, I have seen people abuse indexes way too often. The thing to remember is that many indexes on a single table can hurt the performance of tables that are modified often and when you query a table, you can only access it through 1 index at a time. So it doesn't matter if you add indexes to a table if your query is getting complex and won't use the index you just created.

Not use wildcard characters
Not sure what this section is about? The 2 queries are fundamentally different since they search for rows that end with and start with 'phone' respectively. Yes, querying against a pattern will always be slower since you need to do a perform a scan of sorts rather than a seek. I think what you got here is that you noticed that using the filter name LIKE 'phone%' was faster. This is likely because it is checking the start of the name field (which has an index from your previous section) and thus able to perform a range scan of the table rather than a full scan. If that is true, the database would know what chunk of the index your values reside (indexes are by default sorted in ascending order so knowing the first few characters of a values to find helps) so it doesn't need to search the entire table.

Debugging queries with EXPLAIN keyword
Definitely solid advice for anyone trying to dig deeper into why queries are slower and what needs to change about it. I won't go super deep into this topic but it's DEEP and should be a last resort when conventional advice to improve your query has not yielded much of a performance improvement.

Jakub Andrzejewski • Oct 1 '24

Hey, thanks for the comment and I do appreciate your feedback here. As you mentioned, I specialize in JS but from time to time, I like to do some research and write about different topics.

Thanks for sharing additional context that can be useful for others :)

vikas rathod • Oct 1 '24

I'm working as a backend developer since 2 years and still afraid by looking at complex queries which includes multiple joins n all so want to get expertise in that so please anyone any suggestions n cource that will assist in as much as simplified way @clasicrando

Jonathan DS • Oct 1 '24

honestly, it's just practice, create a simulated ecommerce database with orders, items, products and costumers then try to create a dashboard with pure SQL , the engine matters little

George Johnson • Sep 30 '24

Oracle DBA for 20+ years and the one thing we DBAs hate is people seeing databases a "black boxes", always ready to support any dev willing to do some basic performance tuning to get their queries up to scratch. Great article!

Jakub Andrzejewski • Oct 1 '24

Thanks for the kind words! :)

Doug Wilson • Oct 1 '24

Great article with good explanations and examples!

Kudos especially for rolling up your sleeves and working with SQL directly. WAY too many developers turn all this over to object relational mappers (ORMs) and service- or even presentation-layer data manipulation libraries in order to avoid learning SQL, which can always lead to trouble.

If you're interested in seeing some more useful tricks PostgreSQL can do, please check out my Adaptív Application Foundation (AAF) Data Layer (AAF Data) project, which includes BASH and SQL scripts to create roles, schemas, tables, functions, and scripted lookup/reference data.

I'd also like to express my thanks for the high level of usefulness in these posts and for the respectful tone of even those who may disagree. It's refreshing, positive, and helpful and reminds me of one of the main reasons I got into this game ... back when dinosaurs roamed the earth. ;)

My sincere thanks.

Jakub Andrzejewski • Oct 1 '24

Thank you so much for the kind comment! I am certainly not an expert but I have tried these these techniques and they helped in my project so I wanted to share it with others. And I completely agree that there might be better solutions and I may be wrong somewhere so I value each comment :)

Doug Wilson • Oct 1 '24

Sharing what we learn with others is a foundational building block of our communities. It takes courage as well as time to do it, so thanks again.

Please keep it up. Looking forward to reading more and trading experience and lessons learned. It's how we all get better at what we do.

Cheers, Jakub!

Martin Baun • Sep 30 '24

Start with the basics: ANALYZE, EXPLAIN, and add indexes. Great guideline! Thanks for sharing

Jakub Andrzejewski • Sep 30 '24

Thanks for the hint. Appreciate it! :)

Saleem Basit • Sep 30 '24

I think, phone% and %phone are meant to return different results.

Or am I missing something?

Finndersen • Sep 30 '24

Yes they're completely different queries so can't be used interchangeably. This article has a lot of misleading information

nafg • Sep 30 '24

Have you benchmarked these rules? Or do you have a reliable source?

Jakub Andrzejewski • Oct 1 '24

Hey there, I don't have any benchmarks as these techniques I took from the projects I was working on. In every area (frontend/backend) I try to create a list of things that I see are valuable and could help in improving performance and then, write about them in an article so someone could try it on their own :)