DEV Community

fabriciomsdev
fabriciomsdev

Posted on

If you see that a SQL SELECT query is slow - what would you do to improve it?

If a SQL SELECT query is slow, there are some simple steps we can take to identify the cause and improve its performance.

I will explain the above step by step:

1 Step - First see the plan for the execution of the query:

To do that you can use EXPLAIN or EXPLAIN ANALYZE to understand how this query is working behind the scenes.

For example, I built a simple book stores system, with 3 entities, Books, Authors, and Publisher, I ran a simple query with a merge between Books and Authors to show how it works

Image description

As you can see in the table Result 1 this command can show the query execution plan with all process in the query running it can be used to see the bottlenecks in the process, I will proposital add a bottleneck in the process for example adding a table in the query merge without foreign keys.

CREATE TABLE books_publisher (
    publisher_id SERIAL PRIMARY KEY,
    publisher_name VARCHAR(100) NOT NULL
);

ALTER TABLE books_book
ADD COLUMN publisher_id INTEGER;

INSERT INTO books_publisher (publisher_name) VALUES ('Penguin');
INSERT INTO books_publisher (publisher_name) VALUES ('Harper Collins');
INSERT INTO books_publisher (publisher_name) VALUES ('Oxford University Press');
INSERT INTO books_publisher (publisher_name) VALUES ('Pearson');

UPDATE books_book
SET publisher_id = (SELECT publisher_id FROM books_publisher ORDER BY RANDOM() LIMIT 1);
Enter fullscreen mode Exit fullscreen mode

Now I run the query doing a join with publisher:

Image description

Note: The time of execution is 11.180 ms

2 Step - Verify foreign keys and indexes:

I will add a foreign key index to improve the algorithm when we run the query:

ALTER TABLE books_book
ADD CONSTRAINT fk_publisher
FOREIGN KEY (publisher_id)
REFERENCES books_publisher(publisher_id);

CREATE INDEX idx_publisher_id
ON books_book (publisher_id);

CREATE INDEX books_book_btree_publisher_id ON public.books_book USING btree (publisher_id);
Enter fullscreen mode Exit fullscreen mode

After improving in DDL structure:

Image description

Note: The time of execution is 4.168 ms

3 Step - Verify what business needs, and filter data:

For example to count how many books a publisher send for each author we should do the query:

select
    books_publisher.publisher_name,
    books_author."name"  as author,
    count(*) qty
from books_book
inner join books_author 
on books_book.author_id = books_author.id
inner join books_publisher 
on books_publisher.publisher_id = books_book.publisher_id
group by books_publisher.publisher_name, books_author."name";
Enter fullscreen mode Exit fullscreen mode

This query took:

Image description

We can remove the books where we did not have publishers to avoid unnecessary loops I will add a filter:

  where books_book.publisher_id is not null
Enter fullscreen mode Exit fullscreen mode

It is the result:

Image description

It is a very small example, on the large datasets other things we can do are:

  • remove necessary data from attribute selection
  • do views or materialized views with your subqueries to preprocess data
  • organize data to follow normalization rules or unfollow (it can work too), depending on the kind of feature you want to provide to your customer
    • Verify resources of the database, such as: connections opened, connections idle, CPU and RAM use.

Top comments (0)