DEV Community

7 Bad Practices to Avoid When Writing SQL Queries for Better Performance 🚀

Abdelrahman Mohamed Allam on May 19, 2023

When working with databases, optimizing SQL queries is essential to achieve better performance and reduce response times. In this post, we'll cover...

Read full post

Mike Talbot ⭐ • May 19 '23

ORDER BY and LIMIT still sorts all the records, otherwise you'd never be able to do paged sorted lists.

Not sure how it can be "bad practice" to use a %searchitem% if that's how the data is! Presumably a good idea to use other ANDed statements to try to reduce the list so that the final scanned set is as limited as possible - presuming such limitations are viable in the scenario.

I'm confused about your DISTINCT example, definitely not true for MySQL as MySQL can optimize DISTINCT queries by using an index to quickly find and remove duplicate rows. This can be a significant performance improvement for queries that return a large number of rows.

MySQL cannot optimize GROUP BY queries in the same way, because the GROUP BY clause requires MySQL to calculate aggregate functions for each group of rows. This can be a significant performance penalty for queries that return a large number of rows.

In general, you should use the DISTINCT clause whenever you only need to return unique rows from a table. You should use the GROUP BY clause whenever you need to calculate aggregate functions for groups of rows.

Abdelrahman Mohamed Allam • May 19 '23

Simply the database engine gets the first 10 rows ASC/DESC, so no need to sort 100,000 rows, when the SQL engine will find 10 will stop
limit-optimization
MySQL is really smart and can figure out when you only want a few rows from a big table. When you use LIMIT, sometimes MySQL can use a special trick to find those rows really fast, If you use LIMIT with an ORDER BY clause, MySQL will stop sorting the rows as soon as it finds enough rows to show you. This is really fast if the table is already sorted in a certain way

DISTINCT and GROUP BY - Sometime the scenario change, but the behavior and implementation is different - DISTINCT uses temp table, while GROUP BY uses Hash
distinct-optimization
group by and distinct

@miketalbot

Kraemer • May 23 '24

ORDER + LIMIT sounds like it would be of limited use. Usually with ORDER, I'm looking for the "Top X" of something, but always out of the ENTIRE population. I can't think of a scenario where I'd be interested in the "Top X" of a random sample of results. So the query may be optimized, no argument there - but I would not offer this as standard advice because the utility is so limited outside of testing scenarios.

Matteo Milesi • Sep 18 '24 • Edited

In my experience ORDER + LIMIT is of common use and it's a good practice, if the sorting column is an ordered primary key (which is a very good practice). For example, if you need the records created by a user in the last week, but not more than 10, instead of

SELECT * 
FROM table 
WHERE user_id = ? AND created_at >= DATE_ADD(NOW(), INTERVAL -7 DAY)

and then filter out the extra results in the backend code, you can do

SELECT * 
FROM table 
WHERE user_id = ? AND created_at >= DATE_ADD(NOW(), INTERVAL -7 DAY)
ORDER BY id DESC
LIMIT 10

It's not only more efficient because it skips the records not needed, but using the DESC order it scans the rows from the last created and it stops when it reach the 10th matching result. Without the DESC order it has to scan all the first records created by the user. The only downside is that the results ale reversed, but if you need the in the ASC order you can order them with a parent query or the backend code side.

Bonus tip
If you don't need to limit the results, this example doesn't work well, because the DB engine does not know that created_at follows the same order of the id column. The best thing you can do in this case is to split the extraction in 2 queries:

SELECT * 
FROM table 
WHERE id > (
    SELECT id 
    FROM table 
    WHERE user_id = ? AND created_at < DATE_ADD(NOW(), INTERVAL -7 DAY)
    ORDER BY id DESC
    LIMIT 1
) AND user_id = ?

The subquery extracts the last record id that doesn't meet the desired time range.
The main query directly uses the primary key to skip all the not needed records in a very efficient way.