DEV Community

Cover image for How to work with SQL query optimization
DbVisualizer
DbVisualizer

Posted on • Originally published at dbvis.com

How to work with SQL query optimization

In this article, you will find out what SQL query optimization is and why it is so important. You will also see some tips for building fast SQL queries and learn what SQL has to offer when it comes to analyzing a query for optimization. SQL query optimization is critical to writing more efficient SQL queries. This ensures that SQL queries are executed quickly, making applications that rely on that data faster. In this article, you will dig into the concept of SQL query optimization, understand its importance, and how to apply it. You will also learn some best practices for writing efficient SQL queries. It’s time to improve your SQL query writing skills!


Tools used in this tutorial
DbVisualizer, top rated database management tool and SQL client


What is SQL Query Optimization?

SQL query optimization is the process of making SQL queries faster and more efficient. The idea behind query optimization is to identify and address performance bottlenecks. This typically involves changing the query, defining indexes, or modifying the database schema. The ultimate goal of SQL query optimization is to reduce the time and resources required to execute a query. This way, the performance of applications that rely on that query will improve accordingly.

Note that the performance of a query depends on the hardware and load of the database server, but also on how you write the query. At the same time, most popular DBMS technologies come with a query optimizer. This reorders or rewrites queries behind the scene to improve performance. Specifically, the DBMS optimizes the query execution plan devised to run the query. Thus, two different queries can be translated into the same execution plan.

Why Is SQL Query Optimization Important?

There are at least three good reasons to optimize queries to make them more efficient. Let’s see them all.

Improved performance

SQL performance tuning helps to reduce the amount of time and resources required by the database server. Since many applications may depend on the same database server, that means improving the performance of several applications.

Scaling becomes easier

Slow queries become even slower as the volume of data increases. In contrast, SQL query optimization allows you to ensure that queries continue to execute efficiently as the amount of data grows.

Cost reduction

Optimizing SQL queries can help to reduce the CPU and memory usage required to run a database server. This means saving money on server hosting.

5 Tips for SQL Performance Tuning

Let’s now see some tips and common mistakes to avoid for writing efficient SQL queries.

1. Select the exact number of columns

One of the most common mistakes is to use the * operator in SELECT statements, even when you are only interested in retrieving a few columns. Using * makes your queries easier to write, but also less efficient.

Example of query where all columns get selected:

SELECT * FROM users
Enter fullscreen mode Exit fullscreen mode

When writing the SELECT clause, use only the number of columns you need to select. This will speed up your query.

Example of a query where only the columns you need are selected:

SELECT id, name, surname FROM users
Enter fullscreen mode Exit fullscreen mode

2. Avoid useless WHERE conditions

Often, SQL queries are written without thinking too much or having the underlying data structure well in mind. For example, let’s say you want to retrieve all users with a value in the points field. You will write the following query:

SELECT id, name, surname 
FROM users
WHERE points IS NOT NULL
Enter fullscreen mode Exit fullscreen mode

This works like a charm and would return what you expect. However, if you had inspected the users table before writing the query, you would have noticed that points is a non-nullable integer with 0 as the default value. In other terms, points is always NOT NULL. So, the WHERE condition you added will only make your query unnecessarily slower.

This is just an example. Yet, before writing WHERE conditions, you should always check that what you are adding is actually useful for filtering the data.

3. Avoid Negative Searches

When it comes to query optimization, you have to take into account even the small details. For example, using the NOT operator in a WHERE clause can make a query slower. This is because the DBMS may have to check all the rows in the table and exclude the ones that do not match the condition, which takes more time and resources than just selecting the rows that match the condition. So, positive queries are typically more efficient than equivalent negative queries.

Example of a negative query:

SELECT id, name, surname 
FROM users 
WHERENOT name = "Jhon"
Enter fullscreen mode Exit fullscreen mode

Example of the equivalent query in positive form:

SELECT id, name, surname 
FROM users 
WHERE name != "Jhon"
Enter fullscreen mode Exit fullscreen mode

Not considering the low-level optimizations made by most DBMS, the second query is faster than the first one.

4. Use temporary tables

You can use temporary tables, also known as temp tables, for SQL query optimization. Specifically, temp tables are useful to store and manipulate intermediate results within a query. This can help improve performance by reducing the amount of data to be processed. Also, temp tables are automatically created when running ALTER TABLE queries on large tables.

For example, you can use a temp table to store the results of a JOIN operation. Then, use that table in the following queries to filter some data. This can be more efficient than performing the JOIN and filtering the data in a single query. Also, you can add indexes to a temp table to improve performance.

5. Avoid the DISTINCT keyword

In SQL, the DISTINCT keyword forces the DBMS to return unique values from a query. When used in a SELECT statement, DISTINCT removes duplicate rows from the result set. This requires an extra operation, which makes your queries slower. So, if you do not really need to use the DISTINCT keyword, try to avoid it.

Example of a query with DISTINCT:

SELECT DISTINCT name, surname
FROM users
WHERE country = 'USA';
Enter fullscreen mode Exit fullscreen mode

This query selects all unique first and last name combinations of users living in the United States.

Let’s rewrite an equivalent query without DISTINCT:

SELECT name, surname
FROM users
WHERE country = 'USA';
GROUP BY name, surname;
Enter fullscreen mode Exit fullscreen mode

Both of these queries will return the same result set. In detail, the first query uses DISTINCT to avoid duplicates, whereas the second query uses GROUP BY to accomplish the same goal.

How EXPLAIN works in SQL

In SQL, the EXPLAIN command shows you the execution plan of a query. In particular, EXPLAIN provides information about how the query optimizer will execute the query. This includes:

  • The order in which the tables will be accessed.
  • Whether and which indexes will be used.
  • Information about the computational cost of each operation that will be executed.

Note that this info changes based on the DBMS specific implementation of EXPLAIN.

You can run the EXPLAIN command as follows:

EXPLAIN
<YOUR_SQL_QUERY>
Enter fullscreen mode Exit fullscreen mode

Replace <YOUR_SQL_QUERY> with the SQL code of your query, as in the example below:

EXPLAIN
SELECT"name", "matchId"
FROM "Events"
WHERE "goldenPeriodId" IS NOT NULL
GROUp BY "name", "matchId"
Enter fullscreen mode Exit fullscreen mode

When running an EXPLAIN query, this returns a set of rows, each of which represents a step in the execution plan.


Note the rows returned by the EXPLAIN query run in DbVisualizer.

Note the rows returned by the EXPLAIN query run in DbVisualizer.

Keep in mind that EXPLAIN does not run the query. For this reason, If you want to get some info on the query execution time, you need to use EXPLAIN ANALYZE.


Note the time info returned by the EXPLAIN ANALYZE query run.

Note the time info returned by the EXPLAIN ANALYZE query run.

As you can see, EXPLAIN ANALYZE also returns info about the time spent planning and the time required to run the query. Note that only PostgreSQL supports EXECUTE ANALYZE. However, other DBMSs may offer the same feature under different commands.

EXPLAIN can be especially useful when you are trying to optimize the performance of a query. By examining the execution plan, you can identify any steps that may be causing poor performance, such as full table scans or inefficient JOIN operations. At the same time, analyzing and understanding the information returned by EXPLAIN is not easy. This is where an advanced SQL client such as DbVisualizer comes into play!

Explain Plan in DbVisualizer

DbVisualizer comes with an Explain Plan feature, which allows you to visually analyze how a query is processed by the database. Specifically, the Explain Plan executes your query and records the execution plan that the database devised to perform it.

By looking at the plan, you can find out if the database is using the right indexes and joining your tables in the most efficient way. This allows you to understand what the DBMS does behind the scene and helps you optimize your queries accordingly. Note that the Explain Plan is available for both single queries and SQL scripts.

To analyze a query with the Explain Plan in DbVisualizer:

  1. Write the query in the “SQL Commander” editor.
  2. Click “Execute Explain Plan” button in the toolbar.
  3. Examine the result.



Note the time info returned by the EXPLAIN ANALYZE query run.

Running a query with the Explain Plan in DbVisualizer.

Let’s now learn how to use DbVisualizer’s Explain Plan in a complete example.

Step #1: Running the raw query

Let’s assume you want to run the following query in a PostgreSQL database:

SELECT * 
FROM "Events" E
WHERE E."matchId" IS NOT NULL
ORDER BY "matchId" ASC, "timePosition" ASC
Enter fullscreen mode Exit fullscreen mode

Events is a table with nearly 1 million records containing event data related to Judo matches.

As you can see, this query does not follow any SQL performance tuning tips presented earlier. Let’s run it in the Explain Plan.


Exploring the results of the Explain Plan in DbVisualizer.

Exploring the results of the Explain Plan in DbVisualizer.

In the “Graph View” section, you can see the execution plan devised by the DBMS as a chart. This helps you visually understand how your query will be executed. While in the “Tree View” part, you can see all the information in a more compact form. In detail, note that the Total Cost is 223563.23

If you are not familiar with this concept, Total Cost refers to the overall computational cost required to execute a query. This involves the amount of memory used, the number of CPU cycles required, and/or the number of disk I/O operations performed. The lower this value is, the more efficient a query is.

Step #2: Applying the optimization tips

First, let’s change the query to select only the required columns:

SELECT E."id", E."matchId", E."timePosition", E."name"
FROM "Events" E
WHERE E."matchId" IS NOT NULL
ORDER BY "matchId" ASC, "timePosition" ASC
Enter fullscreen mode Exit fullscreen mode

This significantly reduces Total Cost to 136423.13.

Also, matchId is defined as a non-nullable field. So, let’s remove the WHERE condition:

SELECT E."id", E."matchId", E."timePosition", E."name"
FROM "Events" E
ORDER BY "matchId"ASC, "timePosition" ASC
Enter fullscreen mode Exit fullscreen mode

Total Cost falls to 136435.12.

Step #3: Adding an index

By analyzing the result provided by the Explain Plan, we can notice that no index is used. So, let’s create an index with the following query:

CREATE INDEX idx_events_complete 
ON "Events"("matchId"ASC, "timePosition" ASC);
Enter fullscreen mode Exit fullscreen mode



Running the index query in DbVisualizer.

Running the index query in DbVisualizer.

Now, it is time to execute the query again in the Explain Plan:

SELECT E."id", E."matchId", E."timePosition", E."name"
FROM "Events" E
ORDER BY "matchId" ASC, "timePosition" ASC
Enter fullscreen mode Exit fullscreen mode



The results of the optimization process.

The results of the optimization process.

As you can see, the execution plan now involves only a single step using the index defined above. Total Cost is now 66522.0 and has more than halved from the previous step.

By applying the query optimization tips presented above and using the Explain Plan feature of DbVisualizer, we have been able to take a query from a Total Cost of 223563.23 to 66522.0. This is a threefold improvement!

Et voilà! SQL query optimization has never been easier!

Conclusion

As you learned here, SQL query optimization is a crucial aspect to make queries faster. By understanding the principles of query optimization and applying best practices, you can write efficient SQL queries. In detail, here you saw five useful tips for SQL performance tuning. Also, you understood what the EXPLAIN command is and how it can help you analyze the execution plan devised by the DBMS to run a query.

The EXPLAIN command is powerful but sometimes difficult to use and understand. Fortunately, DbVisualizer comes with the Explain Plan feature! This allows you to visually explore the steps taken by the DBMS to execute the query, the indexes used, and where to intervene to improve the performance. This is just one of the many features offered by DbViualizer that can help you make a query more efficient. Try DbVisualizer for free today!

About the author

Antonello Zanini is a software engineer, and often refers to himself as a technology bishop. His mission is to spread knowledge through writing.

Top comments (0)