DEV Community

Cover image for How is an SQL Query Statement executed
kbws13
kbws13

Posted on • Edited on

How is an SQL Query Statement executed

Hello everyone! This is my first article.

In this article, I will introduce how an sql query statement is executed

Below is the MySQL architecture diagram:

MySQL architecture diagram

In general, MySQl can be divided into two parts: the server and the storage engine layer.

The server layer includes the connector, query cache, parser, optimizer, executor, etc., and contains most of the MySQL's core service functions, as well as all build-in functions (such as date, time, math and encryption functions). All cross-storage engine features, such as stored procedures, triggers and views are implemented at this layer.

The storage engine layer is responsible for data storage and retrieval. Its architecture is plugin-based, supporting multiple storage engines such as InnoDB, MyISAM, Memory. Staring from MySQL 5.5.5,InnoDB became the default storage engine for MySQL.

You can specify the memory engine wneh create a table by using the create table statement with engine=memory.

Different storage engines share the same Server layer

Connector

The first step is to connect the database, which requires the connector. The connector is responsible for establishing a connection with the client, obtaining permissings and maintaing and managing the connection. The connection command is:

mysql -h$ip -P$port -u$user -p
Enter fullscreen mode Exit fullscreen mode

This command is used to establish a connection with the server. After completing the classtic TCP handshake, the connector will use the provider username and password to authenticate your identity.

  • If the username or password is incorrect, you will receive an Access denied for user error, and the client program will terminate.
  • If authentication is successful, the connector will retrieve the permissions of the current account from the permissions table. All permission checks during this connection rely on this initial retrieval.

This means that once a connection is successfully established, any changes made by the administrator to the user's permissions will not affect the existing connection's permissions. Only new connections will use the updated permission settings.

After the connection is established, if there is no subsequent action, the connection enters an idle state, which can be viewed using the show processlist command:

processlist

If the client remains inactive for too long, the connector will automatically disconnect. The duration is controlled by the wait_timeout parameter, which defaults to 8 hours.

If the connection is terminated and the client sends a request, it will receive an error message: Lost connection to MySQL server during query. To continue, you need to reconnect and then execute the request.

In databases, a persistent connection refers to one where the client maintains the same connection for continuous requests after successfully connecting. A short connection refers to disconnecting after a few queries and reconnecting for subsequent queries.

Since the connection process is complex, it's recommended to minimize the creation of connections during development, i.e., use persistent connections whenever possible.

However, when using persistent connections, MySQL's memory usage may increase significantly because temporary memory used during execution is managed within the connection object. These resources are released only when the connection is terminated. If persistent connections accumulate, it may lead to excessive memory usage, causing the system to forcefully terminate MySQL (OOM), resulting in an unexpected restart.

Solutions:

  1. Periodically disconnect persistent connections. After using a connection for a while or executing a query that consumes excessive memory, disconnect and reconnect for subsequent queries.
  2. If you are using MySQL 5.7 or later, you can use mysql_reset_connection after executing a resource-intensive operation to reinitialize connection resources. This process does not require reconnection or re-authentication but resets the connection to its just-created state.

Query Cache

Note: Starting from MySQL 8.0, the query cache feature has been completely removed because its disadvantages outweigh its advantages.

When MySQL receives a query request, it first checks the query cache to see if this query has been executed before. Queries that have been executed before and their results are cached in memory as key-value pairs. The key is the query statement, and the value is the result. If the key is found in the query cache, the value is returned directly to the client.

If the query is not found in the query cache, the process continues.

Why does the query cache do more harm than good?

Query cache invalidation occurs very frequently. Any update to a table will clear all query caches related to that table, resulting in a very low cache hit rate, unless the table is a static configuration table.

MySQL provides an "on-demand" method for using the query cache. By setting the parameter query_cache_type to DEMAND, SQL statements will not use the query cache by default. To use the query cache, you can explicitly specify SQL_CACHE:

select SQL_CACHE * from T where ID=10;
Enter fullscreen mode Exit fullscreen mode

Parser

If the query cache is not hit, the statement execution process begins. MySQL first needs to understand what to do, so it parses the SQL statement.

The parser first performs lexical analysis. The input SQL statement, consisting of strings and spaces, is analyzed by MySQL to identify what each part represents. For example, select is identified as a query statement, T as a table name, and ID as a column.

After lexical analysis, syntax analysis is performed. Based on the results of lexical analysis, the syntax analyzer determines whether the SQL statement conforms to MySQL's syntax rules.

If there is a syntax error, an error message like You have an error in your SQL syntax will be displayed. For instance, in the following query, the select keyword is misspelled:

mysql> elect * from t where ID=1;

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1
Enter fullscreen mode Exit fullscreen mode

Optimizer

After parsing, MySQL knows what you want to do. Next, the optimizer determines how to do it.

The optimizer decides which index to use when a table has multiple indexes or the order of table joins when a query involves multiple tables. For example, in the following query:

select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;
Enter fullscreen mode Exit fullscreen mode

The query can start by retrieving values from t1 or t2. Both approaches yield the same logical result, but their performance may differ. The optimizer's role is to choose the most efficient plan.

After the optimization phase, the process proceeds to the executor.

Executor

The executor starts executing the query.

Before execution, it first checks whether the current connection has permission to query the table. If not, an error indicating insufficient permissions is returned. (Permission checks are also performed when returning results from the query cache.)

If permission is granted, the table is opened, and execution continues. During this process, the executor interacts with the storage engine based on the table's engine definition.

For example, suppose table T has no index on the ID column. The executor's execution process would be as follows:

  1. Call the InnoDB engine interface to fetch the first row of the table and check whether the ID value is 10. If not, skip it; if yes, add it to the result set.
  2. Call the engine interface to fetch the "next row," repeating the same logic until all rows are checked.
  3. The executor returns the accumulated result set to the client.

At this point, the query is complete.

For indexed tables, the process involves using the engine's pre-defined methods to fetch the "first matching row" and "next matching rows" iteratively.

In the slow query log, the rows_examined field indicates the number of rows scanned during query execution. This value accumulates every time the executor calls the engine to retrieve a data row.

In some cases, a single call to the executor may involve scanning multiple rows internally within the engine. Therefore, the number of rows scanned by the engine does not necessarily equal rows_examined.

End

Thank you for reading!I hope the article can be helpful to you.

Top comments (0)