Step by step: How RDBMS executes SQL statement

#database #sql #programming

Motivation

SQL and RDBMS are most used tools in software industry, and most of performance issues are related of bad storing or bad retrieving of the data, knowing how database stores and retrieves the data can make you avoid a lot of performance issues, we will talk here about how RDBMS executes SQL statement.

Before SQL execution

DBMS do some several steps before executing the SQL statement here are:

Parsing the SQL statement and break it into keywords/tokens.
Validating the SQL statement syntax (all keywords valid and in right order).
Validating the SQL statement against schema (is tables and columns used in SQL exists).
Generating execution plan (regarding table properties and statistics DBMS decides what is best way to execute SQL, using the index or jumping to the table directly and so on).
Executing the execution plan.

How RDBMS execute SQL statement

let’s say below is our data set Movies table with Box-office-rating table

and we want to display the movies with its rating in Box-Office

So, this will be our query

First of all, SQL clauses can be described as a pipes each pipe is processing the data with its own logic then transfer the result to the next clause (pipe) for processing.

First pipe type is FROM clause which is get the data set which we should work on. We can say that FROM clause is equivalent to creating a stream from list of values in Java.

FROM clause prepare the data set to work on, and this is the result.

Then FROM pipe transfer the data to JOIN pipe which do cartesian product between the main table data and join table data.

So, result till now something like this

And depending on JOIN type (INNER/LEFT/RIGHT) and ON clause the result of multiplication formed, as in our example we use INNER JOIN with this on m.id = bo.movie_id the pipe will filter the data and the result as below.

In case a lot of joins each join will work as separate pipe and do it’s own logic and transfer the result set to next one.

Note: you can group joins by parenthesis '()' and force ordering as parenthesis calculated first.

Let’s continue, the next pipe is WHERE clause which work for filtering the data based on the condition and move the result to next one, so after executing WHERE clause this will be the result.

The next pipe should be GROUP BY and Then HAVING clauses but we not have these clauses in our example so will be skipped.

Then next pipe is SELECT clause which here grep the columns we needed and calculate aggregate functions and so on, it works same as map() method of stream in Java, in our example we use ‘*’ to get all columns.