DEV Community

Judy
Judy

Posted on

SQL looks like English is a well-intentioned error

We know that SQL is very similar to English, and simple SQL statements can be read directly as English. Except for SQL, other major programming languages do not have this feature, and even if there are English words in the syntax, they are only used as mnemonics for certain concepts or operations, and what is written is a formal program statement rather than an English sentence. But SQL is different. It will write the entire sentence in a form that conforms to English habits, and also add many unnecessary prepositions, such as FROM being the operation subject of the statement but have to be written in the behind, and after GROUP, an extra BY needs to be written.

Why is this happening? The easy reason to think of is to hope that non programmers can also use it. Users can write SQL to query data as long as they can read and write English. This is clearly a good intention, but the result is not satisfactory. The vast majority of business personnel only know how to write very simple queries using SQL, and for such queries, there are now powerful BI software that can provide a more convenient and intuitive visual interface to assist, without the need for handwritten statements. This design intention loses its meaning. On the contrary, the vast majority of users that frequently use SQL for calculations are still programmers, and SQL is still a programming language, like or unlike English, it does not have much difference in understanding for programmers, on the contrary, it can bring considerable difficulties.

In fact, SQL is a language with very strict syntax, and any non compliant part of a statement will be rejected by the database. Users must study and follow its syntax rules carefully, which is no different from other programming languages. The true advantage of natural language lies in its ambiguity, which allows it to accept less strict syntax to a certain extent. However, SQL does not support this feature, and it was not possible to achieve this feature in the era when SQL was invented.

The benefits of looking like English cannot be reflected, but there are many drawbacks. Designing grammar like natural language may seem easy to master, but in fact, the opposite is true.

The main downside of being like natural language is non procedural. Program logic is generally executed step by step, using variables to record intermediate results for use in subsequent steps. But natural language is not like this, the citation relationship between two sentences is depended on a small number of fixed pronouns, which is imprecise and inconvenient. So it will have to spell actions targeting the same subject into one sentence as much as possible, so there is no need to use pronouns. The corresponding representation in SQL is to have multiple actions in one statement, such as SELECT, WHERE, GROUP, which were originally unrelated actions. In other programming languages, they are usually designed as multiple functions, but in SQL, they are all designed as clauses of one statement. Moreover, words like “WHERE” and “HAVING” have the same meaning, only targeting different objects, when spelled into one sentence, two words must be used to indicate the difference, which is confusing (many beginners may be confused about HAVING).

Complex situations that cannot be described in a single sentence can be described using clauses in natural language. This is manifested as subqueries in SQL, and there may also be multiple layers of nested subqueries, which is not common in other programming languages. Moreover, subqueries should also be like natural language, with a SELECT…FROM every time, it will make people feel very verbose and the code will become long.

Step by step is an effective way to reduce the difficulty of understanding and executing. Something that is originally quite simple to do in a few steps, but if you don’t take them step by step, it will be very complicated to implement. It can be imagined that if the teacher asked elementary school pupils to solve practical problems only in one equation, the children would be very distressed (of course, some smart children can handle it).

For example, if we want to find customers whose sales exceed twice the average, the natural way of thinking is to first calculate the average sales, and then find customers whose sales exceed twice this value, implementing with two statements. The writing of SQL requires using subqueries to write one longer sentence. This example is relatively easy to understand, with only two layers. The difficulty of using natural language clauses to describe the relationship within  two layers is generally acceptable, but in reality, complex queries involving three or five layers are common, which seriously increases the difficulty of understanding.

Not advocating for step-by-step can lead to long single SQL statement. The complex SQL statements faced by programmers are rarely counted in lines, often in Ks. However, for the same 100 lines of code, whether it is divided into 100 statements or only 1 statement, its complexity is not at the same level at all. This type of code is very difficult to understand, and once it was finally written, after two months, the programmer himself couldn’t understand it. Moreover, the single long statement without steps is difficult to debug, and the development cycle is also longer.

About procedurality, there is a saying in the industry that SQL is a declarative language, and users only need to care about what they want without caring about how to do it; the database will automatically find a solution, and this language does not need to support procedurality. We have already criticized this statement earlier.

Database vendors may have also seen the lack of procedurality of SQL, so they later added CTE syntax to compensate, which is equivalent to providing intermediate variables that can be named. Stored procedures are also equivalent to being able to execute SQL in steps, with branching loops and even subroutines. The result is still to return to the old path of procedural language, so this is not as good as designing it like this from the beginning.

For programming languages, the ease of use brought by a good step-by-step computing mechanism far exceeds what looks like natural language.

Top comments (5)

Collapse
 
franckpachot profile image
Franck Pachot • Edited

SQL haters usually misunderstand what is an SQL statement and think it is like an execution instruction, with a subject and an action, like when you say:

The corresponding representation in SQL is to have multiple actions in one statement, such as SELECT, WHERE, GROUP, which were originally unrelated actions.

or:

FROM being the operation subject of the statement but have to be written in the behind

However, there is no subject, action verb or operation in SQL. It is the user specification, describing the expected result. That is why FROM is after the projection which is the declaration of the result set.

For example, select name from airports where country='US' means: I want to see the name of the airports in US. You see that FROM is in the right place when specifying the result you want. Only when you look at the execution plan you can see how it is executed and this is not English but a tree of operations on row sets.

This is also not correct:

added CTE syntax to compensate, which is equivalent to providing intermediate variables that can be named.

Common Table Expressions are still declarative. For better readability they declare subqueries in WITH clause but it is the same behavior as when inlined. It is not an intermediate relational variable except if the execution plan decides to materialize it (and you can control this with optimizer hints).

It is ok to prefer coding the data access with loops, local variables, buffers, sorted or hashed stuctures, and build tests for all branching combinations, but then don't use SQL, or do simple CRUD through ORM. But SQL developers prefer to put efforts on the business logic and call a service that will find the optimal access to their data. SQL is the API for this service. Further optimisation is also declarative in SQL databases, defining indexes for you access patterns without changing the application code.

Collapse
 
stanleyugwu profile image
Stanley Ugwu

Thanks for sharing this

Collapse
 
kennell123 profile image
Jacob J. Kennell

Thanks For sharing

Collapse
 
esproc_spl profile image
Judy

You're welcome. I'm glad you like it

Collapse
 
connor_mc_d profile image
Connor McDonald 🍸☕️

Respectively this feels more like a (valid) commentary on COBOL. COBOL was indeed designed to be a human-readable expression of programming constructs.

But SQL is not about programming, it is about intent, which is why the statement you make below is worth challenging:

The true advantage of natural language lies in its ambiguity, which allows it to accept less strict syntax to a certain extent

I would contend the true advantage is that natural language allows it to be self-evident was the goal of the task is. Show someone an SQL, and even if they have no idea how the result would be computed, they'll have a good chance of knowing what the intention is.

Don't get me wrong, SQL is not by means utopia for that, but present a novice (developer or otherwise) with a SQL statement to get average salaries by department versus the equivalent in C++, and I'll wager they can digest the SQL version far more quickly.