Let’s clear the air quick — SQL is awesome. It’s the most consistently used query language for well over two decades. There’s a ton of love for SQL from thousands of analysts and developers around the world. And honestly, I probably wouldn’t even have a job if it weren’t for SQL simply based on how ubiquitous it is and how much it helps building products.
So before getting into my rant below, I want to say THANK YOU to the people who’ve worked on SQL and SQL systems. You all must have moved mountains to make it all happen, and I deeply appreciate it.
All that said… SQL can sometimes SUCK. Here’s a bunch of reasons why.
Can you spot the errors in the following query?
cool_films as ( select title, "Description: " + description, length, release_year, from film where description like '%cool%' where title like '%cool%' ), select * from cool_films
How many did you find? Because there are five errors!
- There’s a missing
withat the beginning for the common table expression
- Trailing comma after
- … and another trailing comma after the closing parens
- Double quotes don’t refer to strings, they refer to identifiers (usually column identifiers)
- You can’t use
wheretwice; you have to use
andif you want to specify more filtering conditions
I guarantee you most people who write SQL for work run into at least one of these errors once a week or more. These errors are incredibly frustrating and they pile onto what is already a high learning curve for SQL.
These syntax rules also don’t add much. Is the above SQL that difficult to read? Or even difficult to parse before executing? No! Let me run my query!
IF YOU’RE LOOKING AT QUERY LIKE THIS:
SELECT something FROM a_table WHERE im_not_yelled_at ORDER BY my_sanity ASC
DO YOU REALLY FIND THIS MORE READABLE THAN:
select something from a_table where im_not_yelled_at order by my_sanity asc
WHO WANTS THEIR QUERIES TO YELL AT THEM?
HAVE I MADE MY POINT YET??
Suppose we have the following query:
select title, description, (publish_year / 100) as century from books where century between 16 and 19
This will return an error because the alias
century can’t be used in the
where clause. Some SQL engines will allow you to use
having here instead, but it’s not standard SQL.
The historical reason for this is that everything in the
select block is supposed to happen last — meaning the SQL engine will process the line
where century like between 16 and 19 first, and it’s going to be like, “huh, I haven’t seen this
I’m no expert on building SQL engines, but I don’t really understand why this has to be a rule. Is there a logic error or such a big performance hit that it’s impossible for SQL engines to look for aliases in the
select clause before executing the
where clause? Is doing a quick scan of some text truly that hard to do?
When I’m iterating and in the middle of working on a query, I often want to do something like this:
with something_i_copy_pasted ( select ... 50 different columns from a_big_table ), -- probably many other CTEs with complicated joins... select *, except column "foo", and format column "bar" better from something_i_copy_pasted
In these cases, I end up having to copy/paste the
select statements across multiple CTEs and modify them slightly. This ends up being a tedious task when I really want to change something small and see what the results look like.
The oft recommended way to share SQL query snippets is through Views. I’m sure some teams use Views correctly and don’t run into issues, but I haven’t seen one.
My main issue with Views is that they can’t take input variables. For instance, it’d be amazing to have a View where you can modify a specific
where condition or change/remove a
join, or be able to slot in a CTE somewhere. These types of adjustments are nearly impossible without some very complicated Stored Procedures, which are yet another common source of issues.
Unlike a programming language where I can often copy/paste things from Stack Overflow or use a package manager and utilize a library, in SQL I can do none of those things. There are no SQL “packages” that just do a
cross join unnest(x), or do some gnarly window functions. Above all, there’s no built-in support for a collaborative codebase — most teams have to roll their own version of shared SQL snippets in Github. All this sucks!
Yes yes, variables exist in some SQL dialects, but they’re not standard.
Here’s an example where variables could be helpful:
select title, description, (publish_year / 100) as century, case when century < 0 then (century * -1) || ' B.C.' when century > 0 then century || ' A.D.' as formatted_century from books
Alas, aliases can’t be reused! This is another situation where I need CTE, or write some really ugly SQL. In my opinion, both are less than ideal.
Every time I get an error from my SQL query, I want to put my head through a wall. Often times, the errors should be simple or easy to fix, but SQL error messages give little information on how.
Here’s a forced example:
select first_name from actor a join actor b on a.actor_id = b.actor_id #=> Query Error: error: column reference "first_name" is ambiguous
The error is clear on what’s wrong — there are two columns called
first_name, and the engine doesn’t know which one you want. Makes sense! But would it be SO HARD to tell me how to fix this simple error?
What I really want is this:
select first_name from actor a join actor b on a.actor_id = b.actor_id #=> Query Error: Column reference "first_name" is ambiguous. Available columns are a.first_name, b.first_name.
Easy peasy! What about misspellings?
select first_namee from actor a #=> Query Error: error: column "first_namee" does not exist -- nope! this is what i want: select first_namee from actor a #=> Query Error: Column "first_namee" does not exist. Did you mean "first_name"?
Did someone say “syntax error”?
select first_name from actor a order by last_name where first_name = 'Brad' #=> Query Error: error: syntax error at or near "where" -- "where" do i begin? no thanks! just tell me what the problem is: select first_name from actor a order by last_name where first_name = 'Brad' #=> Query Error: Syntax error at or near "where". "where" should come before "order by"
I could go on and on.
There are many errors to sift through, and I’m sure some have ambiguous or unclear solutions, and I know some of you will shout “use an IDE!”, etc. etc.
(side note: most SQL IDE’s are terrible, but I digress…)
If SQL engines returned with helpful error messages in the first place, no one would need a workaround, or bug their coworkers, or furiously google for fixable issues. Better error messages would be the single biggest UX improvement of SQL that I can think of.
In many programming languages, there’s built-in support for a dropping debug statements, stopping execution, and being able to poke around and see what your program is doing. This is immensely helpful whether you’re building something new or debugging a problem in your code.
In SQL, there’s no equivalent. Beginners usually don’t know what to do when they’re faced by query that “isn’t working” or — worse yet — a query that silently gives unexpected results and you didn’t know.
Over time, you build an intuition on how to manually “debug” a query, doing things like:
- Starting with a
where id = '123'to limit your results and iterate quickly
- Doing a
left join other on a = other.b where a = nullto surface join issues
- Running CTEs one by one so you know how the query is “building” from one CTE to the next
I’m sure there’s other methods of “debugging” out there. But these feel like workarounds to me. What I really want is a way to “step through” my query and see how the SQL engine is processing it, and what types of results it gets along the way.
Believe me, I get that building a debugger into a SQL engine would be ridiculously difficult. You’re usually not running your query directly on the database server — you’re sending your query over the network to be validated, parsed, and executed, often on huge datasets. You can’t quite “peek” into how a query is getting executed like you could in a programming language.
… unless there’s a genius out there who can figure it out! I don’t know what’s actually possible, but any progress here would be an immense help.
I have to give credit where credit’s due — relational algebra is great, and it’s really useful for a query language to describe how to slice and dice data.
Sometimes, though, I just want a for loop. Just let me drop into a lambda and modify some things here and there. In a similar vein to my other complaints, functions/lambdas do exist in some SQL dialects, but they’re not part of ANSI SQL.
As a result, when you’re locked into relational algebra, it can often feel like a square peg in a round hole. Funky joins, compounding window functions, complex sub-selects — all of which are valid solutions that can, at the same time, feel very “advanced” and sometimes even hacky, even for a seasoned SQL user.
In reality, I want SQL to be as easy as Excel, or as smooth as dropping into a REPL to dig around what’s possible. Users aren’t going to get their queries right from step one, and the nature of relational algebra is that you’re going to get your queries wrong A LOT before you get them right. It’s so demotivating to go through it, and in reality it doesn’t have to be this way.
Of course, the most common complaint of all.
With PostgreSQL, MySQL, Spark SQL, Presto SQL, SQL Server, Oracle SQL, and probably many others… we just can’t get a break! Each of these SQL implementations have slight idiosyncrasies that make it very painful to transfer queries from one system to another.
This issue truly rears its head when you want to use Google or Stack Overflow for solutions. You can’t always search for “how do I do X in SQL”, because some of the answers may be in a specific SQL dialect and won’t help you. For someone new to SQL, this is brutal, and it can take a long time to resolve simple issues.
I don’t know what the root problem here is. There is indeed an accepted SQL Standard, but nonetheless there are many deviations from it. Perhaps if there was recognized implementation of the SQL Standard as a standalone SQL Parser library that any application can utilize, then the industry could converge on the standard naturally.
But alas, here we are.
My hope is that at least some of this resonated with you. Maybe if enough people feel strongly about what can improve, then SQL really will improve.
We spend so much time about the UX of web and mobile products, but not enough, IMO, on developer or analyst products. One day, though, one day.
Thanks for listening to my TED talk.