DEV Community

Paul Brabban
Paul Brabban

Posted on • Originally published at tempered.works on

GROUP BY ALL solves a really annoying SQL problem

hero image

Does your SQL still copy most of your columns from SELECT after GROUP BY?

Behold: GROUP BY ALL.

The problem

A simple example of the problem looks like this. I have a table of page views, one row per view. I want to know how many downloads I had each day, so I write some SQL like this:

SELECT
    view_date,
    COUNT(1) num_views
FROM the_raw_views_table
GROUP BY
    view_date
Enter fullscreen mode Exit fullscreen mode

See how I have to repeat view_date in the GROUP BY clause? It's required, and it's pretty much the only appropriate simple value. I must add any columns that I'm not aggregating (I used the aggregate function COUNT() here) to the GROUP BY clause for the query to be valid.

Grouping is something we do all the time. It's a minor irritation when there are only a couple of columns to add, but I've seen queries where there are tens, maybe even hundreds of columns that have to be carefully kept synchronised.

A chunkier example from GitHub:

GROUP BY the_date, countryname, twitter_trend, google_trend, latcent, longcent
Enter fullscreen mode Exit fullscreen mode

The poor solution

I've seen a bad solution around, where you don't need to actually name the columns but can instead use the column's ordinal number. The previous query would look like:

GROUP BY
    1
Enter fullscreen mode Exit fullscreen mode

This is a bad idea for readability, and now you have a list of sequential numbers to keep in sync instead. Here's an example I found on GitHub for how silly it can get:

GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, i.indnatts
Enter fullscreen mode Exit fullscreen mode

The good solution

On 23 May 2024, Google made GROUP BY ALL generally available, and I totally missed it. Now, I can just say what I mean 🎉.

SELECT
    view_date,
    COUNT(1) num_views
FROM the_raw_views_table
GROUP BY ALL
Enter fullscreen mode Exit fullscreen mode

It doesn't matter if you have one plain select column or 100. GROUP BY ALL infers the list. The full documentation explains the specifics and how the inference works.

Supporting platforms

I actually found GROUP BY ALL first on the Databricks platform.

I don't think Trino (and by extension AWS Athena) have GROUP BY ALL.

Top comments (0)