Group By, Grouping Sets, Cube, and Rollup Queries in SQL

#sql #database

I came across these advanced queries in SQL when going through the Postgres Docs, and I couldn't wrap the concepts across my head easily. These queries are used to give you a different angle while processing data in SQL. This article will explain these queries in a fairly simple language as you were to explain to a five-year-old, and then we will go through a code example and where you can use these queries.

Explain like I am 5 years old ...
Now Imagine you have your different toys, cars, lego, and balls. And these toys are of different colors like blue, red, green etc.

Group By. Grouping is like putting all the toys of the same color together. So, you put all the red cars in one pile, all the blue cars in another pile, and so on. Group By helps you organize your toys by a specific characteristic, like color, so that you can see and count how many of each.

Grouping Set: Sometimes, you want to group your toys differently. For example, you may want to group them by color and type (cars and balls). So, you create multiple groups, like one for all the red cars, one for all the blue cars, one for all the red balls, and so on. Grouping sets allow you to do this and see the data from different angles.

Roll Up: Think of this like making a summary. You have your toys grouped by color, and you want to know how many toys you have in total for each color. So, you start with individual colors, like red and blue, and then add them up to get the total number of cars and balls. It's like rolling up the numbers from smaller to bigger groups to see the bigger picture.

Cube: This is like a 3D puzzle. You have toys grouped by color and type, but you also want to know the total count for combinations of these characteristics. It's like saying, "How many red cars do I have? How many blue balls do I have?" and so on. Cubes in PostgreSQL help you explore your data in all these different combinations.

Now we have that explanation out of the way, let's now deep dive in a code implementation. I will be using a GUI in performing the SQL queries, you can get your here if you're interested.

We will start by creating a table salaries and populate it with a few data.

CREATE TABLE salaries (

    id int,

    salary int,

    revised_salary int,

    old_salary  int,

    leave  int,

    joining_date date

);

insert into salaries values(10,5000,4000,1000,5,'2019-11-15');

insert into salaries values(10,6000,9000,2000,1,'2019-11-16');

insert into salaries values(10,7000,3000,4400,2,'2019-11-17');

insert into salaries values(20,8000,4000,6600,6,'2019-11-15');

insert into salaries values(20,9000,9400,8800,10,'2019-11-16');

insert into salaries values(20,2000,7800,9400,23,'2019-11-17');

insert into salaries values(30,4400,6600,4200,44,'2019-11-15');

insert into salaries values(30,1500,3600,4300,66,'2019-11-16');

insert into salaries values(30,2000,2600,4500,77,'2019-11-17');

Now using the select command we can see the data we inserted in the salaries table.

Now using the Group By command, we are going to group together the salary, revised_salary and the leave.

select id, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from salaries
GROUP BY id
ORDER BY id;

What's happening here is that we're selecting the salary, revised_salary, old_salary and leave and summing them up, then we want to organize the information by the names of the people (in this case, each name has an id), and finally we want to sort the information in alphabetical order and with that we use the order by clause.

Similarly, we can use the joining_date column together with the group by clause to generate a report.

select joining_date, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from salaries
GROUP BY joining_date
ORDER BY joining_date;

Grouping Set:

Now we can combine the above 2 queries into one query.

select id, joining_date, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from salaries
GROUP BY GROUPING SETS (id, joining_date)
ORDER BY id;

Cube:

Cube is mostly used for report generation and extends the Group By and Grouping Sets clauses. Cube(x,y,z) will give you all the grouping sets as follows: (X, Y, Z) (, Y, Z) (X, Y, ) (X, ,Y) (X) (Y) (Z) ().

In our example, we can create all the possible combinations of id and joining_date.

select id, joining_date, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from salaries
GROUP BY CUBE (id, joining_date)
ORDER BY id;

And if you have a hard time understanding the query, here is the explanation to a five year old. This is like making a big list of people where we want to see how much money they make, how their salary changed, and how many days they took off. We also want to know when they joined, so we're looking at that too. But we're not just looking at each person alone; we want to see the money stuff for each person and for when they joined, and even all the combinations of those. Then, we want to put the list in order by the names of the people. So, we can see all this information in different ways and in order, like in a big, organized list.

Roll Up.

ROLLUP lets you create hierarchical subtotals in your query results. It's like creating a summary report with subtotals at different levels of detail. For instance, you can use ROLLUP to get subtotals not just for individual combinations of id and joining_date, but also for higher-level groupings like the total salary for each id and a grand total for all the data. This helps you analyze and summarize salary information at various levels of grouping.

select id, joining_date, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from salaries
GROUP BY ROLLUP (id, joining_date)
ORDER BY 1, 2;