In this post, I want to focus on GROUP BY ALL , which can simplify queries significantly, especially when you’re experimenting and constantly adapting your analyses. This can be extremely useful when you want to quickly aggregate data without manually listing all the columns, especially in exploratory queries where the SELECT list changes frequently.
- Supported in: Databricks Runtime 12.2 LTS and above
- Documentation: Databricks SQL Reference
Example:
Here is an example using the Databricks sample dataset bakehouse. We want to sum all transactions by continent, country, state, year, month, and day.
Without GROUP BY ALL, we need to list all the columns explicitly in the GROUP BY clause:
select
c.continent,
c.country,
c.state,
year(tx.dateTime) as year,
month(tx.dateTime) as month,
day(tx.dateTime) as day,
sum(totalPrice) as sumTotalPrice
from samples.bakehouse.sales_customers c
join samples.bakehouse.sales_transactions tx
on c.customerID = tx.customerID
group by all
order by year, month, day
As expected this will give us the follwoing output.

Top comments (0)