Databricks SQL Essentials - GROUP BY ALL

#databricks #sql

In this post, I want to focus on GROUP BY ALL , which can simplify queries significantly, especially when you’re experimenting and constantly adapting your analyses. This can be extremely useful when you want to quickly aggregate data without manually listing all the columns, especially in exploratory queries where the SELECT list changes frequently.

Supported in: Databricks Runtime 12.2 LTS and above
Documentation: Databricks SQL Reference

Example:

Here is an example using the Databricks sample dataset bakehouse. We want to sum all transactions by continent, country, state, year, month, and day.

Without GROUP BY ALL, we need to list all the columns explicitly in the GROUP BY clause:

select 
  c.continent, 
  c.country, 
  c.state,  
  year(tx.dateTime) as year,
  month(tx.dateTime) as month,
  day(tx.dateTime) as day,
  sum(totalPrice) as sumTotalPrice
  from samples.bakehouse.sales_customers c 
  join samples.bakehouse.sales_transactions tx 
    on c.customerID = tx.customerID 
  group by all
  order by year, month, day

As expected this will give us the follwoing output.

DEV Community

Databricks SQL Essentials - GROUP BY ALL

Top comments (0)