DEV Community

Matt Rueedlinger
Matt Rueedlinger

Posted on • Originally published at rueedlinger.ch on

Databricks SQL Essentials - GROUP BY ALL

In this post, I want to focus on GROUP BY ALL , which can simplify queries significantly, especially when you’re experimenting and constantly adapting your analyses. This can be extremely useful when you want to quickly aggregate data without manually listing all the columns, especially in exploratory queries where the SELECT list changes frequently.

Example:

Here is an example using the Databricks sample dataset bakehouse. We want to sum all transactions by continent, country, state, year, month, and day.

Without GROUP BY ALL, we need to list all the columns explicitly in the GROUP BY clause:

select 
  c.continent, 
  c.country, 
  c.state,  
  year(tx.dateTime) as year,
  month(tx.dateTime) as month,
  day(tx.dateTime) as day,
  sum(totalPrice) as sumTotalPrice
  from samples.bakehouse.sales_customers c 
  join samples.bakehouse.sales_transactions tx 
    on c.customerID = tx.customerID 
  group by all
  order by year, month, day
Enter fullscreen mode Exit fullscreen mode

As expected this will give us the follwoing output.

Example output

Top comments (0)