π This post is part of my series Databricks SQL Essentials
Why use CTEs
In this post, I want to focus on CTEs, which can significantly simplify SQL queries by making complex logic
easier to reason about and maintain. A CTE (Common Table Expression) is a temporary, named result set defined
using the WITH clause. It exists only for the duration of a single query and provides a clean way to organize multi-step transformations.
CTEs are especially well suited for a divide-and-conquer approach to SQL.
In this context, divide and conquer means:
- Breaking a complex query into smaller, logical steps
- Solving each step independently
- Combining the results in a final query
Each step is expressed as its own CTE, resulting in clearer, more readable, and easier-to-debug SQL.
Regarding querying data, CTEs can improve your SQL queries in two main ways:
-
CTE definition β define a named, reusable result set at the beginning of your query using the
WITHclause. - Subquery replacement β use a CTE instead of an inline subquery to improve readability and clarity.
Next, we will explore each type of CTE in detail and show how to use them effectively.
For more details and the official syntax, see the Databricks SQL CTE documentation.
CTE Definition at the Beginning of a Query
A Common Table Expression (CTE) defines a named, reusable result set at the start of a query using the WITH clause.
This approach is ideal for organizing multi-step logic and breaking complex queries into manageable parts.
This example demonstrates how CTEs simplify data processing with the Wanderbricks dataset:
-
countriesβ selects country, continent, and country code information. -
usersβ selects user details including their country. -
user_countriesβ combines users with country info, filling any missing continent or country code with"UNKNOWN".
The final query aggregates total booking amounts by country, continent, and booking status, making it easy to analyze bookings across different regions while handling missing data.
-- CTE for countries info
with countries as (
select c.country, c.continent, c.country_code
from samples.wanderbricks.countries c
),
-- CTE for users info
users as (
select u.user_id, u.name, u.user_type, u.country
from samples.wanderbricks.users u
),
-- Combine users with their country info
user_countries as (
select
u.user_id,
u.name,
u.user_type,
u.country,
coalesce(c.continent, "UNKOWN") as continent, -- fill missing continent
coalesce(c.country_code, "UNKOWN") as country_code -- fill missing country code
from users u
left join countries c on u.country = c.country
)
-- Final aggregation
select
u.country,
u.continent,
b.status,
round(sum(b.total_amount), 2) as total_amount -- sum of bookings per group
from user_countries u
join samples.wanderbricks.bookings b
on u.user_id = b.user_id
group by all -- group by country, continent, and status
order by country, status
This will give us the follwoing output.
Subquery Replacement with a CTE
Use a CTE instead of an inline subquery to improve readability. By giving the subquery a name, the purpose of the logic becomes clearer, and the overall query is easier to understand and maintain.
We can rewrite the previous query by defining the CTE in place of the subquery:
-- Aggregate total booking amounts per country, continent, and booking status
-- using the prepared user-country subquery
select
u.country,
u.continent,
b.status,
round(sum(b.total_amount), 2) as total_amount -- sum of bookings per group
from (
-- subquery with CTE
with countries as (
select c.country, c.continent, c.country_code
from samples.wanderbricks.countries c
),
-- CTE for users info
users as (
select u.user_id, u.name, u.user_type, u.country
from samples.wanderbricks.users u
)
-- Combine users with their country info
select
u.user_id,
u.name,
u.user_type,
u.country,
coalesce(c.continent, "UNKOWN") as continent, -- fill missing continent
coalesce(c.country_code, "UNKOWN") as country_code -- fill missing country code
from users u
left join countries c on u.country = c.country
) u
join samples.wanderbricks.bookings b
on u.user_id = b.user_id
group by all -- group by country, continent, and status
order by country, status
Final Thoughts
Using CTEs in Databricks SQL is a powerful way to break complex queries into clear, manageable steps. They help you structure your logic, improve readability, and make debugging much easier.
Overall, CTEs support a divide-and-conquer approach, making your SQL both cleaner and more maintainable.
From my experience, learning to use CTEs transformed the way I write SQL. What used to be long, nested queries now feels organized and much easier to follow.
My advice: donβt hesitate to break your queries into smaller, named stepsβit not only helps you understand your logic but also makes it easier for others to read and maintain your code.

Top comments (0)