SQL 201 (8 Part Series)
Last week I was explaining to a junior analyst how CTEs (common table expressions) work. A CTE is a named, temporary result set: you define a query once at the top of a statement and then refer to it by name later in that same statement. CTEs are underrated compared to the subquery, which seems to be what most analysts around me use. Here's why I prefer CTEs when building SQL queries.
In the example below I've used a CTE to:
- grab January's sends from the Email Delivered table and give that result set a name,
- then grab what I need from the Unsubscribe table and name that too,
- then join them together with flags from the Customers table.
The WITH keyword starts the CTE. I then name each one ('delivered' and 'unsubs') and, inside the brackets, tell it what I want to return:
; WITH delivered AS -- start the CTE with a semicolon to terminate anything above
(
    SELECT emailaddress, emailid, senddate
    FROM marketing.emaildelivered
    WHERE senddate >= '2018-01-01' AND senddate < '2018-02-01' -- half-open range also catches times on 31 Jan if senddate is a datetime
), -- add a comma if you need to add a subsequent CTE
unsubs AS
(
    SELECT emailaddress, emailid, senddate
    FROM marketing.emailunsubscribe
    WHERE senddate >= '2018-01-01' AND senddate < '2018-02-01'
) -- no comma for the last CTE
SELECT
    'January' AS monthdelivered,
    c.country,
    delivered.emailid,
    COUNT(DISTINCT delivered.emailaddress) AS [countofdelivered],
    COUNT(DISTINCT unsubs.emailaddress) AS [countofunsubd]
FROM delivered
LEFT JOIN marketing.customers c
    ON c.emailaddress = delivered.emailaddress -- assumes customers has an emailaddress column
LEFT JOIN unsubs
    ON delivered.emailaddress = unsubs.emailaddress
    AND delivered.emailid = unsubs.emailid
GROUP BY delivered.emailid, c.country
ORDER BY delivered.emailid;
In the final 'join everything together' step, I refer to each CTE by name just like a table, so fields from the 'delivered' dataset become 'delivered.emailaddress', 'delivered.emailid' and so on.
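If you want to try the pattern without access to the real marketing tables, here's a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, with in-memory tables and made-up rows. The customers table's emailaddress and country columns are assumptions. The leading semicolon is dropped because SQLite doesn't need it.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
# Attach a second in-memory database called "marketing" so the
# schema-qualified names from the post (marketing.emaildelivered, ...) resolve.
conn.execute("ATTACH DATABASE ':memory:' AS marketing")
conn.executescript("""
CREATE TABLE marketing.emaildelivered (emailaddress TEXT, emailid INTEGER, senddate TEXT);
CREATE TABLE marketing.emailunsubscribe (emailaddress TEXT, emailid INTEGER, senddate TEXT);
CREATE TABLE marketing.customers (emailaddress TEXT, country TEXT);

-- Made-up sample rows, for illustration only.
INSERT INTO marketing.emaildelivered VALUES
    ('a@example.com', 1, '2018-01-05'),
    ('b@example.com', 1, '2018-01-05'),
    ('c@example.com', 1, '2018-01-05'),
    ('a@example.com', 2, '2018-01-20'),
    ('d@example.com', 2, '2018-02-03');  -- February, so filtered out
INSERT INTO marketing.emailunsubscribe VALUES ('b@example.com', 1, '2018-01-06');
INSERT INTO marketing.customers VALUES
    ('a@example.com', 'NZ'), ('b@example.com', 'NZ'),
    ('c@example.com', 'AU'), ('d@example.com', 'NZ');
""")

QUERY = """
WITH delivered AS (
    SELECT emailaddress, emailid, senddate
    FROM marketing.emaildelivered
    WHERE senddate >= '2018-01-01' AND senddate < '2018-02-01'
),
unsubs AS (
    SELECT emailaddress, emailid, senddate
    FROM marketing.emailunsubscribe
    WHERE senddate >= '2018-01-01' AND senddate < '2018-02-01'
)
SELECT 'January' AS monthdelivered,
       c.country,
       delivered.emailid,
       COUNT(DISTINCT delivered.emailaddress) AS [countofdelivered],
       COUNT(DISTINCT unsubs.emailaddress) AS [countofunsubd]
FROM delivered
LEFT JOIN marketing.customers c ON c.emailaddress = delivered.emailaddress
LEFT JOIN unsubs ON delivered.emailaddress = unsubs.emailaddress
                AND delivered.emailid = unsubs.emailid
GROUP BY delivered.emailid, c.country
ORDER BY delivered.emailid, c.country  -- country added so the order is deterministic
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each CTE is referenced by name in the final SELECT as if it were a table. Unsubscribes that don't match a delivered row simply come back as NULL and aren't counted.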
Here is an example of a subquery. I don't use them often because my brain doesn't work that way. I would rather get all my datasets separately and then join them all together.
The way I get my head around reading one is to think about it from the inside out. It nests everything you need together, but in my opinion it gets ugly really quickly.
- The first step is to run the query in the centre starting 'SELECT AccountID ...' to get all orders with an OrderValue greater than 30 from the OrderHistory table.
- Then JOIN to the Account table and keep only the accounts from New Zealand.
- Then the top SELECT runs to return all the fields from the ord dataset and the three columns I want to see from the Account table.
SELECT
    ord.*,
    acc.Country,
    acc.City,
    acc.CreatedDateUTC
FROM
(
    SELECT AccountID, OrderID, OrderValue
    FROM Sales.OrderHistory
    WHERE OrderValue > 30
) ord
JOIN Sales.Account acc
    ON ord.AccountID = acc.AccountID
WHERE acc.Country = 'New Zealand'
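A quick way to convince yourself that the two styles do the same job is to run the subquery above and the same logic rewritten as a CTE against the same data, then compare the results. This sketch again uses in-memory SQLite as a stand-in for SQL Server, with invented sample rows. An ORDER BY is added to both queries so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach an in-memory database named "Sales" so Sales.OrderHistory / Sales.Account resolve.
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.executescript("""
CREATE TABLE Sales.Account (AccountID INTEGER, Country TEXT, City TEXT, CreatedDateUTC TEXT);
CREATE TABLE Sales.OrderHistory (AccountID INTEGER, OrderID INTEGER, OrderValue REAL);
-- Invented sample rows.
INSERT INTO Sales.Account VALUES
    (1, 'New Zealand', 'Wellington', '2017-03-01'),
    (2, 'Australia', 'Sydney', '2017-05-10'),
    (3, 'New Zealand', 'Auckland', '2018-01-15');
INSERT INTO Sales.OrderHistory VALUES
    (1, 101, 45.0), (1, 102, 20.0), (2, 103, 80.0), (3, 104, 31.5);
""")

# The subquery version: read it from the inside out.
SUBQUERY = """
SELECT ord.*, acc.Country, acc.City, acc.CreatedDateUTC
FROM (
    SELECT AccountID, OrderID, OrderValue
    FROM Sales.OrderHistory
    WHERE OrderValue > 30
) ord
JOIN Sales.Account acc ON ord.AccountID = acc.AccountID
WHERE acc.Country = 'New Zealand'
ORDER BY ord.OrderID
"""

# The same logic as a CTE: define the dataset first, then join it.
CTE = """
WITH ord AS (
    SELECT AccountID, OrderID, OrderValue
    FROM Sales.OrderHistory
    WHERE OrderValue > 30
)
SELECT ord.*, acc.Country, acc.City, acc.CreatedDateUTC
FROM ord
JOIN Sales.Account acc ON ord.AccountID = acc.AccountID
WHERE acc.Country = 'New Zealand'
ORDER BY ord.OrderID
"""

subquery_rows = conn.execute(SUBQUERY).fetchall()
cte_rows = conn.execute(CTE).fetchall()
print(subquery_rows == cte_rows)  # True
for row in cte_rows:
    print(row)
```

The only change is where the inner query lives: nested in the FROM clause, or named up front with WITH.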
Another option is to create a table (or a temp table) up front. You can use it multiple times throughout your script and it's readable: you return what you need once, then reference it later.
The catch is that if you don't have write permissions this may not be possible, and if the table is only used for this one query, your DBA might not be thrilled with you creating one-off tables.
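Here's a rough sketch of the table approach, again in in-memory SQLite with made-up data. SQLite's `CREATE TEMP TABLE ... AS SELECT` plays the role that `SELECT ... INTO #temptable` would in SQL Server. The table name nz_orders is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderHistory (AccountID INTEGER, OrderID INTEGER, OrderValue REAL);
CREATE TABLE Account (AccountID INTEGER, Country TEXT);
-- Made-up sample rows.
INSERT INTO OrderHistory VALUES (1, 101, 45.0), (1, 102, 20.0), (2, 103, 80.0), (3, 104, 31.5);
INSERT INTO Account VALUES (1, 'New Zealand'), (2, 'Australia'), (3, 'New Zealand');

-- Build the intermediate result once; it stays around for the rest of the session.
CREATE TEMP TABLE nz_orders AS
SELECT o.AccountID, o.OrderID, o.OrderValue
FROM OrderHistory o
JOIN Account a ON o.AccountID = a.AccountID
WHERE o.OrderValue > 30 AND a.Country = 'New Zealand';
""")

# Later, separate queries reuse it without repeating the filtering logic.
order_count = conn.execute("SELECT COUNT(*) FROM nz_orders").fetchone()[0]
total_value = conn.execute("SELECT SUM(OrderValue) FROM nz_orders").fetchone()[0]
print(order_count, total_value)  # 2 76.5
```

A temp table avoids creating a permanent object, but it still has to be written somewhere. On a busy server, that is part of the "it depends".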
CTEs don't last: they only exist for the single statement that defines them. Temp tables last for your session and can be used across several queries, and views persist in the database until someone drops them.
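The difference in lifetime is easy to see in practice. A CTE disappears as soon as its statement finishes, while a temp table is still there for the next query. Here's a small SQLite sketch with hypothetical names (big_orders for the CTE, big_orders_tmp for the temp table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderHistory (OrderID INTEGER, OrderValue REAL)")
conn.execute("INSERT INTO OrderHistory VALUES (101, 45.0), (102, 20.0)")

# The CTE exists only inside this one statement.
first = conn.execute("""
    WITH big_orders AS (SELECT OrderID FROM OrderHistory WHERE OrderValue > 30)
    SELECT COUNT(*) FROM big_orders
""").fetchone()[0]

# A second, separate statement can't see it.
try:
    conn.execute("SELECT COUNT(*) FROM big_orders")
    cte_visible_later = True
except sqlite3.OperationalError as err:
    cte_visible_later = False
    print("Error:", err)

# A temp table, by contrast, is still available to later statements in the session.
conn.execute(
    "CREATE TEMP TABLE big_orders_tmp AS "
    "SELECT OrderID FROM OrderHistory WHERE OrderValue > 30"
)
later = conn.execute("SELECT COUNT(*) FROM big_orders_tmp").fetchone()[0]
print(first, cte_visible_later, later)
```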
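Whichever way you write it, SQL Server's query optimizer decides how to actually execute your query. A CTE isn't stored anywhere; SQL Server expands it into the query much like a subquery, so the two often end up with the same plan. If you ask your friendly DBA which strategy to use, they'll tell you 'it depends', because it does. The CTE is all about readability, so if it works for you, give it a try.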
Which do you prefer? The CTE, subquery or just creating a table?
This post first appeared on helenanderson.co.nz
Photo by Pixabay on Pexels