loading...
Cover image for Why I use SQL CTEs

Why I use SQL CTEs

helenanders26 profile image Helen Anderson Updated on ・3 min read

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Common Table Expressions (CTEs) are underrated compared to the subquery. Here’s why I prefer to use CTEs when building SQL queries


How do CTEs work?

In the example below I’ve used a CTE to:

  • get all the data I need from the Email Sent table, told SQL to hold that in memory,
  • then get what I need from the Unsubscribe table, told SQL to hold that in memory too,
  • then joined them together with flags from the Customers table. Where you see a WITH, is the CTE starting and then I’m naming them 'sent' and ‘unsubs’ before starting to tell the CTE what I want to return:
; with sent as -- start the CTE with a semicolon to terminate anything above

(select 
  emailaddress,
  emailid, 
  senddate
from
  marketing.emailsent
where
  senddate between '2018-01-01' and '2018-01-31'
),  -- add a comma if you need to add a subsequent CTE

unsubs as

(select 
  emailaddress,
  emailid, 
  senddate
from
  marketing.emailunsubscribe
where
  senddate between '2018-01-01' and '2018-01-31'
) -- no comma for the last CTE

select
  'January' as monthdelivered,
  c.country, 
  count(distinct sent.emailaddress) as countofdelivered, 
  count(distinct unsubs.emailaddress) as countofunsubd
from sent
left join 
  marketing.customers c on sent.email = unsubs.emailaddress
left join  
  unsubs on sent.email = unsubs.email 
  and sent.emailid = unsubs.emailid

When I do the final 'join everything together' part I’m joining fields from the ‘sent’ dataset such as 'sent.email'.


Why not just use a subquery?

The way I get my head around reading a subquery is by reading from the inside out. It's nesting everything you need together, but in my opinion, it tends to get ugly really quickly.

  • The first step is the query in the centre starting 'SELECT accountID …' to get all orders greater than 30 from the "orderhistory" table.
  • Then JOIN on the "account" table to look up which Accounts were from New Zealand.
  • Then the top SELECT runs to return all the fields from the ord dataset and the three columns I want to see from the "account" table.
select
  ord.*,
  acc.country
  acc.city
  acc.createddate
from (
  select 
    accountid, 
    ordered, 
    ordervalue
  from 
    sales.orderhistory
  where ordervalue > '30'
      ) ord
join 
  sales.account acc on ord.accountid = acc.accountid
where 
  acc.country = 'New Zealand'

Why are they so great?

You can use them multiple times throughout your script and they are readable, you can return what you need then reference it later.


Why not just create a table?

If you don’t have write permissions this may not be possible. And if it’s only used for this query your DBA might not be thrilled with you creating one-off tables.


Are there any negatives?

CTEs don’t last forever and can only be used in the query you’re currently in, unlike temp tables or views that can survive outside the current script.


What about performance?

The database will always decide the best way to execute your query. If you ask your friendly DBA which strategy to use, they can fill you in on which strategy works best for your database. The CTE is all about readability, so if it works for you give it a try.


Read more


This post originally appeared on helenanderson.co.nz

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Posted on by:

helenanders26 profile

Helen Anderson

@helenanders26

Making applications go faster at Raygun, AWS Data Hero, and tag moderator on Dev.to. Database concept you don’t understand? Let me know, I’ll write a post!

Discussion

markdown guide
 

Other drawbacks:

  • You are transferring an application logic from an app server to DB server. If your DB server has 1-2 CPUs do not expect improvement in real world scenarios.
  • Scaling application servers (by deploying another staleless node behind a load balancer) is waaaay much easier than scaling a DB horizontally in case you experience performance issue due to all calculations happening on the DB server. In case of scaled DB service you would need to synchronize the data and make sure a transaction does not finish unless all nodes in a cluster have written data to disk or at least a transaction log) causing unnecessary latency for all DB queries, even those not related to your CTEs.
  • You may actually cause a transfer of much more data than if would do the operation in a memory of an application server. If you pull 2 tables, 1000 rows each and do the join on the app server your data transfer equals roughly 2000 * avg row size. Do that on the server and the result may be in the worst case a cartesian product of all rows for each JOIN operation, that might be 10 000 rows for one JOIN.
 

I think those are valid concerns for application development when users will be hitting your database with this query 100's or 1000's of times. In that case, you probably have the right to ask for another table or some other optimization.

The case that @helenanders26 is talking about here is more for an analyst that may not have to write privileges to the database and is not likely hammering it with the power of every user of the application. The main point is that it greatly improves readability.

Both are great points but from different perspectives.

 

Sure, this would be of concern if you need to implement reporting on top of a live database performing OLTP workloads. If you have the benefit of a dedicated database/server these points, except the data transfer amplification, should not have any negative impact.

 

Great article @helenanders26 . I use table views more often than CTE's, but do have some places without write access that they are very handy to use. Or if it is a one time use and you don't want to confuse others with something they likely shouldn't use.

 

Other benefits:
You can test a CTE indendently.from the rest of the code
You can 'chain' them together so you can break down complex business rules into discreet steps
They support window functions such as RANK() OVER() so you can use the windowed value in the where clause.
They can be self referencing which is great for hierarchical data.

 

Thanks for added detail, even more reason to use CTEs in complex queries :)

 

I didn't know about CTEs but luckily now I do.

I'm glad you also mentioned that CTEs are all about readability. I was worried that I'd be missing out on some great functionality because of pre-defined ORMs in the frameworks I work with keeping me from having real decision making in the way queries are structured.

I already have some real-life use cases in which I could apply CTEs to make the temporary queries I have to write for some kind of export, debugging purposes or any other reason more readable. It'll help both myself and my colleagues a lot, thank you!

 

I guess you should add what a CTE is and what it's full form is?

This came up in top of my feed, I have no idea what it is. It'll make this more approachable for a wider audience!