DEV Community

Tim
Tim

Posted on

T-SQL: How To Write A Self Join and Why

As we've seen with SQL joins, we can link data to other data using relationships between data, such as a relationship between the history of a sales' price for a particular house and the details of the particular house. In these cases, the data have relationships with other related data (a home, its details, its price, etc). We use the variety of joins to tie these related data together based on the conditions that we specify. Specific data points also relate to themselves, even if they don't relate to other data. For instance, if we only had data on the history of a particular home sales' price, we would have current and past sales' values. This means that a particular datum from that set would be a value after a previous value and possibly with a next value (if the specific data point is the most recent). When we reflect over this example, we'll see that it applies in a similar manner to every data set. In the video, SQL Basics: How To Use A Self Join and Why, we look at joining a table on to itself and the situations where we would consider using this technique for relationships of data among themselves.

Some questions that are answered in the video:

  • For this video, we use a self join table since our join will only involve one table. Since we join the table onto itself, any column that we use in the join condition will match provided that we don't change any part of the condition (which we do in our example and I show why).
  • As a quick reminder, not all SQL languages may support the exact syntax shown.
  • In the example, what problem are we solving with a self join? How is this useful with any data where we're looking at the data set alone (without any external relationships)?
  • Consider a simple example of where we can see this: in a workout where we do 5 sets of squats, the amount of repetitions may differ. With a self join, we can compare the percent change (if applicable) of how many repetitions we are able to complete relative to other sets. So, if we do 10 repetitions in our 1rst set and 12 repetitions in our 2nd set, we saw an increase in 20% from the 1rst to the 2nd set. The relationship between those data points is a previous-next relationship.

All data have relationships to even the data itself; even data as simple as weather can be looked at from the view of the weather today versus the weather yesterday.

As we see with the self join, the condition matters. We can vary the condition to compare to the previous, latter or other values. We do not need to use a self join if adding derived columns from an existing column - we can simply add those in our SQL statement (I've seen some examples where this is done, but it's not needed).

Top comments (0)