DEV Community

Tim
Tim

Posted on

Solving Data Differentials With LEFT JOINs

One challenge we face with data involves new and existing data from a source data set to a destination data set. The source and destination may be managed by us or one of them could be managed by us. When we have these data sets, we will often get "updates" to these data - in some cases, these updates will involve new data that don't already exist or may involve existing data that need to be updated. In rarer cases (but still frequent), we may need to remove data that no longer exist in an update. This latter case is rarer because most "differentials" will involve adding new data and updating existing data. For the benefit of these "delete" scenario, I also cover it in the video, Using LEFT JOINs To Solve New and Existing Data Differentials, along with updating and adding new data. This means that this video shows how we can use LEFT JOINs to solve all three of these scenarios. These scenarios come up frequently in business so it's useful to know how we can use a join type to solve them. Keep in mind, this is a tutorial video to show you how this can be done, not a recommendation of whether you should use this or not.

Some discussion points mentioned in the video:

  • Relative to what's required for our differentials, why does the order matter? I show this in the demo and mention this warning early, as something you want to consider. Using the demo, imagine if I re-ordered what operation happened first. What would happen to our data?
  • Assume we need to delete, update and insert. What should happen first? Why?
  • When it comes to data differentials, which of the 3 operations occurs less frequently? What is an example I use of this type of data differential?
  • Based on the sample data set, what should occur with the final outcome of our data? Why should we run a test first?
  • Why do I update before I insert?
  • T-SQL is a set-based language. When it comes to performance and adding new data or updating existing data, how should we consider the fact that it's a set-based language in our operations?

When we look at solving data differential with LEFT JOINs, those above points and discussions are worth considering. If we don't order the operations correctly, as an example, we may end up with incorrect data or updating records that were just inserted, poorly affecting our performance.

Can we only solve data differentials with LEFT JOINs? No. We can use merge operations, match data on the application layer, use except functionality supported in some SQL languages, etc. In this tutorial, we see the capability of LEFT JOINs, but they are not the only tool we can use to solve this problem. When we consider solving this problem, we want to consider two key points in development: performance and maintainability. Not every developer may be familiar with the tools we use to solve this problem - we'll want to consider how our solution is maintained. Likewise, we want to consider performance. Relative to our architecture, using LEFT JOINs may not be the most optimal solution. Unfortunately, there is no hard rule about how to solve this problem. But since we see how we can use LEFT JOINs to solve this problem, we have another application of using LEFT JOINs and where they may be appropriate.

Top comments (0)