re: SQL 101: All about SQL JOINs VIEW POST

FULL DISCUSSION
 
 

My wild guess would be that’s because FULL JOIN is an evidence of a design flaw. If you find yourself doing FULL JOIN your database structure requires in an immediate surgery.

 

I have a use case for FULL JOIN.

FULL JOINs are fantastic when, for one reason or another (perhaps you built the database just as a way to combine some external data sources ad-hoc), your database has acquired data in two different tables that includes different fact tables about the same real-world entities.

See my example in my recent blog post "Every SQL Join You'll Ever Need".

You'll see that it's easy to make a table that looks like this:

name_lf company name_fl fav_food
Amjit, Anush Apple
Borges, Benita Boiron
Combs, Cathy CVS Cathy Combs Carrots
Daher, Darweesh Dell Darweesh Daher Doritos
Ellis, Ezra EDF Ezra Ellis Endives
Fulvia, Frances Firestone
Frances Fulvia Fries
Grace Gao Garlic
Helen Hopper Hummus

Out of a table that looks like this:

ssn name_lf ph em company
A1A1A1 Amjit, Anush 1111111 111@example.com Apple
B2B2B2 Borges, Benita 2222222 222@example.com Boiron
C3C3C3 Combs, Cathy 3333333 333@example.com CVS
D4D4D4 Daher, Darweesh 4444444 444@example.com Dell
E5E5E5 Ellis, Ezra 5555555 555@example.com EDF
F6F6F6 Fulvia, Frances 6666666 666@example.com Firestone

And a table that looks like this:

social name_fl phone email fav_food age
C3C3C3 Cathy Combs 3333333 ccc@example.com Carrots 33
D4D4D4 Darweesh Daher 4444444 444@example.com Doritos 44
E5E5E5 Ezra Ellis 5555555 555@example.com Endives 55
FFF666 Frances Fulvia 6666666 666@example.com Fries 66
G7G7G7 Grace Gao 7777777 777@example.com Garlic 77
H8H8H8 Helen Hopper 8888888 888@example.com Hummus 88

It may not be a terribly common business need in a well-normalized long-term data store (or, as you say, one that doesn't "require immediate surgery"), but it's a really common one in ETL / "set up the latest database management wants" work. :-)

I could certainly also see myself using a FULL JOIN in the "stations and sites" types of "what sort of data I'm dealing with" situations Dian described. Other people have a point that often times when a "business need" requires two such tables to be joined so often, one would think they might have attributes of their relationships that need to be stored and therefore need a junction table, so that's worth thinking about as one designs their database! But as Dian points out ... not necessarily.

I'd give that it definitely does seem to be far more common, business-question-wise, to want to see the intersection of two data sets or to see "definitely thing A, but only things B/C/D/etc. if applicable", than it is to want to see "whatever -- but put it together on the same line if it seems related"! :-) But I wouldn't call having a business need to look for such joins a "flaw" by any stretch of the imagination.

 

It's not always right. Sometime I using this. Common in big complex query with joining subquery

I do it when trying to debug legacy databases. I don't do it in production code though.

 

Aleksei …

FULL JOIN will do basically a LEFT JOIN and a RIGHT JOIN, correct?

Yes, you are correct in that I have never found too many practical uses as … no databases would normally have this situation in production use. But I still think it should be in the list.

There ARE limited cases like linking all reps to all products and finding which reps have NOT sold what products.

Dale

The beauty of development ... there is never a right and wrong! Some things are logical until the day that they aint ... and vice-versa.

Dale

 

That's not at all true. They're very rare in my experience, but even GOTO has its place and there's been much less harm wrought by full joins.

For an example, a data model I'm currently working with involves a 1:1 relationship between stations and sites: a station can be deployed to a site, a site may be empty (no station), or a station may be held in reserve (no site). I want to see all my stations and all my sites, with deployed stations correctly lined up with their sites. A full outer join between stations and sites on stations.site_id produces exactly the desired output. Anything else would be clumsy at best, and the only remotely appropriate alternative model simply moves the foreign key to the other side.

Well, I am not a SQL architect by any mean, but a join table sites_to_stations would not only allow to avoid FULL JOIN but also attach some attributes to links (like linked_at or like) in the future for free.

I already get metadata for free (it's event sourced), so a junction table even with the appropriate constraints to enforce 1:1 vs m:m would just take up space and make other queries more complicated.

Got it. Thanks for the explanation, I was wrong. In this case it indeed sounds as the best solution.

Also, the junction table approach still doesn't give me what I want, since the un-junctioned records collide on the join criteria:

[local] dian#dian= select * from left_table;
 id   val  
────┼───────
  1  one
  2  two
  3  three
(3 rows)

Time: 0.559 ms
[local] dian#dian= select * from right_table;
 id  val 
────┼─────
  1  1
  2  2
  3  3
(3 rows)

Time: 0.587 ms
[local] dian#dian= select * from left_right;
 left_id  right_id 
─────────┼──────────
       1         3
       3         1
(2 rows)

Time: 0.533 ms
[local] dian#dian= select * from left_table l
left outer join left_right lr on lr.left_id = l.id
right outer join right_table r on r.id = lr.right_id;
   id     val    left_id  right_id  id  val 
────────┼────────┼─────────┼──────────┼────┼─────
      1  one           1         3   3  3
      3  three         3         1   1  1
 (null)  (null)   (null)    (null)   2  2
(3 rows)

Time: 0.984 ms

To get unaffiliated lefts and rights to show up, you have to run the query twice and union the results:

[local] dian#dian= select * from left_table l
dian#dian- left outer join left_right lr on lr.left_id = l.id
dian#dian- left outer join right_table r on r.id = lr.right_id
dian#dian- union
dian#dian- select * from left_table l
dian#dian- left outer join left_right lr on lr.left_id = l.id
dian#dian- right outer join right_table r on r.id = lr.right_id;
   id     val    left_id  right_id    id     val   
────────┼────────┼─────────┼──────────┼────────┼────────
      3  three         3         1       1  1
 (null)  (null)   (null)    (null)       2  2
      2  two      (null)    (null)  (null)  (null)
      1  one           1         3       3  3
(4 rows)

Time: 0.697 ms

Indeed. This is a perfect example of how blinkered are we (I :) might be with our background, experience and expertise.

Say, 20 times in my life I saw FULL JOIN in the legacy DBs, that I always could have refactored to the joined tables. I never met the example of a kind you just shared (btw, thank you again and again for taking time doing that, I really appreciate and value it.)

And—voilà—I stupidly made the wrong assumption based on my experience, expertise and all that crap.

Very enlightening.

You're welcome! :) I should say that the case I mentioned with sites and stations is, to my memory, one of maybe two or three times I've actually written a full join in application code over more than a decade of designing databases and writing SQL. It's really rare, but an important tool nonetheless.

code of conduct - report abuse