What is a left outer join and when should it be used?
A left outer join (or just left join) is used to query a table based on matching and non-matching entries from a related table. Contrarily to an inner join, the left outer join will always return all the entries of the left table even if the join condition does not find any match on the right table.
Let's again follow along with a couple of exercises:
Here's our domain model:
class Accommodation < ApplicationRecord
has_many :bookings
end
class Booking < ApplicationRecord
belongs_to :accommodation
end
Accommodation.all
Id | Name |
---|---|
1 | 'Beach House' |
2 | 'Charming House' |
3 | 'Lisbon Flat' |
4 | 'Le Petit Chalet' |
5 | 'Farm House' |
Booking.all
Id | Accommodation Id | Check In | Check Out |
---|---|---|---|
1 | 1 | 20 Jun 2021 | 23 Jun 2021 |
2 | 1 | 07 Aug 2021 | 08 Aug 2021 |
3 | 2 | 22 May 2021 | 28 May 2021 |
4 | 4 | 01 Apr 2021 | 02 Apr 2021 |
Get all accommodations with their bookings count
To return a list of accommodations we need to query the accommodations
table. But we need to return that list with a calculation derived from a related table bookings
. This calls for a join.
Recapping Part I of this series, let's first try using ActiveRecord's joins
method:
Accommodation.joins(:bookings)
In SQL this is translated to an INNER JOIN
:
SELECT "accommodations".* FROM "accommodations" INNER JOIN "bookings" ON "bookings"."id" = "accommodations"."booking_id"
The inner joined table
accommodations.id | accommodations.name | bookings.id | bookings.accommodation_id | bookings.check_in | bookings.check_out |
---|---|---|---|---|---|
1 | 'Beach House' | 1 | 1 | 20 Jun 2021 | 23 Jun 2021 |
1 | 'Beach House' | 1 | 1 | 07 Aug 2021 | 08 Aug 2021 |
2 | 'Charming House' | 2 | 2 | 22 May 2021 | 28 May 2021 |
4 | 'Le Petit Chalet' | 4 | 4 | 01 Apr 2021 | 02 Apr 2021 |
this returns the following accommodation entries:
Id | Name |
---|---|
1 | 'Beach House' |
1 | 'Beach House' |
2 | 'Charming House' |
4 | 'Le Petit Chalet' |
The accommodations 'Lisbon Flat' and 'Farm House' are excluded from the returned ActiveRecord objects since the inner join will only return the accommodations with a matching booking. This is not the desired result since we want to get the sum of all bookings per accommodation, even if that means that we might get accommodations with a zero-sum.
This is where a left outer join comes in handy.
The ActiveRecord left_joins
/ left_outer_joins
method
Now, instead of joins
, let's try rails left_joins
(you can also use its alias, the left_outer_joins
):
Accommodation.left_joins(:bookings)
produces the following SQL:
SELECT "accommodations".* FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."id" = "accommodations"."booking_id"
The left outer joined table
When accommodations and bookings are left joined, this is what happens:
accommodations.id | accommodations.name | bookings.id | bookings.accommodation_id | bookings.check_in | bookings.check_out |
---|---|---|---|---|---|
1 | 'Beach House' | 1 | 1 | 20 Jun 2021 | 23 Jun 2021 |
1 | 'Beach House' | 1 | 1 | 07 Aug 2021 | 08 Aug 2021 |
2 | 'Charming House' | 2 | 2 | 22 May 2021 | 28 May 2021 |
3 | 'Lisbon Flat' | null | null | null | null |
4 | 'Le Petit Chalet' | 4 | 4 | 01 Apr 2021 | 02 Apr 2021 |
5 | 'Farm House' | null | null | null | null |
There are a couple of things that stand out looking at this table. First, and just like in the inner joined table, the 'Beach House' is showing up twice since there are two different bookings for this accommodation, a June booking, and an August booking. But what differentiates the two join methods is that now we have two entries that were missing in the inner join, the 'Lisbon Flat' and the 'Farm House'. Note though, that the values related to attributes of the bookings table are 'null'. So though these two accommodations have no bookings data, they'll still be available to query.
Remember: The columns of the bookings table are now available in the database to query against, but they will not be sent back to the application and built into ActiveRecord objects.
So when we actually run the query, this is what ActiveRecord returns.
Accommodations
Id | Name |
---|---|
1 | 'Beach House' |
1 | 'Beach House' |
2 | 'Charming House' |
3 | 'Lisbon Flat' |
4 | 'Le Petit Chalet' |
5 | 'Farm House' |
Let's now go back to our query and use ActiveRecord's group
and count
methods to count the bookings grouped by accommodation name:
Accommodation.left_joins(:bookings).group(:name).count('bookings.id')
Here's the produced SQL:
SELECT COUNT("bookings"."id") AS count_bookings_id, "accommodations"."id" AS accommodations_id FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id" GROUP BY "accommodations"."name"
This will return the desired output, a hash with the accommodation names as keys and the bookings count as values. All accommodations are listed, each with its own bookings count, even the ones with zero bookings.
{"Beach House"=>2, "Charming House"=>1, "Lisbon Flat"=>0, "Le Petit Chalet"=>1, "Farm House"=>0}
Customizing joins with SQL strings
left_joins
and left_outer_joins
methods are only available from rails 5.2.3 onwards. If you're working with earlier versions of rails you'll have to customize the rails joins
method by passing an SQL string with the desired join method.
This means that,
Accommodation.left_joins(:bookings)
will have to be written as:
Accommodation.joins('LEFT OUTER JOIN bookings ON bookings.accommodation_id = acommodations.id")
Get all the accommodations with no bookings
Let's now say that we only want to get those accommodations that do not have any bookings.
We now know that left_joins
will include the accommodations that have no bookings but it will also include all the others that do. So how can we filter that list to return only the no-bookings accommodations? Looking at the left outer join table from the earlier exercise, we can say that what differentiates the two groups is that the accommodations with bookings will have an attribute bookings.id
with an integer value but the no-bookings accommodations will have a null
value for that same attribute.
So knowing that, we can use where
to get only the accommodations with a null bookings.id
.
Accommodation.left_joins(:bookings).where(bookings: { id: nil })
The SQL:
SELECT COUNT(*) FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id" WHERE "bookings"."id" IS NULL
And the returned output:
Id | Name |
---|---|
3 | 'Lisbon Flat' |
5 | 'Farm House' |
If you're on rails 6 you could write a more semantic version for this solution using the ActiveRecord's missing
method. This will produce exactly the same SQL and output as before:
Accommodation.where.missing(:bookings)
In part III of the series, I'll continue with some examples, this time explaining how we can combine multiple joins in the same query.
Top comments (0)