Understanding Active Record left outer joins

rails

activerecord

sql

Mar, 2021

Understanding Active Record left outer joins

Part II of the series - Understanding ActiveRecord joins - What is a left outer join and how can it be used in ActiveRecord?

What is a left outer join and when should it be used?

A left outer join (or just left join) is used to query a table based on matching and non-matching entries from a related table. Contrarily to an inner join, the left outer join will always return all the entries of the left table even if the join condition does not find any match on the right table.

Let's again follow along with a couple of exercises:

Here's our domain model:

class Accommodation < ApplicationRecord
  has_many :bookings
end
class Booking < ApplicationRecord
  belongs_to :accommodation
end
Accommodation.all
Id Name
1 'Beach House'
2 'Charming House'
3 'Lisbon Flat'
4 'Le Petit Chalet'
5 'Farm House'
Booking.all
Id Accommodation Id Check In Check Out
1 1 20 Jun 2021 23 Jun 2021
2 1 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 4 01 Apr 2021 02 Apr 2021

Get all accommodations with their bookings count

To return a list of accommodations we need to query the accommodations table. But we need to return that list with a calculation derived from a related table bookings. This calls for a join.

Recapping Part I of this series, let's first try using ActiveRecord's joins method:

Accommodation.joins(:bookings)

In SQL this is translated to an INNER JOIN:

SELECT "accommodations".* FROM "accommodations" INNER JOIN "bookings" ON "bookings"."id" = "accommodations"."booking_id"

The inner joined table

accommodations.id accommodations.name bookings.id bookings.accommodation_id bookings.check_in bookings.check_out
1 'Beach House' 1 1 20 Jun 2021 23 Jun 2021
1 'Beach House' 1 1 07 Aug 2021 08 Aug 2021
2 'Charming House' 2 2 22 May 2021 28 May 2021
4 'Le Petit Chalet' 4 4 01 Apr 2021 02 Apr 2021

this returns the following accommodation entries:

Id Name
1 'Beach House'
1 'Beach House'
2 'Charming House'
4 'Le Petit Chalet'

The accommodations 'Lisbon Flat' and 'Farm House' are excluded from the returned ActiveRecord objects since the inner join will only return the accommodations with a matching booking. This is not the desired result since we want to get the sum of all bookings per accommodation, even if that means that we might get accommodations with a zero-sum.

This is where a left outer join comes in handy.

The ActiveRecord left_joins/ left_outer_joins method

Now, instead of joins, let's try rails left_joins (you can also use its alias, the left_outer_joins):

Accommodation.left_joins(:bookings)

produces the following SQL:

SELECT "accommodations".* FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."id" = "accommodations"."booking_id"

The left outer joined table

When accommodations and bookings are left joined, this is what happens:

accommodations.id accommodations.name bookings.id bookings.accommodation_id bookings.check_in bookings.check_out
1 'Beach House' 1 1 20 Jun 2021 23 Jun 2021
1 'Beach House' 1 1 07 Aug 2021 08 Aug 2021
2 'Charming House' 2 2 22 May 2021 28 May 2021
3 'Lisbon Flat' null null null null
4 'Le Petit Chalet' 4 4 01 Apr 2021 02 Apr 2021
5 'Farm House' null null null null

There are a couple of things that stand out looking at this table. First, and just like in the inner joined table, the 'Beach House' is showing up twice since there are two different bookings for this accommodation, a June booking, and an August booking. But what differentiates the two join methods is that now we have two entries that were missing in the inner join, the 'Lisbon Flat' and the 'Farm House'. Note though, that the values related to attributes of the bookings table are 'null'. So though these two accommodations have no bookings data, they'll still be available to query.

Remember: The columns of the bookings table are now available in the database to query against, but they will not be sent back to the application and built into ActiveRecord objects.

So when we actually run the query, this is what ActiveRecord returns.

Accommodations

Id Name
1 'Beach House'
1 'Beach House'
2 'Charming House'
3 'Lisbon Flat'
4 'Le Petit Chalet'
5 'Farm House'

Let's now go back to our query and use ActiveRecord's group and count methods to count the bookings grouped by accommodation name:

Accommodation.left_joins(:bookings).group(:name).count('bookings.id')

Here's the produced SQL:

SELECT COUNT("bookings"."id") AS count_bookings_id, "accommodations"."id" AS accommodations_id FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id" GROUP BY "accommodations"."name"

This will return the desired output, a hash with the accommodation names as keys and the bookings count as values. All accommodations are listed, each with its own bookings count, even the ones with zero bookings.

{"Beach House"=>2, "Charming House"=>1, "Lisbon Flat"=>0, "Le Petit Chalet"=>1, "Farm House"=>0}

Customizing joins with SQL strings

left_joins and left_outer_joins methods are only available from rails 5.2.3 onwards. If you're working with earlier versions of rails you'll have to customize the rails joins method by passing an SQL string with the desired join method.

This means that,

Accommodation.left_joins(:bookings)

will have to be written as:

Accommodation.joins('LEFT OUTER JOIN bookings ON bookings.accommodation_id = acommodations.id")

Get all the accommodations with no bookings

Let's now say that we only want to get those accommodations that do not have any bookings.

We now know that left_joins will include the accommodations that have no bookings but it will also include all the others that do. So how can we filter that list to return only the no-bookings accommodations? Looking at the left outer join table from the earlier exercise, we can say that what differentiates the two groups is that the accommodations with bookings will have an attribute bookings.id with an integer value but the no-bookings accommodations will have a null value for that same attribute.

So knowing that, we can use where to get only the accommodations with a null bookings.id.

Accommodation.left_joins(:bookings).where(bookings: { id: nil })

The SQL:

SELECT COUNT(*) FROM "accommodations" LEFT OUTER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id" WHERE "bookings"."id" IS NULL

And the returned output:

Id Name
3 'Lisbon Flat'
5 'Farm House'

If you're on rails 6 you could write a more semantic version for this solution using the ActiveRecord's missing method. This will produce exactly the same SQL and output as before:

Accommodation.where.missing(:bookings)

In part III of the series, I'll continue with some examples, this time explaining how we can combine multiple joins in the same query.