Understanding Active Record inner joins

rails

activerecord

sql

Mar, 2021

Understanding Active Record inner joins

Part I of the series - Understanding ActiveRecord joins - What is an inner join and how can it be used in ActiveRecord?

What is an inner join and when should it be used?

An inner join lets you combine two tables as long as they have matching values for a common attribute. It is useful when you want to query a table based on matching entries from a related table.

You've probably seen a Venn diagram similar to the cover image of this post and that illustrates this combination, where each circle is a table and their intersection represents the common attributes.

But what does this mean in practice? And how can we take advantage of the joined tables?

Let's use the following domain example to better illustrate this:

class Payment < ApplicationRecord
  belongs_to :booking
end
class Booking < ApplicationRecord
  has_many :payments

  enum status: %w[requested confirmed canceled]
end

Consider also that we have 3 payment entries, as well as 4 booking entries, in our database:

Payment.all
Id Amount Booking Id
1 450 1
2 100 2
3 770 3
Booking.all
Id Status Check In Check Out
1 'requested' 20 Jun 2021 23 Jun 2021
2 'requested' 07 Aug 2021 08 Aug 2021
3 'confirmed' 22 May 2021 28 May 2021
4 'canceled' 01 Apr 2021 02 Apr 2021

Inner Joins in belongs_to associations

Find all the payments of requested bookings

Let's say that we need to get all payments related to bookings with the status 'requested'.

We could solve this the ruby way, like:

Payment.all.select { |payment| payment.booking.requested? }

This works and reads well, but it is not optimal. Look at the queries triggered:

SELECT "payments".* FROM "payments"
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 3], ["LIMIT", 1]]

Sounds familiar? This is the famous n + 1 problem. We are querying payments once and then for each payment we query the related booking (passing the payment's booking_id to the where clause).

There's a more efficient way of getting the same list of payments running only a single query, and this is where the ActiveRecord method joins comes in.

The ActiveRecord joins method

The ActiveRecord joins method defaults to an inner join unless we customize it passing an SQL string to it (more on this later).

So joining payments with bookings,

Payment.joins(:booking)

produces the following SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id"

This statement says that we are selecting all the fields of the payments table - "payments".* - and joining booking entries - INNER JOIN "bookings" - that have an id that matches a payment's booking_id - ON "bookings"."id" = "payments"."booking_id".

The joined table

When payments and bookings are inner joined, this is what happens:

payments.id payments.amount payments.booking_id bookings.id bookings.status bookings.check_in bookings.check_out
1 450 1 1 'requested' 20 Jun 2021 23 Jun 2021
2 100 2 2 'requested' 07 Aug 2021 08 Aug 2021
3 770 3 3 'confirmed' 22 May 2021 28 May 2021

But note that, although the bookings' columns are now available to query, at the moment we are only selecting data from payments (SELECT "payments".* ).

So when we actually run the query, this is what is returned:

Id Amount Booking Id
1 450 1
2 100 2
3 770 3

It returns all the payments with a matching booking. Since all payments must belong to a booking, we get all Payment entries, a similar outcome to Payment.all, except that now we have booking data to query against.

So now we can add a where clause to the query, filtering payments by booking status:

Payment.joins(:booking).where(bookings: { status: 'requested' })

Syntax notes: the joins method uses the name of the association defined on the model. Since a payment belongs_to a booking, the :booking symbol is in the singular form. On the other hand, the where method uses the name of the table that we want to apply the filter to, in this case, that's bookings. Since we are joining two tables we need to tell ActiveRecord (and SQL) which table to apply the where clause to.

Produced SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id" WHERE "bookings"."status" = 0"

This returns our desired output, the two payments related to 'requested bookings':

Payments

Id Amount Booking Id
1 450 1
2 100 2

Inner Joins in has_many associations

So far, we have been looking at the simplest kind of association, the belongs_to.

Let's now add a new model Guest that has a has_many relationship with bookings. A guest can have many bookings and a booking belongs to a guest:

class Guest < ApplicationRecord
  has_many :bookings
end
class Booking < ApplicationRecord
  belongs_to :guest
end
Guest.all
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021

Find all guests with bookings

Similarly to the previous belongs_to exercise, we can inner join guests with bookings:

Guest.joins(:bookings)

Syntax notes: Contrarily to what happened in the belongs_to exercise, the :bookings symbol passed to joins is now plural, since a guest can have many bookings.

Produced SQL:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"

The joined table

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out
2 'Jane Williams' 3 2 22 May 2021 28 May 2021
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021

The returned table

Id Name
2 'Jane Williams'
3 'Dory Timothy'
3 'Dory Timothy'
5 'Willie Connelly'

Note that John Smith is not included in this list since this guest has no bookings. Remember that inner joins only returns matching entries. If there is no match on the joined table, then that entry will be omitted.

Note also that 'Dory Timothy' is showing up twice. That's because there are two bookings requested by this guest. This outcome is very common when joining a has_many association. The return table will return the same guest n times its bookings. If Jane Williams had 10 bookings, the returned table would include 10 Jane Williams entries.

If we'd like to return only unique guests, we can use distinct:

Guest.joins(:bookings).distinct

Produced SQL:

SELECT DISTINCT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"

The returned table is now

Id Name
2 'Jane Williams'
3 'Dory Timothy'
5 'Willie Connelly'

Inner Joins in has_many through associations

Find all the guests with pending payments

Let's bring back the Payment model to our domain, this time with a pending or confirmed status.

class Guest < ApplicationRecord
  has_many :bookings
  has_many :payments, through: :bookings
end
class Booking < ApplicationRecord
  has_many :payments
end
class Payment < ApplicationRecord
  belongs_to :booking

  emun: %w[pending confirmed]
end
Guest.all
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021
Payment.all
Id Amount Booking Id Status
1 450 1 'paid'
2 100 2 'pending'
3 770 3 'pending'
4 810 4 'paid'

To find the guests with pending payments, we need to go through the chain of associations: Guest -> Booking -> Payment. This is called a nested join.

Luckily, the ActiveRecord joins method allows us to pass nested associations:

Guest.joins(bookings: :payment)

So what we're saying is: 'Return all guests that have bookings with payments'. This produces two inner joins:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests"."id" INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id"

And results in the following joined table:

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out payments.id payments.amount payments.booking_id payments.status
2 'Jane Williams' 3 2 22 May 2021 28 May 2021 3 770 3 'pending'
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021 1 450 1 'paid'
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021 4 810 4 'paid'
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021 2 100 2 'pending'

The payments data is now available to query. So next, we filter by payment status using the where method:

Guest.joins(bookings: :payment).where(payments: { status: 'pending' })

And the final returned outcome:

Guests

Id Name
2 'Jane Williams'
5 'Willie Connelly'

Customizing joins with SQL strings

Instead of passing symbols that tell ActiveRecord which association to join (and that will be later translated to SQL), the joins method also accepts a SQL string directly as an argument. This has the advantage of providing extra flexibility in building more complex queries.

ActiveRecord joins in polymorphic associations

Adding a new polymorphic model Accommocation to our domain:

class Accommodation < ApplicationRecord
  has_many :bookings, as: :bookable
end
class Booking < ApplicationRecord
  belongs_to :bookable, polymorphic: true
end

In polymorphic associations if you try joins using symbols, you'll get an error:

Booking.joins(:bookable)

ActiveRecord::EagerLoadPolymorphicError (Cannot eagerly load the polymorphic association :bookable)

That is because we need to tell joins which bookable type we want to use since there can be many.

One of the ways we could solve this problem would be to pass an SQL string:

Bookings.joins("INNER JOIN accommodations ON accommodations.id = bookings.bookable_id AND bookings.bookable_type = 'Accommodation'")

Until rails 5 was released, if you wanted to run a left outer join, you'd also have to pass a custom SQL to the joins method. Since then, Rails supports left outer joins through the ActiveRecord methods left_joins and its alias left_outer_joins.

In part II of this series - Understanding ActiveRecord joins - I'll go through similar exercises to explain when and how we can use ActiveRecord's left_joins.