Understanding ActiveRecord inner joins

activerecord

rails

sql

Mar, 2021

Understanding ActiveRecord inner joins

Part I of the series - Understanding ActiveRecord joins - What is an inner join and when should it be used?

What is an inner join and when should it be used?

An inner join is used to query a table based on matching entries from a related table.

But what does this mean in practice?

Let's use the following domain example to better illustrate this:

class Payment < ApplicationRecord
  belongs_to :booking
end
class Booking < ApplicationRecord
  has_many :payments

  enum status: %w[requested confirmed canceled checked_in checked_out]
end

Consider also that we have 3 payment entries on our database and 4 booking entries.

Payment.all
Id Amount Booking Id
1 450 1
2 100 2
3 770 3
Booking.all
Id Status Check In Check Out
1 'requested' 20 Jun 2021 23 Jun 2021
2 'requested' 07 Aug 2021 08 Aug 2021
3 'confirmed' 22 May 2021 28 May 2021
4 'canceled' 01 Apr 2021 02 Apr 2021

Inner Joins in belongs_to associations

Find all the payments of requested bookings

Let's say that we need to get all payments related to bookings with the status 'requested'.

We could solve this the ruby way, like:

Payment.all.select { |payment| payment.booking.requested? }

This works and reads well, but it is not optimal. Look at the queries triggered:

SELECT "payments".* FROM "payments"
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 

Sounds familiar? This is the famous n + 1 problem. We are querying payments once and then for each payment we query the related booking (using the payment's booking_id in the where clause).

There's a more efficient way of doing the same on a single query, this is where joins comes in.

The ActiveRecord joins method

The ActiveRecord joins method defaults to an inner join unless we customize it passing an SQL string to it (more on this later).

So joining with payments with bookings,

Payment.joins(:booking)

produces the following SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id"

This statement says that we are selecting all the fields of the payments table - "payments".* - and joining the booking entries - INNER JOIN "bookings" - whose id match a payment's booking_id - ON "bookings"."id" = "payments"."booking_id".

The joined table

When payments and bookings are inner joined, this is what happens:

payments.id payments.amount payments.booking_id bookings.id bookings.status bookings.check_in bookings.check_out
1 450 1 1 'requested' 20 Jun 2021 23 Jun 2021
2 100 2 2 'requested' 07 Aug 2021 08 Aug 2021
3 770 3 3 'confirmed' 22 May 2021 28 May 2021

Note that: The columns of the bookings table are now available in the database to query against, but they will not be sent back to the application and built into ActiveRecord objects.

So when we actually run the query, this is what ActiveRecord returns.

Payments

Id Amount Booking Id
1 450 1
2 100 2
3 770 3

It returns all the payments with a matching booking. Since all payments must belong to a booking, we get all Payment entries, a similar outcome to Payment.all, except that now we have booking data to query against.

So now we can add a where clause to the query, filtering payments by booking status:

Payment.joins(:booking).where(bookings: { status: 'requested' })

Syntax notes: the joins method uses the name of the association defined on the model. Since a payment belongs_to a booking, the :booking symbol is in singular form. On the other hand, the where method uses the name of the table that we want to apply the filter to, in this case, that's bookings. Since we are joining two tables we need to tell ActiveRecord (and SQL) which table to apply the where clause to.

Produced SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id" WHERE "bookings"."status" = 0"

This returns our desired output, the two payments related to 'requested bookings':

Payments

Id Amount Booking Id
1 450 1
2 100 2

Inner Joins in has_many associations

So far, we have been looking at the simplest kind of association, the belongs_to.

Let's now add a new model Guest that has a has_many relationship with bookings. A guest can have many bookings and a booking belongs to a guest:

class Guest < ApplicationRecord
  has_many :bookings
end
class Booking < ApplicationRecord
  belongs_to :guest
end
Guest.all
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021

Find all guests with bookings

Similarly to the previous belongs_to exercise, we can inner join guests with bookings:

Guest.joins(:bookings)

Syntax notes: Contrarily to what happened in the belongs_to exercise, the :bookings symbol passed to joins is now plural, since a guest can have many bookings.

Produced SQL:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"

The joined table

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out
2 'Jane Williams' 3 2 22 May 2021 28 May 2021
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021

The returned table

Id Name
2 'Jane Williams'
3 'Dory Timothy'
3 'Dory Timothy'
5 'Willie Connelly'

Note that John Smith is not included in this list since this guest has no bookings. Remember that inner joins only returns matching entries. If there is no match on the joined table, then that entry will be omitted.

Note also that 'Dory Timothy' is showing up twice. That's because there are two bookings requested by this guest. This outcome is very common when joining a has_many association. The return table will return the same guest n times its bookings. If Jane Williams had 10 bookings, the returned table would include 10 Jane Williams entries.

If we'd like to return only unique guests, we can use distinct:

Guest.joins(:bookings).distinct

Produced SQL:

SELECT DISTINCT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"

The returned table is now

Id Name
2 'Jane Williams'
3 'Dory Timothy'
5 'Willie Connelly'

Inner Joins in has_many through associations

Find all the guests with pending payments

Let's bring back the Payment model to our domain, this time with a pending or confirmed status.

class Guest < ApplicationRecord
  has_many :bookings
  has_many :payments, through: :bookings
end
class Booking < ApplicationRecord
  has_many :payments
end
class Payment < ApplicationRecord
  belongs_to :booking

  emun: %w[pending confirmed]
end
Guest.all
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021
Payment.all
Id Amount Booking Id Status
1 450 1 'paid'
2 100 2 'pending'
3 770 3 'pending'
4 810 4 'paid'

To find the guests with pending payments, we need to go through the chain of associations: Guest -> Booking -> Payment. This is called a nested join.

Luckily, the ActiveRecord joins method allows us to pass nested associations:

Guest.joins(bookings: :payment)

So what we're saying is: 'Return all guests that have bookings with payments'. This produces two inner joins:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests"."id" INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id"

And results in the following joined table:

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out payments.id payments.amount payments.booking_id payments.status
2 'Jane Williams' 3 2 22 May 2021 28 May 2021 3 770 3 'pending'
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021 1 450 1 'paid'
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021 4 810 4 'paid'
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021 2 100 2 'pending'

The payments data is now available to query so next, we filter by payment status using the where method:

Guest.joins(bookings: :payment).where(payments: { status: 'pending' })

And the final returned outcome:

Guests

Id Name
2 'Jane Williams'
5 'Willie Connelly'

Customizing joins with SQL strings

Instead of passing symbols that tell ActiveRecord which association to join (and that will be later translated to SQL), the joins method also accepts a SQL string directly as an argument. This has the advantage of providing extra flexibility in building more complex queries.

ActiveRecord joins in polymorphic associations

Adding a new polymorphic model Accommocation to our domain:

class Accommodation < ApplicationRecord
  has_many :bookings, as: :bookable
end
class Booking < ApplicationRecord
  belongs_to :bookable, polymorphic: true
end

In polymorphic associations if you try joins using symbols, you'll get an error:

Booking.joins(:bookable)

ActiveRecord::EagerLoadPolymorphicError (Cannot eagerly load the polymorphic association :bookable)

That is because we need to tell joins which bookable type we want to use since there can be many.

One of the ways we could solve this problem would be to pass an SQL string:

Bookings.joins("INNER JOIN accommodations ON accommodations.id = bookings.bookable_id AND bookings.bookable_type = 'Accommodation'")

Until rails 5 was released, if you wanted to run a left outer join, you'd also have to pass a custom SQL to the joins method. Since then, rails supports left outer joins through the ActiveRecord methods left_joins and its alias left_outer_joins.

In part II of this series - Understanding ActiveRecord joins - I'll go through similar exercises to explain when and how we can use ActiveRecord's left_joins.

Subscribe below to get future blog posts

No spam, no ads. Unsubscribe any time.