Understanding multiple joins in Active Record

rails

activerecord

sql

Mar, 2021

Understanding multiple joins in Active Record

Part III of the series - Understanding ActiveRecord joins - How can we filter an ActiveRecord collection based on multiple associations?

In part I and part II of this series, we used mostly single joins and left_joins to filter an ActiveRecord collection based on a direct association.

How can we filter a collection based on multiple associations?

Let's start by looking at our domain models: Host, Accommodation, Booking, Payment, Rating, and Tag.

class Host < ApplicationRecord
  has_many :accommodations
end
class Accommodation < ApplicationRecord
  belongs_to :host
  has_many :bookings
  has_many :tags
end
class Booking < ApplicationRecord
  belongs_to :accommodation
  has_many :ratings
end
class Payment < ApplicationRecord
  belongs_to :booking
end
class Rating < ApplicationRecord
  belongs_to :booking
end
class Tag < ApplicationRecord
  belongs_to :accommodation
end
Host.all
Id Name
100 "Bennie Lubowitz"
101 "Soon Goyette"
102 "Aimee Douglas"
Accommodation.all
Id Name Host Id
121 "Charming House" 100
122 "Lisbon Flat" 101
123 "Le Petit Chalet" 102
124 "The Farm House" 102
125 "Beach House" 100
Booking.all
Id Accommodation Id Check In Check Out
62 121 20 Jun 2020 23 Jun 2020
63 122 07 Aug 2021 08 Aug 2021
64 123 22 May 2021 28 May 2021
65 124 01 Apr 2021 02 Apr 2021
Payment.all
Id Amount Booking Id
50 450 62
51 100 63
52 770 64
52 150 65
Rating.all
Id Rate Booking Id
25 4 62
26 3 64
27 5 63
Tag.all
Id Name Accommodations Id
10 "Pool 121
11 "Countryside" 121
12 "Riverview" 122

Multiple associations

Get all the bookings that have at least one payment and one rating

Looking at our models, we see that both payments and ratings are directly associated with bookings. The ActiveRecord method joins accepts multiple direct associations, separated by a comma. So in our case, that will be:

Booking.joins(:payments, :ratings)

This will produce the following SQL:

SELECT "bookings".* FROM "bookings" 
INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id" 
INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"

And return the following bookings:

Id Accommodation Id Check In Check Out
62 121 20 Jun 2020 23 Jun 2020
63 122 07 Aug 2021 08 Aug 2021
64 123 22 May 2021 28 May 2021

Note that although there is a payment for booking id 65, there is no rating for this same booking, so it will not be included in the returned collection. If we'd have no ratings at all, no bookings would be returned. This means that combining inner joins is cumulative. Using our example, we will only return bookings that have payments AND ratings.

It could be the case though, that you need to return bookings with payments that might or might not have an associated rating. The 'might or might not have' condition calls for a left_joins:

Booking.joins(:payments).left_joins(:ratings)

This will produce the following SQL:

SELECT "bookings".* FROM "bookings" 
INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id" 
LEFT OUTER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"

And return the following bookings:

Id Accommodation Id Check In Check Out
62 121 20 Jun 2020 23 Jun 2020
63 122 07 Aug 2021 08 Aug 2021
64 123 22 May 2021 28 May 2021
65 124 01 Apr 2021 02 Apr 2021

Now, all bookings with payments are included, even the booking id 65. This will return exactly the same ActiveRecord objects that Bookings.joins(:payments) returns. The difference is that you will have ratings data available to query.

For instance, we can now do:

Booking.joins(:payments).left_joins(:ratings).where(ratings: { id: nil  })

Which is the same as saying: get all the bookings that have payments but no ratings.

Nested joins

Get all hosts that have bookings for their accommodations

The premise here has a slight, but important, nuance from the first exercise. Before, both joined tables - payments and ratings - had a direct relationship with bookings.

But now we're dealing with nested associations - accommodations are directly associated with hosts but bookings are directly associated with accommodations, not to hosts.

So instead of passing two associations separated by a comma, we can pass the nested associations as a key-value pair:

Host.joins(accommodations: :bookings) 

Notice the two INNER JOINs in the SQL below:

SELECT "hosts".* FROM "hosts" 
INNER JOIN "accommodations" ON "accommodations"."host_id" = "hosts"."id" 
INNER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id"

We're first joining accommodations to hosts and then joining bookings to accommodations.

Let's have a look at the returned collection now:

Id Name
100 "Bennie Lubowitz"
101 "Soon Goyette"
102 "Aimee Douglas"

Although not all accommodations have bookings (the "Beach House" doesn't), all hosts have at least one accommodation that has bookings.

Multi-level Nested joins

Get all hosts that have accommodations, where those accommodations have bookings with ratings, and where those accommodations also have tags

This case is also a bit different from the previously nested joins we analyzed. This time we will have two subsets of nested joins happening, both in relation to accommodations.

We will need to travel from accommodations to ratings, through bookings, and also from accommodations to tags. In ActiveRecord, that would be:

Host.joins(accommodations: [{ bookings: :ratings }], :tags) 

Let's look at the SQL behind it to understand what's going on:

SELECT "hosts".* FROM "hosts" 
INNER JOIN "accommodations" ON "accommodations"."host_id" = "hosts"."id" 
INNER JOIN "bookings" ON "bookings"."accommodation_id" = "accommodations"."id"
INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"
INNER JOIN "tags" ON "tags"."accomodation_id" = "accommodations"."id"
  • There's a first nesting that starts with accommodations, gets all their matching bookings, and then gets all the ratings that match the joined bookings.

  • Then there's a second nesting that gets the accommodations and their matching tags.

Run these two nested joins together and we get the two hosts that meet all these requirements:

Id Name
100 "Bennie Lubowitz"
101 "Soon Goyette"

A few personal remarks and tips

Multiple joins can get complex very quickly. There a few tips that can help you make sense of what's going on at the SQL level:

Make use of the .to_sql method

ActiveRecord can make simple queries easier to read but that's not so true for complex queries. If you're not sure what is the SQL your ActiveRecord query is building, you can call .to_sql to get the returned SQL statement, similar to the SQL snippets I've added throughout these exercises.

Remember you can pass SQL strings to ActiveRecord

To avoid being lost in the ActiveRecord -> SQL translations, I usually pass SQL strings to the ActiveRecord query methods instead. I find it more explicit and flexible in complex queries.

Multiple joins can make your queries slow

My last advice is to always be aware of any potential performance issues when you're working with multiple joins. Joining tables can be a heavy job for both your database and ActiveRecord.

__

So far we've been using joins to filter data. In Part IV, the last one of this series, we'll see how to access the joined data through the returned ActiveRecord objects.