How to access joined data with ActiveRecord

rails

activerecord

sql

Apr, 2021

How to access joined data with ActiveRecord

Part IV of the series - Understanding ActiveRecord joins - How can we access data from joined associations?

So far we've been using joins to support filtering based on related associations. But what if we not only want to filter but also access and use data from those associations?

Let's bring back the domain models we've been using during this series. This time Booking and Rating will be enough to illustrate some examples:

class Booking < ApplicationRecord
  has_many :ratings
end
class Rating < ApplicationRecord
  belongs_to :booking
end
Booking.all
Id Check In Check Out
62 20 Jun 2020 23 Jun 2020
63 07 Aug 2021 08 Aug 2021
64 22 May 2021 28 May 2021
65 01 Apr 2021 02 Apr 2021
Rating.all
Id Rate Comments Booking Id
25 4 'Amazing' 62
26 3 'Coming back soon' 64
27 5 'friendly staff' 63

pluck vs select

Get a list of bookings that have been rated. The list should contain booking ids with their corresponding rating comments.

A simple way to return this list could be to join ratings to bookings and pluck the id values from bookings and the corresponding comments values from ratings. Remember that we can pass an SQL string to explicitly say which <table>.<column> we'd like to pluck:

Booking.joins(:ratings).pluck('bookings.id', 'ratings.comments')

In SQL terms, this means that we're inner joining ratings and selecting only the two columns that we need, id from bookings table and the comments from the ratings table.

SELECT "bookings"."id", "ratings"."comments" 
FROM "bookings" 
INNER JOIN "ratings" 
ON "ratings"."booking_id" = "bookings"."id"

Since we're using pluck, the output is an array of arrays with the id, comments combinations:

[[62, 'amazing'], [63, 'friendly staff'], [64,  'Coming back soon']]

If we'd like to transform this into an index/dictionary structure that is easier to read and consult we can use the .to_h method:

Booking.joins(:ratings).pluck('bookings.id', 'ratings.comments').to_h

There, much better:

{ 62 => 'Amazing',
  63 => 'friendly staff',
  64 => 'Coming back soon' }

So pluck returns the data prepared for us in a ruby array object. If we'd rather return an ActiveRecord collection, we can solve this exercise in a similar way using select instead:

Booking.joins(:ratings).select('bookings.id', 'ratings.comments')

This will translate in exactly the same SQL as before:

SELECT "bookings"."id", "ratings"."comments" 
FROM "bookings" 
INNER JOIN "ratings" 
ON "ratings"."booking_id" = "bookings"."id"

But now, instead of an array we're returning ActiveRecord objects:

 [#<Booking:0x00007f86084a4530 id: 62>,
 #<Booking:0x00007f86084a4418 id: 63>,
 #<Booking:0x00007f86084a4288 id: 64>]

Looking at this output, it looks like we've lost the comments data. But notice what happens when we ask one of these objects what attributes they have:

bookings = Booking.joins(:ratings).select('bookings.id', 'ratings.comments')

bookings.first.attributes

=> {"id"=>62, "comments"=>"Amazing"}

What's going on here? Activerecord's select is basically making all attributes that we've selected into ActiveRecord instance methods.

So if we do:

bookings = Booking.joins(:ratings).select('bookings.id', 'ratings.comments')

bookings.first.comments

=> "Amazing"

All this without triggering any extra queries.

So to get the list that we want, we can now iterate on each booking and print the available attributes:

Booking.joins(:ratings).select('bookings.id', 'ratings.comments').each do |booking|
   puts booking.id
   puts booking.comments
end

Which will print:

62
"Amazing"
63
"friendly staff"
64
"Coming back soon"

Be aware of n + 1 queries

It's important to mention that joins per se does not avoid n + 1 queries. We've been managing to avoid them so far because we're asking SQL to join and select the data we need in a single query.

Watch what happens if you don't use the select method:

Booking.joins(:ratings).each do |booking|
   puts booking.id
   puts booking.ratings.map(&:comments)
end

Note that now comments is no longer a booking instance method. That means that for each booking we'll have to get its ratings and from there add another iteration to get each rating's comments.

  Booking Load (3.0ms)  SELECT "bookings".* FROM "bookings" INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"                                                                                      
  🤯 Rating Load (29.8ms)  SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1  [["booking_id", 63]]                                                                                       
  🤯 Rating Load (0.3ms)  SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1  [["booking_id", 64]]                  
  🤯 Rating Load (2.0ms)  SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1  [["booking_id", 65]]                                                                                    

eager_load

If you don't want to select attributes prior to the iteration, you can eager_load the ratings data. What this will not do, however, is to make comments available as a booking instance method. So we'll have to get comments through the ratings:

Booking.joins(:ratings).eager_load(:ratings).each do |booking|
   puts booking.id
   puts booking.ratings.map(&:comments)
end

In SQL this will be:

 SELECT "bookings"."id" AS t0_r0, "bookings"."guest_id" AS t0_r1, "bookings"."accommodation_id" AS t0_r2, "bookings"."check_in" AS t0_r3, "bookings"."check_out" AS t0_r4, "bookings"."status" AS t0_r5, "bookings"."total_guests" AS t0_r6, "bookings"."created_at" AS t0_r7, "bookings"."updated_at" AS t0_r8, "ratings"."id" AS t1_r0, "ratings"."rate" AS t1_r1, "ratings"."comments" AS t1_r2, "ratings"."booking_id" AS t1_r3, "ratings"."created_at" AS t1_r4, "ratings"."updated_at" AS t1_r5 FROM "bookings" INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"

Note that now you'll have all booking and all rating attributes available which can make the query heavier in some cases.

You can also use eager_load without the joins which will default into a left outer join:

Booking.eager_load(:ratings).each do |booking|
   puts booking.id
   puts booking.ratings.map(&:comments) if booking.ratings.present?
end

Here's the SQL:

SELECT "bookings"."id" AS t0_r0, "bookings"."guest_id" AS t0_r1, "bookings"."accommodation_id" AS t0_r2, "bookings"."check_in" AS t0_r3, "bookings"."check_out" AS t0_r4, "bookings"."status" AS t0_r5, "bookings"."total_guests" AS t0_r6, "bookings"."created_at" AS t0_r7, "bookings"."updated_at" AS t0_r8, "ratings"."id" AS t1_r0, "ratings"."rate" AS t1_r1, "ratings"."comments" AS t1_r2, "ratings"."booking_id" AS t1_r3, "ratings"."created_at" AS t1_r4, "ratings"."updated_at" AS t1_r5 FROM "bookings" LEFT OUTER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"

Conclusion

There are several ways you can access data from joined associations:

  • You can use pluck if returning a simple ruby object is enough for you
  • If you'd like to get the benefits of returning an ActiveRecord collection use select instead
  • If you want to work with all the model and the joined association(s) attributes, you can use eager_load
  • Don't forget to keep a close eye on performance. While pluck and select can reduce the load time of your queries, combining multiple joins in a single query and making all the attributes accessible with eager_load can easily bloat them. In that case, consider breaking up your queries either using a preload strategy or writing separate queries that support each other.

That's it! Hope this 4 part series on understanding ActiveRecord joins was useful! Feel free to write me if you have any doubts or suggestions. You can also subscribe to my newsletter below to be updated on new posts.

Subscribe below to get future blog posts

No spam, no ads. Unsubscribe any time.