When belongs_to association in Rails goes weird

dferrazm profile image Daniel Ferraz ・5 min read

ActiveRecord is one of those many things in Rails that can be both good and bad depending on the perspective, the size of the project, how it's used, etc.

It's loved when it performs a really good job on abstracting away the specifics of the DB being used (like any good ORM should do), when it makes really easy to define and bootstrap a fully functional app with the domain classes and the persistence mechanism in place, when it helps keeping the data valid and consistent through validators, when it facilitates the migration and rollback of DB schema changes, etc.

It's hated when it makes really easy to tighly couple the application domain and logic to the persistence layer, when it faciliates the proliferation of God objects since it's so easy to put more and more behavior inside the record models and pass them around everywhere, when your domain classes become a big ball of mud knowing too much about everything and with hooks everyhwere, when it makes your tests slow, etc.

But one thing that it should strive, apart from the pros and cons, is to maintain consistency on its behavior and usage. Recently I noticed a weird behavior with the belongs_to association which is somewhat inconsistent with the other associations and hurts predictabiliy when using this association. In fact, it's such an easy-to-reproduce scenario that I don't know how I didn't notice that before, after years of developing with Rails.

First, when things work the way we expect

Attention: the following examples that I'll use are unlikely scenarios that are not expected to happen on most applications. In case it happens, it's most likely because there is a bug in place. We should anyway be prepared for it.

Suppose you have the following in your Rails application:

create_table :products do |t|
  t.string :title

create_table :reviews do |t|
  t.references :product
  t.text :comment

class Product < ApplicationRercord
  has_many :reviews

class Review < ApplicationRecord
  belongs_to :product

Suppose now that someone just removed the reviews.product_id column but forgot to remove the related has_many :reviews association from Product. The next time we hit product.reviews, what it should happen? If you're expecting an error to be raised, you got it right.

# => ActiveRecord::StatementInvalid ([db_specific_error_class]: no such column: reviews.product_id)

Moreover, suppose that, instead of removing just the FK column from reviews, the whole table is removed along with the model class. In this case, an error is still raised, although different, since the association's target class definition cannot be found.

# => NameError (uninitialized constant Product::Review)

These two behaviors help a lot on tracking issues in the app, pointing remaining pieces of code that were forgotten somehow in the process of removing a feature from the app. The next time these pieces of code are executed, a big error would pop up in the error monitoring tool and would hardly slip under the radar.

Then, things unexpected happen

Let's go back to the first scenario where the reviews.product_id column was removed. Now suppose that the assocation belongs_to :product in Review was not removed (very unlikely, but still). What would then happen the next time the we hit review.product? If you like consistency and guessed an error, you got that wrong, unfortunately.

review.product # => nil

That's right. It just silently returns nil. The same nil that would be returned if the product_id column were still there but empty.

It gets even more interesting in the scenario where the whole table/model products/Product is removed. What happens now with review.product? If you're smart and noticed where this going and guessed nil, you got it right.

Product.take # => NameError (uninitialized constant Product)

review.product # => nil

Super weird. By returning nil, we would never be able to spot these issues early. They would pass by silently and days could go by producing unexpected results until someone notices something is wrong. It also just does not make sense at all. At least for me. If the underlying DB column does not exist, why would the association still be returning nil? We could just easily pollute an entire model with fake associations without any consequences.

class Review < ApplicationRecord
  belongs_to :i_dont
  belongs_to :care
  belongs_to :about
  belongs_to :valid
  belongs_to :associations

%i[i_dont care about valid associations].map do |association|
  review.send association
# => [nil, nil, nil, nil, nil]

In my opinion this goes very much against the robustness of active_record itself and all its ways to keep data consistent and valid, acting as a mirrored version of the DB schema on steroids, to keep wrong and incorrect data out.

Where's the issue then?

Assuming active_record latest version: 6.0.1

Digging deeply into active_record's source code, there is this method find_target? in the BelongsToAssociation:

def find_target?
  !loaded? && foreign_key_present? && klass

Its sole purpose is to check whether the target association needs to be found and loaded given the current conditions. This method is overriden in this class, meaning that the other associations use a more abstract and slightly different version to check if the target association must be loaded.

From the code, the target belongs_to association is only loaded if it's not yet loaded AND if the foreign key is present AND if the association class is defined.

The problem is that foreign_key_present? only checks whether the attribute is present. For sake of simplification, it would be something like @foo_id.present?. It returns false either when the FK column exists but the value is empty or if the FK does not exist at all, since no verification to check whether the column exists is being made. The condition returns eagerly with false and the target association is never tried to be loaded, returning nil in the very end. That's why also it returns nil even when class definitions are missing, since the klass part of the condition is never evaluated and the error of uninitialized constant Foo is never raised.

By simply switching the condition to !loaded? && klass && foreign_key_present?, we partially solve the problem.

I opened a WIP pull request to trigger the discussion and check whether this behavior is intentionally and by design, since after I fixed this issue locally, a bunch of unit tests failed. I realized then that there are a lot of tests relying on dynamic belongs_to associations lacking the corresponding FKs in the test suite.

I couldn't find no other information across the web depicting this issue. Let me know in the comments if you have thoughts or more information with reasons why this is the way it is.


Editor guide