HABTM, has_many through, STI, and polymorphic associations in Rails

#rails #activerecord #associations #database

Once you're past belongs_to and has_many, Rails gives you four more modeling tools that confuse a lot of people in interviews: has_and_belongs_to_many, has_many :through, single table inheritance, and polymorphic associations. (Single table inheritance is an inheritance pattern rather than an association, but it gets asked about in the same breath.) They sound advanced, but each one solves a specific, concrete problem. The trick is knowing which problem.

I'll go through all four with real tables and real code, then end on the one comparison interviewers love to ask: why pick has_many :through over HABTM.

has_and_belongs_to_many (HABTM)

This is the simplest way to model a many-to-many relationship. Say a user can be on many projects, and a project can have many users. You set it up on both sides:

class User < ApplicationRecord
  has_and_belongs_to_many :projects
end

class Project < ApplicationRecord
  has_and_belongs_to_many :users
end

Behind the scenes there's a join table that holds nothing but the two foreign keys. By convention Rails wants it named after both tables in alphabetical order, so projects comes before users:

create_table :projects_users, id: false do |t|
  t.belongs_to :project
  t.belongs_to :user
end

Notice id: false. The join table has no primary key and, more importantly, no model. You never write class ProjectsUser. That's the whole point of HABTM: it's lightweight. You get user.projects and project.users and that's it.
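If the naming convention is hard to remember, here's a rough sketch of it in plain Ruby. The helper habtm_join_table_name is my own invention, not a Rails method, and it only approximates the default: it sorts the two table names and joins them with an underscore, ignoring the extra handling Rails applies when both names share a prefix.

```ruby
# Hypothetical helper, not part of Rails: approximates the default
# HABTM join table name by sorting the two table names and joining them.
def habtm_join_table_name(table_a, table_b)
  [table_a.to_s, table_b.to_s].sort.join("_")
end

puts habtm_join_table_name(:users, :projects)      # projects_users
puts habtm_join_table_name(:papers, :paper_boxes)  # paper_boxes_papers
```

The second call shows why "alphabetical" can surprise you: the underscore sorts before the letter s, so paper_boxes comes first. If the default name isn't what you want, the join_table: option on has_and_belongs_to_many lets you name the table explicitly.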

It's also where HABTM runs out of room. There's no model, so there's nowhere to put a role, a joined_at timestamp, a validation, or a callback. The join is just a link and stays a link.

has_many :through

This is the grown-up version of the same many-to-many. Instead of a bare join table, you create a real model for the relationship and route through it:

class User < ApplicationRecord
  has_many :memberships
  has_many :projects, through: :memberships
end

class Project < ApplicationRecord
  has_many :memberships
  has_many :users, through: :memberships
end

class Membership < ApplicationRecord
  belongs_to :user
  belongs_to :project
end

The migration looks like a normal table because it is one:

create_table :memberships do |t|
  t.belongs_to :user, null: false, foreign_key: true
  t.belongs_to :project, null: false, foreign_key: true
  t.string :role, default: "member"
  t.datetime :joined_at
  t.timestamps
end

Now the relationship can carry data. A membership has a role, a joined date, timestamps, validations, whatever you need. You can ask user.memberships.where(role: "admin") or add validates :role, inclusion: { in: %w[member admin owner] } on the Membership model. None of that is possible with a plain HABTM join.

You still get the convenient shortcut: user.projects skips straight past memberships to the projects on the other side. You just also have the join model available when you need it.
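To make the "route through it" idea concrete, here's a small plain-Ruby sketch of what the two lookups amount to. It is not ActiveRecord: the arrays of hashes stand in for the tables, and the data plus the helper names memberships_for and projects_for are made up for illustration.

```ruby
# Plain-Ruby stand-ins for the projects and memberships tables (made-up data).
PROJECTS = [
  { id: 10, title: "Compiler" },
  { id: 20, title: "Docs" }
]
MEMBERSHIPS = [
  { user_id: 1, project_id: 10, role: "admin" },
  { user_id: 1, project_id: 20, role: "member" },
  { user_id: 2, project_id: 10, role: "member" }
]

# Roughly user.memberships: the join rows themselves, optionally filtered by role.
def memberships_for(user_id, role: nil)
  rows = MEMBERSHIPS.select { |m| m[:user_id] == user_id }
  role ? rows.select { |m| m[:role] == role } : rows
end

# Roughly user.projects: hop across the join rows to the projects on the far side.
def projects_for(user_id, role: nil)
  project_ids = memberships_for(user_id, role: role).map { |m| m[:project_id] }
  PROJECTS.select { |proj| project_ids.include?(proj[:id]) }
end

p projects_for(1).map { |proj| proj[:title] }                 # ["Compiler", "Docs"]
p memberships_for(1, role: "admin").length                    # 1
p projects_for(1, role: "admin").map { |proj| proj[:title] }  # ["Compiler"]
```

The role filter only works because each join row is a record with its own attributes. With a bare pair of ids there would be nothing to filter on.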

Single table inheritance (STI)

STI is for when several models are variations on a theme and share most of their columns. Instead of one table per model, they all live in one table, and Rails uses a type column to remember which class each row belongs to.

class User < ApplicationRecord
end

class Admin < User
end

class Customer < User
end

There's no separate admins or customers table. Everything sits in users, and you just need that type column:

add_column :users, :type, :string

When you call Admin.create(email: "a@example.com"), Rails writes a row to users with type set to "Admin". When you later run Admin.all, it automatically adds WHERE type = 'Admin' for you (widened to cover any subclasses of Admin, if you have them). User.all still returns everyone, and each record comes back as the right subclass, so an admin row gives you back an Admin instance with all its methods.
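Here's a plain-Ruby sketch of the mechanism, assuming nothing but the type column described above. These are not ActiveRecord classes: the methods instantiate and all(rows), and the USER_ROWS data, are simplified stand-ins I made up to show how a string in a column turns into the right class.

```ruby
# Plain-Ruby stand-ins for the model classes (not ActiveRecord).
class User
  attr_reader :email

  def initialize(email)
    @email = email
  end

  # Build an instance of whichever class the row's type column names.
  # A row with no type falls back to the base class.
  def self.instantiate(row)
    Object.const_get(row[:type] || name).new(row[:email])
  end

  # Records for this class and its subclasses, each as the right Ruby class.
  def self.all(rows)
    rows.map { |row| User.instantiate(row) }.select { |rec| rec.is_a?(self) }
  end
end

class Admin < User
  def can_delete_users?
    true
  end
end

class Customer < User; end

# Hypothetical contents of the shared users table.
USER_ROWS = [
  { type: "Admin",    email: "a@example.com" },
  { type: "Customer", email: "c@example.com" },
  { type: nil,        email: "plain@example.com" }
]

p User.all(USER_ROWS).map(&:class)   # [Admin, Customer, User]
p Admin.all(USER_ROWS).map(&:email)  # ["a@example.com"]
```

The real thing filters in SQL rather than in Ruby, but the shape is the same: one table, one string column, and a class lookup when rows are loaded.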

STI is great when the subclasses are genuinely alike and differ mostly in behavior. It gets ugly fast when they don't. If Admin needs five columns that Customer never uses, every customer row carries five NULLs forever, and the table turns into a junk drawer. The rule I use: STI when subclasses share almost all their columns, separate tables when they diverge.

Polymorphic associations

A polymorphic association lets one model belong to more than one kind of parent, without a separate foreign key for each one. The classic example is comments. You want to comment on a post, and also on a photo, and maybe later on a video. You don't want post_id, photo_id, and video_id columns sitting mostly empty on the comments table.

Instead you store two columns: one for the parent's id, and one for the parent's type.

class Comment < ApplicationRecord
  belongs_to :commentable, polymorphic: true
end

class Post < ApplicationRecord
  has_many :comments, as: :commentable
end

class Photo < ApplicationRecord
  has_many :comments, as: :commentable
end

The migration uses references with polymorphic: true, which creates both columns at once:

create_table :comments do |t|
  t.text :body
  t.references :commentable, polymorphic: true, null: false
  t.timestamps
end

That gives you commentable_type (a string like "Post" or "Photo") and commentable_id. When you save a comment on a post, Rails stores "Post" and the post's id. When you read comment.commentable, it uses the type to know which table to look in. You can comment on any model that declares has_many :comments, as: :commentable, and adding a new commentable type later means zero schema changes.
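The lookup itself is simple enough to sketch in plain Ruby. This is not ActiveRecord: Record is a made-up in-memory base class standing in for a table, and the commentable method below is my simplified version of the idea, which is to turn the type string into a class and ask that class for the id.

```ruby
# Made-up in-memory stand-in for a table-backed model (not ActiveRecord).
class Record
  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  # Each subclass gets its own row store, like its own table.
  def self.rows
    @rows ||= {}
  end

  def self.create(id, title)
    rows[id] = new(id, title)
  end

  def self.find(id)
    rows.fetch(id)
  end
end

class Post < Record; end
class Photo < Record; end

Comment = Struct.new(:body, :commentable_type, :commentable_id) do
  # Use the stored type to pick the class, then look up the stored id there.
  def commentable
    Object.const_get(commentable_type).find(commentable_id)
  end
end

Post.create(1, "Intro post")
Photo.create(1, "Sunset")

on_post  = Comment.new("Nice write-up", "Post", 1)
on_photo = Comment.new("Great shot", "Photo", 1)

puts on_post.commentable.title   # Intro post
puts on_photo.commentable.title  # Sunset
```

Both comments store the id 1, which is exactly why the type column has to exist: the id alone can't tell you whether it means the post or the photo.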

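One thing to watch: a polymorphic association can't have a normal database foreign key constraint, because commentable_id might point at posts or at photos and the database has no way to know which. You give up that bit of referential integrity in exchange for the flexibility, so cleaning up comments when a parent is deleted is on you (dependent: :destroy on the has_many side is the usual answer).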
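Without a constraint, dangling references are something you have to look for yourself. Here's a plain-Ruby sketch of that check. The data is invented and orphaned_refs is a hypothetical helper, not a Rails feature; in a real app you'd run the equivalent query against the database.

```ruby
require "set"

# Hypothetical snapshot: the ids that currently exist in each parent table.
EXISTING_IDS = {
  "Post"  => Set[1, 2],
  "Photo" => Set[1]
}

# Hypothetical comment rows as [commentable_type, commentable_id] pairs.
COMMENT_REFS = [
  ["Post", 1],
  ["Post", 3],   # post 3 was deleted; nothing stopped this row from surviving
  ["Photo", 1]
]

# A reference is orphaned when no row with that id exists in the named table.
def orphaned_refs(refs, existing)
  refs.reject { |type, id| existing.fetch(type, Set.new).include?(id) }
end

p orphaned_refs(COMMENT_REFS, EXISTING_IDS)  # [["Post", 3]]
```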

So when do you pick has_many :through over HABTM?

This is the question that comes up most, and the answer is short: the moment the relationship itself needs to hold anything.

HABTM gives you a link and nothing else. The second you need a role on the membership, or a timestamp for when someone joined, or a validation that stops duplicates, or a callback, or even just the ability to query the join directly, HABTM has no place to put it. There's no model. You'd have to migrate to has_many :through anyway, and migrating later is more painful than starting there.

So in practice I almost always reach for has_many :through. HABTM is fine for a pure tag-style link that you're certain will never grow attributes, like connecting articles to categories where the connection is just a connection. But "never" is a strong word, and most join tables sprout a column eventually. Starting with a join model costs you one extra file today and saves you a migration and a refactor down the line.

If you can only remember one line for the interview: use HABTM when the relationship is just a link, and has_many :through when the relationship is a thing in its own right.