You've Tuned the Queries and It's Still Slow — Now Change the Data Model

#rails #performance #database #postgres

Rails Performance: Lessons from Production — #9

The first eight posts optimized without touching the schema — fixing N+1, adding indexes, caching, moving work to the background. But sometimes you tune the query to its limit and it's still slow — because the root cause is the schema design itself: you normalized the data so cleanly that every query has to JOIN a pile of tables and recompute a derived value. This post takes a different axis: changing the data model. The core idea is the denormalization trade-off. Same example throughout (Courier has_many :shipments).

💥 Indexes are added, N+1 is fixed, the couriers page is still slow

A couriers list page. Each row shows "how many shipments this courier has" + "the status of the latest one." You did everything right:

@couriers = Courier.includes(:shipments).limit(50)
# in the view: courier.shipments.count, courier.shipments.last.status

N+1 is fixed (includes), shipments.courier_id is indexed. But the page is still slow — the APM shows every row running a COUNT and finding "the latest one."

The thing is: "shipment count" and "latest status" are values that, every page load, for every courier, get recomputed from millions of shipment rows. This isn't a query-style problem — your data model is forcing you to recompute derived values every time.

Query tricks have hit their ceiling. The next step is to change the data model: turn "compute every time" into "compute once and store it." That's denormalization.

🧮 Move 1: counter_cache — stop COUNTing the child table every time

The most common, built-in denormalization. Store "the parent's count" on the model:

class Shipment < ApplicationRecord
  belongs_to :courier, counter_cache: true   # adds a shipments_count column to couriers
end

Rails automatically bumps couriers.shipments_count ±1 on every shipment create/destroy. Reading it:

courier.shipments_count   # reads a column, zero COUNT queries

"Each courier runs its own SELECT COUNT(*)" becomes "read a ready-made column."

There's one cost you must know about: counter_cache only updates on ActiveRecord create / destroy. Use update_all, delete_all, import, or raw SQL and it won't move — the number drifts. When it drifts, fix it with Courier.reset_counters(courier.id, :shipments) (pass that courier's id).

Concurrency, though, you don't have to worry about — it's safe. counter_cache doesn't "read it out, +1, write it back" — it lets the DB do the increment:

UPDATE couriers SET shipments_count = shipments_count + 1 WHERE id = 7

Why it matters: with "read then add" (count = count + 1 computed in Ruby), two requests each creating a shipment at the same time could both read 100, both compute 101, both write 101 → one increment lost. But SET count = count + 1 is a single atomic operation in the DB; two concurrent ones get serialized (100 → 101 → 102) and nothing is miscounted.

📦 Move 2: store the "computed value" redundantly

counter_cache is a special case. The general form of denormalization is: any "frequently needed and expensive" derived value — precompute it and store it on the parent.

"Latest shipment status" is expensive to fetch every time (shipments.order(:created_at).last.status) → store a copy on the courier:

# couriers gets a latest_shipment_status column
class Shipment < ApplicationRecord
  belongs_to :courier

  # after a new shipment is saved, sync its status onto the courier's redundant column
  after_create_commit do
    courier.update_column(:latest_shipment_status, status)
  end
end

Breaking that callback down:

after_create_commit: an ActiveRecord callback that fires only after a new record is created and the transaction commits (the _commit variant guarantees it's really persisted, not mid-transaction).
courier.update_column(:latest_shipment_status, status): write this shipment's status into the courier's latest_shipment_status column (the redundant copy). update_column issues a single SQL UPDATE on one column, skipping validations/callbacks and not touching updated_at — right for "just sync a redundant value."

In plain terms: every time a new shipment is saved, copy its status onto the courier's "latest status." Reading it is just courier.latest_shipment_status — one column, no JOIN, no sort.

⚠️ But don't rush to denormalize — for WHERE courier_id = ? ORDER BY created_at DESC LIMIT 1, first add a (courier_id, created_at) composite index and each query becomes fast (the index is already sorted; grab one row). If an index solves it, don't touch the data model.

So when is redundancy worth it? ① Even if each query is fast, a list of 50 couriers is still 50 round-trips — too many at high volume/traffic; ② "the latest one per group" is genuinely hard to de-N+1 cleanly — includes loads all shipments into memory to pick from, and doing it in one SQL statement needs a window function / lateral join (not native in Rails, raw SQL). When the index isn't enough, or the query is too awkward to maintain, a single redundant column is the cheapest fix.

This is the key technique for "read-heavy, hot-data" products like stock or e-commerce: things like price change %, volume, total sales — expensive aggregates that must display in real time — are often stored redundantly, or even:

materialized view (Postgres CREATE MATERIALIZED VIEW): "materialize" one expensive aggregate query into a physical table, refreshed on a schedule (REFRESH). Good for "reports / leaderboards that are expensive but can tolerate a few minutes stale" (in Rails, the scenic gem).

⚖️ The core cost of denormalization: consistency

Denormalization isn't free — you stored an extra copy, and that copy can drift from the truth. This is the fundamental trade-off against normalization:

Normalized: data has a single source of truth, always correct, but queries JOIN/recompute (slow reads).
Denormalized: a redundant copy makes reads fast, but you're now responsible for keeping that copy in sync (writes get complex, and it can be inconsistent).

There are three ways to maintain consistency, from synchronous to asynchronous:

callback / touch (synchronous): update the redundant column inline on write (the after_create_commit above is exactly this). Simple, but easy to let get out of hand: callbacks that run queries, touch the parent, or enqueue jobs add hidden cost to every save (buried in the model); worse, they cascade — one callback triggers another save, which triggers its callback… one write fans out into a chain of updates and cache invalidations, and can deadlock under concurrency. This is why the community calls callbacks a code smell.
background job (asynchronous): enqueue a job to update the redundant value after the write. Accept brief inconsistency (eventual consistency) for fast writes — common for hot data. For maintaining redundant data, prefer a job (#6) over a synchronous callback — no write slowdown, no cascade.
periodic reconciliation job (safety net): batch-recompute and fix any drifted values (like counter_cache's reset_counters).

The point of denormalization isn't "store a copy" — it's this trade-off: not "store junk to go fast," but knowingly spending consistency complexity to buy read speed, with a reconciliation backstop in place. Denormalize without thinking that through and you'll eventually get bitten by "why don't these two numbers match?"

🔗 Other ways to change the data model

Beyond denormalization, a few common schema-level moves — reach for them when the situation calls for it:

Polymorphic associations perform poorly: commentable_type + commentable_id can't have a real foreign key and can't JOIN efficiently to a single table. At scale, switch to explicit associations (separate FK columns/tables) to win back integrity and performance.
has_many :through join tables: slow without indexes — the join table needs a composite index on (a_id, b_id) (and the reverse pair if you query both directions).
Data lifecycle on big tables: time-series data (stock candles, logs) explodes in size → partitioning (by time), hot/cold separation, archiving, keeping the main table lean so queries scan only the relevant partition.
Column types: money/prices in decimal, not float (float has rounding errors); the right time type — schema-design fundamentals.

These all change the schema structure; the cost and risk are far higher than rewriting a query — the last layer you reach for, after confirming the earlier moves aren't enough.

🏁 Wrap-up: the whole series as one map

"When should I change the data model?" is answered by a bigger question — how to work through a slow request, one layer at a time. That's the spine of this whole series. Performance optimization is really three axes:

① Make each request do less work (this axis has a light-to-heavy order)

Measure first (APM / EXPLAIN) — find the real bottleneck, don't guess.
Tune queries and indexes first: fix N+1, add composite indexes, don't drag data into Ruby (#1–#3). ~80% of problems end here.
Then cache: don't recompute what you already computed (#4, #5).
Trim the Ruby-layer waste: serialization, allocation, memoization (#7).
Only then change the data model: when the query pattern conflicts with the normalized schema, denormalize judiciously (counter_cache, redundant hot data, materialized views); changing schema structure (splitting tables, partitioning) is more deliberate still (#9, this post).

② Move slow work out of the request — email, third-party APIs, reports shouldn't block a request the user is waiting on; push them to the background (#6).

③ Let the machine serve more requests at once — no matter how fast the code, the wrong Puma / connection-pool / CDN setup still causes a traffic jam (#8).

One principle:

Performance optimization = make each request do less + move slow work out of the request + let the machine serve more at once — and each one starts the same way: measure first, then go from light to heavy.

Back to this post's layer: normalization is the right default; denormalization is a deliberate trade — spending consistency complexity to buy read speed — used only when measurement proves queries, indexes, and caching aren't enough, and always with a reconciliation backstop.

After these nine posts, facing a slow page you hold not just a pile of tricks but a sense of which layer to reach for first — and that judgment tends to decide the outcome more than any single technique.