DEV Community

Dane Wu
Dane Wu

Posted on

You Wanted a Number, but Loaded 500,000 Rows Into Memory

Rails Performance: Lessons from Production — #3

The first two posts were about making queries cheaper — fewer N+1s, indexes, not dragging data back into Ruby. This one turns that into a more practical habit: let the database do what the database can do. A .count in the wrong place, a misused .present?, and you can pull an entire table into memory just to get a single number. Same example throughout (a shipments table).


💥 A report page that ate the server's memory

We had an admin report showing, per courier, the number of delivered shipments. It was fine — until someone tweaked the filtering logic. After that, opening the page spiked memory, and under traffic it OOM-killed the server.

The culprit was this line:

courier.shipments.select { |s| s.status == "delivered" }.size
Enter fullscreen mode Exit fullscreen mode

Looks reasonable — "it's just select." But this .select { ... } (with a block) is Ruby's Array method, not ActiveRecord's select(:col). It loads all of that courier's shipments into memory first, then filters them one by one in Ruby. Fine on a small table; on a big one it eats all the RAM.

One word, select, two completely different meanings — one builds SQL, the other loads everything and filters in Ruby. It's one of the easiest traps to fall into and the hardest to spot. The fix is to let the DB filter and count:

courier.shipments.where(status: "delivered").count
# → SELECT COUNT(*) FROM shipments WHERE courier_id = ? AND status = 'delivered'
Enter fullscreen mode Exit fullscreen mode

The DB filters, counts, and returns a single number. The memory problem vanishes.

Behind this is a recurring pattern: for a single number, boolean, or sum, you drag a whole batch of rows back into Ruby. This post walks through its common variants.

Continuing the four layers from series #1: moving data from ③ the DB back to ④ memory has a cost. The whole theme is — don't move it if you don't have to, and if you must, move less.


🔢 Counting: count vs size vs length

All three "count things," but they work differently — pick wrong and you hit the trap above.

count — always sends SELECT COUNT(*) to the DB and returns a number.

Shipment.where(status: "delivered").count
# → SELECT COUNT(*) FROM shipments WHERE status = 'delivered'   always queries the DB
Enter fullscreen mode Exit fullscreen mode

Downside: even if the records are already loaded, it still queries the DB again.

length — can only count an in-memory array, so it force-loads everything then counts.

shipments = Shipment.where(status: "delivered")
shipments.length
# → SELECT * FROM shipments WHERE status = 'delivered'   pulls the whole batch into memory, then counts the array
Enter fullscreen mode Exit fullscreen mode

Downside: if not loaded, it drags the whole batch into memory (the opening trap).

size — picks based on state: not loaded → acts like count; already loaded → acts like length. Always the cheapest option.

shipments = Shipment.where(status: "delivered")
shipments.size      # not loaded → runs count, SELECT COUNT(*), no rows moved
shipments.to_a      # (this loads the data into memory)
shipments.size      # loaded → counts the in-memory array, no DB query
Enter fullscreen mode Exit fullscreen mode
Behavior Risk
count always queries the DB re-queries even when loaded
length always uses memory drags the whole batch in when not loaded
size not loaded → DB, loaded → memory almost none, safest

How to choose: only want a number, won't use the rows → count; data already (or about to be) loaded and you count while using it → size; length almost never.

This also explains why series #1 used c.shipments.size in the view, not .count — shipments were preloaded into memory, so size counts memory; .count would re-query the DB per courier, bringing the N+1 right back.


❓ Existence: exists? vs present?

You want to know whether a courier has any delivered shipments — just true / false.

present? / any? — loads the matching rows into memory first, then checks if the array is empty:

courier.shipments.where(status: "delivered").present?
# → SELECT * FROM shipments WHERE courier_id = 1 AND status = 'delivered'
#   (loads all matches, just to decide "any?")
Enter fullscreen mode Exit fullscreen mode

exists? — asks the DB and stops at the first hit:

courier.shipments.where(status: "delivered").exists?
# → SELECT 1 FROM shipments WHERE courier_id = 1 AND status = 'delivered' LIMIT 1
Enter fullscreen mode Exit fullscreen mode

LIMIT 1 is the key: the DB returns "yes" on the first match — no full scan, no rows moved.

Exception: if you'll use those rows right after the check, load them first with .to_a and use present? — the load isn't wasted:

shipments = courier.shipments.where(status: "delivered").to_a
if shipments.present?   # ✅ present? is right here, the data is already needed
  shipments.each { ... }
end
Enter fullscreen mode Exit fullscreen mode

In one line: just checking existence, won't use the rows → exists?; using the rows right after → load first, then present?.


➕ Aggregation: sum, average, group — hand them to the DB

Compute "total weight of all delivered shipments."

❌ Pull rows back and loop in Ruby:

Shipment.where(status: "delivered").sum { |s| s.weight }   # block = Ruby method
Enter fullscreen mode Exit fullscreen mode

sum with a block is Ruby's Array method — it loads 500k objects into memory, then loops and adds them one by one.

✅ Let the DB sum:

Shipment.where(status: "delivered").sum(:weight)   # column name = ActiveRecord method
# → SELECT SUM(weight) FROM shipments WHERE status = 'delivered'
Enter fullscreen mode Exit fullscreen mode

The difference is the argument: sum(:weight) passes a column symbol → builds SQL for the DB ✅; sum { |s| s.weight } passes a block → loads then loops in Ruby ❌.

Other aggregates work the same — ActiveRecord wraps them:

Shipment.average(:weight)   # AVG
Shipment.maximum(:weight)   # MAX
Shipment.minimum(:weight)   # MIN
Enter fullscreen mode Exit fullscreen mode

Grouped counts (how many each courier shipped) also go to the DB:

# ❌ pull everything, group in Ruby
Shipment.all.group_by(&:courier_id).transform_values(&:count)

# ✅ DB groups & counts, returns a tiny result
Shipment.group(:courier_id).count
# → SELECT courier_id, COUNT(*) FROM shipments GROUP BY courier_id
# → { 1 => 1200, 2 => 800, ... }
Enter fullscreen mode Exit fullscreen mode

Rule of thumb: check whether sum / count is given a column name or a block — column name goes to the DB (right), block runs in Ruby (usually wrong).

In one line: sum, average, count, group — operations that "collapse a pile of rows into one result" are exactly what the DB does, right next to the data. Pulling rows back to compute = hauling a truckload home just to read one number off it.


📦 When you really must: find_each loads in batches

Sometimes you genuinely have to pull every row back into Ruby (sending emails, generating reports, exporting). For that unavoidable case, the point shifts to: don't load all at once — load in batches.

.each — loads everything into memory at once:

Shipment.where(status: "delivered").each do |s|
  SendMailJob.perform_later(s)
end
# → SELECT * ... pulls all 500k rows; every object in RAM at once → possible OOM
Enter fullscreen mode Exit fullscreen mode

find_each — fetches batch by batch under the hood (1000 by default), processing one batch before loading the next:

Shipment.where(status: "delivered").find_each do |s|
  SendMailJob.perform_later(s)
end
# → SELECT * ... ORDER BY id LIMIT 1000
#   SELECT * ... AND id > 1000 ORDER BY id LIMIT 1000
#   ...uses the last id of the previous batch as the next starting point
Enter fullscreen mode Exit fullscreen mode

At any moment only 1000 rows are in memory — 500k won't blow up. Batch size is tunable: find_each(batch_size: 500).

For operating on a whole batch (bulk updates), use in_batches:

Shipment.where(status: "pending").in_batches do |batch|
  batch.update_all(status: "processing")   # update a whole batch at once, not row by row
end
Enter fullscreen mode Exit fullscreen mode

🏁 Wrap-up

You want ❌ pull into Ruby ✅ keep in the DB / batch
a count .to_a.count .count / .size
existence .present? .exists?
sum / avg / group pull back and loop sum(:column), group.count
process many rows .each (load all) find_each (batch)

One principle runs through all of it:

Let the database do what the database can do; when you must pull into Ruby, do it in batches, not all at once.

This is the same thing as series #1 — moving data into memory has a cost. That opening OOM wasn't caused by "too much data," but by "moving too much data back for a single number." Before you write count, sum, or present?, ask: do I want a result, or do I actually need this whole batch of rows? It's usually the former — so let the DB hand you the result.

Top comments (0)