If you have 30 seconds. "Don't duplicate data" is a rule that fails in production: caches are legitimate, and you ship them anyway. The rule that actually holds is finer-grained — every stored value that resembles a duplication is either Live, Snapshot, or Cache, and each category has a distinct implementation. Skip the classification and you ship the same kind of bug I just found multiplied by 560 rows: a column quietly going stale because nobody decided whether it was supposed to track its source or not. Useful if you have backend tables you stopped trusting.
A column nobody updated
Saturday April 18th, 2026, mid-morning. I'm reading the dashboard with Hélène, who has kept the school's books for nineteen years. Her notebook is open, mine is the screen. She points at one row — a ceramics student, third year, a perfectly ordinary case — and frowns. Rembrandt says she still owes us €1,159. Hélène's payment schedule says €2,262. We cross-check on paper. Eleven monthly installments at €205.65. Hélène is right.
The column my dashboard is reading is contacts.montant_total. It was populated at import in March from a single sum at a single point in time, and nothing in the codebase updates it. Three weeks later, an installment is added — Hélène recorded it manually in the new echeances table — and the column doesn't move. The dashboard reads what it was given in March, which was correct in March.
I run the diff. 488 contacts have NULL in montant_total and 1,789 entries in echeances_inscription. 72 contacts diverge by more than €1. The cumulative gap, signed, is around €18,000 of phantom non-debt. « Et combien d'autres comme ça » — And how many others — Hélène asks, calmer than I am, because this is exactly what she expects from software that decides things on its own.
The bug isn't in the SUM, in the migration script, or in the cron. The bug is that nobody ever decided what contacts.montant_total was supposed to be — a snapshot of a moment, a cache of a sum, or a value that should always reflect the current schedule. The column was inserted because it was needed then. The mechanism that would have kept it true now was never written, because the question wasn't asked.
This is the article about the question.
Why "don't duplicate" doesn't work
The naive rule any engineer learns is don't duplicate data. In practice, every non-trivial application duplicates data. Materialized views are duplications. contacts.dernier_contact_at is a duplication. cours.places_prises is a duplication. The naive rule, applied literally, would forbid all of them — including the ones you absolutely want.
The rule that actually holds is more precise: every stored value that resembles a duplication must be classified, before being created or kept, as Live, Snapshot, or Cache. Each category has a distinct implementation contract. A duplication without a category is a bug in waiting.
This is the rule I now check on every migration review. It's worth describing.
The three categories
Live
The value must always reflect the current state of the system. There is no business reason to freeze it at a particular moment.
Implementation: don't store it. Read it directly from a SQL view (v_*) or a query on the source tables.
In Rembrandt: v_reste_du_contact.montant_prevu_total — the dynamic sum of a contact's installments. v_contact_cours_actifs — a contact's active courses at this moment. The active enrollment count for a course.
The anti-pattern is storing a copy "to avoid the join." Divergence is guaranteed without a refresher, and the refresher never gets written.
Snapshot
The value must remain frozen at the moment of a business event. Modifying it retroactively is a functional fault.
Implementation: store it, never recalculate. Protection in writing after creation — a CHECK constraint, a trigger that forbids UPDATE, or simple documented discipline.
In Rembrandt: inscriptions.tarif_applique — the price at the date of enrollment, which doesn't move if the course is repriced later. echeances_inscription.montant_prevu — a contractual commitment, immutable. factures.numero and factures.tva_taux — accounting freeze. contrats_formation.montant_ht — a dated commitment with a public training fund.
The anti-pattern is "recalculating retroactively for consistency." That steals history. If a price revision must apply, it applies through a new event — credit note plus new invoice — not by editing the existing snapshot.
Cache
The value is derivable from other data but expensive to compute on every read. You accept storing it for performance, on the condition that you declare an explicit refresher in the same commit as the column.
Implementation: store it plus declare the mechanism. Three mechanisms are admitted: GENERATED ALWAYS AS (...) for intra-row derivation, a SQL trigger named trg_* on the feeding tables, or a materialized view mv_* with a scheduled or post-bulk REFRESH.
No cache survives without its named refresher. If you can't guarantee refreshment, you fall back to Live.
In Rembrandt: cours.places_prises — cache refreshed by trigger on inscriptions. mv_pnl_mensuel — materialized cache, REFRESH after budget_values updates.
The three-question decision
In front of every new field that resembles a duplication, three questions in order:
- Must the value evolve with the upstream data? If no — it's a frozen past event → Snapshot. Store, document "snapshot at creation."
-
Is computing it on the fly acceptable performance-wise? If yes → Live. Don't store. Create a
v_*view. -
Which refresh mechanism for this Cache? Intra-row derivation →
GENERATED ALWAYS AS. Frequently updated source table → synchronoustrg_*trigger, in the same commit. Heavy derivation, intensive read, hourly or daily freshness tolerance → materialized viewmv_*with scheduled REFRESH.
If none of the three Cache implementations is tenable, the value should not be stored. Fall back to Live and accept the compute cost, or question the business need.
What I found in my own schema
I ran the audit a week after the Hélène incident. Fourteen divergences, six categories. Three of them stand out as archetypes:
-
contacts.montant_total— a Live in disguise treated as a Snapshot, by negligence. The fix is not "recalculate periodically" — that re-establishes divergence at the next installment created. The fix is to delete the column and route reads throughv_reste_du_contact. Migration of category, not patch. -
cours.places_prises— a Cache correctly classified, but with two competing triggers. Coherent by accident — alphabetical execution order. A latent bomb, defused last week by collapsing to a single trigger. -
contacts.dernier_contact_at— a Cache without a refresher. Imported once from HubSpot, never updated since. GDPR decisions skewed by a value that hasn't moved in eighteen months.
In each case, the issue isn't the storage. It's that the storage was decided without declaring the contract that would have kept it true.
A small naming discipline
Every Cache column carries a SQL comment at migration time:
COMMENT ON COLUMN cours.places_prises IS 'CACHE: refreshed by trg_inscriptions_sync_places';
Without this comment, the next reader of the schema cannot distinguish a managed Cache from a Live that diverged silently. Two prefixes do the rest: v_* for views (Live), mv_* for materialized views (Cache at the DB level), trg_* for triggers (typically Cache refreshers).
When a migration review surfaces a column that doesn't fit any of those, you stop. You ask. You classify, or you remove.
What you can copy into your project
Three concrete elements you can apply tomorrow:
- Run a categorization audit on the columns you stopped trusting. Pick the five most-suspect-feeling columns, classify each as Live, Snapshot, or Cache. Any column you cannot place is a bug already shipped.
- For Cache columns, write the refresher in the same commit as the column. Trigger, generated column, or scheduled REFRESH — but the refresher exists at creation time, not "later." Later doesn't come.
- Refuse the patch that re-establishes the bug. When a Live-disguised-as-Snapshot diverges, the fix isn't "recalculate ponctually." That postpones the next incident by exactly one upstream event. The fix is migration of category — drop the column, route reads through the view.
The rule reads in one line: every stored value that resembles a duplication is Live, Snapshot, or Cache, with a contract declared in the same commit. It's the line I would tape above a colleague's desk.
Coda
Hélène closed her notebook at the end of that morning. She didn't say « je vous l'avais dit » — I told you so — because that's not her style; she said « vous voyez bien que ça ne suffit pas » — you see clearly that it isn't enough — which is more devastating because it points at what is missing rather than at what was wrong. She was right twice. The software was wrong, and the discipline that would have kept it right hadn't been written down. Three days later, the rule was a file in the repo. It hasn't caught a regression yet, because it operates at the gate — it stops the regression from being shipped. The day a junior — me, sometimes — proposes a column with no contract, the review brings it back. That's the kind of victory you can't measure, only assume.
Companion code: rembrandt-samples/live-snapshot-cache/ — the 3-question decision checklist plus four SQL patterns (Live view, Snapshot protection, Cache trigger, category migration), MIT, copy-pastable.

Top comments (0)