DEV Community: Michel Faure

Forcez Claude Code à vous contredire : 14 règles, install en 1 commande

Michel Faure — Mon, 18 May 2026 10:19:20 +0000

L'enseignement de 60 jours, le ROI en trois axes

Trente-deux jours de production en solo sur un ERP, 118 808 lignes de TypeScript, six versions de doctrine, quatre relecteurs externes intégrés. J'ai compilé ce que j'ai appris en quatorze règles opérationnelles, installables en une commande : le Counterpart Toolkit v0.4.1. C'est à la fois l'enseignement matériel de soixante jours de codage solo avec Claude Code, et la cartographie des quatorze failure modes silencieux que j'ai vu se répéter — pour qui code seul avec une IA en production et n'a plus de PR review pour attraper la dérive.

Le ROI est chiffré sur trois axes, mesuré sur Rembrandt :

R4 *Falsify before fix* — cinq à dix minutes de protocole en amont du fix évitent trente à quatre-vingt-dix minutes de cycle fix-puis-rollback quand la première hypothèse plausible se révèle fausse. ROI 6 à 18× par incident. Sur soixante jours, j'ai cessé de perdre une heure deux à trois fois par semaine sur des fixes qui ne fixaient rien.

R2 *Filesystem over summary* couplée à R6 *Live/Snapshot/Cache* et aux sondes drift quotidiennes — le délai médian apparition→détection d'une divergence silencieuse passe d'invisible à 35,3 jours sur 90 jours glissants. M3 recalibrée publiquement à ≤ 30 jours dans le manifesto, parce que la cible originale (≤ 7 j) était une intuition que la pratique a refusée.

Un sub-agent challenger qui produit des objections au format imposé Tool / Question / Refutation criterion. Du désaccord matériel, pas du « are you sure? » émotionnel qui pousse à réviser sans fait nouveau.

Voici comment, en 1400 mots et une commande d'install.

Le diagnostic — l'incident qui a déclenché la doctrine

Coder seul avec une IA, c'est composer deux complaisances. Celle de l'agent, sycophant par construction parce que le reinforcement learning from human feedback l'a entraîné à plaire au prompteur. Et celle du solo, self-validating par humanité, qui valide son propre travail parce qu'il ne reste plus personne pour le contester. Mises bout à bout, ces deux complaisances produisent un drift que ni l'agent ni l'humain ne signale — et qui ne se voit qu'à l'audit, longtemps après.

C'est en préparant l'audit source unique de fin avril — ADR-0024, un travail de fond sur les divergences que je remettais à plus tard depuis trois mois — que je suis tombé sur l'écart, par hasard, en croisant deux requêtes que personne n'avait jamais croisées avant. Une fiche élève, initiale Y.B. : la colonne contacts.montant_total portait 1 159 € saisis à la main quelque part en 2024, jamais touchés depuis. La somme réelle des échéances, calculée à la volée, en faisait 2 262 €. Mille euros d'écart, sur une seule fiche, sans qu'aucune alarme n'ait jamais sonné. J'élargis le grep : cinq cent soixante contacts dans le même état, parfois à plusieurs milliers d'euros près. Et pourtant montant_total était lue chaque jour dans le dashboard trésorerie — une valeur dérivable qu'on stockait sans rafraîchisseur, traitée comme un fait passé immuable alors qu'elle aurait dû vivre à la volée. C'est exactement le piège que R6 Live/Snapshot/Cache veut empêcher, et R6 est sortie de ce moment-là.

R4 Falsify before fix, la seule règle exposée ici

Le toolkit énonce R4 en cinq étapes textuelles. Le skill falsify-before-fix en est l'invocable instance — la version que Claude Code charge dans sa session, et qu'il ne peut pas sauter quand il s'apprête à écrire du code de fix.

name: falsify-before-fix
description: Activate this skill before writing the fix code on a bug or
  incident. Triggers on "fix", "bug", "patch", "hotfix", "workaround",
  "doesn't work", "diagnose", "hypothesis", "root cause". Enforces a
  single-sentence causal hypothesis and three material probes designed
  to refute it before any line of fix code is committed.
  Operational instance of R4 of the Counterpart Toolkit.

Le protocole tient en cinq étapes : (1) formuler une hypothèse causale en une phrase (pas un symptôme — « le compteur lit depuis l'ancienne table après la migration du 12 mai » vaut mieux que « le compteur est faux ») ; (2) lister trois sondes conçues pour réfuter, pas pour confirmer, parce qu'une sonde de confirmation trouve toujours ce qu'elle cherche, par sélection ; chaque sonde porte ses trois champs Tool / Question / Refutation criterion ; (3) exécuter et reporter la sortie brute, jamais paraphrasée ; (4) brancher — aucune sonde ne réfute → on écrit le fix ; une sonde réfute → on repart d'une nouvelle hypothèse ; sondes ambiguës → quatrième sonde plus tranchante avant tout code ; (5) sortir hypothèse retenue, sondes exécutées, diff, et critère d'observation post-fix.

Pourquoi un skill et pas la règle textuelle qui vivait déjà en CLAUDE.md ? Parce que la règle textuelle n'avait pas tenu sous pression. Le 6 mai, en début d'après-midi, un bug Sentry remonte que le compteur d'inscriptions du jour affiche zéro alors qu'on en a saisi trois en matinée. Mon réflexe arrive avant le protocole, et la main est déjà sur le clavier — « le cache n'est pas invalidé », je commit, je déploie. Trente minutes plus tard, rollback : le bug est toujours là. La cause réelle, qu'une sonde grep de quatre-vingt-dix secondes aurait remontée, c'est qu'aucun appel à cache_invalidate n'existait dans le pipeline d'inscription — pas un cache obsolète, un cache absent. R4 était dans CLAUDE.md depuis trois semaines. Je ne l'ai simplement pas suivie ce jour-là, parce qu'aucun dispositif n'interrompait ma course entre le bug et le commit.

C'est la différence qui fait tout. Une règle textuelle dans CLAUDE.md, l'agent la lit au début de la session, et rien ne l'oblige à la convoquer au moment exact où il en aurait besoin. Un skill, c'est un mécanisme matériel : il se charge automatiquement sur des mots-clés (« fix », « bug », « doesn't work ») et impose son protocole dans la session active — pas un rappel à se faire, un interrupteur que la session a déjà actionné. Dix jours après l'incident du 6 mai, le skill falsify-before-fix est commit. L'enseignement de l'échec devient un dispositif. C'est la seule des 14 règles exposée dans cet article, et les 13 autres sont dans le repo, toutes construites sur le même principe : matérialiser ce qui resterait pieux en textuel seul.

Le toolkit s'applique à lui-même, chiffres et rétractations

Six versions en 32 jours, chacune ancrée dans un fait nouveau documenté. v0.3 → v0.3.1 sur un premier relecteur externe (apparat théorique disproportionné, rétracté). v0.3.1 → v0.3.2 sur un second (sept recommandations, deux work-items ouverts). v0.3.3 instrumentation M1-M5 publiée. v0.4 séparation toolkit/manifesto sur un troisième. v0.4 → v0.4.1 sur un quatrième relecteur externe (Claude.ai web), un commit consolidé intégrant trois refactors plus la LOC mesurée.

LOC corrigé 118 808 lignes mesurées par find + wc -l sur TS/TSX/JS/JSX (exclusions explicites), contre 35 k cité dans les versions antérieures — la doctrine elle-même avait péché contre R2, Cache sans rafraîchisseur projeté sur sa propre description. M3 recalibrée ≤ 7 j → ≤ 30 j avec justification publique : la cible originale était une intuition non honorée par les 35,3 jours mesurés en pratique. M1 et M5 documentés comme échecs d'instrumentation, pas comme succès : M1 sur-sensible à 12,33 vs ≤ 1 cible, M5 classe 90 % des briefs en unknown.

Four external readers across six versions. The last two audited v0.3.3 then v0.4.1; their objections are integrated in the release notes (manifesto §From v0.4 to v0.4.1). One reviewer called the falsify-before-fix skill "the best artifact of the doctrine among the ones I read". v0.5 is planned for 15 July 2026.

Steal three things in 20 minutes

Trois règles à essayer avant d'installer le reste.

R4 *Falsify before fix* — empêche le cycle fix → rollback déclenché par la première hypothèse plausible mais fausse. Détaillée ci-dessus.

R6 *Live / Snapshot / Cache* — empêche qu'une valeur dérivée stockée diverge silencieusement de sa source. Toute colonne dérivable déclare sa catégorie dans le commit qui la crée, ou le commit est rejeté.

R10 *Silent failure forbidden* — empêche que catch {}, await sans destructuration { error }, 2>/dev/null et autres mécanismes d'avalement mentent à votre observabilité jusqu'à ce que la production craque sur une dépendance en aval.

Le repo : github.com/michelfaure/doctrine-counterpart. La commande d'install :

git clone https://github.com/michelfaure/doctrine-counterpart.git && \
  cd doctrine-counterpart && \
  ./install.sh --yes /path/to/your/project

Licence CC-BY-4.0. Le manifesto promet une citation nominative dans la v0.5 pour qui propose un retour exploitable — quelle règle manque pour votre stack ? Les commentaires DEV.to sont inputs directs pour la prochaine version. R14 *spike escape hatch* couvre le code prototype destiné à disparaître sous sept jours, exempté de R6/R7/R8 : l'adoption ne force pas la même friction au spike qu'au code de production.

Coda

Un agent qui ne vous contredit pas n'est pas un counterpart, c'est une dactylo plus rapide. Ces 14 règles restaurent le désaccord — matériellement, pas mentalement. Elles ne demandent pas à l'agent d'être moins sycophant, ni au solo d'être plus vigilant ; elles posent les dispositifs (skills invocables, hooks bloquants, sub-agent challenger) qui interrompent la course productive là où la complaisance se compose. Le toolkit est la prothèse qu'il reste au solo quand le PR review a disparu et qu'il refuse de coder à l'oreille. Si une seule des 14 vous évite un cycle fix-rollback la semaine prochaine, elle s'est déjà remboursée.

Counterpart Toolkit v0.4.1, fourteen operational rules in ~200 lines, six iterations in 32 days, four external reviews integrated. Tested on 60+ days of solo ERP (118 808 lines, 65+ ADRs). Licence CC-BY-4.0 : github.com/michelfaure/doctrine-counterpart

Make Claude Code disagree with you: a 14-rule counterpart toolkit (install in 1 command)

Michel Faure — Mon, 18 May 2026 10:18:33 +0000

The 60-day lesson, ROI on three axes

Thirty-two days of solo production on an ERP, 118,808 lines of TypeScript, six doctrine versions, four external reviewers integrated. I've compiled what I learned into fourteen operational rules, installable in one command: the Counterpart Toolkit v0.4.1. It is both the material lesson of sixty days coding solo with Claude Code, and the mapping of the fourteen silent failure modes I've seen repeat — for anyone coding alone with an AI in production, who no longer has a PR review to catch the drift.

ROI is quantified on three axes, measured on Rembrandt:

R4 *Falsify before fix*: five to ten minutes of protocol before the fix prevent thirty to ninety minutes of fix-then-rollback cycle when the first plausible hypothesis turns out to be wrong. ROI 6 to 18× per incident at the five-minute end of the range (3 to 9× when the protocol takes ten minutes). Over sixty days, I stopped losing an hour two to three times a week on fixes that fixed nothing.

R2 *Filesystem over summary*, paired with R6 *Live/Snapshot/Cache* and the daily drift probes: the median delay between the appearance of a silent divergence and its detection goes from never detected to 35.3 days over a 90-day rolling window. M3 was publicly recalibrated to ≤ 30 days in the manifesto, because the original target (≤ 7 days) was an intuition that practice did not bear out.

A sub-agent challenger that produces objections in an imposed format: Tool / Question / Refutation criterion. Material disagreement, not an emotional "are you sure?" that pushes you to revise without any new fact.

Here's how, in 1400 words and one install command.

The diagnosis — the incident that triggered the doctrine

Coding alone with an AI means compounding two complaisances. The agent's: sycophantic by construction, because reinforcement learning from human feedback trained it to please the prompter. And the solo developer's: self-validating by human nature, approving their own work because no one is left to contest it. Chained together, these two complaisances produce a drift that neither agent nor human flags, and that only surfaces at audit time, long after.

It was while preparing the late-April source-of-truth audit (ADR-0024, a deep dive on divergences I had been putting off for three months) that I stumbled onto the gap, by chance, while cross-referencing two queries no one had ever cross-referenced before. One student record, initials Y.B.: the contacts.montant_total column carried €1,159, entered by hand sometime in 2024 and untouched since. The actual sum of instalments, computed on the fly, came to €2,262. A gap of €1,103 on a single record, with no alarm ever ringing. I widened the grep: five hundred and sixty contacts in the same state, some off by several thousand euros. And yet montant_total was read every day in the treasury dashboard: a derivable value stored without a refresher, treated as an immutable past fact when it should have been computed on the fly. This is exactly the trap R6 Live/Snapshot/Cache is meant to prevent, and R6 came out of that moment.
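A drift probe of the kind described here can be sketched in a few lines of TypeScript. This is a minimal sketch, not the author's actual probe: the row shapes, the field names montantTotal and amount, the use of integer cents, and the split of the €2,262 into two equal instalments are all assumptions made for illustration.

```typescript
// Hypothetical row shapes; the real schema is not shown in the article.
// Amounts are integer cents to avoid floating-point comparison issues.
interface Contact { id: string; montantTotal: number }      // stored (cached) value
interface Instalment { contactId: string; amount: number }  // source of truth

interface Drift { contactId: string; stored: number; live: number; delta: number }

// Recompute the live total per contact and report every stored value that differs.
function findDrift(contacts: Contact[], instalments: Instalment[]): Drift[] {
  const liveTotals = new Map<string, number>();
  for (const i of instalments) {
    liveTotals.set(i.contactId, (liveTotals.get(i.contactId) ?? 0) + i.amount);
  }
  const drifts: Drift[] = [];
  for (const c of contacts) {
    const live = liveTotals.get(c.id) ?? 0;
    if (live !== c.montantTotal) {
      drifts.push({ contactId: c.id, stored: c.montantTotal, live, delta: live - c.montantTotal });
    }
  }
  return drifts;
}

// Illustrative data: one drifting record (the Y.B. figures) and one healthy record.
const drifts = findDrift(
  [{ id: "Y.B.", montantTotal: 115900 }, { id: "other", montantTotal: 50000 }],
  [
    { contactId: "Y.B.", amount: 113100 },
    { contactId: "Y.B.", amount: 113100 },
    { contactId: "other", amount: 50000 },
  ],
);
console.log(drifts); // one entry, for "Y.B.", with delta 110300 (i.e. €1,103)
```

Run daily, a probe like this turns a divergence that would otherwise stay invisible into a dated finding, which is what makes a detection delay measurable at all.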

R4 Falsify before fix, the one rule presented here

The toolkit states R4 as a five-step textual protocol. The falsify-before-fix skill is its invocable instance — the version Claude Code loads into its session, and cannot skip when about to write fix code.

name: falsify-before-fix
description: Activate this skill before writing the fix code on a bug or
  incident. Triggers on "fix", "bug", "patch", "hotfix", "workaround",
  "doesn't work", "diagnose", "hypothesis", "root cause". Enforces a
  single-sentence causal hypothesis and three material probes designed
  to refute it before any line of fix code is committed.
  Operational instance of R4 of the Counterpart Toolkit.

The protocol fits in five steps: (1) state the hypothesis in one sentence, as a cause and not a symptom ("the counter reads from the old table after the 12 May migration" beats "the counter is wrong"); (2) list three probes designed to refute, not to confirm, because a confirmation probe always finds what it is looking for, by selection; each probe carries three fields, Tool / Question / Refutation criterion; (3) execute and report the raw output, never a paraphrase; (4) branch: if no probe refutes, write the fix; if one probe refutes, restart from a new hypothesis; if the probes are ambiguous, run a fourth, sharper probe before any code; (5) output the retained hypothesis, the probes executed, the diff, and the post-fix observation criterion.
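The branching in step (4) can be written down as a small decision function. The sketch below is an illustration in TypeScript, not the skill's actual implementation: the Probe and Verdict types and the nextStep function are hypothetical names, and the probe shown is a reconstruction based on the 6 May incident rather than a probe taken from the repo.

```typescript
type Verdict = "refuted" | "not_refuted" | "ambiguous";

// One probe, with the three fields the protocol imposes plus its result.
interface Probe {
  tool: string;
  question: string;
  refutationCriterion: string;
  rawOutput?: string;  // raw tool output, never a paraphrase
  verdict?: Verdict;   // undefined until the probe has been executed
}

type NextStep = "run_probes" | "new_hypothesis" | "add_sharper_probe" | "write_fix";

function nextStep(probes: Probe[]): NextStep {
  // A single refutation is enough to drop the hypothesis.
  if (probes.some((p) => p.verdict === "refuted")) return "new_hypothesis";
  // No fix before three probes exist and all of them have been executed.
  if (probes.length < 3 || probes.some((p) => p.verdict === undefined)) return "run_probes";
  // Ambiguity calls for a sharper probe, not for code.
  if (probes.some((p) => p.verdict === "ambiguous")) return "add_sharper_probe";
  return "write_fix";
}

// Hypothesis under test: "the enrolment cache is stale because invalidation fails".
const incidentProbes: Probe[] = [
  {
    tool: "grep",
    question: "Is cache_invalidate called anywhere in the enrolment pipeline?",
    refutationCriterion: "No call site found",
    rawOutput: "", // grep printed nothing
    verdict: "refuted",
  },
];

console.log(nextStep(incidentProbes)); // new_hypothesis
```

The ordering of the checks is a design choice of this sketch: a refutation short-circuits everything else, so the remaining probes need not be run once the hypothesis is dead.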

Why a skill and not the textual rule that already lived in CLAUDE.md? Because the textual rule didn't hold under pressure. 6 May, early afternoon. A Sentry alert reports the day's enrolment counter at zero while three enrolments were entered that morning. My reflex arrives before the protocol, and my hand is already on the keyboard: "the cache isn't invalidated." I commit, I deploy. Thirty minutes later, rollback: the bug is still there. The actual cause, which a ninety-second grep probe would have surfaced, is that no cache_invalidate call existed in the enrolment pipeline at all: not a stale cache, an absent one. R4 had been in CLAUDE.md for three weeks. I simply didn't follow it that day, because no mechanism interrupted my run from bug to commit.

That difference is everything. The agent reads a textual rule in CLAUDE.md at session start, and nothing forces it to recall the rule at the exact moment it is needed. A skill is something else: a material mechanism that loads automatically on keywords ("fix", "bug", "doesn't work") and imposes its protocol on the active session. Not a self-reminder, but a switch the session has already flipped for you. Ten days after the 6 May incident, the falsify-before-fix skill was committed. The lesson of the failure became a mechanism. It is the only one of the 14 rules presented in this article; the other 13 are in the repo, all built on the same principle: materialise what would remain a pious wish as text alone.

The toolkit applied to itself — figures and retractions

Six versions in 32 days, each anchored in a documented new fact. v0.3 → v0.3.1 on a first external reviewer (disproportionate theoretical apparatus, retracted). v0.3.1 → v0.3.2 on a second (seven recommendations, two open work-items). v0.3.3 — M1–M5 instrumentation published. v0.4 — toolkit/manifesto separation on a third reviewer. v0.4 → v0.4.1 on a fourth external reviewer (Claude.ai web), a consolidated commit integrating three refactors plus the measured LOC.

LOC corrected to 118,808 lines measured by find + wc -l on TS/TSX/JS/JSX (explicit exclusions), against the 35k figure cited in earlier versions — the doctrine itself had sinned against R2, Cache without refresher projected onto its own description. M3 recalibrated ≤ 7 d → ≤ 30 d with public justification: the original target was an intuition unhonoured by the 35.3 days measured in practice. M1 and M5 documented as instrumentation failures, not as successes: M1 over-sensitive at 12.33 vs ≤ 1 target, M5 classifies 90% of briefs as unknown.

Steal three things in 20 minutes

Three rules to try before installing the rest.

R4 *Falsify before fix* — prevents the fix → rollback cycle triggered by the first plausible but wrong hypothesis. Detailed above.

R6 *Live / Snapshot / Cache* — prevents a stored derived value from silently diverging from its source. Any derivable column declares its category in the commit that creates it, or the commit is rejected.

R10 *Silent failure forbidden* — prevents catch {}, await without { error } destructuring, 2>/dev/null and other swallowing mechanisms from lying to your observability until production cracks on a downstream dependency.
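To make R10 concrete, here is a minimal sketch of the "await without { error } destructuring" case, written synchronously to keep it short. Everything named here is an assumption for illustration: the Result shape is modelled on clients that return { data, error } instead of throwing, and unwrap and failingQuery are hypothetical helpers, not part of the toolkit.

```typescript
// Hypothetical result shape: the call never throws, it reports failure in `error`.
interface Result<T> {
  data: T | null;
  error: { message: string } | null;
}

// Refuse to continue on a failed result; keep the context in the message.
function unwrap<T>(result: Result<T>, context: string): T {
  if (result.error !== null) throw new Error(`${context}: ${result.error.message}`);
  if (result.data === null) throw new Error(`${context}: no data returned`);
  return result.data;
}

// Stand-in for a real query, failing on purpose.
function failingQuery(): Result<number> {
  return { data: null, error: { message: "permission denied" } };
}

// Silent version: only `data` is read, the error is dropped, and the caller
// renders a plausible zero.
const { data } = failingQuery();
const silentCount = data ?? 0;
console.log(`silent count: ${silentCount}`); // silent count: 0

// Loud version: the failure surfaces with its context instead of becoming a zero.
let surfaced = "";
try {
  unwrap(failingQuery(), "countEnrolments");
} catch (err) {
  surfaced = err instanceof Error ? err.message : String(err);
  console.error(surfaced); // countEnrolments: permission denied
}
```

The silent branch produces a value that looks like data, which is why this class of failure only shows up downstream.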

The repo: github.com/michelfaure/doctrine-counterpart. The install command:

git clone https://github.com/michelfaure/doctrine-counterpart.git && \
  cd doctrine-counterpart && \
  ./install.sh --yes /path/to/your/project

License CC-BY-4.0. The manifesto promises credit by name in v0.5 to anyone who contributes actionable feedback: which rule is missing for your stack? DEV.to comments feed directly into the next version. R14 *spike escape hatch* covers prototype code meant to disappear within seven days, which is exempt from R6/R7/R8: adopting the toolkit does not impose the same friction on a spike as on production code.

Coda

An agent that doesn't disagree with you isn't a counterpart — it's a faster typist. These 14 rules restore the disagreement — materially, not mentally. They don't ask the agent to be less sycophantic, nor the solo to be more vigilant; they install the apparatus (invocable skills, blocking hooks, sub-agent challenger) that interrupts the productive course where complaisance compounds. The toolkit is the prosthesis the solo has left when PR review has disappeared and they refuse to code by ear. If a single one of the 14 spares you a fix-rollback cycle next week, it has already paid for itself.

Counterpart Toolkit v0.4.1, fourteen operational rules in ~200 lines, six iterations in 32 days, four external reviews integrated. Tested on 60+ days of solo ERP (118,808 lines, 65+ ADRs). License CC-BY-4.0: github.com/michelfaure/doctrine-counterpart

La règle du jour-jeté-à-la-poubelle : lis le code avant de laisser ton IA en écrire

Michel Faure — Sun, 17 May 2026 08:33:16 +0000

Six heures du matin, devant la sortie

Vingt-neuf avril, six heures du matin. Le rendu sort à l'écran. A A A A A sur toute la matrice du document, illisible par construction. Je demande à mon agent pourquoi la sortie est incohérente. Il relit le code, descend dans le dossier voisin, et trouve un composant existant dont je n'avais jamais demandé l'inventaire. Le composant rend proprement le format attendu — signatures, en-tête, légende, bloc d'identification. Ce que mon agent venait de coder la veille était un doublon partiel d'un fichier qu'aucun de nous deux n'avait ouvert.

Le format que mon agent venait d'inventer la veille existait déjà dans le repo. Mieux fait. Dans un fichier nommable, qu'aucun de nous deux n'avait lu avant d'écrire.

Pourquoi un agent invente à côté de l'existant

Le fond du problème n'est pas l'agent. C'est moi. Quand je lance un chantier sur un domaine déjà couvert par du code, je décris la cible et je laisse l'agent générer. Lui n'a pas idée du voisinage du fichier qu'il va créer, parce que je ne lui ai pas demandé de le cartographier. Il code une solution plausible à un problème mal cadré, et la plausibilité du résultat masque la duplication tant que personne n'ouvre les autres fichiers du même dossier.

L'absence de Phase 0 grep n'est pas un défaut de l'agent. C'est un défaut du pilote qui a sauté l'étape la moins coûteuse de toute la chaîne.

Phase 0 — deux minutes, un fichier

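# Phase 0: before any new component in an existing domain
DOMAIN="invoices"   # the keyword of the project

# List the neighboring files (paths quoted so the variable cannot word-split)
find "app/api/$DOMAIN/" "lib/$DOMAIN/" \
  -type f \( -name "*.ts" -o -name "*.tsx" \) | head -20

# If an expected pattern is nameable, check that it doesn't already exist
# (-E: extended regex, so | is a portable alternation; \| in a basic regex is a GNU extension)
grep -rlE "ExistingPattern|RenderPdf|exportPdf" app/ lib/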

Deux minutes, à tout casser. Le résultat tient sur un écran. Si un composant existant traite le besoin, on le lit avant de proposer quoi que ce soit de neuf. Si rien n'existe, on a la preuve d'avoir cherché.

Le coût mesuré du shortcut est un jour-dev. Le composant qu'il aurait fallu lire tenait dans un seul fichier au nom évocateur, dans le dossier juste à côté. Deux minutes de find auraient suffi. Le jour reverted, c'est ce que coûte la confiance dans la plausibilité d'un brouillon que personne n'a relié à son voisinage.

Le signal métacognitif

Quand l'agent a relu son propre travail à la lumière du composant existant, il a proposé de reverter, pas de défendre. C'est un bon signal, et un agent qui s'enferme dans son design inventé serait beaucoup plus coûteux qu'un agent qui reconnaît avoir loupé du code. Mais le bon signal vient trop tard. La règle est de ne pas en arriver là.

La règle

Avant tout nouveau format, template ou composant dans un domaine déjà couvert, Phase 0 grep, lecture du voisinage, verbalisation de l'existant. Sinon, le jour suivant, tu reverts.

Script Phase 0 grep et checklist en 5 questions, pseudonymisés :
github.com/michelfaure/rembrandt-samples/tree/main/one-day-thrown-away-rule

The 1-day-thrown-away rule: read the code before letting your AI write new code

Michel Faure — Sun, 17 May 2026 08:33:14 +0000

Six in the morning, looking at the output

April twenty-ninth, six in the morning. The rendering hits the screen. A A A A A across the entire document matrix, unreadable by construction. I ask my agent why the output is incoherent. It re-reads the code, drops into the neighboring directory, and finds an existing component I had never asked it to inventory. The component renders the expected format cleanly — signatures, header, legend, identification block. What my agent had coded the day before was a partial duplicate of a file neither of us had ever opened.

The format my agent had invented the day before already existed in the repo. Better done. In a clearly named file that neither of us had read before writing.

Why an agent invents next to the existing code

The root of the problem isn't the agent. It's me. When I open a project on a domain already covered by code, I describe the target and let the agent generate. It has no idea what the neighborhood of the file it's about to create looks like, because I never asked it to map it. It codes a plausible solution to a poorly framed problem, and the plausibility of the result hides the duplication as long as nobody opens the other files in the same directory.

The absence of a Phase 0 grep is not a flaw of the agent. It's a flaw of the pilot who skipped the least expensive step in the whole chain.

Phase 0 — two minutes, one file

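# Phase 0: before any new component in an existing domain
DOMAIN="invoices"   # the keyword of the project

# List the neighboring files (paths quoted so the variable cannot word-split)
find "app/api/$DOMAIN/" "lib/$DOMAIN/" \
  -type f \( -name "*.ts" -o -name "*.tsx" \) | head -20

# If an expected pattern is nameable, check that it doesn't already exist
# (-E: extended regex, so | is a portable alternation; \| in a basic regex is a GNU extension)
grep -rlE "ExistingPattern|RenderPdf|exportPdf" app/ lib/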

Two minutes, tops. The output fits on a screen. If an existing component handles the need, read it before proposing anything new. If nothing exists, you have proof you looked.

The measured cost of the shortcut is one dev-day. The component that should have been read fit in a single file with a telling name, in the directory right next door. Two minutes of find would have sufficed. The reverted day is the price of trusting a plausible draft that nobody had checked against its neighborhood.

The metacognitive signal

When the agent re-read its own work in light of the existing component, it proposed to revert, not defend. That's a good signal — an agent that locks itself into its invented design would be far costlier than one that admits it missed existing code. But the good signal comes too late. The rule is to not get there in the first place.

The rule

Before any new format, template, or component in a domain already covered: run the Phase 0 grep, read the neighborhood, and state explicitly what already exists. Otherwise, the next day, you revert.

Phase 0 grep script and 5-question checklist, pseudonymized:
github.com/michelfaure/rembrandt-samples/tree/main/one-day-thrown-away-rule

Pourquoi ton audit DB trouve toujours plus que ton inventaire ne disait

Michel Faure — Sat, 16 May 2026 08:19:44 +0000

Le ticket disait deux

Vendredi premier mai, début d'après-midi. J'ouvre un ticket de resync baseline qui annonce, sur la foi d'un diagnostic CI honnête, « au moins deux objets manquants » entre la prod et le schéma local. Je commence comme on commence ces choses, en itération. Je trouve le premier en cinq minutes, je le rejoue dans une migration, je passe au suivant. Au cinquième, la défiance arrive, parce que je ne suis plus en train de corriger une liste finie, je suis en train de la découvrir, un patch à la fois, sans savoir combien il en reste.

Première itération : un rôle Postgres agent_readonly absent du repo. Deuxième : une colonne stripe_customer_id posée un soir pour brancher un webhook. Troisième : un doublon d'horodatage de migration. Quatrième : un DROP CASCADE manquant. Cinquième : une table de domaine entière. À ce point j'arrête de patcher au coup par coup. Je vide les catalogues, je comm -23 par catégorie, je sors la liste exhaustive en dix minutes.

Le mécanisme

Une base de données qui a vécu plusieurs mois cumule du drift silencieusement. Un rôle ajouté un lundi via le studio web pour débloquer une analyse, une colonne posée un soir pour brancher Stripe, un trigger réécrit en hotfix qui n'a jamais été reporté dans une migration. Chaque opération paraît anodine au moment où elle est posée. Aucune ne laisse de trace lisible côté repo. La mémoire de l'opérateur tient peut-être les deux ou trois derniers gestes ; au-delà, elle confabule ou oublie. Le seul moyen de connaître l'écart réel entre la prod et le repo est de le mesurer, frontalement, contre les catalogues système.

Le tracker supabase_migrations.schema_migrations confirme l'ampleur. Cinquante-huit versions côté repo, cent soixante-dix-huit côté prod, zéro ligne en commun. Trois mois d'opérations SQL passées par le studio web sans être reportées dans une migration. Le ticket disait deux. La cartographie en a renvoyé plus de cent. Ordre de grandeur : cinquante.

Le protocole

L'audit en bloc tient en une boucle, par catégorie d'objet. On dump la liste prod depuis les catalogues système, on dump la liste repo depuis les fichiers de migration, on prend la différence avec comm -23. On répète pour tables, colonnes, vues, fonctions, triggers, policies, indexes, rôles. Dix minutes en tout.

# DB block audit, one category at a time
# Both lists go through the same LC_ALL=C sort: comm needs identically sorted input
psql "$PROD_URL" -tAc \
  "SELECT tablename FROM pg_tables WHERE schemaname='public'" \
  | LC_ALL=C sort -u > /tmp/prod-tables.txt

# Optional "IF NOT EXISTS", optional schema prefix, optional quotes, then the table name
grep -hE '^CREATE TABLE ' supabase/migrations/*.sql \
  | sed -E 's/^CREATE TABLE (IF NOT EXISTS )?("?[a-z0-9_]+"?\.)?"?([a-z0-9_]+).*/\3/' \
  | LC_ALL=C sort -u > /tmp/repo-tables.txt

LC_ALL=C comm -23 /tmp/prod-tables.txt /tmp/repo-tables.txt
# → tables present in prod, missing from the repo. Loop by category:
#   columns, policies, indexes, triggers, functions.

Une fois la liste posée, on patche dans l'ordre de dépendance, les rôles d'abord, puis les tables, les colonnes, les indexes, les policies, les triggers. Plus de surprise, et le scope du chantier est connu avant qu'on touche au premier objet.

La règle

Au-delà de trois ou quatre drifts trouvés en itération coup-par-coup, basculer en audit en bloc. Le coût est forfaitaire, environ trente minutes pour cartographier l'ensemble des catégories. Le bénéfice est de connaître le scope exact avant de patcher, plutôt que de découvrir le sixième drift après avoir corrigé les cinq premiers. La règle ne dépend pas de la taille de la base, elle dépend du temps qui sépare la prod de son inventaire.

Clôture

Un inventaire qui dit deux et un audit qui trouve cent ne se contredisent pas. L'inventaire dit ce dont l'opérateur se souvient, l'audit dit ce que la base contient.

Script d'audit en bloc complet (8 catégories) et probe de synchronisation tracker, pseudonymisés :
github.com/michelfaure/rembrandt-samples/tree/main/db-audit-vs-inventory

Why your DB audit always finds more than your inventory says

Michel Faure — Sat, 16 May 2026 08:19:42 +0000

The ticket said two

Friday, May first, early afternoon. I open a baseline resync ticket that reports, on the basis of an honest CI diagnosis, "at least two missing objects" between production and the local schema. I start the way you start these things, iteratively. I find the first one in five minutes, replay it in a migration, move to the next. By the fifth, mistrust kicks in — I'm no longer correcting a finite list, I'm discovering it, one patch at a time, with no idea how many remain.

First iteration: a Postgres role agent_readonly absent from the repo. Second: a stripe_customer_id column added one evening to wire up a webhook. Third: a duplicated migration timestamp. Fourth: a missing DROP CASCADE. Fifth: a whole domain table. At that point I stop patching one at a time. I dump the catalogs, run comm -23 per category, and produce the exhaustive list in ten minutes.

The mechanism

A database that has been alive for several months accumulates drift silently. A role added on a Monday via the web studio to unblock an analysis, a column added one evening to hook up Stripe, a trigger rewritten in a hotfix and never carried back into a migration. Each operation looks benign at the moment it is applied. None leaves a readable trace on the repo side. The operator's memory might hold the last two or three actions; beyond that it confabulates or forgets. The only way to know the real gap between production and repo is to measure it, head-on, against the system catalogs.

The supabase_migrations.schema_migrations tracker confirms the scale. Fifty-eight versions on the repo side, one hundred and seventy-eight on the production side, zero rows in common. Three months of SQL operations run through the web studio without being carried back into a migration. The ticket said two. The mapping returned over a hundred: off by a factor of fifty.

The protocol

The block audit fits in a loop, one category at a time. You dump the production list from the system catalogs, dump the repo list from the migration files, take the difference with comm -23. Repeat for tables, columns, views, functions, triggers, policies, indexes, roles. Ten minutes in total.

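# DB block audit, one category at a time
# Both lists go through the same LC_ALL=C sort: comm needs identically sorted input
psql "$PROD_URL" -tAc \
  "SELECT tablename FROM pg_tables WHERE schemaname='public'" \
  | LC_ALL=C sort -u > /tmp/prod-tables.txt

# Optional "IF NOT EXISTS", optional schema prefix, optional quotes, then the table name
grep -hE '^CREATE TABLE ' supabase/migrations/*.sql \
  | sed -E 's/^CREATE TABLE (IF NOT EXISTS )?("?[a-z0-9_]+"?\.)?"?([a-z0-9_]+).*/\3/' \
  | LC_ALL=C sort -u > /tmp/repo-tables.txt

LC_ALL=C comm -23 /tmp/prod-tables.txt /tmp/repo-tables.txt
# → tables present in prod, missing from the repo. Loop by category:
#   columns, policies, indexes, triggers, functions.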

Once the list is on the table, patch in dependency order — roles first, then tables, columns, indexes, policies, triggers. No more surprises, and the scope of the work is known before you touch the first object.
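The loop and the patch order can also be sketched in TypeScript, assuming the per-category lists have already been dumped into memory. The names Category, Inventory, prodOnly and driftReport are illustrative and do not come from the audit script; the inventories are made up.

```typescript
// Illustrative names: Category, Inventory and driftReport are not taken from the audit script.
type Category = "roles" | "tables" | "columns" | "indexes" | "policies" | "triggers";
type Inventory = Record<Category, string[]>;

// Patch order from the paragraph above: roles first, triggers last.
const PATCH_ORDER: Category[] = ["roles", "tables", "columns", "indexes", "policies", "triggers"];

// Same semantics as `comm -23 prod repo`: names present in prod and absent from the repo.
function prodOnly(prod: string[], repo: string[]): string[] {
  const known = new Set(repo);
  return Array.from(new Set(prod)).filter((name) => !known.has(name)).sort();
}

// One entry per drifting category, already in the order the patches should be written.
function driftReport(prod: Inventory, repo: Inventory): { category: Category; missing: string[] }[] {
  return PATCH_ORDER
    .map((category) => ({ category, missing: prodOnly(prod[category], repo[category]) }))
    .filter((entry) => entry.missing.length > 0);
}

// Made-up inventories, for illustration only.
const prodInventory: Inventory = {
  roles: ["agent_readonly", "app"],
  tables: ["invoices", "leads"],
  columns: ["leads.email", "leads.stripe_customer_id"],
  indexes: [],
  policies: [],
  triggers: [],
};
const repoInventory: Inventory = {
  roles: ["app"],
  tables: ["invoices", "leads"],
  columns: ["leads.email"],
  indexes: [],
  policies: [],
  triggers: [],
};

// Two entries: the role agent_readonly first, then the column leads.stripe_customer_id.
const report = driftReport(prodInventory, repoInventory);
```

Keeping the categories in one ordered constant means the report doubles as a to-do list: the first entry is the first migration to write.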

The rule

Beyond three or four drifts found by iteration, switch to block audit. The cost is fixed, about thirty minutes to map every category. The benefit is knowing the exact scope before patching, rather than discovering the sixth drift after correcting the first five. The rule doesn't depend on the size of the database — it depends on how much time has passed between production and its inventory.

Closing

An inventory that says two and an audit that finds a hundred don't contradict each other. The inventory says what the operator remembers, the audit says what the database contains.

Block audit protocol script, pseudonymized:
github.com/michelfaure/rembrandt-samples/tree/main/db-audit-vs-inventory

Five silent failure modes, codified after 35 days of solo ERP work

Michel Faure — Fri, 15 May 2026 09:24:42 +0000

If you have 30 seconds. Running an AI agent over the long haul reveals something strange. The failures that cost you aren't the loud ones (a crash, a red build, a blank page) but the ones that slip through the cracks of clean code. After 35 effective days of work, 109,000 lines and 517 commits on a solo ERP, I isolated five recurring silent modes: the fix that doesn't fix, the test that passes by construction, the memory that confabulates, the count that lies, the scope that creeps. One scene per mode, one rule per scene. A doctrine doesn't get planned, it settles.

The agent doesn't fail at random

Late April 2026, I reread my accumulated feedback files (close to a hundred, dated and indexed) and found that they clustered into five families. The loud errors (red terminal, Sentry alert) you learn to recognize as you go. The silent ones cost more. The code passes, the agent announces green, production runs, and yet something has slipped. Five modes, five scenes, five rules. None of them was decided cold.

Mode 1 — The fix that doesn't fix

An intermittent error shows up in Sentry on a sensitive endpoint. The agent proposes a patch. Three lines, elegant, that make the report disappear. Except that what disappears is the symptom. The cause keeps flowing. The malformed payload upstream still produces a null, but it is now silently returned to a consumer that expects an object. The corrupted data spreads quietly into two or three tables, and you only notice when a counter you thought reliable stops being consistent with the rest.

// app/api/leads/elementor/route.ts (condensed form)
export async function POST(req: Request) {
  try {
    const body = await req.json()
    return await processLead(body)
  } catch {
    return NextResponse.json({ ok: true })  // silent workaround
  }
}

A workaround can be legitimate, provided it is explicitly acknowledged in the commit message and in a feedback file. Silent workarounds are forbidden. When a fix looks too simple for the symptom, I ask for the full input → output pipeline before accepting it.

Mode 2 — The test that passes by construction

ADR-0044, shipped May 2nd. Five DB ↔ code contract tests (shared enums, statuses, roles). On first run, all five pass in three seconds. Too fast. A vague sense of a meter running on its own.

I add an explicit negative case, a variant that must fail because I deliberately misalign the DB enum and the TS enum.

// tests/contracts/statuts_inscriptions.contract.test.ts
it('échoue avec un set restreint (anti-tautologie)', async () => {
  await expect(
    assertEnumStable({
      table: 'inscriptions',
      column: 'statut',
      expected: ['inscrit'],          // deliberate subset
      contractRef: '(test négatif)',
    }),
  ).rejects.toThrow(/Drift DB ↔ code détecté/)
})

Four of the five tests still pass. The assertion helper was silently swallowing the comparisons. Without the negative case, I would have shipped a suite of Potemkin tests, green by construction, with no ability to detect anything. The rule came out in one line. Every contract test suite contains at least one negative case, otherwise it tests nothing. The presence of red is what validates green.

Mode 3 — The memory that confabulates

First week of May, billing refactor. I say to the agent, "we had chosen pattern B for issuing through the partner accounting API, hadn't we?" The agent confirms, restates what it believes the ADR says, and proposes the next step. Three hours later, I happen to reopen ADR-0007 for an unrelated detail. The sentence jumps out at me in the Decision section. It is the opposite of what the agent has just confirmed. It has been written there since late April.

This mode is the most insidious of the five because it feeds on the human's trust in their own memory; I had validated without rereading. Memory is a point of entry, never a point of arrival. Before asserting anything from a memory file, I reopen the ADR or the current code. "Do you remember..." has become, for me as for the agent, the trigger for an immediate Read of the associated memory, not a request for confirmation.

Mode 4 — The count that lies

One morning in late April, Françoise crosses the hallway with her mug, the one with her own face printed on it, an office joke she owns every morning. She stops at the door. "How many active enrollees do we have at Maisons-Laffitte this month?" I pass the question to the embedded analytics agent. The number arrives in six seconds, clean, formatted. She swivels toward her cockpit (Excel attendance sheet on the left, Sage on the right, Rembrandt in the middle), scrolls through the attendance sheet, finger under each name, reading aloud. "Yeah, that's it." And then, without changing tone, "Seven are missing."

The DB enum had been renamed five days earlier on another workstream. The generated query was flawless, except that it returned zero rows for the values it looked up. The agent then confabulated a business explanation ("there are no overdue payments") instead of a structural one. Five days of drift without any monitoring barking.

Françoise sees the wrong number before I do because she has her own cockpit. But the rule cannot rest on her vigilance. Every number relayed to a human comes with its provenance query, and a quarterly DB ↔ code audit is mandatory. A semantic layer without an audit is a delayed fragmentation bomb. You don't know when it will go off, you only know that it will.

Mode 5 — The scope that creeps

Visible bug, minimal scope. A button opens a drawer on the wrong route, and the fix is two lines. The agent, "while it's in the file", renames three props to align them with another component, moves two helpers to lib/, creates a new utility file, and cleans up a few orphaned imports along the way. Fourteen files touched. The diff becomes unreadable. Review is impossible. Two regressions on arrival, one of them on a drawer unrelated to the original bug.

# The diff I should have received — strict, two lines
- href={`/admin/${item.slug}/sessions`}
+ href={`/crm/${item.slug}/sessions`}

With an AI agent, scope creep takes on a particular intensity, because the agent doesn't feel the review cost borne by the human who has to read the diff tomorrow morning. The cleaner the code, the more tempting the adjacent refactor. Strict fix scope. Adjacent refactor = separate ticket, never under cover of a fix.

What you can copy into your project

Full snippets (structured feedback template Rule / Why / How to apply, anti-tautology negative contract case, DB ↔ code audit script for shared enums) are in the silent-failure-modes/ folder of the series' companion repo, MIT license.

Three practices you can apply directly if you work with an AI agent over the long haul:

A structured feedback file per incident, written in the session where it happens. Not at the end of the project, not "when I have time". Five minutes of cost, three hours of measurable benefit three weeks later when the same mode comes back. Without that written record, the same mistake returns every two weeks, and nobody remembers it well enough to name it.
A negative expect(...).rejects.toThrow() case in every contract test suite. Without it, a bug in the assertion helper turns every contract green by construction. The presence of red is what validates green.
A quarterly DB ↔ code audit of shared enums. Half an hour per quarter, a SELECT DISTINCT on each enum column compared against the associated TypeScript constant. Any semantic layer without an audit is a delayed fragmentation bomb.

And you, which of these five modes has already cost you a session without your taking the time to name it? I read the comments.

What settles

Five modes is not a closed list. It's a snapshot at day 35. At day 70 there will be seven or eight, and some of the five will subdivide. What stays invariant is the grammar. An incident, a rule, a dated memory, and the next session inheriting what was learned the week before. A discipline doesn't get planned, it settles.

Companion code: rembrandt-samples/silent-failure-modes/, negative contract case + DB ↔ code audit script, MIT.

Five silent failure modes I codified after 35 effective days of solo ERP coding

Michel Faure — Fri, 15 May 2026 09:24:41 +0000

If you have 30 seconds. Running an AI agent over the long haul reveals something strange. The failures that cost you aren't the loud ones (a crash, a red build, a blank page) but the ones that slip through the cracks of clean code. After 35 effective days, 109,000 lines and 517 commits on a solo ERP, I isolated five recurring silent modes: the fix that doesn't fix, the test that passes by construction, the memory that confabulates, the count that lies, the scope that creeps. One scene per mode, one rule per scene. A discipline doesn't get planned, it settles.

The agent doesn't fail at random

Late April 2026, I reread my accumulated feedback files (close to a hundred, dated and indexed) and found that they clustered into five families. The loud errors (red terminal, Sentry alert) you learn to recognize as you go. The silent ones cost more. The code passes, the agent announces green, production runs, and yet something has slipped. Five modes, five scenes, five rules. None of them was decided cold.

Mode 1 — The fix that doesn't fix

An intermittent error shows up in Sentry on a sensitive endpoint. The agent proposes a patch. Three lines, elegant, that make the report disappear. Except what disappears is the symptom. The cause keeps flowing. The malformed payload upstream still produces a null, but it's now silently returned to a consumer expecting an object. The corrupted data propagates quietly into two or three tables, and you only notice when a counter you thought reliable stops being consistent with the rest.

// app/api/leads/elementor/route.ts (condensed form)
export async function POST(req: Request) {
  try {
    const body = await req.json()
    return await processLead(body)
  } catch {
    return NextResponse.json({ ok: true })  // silent workaround
  }
}

A workaround can be legitimate, provided it is explicitly acknowledged in the commit message and in a feedback file. Silent workarounds are forbidden. When a fix looks too simple for the symptom, I demand the full input → output pipeline before accepting it.
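For contrast with the silent catch, here is a minimal sketch of a handler that names the malformed payload instead of answering ok. It is framework-free on purpose; Lead, LeadResult and handleLead are hypothetical names, and the single email field is an assumption made for the example, not the real lead schema.

```typescript
// Hypothetical sketch: Lead, LeadResult and handleLead are illustrative names.
type Lead = { email: string };
type LeadResult =
  | { status: 200; lead: Lead }
  | { status: 400; error: string };

function handleLead(rawBody: string): LeadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // The malformed payload is reported to the caller, not swallowed.
    return { status: 400, error: "body is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { status: 400, error: "body must be a JSON object" };
  }
  const email = (parsed as { email?: unknown }).email;
  if (typeof email !== "string" || email.length === 0) {
    return { status: 400, error: "missing field: email" };
  }
  return { status: 200, lead: { email } };
}

const accepted = handleLead('{"email":"a@example.com"}'); // status 200
const rejected = handleLead("null");                      // status 400, no silent { ok: true }
```

The null that used to travel downstream now stops at the boundary with a reason attached, which is the trace a later audit needs.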

Mode 2 — The test that passes by construction

ADR-0044, shipped May 2nd. Five contract tests for DB ↔ code enums (statuses, roles). On first run, all five pass in three seconds. Too fast. A vague sense of a meter running on its own.

I add an explicit negative case, a variant that must fail because I deliberately misalign the DB enum and the TS enum.

// tests/contracts/inscription_statuses.contract.test.ts
it('throws when given a deliberately restricted set (anti-tautology)', async () => {
  await expect(
    assertEnumStable({
      table: 'inscriptions',
      column: 'status',
      expected: ['enrolled'],          // deliberate subset
      contractRef: '(negative test)',
    }),
  ).rejects.toThrow(/Drift DB ↔ code detected/)
})

Four of the five tests still pass. The assertion helper was silently swallowing comparisons. Without the negative case, I would have shipped a suite of Potemkin tests, green by construction, with no actual capacity to detect anything. The rule comes out in one line. Every contract test suite contains at least one negative case, otherwise it tests nothing. The presence of red is what validates green.
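Why the negative case matters can be shown without a test runner. The sketch below is not the author's assertEnumStable: it assumes a simplified synchronous helper that receives the database values directly, plus a deliberately broken variant that swallows the comparison.

```typescript
// Simplified stand-in for a contract helper: throws when the two sets differ.
function assertEnumStable(dbValues: string[], expected: string[]): void {
  const db = new Set(dbValues);
  const code = new Set(expected);
  const dbOnly = dbValues.filter((value) => !code.has(value));
  const codeOnly = expected.filter((value) => !db.has(value));
  if (dbOnly.length > 0 || codeOnly.length > 0) {
    throw new Error(
      `Drift DB ↔ code detected: db-only=[${dbOnly.join(", ")}] code-only=[${codeOnly.join(", ")}]`,
    );
  }
}

// A broken variant of the kind described above: the failure is caught and dropped,
// so every contract built on it stays green.
function assertEnumStableSwallowing(dbValues: string[], expected: string[]): void {
  try {
    assertEnumStable(dbValues, expected);
  } catch {
    // swallowed
  }
}

// The negative case: feed a deliberately wrong expectation and require a failure.
function detectsDrift(helper: (dbValues: string[], expected: string[]) => void): boolean {
  try {
    helper(["enrolled", "cancelled"], ["enrolled"]); // deliberate subset
    return false;
  } catch {
    return true;
  }
}

const soundHelperCatchesDrift = detectsDrift(assertEnumStable);                // true
const swallowingHelperCatchesDrift = detectsDrift(assertEnumStableSwallowing); // false
```

Both helpers pass every positive test with aligned enums; only the negative case tells them apart.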

Mode 3 — The memory that confabulates

First week of May, billing refactor. I tell the agent, "we had chosen pattern B for emission via the partner accounting API, right?". The agent confirms, restates what it thinks is the ADR, proposes the next step. Three hours later, I happen to reopen ADR-0007 for an unrelated detail. The sentence jumps out at me in the Decision section. It's the inverse of what the agent just confirmed. Carved there since late April.

This mode is the most insidious of the five because it leverages the human's trust in their own memory; I had validated without rereading. Memory is a point of entry, never a point of arrival. Before asserting anything from a memory file, I reopen the current ADR or code. "Do you remember..." has become, for me as for the agent, the trigger for an immediate Read on the associated memory, not a request for confirmation.

Mode 4 — The count that lies

A morning in late April, Françoise crosses the hallway with her mug, the one with her own face printed on it, an office gag she keeps up every morning. She stops at the door. "How many enrolled in May at Maisons-Laffitte?". I pass the question to the embedded analytical agent. The number arrives in six seconds, clean, formatted. She swivels toward her cockpit (Excel attendance sheet on the left, accounting software on the right, the ERP in the middle) and runs through her sheet with her finger under each name, out loud. "Yeah, that's it." And then, without changing tone, "Seven missing."

The DB enum had been renamed five days earlier on another workstream. The generated SQL was flawless, except it returned zero rows on the queried values. The agent confabulated a business explanation ("there are no overdue invoices") instead of a structural one. Five days of drift with no monitoring barking.

Françoise sees the wrong number before me because she has her own cockpit, her own attendance sheet, her own habit of comparing line by line. It's the anachronistic advantage of the house, a human still tallies on paper. But the rule cannot rest on Françoise's vigilance. Every number relayed to a human comes with its provenance query, and a quarterly DB ↔ code audit is mandatory. A semantic layer without audit is a delayed fragmentation bomb. You don't know when it goes off, you just know it will.
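One way to attach provenance to a figure, as a sketch only. The shape ProvenancedCount, the function countWithProvenance and the in-memory rows are assumptions for illustration; a real version would run the query against the database instead of filtering an array.

```typescript
// Hypothetical shape: a figure never travels without the query that produced it.
type ProvenancedCount = { value: number; query: string; warnings: string[] };

function countWithProvenance(
  rows: { status: string }[], // stand-in for the table contents
  wanted: string[],           // enum values the generated query filters on
  dbEnum: string[],           // enum values that currently exist in the database
): ProvenancedCount {
  const existing = new Set(dbEnum);
  const filter = new Set(wanted);
  const value = rows.filter((row) => filter.has(row.status)).length;

  const warnings: string[] = [];
  const unknown = wanted.filter((status) => !existing.has(status));
  if (unknown.length > 0) warnings.push(`not in DB enum: ${unknown.join(", ")}`);
  if (value === 0) warnings.push("query matched zero rows");

  const inList = wanted.map((status) => `'${status}'`).join(", ");
  return { value, query: `SELECT count(*) FROM inscriptions WHERE status IN (${inList})`, warnings };
}

// The enum value was renamed from 'enrolled' to 'active'; the query still asks for 'enrolled'.
const sampleRows = [{ status: "active" }, { status: "active" }, { status: "cancelled" }];
const staleCount = countWithProvenance(sampleRows, ["enrolled"], ["active", "cancelled"]);
// staleCount.value is 0 and staleCount.warnings points at the renamed value.
```

A zero that arrives with "not in DB enum: enrolled" invites a structural explanation before anyone invents a business one.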

Mode 5 — The scope that creeps

Visible bug, minimal scope. A button opens a drawer on the wrong route, the fix is two lines. The agent, "while I'm in the file," renames three props to harmonize them with another component, moves two helpers to lib/, creates a new utility file, and cleans up a few orphan imports along the way. Fourteen files touched. The diff is unreadable. Review impossible. Two regressions on landing, one on a drawer unrelated to the original bug.

# The diff I should have received — strict, two lines
- href={`/admin/${item.slug}/sessions`}
+ href={`/crm/${item.slug}/sessions`}

With an AI agent, scope creep takes on a particular intensity because the agent doesn't feel the cost of review for a human who has to read the diff tomorrow morning. The cleaner the code, the more tempting the adjacent refactor. Strict fix scope. Adjacent refactor = separate ticket, never under cover of a fix.
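The strict-scope rule can be checked mechanically before review. A small sketch, assuming the fix ticket declares the paths it is allowed to touch; FixTicket, scopeViolations and the file paths are all made up for the example.

```typescript
// Illustrative helper: FixTicket and scopeViolations are assumptions, not part of the companion repo.
type FixTicket = { id: string; allowedPaths: string[] };

// Returns the changed files that fall outside the paths the ticket declared.
function scopeViolations(ticket: FixTicket, changedFiles: string[]): string[] {
  return changedFiles.filter(
    (file) => !ticket.allowedPaths.some((allowed) => file === allowed || file.startsWith(allowed + "/")),
  );
}

// Made-up ticket and diff.
const drawerFix: FixTicket = { id: "FIX-1", allowedPaths: ["app/crm/sessions"] };
const touched = ["app/crm/sessions/SessionButton.tsx", "lib/format.ts", "app/admin/Drawer.tsx"];
const violations = scopeViolations(drawerFix, touched);
// violations: lib/format.ts and app/admin/Drawer.tsx, each a candidate for its own ticket
```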

What you can copy into your project

Full snippets (structured feedback template Rule / Why / How to apply, anti-tautology contract negative case, DB ↔ code enum audit script) in the silent-failure-modes/ folder of the series companion repo, MIT.

Three directly applicable practices if you work with an AI agent on the long run:

A structured feedback file per incident, written in the session where it happens. Not at the end of the project, not "when I have time." Five minutes to write, three hours saved three weeks later when the same mode comes back. Without that written record, the same mistake comes back every two weeks, and no one remembers it well enough to name it.
A negative expect(...).rejects.toThrow() case in every contract test suite. Without it, a buggy assertion helper renders every contract green by construction. The presence of red is what validates green.
A quarterly DB ↔ code audit of shared enums. Half an hour per quarter, a SELECT DISTINCT on each enum column compared to the associated TypeScript constant. Any semantic layer without audit is a delayed fragmentation bomb.
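The first practice is easier to keep up when the file writes itself. A sketch of the Rule / Why / How to apply structure as a typed record; the field names beyond those three headings, the file-naming scheme and the sample entry are assumptions, not the template from the companion repo.

```typescript
// Hypothetical record: only Rule / Why / How to apply come from the article.
type FeedbackEntry = { date: string; mode: string; rule: string; why: string; howToApply: string };

function renderFeedback(entry: FeedbackEntry): string {
  return [
    `${entry.date} ${entry.mode}`,
    `Rule: ${entry.rule}`,
    `Why: ${entry.why}`,
    `How to apply: ${entry.howToApply}`,
  ].join("\n");
}

// Dated, greppable file name, e.g. for a feedback/ folder (assumed layout).
function feedbackFileName(entry: FeedbackEntry): string {
  const slug = entry.mode.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return `${entry.date}-${slug}.md`;
}

const sampleEntry: FeedbackEntry = {
  date: "2026-05-02",
  mode: "Test passes by construction",
  rule: "Every contract test suite contains at least one negative case.",
  why: "The assertion helper swallowed comparisons and every contract stayed green.",
  howToApply: "Add one rejects.toThrow case with a deliberately wrong expectation.",
};
const sampleFileName = feedbackFileName(sampleEntry); // "2026-05-02-test-passes-by-construction.md"
```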

And you, which of these five modes has already cost you a session without you taking the time to name it? I read the comments.

What settles

Five modes is not the closed list. It's a snapshot at day 35. At day 70 there will be seven or eight, and some of the five will subdivide. What stays invariant is the grammar. An incident, a rule, a dated memory, and the next session that inherits what was learned the week before. A discipline doesn't get planned, it settles.

Companion code, rembrandt-samples/silent-failure-modes/, anti-tautology contract negative case + DB ↔ code audit script, MIT.

The SaaS config you can't `git diff`: a 30-second audit before every `update`

Michel Faure — Thu, 14 May 2026 08:50:32 +0000

Grepping in the wrong system

Friday, May 8th, end of session. I want to enable Vercel's Ignored Build Step to stop burning a build credit on every docs-only push. I grep for vercel.json at the repo root. Nothing. I conclude that no config exists and fire an updateProject with my target value. The next push goes through normally. Three days later, while answering a methodology question about commandForIgnoringBuildStep, I stumble on the previous value. It existed. It lived on the Vercel side, not in the repo. My update had just removed .claude/ from its whitelist, with no diff, no alert, no trace.

Two regimes, one reflex

Part of the production config lives in the repo. It is versioned, its history is readable in git log, and a bad merge can be undone with a revert. The other part lives on the platform side. It is stored at the vendor, mutable through the API or the Console, and your git diff doesn't see it. Neither regime is flagged in the code you are reading: you have to know, a priori, where each setting lives.

The map is familiar once named. Vercel: commandForIgnoringBuildStep, environment variables, project-level redirects. Supabase: RLS policies, custom claims, SECURITY DEFINER Auth hooks. Stripe: webhook destinations, restricted keys, OAuth Connect settings. GitHub: branch protection rules, repository secrets, rulesets. None of these settings has a versioned equivalent in the repo that consumes them.

The trap isn't the asymmetry, it's the reflex. You grep the repo, find nothing, and conclude the setting is absent, when the rule exists elsewhere, silently, and your next update is about to overwrite it without a diff.

Thirty seconds, four commands

# Audit before any updateProject / updateConfig SaaS — 30 seconds
# $PLATFORM_CLI is a placeholder for your vendor's CLI; requires bash and jq
# 1. Read the full current config (e.g. vercel projects get)
CURRENT=$( $PLATFORM_CLI projects get --json )

# 2. Compare to the target (jq -S sorts keys so ordering is not reported as a change)
diff <(echo "$CURRENT" | jq -S .config) <(jq -S . target-config.json)

# 3. List regressing fields (present before, absent after)
jq -n --argjson c "$CURRENT" --slurpfile t target-config.json \
   '($c.config | keys) - ($t[0] | keys)'

# 4. Explicit confirmation before PATCH
read -r -p "Accept the regressions listed above? (y/N) " ok && [ "$ok" = "y" ] && \
   $PLATFORM_CLI projects update --config @target-config.json

The cost is fixed: thirty seconds. The benefit is binary: either your target completes the existing config (and you push), or it regresses a field (and you rework it before pushing). No grey zone. It's the equivalent of a git diff on what git doesn't index.

The rule, in one foreign sentence

My CLAUDE.md carries a line I had read a hundred times without feeling it land, until May 8th.

Investigate before deleting or overwriting, as it may represent the user's in-progress work.

Vercel, Supabase, Stripe, GitHub: the same rule, written for an AI agent, applies to the human hand on the Console.

What you can't see hasn't disappeared.

The audit-protocol.sh protocol and two concrete instances (Vercel Ignored Build Step, Supabase RLS / Auth hooks), pseudonymized:
github.com/michelfaure/rembrandt-samples/tree/main/saas-config-platform-vs-repo

The SaaS config you can't `git diff`: a 30-second audit before every `update`

Michel Faure — Thu, 14 May 2026 08:50:30 +0000

Grepping in the wrong system

Friday, May 8th, end of session. I want to activate Vercel's Ignored Build Step to stop burning a build credit on every docs-only push. I grep vercel.json at the repo root. Nothing. I conclude no config exists and fire an updateProject with my target value. The next push goes through normally. Three days later, while answering a methodology question about commandForIgnoringBuildStep, I stumble on the previous value. It existed. It lived on the Vercel side, not in the repo. My update had just removed .claude/ from its whitelist, with no diff, no alert, no trace.

Two regimes, one reflex

Part of your production config lives in the repo. It's versioned, its history is readable in git log, a bad merge is reversible with a revert. The other part lives on the platform side. It's stored at the vendor, mutable by API or Console, and your git diff doesn't see it. Neither regime is flagged in the code you're reading — you have to know, a priori, where each setting lives.

The map is familiar once named. Vercel: commandForIgnoringBuildStep, environment variables, project-level redirects. Supabase: RLS policies, custom claims, SECURITY DEFINER Auth hooks. Stripe: webhook destinations, restricted keys, OAuth Connect settings. GitHub: branch protection rules, repository secrets, rulesets. None of these settings has a versioned equivalent in the repo that consumes them.

The trap isn't the asymmetry, it's the reflex. You grep the repo, you find nothing, you conclude absent — when the rule exists elsewhere, in silence, and your next update is about to overwrite it without a diff.

Thirty seconds, four commands

# Audit before any updateProject / updateConfig SaaS — 30 seconds
# $PLATFORM_CLI is a placeholder for your vendor's CLI; requires bash and jq
# 1. Read the full current config (e.g. vercel projects get)
CURRENT=$( $PLATFORM_CLI projects get --json )

# 2. Compare to the target (jq -S sorts keys so ordering is not reported as a change)
diff <(echo "$CURRENT" | jq -S .config) <(jq -S . target-config.json)

# 3. List regressing fields (present before, absent after)
jq -n --argjson c "$CURRENT" --slurpfile t target-config.json \
   '($c.config | keys) - ($t[0] | keys)'

# 4. Explicit confirmation before PATCH
read -r -p "Accept the regressions listed above? (y/N) " ok && [ "$ok" = "y" ] && \
   $PLATFORM_CLI projects update --config @target-config.json

The cost is fixed: thirty seconds. The benefit is binary — either your target completes the existing config (and you push), or it regresses a field (and you reformulate before pushing). No grey zone. It's the equivalent of a git diff on what git doesn't index.
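The same audit can be sketched as a pure function, which also covers the case from the opening scene: the key existed and its value was overwritten, something a keys-only difference would not flag. Config, UpdateAudit and auditUpdate are illustrative names, the values are made up, and whether an absent key is really dropped depends on whether the vendor treats the update as a merge or a replacement.

```typescript
// Illustrative types: flat config objects as returned by a platform API.
type Config = Record<string, unknown>;
type UpdateAudit = { dropped: string[]; changed: string[]; added: string[]; safe: boolean };

function auditUpdate(current: Config, target: Config): UpdateAudit {
  const has = (obj: Config, key: string) => Object.prototype.hasOwnProperty.call(obj, key);
  // JSON.stringify is enough for scalars and arrays; nested objects would need key sorting first.
  const same = (a: unknown, b: unknown) => JSON.stringify(a) === JSON.stringify(b);

  const dropped = Object.keys(current).filter((key) => !has(target, key)).sort();
  const changed = Object.keys(current)
    .filter((key) => has(target, key) && !same(current[key], target[key]))
    .sort();
  const added = Object.keys(target).filter((key) => !has(current, key)).sort();

  // Safe means the target only completes the existing config.
  return { dropped, changed, added, safe: dropped.length === 0 && changed.length === 0 };
}

// Made-up values.
const currentConfig: Config = { commandForIgnoringBuildStep: "old-command", framework: "nextjs" };
const targetConfig: Config = { commandForIgnoringBuildStep: "new-command" };
const audit = auditUpdate(currentConfig, targetConfig);
// audit.changed contains commandForIgnoringBuildStep, audit.dropped contains framework, audit.safe is false
```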

The rule, in one foreign sentence

My CLAUDE.md carries a line I had read a hundred times without feeling it land, until May 8th.

Investigate before deleting or overwriting, as it may represent the user's in-progress work.

Vercel, Supabase, Stripe, GitHub: the same rule, written for an AI agent, applies to the human hand on the Console.

What you can't see hasn't disappeared.

Protocol audit-protocol.sh and two concrete instances (Vercel Ignored Build Step, Supabase RLS / Auth hooks), pseudonymized:
github.com/michelfaure/rembrandt-samples/tree/main/saas-config-platform-vs-repo

Fifteen lines of Proxy so that an SDK stops breaking my CI

Michel Faure — Wed, 13 May 2026 09:19:52 +0000

The Friday Vercel refused my merge

Friday, April 10th, late afternoon. I merge a Stripe integration into main to open a payment webhook endpoint. Vercel kicks off the preview build automatically, and three minutes later the icon turns red. I click. Stack trace at build time:

Error: STRIPE_SECRET_KEY manquant
    at Object.<anonymous> (/.next/server/chunks/lib_stripe.js:9:11)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)

Production works, it has the environment variable. The preview doesn't have the Stripe secret: I had forgotten to push it to the Vercel preview env. An operator error on my part, fine. But one question stays with me: why does next build crash while loading a module that is never supposed to run during the static build?

Why `next build` runs the top level of my modules

The answer fits in one line in the Next.js docs, and it's easy to miss. next build does more than compile TypeScript to JavaScript. While it collects page data, it loads every page and route module to read its configuration, and loading a module runs its top level along with the top level of everything it imports. Concretely, my lib/stripe.ts contained this at the time:

import Stripe from 'stripe'

export const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
  apiVersion: '2026-03-25.dahlia',
})

new Stripe(...) is an immediately evaluated expression. The Stripe SDK checks the key in its constructor and throws if it is undefined. That check therefore happens during next build, before any real request exists. My webhook endpoint was never called, but the mere fact that app/api/webhooks/stripe/route.ts imports lib/stripe.ts is enough to trigger the module's execution, and the crash.

The Stripe SDK is perfectly right to validate its key early. The fail-fast principle (Shore 2004) says that a system should fail as close as possible to the cause of the error. In production that is exactly what I want: a missing secret should crash at startup, not three days later on a rare call. The problem is that fail fast becomes fail at build in an architecture where the build is a strict environment, distinct from the runtime environment.

The trap isn't Stripe

I dug around the repo a bit after that Friday. The same trap awaits every SDK that validates its credentials in the constructor. The list is longer than you'd think: Twilio, some official OpenAI and Anthropic clients depending on the version, several Google Cloud SDKs, the Brevo client in strict mode. Each has its own equivalent of throw new Error('XXX_API_KEY missing') in the constructor, and each will break your build the same way as soon as you import it in a route that Next.js compiles.

The symptom typically shows up on preview builds. Production has every secret, local dev has a complete .env.local, but CI and previews have subsets of the env vars depending on team policy. A recent route goes through its first CI run, and the build falls over.

The pattern: Proxy plus lazy getter

The fix fits in fifteen lines. The principle: never create the SDK client at the top level. Expose instead a Proxy object that, on each property access, instantiates the client if needed and delegates to it. The missing-credentials error now surfaces only on the first real API call.

// lib/stripe.ts
import Stripe from 'stripe'

let _stripe: Stripe | null = null

function getStripe(): Stripe {
  if (_stripe) return _stripe
  const key = process.env.STRIPE_SECRET_KEY
  if (!key) throw new Error('STRIPE_SECRET_KEY manquant')
  _stripe = new Stripe(key, { apiVersion: '2026-03-25.dahlia' })
  return _stripe
}

export const stripe = new Proxy({} as Stripe, {
  get(_target, prop, receiver) {
    const client = getStripe()
    const value = Reflect.get(client, prop, receiver)
    return typeof value === 'function' ? value.bind(client) : value
  },
})

Three things to note in this code. First, the Proxy is exported under the same name and the same type as the old export, stripe: Stripe. Every existing caller that did stripe.checkout.sessions.create(...) keeps working without any change. That is the main reason to choose a Proxy over a getStripe() export that would have to be called everywhere: you avoid touching the 30 or 40 files that consume the SDK's public API.

Second, the bind(client) on methods is necessary because the Stripe SDK's methods use this internally. Without the bind, the context is lost on the way through the Proxy and you get TypeError: Cannot read properties of undefined.

Third, the _stripe cache is not a performance detail, it is a consistency guarantee. Without it, every property access would create a new client, which would break stateful behaviors (the SDK's internal rate limiters, for example) and multiply keep-alive HTTP connections.

When to apply the pattern, and when not to

The pattern pays off whenever an SDK is consumed by a rarely exercised route (webhooks, admin endpoints, cron jobs that only run as Vercel scheduled jobs) and the secret is not systematically present in every build environment. That is exactly my Stripe webhook case: a single caller, and a single environment (prod) that has the key.

Conversely, if the SDK is consumed everywhere in the app and its absence at build time means your app cannot work, the Proxy protects you only symbolically. You merely move the crash from the build to the first render of the first page, which is rarely an improvement. In that case, put the secret everywhere and don't invent a pattern.

A small middle ground: if the SDK has a dry-run mode or a mock client, instantiate that client when the secret is absent rather than throwing. It's more surgical, but it assumes the SDK provides the option, which few do.

What you can copy

The snippet above can be copied as-is, apart from the SDK name and the env variable name. Three common adaptations:

// Twilio
import twilio from 'twilio'
let _client: ReturnType<typeof twilio> | null = null
function getClient() {
  if (_client) return _client
  const sid = process.env.TWILIO_ACCOUNT_SID
  const token = process.env.TWILIO_AUTH_TOKEN
  if (!sid || !token) throw new Error('TWILIO credentials missing')
  _client = twilio(sid, token)
  return _client
}
export const twilioClient = new Proxy({} as ReturnType<typeof twilio>, {
  get(_t, prop, r) {
    const c = getClient()
    const v = Reflect.get(c, prop, r)
    return typeof v === 'function' ? v.bind(c) : v
  },
})
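The Stripe and Twilio versions share one shape, so the pattern can be factored into a generic helper. This is a sketch under assumptions: lazyClient is a hypothetical helper, not part of any SDK, and FakeSdk plus the fakeEnv object stand in for a real client and process.env so the example runs without dependencies. Unlike the snippets above, it passes the real instance as the receiver to Reflect.get, so that getters on the client see the client rather than the Proxy.

```typescript
// Hypothetical helper: defers construction to the first property access and caches the instance.
function lazyClient<T extends object>(create: () => T): T {
  let instance: T | null = null;
  return new Proxy({} as T, {
    get(_target, prop) {
      if (instance === null) instance = create(); // may throw here, at first real use
      const value = Reflect.get(instance, prop, instance);
      return typeof value === "function" ? value.bind(instance) : value;
    },
  });
}

// Stand-in for an SDK that validates its credentials in the constructor.
let constructed = 0;
class FakeSdk {
  private readonly key: string;
  constructor(key: string | undefined) {
    if (!key) throw new Error("FAKE_API_KEY missing");
    this.key = key;
    constructed += 1;
  }
  whoami(): string {
    return `key ending in ${this.key.slice(-4)}`;
  }
}

const fakeEnv: Record<string, string | undefined> = {}; // stand-in for process.env

// Module top level: nothing is constructed yet, so a missing key cannot break an import.
const sdk = lazyClient(() => new FakeSdk(fakeEnv.FAKE_API_KEY));

fakeEnv.FAKE_API_KEY = "sk_test_1234";
const detached = sdk.whoami;       // first access constructs the client
const firstCall = detached();      // "key ending in 1234", bind keeps `this`
const secondCall = sdk.whoami();   // reuses the cached instance
```

The trade-off is the one described above: a missing key now fails at the first call instead of at import time, which suits rarely exercised routes and nothing else.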

The pattern is no revolution, and it isn't new. It's just rarely spelled out in these terms by SDK docs, which steer you toward the top-level new Client(...) that was the right reflex before serverless. In the era of compiled builds and multi-env previews, the top-level constructor has become a silent trap, and these fifteen lines defuse it.

My question for you: how many SDK imports do you currently have at the top level of a module that an API route imports? On Rembrandt I had four. I migrated the other three after the Stripe incident, ahead of the day one of their secrets goes missing from a build environment.

Companion code: rembrandt-samples/lazy-sdk-proxy/, Proxy pattern on Stripe + Twilio + Anthropic, MIT, ready to copy.

Fifteen lines of Proxy to keep an SDK from breaking my CI

Michel Faure — Wed, 13 May 2026 09:19:49 +0000

The Friday Vercel refused my merge

Friday, April 10th, late afternoon. I merge a Stripe integration into main that adds a payment webhook endpoint. Vercel kicks off the preview build automatically, and three minutes later the status icon turns red. I click through to the build-time stack trace:

Error: STRIPE_SECRET_KEY missing
    at Object.<anonymous> (/.next/server/chunks/lib_stripe.js:9:11)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)

Production works: it has the env var. The preview doesn't have the Stripe secret, because I had forgotten to add it to the Vercel preview environment. Operator error on my side, fine. But one question remains: why does next build crash at module load, on a module that is never supposed to run during a static build?

Why `next build` runs the top level of my modules

The answer fits in one line in the Next.js docs, and it's easy to miss. next build doesn't just transform TypeScript into JavaScript. To analyze API routes and prepare the serverless runtime, it loads the compiled route modules in Node while collecting page data, which runs the top level of each route module and of every module it imports. Concretely, my lib/stripe.ts looked like this at the time:

import Stripe from 'stripe'

export const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
  apiVersion: '2026-03-25.dahlia',
})

new Stripe(...) is an immediately-evaluated expression. The Stripe SDK validates the key in its constructor and throws if it's undefined. That validation therefore fires during next build, before any real request exists. My webhook endpoint was never called, but the mere fact that app/api/webhooks/stripe/route.ts imports lib/stripe.ts is enough to trigger module execution — and the crash.

The Stripe SDK is right to validate its key early. The fail-fast principle (Shore, 2004) says that a system should fail as close as possible to the cause of the error. In production that's exactly what I want: a missing secret should crash on startup, not three days later on a rare call. The problem is that fail fast becomes fail at build in an architecture where the build is a strict environment, distinct from the runtime environment.
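To see the mechanism in isolation, here is a minimal sketch. FakeSdk is a hypothetical stand-in for any SDK that validates its key in the constructor, and buildEnv plays the role of process.env in a preview build; neither comes from Stripe or Next.js. The two evaluate* functions mimic what evaluating a module's top level does in each style.

```typescript
// Hypothetical stand-in for an SDK whose constructor validates its key.
class FakeSdk {
  readonly apiKey: string

  constructor(apiKey: string | undefined) {
    if (!apiKey) throw new Error('FAKE_API_KEY missing')
    this.apiKey = apiKey
  }

  ping(): string {
    return 'pong'
  }
}

// Simulates a build environment where the secret was never configured.
const buildEnv: Record<string, string | undefined> = {}

// Eager style: equivalent to a top-level `export const sdk = new FakeSdk(...)`.
// The constructor runs as soon as the module is evaluated.
function evaluateEagerModule(): FakeSdk {
  return new FakeSdk(buildEnv.FAKE_API_KEY)
}

// Deferred style: evaluating the module only defines a function.
// Nothing is validated until that function is called.
function evaluateDeferredModule(): () => FakeSdk {
  return () => new FakeSdk(buildEnv.FAKE_API_KEY)
}

// Evaluating the eager module fails immediately, before any request exists.
let eagerError: string | null = null
try {
  evaluateEagerModule()
} catch (err) {
  eagerError = (err as Error).message
}

// Evaluating the deferred module succeeds even though the key is missing.
const getSdk = evaluateDeferredModule()

// The same error only appears once something actually asks for the client.
let firstCallError: string | null = null
try {
  getSdk().ping()
} catch (err) {
  firstCallError = (err as Error).message
}
```

The error is identical in both cases; only the moment it fires changes, from module evaluation to first use.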

The trap is not Stripe

I dug a bit through the repo after that Friday. The same trap awaits every SDK that validates its credentials in the constructor. The list is longer than you'd think: Twilio, certain official OpenAI and Anthropic clients depending on version, several Google Cloud SDKs, the Brevo client in strict mode. Each has its equivalent of throw new Error('XXX_API_KEY missing') in the constructor, and each will break your build the same way as soon as you import it from a route Next.js compiles.

The symptom typically shows up on preview builds. Production has every secret, local dev has a complete .env.local, but CI and previews carry subsets of env vars depending on team policy. A recent route runs through CI for the first time, and the build falls over.

The pattern: Proxy plus lazy getter

The fix fits in fifteen lines. The principle: never create the SDK client at the top level. Instead, expose a Proxy object that, on every property access, instantiates the client if needed and delegates. A missing-credentials error surfaces only on the first real API call.

// lib/stripe.ts
import Stripe from 'stripe'

let _stripe: Stripe | null = null

function getStripe(): Stripe {
  if (_stripe) return _stripe
  const key = process.env.STRIPE_SECRET_KEY
  if (!key) throw new Error('STRIPE_SECRET_KEY missing')
  _stripe = new Stripe(key, { apiVersion: '2026-03-25.dahlia' })
  return _stripe
}

export const stripe = new Proxy({} as Stripe, {
  get(_target, prop, receiver) {
    const client = getStripe()
    const value = Reflect.get(client, prop, receiver)
    return typeof value === 'function' ? value.bind(client) : value
  },
})

Three things to note in this code. First, the Proxy is exported with the same name and the same type as the previous export — stripe: Stripe. Every existing caller doing stripe.checkout.sessions.create(...) keeps working without a single change. That's the main reason to choose Proxy over an exported getStripe() you'd have to call everywhere: you avoid touching 30 or 40 files that consume the SDK's public API.

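Second, the bind(client) on methods is necessary because Stripe SDK methods use this internally. Without bind, a method read off the Proxy no longer runs against the real client: this becomes the Proxy itself, or undefined once the method is detached and passed around, and you get errors such as TypeError: Cannot read properties of undefined.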

Third, the _stripe cache isn't a performance detail — it's a consistency guarantee. Without it, every property access would create a new client, which would break stateful behaviors (the SDK's internal rate limiters, for example) and multiply HTTP keep-alive connections.
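The same three properties can be packaged once and reused. The sketch below is one possible generalization, not the code from Rembrandt: lazyClient, FakeClient and the env object are hypothetical names, and env stands in for process.env. Unlike the snippet above, it passes the real client as the receiver to Reflect.get so that getters also run against the real client; that is a choice of this sketch.

```typescript
// Generic lazy Proxy: builds the client on first property access, then reuses it.
// `lazyClient` is a hypothetical helper name, not part of any SDK.
function lazyClient<T extends object>(create: () => T): T {
  let instance: T | null = null

  const resolve = (): T => {
    if (instance === null) instance = create() // may throw on missing credentials
    return instance
  }

  return new Proxy({} as T, {
    get(_target, prop) {
      const client = resolve()
      const value = Reflect.get(client, prop, client)
      // Bind methods so `this` stays the real client, even when detached.
      return typeof value === 'function' ? value.bind(client) : value
    },
  })
}

// Hypothetical stand-in for an SDK client that validates its key on construction.
let constructed = 0 // counts how many real clients were built

class FakeClient {
  calls = 0
  readonly apiKey: string

  constructor(apiKey: string | undefined) {
    if (!apiKey) throw new Error('FAKE_API_KEY missing')
    this.apiKey = apiKey
    constructed += 1
  }

  // Relies on `this`, like most SDK methods do.
  send(): number {
    this.calls += 1
    return this.calls
  }
}

const env: Record<string, string | undefined> = { FAKE_API_KEY: 'sk_test_123' }

// Creating the proxy constructs nothing.
const client = lazyClient(() => new FakeClient(env.FAKE_API_KEY))
const builtBeforeFirstUse = constructed

const firstCall = client.send()
const { send } = client // detached method still works thanks to bind
const secondCall = send()
const builtAfterUse = constructed

// With a missing key, the proxy is still created; the error waits for first use.
const broken = lazyClient(() => new FakeClient(undefined))
let brokenError: string | null = null
try {
  broken.send()
} catch (err) {
  brokenError = (err as Error).message
}
```

Callers keep the type T and never see the factory, which is what lets the export keep its old name and shape.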

When to apply the pattern, and when not to

The pattern pays off whenever an SDK is consumed by a rarely exercised route (webhooks, admin endpoints, cron jobs that only run on Vercel's scheduler) and the secret isn't present in every build environment. That's exactly my Stripe webhook case: one caller, and one environment (production) that has the key.

Conversely, if the SDK is consumed everywhere in the app and its absence at build means your app cannot function, the Proxy only protects you symbolically. You're just shifting the crash from build to first-render of the first page, which is rarely an improvement. In that case, put the secret everywhere and don't invent a pattern.

A small middle ground: if the SDK has a dry-run mode or a mock client, instantiate that client when the secret is missing instead of throwing. It's more surgical, but it assumes the SDK provides the option — and few do.
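A rough sketch of that middle ground, under stated assumptions: SmsClient, DryRunSmsClient, createSmsClient and the SMS_API_TOKEN variable are all hypothetical, and the env parameter stands in for process.env. A real SDK would need its own adapter, and only makes sense here if a mock or dry-run client is actually meaningful for it.

```typescript
// Hypothetical, minimal client surface. Real SDKs expose far more than this.
interface SmsClient {
  send(to: string, body: string): Promise<{ id: string; dryRun: boolean }>
}

// Dry-run client: records what would have been sent instead of calling an API.
class DryRunSmsClient implements SmsClient {
  readonly outbox: Array<{ to: string; body: string }> = []

  async send(to: string, body: string): Promise<{ id: string; dryRun: boolean }> {
    this.outbox.push({ to, body })
    return { id: `dry-run-${this.outbox.length}`, dryRun: true }
  }
}

// Returns the real client when the secret is present, the dry-run one otherwise.
// `createReal` wraps the actual SDK constructor (assumption: it needs one token).
function createSmsClient(
  env: Record<string, string | undefined>,
  createReal: (token: string) => SmsClient,
): SmsClient {
  const token = env.SMS_API_TOKEN
  return token ? createReal(token) : new DryRunSmsClient()
}

// Placeholder factory for the real client; it records the tokens it was given.
const builtWith: string[] = []
const createReal = (token: string): SmsClient => {
  builtWith.push(token)
  return {
    async send(to: string) {
      return { id: `real-${to}`, dryRun: false }
    },
  }
}

// Preview build without the secret: no throw, and the real factory is never called.
const previewClient = createSmsClient({}, createReal)

// Environment with the secret: the real factory is used.
const prodClient = createSmsClient({ SMS_API_TOKEN: 'token-123' }, createReal)
```

In production you would probably still want the hard failure, so a fallback like this is usually worth restricting to build and preview environments.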

What you can copy

The code above is fully copyable, modulo the SDK name and the env variable name. Here is the same pattern adapted to Twilio:

// Twilio
import twilio from 'twilio'
let _client: ReturnType<typeof twilio> | null = null
function getClient() {
  if (_client) return _client
  const sid = process.env.TWILIO_ACCOUNT_SID
  const token = process.env.TWILIO_AUTH_TOKEN
  if (!sid || !token) throw new Error('TWILIO credentials missing')
  _client = twilio(sid, token)
  return _client
}
export const twilioClient = new Proxy({} as ReturnType<typeof twilio>, {
  get(_t, prop, r) {
    const c = getClient()
    const v = Reflect.get(c, prop, r)
    return typeof v === 'function' ? v.bind(c) : v
  },
})

The pattern isn't a revolution, and it isn't new. It's just rarely spelled out this way in SDK docs, which push you toward a top-level new Client(...), the right reflex before serverless. In the era of compiled builds and multi-env previews, the top-level constructor has become a silent trap, and these fifteen lines neutralize it.

My question for you: how many SDK clients do you currently construct at the top level of a module that an API route imports? On Rembrandt I had four. I migrated the other three after the Stripe incident, in anticipation of the day one of their secrets would disappear from a build environment.

Companion code: rembrandt-samples/lazy-sdk-proxy/ — lazy-Proxy pattern on Stripe + Twilio + Anthropic SDKs, MIT, copy-pastable.
