<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kingsley Onoh</title>
    <description>The latest articles on DEV Community by Kingsley Onoh (@kingsleyonoh).</description>
    <link>https://dev.to/kingsleyonoh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F568563%2F01c09769-b072-47a3-9c7b-16fefc2c573e.png</url>
      <title>DEV Community: Kingsley Onoh</title>
      <link>https://dev.to/kingsleyonoh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kingsleyonoh"/>
    <language>en</language>
    <item>
      <title>JSONB Was Fine. The Side Effects Needed a State Boundary.</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Tue, 16 Jun 2026 10:26:01 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/jsonb-was-fine-the-side-effects-needed-a-state-boundary-g5c</link>
      <guid>https://dev.to/kingsleyonoh/jsonb-was-fine-the-side-effects-needed-a-state-boundary-g5c</guid>
      <description>&lt;p&gt;What should happen when a checklist item sends a client message?&lt;/p&gt;

&lt;p&gt;In this portal, that question starts with a milestone stored as a JSON object inside &lt;code&gt;projects.milestones_json&lt;/code&gt;. The same milestone can also be a task linkage through &lt;code&gt;tasks.milestoneKey&lt;/code&gt;. It can produce a client-visible update. It can fire a Notification Hub event. It can change what the client sees in the portal and what the operator sees in the CLI.&lt;/p&gt;

&lt;p&gt;That is too much authority for one checklist item.&lt;/p&gt;

&lt;p&gt;The early temptation was simple: keep milestones as JSONB because project setup needed to be fast. A client project does not need a full ceremony for every step. Sometimes the operator needs to add three milestones from the CLI, mark the first one done, and move on. A separate milestone table felt heavier than the problem.&lt;/p&gt;

&lt;p&gt;I still think that part was right.&lt;/p&gt;

&lt;p&gt;The part I got wrong was assuming the storage choice was the design decision. It wasn't. The real decision was what happens when that flexible JSON object creates side effects outside the JSON field.&lt;/p&gt;

&lt;h2&gt;The Real Problem&lt;/h2&gt;

&lt;p&gt;A milestone stored as JSONB is easy to edit and hard to govern. Postgres will store the array. Drizzle will read it back. TypeScript can normalize the shape. None of that answers the business question: when a milestone is marked done, who is allowed to know?&lt;/p&gt;

&lt;p&gt;The portal had several competing truths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The admin route knows which project is being changed.&lt;/li&gt;
&lt;li&gt;The client record knows whether notifications are enabled.&lt;/li&gt;
&lt;li&gt;The project knows whether it is client-visible or internal-only.&lt;/li&gt;
&lt;li&gt;The milestone object knows its own &lt;code&gt;notifyClient&lt;/code&gt; and &lt;code&gt;notificationMode&lt;/code&gt; values.&lt;/li&gt;
&lt;li&gt;The task table knows whether linked work is complete.&lt;/li&gt;
&lt;li&gt;The notification layer knows whether to emit an event.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any one of those edges is checked loosely, the result can be wrong while every individual function still returns &lt;code&gt;200&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That is the kind of failure that bothers me most. Not a crash. A clean success response attached to the wrong business truth.&lt;/p&gt;

&lt;p&gt;The build journals had already taught that lesson elsewhere. One batch found a path where the project in the URL and the project on the update could diverge inside the same tenant. Another found joined project and client rows being hydrated by ID alone, after only the root row had been tenant-scoped. The first query was safe. The side edge was not.&lt;/p&gt;

&lt;p&gt;I was wrong to treat tenant isolation as a problem solved at the start of a request. Every side effect has its own identity boundary.&lt;/p&gt;

&lt;h2&gt;The Constraints&lt;/h2&gt;

&lt;p&gt;I did not want a new table just to make the design feel pure. The system was still an internal operating tool. The PRD target was under 50 clients and 20 peak requests per second. The operator needed speed more than relational ceremony.&lt;/p&gt;

&lt;p&gt;Milestones also had to stay pleasant from the CLI. The code already had a route that marks a milestone done by a 1-based index. That is not glamorous, but it matches how operators think: first milestone, second milestone, third milestone. Forcing every milestone through IDs too early would make the command surface worse.&lt;/p&gt;

&lt;p&gt;But the shortcut had limits.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;projects.milestones_json&lt;/code&gt; can hold flexible milestone data. It cannot decide whether an email should be sent. It cannot prove the project is tenant-scoped. It cannot decide whether a task title should be backfilled into a milestone key. It cannot stop an internal-only project from producing a client-visible message.&lt;/p&gt;

&lt;p&gt;So the storage stayed flexible, and the side effects became strict.&lt;/p&gt;

&lt;h2&gt;The Design&lt;/h2&gt;

&lt;p&gt;The core helper is &lt;code&gt;normalizeMilestones()&lt;/code&gt; in &lt;code&gt;src/lib/milestones.ts&lt;/code&gt;. It does the unglamorous work first: discard empty names, preserve known keys, generate missing keys, deduplicate collisions, and coerce &lt;code&gt;done&lt;/code&gt; into a real boolean. That gave the JSONB field a predictable shape without changing the database design.&lt;/p&gt;
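
&lt;p&gt;A minimal sketch of what that normalization could look like, written from the description above rather than copied from &lt;code&gt;src/lib/milestones.ts&lt;/code&gt;. The &lt;code&gt;Milestone&lt;/code&gt; shape, the slug-based key generation, the numeric suffix used to resolve collisions, and the rule that only a literal &lt;code&gt;true&lt;/code&gt; counts as done are all assumptions made for illustration.&lt;/p&gt;

```typescript
// Hypothetical milestone shape; the real fields in the portal may differ.
type Milestone = { key: string; name: string; done: boolean };
type RawMilestone = { key?: unknown; name?: unknown; done?: unknown };

// Assumed key strategy: a lowercase slug of the milestone name.
function slugify(name: string): string {
  const slug = name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return slug.length > 0 ? slug : "milestone";
}

function normalizeMilestonesSketch(raw: RawMilestone[]): Milestone[] {
  const seen = new Set();
  const result: Milestone[] = [];
  for (const item of raw) {
    const name = typeof item.name === "string" ? item.name.trim() : "";
    if (name.length === 0) continue; // discard empty names

    // Preserve a known key, otherwise generate one from the name.
    const known = typeof item.key === "string" ? item.key.trim() : "";
    const base = known.length > 0 ? known : slugify(name);

    // Deduplicate collisions by appending -2, -3, ...
    let key = base;
    let suffix = 2;
    while (seen.has(key)) {
      key = `${base}-${suffix}`;
      suffix += 1;
    }
    seen.add(key);

    // Coerce done into a real boolean (assumption: only literal true counts).
    result.push({ key, name, done: item.done === true });
  }
  return result;
}
```

&lt;p&gt;Under these assumptions, an entry with an empty name is dropped, a string value such as &lt;code&gt;"yes"&lt;/code&gt; in &lt;code&gt;done&lt;/code&gt; becomes &lt;code&gt;false&lt;/code&gt;, and a second entry whose key collides with an earlier one is stored under a suffixed key instead of overwriting it.&lt;/p&gt;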

&lt;p&gt;Then &lt;code&gt;syncProjectMilestoneStatus()&lt;/code&gt; handles the inverse direction. If tasks are linked to a milestone key, their state can mark a milestone complete only when the linked tasks justify it. That lets the task table and JSONB array communicate without pretending JSONB is relational.&lt;/p&gt;

&lt;p&gt;The more interesting function sits in &lt;code&gt;src/routes/admin/projects.ts&lt;/code&gt;. It is small enough to look boring, which is why I like it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shouldEmitMilestoneNotification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;notifyClient&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;notificationMode&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;silent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;material_updates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;all&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;preferenceMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NotificationPreferenceMode&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;notificationsEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;projectVisibility&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;visibility&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enumValues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notificationsEnabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;client notifications disabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;projectVisibility&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;internal_only&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;project is internal only&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notifyClient&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notifyClient false&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notificationMode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;silent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notificationMode silent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notifyClient&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notifyClient true&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notificationMode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;material_updates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notificationMode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;all&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`notificationMode &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notificationMode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferenceMode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;material_updates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferenceMode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;all&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`project notification preference &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferenceMode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;emitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`project notification preference &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferenceMode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a clever algorithm. It is a contract.&lt;/p&gt;

&lt;p&gt;The order matters. Client notifications being disabled beats everything. Internal-only project visibility beats an eager milestone setting. An explicit &lt;code&gt;notifyClient: false&lt;/code&gt; beats the project preference. A milestone-level &lt;code&gt;silent&lt;/code&gt; mode beats both an explicit &lt;code&gt;notifyClient: true&lt;/code&gt; and the project preference. Only after those denials does the function permit an event.&lt;/p&gt;

&lt;p&gt;That order reflects the business. The safest choice must win first.&lt;/p&gt;

&lt;p&gt;The route around it does the heavier lifting. It loads the project by &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;tenantId&lt;/code&gt;. It loads the client by &lt;code&gt;id&lt;/code&gt; and the same &lt;code&gt;tenantId&lt;/code&gt;. It normalizes milestones before touching the selected 1-based index. It treats an already-completed milestone as idempotent. It writes a portal update. Only then does it ask whether to emit the Notification Hub event.&lt;/p&gt;

&lt;p&gt;The notification event itself is fire-and-forget. That is a separate design choice from the milestone boundary. The client state change should not roll back because an email layer is unavailable. The update exists. The portal can show it. The notification failure can be logged and retried outside the request path.&lt;/p&gt;

&lt;p&gt;Tests make the boundary real. The event tests cover milestone completion, suppression when &lt;code&gt;notifyClient&lt;/code&gt; is false, suppression when the project preference is portal-only, and the case where a portal update should exist even when email does not fire. Those are not framework tests. They are business truth tests.&lt;/p&gt;

&lt;p&gt;The backfill route is another scar. Tasks gained &lt;code&gt;milestoneKey&lt;/code&gt; after milestone JSON already existed. The route defaults to dry-run and requires &lt;code&gt;confirm=true&lt;/code&gt; before it mutates task titles and milestone keys. That is what a state boundary looks like when the data model changes underneath a live operator workflow: preview first, then write.&lt;/p&gt;
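
&lt;p&gt;The preview-then-write shape can be sketched as follows. This is not the portal's route: the &lt;code&gt;Task&lt;/code&gt; and &lt;code&gt;MilestoneRef&lt;/code&gt; shapes, the function names, and especially the title-matching rule are hypothetical, and the sketch only sets &lt;code&gt;milestoneKey&lt;/code&gt;. What it is meant to show is that the plan is always computed and nothing is written unless the caller confirms.&lt;/p&gt;

```typescript
// Hypothetical shapes; the real task and milestone records carry more fields.
type Task = { id: string; title: string; milestoneKey: string | null };
type MilestoneRef = { key: string; name: string };
type PlannedChange = { taskId: string; milestoneKey: string };

// Assumed matching rule for illustration: link an unlinked task to the
// first milestone whose name appears in the task title.
function planBackfill(tasks: Task[], milestones: MilestoneRef[]): PlannedChange[] {
  const planned: PlannedChange[] = [];
  for (const task of tasks) {
    if (task.milestoneKey !== null) continue; // already linked, leave it alone
    const title = task.title.toLowerCase();
    const match = milestones.find((m) => title.includes(m.name.toLowerCase()));
    if (match) planned.push({ taskId: task.id, milestoneKey: match.key });
  }
  return planned;
}

// Dry-run by default: the plan is returned either way, but tasks are only
// mutated when confirm is true.
function runBackfill(tasks: Task[], milestones: MilestoneRef[], confirm = false) {
  const planned = planBackfill(tasks, milestones);
  if (!confirm) return { dryRun: true, planned, applied: 0 };
  for (const change of planned) {
    const task = tasks.find((t) => t.id === change.taskId);
    if (task) task.milestoneKey = change.milestoneKey;
  }
  return { dryRun: false, planned, applied: planned.length };
}
```

&lt;p&gt;Calling &lt;code&gt;runBackfill(tasks, milestones)&lt;/code&gt; returns the planned changes and leaves every task untouched. Calling it again with &lt;code&gt;confirm&lt;/code&gt; set to &lt;code&gt;true&lt;/code&gt; applies the same plan, so the operator sees exactly what will change before it does.&lt;/p&gt;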

&lt;h2&gt;What Surprised Me&lt;/h2&gt;

&lt;p&gt;I expected the risky part to be JSONB. It was not.&lt;/p&gt;

&lt;p&gt;The risky part was side-effect drift. A JSON object can be perfectly valid and still produce the wrong portal update. A project can be tenant-scoped and still attach a joined row carelessly later. A notification can have the right event name and the wrong visibility rule.&lt;/p&gt;

&lt;p&gt;That changed how I read the rest of the portal code. I stopped asking only, "Is the row scoped?" I started asking, "Which other facts will this action create, and do they share the same boundary?"&lt;/p&gt;

&lt;p&gt;That question shows up everywhere in this project: comments, reports, document caches, capacity notices, stale work, handoff summaries, and client asks. The portal is not just storing facts. It is deciding which facts are safe to expose.&lt;/p&gt;

&lt;h2&gt;The Result&lt;/h2&gt;

&lt;p&gt;The final design kept the flexibility that made milestones useful from the CLI, but moved the risk into explicit gates. JSONB still holds the editable project steps. Tasks can link to those steps. Completion can produce a portal update. Notifications can fire only when the client, project, milestone, and preference rules agree.&lt;/p&gt;

&lt;p&gt;The latest recorded build gate reached 416 total tests, with 398 passing and 18 skipped. More important than the count, the tests caught wrong-but-running states: missing ask classification, stale internal project noise, QA due noise, generic reply drafts, and untrimmed handoff output.&lt;/p&gt;

&lt;p&gt;That is the transferable part. Flexible storage is fine when the business fact is local. The moment it creates a side effect, treat it like a state machine.&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>postgres</category>
      <category>jsonb</category>
      <category>statemachines</category>
    </item>
    <item>
      <title>Why I made OR-Tools prove it was better than the deterministic dispatcher</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 15 Jun 2026 20:05:19 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/why-i-made-or-tools-prove-it-was-better-than-the-deterministic-dispatcher-46i9</link>
      <guid>https://dev.to/kingsleyonoh/why-i-made-or-tools-prove-it-was-better-than-the-deterministic-dispatcher-46i9</guid>
      <description>&lt;p&gt;Dispatch optimization needs a lower bound before it needs a clever objective.&lt;/p&gt;

&lt;p&gt;In the first real OR-Tools integration, the solver selected fewer assignments than the deterministic fallback it was supposed to improve on. That result made the boundary explicit: CP-SAT could optimize cost, priority, and tie-breakers only after it matched or beat the deterministic feasible assignment count.&lt;/p&gt;

&lt;p&gt;The constraint changed how I treated OR-Tools inside the dispatch engine. I had treated the solver as the smarter engine in the room. The code reminded me that dispatch combines math with an operating record. A plan has timestamps, frozen work, post-selection capacity checks, replay metrics, and explanations a dispatcher can defend after the board changes.&lt;/p&gt;

&lt;h2&gt;The tempting version&lt;/h2&gt;

&lt;p&gt;The tempting version is simple. Build one boolean variable per eligible technician-job decision. Add constraints for job uniqueness, technician capacity, planning windows, and frozen work. Maximize the objective. Return the result.&lt;/p&gt;

&lt;p&gt;That version reads well in a design doc. It is also too trusting for dispatch.&lt;/p&gt;

&lt;p&gt;A field-service board has commitments. A dispatcher accepts a plan. A technician starts driving. A supervisor freezes a job. A customer is waiting against an SLA clock. If the solver returns an answer that is mathematically feasible but operationally worse, the system still has to notice.&lt;/p&gt;

&lt;h2&gt;The code that changed the contract&lt;/h2&gt;

&lt;p&gt;The final adapter runs deterministic solving first, uses that count as a lower bound, and then lets CP-SAT optimize within that boundary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;vars&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;linearArgs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;decisions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;_&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;deterministicAssignmentCount&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;fallback&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;solve&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;assignments&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;size&lt;/span&gt;
&lt;span class="nf"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deterministicAssignmentCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;addGreaterOrEqual&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;LinearExpr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vars&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="nv"&gt;deterministicAssignmentCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toLong&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;coeffs&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;decisions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;map&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;assignmentReward&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000L&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toLong&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000L&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;cost&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;total&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;setScale&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;BigDecimal&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;RoundingMode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;HALF_UP&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toLong&lt;/span&gt;
  &lt;span class="n"&gt;assignmentReward&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;jobIndex&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;toLong&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100L&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;
    &lt;span class="nf"&gt;technicianIndex&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;technician&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;toLong&lt;/span&gt;
&lt;span class="o"&gt;}.&lt;/span&gt;&lt;span class="py"&gt;toArray&lt;/span&gt;
&lt;span class="nv"&gt;cp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;maximize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;LinearExpr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;weightedSum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vars&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coeffs&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sample is from &lt;code&gt;OrToolsSolverAdapter.solveWithCpSat&lt;/code&gt;. The assignment reward is intentionally large. Priority affects the reward. Cost is scaled to an integer. Job and technician indexes act as stable tie-breakers.&lt;/p&gt;
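
&lt;p&gt;To see how the terms weigh against each other, the same coefficient arithmetic can be reproduced outside the solver with plain numbers. The sketch below is TypeScript rather than the adapter's Scala. The function and parameter names are made up for illustration, and &lt;code&gt;Math.round&lt;/code&gt; stands in for the &lt;code&gt;BigDecimal&lt;/code&gt; HALF_UP rounding, which it only matches for non-negative costs that floating point represents exactly.&lt;/p&gt;

```typescript
// Hypothetical helper mirroring the coefficient formula in the Scala excerpt.
function objectiveCoefficientSketch(params: {
  priorityRank: number;    // lower rank value yields a larger reward
  costTotal: number;       // cost in whole currency units, e.g. 42.5
  jobIndex: number;        // stable position of the job, used as a tie-breaker
  technicianIndex: number; // stable position of the technician, final tie-breaker
}): number {
  // Large base reward so that assigning a job outweighs cost differences.
  const assignmentReward = 1_000_000 - params.priorityRank * 10_000;
  // Scale cost by 100 and round to an integer, since CP-SAT works on integers.
  const scaledCost = Math.round(params.costTotal * 100);
  return assignmentReward - scaledCost - params.jobIndex * 100 - params.technicianIndex;
}

const rankOne = objectiveCoefficientSketch({
  priorityRank: 1, costTotal: 42.5, jobIndex: 0, technicianIndex: 2,
});
const rankTwo = objectiveCoefficientSketch({
  priorityRank: 2, costTotal: 10, jobIndex: 1, technicianIndex: 0,
});
```

&lt;p&gt;With these inputs &lt;code&gt;rankOne&lt;/code&gt; is 985748 and &lt;code&gt;rankTwo&lt;/code&gt; is 978900. The rank-1 decision keeps the larger coefficient even though it costs more, because one step of priority rank is worth 10,000 while the cost difference here is only 3,250 scaled units.&lt;/p&gt;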

&lt;p&gt;The line that matters most is not the maximize call. It is &lt;code&gt;cp.addGreaterOrEqual(LinearExpr.sum(vars), deterministicAssignmentCount.toLong)&lt;/code&gt;. That line says the solver is allowed to optimize, but it is not allowed to schedule less work than the deterministic feasible path already found.&lt;/p&gt;

&lt;h2&gt;Why deterministic sequencing stayed&lt;/h2&gt;

&lt;p&gt;Even after CP-SAT selects decisions, the system does not blindly stamp them into the board. It passes selected decisions through deterministic scheduling. That second stage can still reject work for capacity or planning-window overflow.&lt;/p&gt;

&lt;p&gt;At first, that felt redundant. If the solver has constraints, why check again?&lt;/p&gt;

&lt;p&gt;Because the dispatch plan is not only a set of pairs. It is a sequence of visits with concrete start times, travel, overtime, and explanation codes. Stable timestamps matter for replay. Stable rejection reasons matter for support. The deterministic layer turns selected pairs into an operating plan that looks the same when the same input snapshot is replayed.&lt;/p&gt;
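
&lt;p&gt;A rough sketch of that second stage, assuming a much simpler model than the real engine: one shared capacity figure in minutes, no travel or overtime, and a single rejection reason. The type names and &lt;code&gt;sequenceSketch&lt;/code&gt; are hypothetical. The point is the fixed ordering, which makes the same selected pairs produce the same start times and the same rejections on every replay.&lt;/p&gt;

```typescript
// Hypothetical shapes for solver-selected pairs and the resulting plan.
type Selected = { jobId: string; technicianId: string; durationMinutes: number };
type Scheduled = { jobId: string; technicianId: string; startMinute: number };
type Rejected = { jobId: string; reason: "capacity_exceeded" };

// Locale-independent string comparison keeps the ordering reproducible.
function compareStrings(a: string, b: string): number {
  return a === b ? 0 : a > b ? 1 : -1;
}

function sequenceSketch(selected: Selected[], capacityMinutes: number) {
  // Sort by technician, then job, so input order never changes the plan.
  const ordered = [...selected].sort(
    (a, b) =>
      compareStrings(a.technicianId, b.technicianId) || compareStrings(a.jobId, b.jobId),
  );
  const used: { [technicianId: string]: number } = {};
  const scheduled: Scheduled[] = [];
  const rejected: Rejected[] = [];
  for (const pick of ordered) {
    const start = used[pick.technicianId] ?? 0;
    // The post-selection check: a selected pair can still be rejected here.
    if (start + pick.durationMinutes > capacityMinutes) {
      rejected.push({ jobId: pick.jobId, reason: "capacity_exceeded" });
      continue;
    }
    scheduled.push({ jobId: pick.jobId, technicianId: pick.technicianId, startMinute: start });
    used[pick.technicianId] = start + pick.durationMinutes;
  }
  return { scheduled, rejected };
}
```

&lt;p&gt;Given three jobs for one technician with 120 minutes of capacity, the first two 60-minute jobs are placed at minutes 0 and 60 and the third is returned with &lt;code&gt;capacity_exceeded&lt;/code&gt;, regardless of the order in which the solver handed them over.&lt;/p&gt;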

&lt;p&gt;That also protects partial plans. A solver timeout or infeasible slice should not fabricate certainty. The domain has reason codes such as &lt;code&gt;missing_capability&lt;/code&gt;, &lt;code&gt;frozen_assignment&lt;/code&gt;, &lt;code&gt;capacity_exceeded&lt;/code&gt;, &lt;code&gt;outside_planning_window&lt;/code&gt;, and &lt;code&gt;solver_timeout&lt;/code&gt;. A partial plan with honest unscheduled work is safer than a complete-looking plan built on silence.&lt;/p&gt;

&lt;h2&gt;Frozen work was the real domain invariant&lt;/h2&gt;

&lt;p&gt;The solver failure was loud because it affected assignment count. Frozen work is quieter and more dangerous.&lt;/p&gt;

&lt;p&gt;The constraint builder treats accepted, completed, and frozen assignments as hard facts. A technician who conflicts with frozen work is rejected. A job that would collide with preserved work does not get moved just because the global objective improves.&lt;/p&gt;

&lt;p&gt;That choice is easy to miss if you only look at optimization. A solver optimizes variables. Dispatchers manage promises. Once a human has accepted work, the board has a memory. The optimizer has to respect that memory.&lt;/p&gt;

&lt;h2&gt;What surprised me&lt;/h2&gt;

&lt;p&gt;The surprise was not that OR-Tools needed constraints. That is normal. The surprise was that the deterministic implementation became a guardrail for the solver rather than dead code waiting to be deleted.&lt;/p&gt;

&lt;p&gt;I kept it for three reasons.&lt;/p&gt;

&lt;p&gt;First, it gives the CP-SAT model a feasible assignment lower bound. Second, it gives the app a fallback when the native solver fails to load, fails at runtime, or times out. Third, it gives replay a baseline that operators can compare against using SLA hit rate, travel minutes, overtime minutes, churn moves, unscheduled jobs, and solve time.&lt;/p&gt;

&lt;p&gt;That makes the deterministic path part of the product, not a temporary scaffold.&lt;/p&gt;

&lt;h2&gt;The tradeoff&lt;/h2&gt;

&lt;p&gt;The cost is extra machinery. There are two solve paths. There is trace metadata. There are post-selection checks. There are tests that assert OR-Tools was invoked, no fallback happened, and deterministic results still match where they should.&lt;/p&gt;

&lt;p&gt;The benefit is that optimization no longer gets special trust. It has to earn its place inside the operating record.&lt;/p&gt;

&lt;p&gt;That is the lesson I took from this build: in systems that move real work, a smarter algorithm is not automatically the source of truth. Sometimes the older deterministic code is the witness that keeps the new optimizer honest.&lt;/p&gt;

</description>
      <category>scala</category>
      <category>ortools</category>
      <category>deterministicsystems</category>
      <category>dispatchoptimization</category>
    </item>
    <item>
      <title>Evidence Beats Certainty: Why My Classifier Refuses to Pretend Every Product Has an Answer</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Sat, 13 Jun 2026 21:43:17 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/evidence-beats-certainty-why-my-classifier-refuses-to-pretend-every-product-has-an-answer-1n8l</link>
      <guid>https://dev.to/kingsleyonoh/evidence-beats-certainty-why-my-classifier-refuses-to-pretend-every-product-has-an-answer-1n8l</guid>
      <description>&lt;p&gt;Batch 010 found a bug that looked like good news.&lt;/p&gt;

&lt;p&gt;The classification worker was finishing its work. Runs moved through the database. Product rows had candidate tariff codes. The regression suite was far enough along that a casual glance could have treated the classifier as alive.&lt;/p&gt;

&lt;p&gt;Then one test forced three uncomfortable cases through the loop: no candidate, weak confidence, and a near tie. All three came back looking too clean. The worker was persisting the run as &lt;code&gt;classified&lt;/code&gt;, even when the evidence said the product needed review or had no supportable recommendation.&lt;/p&gt;

&lt;p&gt;That is the kind of bug I worry about in compliance software. Not the loud crash. The green row.&lt;/p&gt;

&lt;p&gt;A customs classifier can fail by throwing an exception. That failure is annoying, but honest. The operator sees it. The queue stops. The job gets retried. The audit trail can say, plainly, that classification did not happen.&lt;/p&gt;

&lt;p&gt;The worse failure is a result that looks complete while the evidence underneath is missing or contested.&lt;/p&gt;

&lt;p&gt;That was the real Batch 010 scar. The engine already carried the domain rule in its intent: classification is evidence, not a label. But the persistence path was still treating classification as if the only final state that mattered was success. The runtime could produce rejected candidates and confidence values. The database could store failure reasons. The tests could express review states. One narrow path still flattened doubt into completion.&lt;/p&gt;

&lt;p&gt;I was wrong about where the risk sat. I expected the hard part to be selecting the tariff code. The harder problem sat one layer later: making sure the code was not selected when the evidence did not deserve that much authority.&lt;/p&gt;

&lt;p&gt;Customs data makes that tension obvious. A product row is rarely a clean ontology entry. It is a SKU, a commercial name, a description written by someone under time pressure, a country of origin, a jurisdiction, maybe a material list, maybe an intended use. The difference between a good HS or HTS recommendation and a dangerous one can be a phrase that is absent, ambiguous, or buried in the wrong field.&lt;/p&gt;

&lt;p&gt;So I made the classifier refuse to pretend. If a product lacks a candidate, it should be blocked. If the best candidate is too weak, it should go to review. If two candidates are close enough that the lower one is still meaningful, the engine should preserve that tie instead of hiding it behind a confident-looking status.&lt;/p&gt;

&lt;p&gt;The decision lives in a small Rust function, which is why I like it. The policy is not scattered across a UI badge, a worker branch, and a reporting query. The worker asks one question: given the runtime outcome, what status should the database store?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;outcome_decision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;RuntimeClassification&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;OutcomeDecision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="py"&gt;.selected_code&lt;/span&gt;&lt;span class="nf"&gt;.is_none&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;OutcomeDecision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"blocked"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;failure_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"no_candidate"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;has_tie_candidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;OutcomeDecision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"needs_review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;failure_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tie_candidate"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="py"&gt;.confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.82&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;OutcomeDecision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"needs_review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;failure_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"low_confidence"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;OutcomeDecision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"classified"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;failure_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That function from &lt;code&gt;src/classification/outcome.rs&lt;/code&gt; is not clever. It is deliberately plain. It says the classifier has three questions to answer before it earns the right to call a run classified.&lt;/p&gt;

&lt;p&gt;First, did the runtime select any code at all? If not, the run is &lt;code&gt;blocked/no_candidate&lt;/code&gt;. The operator should not see an empty answer wearing the same status as a resolved classification.&lt;/p&gt;

&lt;p&gt;Second, did the runtime find a meaningful tie? The rule runtime marks lower-ranked matches as rejected candidates, and a near tie gets the reason &lt;code&gt;tie_score&lt;/code&gt;. In that case the selected code still matters, but it is not enough. The run becomes &lt;code&gt;needs_review/tie_candidate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Third, did the selected code clear the confidence floor? The current worker uses &lt;code&gt;0.82&lt;/code&gt; as the line below which a product should not pass as clean. That number is a code-backed threshold, not a production claim. It is there because the engine needs a deterministic boundary for review routing.&lt;/p&gt;

&lt;p&gt;Only after those checks does the run become &lt;code&gt;classified&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The order matters. No candidate is different from low confidence. Low confidence is different from a tie. A tie with a selected code is different from a rule pack that found nothing. If those cases all share a green status, the UI can only lie or become complicated later. If the status and reason are precise at write time, the rest of the product can stay simpler.&lt;/p&gt;
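
&lt;p&gt;To make that precedence concrete, here is a minimal sketch of the same ordering with simplified types. The names &lt;code&gt;Outcome&lt;/code&gt;, &lt;code&gt;Reason&lt;/code&gt;, &lt;code&gt;Decision&lt;/code&gt;, and &lt;code&gt;decide&lt;/code&gt; are illustrative assumptions, not the project's real API. Only the statuses, the reason names, and the 0.82 floor come from the function above.&lt;/p&gt;

```rust
// Illustrative sketch only: these types are assumptions, not the project's code.
const CONFIDENCE_FLOOR: f64 = 0.82;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Reason {
    NoCandidate,
    TieCandidate,
    LowConfidence,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Blocked(Reason),
    NeedsReview(Reason),
    Classified,
}

// Hypothetical, flattened view of a runtime outcome.
#[derive(Debug, Clone, Copy)]
struct Outcome {
    has_selected_code: bool,
    has_tie: bool,
    confidence: f64,
}

fn decide(outcome: Outcome) -> Decision {
    // 1. Nothing selected: there is no answer to grade at all.
    if !outcome.has_selected_code {
        return Decision::Blocked(Reason::NoCandidate);
    }
    // 2. A contested answer is reported as a tie, even if it is also weak.
    if outcome.has_tie {
        return Decision::NeedsReview(Reason::TieCandidate);
    }
    // 3. An uncontested answer still has to clear the confidence floor.
    if CONFIDENCE_FLOOR > outcome.confidence {
        return Decision::NeedsReview(Reason::LowConfidence);
    }
    Decision::Classified
}
```

&lt;p&gt;In this sketch, a run with a selected code, a tie, and a confidence of 0.40 is stored as &lt;code&gt;needs_review/tie_candidate&lt;/code&gt; rather than &lt;code&gt;needs_review/low_confidence&lt;/code&gt;, because the tie check runs first. A reviewer then sees the contested result, which is the more specific fact.&lt;/p&gt;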

&lt;p&gt;The test that caught this is the kind of test I wish more systems had before they gained users. It does not test the happy path with a cotton shirt and a confident tariff code. It creates three products with names that force the worker to admit uncertainty.&lt;/p&gt;

&lt;p&gt;One row has no matching rule. One row matches with a confidence below the floor. One row matches two close candidates, &lt;code&gt;6205.20&lt;/code&gt; and &lt;code&gt;6205.30&lt;/code&gt;, close enough that the rejected candidate still belongs in the record. The assertion is not only that the worker completes three jobs. It checks the stored status, the &lt;code&gt;failure_reason&lt;/code&gt;, the selected code where one exists, the candidate code list, and the &lt;code&gt;tie_score&lt;/code&gt; reason inside rejected candidates.&lt;/p&gt;

&lt;p&gt;That last part matters. I did not want a review queue filled with vague work items that say, "please check this." I wanted the queue to carry the reason the machine gave up authority. A reviewer should know whether they are handling an empty result, a weak result, or a contested result.&lt;/p&gt;

&lt;p&gt;The same logic affects audit exports. An audit pack that says a product was classified is different from an audit pack that says the system found two close candidates and routed the run to review. In both cases, the export has value, but it answers a different question. One says, "here is the evidence behind the recommendation." The other says, "here is the evidence behind the refusal to recommend."&lt;/p&gt;

&lt;p&gt;That distinction changes the product shape. The engine stores matched rules, rejected alternatives, confidence, risk band, rule pack version, input snapshot, reviewer decisions, and failure reasons. It also freezes the product and rule pack facts at queue time. If the product description changes after the job enters the queue, the worker still evaluates the snapshot it was handed. If the active rule pack changes later, historical runs still point back to the pack version that produced them.&lt;/p&gt;
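
&lt;p&gt;The freeze-at-queue-time idea can be sketched as a job that owns a copy of its input. This is a simplified illustration, not the engine's schema: &lt;code&gt;Product&lt;/code&gt;, &lt;code&gt;QueuedJob&lt;/code&gt;, &lt;code&gt;enqueue&lt;/code&gt;, and &lt;code&gt;edit_description&lt;/code&gt; are hypothetical names, and a real implementation would persist the snapshot in the database rather than hold it in memory.&lt;/p&gt;

```rust
// Illustrative sketch only: field and function names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Product {
    sku: String,
    description: String,
}

#[derive(Debug, Clone, PartialEq)]
struct QueuedJob {
    // Owned copy of the product facts as they were at queue time.
    input_snapshot: Product,
    // Version of the rule pack that was active at queue time.
    rule_pack_version: u32,
}

// Takes the product by value so the job owns its own copy of the facts.
fn enqueue(product: Product, rule_pack_version: u32) -> QueuedJob {
    QueuedJob {
        input_snapshot: product,
        rule_pack_version,
    }
}

// Simulates an edit to the live product row after the job was queued.
fn edit_description(mut product: Product, new_description: String) -> Product {
    product.description = new_description;
    product
}
```

&lt;p&gt;If the caller passes &lt;code&gt;live.clone()&lt;/code&gt; to &lt;code&gt;enqueue&lt;/code&gt; and edits &lt;code&gt;live&lt;/code&gt; afterwards, the job's &lt;code&gt;input_snapshot&lt;/code&gt; still holds the original description, which is the property the worker relies on.&lt;/p&gt;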

&lt;p&gt;That is slower to reason about than a direct request that always reads current product state. It is also safer. A compliance review is not asking, "what would the system say today?" It often asks, "what did the system know then, and why did it make that call?"&lt;/p&gt;

&lt;p&gt;What surprised me was how much of the architecture flowed from that one sentence.&lt;/p&gt;

&lt;p&gt;The classifier uses a PostgreSQL job table instead of pretending a background job is a fire-and-forget detail. A worker leases rows, marks attempts, and exits if a run is already terminal. Product import refuses rows that lack required facts such as SKU, name, description, country, jurisdiction, product type, materials, or intended use. Rule packs have activation gates before they become active. Reviewer overrides append structured corrections instead of mutating the machine result. Audit exports are rendered from frozen snapshots instead of live joins that could drift.&lt;/p&gt;

&lt;p&gt;Those choices sound separate until Batch 010 ties them together. If the worker writes the wrong status, every careful snapshot around it becomes less trustworthy. The audit export preserves the wrong conclusion. The review queue misses the item. The dashboard looks cleaner than the evidence. Optional integrations can fire the wrong event. A bad status is not a display bug. It is an evidence bug.&lt;/p&gt;

&lt;p&gt;The fix was small because the earlier design had already made room for it. The database had a status field and a failure reason. The runtime returned selected and rejected candidates. The tests could create all three edge cases. Once the regression exposed the lie, the code only had to make the domain decision explicit.&lt;/p&gt;

&lt;p&gt;I also changed how I read passing tests after that. A test that proves a worker completed is not enough for a compliance loop. Completion is only a transport fact. The domain fact is whether the stored row still carries the same uncertainty the runtime produced. That is why the regression checks &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;failure_reason&lt;/code&gt;, &lt;code&gt;selected_code&lt;/code&gt;, &lt;code&gt;candidate_codes&lt;/code&gt;, and rejected candidate reasons in one place. If any one of those drifts, the row may still look finished, but the evidence contract is broken.&lt;/p&gt;

&lt;p&gt;That is the lesson I took from it, and I mean lesson in the practical sense, not as a slogan. If a domain has reviewable uncertainty, model that uncertainty before the happy path spreads through the codebase.&lt;/p&gt;

&lt;p&gt;For this project, uncertainty has names: &lt;code&gt;no_candidate&lt;/code&gt;, &lt;code&gt;low_confidence&lt;/code&gt;, and &lt;code&gt;tie_candidate&lt;/code&gt;. Those names are not UI copy. They are durable outcomes.&lt;/p&gt;

&lt;p&gt;A classifier that always returns an answer is easy to demo. It is also easy to distrust. In customs work, the more serious promise is narrower: when the evidence is good enough, store the recommendation; when it is not, store the reason it stopped.&lt;/p&gt;

&lt;p&gt;That is why the Trade Compliance Classification Engine refuses to treat every product as solved. Certainty is useful only when the record can prove how it was earned.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>customs</category>
      <category>classification</category>
      <category>audit</category>
    </item>
    <item>
      <title>Why I Made Stale Forecasts Fail Instead of Falling Back to Do Nothing</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Fri, 12 Jun 2026 23:03:41 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/why-i-made-stale-forecasts-fail-instead-of-falling-back-to-do-nothing-7m8</link>
      <guid>https://dev.to/kingsleyonoh/why-i-made-stale-forecasts-fail-instead-of-falling-back-to-do-nothing-7m8</guid>
      <description>&lt;p&gt;The UI showed &lt;code&gt;ready&lt;/code&gt;, &lt;code&gt;do_nothing&lt;/code&gt;, and a blank reason field.&lt;/p&gt;

&lt;p&gt;A facility manager reading that screen would assume the engine had looked at the peak window, checked the assets, and decided there was nothing worth doing. The interface looked calm. The audit trail looked complete.&lt;/p&gt;

&lt;p&gt;That was not true.&lt;/p&gt;

&lt;p&gt;The forecast behind the plan had expired. The planner should never have scored it. But the fallback path did exactly what I told it to do: when no action was selected, choose &lt;code&gt;do_nothing&lt;/code&gt; so the operator gets a safe recommendation instead of an empty response.&lt;/p&gt;

&lt;p&gt;Safe fallback became false confidence.&lt;/p&gt;

&lt;p&gt;The product has a simple rule: physical feasibility comes before economics. A battery cannot discharge below its minimum state of charge. A building load cannot curtail past its comfort limit. A flat-rate tariff cannot justify peak curtailment because there is no peak signal to respond to. Those are business truths encoded as code.&lt;/p&gt;

&lt;p&gt;That rule exists because energy planning has a dangerous temptation: collapse every problem into money. If the demand charge is high enough, the spreadsheet always finds a saving. Real facilities do not work that way. Operators know some processes cannot move, some comfort limits cannot bend, and some battery cycles are not worth spending for a small peak reduction. The planner has to encode that judgment before it calculates expected savings.&lt;/p&gt;

&lt;p&gt;The bug came from treating stale forecasts like another physical constraint.&lt;/p&gt;

&lt;p&gt;In a real infeasible window, &lt;code&gt;do_nothing&lt;/code&gt; is useful. If every battery is depleted, every comfort limit blocks curtailment, and every flexible process is already at its limit, doing nothing is a valid operational recommendation. It tells the operator: the engine understood the window and found no feasible savings-positive action.&lt;/p&gt;

&lt;p&gt;A stale forecast is different. It means the engine did not have permission to reason about the window at all. The input is invalid. The correct output is a failed plan with an explicit reason.&lt;/p&gt;

&lt;p&gt;I got that boundary wrong in the first implementation.&lt;/p&gt;

&lt;p&gt;The code had all the pieces in separate places. &lt;code&gt;createPlan&lt;/code&gt; detected stale forecasts. &lt;code&gt;generateCurtailmentPlan&lt;/code&gt; recorded a stale-forecast rejection. But the bottom of the planner had a broad fallback: if no selected actions exist, add &lt;code&gt;do_nothing&lt;/code&gt;. That line was written for infeasible windows, not invalid input, but it had no way to know the difference.&lt;/p&gt;

&lt;p&gt;The fix looks small because the hard part was naming the boundary.&lt;/p&gt;

&lt;p&gt;Batch 009 had already moved the planner away from fake input. Forecast creation loads qualified interval readings from the database. Plans check that the selected forecast and tariff belong to the requested site. The selected band travels as &lt;code&gt;forecastBandKw&lt;/code&gt;, and the action payload records both &lt;code&gt;confidence_band&lt;/code&gt; and &lt;code&gt;forecast_band_driver&lt;/code&gt;. Those pieces made the failure more embarrassing, not less. The system had evidence discipline at the edges, then lost it in one central fallback.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nim"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;staleForecastFailureReason&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"stale forecast cannot be used for a new plan without explicit override"&lt;/span&gt;

&lt;span class="k"&gt;proc &lt;/span&gt;&lt;span class="nf"&gt;planStatusForDecision&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PlannerDecision&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;stale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecastFailureReason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selectedActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ready"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no feasible planner action"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;proc &lt;/span&gt;&lt;span class="nf"&gt;generateCurtailmentPlan&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TenantContext&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PlannerInput&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PlannerTariff&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PlannerAsset&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;PlannerDecision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="n"&gt;validateInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newJArray&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newJArray&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;newJArray&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;curtailAllowed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addStaleAndTariffRejections&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rejectMissingAssetTelemetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
      &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetType&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"battery"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;addBatteryAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;addChargeBatteryAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assetType&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"building_load"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"flexible_process"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;addCurtailAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;curtailAllowed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;addShiftLoadAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;staleForecast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%*&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"action_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"do_nothing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"no feasible savings-positive action after physical constraints"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="n"&gt;addBindingRejections&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;makeDecision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tariff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedSavings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;and not staleForecast&lt;/code&gt; is the visible change. The real design change is above it: &lt;code&gt;planStatusForDecision&lt;/code&gt; owns the distinction between invalid input and feasible output.&lt;/p&gt;

&lt;p&gt;Before that split, status came from &lt;code&gt;selectedActions.len&lt;/code&gt;. If there was at least one selected action, the plan became ready. That is a bad proxy because selected actions can be generated by fallback logic. The status needs to know why the planner had no action.&lt;/p&gt;

&lt;p&gt;The stale forecast flag now travels through the planning path as an input validity marker, not just another rejected-action reason. It still appears in &lt;code&gt;rejectedActions&lt;/code&gt; so the UI can show the operator what blocked the run. But it also controls persisted status and &lt;code&gt;failure_reason&lt;/code&gt; so API consumers and replay logic do not treat the plan as a valid no-op decision.&lt;/p&gt;

&lt;p&gt;What surprised me was how much of the surrounding architecture existed because of this one boundary.&lt;/p&gt;

&lt;p&gt;The same boundary shows up in the schema. &lt;code&gt;curtailment_plans&lt;/code&gt; stores &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;confidence_band&lt;/code&gt;, &lt;code&gt;input_snapshot&lt;/code&gt;, &lt;code&gt;plan_actions&lt;/code&gt;, &lt;code&gt;rejected_actions&lt;/code&gt;, &lt;code&gt;savings_estimate&lt;/code&gt;, &lt;code&gt;risk_summary&lt;/code&gt;, and &lt;code&gt;failure_reason&lt;/code&gt; as separate fields. That separation matters because a failed plan with rejected actions is not the same thing as a ready plan with rejected actions. The operator sees the human explanation either way, but the status tells the rest of the system whether the plan can be accepted, replayed, or escalated.&lt;/p&gt;

&lt;p&gt;Forecasts store p10, p50, and p90 bands. Plans record &lt;code&gt;confidence_band&lt;/code&gt; and &lt;code&gt;forecast_band_kw&lt;/code&gt;. The service checks that a forecast belongs to the same site as the plan. It checks whether the tariff changed after the forecast. It checks whether the forecast is older than the allowed window. All of that is careful work, but one broad fallback at the bottom of the planner erased the meaning.&lt;/p&gt;

&lt;p&gt;That is the part I was wrong about. I assumed a safe fallback is always safer than a hard failure.&lt;/p&gt;

&lt;p&gt;In operational software, a false safe state can be worse than an error. An error asks for attention. A ready no-op plan closes the loop. It tells the operator they can move on.&lt;/p&gt;

&lt;p&gt;The tests now capture the boundary directly. One unit test calls &lt;code&gt;generateCurtailmentPlan&lt;/code&gt; with a stale forecast and asserts that no &lt;code&gt;do_nothing&lt;/code&gt; action is selected. Another calls &lt;code&gt;planStatusForDecision&lt;/code&gt; with stale input and asserts that the persisted status is &lt;code&gt;failed&lt;/code&gt;, not &lt;code&gt;ready&lt;/code&gt;. The Playwright journeys cover the other side of the behavior: when the inputs are valid but the constraints block action, the operator still sees recommended and rejected action sections, binding constraints, and decision controls.&lt;/p&gt;

&lt;p&gt;That is why I like this failure as a design story. It did not ask for more code. It asked for a better state model. The planner needed two kinds of negative answer: one where the business should not act because no feasible action exists, and one where the software should not answer because its input has expired. Both are negative. Only one is a recommendation.&lt;/p&gt;

&lt;p&gt;The same distinction shaped replay. Backtests compare planner, no-action, and threshold policies using the same historical input snapshot. A replay can include a no-action policy because it is an intentional baseline. That is different from a planner run falling into &lt;code&gt;do_nothing&lt;/code&gt; because its forecast input had expired. Same words. Different contract.&lt;/p&gt;

&lt;p&gt;It also shaped operator feedback. A user can accept, reject, or modify a ready plan, and &lt;code&gt;operator_feedback.original_snapshot&lt;/code&gt; preserves the recommendation at the time of the decision. That only works if ready means ready. If stale input can still reach ready status, the audit trail becomes a record of the operator reacting to a recommendation the engine should never have issued. The database can preserve the snapshot perfectly and still preserve the wrong thing.&lt;/p&gt;

&lt;p&gt;That is why I prefer status fields that carry domain meaning, even when they feel strict. &lt;code&gt;failed&lt;/code&gt; is not a bad product outcome when it protects the operator from bad evidence. A failure reason such as &lt;code&gt;stale forecast cannot be used for a new plan without explicit override&lt;/code&gt; gives the next workflow something honest to do: rebuild the forecast, refresh the tariff, or ask the operator for an override. A ready no-op gives downstream code no reason to pause.&lt;/p&gt;

&lt;p&gt;I now treat &lt;code&gt;do_nothing&lt;/code&gt; as a domain decision, not as an absence handler.&lt;/p&gt;

&lt;p&gt;That rule carries across the codebase. Missing asset telemetry becomes &lt;code&gt;ASSET_TELEMETRY_INVALID&lt;/code&gt; before scoring. A flat-rate tariff produces a tariff-matrix rejection before curtailment can enter selected actions. Battery state of charge and cycle limits reject discharge before expected savings are calculated. Each one is visible because the planner has to show what it refused to do.&lt;/p&gt;

&lt;p&gt;The result is less forgiving code, and that is the point. A planner that fails with a clear reason is safer than a planner that returns a calm answer from bad inputs.&lt;/p&gt;

&lt;p&gt;I would carry this further if I rebuilt the planner from scratch. &lt;code&gt;staleForecast&lt;/code&gt; is still a boolean moving through function calls. It works, and the tests pin the behavior, but an explicit input-validity type would make the boundary harder to blur later. Something like &lt;code&gt;PlanInputStatus&lt;/code&gt; could separate ready, stale forecast, tariff mismatch, and missing history before the planner sees any assets. That is a better shape for the next version because it makes invalid input impossible to confuse with an infeasible action set.&lt;/p&gt;
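
&lt;p&gt;One possible shape for that type, sketched in Python rather than the planner's own language. The four states come from the paragraph above. The &lt;code&gt;classify_inputs&lt;/code&gt; and &lt;code&gt;plan&lt;/code&gt; helpers, their parameters, and the order of the checks are hypothetical.&lt;/p&gt;

```python
from enum import Enum


class PlanInputStatus(Enum):
    READY = "ready"
    STALE_FORECAST = "stale_forecast"
    TARIFF_MISMATCH = "tariff_mismatch"
    MISSING_HISTORY = "missing_history"


def classify_inputs(forecast_age_min, max_age_min, tariff_changed, history_rows):
    """Hypothetical classifier that runs before the planner sees any assets."""
    if history_rows == 0:
        return PlanInputStatus.MISSING_HISTORY
    if forecast_age_min > max_age_min:
        return PlanInputStatus.STALE_FORECAST
    if tariff_changed:
        return PlanInputStatus.TARIFF_MISMATCH
    return PlanInputStatus.READY


def plan(input_status, feasible_actions):
    # Invalid input never reaches action selection, so it cannot become do_nothing.
    if input_status is not PlanInputStatus.READY:
        return {"status": "failed", "failure_reason": input_status.value, "actions": []}
    # Valid input with no feasible action is a genuine no-op recommendation.
    return {"status": "ready", "failure_reason": None,
            "actions": feasible_actions or ["do_nothing"]}


status = classify_inputs(forecast_age_min=90, max_age_min=60,
                         tariff_changed=False, history_rows=24)
print(plan(status, feasible_actions=[]))
```

&lt;p&gt;With this shape, the only way to get &lt;code&gt;do_nothing&lt;/code&gt; is to pass &lt;code&gt;READY&lt;/code&gt; inputs and have no feasible action.&lt;/p&gt;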

&lt;p&gt;The transferable lesson is narrow: fallback logic needs a domain name. If you cannot name the state it represents, it will eventually hide a state you meant to expose.&lt;/p&gt;

</description>
      <category>nim</category>
      <category>planning</category>
      <category>forecasting</category>
      <category>replay</category>
    </item>
    <item>
      <title>Why I Froze Simulation Inputs Before the Solver Ran</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Thu, 28 May 2026 13:24:24 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/why-i-froze-simulation-inputs-before-the-solver-ran-592</link>
      <guid>https://dev.to/kingsleyonoh/why-i-froze-simulation-inputs-before-the-solver-ran-592</guid>
      <description>&lt;p&gt;A planner can approve the right transfer for the wrong reason.&lt;/p&gt;

&lt;p&gt;That was the failure mode I kept coming back to while building the Inventory Allocation Simulator. The solver could be mathematically correct, the recommendation could have positive net value, and the UI could show a clean explanation. But if the explanation reread today's warehouse and SKU tables after yesterday's simulation finished, the audit trail would be fiction.&lt;/p&gt;

&lt;p&gt;Inventory data changes constantly. A lane gets disabled. A SKU margin changes. Inbound units arrive. Demand history is corrected because a stockout period was recorded as zero sales. If a completed simulation depends on the current state of those tables, its story changes every time the business updates its planning data.&lt;/p&gt;

&lt;p&gt;I was wrong to treat that as a reporting problem at first. It was a data contract problem.&lt;/p&gt;

&lt;h2&gt;The failure hiding in a normal design&lt;/h2&gt;

&lt;p&gt;The first design looked harmless: store a simulation run, run the worker, persist recommendations, and render the detail page by joining back to warehouses, SKUs, inventory, lanes, and policies. Most CRUD systems are written that way because joins are cheap and normalized tables keep data clean.&lt;/p&gt;

&lt;p&gt;That design breaks the moment a simulation becomes evidence.&lt;/p&gt;

&lt;p&gt;The recommendation is not just a row saying transfer 30 units. It needs to explain the constraint that bound the decision, the demand scenario that created the shortage, the service-level tail left unmet, and the tradeoffs accepted. In this project those fields live in &lt;code&gt;explanation&lt;/code&gt;: &lt;code&gt;binding_constraints&lt;/code&gt;, &lt;code&gt;scenario_sensitivity&lt;/code&gt;, &lt;code&gt;accepted_tradeoffs&lt;/code&gt;, &lt;code&gt;net_value&lt;/code&gt;, and solver diagnostics.&lt;/p&gt;

&lt;p&gt;If the explanation uses live tables, a planner can open the same completed run on Tuesday and see different supporting facts from Monday. That is worse than no explanation. It creates confidence in a record that no longer matches the decision.&lt;/p&gt;

&lt;h2&gt;The constraint I chose&lt;/h2&gt;

&lt;p&gt;I made simulation creation the boundary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;create_simulation_run!&lt;/code&gt; authorizes the planner, validates the scenario count, captures every planning surface needed by the solver, and stores it inside &lt;code&gt;simulation_runs.input_snapshot&lt;/code&gt;. The worker consumes that snapshot. The detail page reads that snapshot. Demand scenarios are stored with the run. Completed runs do not ask the mutable catalog what the world looks like now.&lt;/p&gt;

&lt;p&gt;The core function is small, which is the point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight julia"&gt;&lt;code&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="nf"&gt; capture_simulation_input_snapshot&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;AbstractTenantAdminStore&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TenantContext&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;policy_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
&lt;span class="x"&gt;)&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;NamedTuple&lt;/span&gt;
    &lt;span class="n"&gt;authorize!&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"run_cancel"&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"simulation"&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parsed_policy_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_uuid_value&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy_id&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_snapshot_policy&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parsed_policy_id&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="x"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;warehouses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_warehouse_response&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetch_warehouses&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_snapshot_page&lt;/span&gt;&lt;span class="x"&gt;())],&lt;/span&gt;
        &lt;span class="n"&gt;skus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_sku_response&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetch_skus&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_snapshot_page&lt;/span&gt;&lt;span class="x"&gt;())],&lt;/span&gt;
        &lt;span class="n"&gt;inventory_positions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_inventory_response&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetch_inventory_positions&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_snapshot_page&lt;/span&gt;&lt;span class="x"&gt;())],&lt;/span&gt;
        &lt;span class="n"&gt;demand_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_demand_response&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetch_demand_history&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_snapshot_page&lt;/span&gt;&lt;span class="x"&gt;())],&lt;/span&gt;
        &lt;span class="n"&gt;transfer_lanes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="x"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_lane_response&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="x"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetch_transfer_lanes&lt;/span&gt;&lt;span class="x"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="x"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_snapshot_page&lt;/span&gt;&lt;span class="x"&gt;())],&lt;/span&gt;
    &lt;span class="x"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That snapshot is not elegant. It is intentionally blunt. The run carries the policy, warehouses, SKUs, inventory positions, demand history, and transfer lanes it used. &lt;code&gt;SNAPSHOT_MAX_ROWS&lt;/code&gt; is set to &lt;code&gt;1_000_000&lt;/code&gt;, which tells you the tradeoff plainly: this is a batch planning system, not a real-time transfer executor.&lt;/p&gt;

&lt;h2&gt;What surprised me&lt;/h2&gt;

&lt;p&gt;The snapshot decision also fixed a forecasting bug before it could become a solver bug.&lt;/p&gt;

&lt;p&gt;Stockout periods are dangerous because they make demand look low. In &lt;code&gt;clean_demand_history&lt;/code&gt;, the system stores both observed units and adjusted units. If observed units were zero and lost sales were 82, the adjusted demand is 82. That adjusted value feeds the scenario generator. The stockout row also inflates uncertainty, because a period with unavailable inventory is less trustworthy than a period of normal sales.&lt;/p&gt;
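
&lt;p&gt;A rough Python sketch of that cleaning step. The additive formula is an assumption that happens to match the 0 and 82 example, and the &lt;code&gt;1.5&lt;/code&gt; uncertainty multiplier is a made-up placeholder. The real &lt;code&gt;clean_demand_history&lt;/code&gt; may compute both differently.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CleanedDemand:
    observed_units: int
    adjusted_units: int
    uncertainty_multiplier: float


def clean_demand_row(observed_units: int, lost_sales_units: int, stockout: bool) -> CleanedDemand:
    # Assumed rule: adjusted demand adds back the sales that the stockout hid.
    adjusted = observed_units + lost_sales_units
    # Assumed factor: stockout periods are trusted less, so the spread is widened.
    multiplier = 1.5 if stockout else 1.0
    return CleanedDemand(observed_units, adjusted, multiplier)


row = clean_demand_row(observed_units=0, lost_sales_units=82, stockout=True)
print(row.adjusted_units)  # 82
```

&lt;p&gt;Keeping the observed value next to the adjusted one means the correction stays visible instead of overwriting what was actually recorded.&lt;/p&gt;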

&lt;p&gt;The Batch 019 test mutates live inventory and demand after a run is created. Then it runs the worker and checks that the scenario baseline still comes from the frozen snapshot, not the updated live row. That was the test that made the architecture feel real. It did not just prove a function. It proved the system can remember what it believed when the recommendation was created.&lt;/p&gt;
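
&lt;p&gt;The idea behind that test can be sketched in a few lines of Python. This is an illustration with in-memory dictionaries, not the Julia test itself. The SKU name &lt;code&gt;SKU-1&lt;/code&gt;, the quantities, and both helper functions are hypothetical.&lt;/p&gt;

```python
import copy

# Stand-in for the mutable live tables.
live_inventory = {("WH-NEED", "SKU-1"): {"on_hand": 10}}


def create_simulation_run(live):
    # Deep copy so later edits to the live tables cannot reach the run.
    return {"status": "queued", "input_snapshot": copy.deepcopy(live)}


def run_worker(run):
    # The worker reads only the snapshot stored on the run.
    baseline = run["input_snapshot"][("WH-NEED", "SKU-1")]["on_hand"]
    return {"status": "completed", "baseline_on_hand": baseline}


run = create_simulation_run(live_inventory)
live_inventory[("WH-NEED", "SKU-1")]["on_hand"] = 500  # live data changes after creation
result = run_worker(run)
print(result["baseline_on_hand"])  # 10
```

&lt;p&gt;A shallow copy would fail this check, because the nested row would still be shared with the live table.&lt;/p&gt;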

&lt;p&gt;The alternative was versioning every table. That would give better diff history, but it would also make the MVP harder to operate. I chose the snapshot because the project needed completed-run honesty more than a general temporal database. A future version could move to event-sourced planning records. This one needed a concrete audit boundary.&lt;/p&gt;

&lt;h2&gt;Where the solver fits&lt;/h2&gt;

&lt;p&gt;The solver reads the frozen snapshot and stored scenarios, then builds a JuMP model over lanes, SKUs, inventory, service level, warehouse capacity, transfer cost, and safety stock. The model is allowed to fail. In fact, readable failure is part of the contract.&lt;/p&gt;

&lt;p&gt;A region rule can block every feasible transfer. A max transfer cost can make the plan infeasible. A timeout can return no acceptable incumbent. Those cases return diagnostics instead of pretending every scenario has a transfer.&lt;/p&gt;

&lt;p&gt;That matters for auditability because failure diagnostics have to describe the same world the solver saw. The &lt;code&gt;_constraint_report&lt;/code&gt; function can name &lt;code&gt;max_transfer_cost_cents&lt;/code&gt;, region blocking, sender safety stock, and receiver service-level constraints. If those facts came from live tables, the failure message could drift too.&lt;/p&gt;

&lt;h2&gt;The result&lt;/h2&gt;

&lt;p&gt;The deterministic solver fixture produces a 30-unit transfer from &lt;code&gt;WH-SURPLUS&lt;/code&gt; to &lt;code&gt;WH-NEED&lt;/code&gt;, with &lt;code&gt;lane_capacity&lt;/code&gt; and &lt;code&gt;receiver_service_level&lt;/code&gt; as binding constraints and a 30,900-cent net value. The large benchmark ran 50 warehouses, 2,000 SKUs, and 100 scenarios in 17,928.4753 ms (about 17.9 seconds) and generated 2,000 recommendations.&lt;/p&gt;

&lt;p&gt;Those numbers matter less than the contract behind them. A completed run is not a report over current data. It is a preserved decision record. Once I made that boundary explicit, the rest of the system had somewhere honest to stand.&lt;/p&gt;

</description>
      <category>julia</category>
      <category>optimization</category>
      <category>simulation</category>
      <category>auditability</category>
    </item>
    <item>
      <title>Why I Moved Redis Acknowledgement Outside the Database Transaction</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Fri, 22 May 2026 10:22:55 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/why-i-moved-redis-acknowledgement-outside-the-database-transaction-1mdl</link>
      <guid>https://dev.to/kingsleyonoh/why-i-moved-redis-acknowledgement-outside-the-database-transaction-1mdl</guid>
      <description>&lt;p&gt;One good parcel event, one bad parcel event, one batch. That was enough to lose the good one.&lt;/p&gt;

&lt;p&gt;The consumer read both from Redis Streams. The first event was valid, so the app wrote a shipment event and a claim case, then acknowledged the Redis message. The second event had no tenant id, threw an exception, and rolled back the PostgreSQL transaction. Redis had already been told the first message was done.&lt;/p&gt;

&lt;p&gt;The database said nothing happened. Redis said the message was gone.&lt;/p&gt;

&lt;p&gt;That is the kind of bug that looks like an operations mystery later. A carrier says it sent the event. The stream no longer shows it as pending. The claim queue has no case. Everyone starts looking at logs, but the data has already contradicted itself.&lt;/p&gt;

&lt;h2&gt;The part I got wrong&lt;/h2&gt;

&lt;p&gt;I treated Redis acknowledgement as part of processing. That was too early.&lt;/p&gt;

&lt;p&gt;The first version of &lt;code&gt;DeliveryEventConsumerJob.pollOnce()&lt;/code&gt; processed each stream record inside a &lt;code&gt;TransactionTemplate&lt;/code&gt;. It wrote through &lt;code&gt;ShipmentEventIngestionService.upsertFromEvent(...)&lt;/code&gt;, then acknowledged the message in the same loop. It felt clean because both actions sat inside the same method and both happened after the event handler returned.&lt;/p&gt;

&lt;p&gt;But Redis is not inside the PostgreSQL transaction. &lt;code&gt;XACK&lt;/code&gt; does not care whether the database later commits. Once Redis removes the message from the pending list, the recovery path changes. If the transaction rolls back after that ack, PostgreSQL loses the row and Redis loses the retry handle.&lt;/p&gt;

&lt;p&gt;That is not a duplicate problem. It is a vanished work problem.&lt;/p&gt;

&lt;p&gt;The codebase already had idempotency in the right place for duplicates. &lt;code&gt;ShipmentEventIngestionService&lt;/code&gt; builds a tenant-scoped dedup key from the event source and takes a PostgreSQL advisory transaction lock before inserting into &lt;code&gt;shipment_events&lt;/code&gt;. The table also has a &lt;code&gt;(tenant_id, dedup_key)&lt;/code&gt; uniqueness constraint. A replay can create at most one shipment event row and one claim case.&lt;/p&gt;

&lt;p&gt;I was wrong to optimize for avoiding replay. Replay was safe. Premature ack was not.&lt;/p&gt;

&lt;h2&gt;The shape of the fix&lt;/h2&gt;

&lt;p&gt;The fix was small, but the boundary it moved mattered.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;pollOnce&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TRUE&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;equals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;processedMessageIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transactionTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;lockManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tryAcquireTransactionLock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCK_NAME&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;streamClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensureConsumerGroup&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumerName&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;DeliveryGatewayTrackingEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readPendingEvents&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ZERO&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readNewEvents&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;messageIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TenantContext&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tenantsById&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DeliveryGatewayTrackingEvent&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantsById&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;messageIds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messageIds&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processedMessageIds&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;processedMessageIds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;streamClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;acknowledgeAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processedMessageIds&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;processedMessageIds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



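&lt;p&gt;The transaction callback collects the message ids, and &lt;code&gt;transactionTemplate.execute&lt;/code&gt; hands them back only after the database transaction commits. If any event throws, the transaction rolls back and no ids ever reach the ack. Only after a successful commit does the job call &lt;code&gt;acknowledgeAll&lt;/code&gt;.&lt;/p&gt;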

&lt;p&gt;That creates two possible failure states, and both are tolerable.&lt;/p&gt;

&lt;p&gt;If the database fails, no ack happens. Redis still has the pending messages. The next poll retries them.&lt;/p&gt;

&lt;p&gt;If the database commits and the ack fails, Redis may replay messages whose database rows already exist. That is fine. The dedup key, advisory lock, and unique index collapse the replay back to the existing shipment event. The system may do extra work, but it will not create a second claim.&lt;/p&gt;

&lt;p&gt;This is the tradeoff I should have chosen from the start: duplicate work over lost work.&lt;/p&gt;
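
&lt;p&gt;Both failure states can be simulated without Redis or PostgreSQL. The Python sketch below uses in-memory stand-ins, so &lt;code&gt;FakeStream&lt;/code&gt;, &lt;code&gt;FakeDb&lt;/code&gt;, and the event fields are all hypothetical. It also swallows the exception to keep the example short, which the real job may not do. It is only meant to show the ordering: commit first, acknowledge second.&lt;/p&gt;

```python
class FakeStream:
    """In-memory stand-in for a consumer group's pending list."""

    def __init__(self, messages):
        self.pending = dict(messages)  # message id -> event

    def ack_all(self, ids):
        for message_id in ids:
            self.pending.pop(message_id, None)


class FakeDb:
    """In-memory stand-in for a transactional table keyed by dedup key."""

    def __init__(self):
        self.rows = {}

    def run_in_transaction(self, work):
        staged = dict(self.rows)
        result = work(staged)  # an exception here discards the staged writes
        self.rows = staged     # "commit"
        return result


def poll_once(stream, db):
    def work(staged):
        ids = []
        for message_id, event in stream.pending.items():
            if event.get("tenant_id") is None:
                raise ValueError("event has no tenant id")
            # setdefault keeps the first row, so a replay is absorbed.
            staged.setdefault((event["tenant_id"], event["dedup_key"]), event)
            ids.append(message_id)
        return ids

    try:
        processed = db.run_in_transaction(work)
    except ValueError:
        return 0  # nothing committed, nothing acknowledged
    stream.ack_all(processed)  # only reached after the commit
    return len(processed)


stream = FakeStream({
    "1-0": {"tenant_id": "t1", "dedup_key": "a"},
    "2-0": {"tenant_id": None, "dedup_key": "b"},
})
db = FakeDb()
print(poll_once(stream, db), len(db.rows), len(stream.pending))  # 0 0 2
```

&lt;p&gt;The mixed batch leaves zero rows and two pending messages, which is the state the regression test asserts. Replaying an already committed message goes through &lt;code&gt;setdefault&lt;/code&gt; and leaves the row count unchanged.&lt;/p&gt;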

&lt;h2&gt;Why the batch cache came later&lt;/h2&gt;

&lt;p&gt;The ack fix uncovered a different problem. The 50 events/sec acceptance test passed in isolation, then failed during full regression. The batch took 1,805 ms against a 1,000 ms target.&lt;/p&gt;

&lt;p&gt;At first glance, that looked like Redis overhead. It wasn't. The consumer was reading 50 messages and doing one batched ack, but &lt;code&gt;process(...)&lt;/code&gt; still resolved the same tenant and integration setting once per event. One tenant, 50 events, 50 database lookups before the hot path even reached shipment ingestion.&lt;/p&gt;

&lt;p&gt;The fix was not a global cache. A global tenant cache would create stale security behavior and make feature flags harder to trust. The right cache lived inside one poll batch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TenantContext&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tenantsById&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DeliveryGatewayTrackingEvent&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantsById&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;messageIds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;process(...)&lt;/code&gt; now uses &lt;code&gt;computeIfAbsent&lt;/code&gt; to authorize each tenant once per poll. Every poll still checks whether the tenant has &lt;code&gt;delivery-gateway&lt;/code&gt; enabled. Nothing survives across polls. The latency win comes from removing repeated reads, not weakening the gate.&lt;/p&gt;
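
&lt;p&gt;The same scoping can be shown in Python, where a membership check on a dictionary plays the role of &lt;code&gt;computeIfAbsent&lt;/code&gt;. The &lt;code&gt;load_tenant&lt;/code&gt; helper and the event shape are hypothetical. The &lt;code&gt;lookups&lt;/code&gt; list exists only to count how often the expensive read happens.&lt;/p&gt;

```python
lookups = []


def load_tenant(tenant_id):
    """Hypothetical stand-in for the database read plus feature-flag check."""
    lookups.append(tenant_id)
    return {"tenant_id": tenant_id, "delivery_gateway_enabled": True}


def process(event, tenants_by_id):
    tenant_id = event["tenant_id"]
    # Load only on a cache miss, once per tenant per batch.
    if tenant_id not in tenants_by_id:
        tenants_by_id[tenant_id] = load_tenant(tenant_id)
    return tenants_by_id[tenant_id]["delivery_gateway_enabled"]


def poll_once(events):
    tenants_by_id = {}  # created per poll, so nothing survives across polls
    return [process(event, tenants_by_id) for event in events]


batch = [{"tenant_id": "t1"} for _ in range(50)]
poll_once(batch)
print(len(lookups))  # 1
poll_once(batch)
print(len(lookups))  # 2
```

&lt;p&gt;Fifty events cost one lookup, and the second poll pays for its own lookup again, so a flag that was switched off between polls is noticed on the next batch.&lt;/p&gt;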

&lt;p&gt;What surprised me was how easy this bug was to miss. The system could pass duplicate-message tests, tenant-scope tests, and manual exception flow tests while still failing a real throughput target. It needed the acceptance test with 50 actual Redis messages and PostgreSQL writes to reveal the hidden cost.&lt;/p&gt;

&lt;h2&gt;The database boundary still does the hard work&lt;/h2&gt;

&lt;p&gt;The consumer job is only the outer shell. The correctness boundary lives in &lt;code&gt;ShipmentEventIngestionService.upsertFromEvent(...)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That service takes the event and normalizes the carrier, tracking number, status, timestamps, snapshots, and metadata. It builds an idempotency key under the tenant id. It locks that key with &lt;code&gt;pg_advisory_xact_lock&lt;/code&gt;. It upserts the shipment, inserts the event with &lt;code&gt;on conflict do nothing&lt;/code&gt;, updates the visible shipment status only when the event timestamp is not older than the current one, writes audit records, then asks &lt;code&gt;ClaimAutoCaseService&lt;/code&gt; whether the status deserves a case.&lt;/p&gt;

&lt;p&gt;Only three statuses create default cases: &lt;code&gt;failed_attempt&lt;/code&gt;, &lt;code&gt;returned&lt;/code&gt;, and &lt;code&gt;exception&lt;/code&gt;. A delivered scan does not open a claim. An unknown scan does not open a claim. The tracking timeline records context, but operations work begins only when the event status represents an operational exception.&lt;/p&gt;
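&lt;p&gt;Those two rules, which statuses open a case and when the visible status may change, reduce to two small predicates. This is a TypeScript sketch, not the Java service code: the status names come from the text above, while &lt;code&gt;opensDefaultCase&lt;/code&gt; and &lt;code&gt;shouldUpdateVisibleStatus&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Status names come from the article; the function names are illustrative.
const CASE_CREATING_STATUSES = ["failed_attempt", "returned", "exception"];

// Only operational exceptions open a claim case by default.
function opensDefaultCase(status: string): boolean {
  return CASE_CREATING_STATUSES.indexOf(status) !== -1;
}

// The visible shipment status changes only when the incoming event is not
// older than the status currently shown. Older events still belong on the
// timeline; they just do not overwrite the headline status.
function shouldUpdateVisibleStatus(currentStatusAt: Date | null, eventAt: Date): boolean {
  if (currentStatusAt === null) {
    return true;
  }
  return eventAt.getTime() >= currentStatusAt.getTime();
}
```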

&lt;p&gt;That separation matters because not every carrier signal should become work. A tracking system records facts. A claims system creates ownership.&lt;/p&gt;

&lt;h2&gt;The result&lt;/h2&gt;

&lt;p&gt;The regression test that mattered is blunt: publish a valid event and then an invalid event in the same Redis batch. Run &lt;code&gt;pollOnce()&lt;/code&gt;. Assert that no shipment event and no claim case exist, and that both Redis messages remain pending.&lt;/p&gt;

&lt;p&gt;That test now passes.&lt;/p&gt;

&lt;p&gt;The throughput proof also passes: 50 delivered events process in under 1 second locally, and because they are delivered events, they create zero claim cases. The duplicate stream test publishes two messages with the same source event and gets one shipment event, one claim case, and zero pending Redis messages after successful processing.&lt;/p&gt;

&lt;p&gt;The lesson here is not "ack after commit" as a slogan. The real rule is narrower: if one system owns retry visibility and another system owns business durability, the retry signal must not be cleared until the business fact is committed.&lt;/p&gt;
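&lt;p&gt;That rule can be modeled with in-memory stand-ins. The TypeScript sketch below is an illustration only: &lt;code&gt;FakeStream&lt;/code&gt; and &lt;code&gt;FakeDatabase&lt;/code&gt; are hypothetical, and a real consumer would use Redis Streams and a PostgreSQL transaction. It shows the ordering that matters: stage the whole batch, commit, and only then acknowledge.&lt;/p&gt;

```typescript
// In-memory stand-ins, not the real consumer.
type StreamMessage = { id: string; payload: { trackingNumber?: string; status?: string } };

class FakeStream {
  pending: StreamMessage[] = [];
  publish(message: StreamMessage): void {
    this.pending.push(message);
  }
  ack(ids: string[]): void {
    this.pending = this.pending.filter((m) => ids.indexOf(m.id) === -1);
  }
}

class FakeDatabase {
  committedEvents: string[] = [];
  // Runs work against a staging buffer and keeps the rows only if work returns normally.
  transaction(work: (staged: string[]) => void): void {
    const staged: string[] = [];
    work(staged); // a throw here discards everything that was staged
    this.committedEvents.push(...staged);
  }
}

function pollOnce(stream: FakeStream, db: FakeDatabase): boolean {
  const batch = stream.pending.slice();
  try {
    db.transaction((staged) => {
      for (const message of batch) {
        if (!message.payload.trackingNumber || !message.payload.status) {
          throw new Error("invalid event " + message.id);
        }
        staged.push(message.id);
      }
    });
  } catch {
    return false; // nothing committed, so nothing is acknowledged
  }
  // The business fact is durable; only now is the retry signal cleared.
  stream.ack(batch.map((m) => m.id));
  return true;
}
```

&lt;p&gt;Publishing one valid and one invalid message and calling &lt;code&gt;pollOnce&lt;/code&gt; leaves zero committed events and both messages pending, which mirrors the regression test described above.&lt;/p&gt;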

</description>
      <category>redisstreams</category>
      <category>springboot</category>
      <category>idempotency</category>
      <category>postgres</category>
    </item>
    <item>
      <title>The Document Number Is Reserved Before the PDF Exists</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 11 May 2026 08:21:48 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/the-document-number-is-reserved-before-the-pdf-exists-5fdp</link>
      <guid>https://dev.to/kingsleyonoh/the-document-number-is-reserved-before-the-pdf-exists-5fdp</guid>
      <description>&lt;p&gt;The hard part of document numbering is not incrementing an integer.&lt;/p&gt;

&lt;p&gt;It is deciding what happens when the integer is reserved, rendering starts, and the render fails.&lt;/p&gt;

&lt;p&gt;PostgreSQL sequences are built for speed. They are not built for legal numbering. A sequence advances even if the surrounding transaction rolls back. For most applications that is fine. For invoices and board records, a gap is not invisible. If number 41 exists and number 43 exists, someone will ask what happened to 42.&lt;/p&gt;

&lt;p&gt;I needed a numbering system that could do three things at once: serialize per entity, type, and year; let different sequences run in parallel; and preserve a reason when a number is skipped.&lt;/p&gt;

&lt;p&gt;That became the two-phase allocator in &lt;code&gt;src/documents/numbering.ts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Phase one reserves the next number. The allocator locks the tuple &lt;code&gt;(entity_id, document_type, year)&lt;/code&gt; with &lt;code&gt;pg_advisory_xact_lock&lt;/code&gt;, checks for the oldest unclaimed gap, and only then advances the high-water mark. It writes the reservation to &lt;code&gt;pending_allocations&lt;/code&gt; with a TTL.&lt;/p&gt;

&lt;p&gt;Phase two either finalizes the reservation against a document row or releases it into &lt;code&gt;sequence_gaps&lt;/code&gt;.&lt;/p&gt;
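&lt;p&gt;Before the real SQL, it may help to see the two phases as a plain in-memory model. This is a sketch under heavy assumptions: it covers one sequence, leaves out locking and TTLs, and &lt;code&gt;SequenceAllocator&lt;/code&gt; with its methods is a hypothetical name, not the module's API.&lt;/p&gt;

```typescript
// Single-sequence, in-memory model of reserve / finalize / release.
type Gap = { gapNumber: number; reason: string; reclaimed: boolean };

class SequenceAllocator {
  private highWater = 0;
  private nextAllocationId = 1;
  private pending: { [allocationId: number]: number } = {};
  readonly gaps: Gap[] = [];
  readonly finalized: number[] = [];

  // Phase one: prefer the oldest unclaimed gap, otherwise advance the high-water mark.
  reserve(): { allocationId: number; reservedNumber: number } {
    const gap = this.gaps.find((g) => !g.reclaimed);
    let reservedNumber: number;
    if (gap !== undefined) {
      gap.reclaimed = true;
      reservedNumber = gap.gapNumber;
    } else {
      this.highWater += 1;
      reservedNumber = this.highWater;
    }
    const allocationId = this.nextAllocationId;
    this.nextAllocationId += 1;
    this.pending[allocationId] = reservedNumber;
    return { allocationId, reservedNumber };
  }

  // Phase two, success: the number is attached to a document.
  finalize(allocationId: number): number {
    const reservedNumber = this.take(allocationId);
    this.finalized.push(reservedNumber);
    return reservedNumber;
  }

  // Phase two, failure: the number becomes a gap with a recorded reason.
  release(allocationId: number, reason: string): void {
    const reservedNumber = this.take(allocationId);
    this.gaps.push({ gapNumber: reservedNumber, reason, reclaimed: false });
  }

  private take(allocationId: number): number {
    const reservedNumber = this.pending[allocationId];
    if (reservedNumber === undefined) {
      throw new Error("unknown or already settled allocation " + allocationId);
    }
    delete this.pending[allocationId];
    return reservedNumber;
  }
}
```

&lt;p&gt;If number 1 is released after a failed render and number 2 is finalized, the next &lt;code&gt;reserve()&lt;/code&gt; hands out 1 again before the high-water mark moves to 3.&lt;/p&gt;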

&lt;p&gt;The core reservation shape is small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;acquireAllocatorLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;reservedNumber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;tryReclaimOldestGap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reservedNumber&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;reservedNumber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;computeNextHighWater&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`INSERT INTO pending_allocations
     (id, entity_id, document_type, year, reserved_number,
      reserved_by_api_key_id, reserved_at, expires_at, metadata)
   VALUES ($1, $2, $3, $4, $5, $6, now(),
           now() + ($7::text)::interval,
           $8::jsonb)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;allocationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reservedNumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKeyId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The advisory lock boundary matters. FZE invoices for 2026 serialize against other FZE invoices for 2026. LLC board resolutions do not wait on them. A document type has its own counter space because letterhead #1 and compliance letter #1 can both be real first documents in their own family.&lt;/p&gt;
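&lt;p&gt;The article does not show &lt;code&gt;acquireAllocatorLock&lt;/code&gt;, so the following is only one plausible way to get a lock per tuple. It assumes the lock key is a string hashed inside PostgreSQL with &lt;code&gt;hashtextextended&lt;/code&gt; (available from PostgreSQL 11) and passed to &lt;code&gt;pg_advisory_xact_lock&lt;/code&gt;; the helper name &lt;code&gt;allocatorLockQuery&lt;/code&gt; is hypothetical.&lt;/p&gt;

```typescript
// Hypothetical helper: builds the query for a per-sequence advisory lock.
function allocatorLockQuery(entityId: string, documentType: string, year: number) {
  // One key per (entity, type, year). JSON encoding keeps the parts unambiguous.
  // A hash collision between two tuples would only cause extra waiting,
  // never a shared counter, because the counters are still keyed by the tuple.
  const lockKey = JSON.stringify(["document-number", entityId, documentType, year]);
  return {
    text: "SELECT pg_advisory_xact_lock(hashtextextended($1, 0))",
    values: [lockKey],
  };
}
```

&lt;p&gt;Because the lock is transaction-scoped, it is released on commit or rollback without an explicit unlock call.&lt;/p&gt;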

&lt;p&gt;The part I got wrong early was the SQL shape for reclaiming a gap.&lt;/p&gt;

&lt;p&gt;The first version used an update with a subquery and a limit against the same table. PostgreSQL flattened the plan in a way that ignored the limit and updated every eligible row. That is the kind of bug that looks impossible until you inspect the affected rows. The fixed version uses a CTE target so the candidate set is materialized before the update runs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`WITH target AS (
     SELECT id FROM sequence_gaps
      WHERE entity_id = $2 AND document_type = $3 AND year = $4
        AND reclaimed_at IS NULL
      ORDER BY reaped_at ASC
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
   UPDATE sequence_gaps
      SET reclaim_allocation_id = $1, reclaimed_at = now()
    WHERE id IN (SELECT id FROM target)
    RETURNING id, gap_number`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;documentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one CTE is the difference between reclaiming one documented gap and mutating the whole backlog.&lt;/p&gt;

&lt;p&gt;The reaper is the other half of the design. A reservation can expire before it is attached to a document. Maybe Puppeteer failed. Maybe the sidecar was down. Maybe storage rejected the upload. The service cannot just delete the reservation and pretend the number never happened. It moves the number to &lt;code&gt;sequence_gaps&lt;/code&gt; with &lt;code&gt;reason = 'reaper_swept'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There are also explicit reasons: abandoned reservation, admin-documented gap, manual void. That vocabulary is important because auditors do not need a philosophical explanation. They need a row that says why the number did not produce a document.&lt;/p&gt;
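&lt;p&gt;The reaper's job is small enough to sketch as a pure function. The shapes below are assumptions loosely modeled on &lt;code&gt;pending_allocations&lt;/code&gt; and &lt;code&gt;sequence_gaps&lt;/code&gt;, and &lt;code&gt;sweepExpired&lt;/code&gt; is a hypothetical name; the real reaper runs in SQL under the allocator lock.&lt;/p&gt;

```typescript
// Assumed row shapes, simplified from the tables named in the article.
type PendingAllocation = { id: string; reservedNumber: number; expiresAt: Date };
type SequenceGap = { gapNumber: number; reason: string; reapedAt: Date };

// Moves expired reservations into the gap list instead of deleting them,
// and returns the reservations that are still live.
function sweepExpired(
  pending: PendingAllocation[],
  gaps: SequenceGap[],
  now: Date,
): PendingAllocation[] {
  const stillPending: PendingAllocation[] = [];
  for (const allocation of pending) {
    if (now.getTime() >= allocation.expiresAt.getTime()) {
      // The number is never forgotten: it becomes a gap with a reason.
      gaps.push({ gapNumber: allocation.reservedNumber, reason: "reaper_swept", reapedAt: now });
    } else {
      stillPending.push(allocation);
    }
  }
  return stillPending;
}
```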

&lt;p&gt;What surprised me was how much of the design was about not being too clever. I could have hidden this behind one &lt;code&gt;nextDocumentNumber()&lt;/code&gt; helper and let failures be retried. That would make the happy path smaller and the audit story weaker. The split between pending allocation and sequence gap is more verbose, but the data tells the truth.&lt;/p&gt;

&lt;p&gt;The same pattern shows up in the tests. The reaper-race gate is not a decorative concurrency test. It exists because the allocator is only correct under pressure: concurrent reservations, expired rows, reclaimed gaps, and failed finalization must all preserve one property. No two documents get the same number in the same sequence, and no missing number lacks a reason.&lt;/p&gt;

&lt;p&gt;The document number is reserved before the PDF exists because rendering is fallible. The audit trail has to survive that fallibility.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>concurrency</category>
      <category>numbering</category>
      <category>audit</category>
    </item>
    <item>
      <title>The Audit Trail Is a Data Structure, Not a Log Message</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 11 May 2026 08:21:05 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/the-audit-trail-is-a-data-structure-not-a-log-message-42mj</link>
      <guid>https://dev.to/kingsleyonoh/the-audit-trail-is-a-data-structure-not-a-log-message-42mj</guid>
      <description>&lt;p&gt;Logs can explain what a service thought happened.&lt;/p&gt;

&lt;p&gt;They do not prove what happened.&lt;/p&gt;

&lt;p&gt;Klevar Docs needed an audit trail for rendered documents, invoice events, credit note applications, signatures, voids, and attachments. The usual answer is an events table. Insert a row whenever something happens. Add timestamps. Keep it forever. That is useful, but it is still just a table unless the table can detect tampering.&lt;/p&gt;

&lt;p&gt;The hash chain is the difference.&lt;/p&gt;

&lt;p&gt;Each entity gets its own ordered chain. A row stores the event payload, a SHA-256 hash of the canonical payload, the previous row hash, the chain index, and a link hash that binds those values together. If someone changes a payload, deletes a prior row, swaps entity rows, or reorders entries, verification fails.&lt;/p&gt;

&lt;p&gt;The append path is transactional with the document change. That detail matters more than the hashing. If the document row commits and the chain row rolls back, the proof is incomplete. If the chain row commits and the document row rolls back, the proof references a thing that does not exist. &lt;code&gt;insertChainEntry()&lt;/code&gt; is designed to run inside the caller's transaction.&lt;/p&gt;

&lt;p&gt;The core logic is direct:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allocRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`SELECT fn_allocate_chain_index(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;::uuid)::text AS idx`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chainIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BigInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allocRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;previousHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chainIndex&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;priorRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`SELECT payload_hash FROM document_hash_chain
         WHERE entity_id = &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;::uuid
           AND chain_index = &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;chainIndex&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;::bigint`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;previousHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;priorRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload_hash&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chainLinkHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;computeChainLinkHash&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;content_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;contentHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;previous_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;previousHash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;chain_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chainIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two design choices hidden in that snippet.&lt;/p&gt;

&lt;p&gt;The index comes from &lt;code&gt;fn_allocate_chain_index(entity_id)&lt;/code&gt;, not a PostgreSQL sequence. The same rollback problem that makes sequences wrong for legal document numbers also applies to chain indices. A verifier expects the chain to be contiguous. If index 19 is missing because a transaction rolled back after a sequence increment, the verifier cannot know whether that is harmless or tampering.&lt;/p&gt;

&lt;p&gt;The link hash includes &lt;code&gt;entity_id&lt;/code&gt;. That prevents a row from one entity being copied into another entity's chain without detection. Klevar has one group boundary, but the legal proof is per entity. FZE, LLC, and Ltd cannot share a chain just because the service is single-tenant.&lt;/p&gt;

&lt;p&gt;The verifier is a walker, not a database query. It receives rows sorted by &lt;code&gt;chain_index&lt;/code&gt;, checks ordering, checks the genesis row, checks each &lt;code&gt;previous_hash&lt;/code&gt;, recomputes each link hash, and returns the first mismatch. It also reports intentionally broken indices. That last category is important because some retention or force-purge action may be documented rather than hidden. A broken chain can be honest if the break is recorded and visible.&lt;/p&gt;
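&lt;p&gt;A minimal walker might look like the sketch below. Several things here are assumptions rather than the service's real code: the row shape, the use of &lt;code&gt;number&lt;/code&gt; instead of &lt;code&gt;bigint&lt;/code&gt; for the index, the field order inside &lt;code&gt;linkHashOf&lt;/code&gt;, and the helper names. It follows the append snippet in chaining each row to the prior row's payload hash, and it leaves out the reporting of intentionally broken indices.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

type ChainRow = {
  entityId: string;
  chainIndex: number; // the article uses bigint; number keeps the sketch short
  canonicalPayload: string; // assumed to be canonical JSON already
  payloadHash: string;
  previousHash: string | null;
  linkHash: string;
};

function sha256(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Assumed link layout; the real computeChainLinkHash may encode fields differently.
function linkHashOf(entityId: string, chainIndex: number, payloadHash: string, previousHash: string | null): string {
  return sha256(JSON.stringify([entityId, chainIndex, payloadHash, previousHash]));
}

// Test helper: appends a well-formed row to an in-memory chain.
function appendRow(chain: ChainRow[], entityId: string, canonicalPayload: string): ChainRow {
  let previousHash: string | null = null;
  for (const row of chain) {
    previousHash = row.payloadHash; // ends up as the last row's payload hash
  }
  const chainIndex = chain.length + 1;
  const payloadHash = sha256(canonicalPayload);
  const linkHash = linkHashOf(entityId, chainIndex, payloadHash, previousHash);
  const row = { entityId, chainIndex, canonicalPayload, payloadHash, previousHash, linkHash };
  chain.push(row);
  return row;
}

// Walks rows sorted by chainIndex. Returns the expected index at which the
// first mismatch was found, or null when the whole chain verifies.
function verifyChain(entityId: string, rows: ChainRow[]): number | null {
  let expectedIndex = 1;
  let expectedPrevious: string | null = null; // the genesis row has no predecessor
  for (const row of rows) {
    const checks = [
      row.entityId === entityId,
      row.chainIndex === expectedIndex,
      row.previousHash === expectedPrevious,
      row.payloadHash === sha256(row.canonicalPayload),
      row.linkHash === linkHashOf(row.entityId, row.chainIndex, row.payloadHash, row.previousHash),
    ];
    if (checks.indexOf(false) !== -1) {
      return expectedIndex;
    }
    expectedPrevious = row.payloadHash;
    expectedIndex += 1;
  }
  return null;
}
```

&lt;p&gt;Editing a payload, deleting a middle row, or verifying the rows under a different entity id each makes the walker stop at the first index where the evidence no longer lines up.&lt;/p&gt;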

&lt;p&gt;What surprised me was the dependency on canonical JSON. Hashing JavaScript objects directly is a trap because key order and serialization details can drift. The service pins &lt;code&gt;canonicalize@2.0.0&lt;/code&gt; and runs an RFC 8785 boot assertion before Fastify binds a port. If canonicalization changes, the server refuses to start. That is not paranoia. It is the cost of using hashes as legal proof.&lt;/p&gt;
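&lt;p&gt;The shape of such a boot assertion is simple: canonicalize a fixed fixture and compare against a pinned string. In the sketch below, &lt;code&gt;canonicalizeSketch&lt;/code&gt; is a stand-in that only sorts keys recursively. It is not a full RFC 8785 implementation (number formatting and other rules are not covered) and it is not the &lt;code&gt;canonicalize&lt;/code&gt; package; the fixture and function names are hypothetical.&lt;/p&gt;

```typescript
// Stand-in canonicalizer: recursive key sorting only, NOT full RFC 8785.
function canonicalizeSketch(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalizeSketch(item)).join(",") + "]";
  }
  const record = value as { [key: string]: unknown };
  // Default sort compares UTF-16 code units, which matches the key ordering RFC 8785 asks for.
  const keys = Object.keys(record)
    .filter((key) => record[key] !== undefined)
    .sort();
  const members = keys.map((key) => JSON.stringify(key) + ":" + canonicalizeSketch(record[key]));
  return "{" + members.join(",") + "}";
}

// Boot assertion: refuse to start if the canonical form of a known fixture drifts.
function assertCanonicalizationStable(canonicalize: (value: unknown) => string): void {
  const fixture = { b: [2, 1], a: { d: null, c: "x" } };
  const expected = '{"a":{"c":"x","d":null},"b":[2,1]}';
  const actual = canonicalize(fixture);
  if (actual !== expected) {
    throw new Error("canonicalization drifted: " + actual);
  }
}
```

&lt;p&gt;Passing plain &lt;code&gt;JSON.stringify&lt;/code&gt; to the assertion fails, because it preserves insertion order instead of sorting keys. That is exactly the kind of drift the check exists to catch.&lt;/p&gt;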

&lt;p&gt;The hash chain also changed how I think about events. &lt;code&gt;events_emitted&lt;/code&gt; is the integration outbox for Hub and Webhook Engine fanout. It is operational. &lt;code&gt;document_hash_chain&lt;/code&gt; is proof. Those two surfaces overlap, but they are not the same thing. A notification can be retried, delayed, or dropped without changing the legal document. A chain append cannot be treated that way.&lt;/p&gt;

&lt;p&gt;The biggest tradeoff is operational weight. A chain gives you another invariant to maintain, another verifier to run, another repair story to document, and another failure mode to alert on. The alternative is worse: a document archive that can produce files but cannot prove nobody altered the history.&lt;/p&gt;

&lt;p&gt;For this system, the archive without the chain would be storage. The chain turns it into evidence.&lt;/p&gt;

</description>
      <category>hashchain</category>
      <category>audit</category>
      <category>canonicaljson</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Why a Rendered Invoice Can Still Fail Send</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 11 May 2026 08:20:52 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/why-a-rendered-invoice-can-still-fail-send-32ka</link>
      <guid>https://dev.to/kingsleyonoh/why-a-rendered-invoice-can-still-fail-send-32ka</guid>
      <description>&lt;p&gt;An invoice PDF can exist and still not be an invoice package.&lt;/p&gt;

&lt;p&gt;That sentence shaped the send path. The easy implementation would render the invoice, store the PDF, try to build the XML, and log a warning if the compliance layer failed. The client still gets a document. The API still returns success. The business can "fix it later."&lt;/p&gt;

&lt;p&gt;That is exactly the failure I did not want.&lt;/p&gt;

&lt;p&gt;Klevar Docs treats e-invoicing as part of finalization, not as decoration after rendering. For an invoice that routes to Factur-X, the send path has to build CII XML, validate it, embed it into the PDF, convert the container to PDF/A-3b, validate that with veraPDF, upload the XML, and replace the PDF with the conformant output. If any hard requirement fails, the send fails.&lt;/p&gt;

&lt;p&gt;The orchestrator is intentionally thin. &lt;code&gt;buildComplianceArtifacts()&lt;/code&gt; does profile resolution and dispatches to a branch. The Factur-X branch owns the hard path. The XRechnung branch owns the public-sector German XML path.&lt;/p&gt;

&lt;p&gt;The Factur-X branch is where the philosophy is visible:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;validationRaw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;valid&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;markDocumentFacturXFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FacturXValidationRejectError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;schematron_failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validationRaw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;validationRaw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="na"&gt;invoice_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;factur_x_validation_status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fallbackResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embedAndValidateWithFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;plainPdfBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;fallbackContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;fallbackDeps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;fallbackResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conformant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PdfARequiredRejectError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pdf_a_non_conformant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stage_failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fallbackResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stage_failed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;invoice_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two details in that snippet that matter.&lt;/p&gt;

&lt;p&gt;First, validation failure is a typed 422, not an internal server error. The invoice shape is wrong. The operator needs to fix the invoice, not restart the service. Missing BT (business term) fields, sidecar 4xx failures, and schematron failures all become the same class of business-visible rejection.&lt;/p&gt;
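&lt;p&gt;One way to keep that distinction is to give business rejections their own error class and map only that class to 422. The article names &lt;code&gt;FacturXValidationRejectError&lt;/code&gt; but does not show it, so &lt;code&gt;BusinessRejectError&lt;/code&gt; and &lt;code&gt;toHttpResponse&lt;/code&gt; below are hypothetical stand-ins, not the service's classes.&lt;/p&gt;

```typescript
// Hypothetical error shape for business-visible rejections.
class BusinessRejectError extends Error {
  readonly statusCode = 422;
  readonly reason: string;
  readonly details: string[];

  constructor(reason: string, details: string[]) {
    super("invoice rejected: " + reason);
    this.name = "BusinessRejectError";
    this.reason = reason;
    this.details = details;
    // Keeps instanceof working when the code is compiled to older targets.
    Object.setPrototypeOf(this, BusinessRejectError.prototype);
  }
}

// Maps errors to a response. Only typed rejections become 422;
// everything else stays an internal error.
function toHttpResponse(error: unknown): { status: number; body: { reason: string; errors: string[] } } {
  if (error instanceof BusinessRejectError) {
    return { status: error.statusCode, body: { reason: error.reason, errors: error.details } };
  }
  return { status: 500, body: { reason: "internal_error", errors: [] } };
}
```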

&lt;p&gt;Second, the document row is marked as failed before throwing. That was a later correction. Without it, a failed send could leave the row looking cleaner than reality because the broader transaction never committed. The code now best-effort updates &lt;code&gt;documents.factur_x_validation_status = 'fail'&lt;/code&gt; so a polling operator sees the truth after a rejection.&lt;/p&gt;

&lt;p&gt;What surprised me was how much logic had to live before the sidecar call. I expected mustangproject to be the hard part. The harder part was building the DTO without lying. Seller name, seller VAT ID, registration ID, address, client identity, tax category, reverse charge handling, unit code, document type, original invoice reference, payment terms, issue date, due date. Every one of those fields has a business rule.&lt;/p&gt;

&lt;p&gt;The builder is pure TypeScript for that reason. It reads frozen invoice, client, and entity snapshot data and returns a transport DTO for the Java sidecar. It does not query the database. It does not default from live entity state. It does not mutate the invoice.&lt;/p&gt;

&lt;p&gt;That purity paid off when credit notes entered the path. A credit note is not just a negative invoice. It carries a different document type code and can need a preceding invoice reference. The builder got an explicit discriminator for &lt;code&gt;invoice&lt;/code&gt; versus &lt;code&gt;credit_note&lt;/code&gt;, and the sidecar maps that into the XML. The alternative was to infer from amounts, which would have been clever and wrong.&lt;/p&gt;
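&lt;p&gt;As a rough illustration of the discriminator, the sketch below maps the document kind to a UNTDID 1001 type code (380 for a commercial invoice, 381 for a credit note). The field names and &lt;code&gt;buildTransportDto&lt;/code&gt; are hypothetical and far smaller than the real DTO; in the article the sidecar performs the XML mapping.&lt;/p&gt;

```typescript
type DocumentKind = "invoice" | "credit_note";

// Hypothetical, heavily reduced input built from frozen snapshot data.
type ComplianceInput = {
  kind: DocumentKind;
  documentNumber: string;
  precedingInvoiceNumber: string | null;
};

// UNTDID 1001: 380 is a commercial invoice, 381 is a credit note.
function documentTypeCode(kind: DocumentKind): "380" | "381" {
  return kind === "credit_note" ? "381" : "380";
}

// Pure: reads the input, returns a new object, queries nothing, mutates nothing.
function buildTransportDto(input: ComplianceInput) {
  return {
    typeCode: documentTypeCode(input.kind),
    documentNumber: input.documentNumber,
    // Only a credit note carries a reference back to the invoice it corrects.
    precedingInvoiceReference: input.kind === "credit_note" ? input.precedingInvoiceNumber : null,
  };
}
```

&lt;p&gt;The kind is stated, never inferred from the sign of an amount, so a zero-value or partially credited document still gets the right type code.&lt;/p&gt;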

&lt;p&gt;XRechnung is intentionally different. German public-sector invoices route to standalone XML. The human-readable PDF remains a companion, not the legal container. That means the branch skips PDF/A-3b embedding work and persists XML as the authoritative artifact. Today the validator status can be &lt;code&gt;pass&lt;/code&gt;, &lt;code&gt;fail&lt;/code&gt;, or &lt;code&gt;skipped&lt;/code&gt; because the KoSIT validation lane had an acknowledged gap. I kept that explicit instead of pretending both branches had identical maturity.&lt;/p&gt;

&lt;p&gt;The system now has an uncomfortable but correct behavior: it can render a beautiful PDF, then refuse to send it.&lt;/p&gt;

&lt;p&gt;That is the point. A valid-looking artifact is not enough. The service needs to know whether the document is acceptable for the legal path it is taking. For a plain letterhead, PDF bytes may be enough. For an invoice that routes to Factur-X, the XML and PDF/A container are part of the document. If they fail, the document failed.&lt;/p&gt;

&lt;p&gt;I would rather make the operator fix a rejection than let a client receive a file that only looks complete.&lt;/p&gt;

</description>
      <category>facturx</category>
      <category>xrechnung</category>
      <category>pdfa</category>
      <category>invoices</category>
    </item>
    <item>
      <title>The PDF Looked Correct Because the Template Was Wrong</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 11 May 2026 08:20:09 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/the-pdf-looked-correct-because-the-template-was-wrong-3hia</link>
      <guid>https://dev.to/kingsleyonoh/the-pdf-looked-correct-because-the-template-was-wrong-3hia</guid>
      <description>&lt;p&gt;The first FZE letterhead looked fine.&lt;/p&gt;

&lt;p&gt;That was the problem.&lt;/p&gt;

&lt;p&gt;The rendered PDF had the right legal name, the right registration label, the right address, the right contact line, and the right visual structure. It passed the visual check because every value in the template matched the entity I was testing with. Then I looked at what would happen if the same bundle rendered an LLC document.&lt;/p&gt;

&lt;p&gt;It would still say FZE.&lt;/p&gt;

&lt;p&gt;That failure is more dangerous than a crash. A crash stops the send path. A wrong legal identity in a PDF can leave the system looking healthy while the document is unusable. The template authoring lane had hardcoded FZE identity strings because the snapshot did not expose the fields the template needed. The implementation had chosen the fastest path to a green render instead of stopping at the missing data contract.&lt;/p&gt;

&lt;p&gt;I was wrong to treat the template as the place where legal identity could be finished. The template is presentation. The entity row is state. The document snapshot is the contract between them.&lt;/p&gt;

&lt;p&gt;The fix started by widening the entity surface. &lt;code&gt;captureEntitySnapshot()&lt;/code&gt; now freezes address, registration, contact, legal names, country code, VAT identifier, officer data, brand, banking config, retention posture, and document policy into the document row. It also redacts fields that should not land on PDFs, like private tax identifiers and operational banking rollout notes.&lt;/p&gt;

&lt;p&gt;The load-bearing part is that the snapshot always returns stable shapes. Handlebars runs in strict mode. If a template asks for &lt;code&gt;entity.address.single_line&lt;/code&gt;, the &lt;code&gt;address&lt;/code&gt; path must exist even when the row is old. An empty object is better than &lt;code&gt;undefined&lt;/code&gt; because an old document can still render through the same code path.&lt;/p&gt;

&lt;p&gt;The function is blunt about that boundary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;captureEntitySnapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EntityRowForSnapshot&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;EntitySnapshot&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;entity_snapshot_invalid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id missing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;cloneJsonLike&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;banking_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;redactBankingConfigForDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;banking_config&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;cloneJsonObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;registration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;redactRegistrationForDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;registration&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;contact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;cloneJsonObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contact&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;legal_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;toNullableString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;legal_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;toNullableString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;vat_identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;toNullableString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vat_identifier&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;officers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;cloneJsonObjectArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;pickOfficers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That code does not look dramatic. It is the reason a document can be re-rendered later without asking the live &lt;code&gt;entities&lt;/code&gt; row what the company looks like today.&lt;/p&gt;
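
&lt;p&gt;The stable-shape rule can be sketched in a few lines. The post names &lt;code&gt;cloneJsonObject&lt;/code&gt; but does not show its body, so &lt;code&gt;cloneJsonObjectSketch&lt;/code&gt;, the &lt;code&gt;JsonObject&lt;/code&gt; type, and the sample address below are assumptions for illustration, not the real helper.&lt;/p&gt;

```typescript
// Hypothetical sketch: the article names cloneJsonObject but does not show its body.
type JsonObject = { [key: string]: unknown };

function cloneJsonObjectSketch(value: unknown): JsonObject {
  // Missing values, primitives, and arrays all collapse to an empty object,
  // so the key always exists with an object shape on the snapshot.
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return {};
  }
  // A JSON round trip is a simple deep copy for JSON-like data. It detaches
  // the snapshot from later mutation of the source row.
  return JSON.parse(JSON.stringify(value)) as JsonObject;
}

// An old row with no address column still yields an object at the address path.
const oldRow: { address?: unknown } = {};
const oldSnapshot = { address: cloneJsonObjectSketch(oldRow.address) };

// A current row is copied, not shared, so later edits do not reach the snapshot.
const liveAddress = { single_line: '1 Example Street' };
const frozenAddress = cloneJsonObjectSketch(liveAddress);
liveAddress.single_line = 'changed after issue';

console.log(Object.keys(oldSnapshot.address).length, frozenAddress['single_line']);
```

&lt;p&gt;The copy matters as much as the empty default. If the snapshot shared a reference with the live row, a later edit would change a document that was supposed to be frozen.&lt;/p&gt;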

&lt;p&gt;The surprise was CSS. The HTML literals were obvious once I started searching. The comments in the stylesheet were not. The composer injects CSS directly into a &lt;code&gt;&amp;lt;style&amp;gt;&lt;/code&gt; block and hashes the full rendered HTML. A comment like &lt;code&gt;Klevar FZE primary&lt;/code&gt; is still rendered output. It becomes part of the document hash. It can leak into the PDF byte stream. It can fail an entity-neutral regression even if the visible page looks correct.&lt;/p&gt;

&lt;p&gt;That changed the template rule: anything inside the bundle is document output. Body HTML, shared partials, stylesheet comments, labels, helper inputs. None of it gets to carry entity identity unless it comes from the snapshot or the body payload.&lt;/p&gt;

&lt;p&gt;The renderer also became stricter in a different direction. &lt;code&gt;composeHtml()&lt;/code&gt; uses a private Handlebars instance for each render, not the global singleton. Helpers like &lt;code&gt;formatCurrency&lt;/code&gt;, &lt;code&gt;formatDate&lt;/code&gt;, &lt;code&gt;markdown&lt;/code&gt;, &lt;code&gt;eq&lt;/code&gt;, and &lt;code&gt;officerRoleLabel&lt;/code&gt; live inside that private instance. That prevents a test, module, or future template family from registering a helper globally and changing another document type by accident.&lt;/p&gt;

&lt;p&gt;The snapshot fix did not end at source code. The regression had to prove the failure could not return. The gate renders every authored bundle against multiple seeded entities and searches for identity strings from the wrong entity. If an LLC render contains an FZE registration value, the test fails. If a stylesheet comment leaks a company name, the test fails. If someone adds a new bundle and hardcodes a legal name because it is faster, the gate catches it.&lt;/p&gt;
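
&lt;p&gt;The core of such a gate might look like the sketch below. The real test is not shown in this post, so the &lt;code&gt;SeededEntity&lt;/code&gt; shape, the function name, and the seed values are all invented for the example. Only the &lt;code&gt;Klevar FZE primary&lt;/code&gt; comment comes from the incident described above.&lt;/p&gt;

```typescript
// Hypothetical shapes: the article describes the gate but does not show it.
type SeededEntity = { id: string; identityStrings: string[] };

// Returns every identity string that belongs to a different entity
// but appears in the output rendered for `renderedFor`.
function findForeignIdentityLeaks(
  renderedOutput: string,
  renderedFor: SeededEntity,
  allEntities: SeededEntity[],
): string[] {
  const leaks: string[] = [];
  for (const other of allEntities) {
    if (other.id === renderedFor.id) continue;
    for (const needle of other.identityStrings) {
      // Values both entities legitimately share are not leaks.
      if (renderedFor.identityStrings.indexOf(needle) !== -1) continue;
      if (renderedOutput.indexOf(needle) !== -1) leaks.push(needle);
    }
  }
  return leaks;
}

// Seed values are invented for the example.
const fzeSeed: SeededEntity = { id: 'fze', identityStrings: ['Klevar FZE', 'FZE-REG-0001'] };
const llcSeed: SeededEntity = { id: 'llc', identityStrings: ['Klevar LLC', 'LLC-REG-0002'] };

// The visible text is correct, but a stylesheet comment still names the other entity.
const llcOutput = '/* Klevar FZE primary */ .header { color: #111; } Klevar LLC, LLC-REG-0002';
const foundLeaks = findForeignIdentityLeaks(llcOutput, llcSeed, [fzeSeed, llcSeed]);
console.log(foundLeaks); // contains only 'Klevar FZE'
```

&lt;p&gt;The search runs over the full rendered string, not the visible text. That is what lets it catch a stylesheet comment that no visual check would ever see.&lt;/p&gt;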

&lt;p&gt;The deeper lesson is that PDF generation is not the domain. Legal attribution is the domain.&lt;/p&gt;

&lt;p&gt;A renderer that accepts a body and returns bytes is easy to build. A renderer that can explain where every legal identity field came from is the system. Once I saw that, the architecture became clearer: templates do not own identity, live rows do not own old documents, and snapshots do not carry private operational fields just because the database has them.&lt;/p&gt;

&lt;p&gt;That distinction now runs through the rest of Klevar Docs. Factur-X builder data comes from the snapshot. Board resolution officer data can default from the snapshot. Payment details freeze at issue time. Hash-chain verification depends on content hashes that include the rendered output. If a lower layer cheats, every upper layer can look correct while proving the wrong thing.&lt;/p&gt;

&lt;p&gt;The PDF looking correct was the warning. The system only became correct when the source of correctness moved out of the template.&lt;/p&gt;

</description>
      <category>documentrendering</category>
      <category>snapshots</category>
      <category>templates</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Reachable Is Not the Same as Correct</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Mon, 11 May 2026 08:19:56 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/reachable-is-not-the-same-as-correct-55pc</link>
      <guid>https://dev.to/kingsleyonoh/reachable-is-not-the-same-as-correct-55pc</guid>
      <description>&lt;p&gt;The CLI could create a credit note.&lt;/p&gt;

&lt;p&gt;That was the bad news.&lt;/p&gt;

&lt;p&gt;The umbrella &lt;code&gt;documents compose&lt;/code&gt; command was built to make all document types reachable through one stable surface. Pass a &lt;code&gt;type&lt;/code&gt;, an entity, and a JSON body. The server looks up the per-type Zod schema, validates the body, renders the document, and returns the same envelope shape every time.&lt;/p&gt;

&lt;p&gt;For generic documents, that is exactly right. A letterhead, proposal, quote, statement, receipt, minutes document, reference letter, or compliance letter can be a schema-gated render through the generic pipeline.&lt;/p&gt;

&lt;p&gt;Credit notes were different. So were invoices, pro-forma invoices, and board resolutions.&lt;/p&gt;

&lt;p&gt;Those document types do not just render. They create specialized rows, allocate numbers, enforce state machines, inherit fields from source invoices, attach payment behavior, and feed later compliance paths. When &lt;code&gt;documents compose --type credit_note&lt;/code&gt; went through the generic renderer, it produced a document row but bypassed &lt;code&gt;createCreditNote()&lt;/code&gt;. That meant the credit note skipped the inheritance logic that copies terms from the original invoice.&lt;/p&gt;

&lt;p&gt;The symptom appeared later as a Factur-X validation problem. The XML layer complained about missing payment terms. The real bug was earlier: the API made a specialized document reachable through the wrong path.&lt;/p&gt;

&lt;p&gt;I had treated coverage as correctness. It was not.&lt;/p&gt;

&lt;p&gt;The fix was not to delete the umbrella. The umbrella is still the right operator surface. The fix was to split the server path into two routes inside the same endpoint: specialized dispatch first, generic render second.&lt;/p&gt;

&lt;p&gt;The registry is explicit:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;COMPOSE_SPECIALIZED_DISPATCH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Partial&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TemplateType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ComposeDispatcher&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoiceDispatcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;pro_forma_invoice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;proFormaDispatcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;credit_note&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;creditNoteDispatcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;board_resolution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boardResolutionDispatcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tiny map carries a big rule. If a type has specialized business logic, compose must call the specialized service. If it does not, compose falls through to &lt;code&gt;renderDocument()&lt;/code&gt;. Exactly one path fires.&lt;/p&gt;
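
&lt;p&gt;A minimal sketch of that routing rule, assuming a lookup followed by a fall-through. The names &lt;code&gt;composeSketch&lt;/code&gt;, &lt;code&gt;dispatchSketch&lt;/code&gt;, and &lt;code&gt;renderGenericSketch&lt;/code&gt; are stand-ins, and the dispatchers here only report which path ran instead of calling a real service.&lt;/p&gt;

```typescript
// Hypothetical stand-ins for the real dispatchers and renderDocument().
type ComposeOutcome = { path: 'specialized' | 'generic'; type: string };
type DispatcherSketch = (body: unknown) => ComposeOutcome;

const dispatchSketch: { [type: string]: DispatcherSketch } = {
  credit_note: () => ({ path: 'specialized', type: 'credit_note' }),
  invoice: () => ({ path: 'specialized', type: 'invoice' }),
};

function renderGenericSketch(type: string): ComposeOutcome {
  return { path: 'generic', type };
}

function composeSketch(type: string, body: unknown): ComposeOutcome {
  // Own-property check so inherited keys such as "constructor" never match.
  const dispatcher = Object.prototype.hasOwnProperty.call(dispatchSketch, type)
    ? dispatchSketch[type]
    : undefined;
  if (dispatcher !== undefined) {
    // Specialized dispatch first: the generic renderer is not called at all.
    return dispatcher(body);
  }
  // No specialized service registered, so fall through to the generic render.
  return renderGenericSketch(type);
}

console.log(composeSketch('credit_note', {}).path, composeSketch('letterhead', {}).path);
```

&lt;p&gt;The early return is the whole rule. A specialized type can never reach the generic renderer, and a generic type never touches a specialized service.&lt;/p&gt;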

&lt;p&gt;The dispatcher contract is intentionally boring. It maps the umbrella body into the specialized service input, calls the service, and wraps the returned row in the same envelope shape as a rendered document. It does not recompute totals. It does not duplicate inheritance. It does not allocate its own number. If the specialized service owns the rule, the dispatcher is only an adapter.&lt;/p&gt;

&lt;p&gt;Here is the credit note path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createCreditNote&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKeyId&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;wrapRowAsOutput&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;credit_note&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;documentNumber&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;document_number&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;draft&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That looks like extra ceremony until you compare it to the failure. The old path returned a rendered artifact while skipping the domain rule. The new path returns a draft row because specialized documents often require a later send or execute step to produce their final artifact. The response shape stays stable, but the lifecycle is honest.&lt;/p&gt;

&lt;p&gt;What surprised me was that the bug survived because every layer was internally correct. The CLI sent a valid body. The compose endpoint validated it. The generic renderer produced a document. The Factur-X validator was right to reject the later XML. No single module was obviously broken in isolation.&lt;/p&gt;

&lt;p&gt;The boundary was wrong.&lt;/p&gt;

&lt;p&gt;That forced a different kind of test. Unit tests on the dispatcher are not enough. The integration test has to assert the side effect that only the correct path can create: invoice numbering, credit-note inheritance, and board-resolution numbering on the specialized side, and letterhead finalization on the generic side. The test is not "does compose return 201?" It is "did the right domain path fire?"&lt;/p&gt;

&lt;p&gt;This is one of the more useful patterns in the project because it applies outside documents. An umbrella command is good when operators need one muscle memory. It becomes dangerous when the umbrella erases domain-specific behavior. Reachability is a UX property. Correctness is a domain property.&lt;/p&gt;

&lt;p&gt;The compose endpoint now carries both.&lt;/p&gt;

</description>
      <category>apidesign</category>
      <category>cli</category>
      <category>dispatch</category>
      <category>documenttypes</category>
    </item>
    <item>
      <title>Confidence Is Not Ownership</title>
      <dc:creator>Kingsley Onoh</dc:creator>
      <pubDate>Wed, 06 May 2026 16:00:28 +0000</pubDate>
      <link>https://dev.to/kingsleyonoh/confidence-is-not-ownership-2d80</link>
      <guid>https://dev.to/kingsleyonoh/confidence-is-not-ownership-2d80</guid>
      <description>&lt;p&gt;What should a finance queue do when two credible records point at the same case?&lt;/p&gt;

&lt;p&gt;An invoice discrepancy and a contract breach can describe the same dispute. They can also describe two different disputes with the same counterparty, the same currency, and nearly the same amount. That is the trap in a finance operations queue. The data looks related before anyone has proved ownership.&lt;/p&gt;

&lt;p&gt;The Workbench ingests exceptions from invoice reconciliation, transaction reconciliation, contract lifecycle events, webhook dead letters, manual operator entry, and signed Hub fanout. Every source carries its own identifiers. Some are reliable. Some are only reliable inside the upstream tool that produced them.&lt;/p&gt;

&lt;p&gt;I had to decide what the system should do when a new exception looks like it belongs to an existing dispute.&lt;/p&gt;

&lt;p&gt;The tempting version is simple: compute a score, pick the highest-scoring dispute, attach the exception. That makes demos feel clean. Exceptions flow in, disputes become richer, and the queue stays small.&lt;/p&gt;

&lt;p&gt;It is also how a finance system quietly corrupts its own audit trail.&lt;/p&gt;

&lt;p&gt;Once an exception is attached to a dispute, every later action inherits that fact. SLA timers, resolution playbooks, Notification Hub events, audit PDF exports, and operator comments all treat the relationship as true. If the relationship was only probable, the system has converted probability into evidence.&lt;/p&gt;

&lt;p&gt;That conversion is the real design problem.&lt;/p&gt;

&lt;h2&gt;The Scoring Shape&lt;/h2&gt;

&lt;p&gt;The correlator in &lt;code&gt;src/drw/domain/correlator.clj&lt;/code&gt; uses seven signals. Source reference and entity id each carry 0.15. Counterparty carries 0.25. Currency carries 0.10. Amount carries 0.15. Date carries 0.10. Category carries 0.10.&lt;/p&gt;

&lt;p&gt;Those weights are not secret. They are visible because this is a spec project. But the structure matters more than the numbers.&lt;/p&gt;

&lt;p&gt;Counterparty is the gate. A candidate dispute is not eligible unless it belongs to the same tenant, is not terminal, has the same counterparty, and falls within the correlation window. Only then does scoring begin.&lt;/p&gt;

&lt;p&gt;That means the correlator is not a general similarity search. It is a tenant-scoped dispute ownership test.&lt;/p&gt;

&lt;p&gt;The amount signal has a 10 percent tolerance and only scores when the currency also matches. The date signal checks whether the exception was observed within 72 hours of the dispute creation time. The source reference and entity id signals compare against exceptions already attached to the candidate dispute, not just fields on the dispute itself.&lt;/p&gt;

&lt;p&gt;That last part matters. A dispute becomes easier to recognize as it accumulates evidence. The first invoice mismatch may create the dispute. A later webhook dead letter with the same upstream reference can now match the attached evidence, even if the dispute record itself does not carry that reference.&lt;/p&gt;
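
&lt;p&gt;To make the arithmetic concrete, here is a sketch in TypeScript. The weights are the ones listed above. The field names, the plain-sum model, and measuring the 10 percent tolerance relative to the dispute amount are assumptions, since the post does not spell those out.&lt;/p&gt;

```typescript
// Weights as stated above; field names and the plain-sum model are assumptions.
const SIGNAL_WEIGHTS = {
  sourceRef: 0.15,
  entityId: 0.15,
  counterparty: 0.25,
  currency: 0.10,
  amount: 0.15,
  date: 0.10,
  category: 0.10,
};

type SignalName = keyof typeof SIGNAL_WEIGHTS;
type SignalHits = { [K in SignalName]: boolean };

// Amount only counts when the currency also matches; tolerance is 10 percent.
// Measuring tolerance relative to the dispute amount is an assumption.
function amountSignal(exceptionAmount: number, disputeAmount: number, sameCurrency: boolean): boolean {
  if (!sameCurrency || disputeAmount === 0) return false;
  return 0.10 >= Math.abs(exceptionAmount - disputeAmount) / Math.abs(disputeAmount);
}

// Date counts when the exception was observed within 72 hours of dispute creation.
function dateSignal(observedAtMs: number, disputeCreatedAtMs: number): boolean {
  const windowMs = 72 * 60 * 60 * 1000;
  return windowMs >= Math.abs(observedAtMs - disputeCreatedAtMs);
}

function sumScore(hits: SignalHits): number {
  let score = 0;
  for (const name of Object.keys(SIGNAL_WEIGHTS) as SignalName[]) {
    if (hits[name]) score += SIGNAL_WEIGHTS[name];
  }
  // Round to two decimals to hide floating-point noise in the sum.
  return Math.round(score * 100) / 100;
}

const hourMs = 60 * 60 * 1000;
const exampleScore = sumScore({
  sourceRef: false,
  entityId: false,
  counterparty: true,
  currency: true,
  amount: amountSignal(1050, 1000, true),
  date: dateSignal(48 * hourMs, 0),
  category: true,
});
console.log(exampleScore); // 0.7
```

&lt;p&gt;Under those assumptions, a candidate with no shared reference at all still reaches 0.70 on counterparty, currency, amount, date, and category alone. The two reference signals are what separate a plausible match from a strongly evidenced one.&lt;/p&gt;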

&lt;p&gt;The core function is plain Clojure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight clojure"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;defn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;score-candidates&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;tenant-id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;disputes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;score-candidates&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tenant-id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;disputes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}))&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;tenant-id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;disputes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;merge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;default-config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
         &lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get-in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="no"&gt;:thresholds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;:review&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;disputes&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map-indexed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispute&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;candidate-eligible?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tenant-id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispute&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="w"&gt;
                 &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;assoc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;score-candidate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tenant-id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dispute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
                        &lt;/span&gt;&lt;span class="no"&gt;:sort-index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;:score&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sort-by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;juxt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;comp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;:score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;:sort-index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mapv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dissoc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;:sort-index&lt;/span&gt;&lt;span class="p"&gt;))))))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two details in that function I care about.&lt;/p&gt;

&lt;p&gt;First, eligibility happens before scoring. A cross-tenant dispute receives no score. A terminal dispute receives no score. A different counterparty receives no score. The function does not let a high amount or date match compensate for a broken boundary.&lt;/p&gt;

&lt;p&gt;Second, ties preserve input order through &lt;code&gt;:sort-index&lt;/code&gt;. That is not glamorous. It prevents unstable review queues where two equal candidates swap positions between renders and make operators think the system changed its mind.&lt;/p&gt;
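
&lt;p&gt;The tie-break is easier to see outside the threading macro. This is a rough TypeScript analogue of the threshold filter and sort steps only. It leaves out the eligibility filter, and the &lt;code&gt;ScoredCandidate&lt;/code&gt; shape and the dispute ids are invented for the example.&lt;/p&gt;

```typescript
// Hypothetical shape for an already-scored candidate.
type ScoredCandidate = { disputeId: string; score: number };

function rankForReview(candidates: ScoredCandidate[], reviewThreshold: number): ScoredCandidate[] {
  return candidates
    // Remember each candidate's input position, like map-indexed.
    .map((candidate, sortIndex) => ({ candidate, sortIndex }))
    // Drop anything below the review threshold.
    .filter((entry) => entry.candidate.score >= reviewThreshold)
    // Highest score first; equal scores fall back to input order.
    .sort((a, b) => (b.candidate.score - a.candidate.score) || (a.sortIndex - b.sortIndex))
    // Strip the helper index before returning.
    .map((entry) => entry.candidate);
}

const rankedCandidates = rankForReview(
  [
    { disputeId: 'd-1', score: 0.75 },
    { disputeId: 'd-2', score: 0.9 },
    { disputeId: 'd-3', score: 0.75 },
    { disputeId: 'd-4', score: 0.4 },
  ],
  0.7,
);
console.log(rankedCandidates.map((c) => c.disputeId)); // d-2, d-1, d-3
```

&lt;p&gt;Carrying the index through the sort makes the order a property of the data, not of whichever sort implementation happens to run.&lt;/p&gt;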

&lt;h2&gt;What I Was Wrong About&lt;/h2&gt;

&lt;p&gt;I initially treated the correlation score as the hard part.&lt;/p&gt;

&lt;p&gt;It was not. The harder question was what the system does with the score.&lt;/p&gt;

&lt;p&gt;There are three possible outcomes in &lt;code&gt;src/drw/domain/exceptions.clj&lt;/code&gt;. If no candidate reaches the review threshold, the exception creates a new dispute and attaches to it immediately. If the best candidate hits the auto-merge band and auto-merge is explicitly enabled, the exception attaches and records an auto-merged correlation. Otherwise, the system creates pending correlation records and emits a &lt;code&gt;dispute.correlation_pending&lt;/code&gt; event.&lt;/p&gt;

&lt;p&gt;That middle branch is intentionally hard to reach. The &lt;code&gt;.env.example&lt;/code&gt; values set &lt;code&gt;AUTO_MERGE_THRESHOLD=0.92&lt;/code&gt; and &lt;code&gt;REVIEW_THRESHOLD=0.70&lt;/code&gt;, while the source correlator defaults are lower for unit-level behavior. The runtime config is stricter because this is finance operations. False attachment costs more than a larger review queue.&lt;/p&gt;
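
&lt;p&gt;The three branches can be sketched as a single pure decision, using the thresholds from &lt;code&gt;.env.example&lt;/code&gt;. The function name, the config keys, the outcome keywords, and the inclusive comparison at each threshold are my assumptions here, not the contents of &lt;code&gt;exceptions.clj&lt;/code&gt;.&lt;/p&gt;

```clojure
(def example-config
  "Values mirror .env.example; the key names are hypothetical."
  {:review-threshold 0.70
   :auto-merge-threshold 0.92
   :auto-merge-enabled? false})

(defn decide-outcome
  "Maps the best candidate score (nil when there is no candidate) to an outcome."
  [best-score {:keys [review-threshold auto-merge-threshold auto-merge-enabled?]}]
  (cond
    ;; no candidate, or the best one sits below the review band
    (or (nil? best-score) (> review-threshold best-score))
    :new-dispute

    ;; auto-merge needs both the policy flag and the higher band
    (and auto-merge-enabled? (>= best-score auto-merge-threshold))
    :auto-merge

    ;; everything else becomes operator work
    :else
    :pending-review))
```

&lt;p&gt;With auto-merge disabled, even a score of 0.95 falls through to &lt;code&gt;:pending-review&lt;/code&gt;. In this sketch the score alone never attaches anything.&lt;/p&gt;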

&lt;p&gt;What surprised me is that pending correlation became a domain object, not a UI convenience.&lt;/p&gt;

&lt;p&gt;The queue needed an id, a tenant id, an exception id, a target dispute id, a score, a rationale, a status, a decided-by user, and decision timestamps. That is a lot of structure for something that could have been a modal row.&lt;/p&gt;

&lt;p&gt;But the moment an operator accepts or rejects a candidate, that decision becomes part of the case history. A rejected match is useful evidence. It says someone looked at the overlap and decided the exception did not belong there. If the same upstream source sends a related item later, the prior rejection explains why the system did not combine the cases earlier.&lt;/p&gt;

&lt;p&gt;That is why correlation records live next to exceptions and disputes instead of inside a transient UI response.&lt;/p&gt;
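
&lt;p&gt;As a rough illustration of that record shape, here is a sketch of a pending correlation and the transition an operator decision applies to it. The key names, the status keywords, and both function names are assumptions; the real schema may differ.&lt;/p&gt;

```clojure
(defn pending-correlation
  "Builds a pending correlation record. Key names are hypothetical."
  [{:keys [tenant-id exception-id target-dispute-id score rationale]}]
  {:id (str (java.util.UUID/randomUUID))
   :tenant-id tenant-id
   :exception-id exception-id
   :target-dispute-id target-dispute-id
   :score score
   :rationale rationale
   :status :pending
   :decided-by nil
   :decided-at nil})

(defn decide
  "Records an accept or reject decision on a pending correlation.
   The record is kept either way, so a rejection stays in the case history."
  [correlation decision user-id now]
  {:pre [(contains? #{:accepted :rejected} decision)
         (= :pending (:status correlation))]}
  (assoc correlation
         :status decision
         :decided-by user-id
         :decided-at now))
```

&lt;p&gt;The precondition refuses to decide the same record twice, so the first decision is the one that stays on file.&lt;/p&gt;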

&lt;h2&gt;
  
  
  The Failure Mode Hidden In Good Matches
&lt;/h2&gt;

&lt;p&gt;The most dangerous false match is not a ridiculous one.&lt;/p&gt;

&lt;p&gt;It is the match that looks reasonable.&lt;/p&gt;

&lt;p&gt;Same counterparty. Same currency. Amount within 10 percent. Observed inside three days. Category is billing. If those signals point to the wrong open dispute, the system does not look broken. It looks efficient.&lt;/p&gt;

&lt;p&gt;The damage appears later. A Workflow Engine playbook starts against the wrong case. A Notification Hub event tells an operator that the dispute is ready for resolution. The audit PDF now contains an exception that belongs somewhere else. Nobody sees the root mistake because every downstream artifact is internally consistent.&lt;/p&gt;

&lt;p&gt;That is the kind of bug that worries me more than a 500 response.&lt;/p&gt;

&lt;p&gt;A 500 stops the flow. A wrong attachment keeps moving.&lt;/p&gt;

&lt;p&gt;The design answer was to make confidence create a decision, not mutate the dispute. A review-band candidate becomes work for an operator. The UI exposes accept and reject actions. The API carries the same boundary. The audit log records correlation creation and later decisions.&lt;/p&gt;

&lt;p&gt;The Workbench still supports auto-merge, but it is a policy choice. It has to be enabled. The score has to clear the higher band. The code does not pretend the existence of a scoring function means the business has accepted the risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tenant Scope Belongs Inside The Algorithm
&lt;/h2&gt;

&lt;p&gt;Tenant isolation is usually discussed at the HTTP layer. An API key comes in, a tenant id gets attached to the request, and handlers filter queries.&lt;/p&gt;

&lt;p&gt;That is necessary, but it is not enough here.&lt;/p&gt;

&lt;p&gt;Correlation is a cross-entity operation by nature. It compares a new exception against many existing disputes and attached exceptions. If the algorithm accepts a list that accidentally contains another tenant's disputes, the HTTP layer is already too far away to save it.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;score-candidate&lt;/code&gt; checks tenant equality itself. &lt;code&gt;score-candidates&lt;/code&gt; filters candidates through &lt;code&gt;candidate-eligible?&lt;/code&gt;, which repeats tenant, status, counterparty, and time-window checks before scoring.&lt;/p&gt;

&lt;p&gt;This is defensive duplication with a purpose. The route should pass tenant-scoped collections. The domain should still reject anything outside the tenant boundary. In a single-tenant fixture, this looks redundant. In a two-tenant test, it is the difference between "the route behaved" and "the invariant held."&lt;/p&gt;
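
&lt;p&gt;A small sketch of that duplication, with the check made once per candidate and again over the collection. The names &lt;code&gt;same-tenant?&lt;/code&gt;, &lt;code&gt;guarded-score&lt;/code&gt;, and &lt;code&gt;guarded-scores&lt;/code&gt; and the &lt;code&gt;:tenant-id&lt;/code&gt; key are hypothetical stand-ins, not the functions named above.&lt;/p&gt;

```clojure
(defn same-tenant?
  "True only when both records carry the same non-nil tenant id."
  [exception dispute]
  (let [tenant (:tenant-id exception)]
    (and (some? tenant) (= tenant (:tenant-id dispute)))))

(defn guarded-score
  "Returns nil instead of a score when the dispute belongs to another tenant."
  [score-fn exception dispute]
  (when (same-tenant? exception dispute)
    (score-fn exception dispute)))

(defn guarded-scores
  "Filters by tenant, then scores through guarded-score, which checks again."
  [score-fn exception disputes]
  (->> disputes
       (filter #(same-tenant? exception %))
       (keep (fn [d]
               (when-let [s (guarded-score score-fn exception d)]
                 (assoc d :score s))))
       vec))
```

&lt;p&gt;A two-tenant check against this sketch passes a deliberately mixed collection and asserts that only the caller's tenant comes back. That simulates a route that forgot to scope its query.&lt;/p&gt;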

&lt;p&gt;The same philosophy appears in reports. The audit PDF renderer captures a tenant snapshot and renders with strict token lookup, and the setup check renders two tenants to make sure one tenant's identity literals never appear in the other tenant's output.&lt;/p&gt;

&lt;p&gt;The theme is the same: do not trust a boundary because a previous layer probably handled it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;The finished local build passed 160 tests with 869 assertions. The full-flow E2E drives invoice adapter polling into exception creation, assignment, investigation, Workflow Engine resolution polling, Notification Hub event capture, and audit PDF generation.&lt;/p&gt;

&lt;p&gt;The number I care about most is smaller: the 50-dispute cap enforced by the dashboard guard. The guard caught a practical operations failure. The first dashboard shape rendered every dispute link in the tenant fixture. The fix capped the overview at 50 open disputes and kept totals intact.&lt;/p&gt;
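
&lt;p&gt;A sketch of what such a cap might look like, assuming disputes are maps with a &lt;code&gt;:status&lt;/code&gt; key and that &lt;code&gt;:open&lt;/code&gt; marks an open dispute. The function and key names are illustrative only.&lt;/p&gt;

```clojure
(def overview-limit 50)

(defn dashboard-overview
  "Counts every open dispute but lists at most overview-limit of them."
  [disputes]
  (let [open (filterv #(= :open (:status %)) disputes)]
    {:total-open (count open)
     :disputes (vec (take overview-limit open))}))
```

&lt;p&gt;The total is computed before the truncation, so the count stays accurate while the rendered list stays bounded.&lt;/p&gt;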

&lt;p&gt;That is the Workbench in miniature. Preserve the facts. Limit the surface. Make the operator decide when the machine only has a probability.&lt;/p&gt;

&lt;p&gt;Confidence is useful. Ownership is a human or policy decision.&lt;/p&gt;

</description>
      <category>clojure</category>
      <category>correlation</category>
      <category>financeoperations</category>
      <category>tenantisolation</category>
    </item>
  </channel>
</rss>
