<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: J.S_Falcon</title>
    <description>The latest articles on DEV Community by J.S_Falcon (@_d3709cf9e80fc6babbff).</description>
    <link>https://dev.to/_d3709cf9e80fc6babbff</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898219%2F048d871b-c38b-4948-87a5-cc7602c5b123.webp</url>
      <title>DEV Community: J.S_Falcon</title>
      <link>https://dev.to/_d3709cf9e80fc6babbff</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_d3709cf9e80fc6babbff"/>
    <language>en</language>
    <item>
      <title>"Beating 250,000 Mental Comparisons: A Cross-Domain Engineer's Entity Resolution Case Study"</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Sun, 26 Apr 2026 08:41:34 +0000</pubDate>
      <link>https://dev.to/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-3j1b</link>
      <guid>https://dev.to/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-3j1b</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Operations/Systems engineer recently moved to the software side via AI collaboration.&lt;/li&gt;
&lt;li&gt;Built a domain-specific entity resolution tool in a handful of evening sessions with Claude Code.&lt;/li&gt;
&lt;li&gt;Caught about 99.2% of human-detected reconciliation errors when replayed against 8 weeks of historical data.&lt;/li&gt;
&lt;li&gt;Turned a "skilled-veterans-only" weekly task into something anyone on the team can run.&lt;/li&gt;
&lt;li&gt;The design turned out to map unexpectedly well onto dual process theory, Gestalt psychology, and anchoring-bias defense.&lt;/li&gt;
&lt;li&gt;Source business records never reached an LLM. Deterministic pipeline + human review only.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. The Hidden Problem: When 500 × 500 Becomes a Cognitive Wall
&lt;/h2&gt;

&lt;p&gt;Many companies maintain the same business entities across multiple systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A retailer tracks SKUs in an internal master AND on Amazon / Rakuten / Shopify exports.&lt;/li&gt;
&lt;li&gt;A clinic carries patient records in both an EMR and an insurance billing system.&lt;/li&gt;
&lt;li&gt;A manufacturer holds internal inventory but also receives partner inventory feeds.&lt;/li&gt;
&lt;li&gt;An accounting team reconciles general ledger entries against bank statements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These pairs need periodic reconciliation. In the technical literature this is &lt;strong&gt;Entity Resolution&lt;/strong&gt; or &lt;strong&gt;Data Reconciliation&lt;/strong&gt; — a universal problem that nearly every mid-to-large business hits eventually.&lt;/p&gt;

&lt;p&gt;The case study here uses the &lt;strong&gt;retail SKU vs marketplace listing&lt;/strong&gt; framing. (The actual industry I work in is intentionally abstracted, but the structure transfers cleanly.) Two systems, ~500 rows each, weekly reconciliation. Skilled humans needed about 3 hours per week. Newcomers, half a day to a full day. Hidden detail: the small row count masks the real difficulty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is 500 × 500 hard?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The 250,000 problem
&lt;/h4&gt;

&lt;p&gt;Manually reconciling 500 × 500 pairs forces a person to evaluate up to &lt;strong&gt;250,000 combinations&lt;/strong&gt; in their head. Not 1,000 — 250,000. Plus typo tolerance, format variation (full-width vs half-width, mixed scripts, abbreviations, punctuation), and partial matches. Each pairwise judgment is not O(1).&lt;/p&gt;

&lt;p&gt;Brute-forcing this is the difference between a 1,000-node full-mesh ping check and a flat 1,000-node liveness check: O(n²) versus O(n), orders of magnitude more load.&lt;/p&gt;
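
&lt;p&gt;As a sketch of that blow-up (and of the standard escape hatch), the naive pair count can be compared against blocking on a cheap hard-gate key. The field names and the ten-region split below are invented for illustration:&lt;/p&gt;

```python
# Sketch: why 500 x 500 is a different beast from 500 + 500.
# Naive reconciliation compares every row in A against every row in B;
# a cheap hard gate (here: grouping by an illustrative region key)
# shrinks the candidate set before any expensive similarity work.
from collections import defaultdict

rows_a = [{"id": i, "region": f"R{i % 10}"} for i in range(500)]
rows_b = [{"id": i, "region": f"R{i % 10}"} for i in range(500)]

naive_pairs = len(rows_a) * len(rows_b)  # 250,000 candidate judgments

# Block on the hard-gate key first, then only compare within a block.
by_region = defaultdict(list)
for b in rows_b:
    by_region[b["region"]].append(b)

blocked_pairs = sum(len(by_region[a["region"]]) for a in rows_a)  # 25,000
```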

&lt;h4&gt;
  
  
  Working memory overflow
&lt;/h4&gt;

&lt;p&gt;Miller's "magical number" puts our short-term memory at 7 ± 2 chunks (Miller, 1956). Hunting matches across 1,000 candidates with format drift continuously overflows working memory and pegs System 2 (slow thinking) for the entire session. The 3-hour exhaustion experienced by veterans isn't a complaint — it's a neurological inevitability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Short to do" doesn't equal "easy to do"&lt;/strong&gt; for cognitive labor.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reproducibility decay
&lt;/h4&gt;

&lt;p&gt;A one-off reconciliation can be brute-forced. But when the task repeats weekly across 10+ weeks, judgment drift becomes unavoidable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Last week I matched 'A Co.' and 'A. Company' as the same entity. This week I treated them as different."&lt;/li&gt;
&lt;li&gt;"Last week I tolerated typo X. This week I rejected it."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This drift is what really breaks data quality long-term. It's the same structural failure mode as "config review standards differ by reviewer" in infrastructure operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The actual target
&lt;/h3&gt;

&lt;p&gt;So the real problem the tool solved was not "shorten 3 hours per week" but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;250,000 judgments × 10 weeks of consistent reproducibility — a quality bar humans can't physically sustain — backed by a deterministic machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Plus removing the skill dependency. "Only one veteran can do this in 3 hours" is a single point of failure. After the tool: anyone could run it with consistent quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Background: Who I Am and What I Was Solving
&lt;/h2&gt;

&lt;p&gt;I'm an Operations/Systems engineer. Configuration, validation, runbook authoring, monitoring, troubleshooting — that side of the house. Software development was not my primary craft, though scripting was always part of the job.&lt;/p&gt;

&lt;p&gt;I'd recently moved into a new business domain (about 2 months in), and the system this tool targets was one I'd been touching for only ~1 month. From the user side I'd seen the workflow longer, but not as a developer.&lt;/p&gt;

&lt;p&gt;Translation: design / validation / runbook discipline solid. Python and application development essentially unfamiliar.&lt;/p&gt;

&lt;p&gt;This article is &lt;strong&gt;not a "look what I shipped" piece&lt;/strong&gt;. It's a record of how operations-side disciplines transferred unchanged into AI-assisted software work in an unfamiliar domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who this article is for
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reader&lt;/th&gt;
&lt;th&gt;Useful sections&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operations / SRE engineers exploring AI assistance&lt;/td&gt;
&lt;td&gt;Everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-career engineers moving across technical domains&lt;/td&gt;
&lt;td&gt;Background, Architecture, Cognitive Design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineers new to AI-assisted development&lt;/td&gt;
&lt;td&gt;Architecture, Cognitive Design, PII&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managers thinking about AI for their teams&lt;/td&gt;
&lt;td&gt;Results and the cognitive-load argument&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. PII / Compliance Considerations
&lt;/h2&gt;

&lt;p&gt;A question that always comes up in comments on entity-resolution articles: &lt;strong&gt;where does the data go?&lt;/strong&gt; Worth answering up front.&lt;/p&gt;

&lt;p&gt;In this implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source business records never reach any LLM.&lt;/strong&gt; Both input files (internal master + external system export) are read locally by a Python script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matching is fully deterministic.&lt;/strong&gt; Pandas, openpyxl, and &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for similarity. No embedding API. No remote inference at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The LLM's role is code-side, not data-side.&lt;/strong&gt; Claude Code helped write the matching logic, the validation scripts, the design review, and the documentation. None of the actual records were ever sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For testing only&lt;/strong&gt;, masked synthetic data was used in prompts. Real names, amounts, and addresses were replaced with synthetic equivalents before any prompt left the local environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases stay with humans.&lt;/strong&gt; When the deterministic pipeline can't decide, it surfaces a flagged row for human review — not for LLM second opinion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation is intentional. The matching task is well-suited to deterministic logic. LLMs would only add cost, latency, and compliance exposure for no quality gain.&lt;/p&gt;

&lt;p&gt;If your team has even a soft "no business data into external AI" policy, this pattern is fully compatible.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Architecture: Two-Stage Matching + Cognitive Gates
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;pandas + openpyxl (Excel I/O, color-coded output)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for fuzzy similarity&lt;/li&gt;
&lt;li&gt;Rule-based throughout. No machine learning.&lt;/li&gt;
&lt;li&gt;~1,100 lines, single script.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phases
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Match by exact stakeholder name (or alias group)
Phase 2: Cross-match by name similarity ≥ 0.6 (rescue typos)
Phase 3: Last-name-only + structural match (single-typo tolerance)
Phase 4: Duplicate-registration detection (same stakeholder + similarity ≥ 0.8)
Phase 5: Rescue rows with no stakeholder name (attribute match)
Phase 5.5: Attribute-mismatch pair rescue (identifier similarity ≥ 0.7, stage 2)
Phase 6: Row generation + color decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
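
&lt;p&gt;A minimal sketch of the cascade shape, assuming hypothetical phase functions and row fields: each phase only sees what earlier phases left unmatched, so cheap exact matching runs before fuzzy rescue.&lt;/p&gt;

```python
# Sketch of the phase cascade: each phase only sees rows that earlier
# phases failed to match. Phase functions and row shapes are
# illustrative stand-ins, not the article's production code.
def phase_exact(rows):
    return [r for r in rows if r["name"] == r["candidate"]]

def phase_fuzzy(rows):
    # Toy "rescue" rule: substring containment stands in for similarity.
    return [r for r in rows if r["name"] in r["candidate"]]

def run_phases(rows, phases):
    matched, remaining = [], list(rows)
    for phase in phases:
        hits = phase(remaining)
        matched.extend(hits)
        remaining = [r for r in remaining if r not in hits]
    return matched, remaining

rows = [
    {"name": "A Co.", "candidate": "A Co."},            # exact hit
    {"name": "iPhone15", "candidate": "iPhone15 Pro"},  # fuzzy rescue
    {"name": "B Ltd.", "candidate": "C Inc."},          # unmatched
]
matched, unmatched = run_phases(rows, [phase_exact, phase_fuzzy])
```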



&lt;h3&gt;
  
  
  The score function (key gates)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: region must match — kills cross-region false positives
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;region_a&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;region_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: numeric attribute must be close enough
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value_a&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;value_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Identifier gate: row_b's identifier must be embeddable in row_a's identifier
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Sub-identifier gate: anchoring-bias defense
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sub_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Soft scoring (only after every hard gate passed)
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;identifier_match_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value_fallback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this shape?
&lt;/h3&gt;

&lt;p&gt;The retail SKU framing helps here. The same product on a marketplace might appear as &lt;code&gt;iPhone15&lt;/code&gt; in your master and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; on the marketplace. Same item family, different surface form. Two key insights:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hard gates first.&lt;/strong&gt; "Different region" or "value difference &amp;gt; N" are absolute disqualifiers. Run them before any expensive similarity computation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Soft scoring last.&lt;/strong&gt; Once the hard gates pass, compute similarity; anything below 0.6 is treated as "no confident match" and the row surfaces to a human as unmatched.&lt;/li&gt;
&lt;/ol&gt;
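
&lt;p&gt;A runnable sketch of that gate-then-score shape, with invented fields and an illustrative THRESHOLD; only the 0.6 floor is taken from the text:&lt;/p&gt;

```python
# Runnable sketch of the gate-then-score shape. The article's real gates
# (region, value threshold, identifiers) are domain-specific; fields and
# THRESHOLD here are invented, the 0.6 floor mirrors the text.
from difflib import SequenceMatcher

THRESHOLD = 1000  # illustrative numeric-attribute tolerance

def compute_score(row_a, row_b):
    # Hard gates: absolute disqualifiers, cheapest first.
    if row_a["region"] != row_b["region"]:
        return 0.0
    if abs(row_a["value"] - row_b["value"]) > THRESHOLD:
        return 0.0
    # Soft scoring only after every hard gate passed.
    score = SequenceMatcher(None, row_a["name"], row_b["name"]).ratio()
    return score if score >= 0.6 else 0.0

a = {"region": "east", "value": 34900, "name": "iPhone15"}
b = {"region": "east", "value": 34900, "name": "iPhone 15 Pro"}
c = {"region": "west", "value": 34900, "name": "iPhone15"}

compute_score(a, b)  # clears the gates, similarity decides
compute_score(a, c)  # region gate fires: 0.0
```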

&lt;h3&gt;
  
  
  Why not ML / Vector DB / embeddings?
&lt;/h3&gt;

&lt;p&gt;Deterministic rule-based was chosen on purpose. Auditability was the requirement. When a flagged row is wrong, the operations team has to be able to trace exactly which gate fired and why. A black-box similarity score of 0.81 with no explanation cannot be reviewed, cannot be unit-tested, and cannot be defended in a compliance audit.&lt;/p&gt;

&lt;p&gt;ML is a fine choice when you have labeled training data, training infrastructure, and a continuous evaluation pipeline. None of these applied here. The operating constraint was: "anyone on the team should be able to read the code and know why it decided what it decided." That constraint forces deterministic logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Abstracted structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain-specific term&lt;/th&gt;
&lt;th&gt;Abstract concept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item / SKU&lt;/td&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stakeholder (vendor / agent)&lt;/td&gt;
&lt;td&gt;Stakeholder attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price / Amount&lt;/td&gt;
&lt;td&gt;Primary numeric attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address / Location&lt;/td&gt;
&lt;td&gt;Identifier (multi-attribute)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building / SKU name&lt;/td&gt;
&lt;td&gt;Auxiliary identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detail number / barcode&lt;/td&gt;
&lt;td&gt;Sub-identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format variation (kana/latin/case)&lt;/td&gt;
&lt;td&gt;Data quality issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain judgment&lt;/td&gt;
&lt;td&gt;Tacit knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a universal "match entities across two systems with format drift" problem. The pattern reappears in EC, healthcare, HR, accounting, manufacturing, publishing — anywhere two systems represent the same business object differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Cognitive-Science Design Principles (the Twist)
&lt;/h2&gt;

&lt;p&gt;I didn't design this thinking about cognitive science. I built it, it worked, and only afterwards in a structured Gemini conversation did the underlying principles surface. The retrofit fits unsettlingly well.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Dual process theory (Daniel Kahneman)
&lt;/h3&gt;

&lt;p&gt;The two phases map onto two thinking modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System 1 (fast) = Phases 1–5.&lt;/strong&gt; Fuzzy "is this roughly the same thing?" — similarity scores, identifier matching, attribute closeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System 2 (slow) = &lt;code&gt;determine_color()&lt;/code&gt;.&lt;/strong&gt; Strict checks for value mismatch, format inconsistency, identifier mixing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Color-coded human review gets the System 1 fuzzy pass plus the System 2 strictness annotation, which is exactly the input shape humans need to make a final call.&lt;/p&gt;
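
&lt;p&gt;A minimal sketch of what a System-2-style &lt;code&gt;determine_color()&lt;/code&gt; could look like; the tag names and color rules here are invented, not the production logic:&lt;/p&gt;

```python
# Sketch of a System-2-style determine_color: strict, explicit rules
# applied to an already-matched pair, reduced to a review color.
# Tag names and color thresholds are illustrative assumptions.
def determine_color(tags):
    if "Value mismatch" in tags:
        return "red"     # needs a human decision
    if tags:
        return "yellow"  # matched, but review the noted drift
    return "green"       # clean match, no review needed

determine_color(["Value mismatch"])                  # 'red'
determine_color(["identifier format inconsistent"])  # 'yellow'
determine_color([])                                  # 'green'
```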

&lt;h3&gt;
  
  
  5.2 Gestalt psychology
&lt;/h3&gt;

&lt;p&gt;Humans recognize "wholes," not character sequences. &lt;code&gt;iPhone15&lt;/code&gt; and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; feel like the same product family even though strict string equality fails. So:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Recognize chunked identity even with mixed scripts and separators.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[A-Za-z0-9\s\-_]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Matching by chunks survives whitespace, separator, and script variation.&lt;/p&gt;
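
&lt;p&gt;For a runnable version, here is the same chunk idea restated under the separator-split reading, with invented example strings:&lt;/p&gt;

```python
# Runnable restatement of the chunk idea, under the separator-split
# reading: split the shorter identifier on whitespace/hyphen/underscore
# and require every chunk (length 2 or more) to appear in the longer
# address string. The separator set and examples are illustrative.
import re

def is_identifier_match(addr_a, identifier_b):
    chunks = re.split(r"[\s\-_]+", identifier_b)
    return all(chunk in addr_a for chunk in chunks if len(chunk) >= 2)

is_identifier_match("iPhone15 Pro Max 256GB", "iPhone15 Pro")  # True
is_identifier_match("iPhone15 Pro Max 256GB", "Galaxy S24")    # False
```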

&lt;h3&gt;
  
  
  5.3 Anchoring &amp;amp; confirmation bias defenses
&lt;/h3&gt;

&lt;p&gt;Hard gates exist to deny human-style intuitive shortcuts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Same price, must be the same item" — rejected by sub-identifier gate.&lt;/li&gt;
&lt;li&gt;"Same name, must be the same person" — rejected by region gate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The machine's job is to be coldly skeptical exactly where humans get over-confident.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Reducing human cognitive load (Human-in-the-Loop)
&lt;/h3&gt;

&lt;p&gt;When a human is asked to confirm a flagged row, they don't get an opaque "match score 0.62". They get a one-line annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same entity matched | [Value mismatch] diff ¥2,000,000 (5.4%)
(A: ¥34,900,000 / B: ¥36,900,000) · identifier format inconsistent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human doesn't waste cycles re-deriving why the row was flagged. Cognitive load drops sharply.&lt;/p&gt;
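
&lt;p&gt;A sketch of how such an annotation line can be assembled; the field names and wording are illustrative, not the tool's exact format:&lt;/p&gt;

```python
# Sketch: render a flagged row as one human-readable annotation line
# instead of a bare score. Field names and phrasing are illustrative.
def annotate(value_a, value_b, tags):
    diff = abs(value_a - value_b)
    pct = 100.0 * diff / max(value_a, value_b)
    parts = [f"[{tag}]" for tag in tags]
    return (
        f"Same entity matched | {' '.join(parts)} "
        f"diff {diff:,} ({pct:.1f}%) (A: {value_a:,} / B: {value_b:,})"
    )

annotate(34_900_000, 36_900_000, ["Value mismatch"])
# produces a one-line style like the sample above
```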

&lt;h3&gt;
  
  
  5.5 Don't automate the ghost
&lt;/h3&gt;

&lt;p&gt;This part borrows from &lt;em&gt;Ghost in the Shell&lt;/em&gt;. Some judgments depend on tacit business knowledge that can't be reduced to rules. Don't build heuristics that pretend to encode them. Surface the row as a &lt;strong&gt;caution signal&lt;/strong&gt; and let a human apply the tacit layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tightening the logic isn't a path to recreating the ghost.&lt;br&gt;
It's a path to revealing where the ghost is needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Mapping summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cognitive concept&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System 1 (fast)&lt;/td&gt;
&lt;td&gt;Phases 1–5 (fuzzy matching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System 2 (slow)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;determine_color()&lt;/code&gt; strict checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-stage / dual-pass&lt;/td&gt;
&lt;td&gt;Stage 1 + Stage 2 (Phase 5.5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gestalt grouping&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;similarity&lt;/code&gt; / &lt;code&gt;is_identifier_match&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anchoring defense&lt;/td&gt;
&lt;td&gt;Sub-identifier gate, identifier gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognitive load reduction&lt;/td&gt;
&lt;td&gt;Aggregated &lt;code&gt;[reason] diff X&lt;/code&gt; annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-Loop&lt;/td&gt;
&lt;td&gt;Caution signals for tacit-knowledge zones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Recall on 8 weeks of historical data
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Errors flagged by humans (excluding outlier weeks)&lt;/td&gt;
&lt;td&gt;~130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors caught by the tool&lt;/td&gt;
&lt;td&gt;~129&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~99.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single missed case was annotated by the human reviewer as "even a human couldn't decide here." In effect, the tool catches every case on which a human reviewer would commit a confident verdict.&lt;/p&gt;

&lt;p&gt;(Caveat: this is recall against 8 weeks of one team's data, not a benchmark claim. Different domains will need their own measurement.)&lt;/p&gt;
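
&lt;p&gt;For the record, the headline number is just the ratio from the table, using the article's approximate counts:&lt;/p&gt;

```python
# Recall as reported in the table: of ~130 human-flagged errors,
# the tool reproduced ~129. Counts are the article's approximations.
human_flagged = 130
tool_caught = 129
recall = tool_caught / human_flagged
round(recall * 100, 1)  # 99.2
```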

&lt;h3&gt;
  
  
  Time and skill load
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skilled veteran throughput&lt;/td&gt;
&lt;td&gt;~3 hrs/week&lt;/td&gt;
&lt;td&gt;~30 min/week (review only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcomer throughput&lt;/td&gt;
&lt;td&gt;half a day to full day&lt;/td&gt;
&lt;td&gt;~30 min/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill dependency&lt;/td&gt;
&lt;td&gt;Yes (single point of failure)&lt;/td&gt;
&lt;td&gt;No (anyone can run it)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The time number understates the value. The real shift is &lt;strong&gt;breaking the skill SPOF&lt;/strong&gt;. Veteran out sick, leaves, or buried in another priority — work continues at the same quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on false positives
&lt;/h3&gt;

&lt;p&gt;Recall is ~99.2%, but the tool is intentionally tuned to favor recall over precision. False positives (pairs flagged for human review that turn out to be fine) are accepted as the trade-off. The ~30 min/week of human review absorbs them without strain.&lt;/p&gt;

&lt;p&gt;In a no-human-in-the-loop deployment this trade-off would be very different. Here, false positives are cheap (a glance from a human reviewer) and false negatives (missed reconciliation errors) are expensive (data drift propagates into business reports).&lt;/p&gt;

&lt;h2&gt;
  
  
  7. The Flowchart
&lt;/h2&gt;

&lt;p&gt;Drawing the judgment flow as diagrams surfaced things the code review didn't. Below are the four phases as separate figures, in execution order.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 Phase 1: Hard Gates (sequential disqualifiers)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" alt=" " width="800" height="1143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Region → numeric value → auxiliary identifier → sub-identifier. Each gate is an absolute disqualifier: any "No" drops the pair. The order matters — cheapest disqualifiers run first.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.2 Phase 2: Soft Match
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" alt=" " width="542" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once a pair clears all hard gates, &lt;code&gt;compute_score&lt;/code&gt; evaluates a soft similarity. Below 0.6 → drop. At or above → lock the pair as the same entity.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.3 Phase 3: Parallel Flag Checks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For confirmed matches, six independent checks fire in parallel. Each surfaces a "this matched, but here's a discrepancy" signal. Tags are aggregated; there is no early-return contamination between checks.&lt;/p&gt;
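
&lt;p&gt;A sketch of the aggregation pattern, with two invented checks standing in for the six: every check always runs, and results are collected rather than returned early.&lt;/p&gt;

```python
# Sketch: independent flag checks aggregated into a tag list. Each check
# always runs; no check can short-circuit another. Check logic and pair
# fields are illustrative stand-ins for the article's six checks.
def check_value(pair):
    if abs(pair["value_a"] - pair["value_b"]) > 0:
        return "Value mismatch"
    return None

def check_format(pair):
    same_when_normalized = (
        pair["id_a"].replace(" ", "") == pair["id_b"].replace(" ", "")
    )
    if same_when_normalized and pair["id_a"] != pair["id_b"]:
        return "identifier format inconsistent"
    return None

CHECKS = [check_value, check_format]

def collect_tags(pair):
    # Aggregate every non-None result; never return early.
    return [tag for tag in (check(pair) for check in CHECKS) if tag]

pair = {"value_a": 34900000, "value_b": 36900000,
        "id_a": "A-1 2F", "id_b": "A-12F"}
collect_tags(pair)  # both checks fire independently
```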

&lt;h3&gt;
  
  
  7.4 Phase 4: Final Verdict and Drop Aggregation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aggregate the tags into a color verdict. Drops from Phase 1 and Phase 2 converge into the "Unmatched" lane, surfaced standalone in the human-review output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Things visible only after rendering as a diagram
&lt;/h3&gt;

&lt;p&gt;These were invisible while reading code, only obvious once drawn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1 hard gates are ordered by computational cost.&lt;/strong&gt; Region → numeric → auxiliary → sub-identifier. I placed them by intuition; the diagram showed they were already optimal — cheapest disqualifiers first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3 parallel flag checks are genuinely independent.&lt;/strong&gt; Six checks fire in parallel with no early-return contamination. The diagram confirmed there was no silent dependency between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All &lt;code&gt;Drop1&lt;/code&gt;–&lt;code&gt;Drop5&lt;/code&gt; paths converge to the same &lt;code&gt;Unmatched&lt;/code&gt; node.&lt;/strong&gt; I was throwing away the drop reason. Re-running "why was this pair rejected?" was impossible. Fix: log the drop reason in the row annotation.&lt;/li&gt;
&lt;/ol&gt;
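
&lt;p&gt;A sketch of the fix from point 3, with invented gate names: return the drop reason alongside the score, so the Unmatched lane can explain itself.&lt;/p&gt;

```python
# Sketch: return the drop reason alongside the score so "why was this
# pair rejected?" stays answerable later. Gate names and fields are
# illustrative; real code would soft-score once the gates pass.
def compute_score_with_reason(row_a, row_b):
    if row_a["region"] != row_b["region"]:
        return 0.0, "region gate"
    if abs(row_a["value"] - row_b["value"]) > 1000:
        return 0.0, "value gate"
    return 1.0, None  # gates passed

score, reason = compute_score_with_reason(
    {"region": "east", "value": 100}, {"region": "west", "value": 100}
)
# reason ("region gate") travels into the Unmatched row annotation
```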

&lt;p&gt;Drawing the flowchart is roughly the same act as drawing an infrastructure topology before going live. The diagram is the rubber duck.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Wrap-up
&lt;/h2&gt;

&lt;p&gt;Three transferable lessons from this build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive load is the hidden cost&lt;/strong&gt; of "short" repetitive judgment tasks. Headcount-hour math undersells the burnout reality and skill-SPOF risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive science principles fall out of good design retroactively.&lt;/strong&gt; I didn't design with them in mind; the principles became visible only through structured review (with a second AI). If your design retrofits to known principles, that's confirmation. If it doesn't, that's a smell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLMs do NOT have to touch your data.&lt;/strong&gt; Most entity resolution work doesn't need them at all. Use them for code, design review, and documentation. Keep the business records local and deterministic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation itself is internal-use only and won't be open-sourced. The patterns generalize cleanly to any two-system entity reconciliation: EC, healthcare, HR, accounting, manufacturing, publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coming in Part 2&lt;/strong&gt;: how this whole thing got built in the first place — the AI collaboration patterns, the anti-patterns I hit, and the cross-domain disciplines that transferred from operations to software development. (Link to A2 once published.)&lt;/p&gt;

&lt;p&gt;Comments on entity resolution, cognitive load in repetitive tasks, or cross-domain engineering experiences are welcome.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
