From Fuzzy Matching to Evidence Capsules: Building an Explainable Sanctions Screening Engine

Sanctions screening looks simple from the outside.

Take a name, compare it against a list, score the similarity, send anything above a threshold to review.

That was how I thought about it before I started building Verifex.

The reality is different.

The problem nobody talks about

A compliance reviewer does not just need to know that two names are similar. They need to understand why a match was created, what evidence supports it, what weakens it, and whether the decision holds up during an audit six months later.

A score alone does not answer any of those questions.

When the engine returns 0.92, the reviewer is still left asking: was that the surname? The alias? The date of birth? The country? The source list?

Without that breakdown, every review is manual reconstruction from scratch.

What fuzzy matching misses

Fuzzy string matching works fine for clean data.

John Smith vs John Smith -- no problem.
ACME Ltd vs ACME Limited -- no problem.
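
For clean data like that, even a generic similarity function is enough. A minimal illustration using Python's difflib, just to show the baseline (this is not the matching logic Verifex uses):

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Normalized character-level similarity on lowercased names
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("John Smith", "John Smith"))    # 1.0
print(similarity("ACME Ltd", "ACME Limited"))    # roughly 0.8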

But real sanctions data is messier than that.

Names get reordered. Transliteration varies across source lists. Some entries have aliases, some do not. Dates of birth are missing or partial. Nationalities are stored inconsistently. Common names create noise. Some lists store names as SURNAME, Given Patronymic and a naive parser flips them.

That last one caused a real bug in early versions of the engine. The parser was treating PUTIN as a given name because it appeared before the comma. The match score dropped even though the match was obvious to any human reviewer.

A single final score would have only told me something was wrong. The evidence breakdown told me exactly where.
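
The fix itself is not complicated. A rough sketch of the kind of normalization that handles the comma format, with the caveat that real list entries need far more handling than this (multiple commas, suffixes, transliteration variants):

def normalize_name(raw: str) -> list[str]:
    # Handle 'SURNAME, Given Patronymic' by moving the surname to the end
    raw = raw.strip()
    if "," in raw:
        surname, rest = raw.split(",", 1)
        tokens = rest.split() + [surname.strip()]
    else:
        tokens = raw.split()
    return [t.upper() for t in tokens]

print(normalize_name("PUTIN, Vladimir Vladimirovich"))
# ['VLADIMIR', 'VLADIMIROVICH', 'PUTIN']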

Evidence Capsules

The idea I have been building around is simple.

Instead of returning only a score, the engine produces a structured evidence object for every candidate match. I call this an Evidence Capsule.

Each capsule contains:

  • the query name and the candidate name
  • source list information
  • token-level name comparison
  • date of birth signal
  • country and nationality signal
  • identifier signals
  • a list of supporting evidence
  • a list of weakening evidence
  • reason codes
  • audit warnings

The goal is not to replace the reviewer. The goal is to give the reviewer a structured explanation so they are not starting from zero every time.
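
In code, a capsule is just a structured record. A rough sketch of its shape, with the caveat that these field names are illustrative rather than the exact Verifex schema:

from dataclasses import dataclass, field

@dataclass
class EvidenceCapsule:
    query_name: str
    candidate_name: str
    source_list: str
    token_comparison: dict[str, float]    # per-token name similarity
    dob_signal: str | None                # e.g. "exact", "year_only", "missing"
    country_signal: str | None
    identifier_signals: list[str]
    supporting_evidence: list[str]        # what strengthens the match
    weakening_evidence: list[str]         # what weakens it
    reason_codes: list[str]
    audit_warnings: list[str] = field(default_factory=list)
    score: float = 0.0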

Scoring as evidence weighting

Fuzzy matching produces a similarity score.

What I wanted was something closer to evidence-weighted reasoning.

The internal model follows a log-odds structure:

log_odds = prior_log_odds + sum(evidence_weights)
posterior = sigmoid(log_odds)

Each signal contributes independently. An exact surname match increases the score. An exact date of birth increases it strongly. A country mismatch pulls it down. A match based only on a common given name gets penalized. Missing context is recorded explicitly rather than ignored.
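
A minimal sketch of that structure. The prior and the weights here are invented for illustration; in practice they would be tuned against labeled outcomes:

import math

WEIGHTS = {
    "surname_exact":       2.0,   # illustrative values only
    "dob_exact":           3.0,
    "country_mismatch":   -1.5,
    "common_given_name":  -1.0,
    "dob_missing":         0.0,   # recorded explicitly, contributes nothing
}

PRIOR_LOG_ODDS = -4.0  # most candidates are not true matches

def score(evidence: list[str]) -> float:
    # Sum independent evidence weights in log-odds space, then squash
    log_odds = PRIOR_LOG_ODDS + sum(WEIGHTS.get(e, 0.0) for e in evidence)
    return 1.0 / (1.0 + math.exp(-log_odds))

print(score(["surname_exact", "dob_exact"]))          # pulled up strongly
print(score(["surname_exact", "country_mismatch"]))   # pulled back down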

This is not the same as saying the output is a calibrated probability. That distinction matters.

Why calibration matters

If the engine outputs 0.90, that does not automatically mean the result is 90% likely to be a true match. To know that, you need calibration data.

The measurement layer I added tracks:

  • Brier Score
  • Expected Calibration Error
  • Reliability curves
  • Threshold sweeps across source families

These answer the practical questions. When the engine says 0.9, how often is it right? Which source family is overconfident? At what point does lowering the threshold increase review burden without catching any more true matches?
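
Brier score and a binned expected calibration error are both straightforward to compute once you have scored pairs with labeled outcomes. A sketch, assuming probs and labels are parallel arrays (the exact binning Verifex uses may differ):

import numpy as np

def brier_score(probs, labels):
    # Mean squared error between predicted probabilities and 0/1 outcomes
    return float(np.mean((np.asarray(probs) - np.asarray(labels)) ** 2))

def expected_calibration_error(probs, labels, bins=10):
    # Weighted gap between average confidence and observed hit rate per bin
    probs, labels = np.asarray(probs), np.asarray(labels)
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for i in range(bins):
        lo, hi = edges[i], edges[i + 1]
        mask = (probs >= lo) & ((probs <= hi) if i == bins - 1 else (probs < hi))
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return float(ece)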

Compliance systems should not hide behind vague scores. They need measurable behavior.

What this does not claim

This is not a claim that the engine has zero false negatives.

It is not a claim that human review is unnecessary.

The current goal is more limited and more honest: build a screening engine that can explain its own reasoning, persist that reasoning for audit, and measure whether its scores reflect reality.

A proper benchmark against labeled outcomes is still in progress.

Why this direction matters

The hard part of sanctions screening is rarely finding a possible match. The hard part is explaining why it was escalated, cleared, or reviewed, in a way that holds up later.

That is the shift I think compliance infrastructure needs:

from fuzzy scores to structured evidence to defensible review workflows.

That is what I am building with Verifex.
