NexGenData

Posted on Jun 24 • Originally published at thenextgennexus.com

Sanctions Data Tools for Due Diligence and Risk Research


Sanctions screening sits at the operational core of every AML program, KYC onboarding workflow, and vendor risk file, yet the underlying primary-source data is scattered across four government publishers, each with its own format, refresh cadence, and quirks. This post walks compliance teams, financial-crime investigators, and export-control analysts through a structured-data approach to OFAC SDN, UN consolidated, EU consolidated, and UK HMT financial sanctions lists, using a set of dedicated Apify actors now in early access.

1. The problem: four lists, four formats, no shared schema

A mid-market financial institution screens, at a minimum, against the US Treasury OFAC SDN list, the OFAC non-SDN Consolidated Sanctions List (SSI under Directives 1–4 of E.O. 13662, the FSE list, and the NS-PLC list), the UN Security Council Consolidated List, the EU Consolidated Financial Sanctions list, and the UK HM Treasury OFSI Consolidated List. That is five reference universes before program-specific obligations such as CAATSA, the Cuba program, or the Russia/Belarus directives come into play.

Each publisher ships the data in a different shape. OFAC offers SDN.XML, SDN_ENHANCED.XML, legacy delimited (DEL) and fixed-field (FF) files, and a CSV that drops secondary identifiers. The UN publishes one XML with nested aka groups. The EU exposes a token-gated XML and rotates schema roughly twice a year. UK HMT publishes a CSV plus a PDF "list of changes", and the CSV column order has shifted twice in 24 months. None of these sources share a common entity ID, alias normalization rule, address format, or program code taxonomy.

The result: most compliance teams either pay a screening vendor a six-figure annual fee for a re-bundled version of these public lists with name-matching bolted on, or maintain a brittle in-house pipeline of scheduled downloads and parsers that breaks every time a publisher rotates a schema. The vendor route hides the source of truth; the DIY route burns engineering cycles that should go to detection logic.

2. Why structured sanctions data matters in practice

Clean, structured sanctions data is the substrate on which a defensible AML program rests. The downstream use cases all assume the underlying list is current, normalized, and queryable:

KYC onboarding screening — every new customer must be screened against the relevant sanctions universe before the relationship is opened. OFAC civil liability is strict: a US person who processes a transaction for an SDN faces civil penalty exposure regardless of intent.
Payment screening — wire transfers, ACH batches, and correspondent banking flows are screened against SDN, EU, and UN lists in real time. Sub-second screening latency requires the list to be in-memory and pre-indexed.
Vendor due-diligence — procurement and TPRM teams screen suppliers, their officers, and their ultimate beneficial owners against the same lists, often extending into the OFAC 50% rule analysis.
Periodic refresh cycles — most programs re-screen the entire customer book against an updated sanctions snapshot weekly or monthly. A delta-only refresh requires a stable entity-key strategy across snapshots.
Export-control screening — BIS Entity List, OFAC SDN, and EU dual-use restrictions all converge in the export-licensing workflow. Freight forwarders and ITAR-regulated manufacturers run combined screens before issuing a shipping document.
Secondary sanctions exposure mapping — non-US persons trading with a CAATSA-designated entity can themselves face US secondary sanctions. Mapping this exposure across a counterparty graph requires the full SDN dataset, not a name-only feed.
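The delta-only refresh in the periodic-refresh item comes down to a set comparison between two snapshots. A minimal sketch, assuming each snapshot is a dict keyed by a hypothetical stable key (source list plus the publisher's own reference, e.g. `OFAC:<uid>`) and that each record carries a `last_updated` value; the references in the example are placeholders.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def diff_snapshots(old: Dict[str, Record], new: Dict[str, Record]) -> Dict[str, List[str]]:
    """Compare two snapshots keyed by a stable entity key such as "OFAC:<uid>"."""
    added = sorted(new.keys() - old.keys())    # newly listed parties
    removed = sorted(old.keys() - new.keys())  # delisted parties
    amended = sorted(                          # same key, different last_updated value
        key for key in new.keys() & old.keys()
        if new[key].get("last_updated") != old[key].get("last_updated")
    )
    return {"added": added, "removed": removed, "amended": amended}


old = {"OFAC:1": {"last_updated": "2024-01-01"}, "UN:REF-1": {"last_updated": "2023-05-01"}}
new = {"OFAC:1": {"last_updated": "2024-03-01"}, "EU:REF-2": {"last_updated": "2024-03-02"}}
print(diff_snapshots(old, new))
# {'added': ['EU:REF-2'], 'removed': ['UN:REF-1'], 'amended': ['OFAC:1']}
```

Only `added` and `amended` entries need a fresh pass against the full customer book; `removed` entries are candidates for closing open alerts once an analyst confirms the delisting.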

In each case, the analyst is not just asking "is this person on the list?" They are asking "is this person, or any entity in which they hold 50% or more, or any alias spelling or transliteration, on any of the lists my program is required to screen against, as of the most recent publication?" That question is only answerable when the underlying lists are pulled together, normalized, and refreshed on a known cadence.
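Answering the alias-spelling part of that question starts with a shared normalization step applied to every primary name and aka on both sides of the match. A minimal, standard-library sketch: it strips diacritics through Unicode NFKD decomposition, casefolds, turns punctuation into spaces, and sorts tokens so word order stops mattering. It does not transliterate non-Latin scripts (Cyrillic, Arabic, Hangul); that needs a separate step.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Return a comparison key for a personal or entity name."""
    # Decompose accented characters (ü -> u + combining diaeresis) and drop the marks
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Casefold, then replace anything that is not a word character or space with a space
    cleaned = re.sub(r"[^\w\s]", " ", stripped.casefold())
    # Sort tokens so "SMITH, John" and "John Smith" produce the same key
    return " ".join(sorted(cleaned.split()))


print(normalize_name("Müller, Hans-Jürgen"))                       # hans jurgen muller
print(normalize_name("SMITH, John") == normalize_name("john smith"))  # True
```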

3. What the actors extract

Four actors cover the core sanctions universe. All four are now in early access on the Apify platform; per-entity pricing of $0.05 activates on June 7. Each returns between roughly 10,000 and 50,000 records depending on the snapshot date.

OFAC SDN + non-SDN watchlist scraper

ofac-sdn-watchlist-scraper pulls the full SDN and Consolidated Sanctions List (SSI, FSE, NS-PLC, 13599, sectoral SSIs). Per record: primary name, all akas (strong and weak), entity type (individual, entity, vessel, aircraft), DOBs and POBs, addresses with country normalization, citizenships, identification documents (passport, national ID, tax ID, registration number), all sanction programs (SDGT, IRAN-EO13599, RUSSIA-EO14024, CAATSA-RUSSIA, CUBA, VENEZUELA-EO13884), date listed or amended, vessel call signs and IMO numbers, and a direct URL to the Treasury designation memo.
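The actor handles parsing, but teams that want to spot-check its output against a raw Treasury download can do so with the standard library. A minimal sketch, assuming the legacy SDN.XML layout (`sdnEntry`, `uid`, `firstName`, `lastName`, `sdnType`, `programList`, `akaList`, `aka`, `category`); verify the element names against the current OFAC XSD before relying on it. Namespaces are stripped so the code does not depend on the exact namespace URI, and the sample entry is invented.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Optional


def _local(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name"
    return tag.rsplit("}", 1)[-1]


def _child(elem: ET.Element, name: str) -> Optional[ET.Element]:
    for child in elem:
        if _local(child.tag) == name:
            return child
    return None


def _text(elem: ET.Element, name: str) -> str:
    child = _child(elem, name)
    return (child.text or "").strip() if child is not None else ""


def _full_name(elem: ET.Element) -> str:
    # Entities usually carry only lastName; individuals also carry firstName
    return " ".join(p for p in (_text(elem, "firstName"), _text(elem, "lastName")) if p)


def parse_sdn_root(root: ET.Element) -> List[Dict[str, Any]]:
    records = []
    for entry in root.iter():
        if _local(entry.tag) != "sdnEntry":
            continue
        programs_el = _child(entry, "programList")
        akas_el = _child(entry, "akaList")
        records.append({
            "uid": _text(entry, "uid"),
            "primary_name": _full_name(entry),
            "entity_type": _text(entry, "sdnType"),
            "programs": [(p.text or "").strip() for p in programs_el
                         if _local(p.tag) == "program"] if programs_el is not None else [],
            # Keep the strong/weak category so matchers can weight weak akas differently
            "akas": [{"name": _full_name(a), "category": _text(a, "category")}
                     for a in akas_el] if akas_el is not None else [],
        })
    return records


# Invented sample entry; for a real download use ET.parse("SDN.XML").getroot()
SAMPLE = """<sdnList xmlns="https://example.invalid/sdnList">
  <sdnEntry>
    <uid>1001</uid>
    <lastName>EXAMPLE SHIPPING LLC</lastName>
    <sdnType>Entity</sdnType>
    <programList><program>RUSSIA-EO14024</program></programList>
    <akaList>
      <aka><uid>2001</uid><type>a.k.a.</type><category>strong</category>
        <lastName>EXAMPLE MARINE</lastName></aka>
      <aka><uid>2002</uid><type>a.k.a.</type><category>weak</category>
        <lastName>EXSHIP</lastName></aka>
    </akaList>
  </sdnEntry>
</sdnList>"""

print(parse_sdn_root(ET.fromstring(SAMPLE)))
```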

UN consolidated sanctions tracker

un-sanctions-consolidated-tracker covers every active UN Security Council sanctions committee (1267/1989/2253 ISIL/Al-Qaida, 1518 Iraq, 1533 DRC, 1591 Sudan, 1718 DPRK, 1970 Libya, 1988 Taliban, 2127 CAR, 2140 Yemen, 2206 South Sudan, 2374 Mali, 2653 Haiti). The schema includes the UN reference number, name and akas in original script plus Latin transliteration, DOB, POB, nationality, designation and last-amended dates, the narrative summary of reasons for listing, the resolution number, and the asset-freeze or travel-ban scope.

EU consolidated sanctions tracker

eu-consolidated-sanctions-tracker extracts the EEAS / DG FISMA list. Per record: EU Logical ID, name and akas, DOB, POB, nationality, function, ID documents, the Council Regulation that imposes the restriction (e.g., 269/2014 Ukraine territorial integrity, 833/2014 Russia sectoral, 36/2012 Syria), the Official Journal publication date, and the program acronym.

UK HMT financial sanctions tracker

uk-hmt-financial-sanctions-tracker covers the OFSI Consolidated List — asset-freeze targets, TAFA counter-terrorism measures, Russia, Belarus, Iran (Human Rights), and country regimes. Schema includes OFSI group ID, primary name, akas, DOB, POB, nationality, position, addresses, passport and national ID numbers, the statutory instrument imposing the listing, listing date, and the "last updated" date that drives delta refresh.

Structural comparison. All four feeds return similar entity records but with meaningful differences. OFAC is richest on secondary identifiers (vessels, aircraft, weak akas, ENT/SDN type codes). UN is richest on narrative justification. EU couples each entry to a Council Regulation citation that matters for legal review. UK HMT is cleanest on addresses but sparsest on aliases. A combined pipeline must reconcile these into one internal entity schema before name-matching runs.
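A minimal sketch of that reconciliation layer: one dataclass carrying the unified fields listed in section 4, plus one adapter per publisher. The input keys each adapter reads (`uid`, `name`, `akas`, `group_id`, `aliases`, `regime`, and so on) are hypothetical stand-ins, since the actors' exact output field names are not documented in this post; only OFAC and UK HMT adapters are shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SanctionsEntity:
    """One publisher record in the unified schema listed in section 4."""
    entity_id: str
    source_list: str
    source_ref: str
    entity_type: str
    primary_name: str
    aka_list: List[str] = field(default_factory=list)
    dob_list: List[str] = field(default_factory=list)
    pob_list: List[str] = field(default_factory=list)
    nationality_list: List[str] = field(default_factory=list)
    address_list: List[str] = field(default_factory=list)
    id_doc_list: List[str] = field(default_factory=list)
    programs: List[str] = field(default_factory=list)
    listed_date: str = ""
    last_updated: str = ""
    source_url: str = ""


def from_ofac(raw: Dict[str, Any]) -> SanctionsEntity:
    # Input keys ("uid", "name", "type", "akas", ...) are hypothetical actor fields
    return SanctionsEntity(
        entity_id=f"OFAC:{raw['uid']}",
        source_list="OFAC",
        source_ref=str(raw["uid"]),
        entity_type=raw.get("type", "").lower(),
        primary_name=raw["name"],
        aka_list=[a["name"] for a in raw.get("akas", [])],
        dob_list=list(raw.get("dobs", [])),
        programs=list(raw.get("programs", [])),
        last_updated=raw.get("last_updated", ""),
        source_url=raw.get("url", ""),
    )


def from_uk_hmt(raw: Dict[str, Any]) -> SanctionsEntity:
    # OFSI group ID is the stable publisher reference; key names are hypothetical
    return SanctionsEntity(
        entity_id=f"UKHMT:{raw['group_id']}",
        source_list="UKHMT",
        source_ref=str(raw["group_id"]),
        entity_type=raw.get("group_type", "").lower(),
        primary_name=raw["name"],
        aka_list=list(raw.get("aliases", [])),
        dob_list=[raw["dob"]] if raw.get("dob") else [],
        programs=[raw["regime"]] if raw.get("regime") else [],
        last_updated=raw.get("last_updated", ""),
    )


# UN and EU adapters would follow the same pattern
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], SanctionsEntity]] = {
    "OFAC": from_ofac,
    "UKHMT": from_uk_hmt,
}


def to_unified(source_list: str, raw: Dict[str, Any]) -> SanctionsEntity:
    return ADAPTERS[source_list](raw)
```

Keeping each publisher's quirks inside its own adapter means a source schema change touches one function, not the matcher.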

4. Example workflow: combined four-list screening

A defensible combined screening workflow looks like this:

Pull all four lists on a scheduled cadence — daily for OFAC and UK HMT (which publish updates throughout the week), every 48 hours for UN and EU. Apify schedules trigger the runs natively, and webhooks notify your downstream pipeline when each run finishes; each run drops a timestamped snapshot into your dataset store or S3 bucket.
Normalize into a unified entity schema — flatten each publisher's record into a common shape: entity_id, source_list, source_ref, entity_type, primary_name, aka_list[], dob_list[], pob_list[], nationality_list[], address_list[], id_doc_list[], programs[], listed_date, last_updated, source_url.
Dedupe across lists using a fuzzy name match (Jaro-Winkler or token-set ratio above 0.92) combined with DOB exact or DOB year-month match. A person listed by both OFAC and the EU under the same Russia program should collapse into a single internal record with two source_ref entries.
Flag cross-list overlaps — entries appearing on three or four lists are higher-confidence true positives and warrant a "consensus designation" flag in your case management tool.
Run against your customer and vendor master files — index the normalized sanctions universe in Elasticsearch or OpenSearch and run a name + DOB + country query against each KYC record. Tune your minimum match score per list (OFAC SDN should be more sensitive than UN, because OFAC strict liability is harsher).
Score and route alerts — combine match score, program severity (SDGT and CAATSA score higher than legacy programs), and customer risk class (PEP-adjacent retail customers warrant a different routing than a domestic SMB). Push alerts to your case management tool (Actimize, Verafin, ComplyAdvantage, in-house Jira workflow).
Maintain an audit trail — keep every snapshot, every match, and every analyst disposition with timestamps. Examiners reviewing OFAC compliance under the FFIEC BSA/AML Examination Manual expect a full chain of evidence.
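Steps 3 and 4 can be sketched in plain Python. This is a minimal sketch, assuming records are dicts with hypothetical keys `entity_id`, `source_list`, `name`, and `dob` (ISO `YYYY-MM-DD` or `YYYY-MM`). It implements Jaro-Winkler directly (standard 0.1 prefix scale, prefix capped at four characters), applies the 0.92 threshold with the year-month DOB rule, and groups records by greedy single-link clustering, which is one choice among several. The names and dates in the example are invented.

```python
import unicodedata
from typing import Any, Dict, List


def norm_name(name: str) -> str:
    # Strip diacritics, uppercase, and sort tokens so word order is ignored
    s = unicodedata.normalize("NFKD", name)
    s = "".join(c for c in s if not unicodedata.combining(c)).upper()
    return " ".join(sorted(s.replace(",", " ").split()))


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if not matches:
        return 0.0
    # Count matched characters that appear in a different order
    transpositions, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def same_party(a: Dict[str, Any], b: Dict[str, Any], threshold: float = 0.92) -> bool:
    # DOB must agree at least to year-month ("YYYY-MM"); exact matches pass too
    da, db = a.get("dob", ""), b.get("dob", "")
    if len(da) < 7 or len(db) < 7 or da[:7] != db[:7]:
        return False
    return jaro_winkler(norm_name(a["name"]), norm_name(b["name"])) >= threshold


def cluster(records: List[Dict[str, Any]], threshold: float = 0.92) -> List[Dict[str, Any]]:
    clusters: List[Dict[str, Any]] = []
    for rec in records:
        for c in clusters:
            if any(same_party(rec, m, threshold) for m in c["members"]):
                c["members"].append(rec)
                break
        else:  # no existing cluster matched
            clusters.append({"members": [rec]})
    for c in clusters:
        c["source_refs"] = [m["entity_id"] for m in c["members"]]
        # Three or more distinct publishers -> consensus designation
        c["consensus_designation"] = len({m["source_list"] for m in c["members"]}) >= 3
    return clusters


recs = [
    {"entity_id": "OFAC:1", "source_list": "OFAC", "name": "EXAMPLE, John", "dob": "1965-03-12"},
    {"entity_id": "EU:7", "source_list": "EU", "name": "John EXAMPLE", "dob": "1965-03"},
    {"entity_id": "UKHMT:9", "source_list": "UKHMT", "name": "Jon Example", "dob": "1965-03-12"},
    {"entity_id": "UN:2", "source_list": "UN", "name": "John Example", "dob": "1971-01-01"},
]
for c in cluster(recs):
    print(c["source_refs"], c["consensus_designation"])
```

Because the DOB test is mandatory here, entities, vessels, and aircraft never merge under this rule; they need their own key, such as an IMO number or registration ID. Greedy clustering is also sensitive to input order, so a production pipeline might sort input deterministically or use a union-find pass instead.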

For a mid-sized institution screening 250,000 customers weekly, the four lists together (roughly 30,000 normalized entries) cost on the order of $1,500 per refresh in Apify usage at $0.05 per entity: material, but modest next to a six-figure incumbent vendor contract.
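The $1,500 figure is plain arithmetic (30,000 entities × $0.05), and the same arithmetic gives an annual budget. A quick sketch, assuming every run is billed for every entity it returns at the flat per-entity price and the entity count stays near 30,000; whether delta runs or deduplicated entries are billed differently is not stated in this post.

```python
PRICE_CENTS_PER_ENTITY = 5  # $0.05 per entity, the price quoted in section 3


def refresh_cost_usd(entities: int) -> float:
    # Work in integer cents to avoid floating-point drift in the multiplication
    return entities * PRICE_CENTS_PER_ENTITY / 100


def annual_cost_usd(entities: int, refreshes_per_year: int) -> float:
    return entities * PRICE_CENTS_PER_ENTITY * refreshes_per_year / 100


print(refresh_cost_usd(30_000))      # 1500.0
print(annual_cost_usd(30_000, 52))   # 78000.0 (weekly)
print(annual_cost_usd(30_000, 365))  # 547500.0 (daily)
```

At a weekly cadence that is about $78,000 a year; pulling all four lists daily would be about $547,500, so refresh cadence is the main cost lever.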

5. Use cases

KYC onboarding screening — block account opening before funds are accepted from a sanctioned party.
AML transaction screening — real-time wire and ACH screening against an in-memory normalized sanctions index.
Vendor and supplier risk — TPRM workflows that screen suppliers, their officers, and disclosed UBOs.
M&A due diligence — counterparty and target shareholder screening during deal diligence, including the OFAC 50% rule cascade through ownership chains.
Secondary sanctions exposure — non-US counterparty mapping to identify CAATSA, Iran, and DPRK secondary exposure for non-US clients of US banks.
Investigative journalism and OSINT — researchers tracing illicit finance networks need fast, queryable access to all four lists with full alias coverage.
Periodic customer refresh — quarterly or annual re-screening of the full book against a fresh snapshot, with delta logic to surface newly-listed parties.
Embedded screening in payment processors — fintech and PSP platforms screening every payment intent through a hosted screening service backed by the four-list union.
Export-control pre-shipment screening — combining SDN with BIS Entity List and EU dual-use restrictions before issuing an export licence determination.
Regulator reporting — periodic OFSI annual reporting, OFAC blocked-property reports, and EU competent authority filings all reference the live consolidated lists as the authoritative reference universe.

6. Get started — run the OFAC SDN watchlist scraper

The fastest way to evaluate the data is to run the OFAC SDN actor against the live Treasury feed and inspect the JSON. Run the OFAC SDN watchlist scraper on Apify (early access, no per-entity charge until June 7). A single run returns the complete current SDN plus Consolidated Sanctions universe (roughly 13,000 entries at the time of writing) in normalized JSON, ready to drop into a screening index.

From the actor page you can trigger a one-off run, schedule daily refreshes, wire up a webhook to your downstream pipeline, or call the actor synchronously from your screening service via the Apify API.
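Calling the actor from your own service can be as small as one HTTP request. A minimal sketch using Apify's run-sync-get-dataset-items endpoint with only the standard library. The owner prefix `someuser` is a placeholder (the post does not give the actor's full ID), and `run_input` is left empty because the actor's input schema is not documented here. Synchronous runs wait only a limited time, so check the current limit in the Apify API reference; for long runs, start the run asynchronously and fetch the dataset when a webhook fires.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"


def build_run_sync_url(actor_id: str, fmt: str = "json") -> str:
    """URL for Apify's run-sync-get-dataset-items endpoint."""
    # In URL paths Apify writes "username/actor-name" as "username~actor-name"
    path_id = urllib.parse.quote(actor_id.replace("/", "~"))
    query = urllib.parse.urlencode({"format": fmt})
    return f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items?{query}"


def run_actor_sync(actor_id: str, token: str, run_input: Dict[str, Any],
                   timeout: float = 330.0) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset items."""
    request = urllib.request.Request(
        build_run_sync_url(actor_id),
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (placeholder owner "someuser"; input schema not documented in this post):
# import os
# items = run_actor_sync("someuser/ofac-sdn-watchlist-scraper",
#                        os.environ["APIFY_TOKEN"], run_input={})
```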

7. Related actors for adjacent compliance workflows

Sanctions screening rarely sits alone in a compliance program. The same toolkit covers adjacent regulatory and beneficial-ownership workflows:

uk-fca-enforcement-register — UK Financial Conduct Authority enforcement notices, final notices, and prohibition orders. Useful for adverse-media style screening against authorised-person history.
australia-asic-enforcement — ASIC banning orders, infringement notices, and court-ordered penalties from the Australian securities regulator. Pair with the related blog post on tracking ASIC enforcement.
uk-companies-house-officers — UK Companies House officer history, including disqualification status. A core input to UBO mapping for any UK-domiciled counterparty.
uk-psc-beneficial-ownership — Persons with Significant Control register, the UK's beneficial-ownership ledger, essential for OFAC 50% rule analysis on UK entities.

For broader context on the regulatory data catalogue, see the Regulatory Compliance data tools category and the related write-up on court-records due-diligence automation.

8. Frequently asked questions

How fresh is the data?

Each actor pulls directly from the publisher's authoritative source on every run: OFAC's SDN.XML endpoint, the UN's consolidated XML, the EU EEAS feed, and the OFSI consolidated CSV. There is no intermediate cache. Run the actor on whatever cadence your program requires; daily is typical for sanctions-sensitive institutions, and webhook-triggered runs are available for payment-screening use cases.

Do you cover the OFAC 50% rule?

The actors return the OFAC SDN dataset itself; they do not pre-compute 50% rule cascades, because that calculation requires beneficial-ownership data the SDN list does not contain. Pair the SDN scraper with uk-psc-beneficial-ownership and equivalent corporate-registry actors to build a 50% rule cascade against your counterparty population. OFAC's 2014 Revised Guidance on Entities Owned by Persons Whose Property and Interests in Property Are Blocked, together with its related FAQs, is the relevant interpretive source.
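Once ownership data is assembled, the cascade itself is a fixed-point computation. A minimal sketch, assuming a hypothetical `ownership` map (entity → {owner: percentage}) already built from registry sources such as the PSC register. Following the aggregation approach in OFAC's 50% rule guidance, it sums the holdings of blocked persons in each entity, treats any entity at 50% or more as blocked, and repeats until nothing changes, so a newly blocked entity's own holdings count on the next pass. It does not model control, which the rule itself does not cover. Entity names in the example are invented.

```python
from typing import Dict, Set


def blocked_by_50_percent_rule(
    sdns: Set[str],
    ownership: Dict[str, Dict[str, float]],
) -> Set[str]:
    """Return every party blocked directly (listed) or via the 50% rule.

    ownership maps entity -> {owner: percentage held}.
    """
    blocked = set(sdns)
    changed = True
    while changed:  # iterate until no new entity becomes blocked
        changed = False
        for entity, owners in ownership.items():
            if entity in blocked:
                continue
            # Aggregate the stakes held by all currently blocked owners
            blocked_share = sum(pct for owner, pct in owners.items() if owner in blocked)
            if blocked_share >= 50.0:
                blocked.add(entity)
                changed = True
    return blocked


ownership = {
    "A": {"SDN_X": 50.0, "Other": 50.0},           # 50% held by a listed person
    "B": {"A": 30.0, "SDN_Y": 20.0, "P": 50.0},     # 30% + 20% from blocked owners
    "C": {"B": 49.0, "Q": 51.0},                    # below the threshold
    "D": {"C": 100.0},                              # held through unblocked C
    "E": {"B": 60.0},                               # majority held by blocked B
}
print(sorted(blocked_by_50_percent_rule({"SDN_X", "SDN_Y"}, ownership)))
# ['A', 'B', 'E', 'SDN_X', 'SDN_Y']
```

B is blocked only because its 30% holder A is itself blocked; C sits at 49% and D is held through unblocked C, so neither is caught.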

Can I screen against all four lists at once?

Yes — run all four actors on the same schedule, normalize each output into a unified entity schema (see section 4), and load the combined index into your screening engine. Many institutions also build a "consensus designation" flag for entries appearing on multiple lists, which materially reduces false-positive triage load.

What's the export format for AML tooling?

Each actor returns JSON natively. Apify dataset endpoints also expose CSV, XLSX, JSONL, and RSS. For NICE Actimize WLF, Verafin, ComplyAdvantage, and most in-house screening engines, JSONL with one flattened record per line is the cleanest ingest format. The actors expose a stable field schema, so once your ETL is wired up it should not need re-mapping when publishers rotate their own source schemas.
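Apify's export already produces JSONL; what some engines also want is nested objects flattened into top-level fields. A minimal sketch, assuming dotted keys (`address.country`) are acceptable to your ETL and that lists should stay as JSON arrays; the records in the example are invented.

```python
import json
from typing import Any, Dict, Iterable


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted keys; leave scalars and lists alone."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize one flattened record per line, keeping non-ASCII names intact."""
    return "".join(
        json.dumps(flatten(r), ensure_ascii=False, sort_keys=True) + "\n"
        for r in records
    )


print(to_jsonl([{"uid": 1001, "name": "EXAMPLE SHIPPING LLC",
                 "address": {"city": "Limassol", "country": "CY"},
                 "programs": ["RUSSIA-EO14024"]}]), end="")
```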

Are there false-positive controls?

False-positive management lives in your screening engine, not in the source data. That said, the actors return clean field-level data (separate first / middle / last names where the publisher provides them, and OFAC's weak / strong aka flags), which lets your matcher apply different thresholds per field. Combining name match with DOB and nationality typically reduces false positives by 60–80 percent relative to name-only matching.
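To make per-field thresholds concrete, here is a minimal scoring sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a production name matcher; the weights (0.6 name, 0.25 DOB, 0.15 nationality), the weak-aka discount, and the per-list thresholds are invented for illustration. Fields missing on either side drop out of the weighted average instead of counting as mismatches, so a name-only record can still alert.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical per-list alert thresholds: a lower value alerts more readily
THRESHOLDS = {"OFAC": 0.80, "UKHMT": 0.85, "EU": 0.85, "UN": 0.88}
WEAK_AKA_FACTOR = 0.8  # hypothetical discount for OFAC "weak" akas


def match_score(customer: Dict[str, str], entry: Dict[str, str], weak_aka: bool = False) -> float:
    name_sim = SequenceMatcher(None, customer["name"].upper(), entry["name"].upper()).ratio()
    if weak_aka:
        name_sim *= WEAK_AKA_FACTOR
    # (weight, field score) pairs; fields missing on either side are skipped
    parts: List[Tuple[float, float]] = [(0.6, name_sim)]
    if customer.get("dob") and entry.get("dob"):
        parts.append((0.25, 1.0 if customer["dob"][:7] == entry["dob"][:7] else 0.0))
    if customer.get("nationality") and entry.get("nationality"):
        parts.append((0.15, 1.0 if customer["nationality"] == entry["nationality"] else 0.0))
    total_weight = sum(w for w, _ in parts)
    return sum(w * s for w, s in parts) / total_weight


def should_alert(customer: Dict[str, str], entry: Dict[str, str],
                 source_list: str, weak_aka: bool = False) -> bool:
    return match_score(customer, entry, weak_aka) >= THRESHOLDS[source_list]


cust = {"name": "John Example", "dob": "1965-03-12", "nationality": "GB"}
hit = {"name": "JOHN EXAMPLE", "dob": "1965-03-01", "nationality": "CY"}
print(round(match_score(cust, hit), 2), should_alert(cust, hit, "OFAC"), should_alert(cust, hit, "UN"))
```

In the example an exact name with a matching birth month but a different nationality scores 0.85: enough to alert under the more sensitive OFAC threshold, but not under the UN one.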

What about sectoral sanctions and SSI?

The OFAC actor returns the full Non-SDN Consolidated Sanctions List, which includes the Sectoral Sanctions Identifications list under Directives 1–4 of Executive Order 13662, plus the various sector-specific designations under newer Russia and Belarus EOs. Each entry carries its directive citation in the programs field, so you can route SSI-only matches differently from full-block SDN matches.
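Routing on list type can be a few lines. A minimal sketch, assuming each hit records which OFAC list(s) it came from under hypothetical codes such as `SDN` and `SSI`. SSI entries carry directive-specific restrictions (for example on new debt or equity) rather than full blocking, so they go to a separate queue; the queue names are placeholders.

```python
from typing import Iterable


def route_match(matched_lists: Iterable[str]) -> str:
    """Pick an alert queue from the OFAC list code(s) a hit came from."""
    lists = set(matched_lists)
    if "SDN" in lists:
        return "block_review"          # full blocking program: highest priority
    if "SSI" in lists:
        return "sectoral_review"       # directive-specific limits, not a full block
    return "non_sdn_other_review"      # FSE, NS-PLC, and other non-SDN lists


print(route_match(["SSI", "SDN"]))  # block_review
print(route_match(["SSI"]))         # sectoral_review
```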

Are the actors suitable for real-time payment screening?

The actors are designed for snapshot extraction, not sub-second per-payment screening. The right pattern is: run the actor on a schedule, load the snapshot into an in-memory index (Elasticsearch, Redis, an in-process trie), and have your payment-screening service query the index. End-to-end refresh latency from publisher to screening engine can be kept under 15 minutes with a webhook-triggered pipeline.
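A minimal sketch of the in-memory side of that pattern, assuming a plain Python process rather than Elasticsearch or Redis: an inverted index from name tokens to entity IDs, rebuilt from each snapshot and used only to shortlist candidates for the full matcher. Shared-token counting is a cheap blocking step, not a match score; misspellings that change every token will slip past it, which is why production engines add fuzzy or phonetic keys. The entity IDs and names in the example are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def _tokens(name: str) -> Set[str]:
    return set(re.findall(r"\w+", name.casefold()))


class NameIndex:
    """Inverted index from name token to entity IDs, rebuilt per snapshot."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, entity_id: str, names: List[str]) -> None:
        # Index the primary name and every aka under the same entity ID
        for name in names:
            for tok in _tokens(name):
                self._postings[tok].add(entity_id)

    def candidates(self, query: str, min_shared: int = 1) -> List[str]:
        counts: Dict[str, int] = defaultdict(int)
        for tok in _tokens(query):
            for eid in self._postings.get(tok, ()):
                counts[eid] += 1
        # Most shared tokens first; ties broken by ID for stable output
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [eid for eid, n in ranked if n >= min_shared]


idx = NameIndex()
idx.add("OFAC:1001", ["EXAMPLE SHIPPING LLC", "EXAMPLE MARINE"])
idx.add("EU:77", ["Other Holdings Ltd"])
print(idx.candidates("Example Marine Services"))  # ['OFAC:1001']
```

Building a fresh index per snapshot and swapping the reference in one assignment keeps queries from ever seeing a half-loaded list.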

Does the data include PEP information?

No — sanctions designations and PEP status are distinct compliance signals. Many sanctioned individuals are also PEPs, but PEP screening requires a separate dataset (typically a commercial PEP list or a structured pull from parliamentary, judicial, and executive registries). Pair the sanctions actors with adjacent OSINT and corporate-registry actors to assemble a fuller adverse-media and PEP picture for high-risk customers.
