Compliance research used to mean a stack of subscription databases and a call to a regional analyst. In 2026 the centre of gravity has shifted. The most authoritative records -- sanctions designations, regulatory enforcement notices, corporate registries, and rule-making dockets -- are published directly by the governments that produce them, often within hours. The challenge is integration: pulling structured records out of dozens of incompatible portals and feeding them into screening engines, case files, and analyst workflows.
"Compliance research" here means three overlapping disciplines: sanctions and watchlist screening, regulatory due diligence (is this firm licensed, and has it been subject to enforcement action?), and beneficial-ownership investigation. All three are served by free, official, primary-source datasets -- and all three benefit when those datasets are scraped into a normalised format rather than queried one entity at a time.
Public-data sources matter for two reasons. First, cost: commercial aggregators charge tens of thousands of pounds a year to repackage what governments publish for free. Second, defensibility: "the official source, refreshed at 06:00 UTC" beats "our vendor's proprietary dataset" when a regulator asks where a screening hit came from. The list below is the working set of sources that AML analysts, KYC teams, GRC managers, journalists, and OSINT investigators return to day after day, with a NexGenData scraper recipe for each.
1. OFAC SDN List (U.S. Treasury)
Maintained by: U.S. Department of the Treasury, Office of Foreign Assets Control.
The Specially Designated Nationals and Blocked Persons List is the cornerstone of U.S. sanctions enforcement. It catalogs individuals, companies, vessels, and aircraft owned or controlled by sanctioned countries and threat actors -- narco-traffickers, terrorist financiers, cyber-criminals, and proliferators.
Contents: More than 12,000 active entries with names, aliases, addresses, dates of birth, passport and ID numbers, vessel IMOs, and the sanctions program code (e.g., SDGT, IRAN, RUSSIA-EO14024).
Refresh: Updated whenever designations or delistings occur -- often multiple times per week, within hours of major geopolitical events.
Used by: Every U.S.-touching financial institution, fintech, payment processor, crypto exchange, and exporter. Outside the U.S., it is treated as a de facto global standard.
Scraping recipe: OFAC SDN Watchlist Scraper -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
2. UN Consolidated Sanctions List
Maintained by: United Nations Security Council, via the Sanctions Committees and Secretariat.
Aggregates every individual and entity subject to sanctions imposed by the UN Security Council across its country and thematic regimes -- including the 1267/1989/2253 ISIL (Da'esh) and Al-Qaida regime, DPRK, Libya, Somalia, and Yemen.
Contents: Names, aliases, dates and places of birth, designation rationale, and the underlying resolution number. Available in XML, JSON, HTML, and PDF.
Refresh: Updated within 24 hours of any Sanctions Committee decision.
Used by: All 193 UN member states are legally obligated to implement these measures, making this the lowest-common-denominator screening source for any cross-border institution.
Scraping recipe: UN Sanctions Consolidated Tracker -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
3. EU Consolidated Sanctions List
Maintained by: European Commission (DG FISMA) and the European External Action Service.
Reflects every restrictive measure adopted by the Council of the European Union -- covering asset freezes, travel bans, and arms embargoes. Transposes UN measures and adds autonomous EU designations.
Contents: Subject identifiers, regulation references (e.g., (EU) No 269/2014 for Russia), legal basis, listing rationale. Distributed via the EU Sanctions Map and the Financial Sanctions Database.
Refresh: Updated within hours of publication in the Official Journal; near-weekly updates since 2022 due to Russia-related amendments.
Used by: EU-based banks, asset managers, insurers, MiCA-regulated crypto-asset service providers, and any non-EU firm with EU exposure.
Scraping recipe: EU Consolidated Sanctions Tracker -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
4. UK HMT Financial Sanctions List (OFSI)
Maintained by: His Majesty's Treasury, Office of Financial Sanctions Implementation.
Captures all financial sanctions in force in the UK -- the autonomous post-Brexit regime, UN measures transposed into UK law, and counter-terrorism designations under TAFA 2010 and SAMLA 2018.
Contents: Designated persons and entities with identifiers, sanctions regime, group ID, and listing date. Available in CSV, XML, and PDF on GOV.UK.
Refresh: Continuous; OFSI typically publishes a notice the same day a designation takes effect.
Used by: All UK persons (a defined statutory term), British nationals worldwide, and any firm conducting business through UK financial infrastructure.
Scraping recipe: UK HMT Financial Sanctions Tracker -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
5. UK FCA Enforcement Notices
Maintained by: Financial Conduct Authority (United Kingdom).
Final Notices, Decision Notices, and Supervisory Notices documenting enforcement action against authorised firms and approved individuals for conduct breaches, AML failings, market abuse, and consumer-protection failures.
Contents: Subject firm or individual, breach description, statutory basis (typically Principles for Businesses or SYSC), penalty amount, settlement discount, and a narrative factual matrix that often runs 30-80 pages.
Refresh: Published as cases conclude -- historically 30-60 enforcement outcomes per year, plus dozens of supervisory notices.
Used by: Compliance officers benchmarking control expectations, law firms advising on enforcement risk, and journalists tracking financial crime in the City.
Scraping recipe: UK FCA Enforcement Tracker -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
6. UK FCA Register
Maintained by: Financial Conduct Authority (United Kingdom).
The Financial Services Register is the official public record of every firm, individual, and fund the FCA and PRA regulate. It is the authoritative source for confirming whether a counterparty is authorised to conduct regulated activities in the UK.
Contents: Firm reference numbers (FRN), permitted regulated activities, individual reference numbers (IRN) for SM&CR-approved persons, trading names, and historical permissions.
Refresh: Daily, as authorisations, variations of permission, and withdrawals are processed.
Used by: KYC and onboarding teams, claims-management firms, and consumers verifying advisers -- the first check before any commercial relationship with a UK-regulated firm.
Scraping recipe: UK FCA Register Scraper -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
7. Australian ASIC Enforcement Actions
Maintained by: Australian Securities and Investments Commission.
ASIC's enforcement pages document actions against companies and individuals under the Corporations Act 2001, ASIC Act, and National Consumer Credit Protection Act -- covering director duty breaches, unlicensed financial services, and greenwashing.
Contents: Civil penalty proceedings, criminal prosecutions, banning orders, enforceable undertakings, and infringement notices, with media releases linking to court filings and outcomes.
Refresh: Several enforcement media releases per week, as actions are commenced and resolved.
Used by: Australian compliance teams, comparative regulators studying enforcement trends, and Asia-Pacific due-diligence practitioners screening directors and officers.
Scraping recipe: Australia ASIC Enforcement Tracker -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
8. US SEC EDGAR
Maintained by: U.S. Securities and Exchange Commission.
EDGAR is the SEC's filing system and the most comprehensive public archive of corporate disclosure in the world. Every registered issuer, investment adviser, hedge fund manager, and corporate insider files here.
Contents: 10-K, 10-Q, 8-K, 13F institutional holdings, Form 4 insider transactions, S-1 IPO prospectuses, Schedule 13D/G beneficial-ownership filings, Form D private placements, and proxy materials back to the early 1990s.
Refresh: Real-time -- filings appear within minutes of acceptance, with bulk feeds and a documented REST API.
Used by: Equity analysts, AML investigators tracing beneficial ownership, FCPA practitioners, and journalists covering corporate misconduct.
Scraping recipe: SEC EDGAR Filings Scraper -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
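For readers who want a feel for the documented REST API before reaching for a scraper, here is a minimal sketch. It assumes the data.sec.gov submissions endpoint, which takes a CIK zero-padded to ten digits and returns recent filings as parallel arrays. The helper names and the sample payload are hypothetical, no network call is made, and the SEC asks automated clients to identify themselves with a descriptive User-Agent header.

```python
from __future__ import annotations

from typing import Any


def submissions_url(cik: int | str) -> str:
    """Build the submissions URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def recent_filings(payload: dict[str, Any], forms: set[str]) -> list[dict[str, str]]:
    """Select filings of the given form types from the column-oriented 'recent' block."""
    recent = payload["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in forms
    ]


# Made-up payload, trimmed to the fields used above (assumed response shape).
sample = {
    "filings": {
        "recent": {
            "form": ["8-K", "4", "10-K"],
            "filingDate": ["2026-02-03", "2026-01-20", "2025-11-01"],
            "accessionNumber": [
                "0000000000-26-000003",
                "0000000000-26-000002",
                "0000000000-25-000001",
            ],
        }
    }
}

if __name__ == "__main__":
    print(submissions_url(320193))
    print(recent_filings(sample, {"8-K", "10-K"}))
```

Filtering on form type is usually the first step in an ownership trace: Form 4 and Schedule 13D/G filings point at people and holders, while 10-K and 8-K filings describe the issuer itself.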
9. US Federal Register
Maintained by: Office of the Federal Register, U.S. National Archives, with the GPO.
The official daily journal of the U.S. government, publishing proposed rules, final rules, agency notices, and Presidential documents. The canonical source for tracking regulatory change at FinCEN, OFAC, the SEC, the CFTC, and every other federal agency.
Contents: Rule preambles, regulatory text, comment periods, effective dates, RIN identifiers, and CFR citations. Available as XML and JSON via the federalregister.gov API.
Refresh: Every federal business day; the API exposes documents the morning of publication.
Used by: Regulatory affairs teams, in-house counsel, lobbyists, and any compliance function needing early warning of new obligations.
Scraping recipe: Federal Register Rules Scraper -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
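A horizon-scanning query against the federalregister.gov API might look something like the sketch below. The endpoint path, the `conditions[...]` parameter names, and the agency slugs reflect my understanding of the public API documentation and should be checked against it; the function name is hypothetical, and the code only builds the URL without fetching anything.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE = "https://www.federalregister.gov/api/v1/documents.json"


def rule_query_url(
    agency_slugs: list[str],
    since: str,
    doc_types: tuple[str, ...] = ("RULE", "PRORULE"),
    per_page: int = 100,
) -> str:
    """Build a search URL for final and proposed rules published on or after `since`.

    `since` is an ISO date string (YYYY-MM-DD). Parameter names are assumptions
    based on the public API documentation.
    """
    params = [("conditions[agencies][]", slug) for slug in agency_slugs]
    params += [("conditions[type][]", doc_type) for doc_type in doc_types]
    params += [
        ("conditions[publication_date][gte]", since),
        ("order", "newest"),
        ("per_page", str(per_page)),
    ]
    return f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Agency slugs are assumed values for FinCEN and OFAC.
    print(
        rule_query_url(
            ["financial-crimes-enforcement-network", "foreign-assets-control-office"],
            since="2026-01-01",
        )
    )
```

Passing the parameters as a list of pairs, rather than a dict, is what allows the same key to repeat for several agencies or document types.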
10. Companies House (United Kingdom)
Maintained by: Companies House, an executive agency of the UK Department for Business and Trade.
The UK's statutory corporate registry. Since 2016 it has operated the People with Significant Control (PSC) register -- one of the world's first open beneficial-ownership disclosures -- strengthened under the Economic Crime and Corporate Transparency Act 2023.
Contents: Company numbers, registered offices, officer histories, PSC declarations, annual accounts, confirmation statements, mortgage charges, and full filing histories with downloadable PDFs.
Refresh: Near real-time as filings are accepted; the public API serves structured JSON.
Used by: UK due-diligence analysts, OpenOwnership and OCCRP investigators, anti-corruption journalists, and KYC providers building UK entity-resolution graphs.
Scraping recipe: UK Companies House Officers Search -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
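To illustrate what the structured JSON makes possible, the sketch below splits a PSC listing into individuals (who go back through sanctions screening) and corporate owners (whose own PSCs are the next hop up the chain). The URL pattern and the `items`, `kind`, `name`, and `ceased_on` fields are assumptions based on the public Companies House API as I understand it; the function names and sample payload are hypothetical, and other PSC kinds are simply skipped here.

```python
from __future__ import annotations

from typing import Any

API_ROOT = "https://api.company-information.service.gov.uk"


def psc_url(company_number: str) -> str:
    """Build the PSC listing URL; purely numeric company numbers are padded to 8 digits."""
    number = company_number.strip().upper()
    if number.isdigit():
        number = number.zfill(8)
    return f"{API_ROOT}/company/{number}/persons-with-significant-control"


def split_active_pscs(payload: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (individual names, corporate names) for PSCs that have not ceased."""
    individuals: list[str] = []
    corporates: list[str] = []
    for item in payload.get("items", []):
        if item.get("ceased_on"):
            continue  # historical PSC, no longer in control
        kind = item.get("kind", "")
        name = item.get("name", "")
        if kind.startswith("individual"):
            individuals.append(name)
        elif kind.startswith("corporate-entity"):
            corporates.append(name)
    return individuals, corporates


# Made-up payload in the assumed response shape.
sample = {
    "items": [
        {"name": "Ms Jane Example", "kind": "individual-person-with-significant-control"},
        {"name": "Example Holdings Ltd", "kind": "corporate-entity-person-with-significant-control"},
        {
            "name": "Mr Former Owner",
            "kind": "individual-person-with-significant-control",
            "ceased_on": "2024-05-01",
        },
    ]
}

if __name__ == "__main__":
    print(psc_url("1234567"))
    print(split_active_pscs(sample))
```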
11. ACRA / BizFile+ (Singapore)
Maintained by: Accounting and Corporate Regulatory Authority of Singapore.
ACRA's BizFile+ is the official corporate registry for Singapore, recording every business entity registered under the Companies Act and Business Names Registration Act. The authoritative source for the Unique Entity Number (UEN) -- Singapore's universal organisational identifier.
Contents: Entity name, UEN, registration date, principal activity (SSIC code), registered address, share capital, directors, secretaries, and officers.
Refresh: Continuous as filings are lodged; profile extracts are issued on demand.
Used by: Southeast Asia due-diligence analysts, fund administrators screening Singaporean SPV counterparties, and MAS-regulated firms onboarding corporate clients.
Scraping recipe: Singapore ACRA / BizFile+ Company Lookup -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
12. India MCA (Ministry of Corporate Affairs)
Maintained by: Ministry of Corporate Affairs, Government of India.
MCA21 is the public face of India's corporate registry, administering filings under the Companies Act 2013 and LLP Act 2008. The only authoritative source for the Corporate Identification Number (CIN) and Director Identification Number (DIN).
Contents: Company master data, charges, directors, annual returns (MGT-7), financials (AOC-4), incorporation documents (INC-22, INC-32), and Significant Beneficial Owner (SBO) declarations.
Refresh: Continuous; bulk data and structured queries are available through MCA's public services.
Used by: India-focused due-diligence teams, FCPA investigators tracing Indian intermediaries, and global KYC providers building Asia coverage.
Scraping recipe: India MCA Companies Lookup -- runs on Apify, returns JSON or CSV ready for your screening pipeline.
Building a compliance workflow from these sources
No single source covers the full compliance research surface; a defensible workflow combines them in layers. The first layer is screening: every new counterparty is matched against the OFAC, UN, EU, and UK HMT sanctions lists, refreshed at least daily, and hourly during heightened designation periods. The second is regulatory verification: confirming licensing status on the FCA Register, ASIC's professional registers, or the equivalent national authority, and checking the FCA's and ASIC's enforcement archives. The third is corporate transparency: resolving the legal entity in Companies House, ACRA, MCA, or EDGAR, walking the ownership chain to ultimate beneficial owners, and cross-checking those individuals back through the screening layer. The fourth is regulatory monitoring: tracking the Federal Register and equivalent gazettes for rule changes that affect the obligations themselves.
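The screening layer is the easiest of the four to sketch in code. The example below assumes list entries have already been normalised into a common shape (the `ListEntry` class is hypothetical, not a NexGenData schema) and uses plain string similarity from Python's standard-library difflib. That is a stand-in for illustration only: production screening engines typically add transliteration, phonetic matching, and date-of-birth checks.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ListEntry:
    source: str              # e.g. "OFAC SDN", "UK HMT"
    entry_id: str            # identifier carried over from the source list
    names: tuple[str, ...]   # primary name plus aliases (at least one)


def normalise(name: str) -> str:
    """Strip accents, case, and punctuation, then sort tokens so word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(sorted(cleaned.split()))


def screen(candidate: str, entries: list[ListEntry], threshold: float = 0.9):
    """Return (source, entry_id, score) for entries whose best name score meets the threshold."""
    target = normalise(candidate)
    hits = []
    for entry in entries:
        best = max(
            SequenceMatcher(None, target, normalise(name)).ratio()
            for name in entry.names
        )
        if best >= threshold:
            hits.append((entry.source, entry.entry_id, round(best, 3)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Entirely fictitious entries for illustration.
entries = [
    ListEntry("OFAC SDN", "EXAMPLE-1", ("PETROV, Ivan Sergeyevich", "Ivan Petrov")),
    ListEntry("UK HMT", "EXAMPLE-2", ("Example Trading Company LLC",)),
]

if __name__ == "__main__":
    print(screen("Iván Petrov", entries))
    print(screen("Maria Gonzalez", entries))
```

Keeping the source and entry identifier on every hit is what lets the third layer feed beneficial owners back through the same function and still trace each match to the list it came from.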
For deeper dives, see the companion guides on UK FCA enforcement data, sanctions data tools for due diligence, tracking ASIC enforcement actions, and company registry data tools. The full toolkit lives on the regulatory compliance data tools category page. To browse every scraper that powers these workflows, explore the NexGenData actor catalog on Apify -- each actor runs on demand or on a schedule and returns clean JSON for integration with your case-management, KYC, or analytics stack.
Frequently asked questions
Are all these data sources actually free?
Yes. Every source is published by a government or intergovernmental body without subscription or paywall. Some -- notably Companies House and EDGAR -- also publish bulk-download files and REST APIs at no cost. The only costs are compute and engineering time.
Why scrape when I can use a paid aggregator?
Paid aggregators add entity resolution and risk scoring but are expensive, often opaque about provenance, and bound by redistribution licenses. Scraping primary sources gives full provenance, refresh control, and auditable raw records -- which matters when a regulator asks you to evidence a screening decision.
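One way to make "auditable raw records" concrete is to wrap every scraped record in a small provenance envelope at ingestion time. The sketch below is one possible shape rather than a prescribed format: the field names are hypothetical, and the hash is taken over a canonical JSON serialisation so that key order does not change the digest.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from typing import Any


def provenance_envelope(
    record: dict[str, Any],
    source_url: str,
    retrieved_at: datetime | None = None,
) -> dict[str, Any]:
    """Attach source, UTC retrieval time, and a content hash to a raw record."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    fetched = retrieved_at or datetime.now(timezone.utc)
    return {
        "source_url": source_url,
        "retrieved_at": fetched.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "record": record,
    }


if __name__ == "__main__":
    envelope = provenance_envelope(
        {"name": "Example Trading Company LLC", "program": "EXAMPLE"},  # fictitious record
        source_url="https://example.org/official-list",                 # placeholder URL
        retrieved_at=datetime(2026, 1, 15, 6, 0, tzinfo=timezone.utc),
    )
    print(envelope["retrieved_at"], envelope["sha256"][:12])
```

Stored alongside the screening decision, the envelope answers the regulator's question directly: which source, fetched when, and whether the record on file is byte-for-byte what was matched.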
Which sources should I prioritize for AML?
Sanctions screening: OFAC SDN, UN, EU, and UK HMT -- the baseline four. CDD on legal entities: the relevant national registry (Companies House, ACRA, MCA, or EDGAR). Adverse-media and regulatory history: the FCA and ASIC enforcement archives. Horizon scanning: the Federal Register.
How fresh is this data?
Sanctions lists update within hours of a designation. Corporate registries vary: Companies House is near real-time, while others may only surface changes overnight. Enforcement archives update as cases close, which can mean weeks of latency. The scraper recipes above support scheduled refreshes -- hourly for sanctions in active periods, daily otherwise.
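The refresh policy described here is simple enough to encode directly. The sketch below is illustrative only: the `source_kind` labels and function names are hypothetical, and the 06:00 UTC daily slot is an assumption borrowed from the example earlier in the article. The cron strings use standard five-field syntax.

```python
from datetime import datetime, timedelta, timezone


def refresh_interval(source_kind: str, heightened: bool = False) -> timedelta:
    """Hourly for sanctions lists in active periods, daily for everything else."""
    if source_kind == "sanctions" and heightened:
        return timedelta(hours=1)
    return timedelta(days=1)


def cron_for(source_kind: str, heightened: bool = False) -> str:
    """Matching cron expression: top of every hour, or once a day at 06:00 UTC."""
    if refresh_interval(source_kind, heightened) == timedelta(hours=1):
        return "0 * * * *"
    return "0 6 * * *"


def is_stale(
    last_refreshed: datetime,
    now: datetime,
    source_kind: str,
    heightened: bool = False,
) -> bool:
    """True when the last successful refresh is older than the policy allows."""
    return now - last_refreshed >= refresh_interval(source_kind, heightened)


if __name__ == "__main__":
    last = datetime(2026, 1, 15, 6, 0, tzinfo=timezone.utc)
    now = datetime(2026, 1, 15, 8, 30, tzinfo=timezone.utc)
    print(cron_for("sanctions", heightened=True), is_stale(last, now, "sanctions", True))
    print(cron_for("registry"), is_stale(last, now, "registry"))
```

A staleness check of this kind is worth running before each screening batch, so that a failed scheduled run shows up as a blocked batch and not as a silently outdated list.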
Can I bulk-export for offline workflows?
Yes. Every NexGenData scraper returns JSON or CSV through Apify's dataset storage, which can be downloaded in bulk, streamed via webhook into a warehouse (Snowflake, BigQuery, Postgres), or pushed to S3.
What about jurisdictions not covered here?
The twelve sources above cover the U.S., U.K., E.U., U.N., Australia, Singapore, and India. The NexGenData catalog also includes Hong Kong, France, MAS Singapore, SEBI India, and the RBI, with additional sources added as demand emerges.