
NexGenData

Posted on • Originally published at thenextgennexus.com

Company Registry Data Tools for Business Intelligence

Company registry data is the backbone of every credible KYC, KYB, and M&A workflow, yet it lives in 200+ fragmented government portals, each with its own login flow, CAPTCHA, and download cap. This guide walks BD teams, due-diligence analysts, sanctions investigators, and corporate-credit underwriters through the structured company-registry data tools we publish on Apify, which jurisdictions they cover, and how to assemble them into a production-grade entity-resolution pipeline.

1. The Problem: Company Registries Are Fragmented Across 200+ Jurisdictions

Every country runs its own corporate registry. The UK has Companies House. France has Pappers (a private aggregator over INPI/RCS). India has the Ministry of Corporate Affairs (MCA). Singapore has ACRA's BizFile+. Hong Kong has the Companies Registry's ICRIS portal. Australia has ASIC Connect. The United States has a fifty-state patchwork of Secretary-of-State filings, plus Delaware as the de facto incorporation hub. Cayman, BVI, Jersey, and Guernsey each run their own opaque, often-paywalled systems.

For a KYB analyst trying to onboard a single multinational supplier, this means logging into six portals, solving three CAPTCHAs, paying a £3 fee for one PDF in Hong Kong, and copy-pasting director names into a spreadsheet because nothing exports cleanly. For a sanctions investigator mapping the ultimate beneficial owner (UBO) of a shell-company chain across three jurisdictions, it means days of manual cross-referencing. For a sales operations team trying to enrich 50,000 inbound leads with corporate registration numbers (CRN, CIN, UEN, ABN), it means the project never ships.

The fragmentation isn't going away. Even where APIs exist (UK Companies House publishes one of the best), rate limits, schema drift, and bulk-download caps make them unworkable for production. The result: KYC/KYB workflows that can't scale, EDD that takes weeks instead of hours, and procurement teams approving vendors with no UBO visibility because lookup cost is too high. The fix is a thin, consistent layer of registry scrapers — one per jurisdiction, normalized, billed per result, runnable from CLI, n8n, Zapier, or any HTTP client.

2. Why This Data Matters: KYC, KYB, BD, M&A, Sanctions, Credit, OSINT

Structured company registry data sits underneath nearly every regulated and unregulated B2B workflow:

  • KYC onboarding: verify legal entity name, registration number, registered address, and incorporation date before opening a corporate account. A bank or fintech that skips this fails its AML program audit.
  • KYB vendor due diligence: procurement teams confirm a supplier is a real, active company with disclosed directors before signing a master services agreement.
  • BD account enrichment: sales teams append CRN, industry SIC codes, employee count, and director names to inbound leads to route them correctly and personalize outbound.
  • M&A target screening: corporate development teams build target lists by filtering registries on jurisdiction, SIC code, incorporation year, share-capital range, and director overlap with existing portfolio companies.
  • Sanctions enrichment and UBO mapping: investigators chain registry data with sanctions watchlists to surface entities whose beneficial owners appear on OFAC SDN, UK HMT, EU Consolidated, or UN sanctions lists.
  • Credit underwriting: trade credit insurers and B2B lenders pull filed accounts, charges, and director histories to score default risk.
  • OSINT and investigative journalism: reporters trace shell-company networks across the UK, France, Delaware, and offshore jurisdictions to expose money laundering, tax evasion, or political corruption.
  • Regulatory horizon scanning: compliance teams monitor enforcement registers (FCA, ASIC, MAS, SFC) for early warning that a counterparty or competitor is under investigation.

The common thread: every one of these workflows needs the data in JSON, in bulk, on a schedule, with predictable latency and cost. None of them are well-served by manual portal lookups.

3. What the Actors Extract — Registry Coverage Matrix

Below is the coverage map across the public NexGenData actor fleet. Identifiers are listed in the format each registry actually uses, because no two jurisdictions agree on what to call a company number.

| Country | Source | Identifier | Key fields extracted | Best for |
| --- | --- | --- | --- | --- |
| United Kingdom | Companies House | CRN (8-digit) | Officers, appointments, dates of birth (partial), nationality, occupation, resigned/active status | KYB, director-overlap mapping, EDD |
| United Kingdom | Companies House PSC register | CRN | People with Significant Control (PSC), nature of control, share %, voting %, corporate PSCs | UBO mapping, sanctions enrichment |
| France | Pappers (RCS / INPI) | SIREN (9-digit) / SIRET (14) | Officers (dirigeants), share capital, NAF/APE code, registered address, RCS filings | French KYB, M&A screening in EU |
| India | MCA / OGD | CIN (21-char) | Master data, registered address, paid-up capital, ROC, listing status, directors (DIN), date of incorporation | India KYB, supplier vetting, group-structure mapping |
| India | MCA filings (INC-22/32) | CIN / SRN | Registered-office changes (INC-22), director appointments/changes (INC-32/DIR-12), filing dates | Change detection, EDD trigger events |
| Singapore | ACRA BizFile+ | UEN (9-10 char) | Entity name, status, address, directors, shareholders, business activities (SSIC), paid-up capital | Singapore KYB, regional HQ verification |
| Hong Kong | Companies Registry (ICRIS) | CR number (7-digit) | Name, status, directors, secretary, registered office, charges, annual return dates | HK KYB, offshore-structure investigation |
| Australia | ASIC Connect | ACN (9-digit) / ABN (11) | Entity status, registration date, type, jurisdiction, address, EX/AX flags | AU KYB, ASX-adjacent due diligence |
| United States | Multi-state Secretary of State | State filing number / EIN | Entity name, status, registered agent, formation date, jurisdiction, principal address | US KYB across all 50 states |
| United States | Delaware Division of Corporations | File number | Entity name, file number, incorporation date, status, registered agent | Delaware-domiciled entity verification |
| China | CNIPA | Patent number / applicant name | Applicant entity, address, IPC classification, grant/publication dates | China entity discovery via IP filings (adjacent) |

Output schemas are normalized across actors where possible: entity_name, jurisdiction, identifier, status, incorporation_date, registered_address, officers[], beneficial_owners[]. Source-specific fields (PSC nature of control, NAF code, SIC code, SSIC, etc.) are preserved in a raw object so downstream parsers don't lose fidelity.
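To make the normalized shape concrete, here is a minimal Python sketch of a record type built from the field names listed above. The class name, the field types, and the `from_item` helper are assumptions for illustration; the actors' real output may differ in types and nesting.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class RegistryRecord:
    # Field names follow the normalized schema described in the text;
    # the types are assumptions.
    entity_name: str
    jurisdiction: str
    identifier: str
    status: Optional[str] = None
    incorporation_date: Optional[str] = None  # assumed ISO 8601 date string
    registered_address: Optional[str] = None
    officers: list[dict[str, Any]] = field(default_factory=list)
    beneficial_owners: list[dict[str, Any]] = field(default_factory=list)
    raw: dict[str, Any] = field(default_factory=dict)  # source-specific fields


def from_item(item: dict[str, Any]) -> RegistryRecord:
    """Build a record from one dataset item, ignoring keys outside the schema."""
    known = {f.name for f in fields(RegistryRecord)}
    return RegistryRecord(**{k: v for k, v in item.items() if k in known})
```

Keeping `raw` as an untyped dictionary means a downstream parser can still reach fields such as the NAF code or PSC nature of control without the shared type having to model every registry.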

4. Example Workflow — Building a UK Supplier KYB Pipeline

Imagine you're the head of procurement compliance at a mid-market UK SaaS company. You've inherited a vendor master list of 4,200 active suppliers. Internal audit wants UBO disclosure on every supplier by end of quarter, plus an exception report for anyone with adverse regulatory history. Here's the pipeline:

Step 1 — Normalize identifiers. Your ERP exports company names and VAT numbers. For UK suppliers, resolve to a CRN by name+postcode lookup or VAT-to-CRN cross-reference. Aim for >95% match rate; flag the rest for manual review.
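Once a CRN candidate has been resolved, it helps to normalize it before sending it anywhere. The sketch below assumes the two most common CRN shapes (eight digits, or a two-letter prefix followed by six digits) and treats everything else as needing manual review; the function names are illustrative.

```python
import re
from typing import Iterable, Optional

# Assumed shapes: "01234567" or a two-letter prefix plus six digits ("SC012345").
_CRN_RE = re.compile(r"^(?:[0-9]{8}|[A-Z]{2}[0-9]{6})$")
_PREFIXED_RE = re.compile(r"^([A-Z]{2})([0-9]{1,6})$")


def normalize_crn(value: str) -> Optional[str]:
    """Return a zero-padded, upper-cased CRN, or None if it does not fit."""
    cleaned = re.sub(r"[\s\-]", "", value).upper()
    if cleaned.isascii() and cleaned.isdigit():
        cleaned = cleaned.zfill(8)
    else:
        prefixed = _PREFIXED_RE.match(cleaned)
        if prefixed:
            cleaned = prefixed.group(1) + prefixed.group(2).zfill(6)
    return cleaned if _CRN_RE.match(cleaned) else None


def match_rate(values: Iterable[str]) -> float:
    """Share of inputs that normalize to a usable CRN (target above 0.95)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(normalize_crn(v) is not None for v in values) / len(values)
```

Anything that returns `None` goes to the manual-review queue rather than being guessed at.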

Step 2 — Enrich officers via Companies House. Run the Business Registration Lookup actor (UK Companies House Officers actor available on request) over all 4,200 CRNs in batches of 500. You get a director-level dataset: name, role, appointed date, resigned date, date of birth (month/year), nationality, occupation. Materialize this into a suppliers_officers table.
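Splitting the CRN list into runs of 500 is simple enough to do by hand, but a small generator keeps it tidy. This is a generic sketch, not part of any actor; the name `batched` is just a local helper.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```

For 4,200 CRNs this yields eight full batches of 500 and one final batch of 200.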

Step 3 — Pull PSC / beneficial ownership. Run the OGD India Companies Master Data Lookup for India suppliers, and a comparable UK PSC actor over the same CRNs. PSC records give you the legally-disclosed beneficial owners with >25% ownership/voting/control, plus corporate PSCs (where a holding company is itself the PSC). For corporate PSCs, recurse: pull their PSC register, and again, until you bottom out at a natural person or hit a non-UK jurisdiction (Cayman, BVI, Jersey — flag for manual EDD).
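The recursion over corporate PSCs can be sketched as follows. Everything here is an assumption for illustration: `fetch_psc` stands in for whatever call returns PSC records for a CRN, and the record keys (`kind`, `name`, `jurisdiction`, `crn`) are hypothetical rather than the actor's actual output fields.

```python
from typing import Callable, Dict, List, Optional, Set

Psc = Dict[str, Optional[str]]


def resolve_ubos(
    crn: str,
    fetch_psc: Callable[[str], List[Psc]],
    max_depth: int = 10,
    _seen: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Walk corporate PSCs until reaching natural persons or non-UK entities."""
    seen = _seen if _seen is not None else set()
    result: Dict[str, List[str]] = {"natural_persons": [], "manual_edd": []}
    if crn in seen or max_depth < 0:
        return result  # guard against circular ownership and runaway chains
    seen.add(crn)
    for psc in fetch_psc(crn):
        if psc["kind"] == "individual":
            result["natural_persons"].append(psc["name"])
        elif psc.get("jurisdiction") != "GB" or not psc.get("crn"):
            # Non-UK or unresolvable corporate PSC: flag for manual EDD.
            result["manual_edd"].append(psc["name"])
        else:
            child = resolve_ubos(psc["crn"], fetch_psc, max_depth - 1, seen)
            result["natural_persons"].extend(child["natural_persons"])
            result["manual_edd"].extend(child["manual_edd"])
    return result
```

The shared `seen` set matters in practice: circular holdings do occur, and without the guard a loop between two holding companies would recurse until the depth limit.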

Step 4 — Cross-check against regulatory enforcement. Run the resolved officer and beneficial-owner names against the Australia ASIC Enforcement Tracker (and the equivalent UK FCA Enforcement Tracker) for financial-services regulatory history, and against the Delaware Corporations Search for US-incorporated counterparties (and the OFAC SDN Watchlist scraper from the catalog) for US sanctions exposure. Fuzzy-match on name + DOB to avoid false positives on common names.
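A name + DOB fuzzy match might look like the sketch below. It uses the standard library's `difflib` for similarity; the 0.9 threshold and the `dob_year` / `dob_month` keys are assumptions, chosen because Companies House discloses only month and year of birth.

```python
from difflib import SequenceMatcher
from typing import Any, Dict


def _norm_person(name: str) -> str:
    """Lowercase, drop commas, and sort tokens so 'SMITH, John' equals 'John Smith'."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def _dob_conflict(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """True only when both sides disclose a DOB component and they differ."""
    for key in ("dob_year", "dob_month"):
        if a.get(key) is not None and b.get(key) is not None and a[key] != b[key]:
            return True
    return False


def is_probable_match(officer: Dict[str, Any], listed: Dict[str, Any],
                      threshold: float = 0.9) -> bool:
    """Name similarity above threshold, with no contradiction in known DOB parts."""
    if _dob_conflict(officer, listed):
        return False
    ratio = SequenceMatcher(
        None, _norm_person(officer["name"]), _norm_person(listed["name"])
    ).ratio()
    return ratio >= threshold
```

A missing DOB on either side does not block a match here; it simply means the name has to carry the decision, so those hits deserve a closer look.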

Step 5 — Score risk. Compute a per-supplier risk score: base score from entity status (dissolved/in-liquidation = high), director-overlap with known-bad entities, PSC concealment (corporate PSCs in opaque jurisdictions), and any enforcement hit. Anything above a threshold goes to enhanced due diligence (EDD).
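As a rough illustration of the scoring step, here is an additive score over the four signals named above. The weights, the threshold of 50, the status strings, and the input keys are all placeholder assumptions; a real program would calibrate them against its own risk appetite.

```python
from typing import Any, Dict

# All values below are illustrative assumptions, not recommended settings.
HIGH_RISK_STATUSES = {"dissolved", "liquidation"}
OPAQUE_JURISDICTIONS = {"KY", "VG", "JE"}  # Cayman, BVI, Jersey
WEIGHTS = {"status": 40, "director_overlap": 25, "opaque_psc": 20, "enforcement": 30}


def risk_score(supplier: Dict[str, Any]) -> int:
    """Sum the weights of every signal present, capped at 100."""
    score = 0
    if (supplier.get("status") or "").lower() in HIGH_RISK_STATUSES:
        score += WEIGHTS["status"]
    if supplier.get("bad_director_overlap"):
        score += WEIGHTS["director_overlap"]
    psc_jurisdictions = supplier.get("corporate_psc_jurisdictions", [])
    if any(j in OPAQUE_JURISDICTIONS for j in psc_jurisdictions):
        score += WEIGHTS["opaque_psc"]
    if supplier.get("enforcement_hits", 0) > 0:
        score += WEIGHTS["enforcement"]
    return min(score, 100)


def needs_edd(supplier: Dict[str, Any], threshold: int = 50) -> bool:
    """Route to enhanced due diligence when the score reaches the threshold."""
    return risk_score(supplier) >= threshold
```

An additive score is easy to explain to internal audit, which is often worth more than a marginally better model.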

Step 6 — Feed procurement tooling. Push the enriched dataset into your procurement system (Coupa, Ariba, Ivalua) as supplier attributes, and set up a weekly delta job: re-run steps 2–4 only for suppliers whose Companies House last_updated timestamp has changed, plus any new suppliers added in the past 7 days.
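The weekly delta selection could be sketched like this, assuming you store the `last_updated` value seen on the previous run and record when each supplier was added. The dictionary keys and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Dict, Iterable, List


def select_for_refresh(
    suppliers: Iterable[Dict[str, Any]],
    previous_last_updated: Dict[str, str],
    today: date,
    new_window_days: int = 7,
) -> List[str]:
    """Return CRNs whose last_updated changed, or that were added recently."""
    cutoff = today - timedelta(days=new_window_days)
    due: List[str] = []
    for s in suppliers:
        # A CRN never seen before has no stored timestamp, so it counts as changed.
        changed = previous_last_updated.get(s["crn"]) != s["last_updated"]
        is_new = s["added_on"] >= cutoff
        if changed or is_new:
            due.append(s["crn"])
    return due
```

Only the CRNs this returns need to go back through steps 2–4, which is what keeps the weekly job cheap.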

End to end, this is a 1–2 day build, versus 4–6 weeks of manual analyst time. Marginal cost per supplier is well under $0.10 including all enrichment and screening calls.

5. Use Cases at a Glance

  • KYC onboarding for fintechs and banks: verify corporate customers at account opening, populate AML/CDD records, log evidence for regulator audit.
  • KYB vendor due diligence: procurement and TPRM (third-party risk management) teams baseline every new vendor before contracting.
  • Sales operations account enrichment: append registration data to inbound leads for routing, scoring, and outbound personalization.
  • M&A target screening: corp dev teams filter on jurisdiction, SIC code, age, and director patterns to build long lists.
  • Beneficial-owner mapping: investigators trace ownership chains across PSC, UEN, CIN, and offshore jurisdictions.
  • Sanctions exposure screening: chain UBO data with OFAC, HMT, EU, UN watchlists to surface indirect sanctions risk.
  • OSINT and investigative journalism: map shell-company networks, expose hidden directorships, surface political exposure (PEP-adjacent).
  • Trade credit underwriting: pull filed accounts, charges, and director histories to predict default risk.
  • Regulatory horizon scanning: monitor FCA, ASIC, MAS, SFC, and SEBI enforcement actions tied to counterparty entities.
  • Director-overlap analytics: surface directors who sit on the boards of competitors, suppliers, or distressed entities.

6. Run It Yourself — Business Registration Lookup

The fastest way to feel the difference between portal-by-portal lookups and a normalized API is to actually run one. Start with the highest-volume use case: enriching a list of US-incorporated companies with their state filing records and registered-agent data.

Run the Business Registration Lookup on Apify →

Paste a CSV of entity names or filing numbers, pick output format (JSON / CSV / Excel), and run. Results stream into your Apify dataset and can be pulled via API, webhook, n8n, Zapier, or downloaded as a single file. Pricing is pay-per-result with no monthly minimum, so you can validate the workflow on 50 companies before committing to 50,000.
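If you would rather script the run than use the console, a small validation batch can go through Apify's synchronous `run-sync-get-dataset-items` REST endpoint. The sketch below uses only the standard library; the actor ID and the shape of `run_input` are placeholders, so check the actor's input schema before relying on any field name.

```python
import json
import urllib.request
from typing import Any, Dict, List

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str,
                      run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST that runs an actor and returns its dataset items as JSON."""
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items?format=json"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: Dict[str, Any],
              timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Send the request and parse the response body (performs a network call)."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The synchronous endpoint suits a 50-company trial. For larger batches, starting the run asynchronously and collecting results by webhook is the safer pattern, since long runs can exceed the synchronous time limit.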

For Delaware-domiciled holdcos, chain it with the Delaware Corporations Search actor — the schemas are designed to join on entity_name + state with no munging.
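A join on entity_name + state can be as plain as a dictionary lookup. In this sketch only `entity_name` and `state` come from the text; the other keys and the function name are made up for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def join_on_name_state(sos_rows: List[Row], delaware_rows: List[Row]) -> List[Row]:
    """Left-join state filing rows to Delaware rows on (entity_name, state)."""
    index: Dict[Tuple[str, str], Row] = {
        (r["entity_name"], r["state"]): r for r in delaware_rows
    }
    joined: List[Row] = []
    for row in sos_rows:
        match: Optional[Row] = index.get((row["entity_name"], row["state"]))
        joined.append({**row, "delaware": match})  # None when nothing matches
    return joined
```

Rows with no Delaware counterpart keep `delaware` set to `None`, so unmatched entities stay visible instead of silently dropping out.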

7. Related Actors for Cross-Jurisdiction and Risk Enrichment

Real KYB/KYC programs span jurisdictions and data domains. These actors slot into the same pipeline:

8. FAQ

Are company registries public data?

Most national corporate registries are statutorily public — the UK Companies House, French Pappers (over INPI/RCS), Singapore ACRA, India MCA, Australia ASIC, and US state Secretary-of-State systems all publish entity and officer data because incorporation law requires it. Some registries charge fees for individual document downloads (HK Companies Registry charges per filing; Cayman and BVI are largely paywalled), but the existence and basic data of a registered company is public almost everywhere. Scraping is legally distinct from the data being public — always review each registry's terms of service and applicable jurisdictional law before automating at scale.

What's UBO and how is it captured?

UBO stands for Ultimate Beneficial Owner: the natural person who ultimately owns or controls a legal entity, typically defined as holding more than 25% of ownership, voting rights, or control. The UK captures this through the PSC (People with Significant Control) register, mandated since 2016. Singapore captures it via the ACRA Register of Registrable Controllers, France via the RBE (Registre des Bénéficiaires Effectifs), and Hong Kong via the SCR (Significant Controllers Register). Coverage and accessibility vary widely; the UK PSC register is the most open and machine-readable, which is why it's the starting point for most cross-jurisdiction UBO investigations.

Can I bulk-screen 10,000 suppliers?

Yes. The actors are designed for bulk inputs. For 10,000 UK CRNs, batch into runs of 500–1,000 and stream results via webhook into your data warehouse. End-to-end runtime for a 10K CRN officers-and-PSC pull is typically a few hours, with marginal cost in the low tens of dollars depending on the actor. For ongoing monitoring, set up a weekly delta job keyed on last_updated.

Do you cover the Cayman Islands or BVI?

Not yet for the offshore jurisdictions (Cayman, BVI, Jersey, Guernsey, Bermuda) — these registries are paywalled per-document and largely don't expose bulk lookup. The pragmatic workflow for offshore exposure: identify the offshore entity via PSC chains in covered jurisdictions (UK PSC will name a Cayman holding company as the corporate PSC), then escalate to a paid lookup service (Companies House International, OpenCorporates Premium, or a regulated EDD provider) for the offshore leg. Coverage of offshore registries is on the roadmap.

How fresh is the data?

Each actor pulls live from the source registry at run time, so freshness equals registry freshness. UK Companies House updates within minutes of a filing. Singapore ACRA and Australia ASIC update within hours to a day. India MCA can lag 24–72 hours. France Pappers refreshes daily from INPI. The actors don't cache stale data — each run is a fresh query against the source.

Can I export to my CRM or KYC platform?

Yes. Apify exposes datasets via REST API, webhooks, and direct integrations with n8n, Zapier, Make, and dozens of warehouse connectors. Common patterns: webhook to a Lambda/Cloud Function that upserts into Salesforce, HubSpot, or your KYC platform (Onfido, Sumsub, ComplyAdvantage); scheduled run with output to S3 or Google Cloud Storage for warehouse ingestion (Snowflake, BigQuery, Databricks); direct CSV download for ad-hoc analyst use.
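Whatever the destination, the core of the upsert pattern is a stable key per entity. The sketch below uses an in-process SQLite table as a stand-in for a CRM or warehouse, keyed on jurisdiction + identifier; the table layout and function names are assumptions for illustration.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable


def init_store(conn: sqlite3.Connection) -> None:
    """Create the stand-in table, keyed on (jurisdiction, identifier)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entities (
               jurisdiction TEXT NOT NULL,
               identifier   TEXT NOT NULL,
               entity_name  TEXT,
               status       TEXT,
               payload      TEXT,
               PRIMARY KEY (jurisdiction, identifier)
           )"""
    )


def upsert_items(conn: sqlite3.Connection, items: Iterable[Dict[str, Any]]) -> int:
    """Insert new entities and overwrite existing ones; returns rows written."""
    rows = [
        (i["jurisdiction"], i["identifier"], i.get("entity_name"),
         i.get("status"), json.dumps(i))
        for i in items
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO entities VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```

A webhook handler would fetch the finished run's dataset items and pass them to `upsert_items`; re-delivered webhooks are then harmless because the same key simply overwrites itself.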

How do you handle name fuzzy-matching across jurisdictions?

Entity names are messy — "Acme Holdings Limited" in the UK might be "Acme Holdings (HK) Ltd" in Hong Kong and "Acme Holdings Pte Ltd" in Singapore. The actors return canonical names from each registry; cross-jurisdiction resolution is the caller's responsibility. We recommend a two-stage match: (1) exact match on registration number where available (LEI, CRN, UEN, CIN), (2) fuzzy match on normalized name (lowercase, strip suffixes, strip punctuation) + address + director-overlap as a tiebreaker.
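The two-stage match could be sketched as below. The suffix list is deliberately short, the 0.92 threshold is a guess, and dropping bracketed qualifiers such as "(HK)" is an extra assumption on top of the normalization steps described above. The address and director-overlap tiebreaker is left out for brevity.

```python
import re
from difflib import SequenceMatcher
from typing import Any, Dict, Optional, Tuple

# Illustrative, incomplete list of legal-form suffixes.
SUFFIXES = {"limited", "ltd", "pte", "llc", "inc", "plc", "corp", "gmbh", "sarl", "sas"}


def normalize_name(name: str) -> str:
    """Lowercase, drop bracketed qualifiers, punctuation, and legal suffixes."""
    name = re.sub(r"\([^)]*\)", " ", name.lower())
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(t for t in name.split() if t not in SUFFIXES)


def match_entities(a: Dict[str, Any], b: Dict[str, Any],
                   threshold: float = 0.92) -> Optional[Tuple[str, float]]:
    """Stage 1: shared identifier scheme with equal value. Stage 2: fuzzy name."""
    ids_a, ids_b = a.get("ids", {}), b.get("ids", {})
    for scheme in set(ids_a) & set(ids_b):
        if ids_a[scheme] == ids_b[scheme]:
            return ("id", 1.0)
    ratio = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return ("name", ratio) if ratio >= threshold else None
```

A name-only hit should be treated as a candidate, not a confirmed link, until address or director overlap backs it up.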

Is this compliant with GDPR?

Officer and PSC data published by national registries is public by statute, and processing it for KYC, KYB, AML, sanctions screening, and due diligence can generally rely on legitimate interest or legal obligation as the lawful basis under GDPR Article 6. You remain the controller for onward processing: document your lawful basis, honor data-subject rights, and consult your DPO before launch.

Published as part of the NexGenData public registry data tools series. Explore the full Public Registry Data Tools category for jurisdiction-specific deep dives, or browse related coverage on sanctions data tools, ASIC enforcement tracking, and court records due diligence.
