
getregdata


Building a KYB Pipeline for Polish Companies: The Technical Reality


Most teams building KYB for Polish entities start with the same mental model: query the KRS REST API for company details, hit CRBR for beneficial owners, run a sanctions check, done. That model is incomplete - and the gaps are not edge cases. They are structural. The missing pieces cover insolvency risk, financial health, and regulatory standing. Covering them means working across six separate government systems, each with its own session model, data format, and failure modes.


The KRS API Is Not What It Looks Like

The KRS REST API (api.rejestry.ms.gov.pl) is officially documented, free, and returns clean JSON. For most company fields - legal form, registered address, NIP, REGON, share capital - it works well. The problem surfaces when you need board members.

The reprezentacja field, which should contain director names, returns this:

```json
{
  "reprezentacja": [
    { "nazwisko": "L******", "imie": "A***" }
  ]
}
```

The names are censored in the API response. The actual names exist in PDF documents inside ZIP archives, downloadable from the KRS portal. The download endpoint, however, is not protected by ordinary authentication. The portal generates a one-time token using a steganographic technique: a specific pixel coordinate in an HTML canvas element encodes part of the session state. That value is combined with AES-encrypted request parameters to sign the download. We reverse-engineered this flow to extract board member names as plain JSON.

This is not an accident or a bug. The portal was designed as the access method; the REST API was never intended to expose full personal data. If your KYB pipeline reads board members from the JSON API and does not notice the censored values, you are silently missing a core compliance signal.
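A cheap guard is to check for masked values before trusting the JSON. The sketch below assumes masked names look like the example above, one visible character followed only by asterisks; the helper names are hypothetical, and the exact masking pattern should be confirmed against real API responses.

```python
import re
from typing import Optional

# Assumption: a masked value is one visible character followed only by
# asterisks, as in "L******" or "A***". Verify against real responses.
MASKED_NAME = re.compile(r"^\S\*+$")


def is_masked(value: Optional[str]) -> bool:
    """Return True if a name field looks censored rather than real."""
    if not value:
        return False
    return MASKED_NAME.match(value.strip()) is not None


def censored_members(reprezentacja: list) -> list:
    """Return board-member entries whose surname or first name is masked."""
    return [
        member
        for member in reprezentacja
        if is_masked(member.get("nazwisko")) or is_masked(member.get("imie"))
    ]


payload = {"reprezentacja": [{"nazwisko": "L******", "imie": "A***"}]}
flagged = censored_members(payload["reprezentacja"])
if flagged:
    # Do not store these as names; route the company to PDF extraction instead.
    print(f"{len(flagged)} board member(s) censored; PDF extraction needed")
```

Failing loudly here turns a silent data-quality gap into an explicit work item for the PDF path.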


CRBR: Strong for UBOs, Blind to Shell Chains

The Central Register of Beneficial Owners (CRBR, crbr.podatki.gov.pl) is arguably the strongest single source in the Polish KYB stack. It returns registered UBOs, ownership percentages, and the nature of control (direct, indirect, other). No proxy is needed. The data is structured and reliable.

The limitation is architectural: CRBR records the beneficial owners the company itself declared. It does not trace ownership chains. A Polish operating company owned by a Luxembourg holding, which is in turn owned by a Cayman trust, will show either the Luxembourg entity or a named individual, depending on how the chain was disclosed. Verifying the chain requires cross-referencing foreign registries. CRBR is a necessary check, not a sufficient one.
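Put differently, CRBR gives you one declared layer, and anything above it has to be resolved elsewhere. A minimal sketch of that walk, assuming a hypothetical `lookup` callable that returns the owners of a non-person entity from some other registry (or `None` when nothing was found), and a hypothetical `Owner` record:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Owner:
    name: str
    is_person: bool
    country: str  # ISO 3166-1 alpha-2, e.g. "PL", "LU", "KY"


def unresolved_links(
    declared: List[Owner],
    lookup: Callable[[Owner], Optional[List[Owner]]],
    max_depth: int = 10,
) -> List[Owner]:
    """Walk upward from the CRBR-declared owners.

    Returns every non-person owner whose own owners could not be found
    (or that sits beyond max_depth), i.e. where a foreign check is still needed.
    """
    unresolved = []
    seen = set()
    stack = [(owner, 0) for owner in declared]
    while stack:
        owner, depth = stack.pop()
        if owner.is_person or owner in seen:
            continue  # natural persons end the chain; seen guards against cycles
        seen.add(owner)
        parents = lookup(owner) if depth < max_depth else None
        if not parents:
            unresolved.append(owner)
            continue
        stack.extend((parent, depth + 1) for parent in parents)
    return unresolved


# Hypothetical chain: CRBR discloses only the Luxembourg holding.
lux = Owner("Example Holding S.a r.l.", is_person=False, country="LU")
trust = Owner("Example Trust", is_person=False, country="KY")
foreign_registry = {lux: [trust]}  # the Cayman trust's owners are unknown
gaps = unresolved_links([lux], foreign_registry.get)
print([g.name for g in gaps])  # ['Example Trust']
```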


What Most Pipelines Miss: KRZ, MSiG, eKRS

KRZ - National Debtor Registry

KRZ (krz.ms.gov.pl) is a separate government system from KRS. It contains court-filed insolvency proceedings, enforcement actions, and restructuring cases. This is the live risk signal - if a company has an active creditor enforcement or is in the middle of restructuring, it will appear here before any other source picks it up.

KRZ has its own login, its own data model, and is frequently overlooked in KYB implementations that treat KRS as the canonical company registry. It is not. KRS tells you the company exists; KRZ tells you whether it is actively distressed.
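The distinction can be made explicit in code: existence comes from KRS, distress from whether any KRZ proceeding is open on the check date. The `Proceeding` record below is a hypothetical shape, not KRZ's actual data model; mapping real KRZ fields onto it is an assumption left to the integration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Proceeding:
    kind: str                      # e.g. "bankruptcy", "restructuring", "enforcement"
    opened: date
    closed: Optional[date] = None  # None while the proceeding is still running


def open_proceedings(proceedings: Iterable[Proceeding], as_of: date) -> List[Proceeding]:
    """Return the proceedings that were open on the as_of date."""
    return [
        p for p in proceedings
        if p.opened <= as_of and (p.closed is None or p.closed > as_of)
    ]


history = [
    Proceeding("enforcement", date(2022, 3, 1), closed=date(2022, 9, 30)),
    Proceeding("restructuring", date(2024, 1, 15)),
]
active = open_proceedings(history, as_of=date(2024, 6, 1))
print([p.kind for p in active])  # ['restructuring']
```

Taking `as_of` as a parameter keeps the check reproducible, which matters when a KYB decision has to be re-explained later.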

MSiG - Court and Economic Gazette

MSiG (ems.ms.gov.pl) publishes formal insolvency announcements as well as notices of company transformations, mergers, demergers, and name changes. This is the historical record: proceedings that closed before KRZ went live, or structural changes that affect entity continuity. If you are assessing a company with more than a few years of history, MSiG is relevant.
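Because MSiG is searched by name, entity continuity in practice means collecting every name a company has used. A minimal sketch, assuming name-change and merger notices have already been parsed into hypothetical `(old_name, new_name)` pairs:

```python
from typing import Dict, Iterable, List, Tuple


def name_history(current_name: str, changes: Iterable[Tuple[str, str]]) -> List[str]:
    """Return current_name followed by every earlier name that led to it.

    A merger shows up as several old names pointing at one new name,
    so each name can have more than one predecessor.
    """
    predecessors: Dict[str, List[str]] = {}
    for old, new in changes:
        predecessors.setdefault(new, []).append(old)

    names: List[str] = []
    seen = set()
    stack = [current_name]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guards against a name being reused later in the chain
        seen.add(name)
        names.append(name)
        stack.extend(predecessors.get(name, []))
    return names


notices = [
    ("Alfa Sp. z o.o.", "Beta Sp. z o.o."),    # rename
    ("Beta Sp. z o.o.", "Gamma Sp. z o.o."),   # rename
    ("Delta Sp. z o.o.", "Gamma Sp. z o.o."),  # merged into Gamma
]
print(name_history("Gamma Sp. z o.o.", notices))
```

Each returned name becomes a separate MSiG query, and the results are attached to the same canonical entity.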

eKRS - Annual Financial Filings

eKRS (ekrs.ms.gov.pl) hosts annual financial statements submitted by Polish companies. For credit-oriented KYB - assessing payment risk, financial health trends, debt-to-equity ratios - this is the primary source. The filings are in XBRL or PDF format depending on company type and filing year. Parsing the structured versions requires schema mapping; the PDFs require extraction pipelines.
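Once the figures are extracted, the ratios themselves are simple arithmetic. The sketch below uses total liabilities divided by equity, which is one common definition of debt-to-equity (some analysts use only interest-bearing debt); the input shape is hypothetical and assumes both parsing paths have already produced the same normalized numbers.

```python
from typing import Dict, List, Optional, Tuple


def debt_to_equity(liabilities: float, equity: float) -> Optional[float]:
    """Total liabilities / equity; None when equity is zero or negative,
    since the ratio is not meaningful there and should be flagged separately."""
    if equity <= 0:
        return None
    return liabilities / equity


def leverage_trend(years: Dict[int, Tuple[float, float]]) -> List[Tuple[int, Optional[float]]]:
    """Map {year: (total_liabilities, equity)} to [(year, ratio)] in year order."""
    return [(year, debt_to_equity(*years[year])) for year in sorted(years)]


filings = {  # hypothetical extracted figures, in PLN
    2023: (1_500_000, -50_000),
    2021: (400_000, 800_000),
    2022: (900_000, 600_000),
}
print(leverage_trend(filings))  # [(2021, 0.5), (2022, 1.5), (2023, None)]
```

Returning `None` instead of a negative ratio keeps "negative equity" from being misread as "low leverage".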


KNF: Only If Your Counterparty Is a Financial Entity

If you are onboarding a payment institution, investment firm, lending company, or any regulated financial entity, the KNF registry (knf.gov.pl) is mandatory. It covers 75,000+ licensed entities and their authorization status. An entity that presents itself as a licensed payment institution but does not appear in KNF is a red flag that no other source will catch.
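Once the registry data is available locally, the check itself is a set lookup. This sketch assumes a hypothetical `knf_register` mapping from NIP to the license categories found for that entity, built from whatever KNF data you ingest; the category strings and NIP values are placeholders.

```python
from typing import Dict, Optional, Set


def knf_red_flag(
    claimed_license: Optional[str],
    nip: str,
    knf_register: Dict[str, Set[str]],
) -> bool:
    """True when a counterparty claims a license that KNF data does not confirm."""
    if claimed_license is None:
        return False  # not presenting itself as regulated; nothing to confirm
    return claimed_license not in knf_register.get(nip, set())


register = {"1234563218": {"payment_institution"}}  # placeholder data
print(knf_red_flag("payment_institution", "1234563218", register))  # False
print(knf_red_flag("payment_institution", "1111111111", register))  # True
```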


The Data Source Landscape

| Source | What It Proves for KYB | Data Quality | Key Gotcha |
|---|---|---|---|
| KRS REST API | Legal identity, incorporation, share capital | High for metadata | Board member names censored - requires PDF extraction |
| CRBR | Registered UBO, ownership %, control type | High for disclosed structure | Does not trace shell chains; reflects declared, not verified, ownership |
| KRZ | Active insolvency, enforcement, restructuring | High for current proceedings | Separate system from KRS; commonly omitted from pipelines |
| MSiG | Historical insolvency, mergers, transformations | High for historical record | Required for companies with pre-KRZ history |
| eKRS | Annual financials, revenue, debt structure | Medium - depends on filing type | XBRL vs PDF formats require separate parsers |
| KNF | Regulatory license status | High | Only relevant for regulated financial entities |

KYB Signal Coverage by Source

| Compliance Requirement | KRS | CRBR | KRZ | MSiG | eKRS | KNF |
|---|---|---|---|---|---|---|
| Legal entity verification | Yes | - | - | - | - | - |
| Board member identity | Yes (PDF) | - | - | - | - | - |
| UBO / beneficial ownership | Partial | Yes | - | - | - | - |
| Active insolvency risk | - | - | Yes | Partial | - | - |
| Historical insolvency | - | - | Partial | Yes | - | - |
| Financial health | - | - | - | - | Yes | - |
| Regulatory license status | - | - | - | - | - | Yes |
| Entity name history | - | - | - | Yes | - | - |

The Integration Problem Is Entity Resolution

Each of these systems uses a different primary key. KRS uses its own KRS number. NIP (tax ID) and REGON (statistical ID) appear in KRS but are the primary keys in other systems. KRZ uses NIP. CRBR accepts NIP or KRS number. MSiG uses company name - which is the least stable identifier since names change on merger and rebranding.

In practice, this means your pipeline needs an entity resolution layer. You cannot JOIN these sources directly. The same company appears as "ACME Sp. z o.o." in MSiG, as a 10-digit NIP in KRZ, and as a 10-digit KRS number in CRBR. A name normalization step plus NIP as the canonical key is the practical approach, but you still need to handle historical name variants and post-merger continuity.
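A minimal normalization layer along those lines might look like this. The NIP check uses the standard weighted checksum (weights 6, 5, 7, 2, 3, 4, 5, 6, 7, modulo 11, compared with the tenth digit); the legal-form list is illustrative rather than exhaustive, and the function names are hypothetical.

```python
import re
import unicodedata
from typing import Optional

NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)

# Illustrative legal-form suffixes, expressed as tokens after normalization.
LEGAL_FORMS = [
    ("sp", "z", "o", "o"),
    ("spółka", "z", "ograniczoną", "odpowiedzialnością"),
    ("s", "a"),
    ("spółka", "akcyjna"),
]


def normalize_nip(raw: str) -> Optional[str]:
    """Strip separators and a 'PL' prefix; return 10 digits, or None if invalid."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None
    checksum = sum(int(d) * w for d, w in zip(digits, NIP_WEIGHTS)) % 11
    return digits if checksum == int(digits[9]) else None


def normalize_krs(raw: str) -> Optional[str]:
    """KRS numbers are 10 digits; restore leading zeros dropped upstream."""
    digits = raw.strip()
    if digits.isascii() and digits.isdigit() and len(digits) <= 10:
        return digits.zfill(10)
    return None


def normalize_name(name: str) -> str:
    """Casefold, drop punctuation and one trailing legal form."""
    text = unicodedata.normalize("NFC", name).casefold()
    tokens = re.sub(r"\W+", " ", text).split()
    for form in LEGAL_FORMS:
        n = len(form)
        if len(tokens) > n and tuple(tokens[-n:]) == form:
            tokens = tokens[:-n]
            break
    return " ".join(tokens)


print(normalize_nip("PL 123-456-32-18"))  # 1234563218
print(normalize_krs("123456"))            # 0000123456
print(normalize_name("ACME Sp. z o.o."))  # acme
```

The normalized name is only a matching aid; the validated NIP remains the join key, with name matches treated as candidates for review.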

Session management adds another layer of complexity. Some endpoints issue time-limited tokens that expire mid-batch. Some use CAPTCHA challenges on the first request of a session. KRS PDF downloads require the canvas token replay described above. Each source needs its own session handler, and failures in one do not propagate cleanly to others - you need source-level retry logic, not pipeline-level.
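One way to keep retries at the source level is a small wrapper per source plus a runner that isolates failures. This is a sketch under assumptions: `SessionExpired` is a hypothetical stand-in for however a given client signals an expired token, the backoff constants are arbitrary, and CAPTCHA handling is out of scope.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple


class SessionExpired(Exception):
    """Hypothetical: raised by a source client when its token is no longer valid."""


def fetch_with_retry(
    fetch: Callable[[str], Any],
    refresh_session: Callable[[], None],
    key: str,
    *,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    transient: Tuple[type, ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry a single source: new session on expiry, backoff on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(key)
        except SessionExpired:
            if attempt == max_attempts:
                raise
            refresh_session()  # token expired mid-batch: open a new session
        except transient:
            if attempt == max_attempts:
                raise
        # Exponential backoff with jitter: base_delay * 2^(attempt-1) * 0.5-1.0.
        sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")


def run_sources(sources: Dict[str, Callable[[str], Any]], key: str):
    """Call each source independently so one failure never aborts the others."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    for name, call in sources.items():
        try:
            results[name] = call(key)
        except Exception as exc:  # recorded per source; the batch continues
            errors[name] = exc
    return results, errors


# Usage with two hypothetical clients.
state = {"calls": 0, "refreshes": 0}


def flaky_krz(key):
    state["calls"] += 1
    if state["calls"] == 1:
        raise SessionExpired()  # first call hits an expired token
    return {"nip": key, "proceedings": []}


def refresh_krz():
    state["refreshes"] += 1


def broken_msig(key):
    raise ConnectionError("portal unavailable")


results, errors = run_sources(
    {
        "KRZ": lambda k: fetch_with_retry(flaky_krz, refresh_krz, k, sleep=lambda s: None),
        "MSiG": broken_msig,
    },
    "1234563218",
)
print(results)  # {'KRZ': {'nip': '1234563218', 'proceedings': []}}
print(errors)   # {'MSiG': ConnectionError('portal unavailable')}
```

Injecting `sleep` keeps the wrapper testable without real delays, and returning per-source errors lets the caller decide whether a partial result is acceptable.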


What Complete Looks Like

A production KYB pipeline for Polish entities needs all six sources. KRS for legal identity and board members (with PDF extraction). CRBR for disclosed UBOs. KRZ for active insolvency. MSiG for historical proceedings. eKRS for financial health. KNF if the entity is in a regulated sector.

Omitting any one of these is not a configuration choice - it is a coverage gap that will surface as a false negative on a distressed or non-compliant entity.
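The coverage table earlier in the article can be turned into a pre-flight check that reports which requirements a given source configuration leaves uncovered. This is a sketch: the matrix is transcribed from that table (treating "Yes" as full and "Partial" as partial), and the function name is hypothetical.

```python
from typing import Dict, Set

# Transcribed from the "KYB Signal Coverage by Source" table.
COVERAGE: Dict[str, Dict[str, str]] = {
    "Legal entity verification": {"KRS": "full"},
    "Board member identity": {"KRS": "full"},  # requires PDF extraction
    "UBO / beneficial ownership": {"KRS": "partial", "CRBR": "full"},
    "Active insolvency risk": {"KRZ": "full", "MSiG": "partial"},
    "Historical insolvency": {"KRZ": "partial", "MSiG": "full"},
    "Financial health": {"eKRS": "full"},
    "Regulatory license status": {"KNF": "full"},
    "Entity name history": {"MSiG": "full"},
}


def coverage_gaps(enabled: Set[str], regulated: bool = False) -> Dict[str, str]:
    """Return {requirement: 'missing' | 'partial'} for the enabled source set."""
    gaps = {}
    for requirement, sources in COVERAGE.items():
        if requirement == "Regulatory license status" and not regulated:
            continue  # KNF only matters for regulated financial entities
        levels = {sources[s] for s in enabled if s in sources}
        if "full" not in levels:
            gaps[requirement] = "partial" if levels else "missing"
    return gaps


print(coverage_gaps({"KRS", "CRBR", "KRZ"}))
# {'Historical insolvency': 'partial', 'Financial health': 'missing', 'Entity name history': 'missing'}
```

Running this at pipeline startup makes a missing source a visible configuration error rather than a silent false negative.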


Tools and Resources

If you are building this pipeline, the following Apify actors handle the extraction and session complexity for three of the most technically challenging sources:

Each returns structured JSON and handles the session management, token replay, and proxy requirements for its source. They are pay-per-use with no subscriptions.
