<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paweł Sobkowiak</title>
    <description>The latest articles on DEV Community by Paweł Sobkowiak (@pawe_sobkow).</description>
    <link>https://dev.to/pawe_sobkow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3953226%2Facbc6a06-7825-4b23-952a-70383fb8fb5d.jpg</url>
      <title>DEV Community: Paweł Sobkowiak</title>
      <link>https://dev.to/pawe_sobkow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pawe_sobkow"/>
    <language>en</language>
    <item>
      <title>I built a search engine for 3 million Polish businesses — here's what I learned</title>
      <dc:creator>Paweł Sobkowiak</dc:creator>
      <pubDate>Tue, 26 May 2026 20:50:22 +0000</pubDate>
      <link>https://dev.to/pawe_sobkow/i-built-a-search-engine-for-3-million-polish-businesses-heres-what-i-learned-54pn</link>
      <guid>https://dev.to/pawe_sobkow/i-built-a-search-engine-for-3-million-polish-businesses-heres-what-i-learned-54pn</guid>
      <description>&lt;p&gt;Poland has over 3 million registered businesses spread across two separate public registries — KRS (corporations) and CEIDG (sole proprietorships). Finding reliable data about a Polish company used to mean navigating slow government portals, dealing with inconsistent data formats, and manually cross-referencing multiple sources.&lt;br&gt;
So I built nipgo.pl to fix that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;br&gt;
If you're a B2B salesperson, accountant, or procurement manager in Poland, verifying a contractor means:&lt;/p&gt;

&lt;p&gt;Going to the KRS portal — slow, no API-friendly interface&lt;br&gt;
Checking CEIDG separately — different format, different search&lt;br&gt;
Cross-referencing VAT status on the Ministry of Finance whitelist&lt;br&gt;
Manually checking if the company has any public procurement history&lt;/p&gt;

&lt;p&gt;This is painful. Especially when you need to do it for 50 companies a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What nipgo.pl does&lt;/strong&gt;&lt;br&gt;
nipgo.pl aggregates all of this into one search:&lt;/p&gt;

&lt;p&gt;700k+ KRS entities (corporations, partnerships, foundations)&lt;br&gt;
2.6M+ CEIDG entities (sole proprietorships)&lt;br&gt;
VAT status from the Ministry of Finance&lt;br&gt;
Public procurement history (BZP tenders since 2021)&lt;br&gt;
Public subsidies and grants (SUDOP registry)&lt;br&gt;
Contact data scraped from public sources&lt;br&gt;
AI-generated company summaries&lt;/p&gt;

&lt;p&gt;Search by company name, NIP (tax ID), REGON, phone number, email, domain, or owner name. Filter by industry (PKD code), region, legal form, registration date, or capital amount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The data challenge&lt;/strong&gt;&lt;br&gt;
The hardest part wasn't building the UI — it was the data.&lt;br&gt;
The KRS API returns masked (asterisked) names for natural persons (for GDPR compliance, since 2023). Getting full names requires authenticated scraping of PDF registry documents, and the layout of each one depends on when the company was registered.&lt;br&gt;
CEIDG has ~2.6M records spread across ~50,000 paginated API pages. A full crawl takes weeks and requires careful rate-limit management across multiple API tokens.&lt;br&gt;
PKD codes (the Polish industry classification) come in two formats: companies registered before 2015 use a nested array format, and newer ones use flat objects. Handling both without crashes took more debugging than I'd like to admit.&lt;br&gt;
The VAT whitelist is protected by an Imperva WAF that limits requests to ~1,400/day from a single IP. Batch endpoints return zero results in practice, so only individual lookups work.&lt;/p&gt;
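&lt;p&gt;To make the PKD point concrete, here is a minimal sketch of a normalizer that accepts either shape and returns a plain list of codes. The key name "kod" and both input shapes are assumptions for illustration, not the registry's documented schema, and this is not the code nipgo.pl runs.&lt;/p&gt;

```python
from typing import Any, Iterator


def iter_pkd_codes(raw: Any) -> Iterator[str]:
    """Yield PKD code strings from a flat object or from nested arrays.

    Assumption: each code lives under a hypothetical "kod" key.
    """
    if isinstance(raw, dict):
        # Flat object shape: one code per dict.
        code = raw.get("kod")
        if isinstance(code, str) and code.strip():
            yield code.strip()
    elif isinstance(raw, (list, tuple)):
        # Nested array shape: recurse into every element.
        for item in raw:
            yield from iter_pkd_codes(item)
    # Anything else (None, numbers, bare strings) is skipped instead of raising.


def normalize_pkd(raw: Any) -> list[str]:
    """Return de-duplicated PKD codes in first-seen order."""
    return list(dict.fromkeys(iter_pkd_codes(raw)))
```

&lt;p&gt;Calling normalize_pkd on a nested array of objects and on a single flat object gives the same kind of result, a list of strings, so the rest of the pipeline only has to handle one format. Unknown shapes produce an empty list, which is easier to monitor than a crash in the middle of a crawl.&lt;/p&gt;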

&lt;p&gt;&lt;strong&gt;What I'd do differently&lt;/strong&gt;&lt;br&gt;
Start with the data pipeline, not the UI. I spent too much time on the frontend before the data was clean enough to display. A beautiful UI on top of messy data is useless.&lt;br&gt;
Build keyset pagination from day one. OFFSET-based pagination on 2.6M records causes timeout hell at high offsets, because the database still has to read and discard every skipped row. Switching to keyset (cursor-based) pagination was a painful but necessary refactor.&lt;br&gt;
Monitor everything early. Data quality issues in public registries are invisible until a user hits an edge case — a company registered in 1994 with a completely different JSON structure, a CEIDG record with a null NIP, a PKD code from a deprecated classification system.&lt;/p&gt;
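&lt;p&gt;For readers who have not used keyset pagination, here is a minimal sketch of the pattern. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the companies table and its columns are hypothetical. The query shape (filter on the last seen id, order by id, apply a limit) is the part that carries over.&lt;/p&gt;

```python
import sqlite3
from typing import Iterator


def iter_pages(conn: sqlite3.Connection, page_size: int = 100) -> Iterator[list[tuple]]:
    """Yield pages of (id, name) rows using keyset pagination."""
    last_id = 0  # cursor: highest id already returned (ids assumed positive)
    while True:
        rows = conn.execute(
            "SELECT id, name FROM companies WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the next page starts after the last row seen


# Demo on an in-memory table (hypothetical schema, seven rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (id, name) VALUES (?, ?)",
    [(i, f"Company {i}") for i in range(1, 8)],
)
pages = list(iter_pages(conn, page_size=3))
```

&lt;p&gt;Each query seeks straight to the cursor position through the primary key index, so page 10,000 should cost roughly the same as page 1. The trade-off is that you can only move to the next page from a known cursor, not jump to an arbitrary page number.&lt;/p&gt;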

&lt;p&gt;&lt;strong&gt;Current state&lt;/strong&gt;&lt;br&gt;
The platform is live at &lt;a href="https://nipgo.pl" rel="noopener noreferrer"&gt;nipgo.pl&lt;/a&gt; with a freemium model:&lt;/p&gt;

&lt;p&gt;Free — basic search and registry data&lt;br&gt;
Basic — contact data, CSV export, monitoring, CRM&lt;br&gt;
Pro — financial reports, risk scoring, full history&lt;/p&gt;

&lt;p&gt;Still a lot to build — financial statements, ownership graphs, automated change alerts. But the core data is there and it works.&lt;/p&gt;

&lt;p&gt;If you're building something similar for another country's business registry, I'm happy to share what I've learned. Drop a comment or reach out at &lt;a href="mailto:hello@nipgo.pl"&gt;hello@nipgo.pl&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Built with: Next.js, Supabase (PostgreSQL), Python scrapers, Vercel&lt;br&gt;
Data: KRS API, CEIDG API, MF VAT Whitelist, BZP, SUDOP&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>python</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
