DEV Community

Alessandro Binda

I aggregated 250M+ company records from 40+ government registries — here are the tools I built

Last year I started a project to build a unified database of company information from official government business registries.

After crawling 40+ sources across Europe and beyond, I now have 250 million company records with revenue, employees, credit scores, and financial data.

The whole thing is open source. Here's what I built and how you can use it.

The problem

If you've ever needed company data for a sales pipeline, due diligence, market research, or just curiosity — you know the options:

  • Paid APIs (Dun & Bradstreet, Clearbit, ZoomInfo): $15K-50K/year
  • Government registries: Free, but each country has its own format, its own API (or PDF-only portal), and its own quirks
  • Manual research: Copy-paste from company websites, LinkedIn, etc.

I wanted one API that covers everything.

What I built

1. enrich-companies (CLI)

The simplest tool: give it a CSV with company names, get back the same CSV with 16 extra columns.

# Node.js
npx enrich-companies companies.csv -o enriched.csv

# Python
pip install enrich-companies
enrich-companies companies.csv -o enriched.csv

Output:

  Enriching 3 companies from companies.csv...

  [1/3] Ferrero — Revenue: €17B | Employees: 41,000 | Score: 92
  [2/3] Siemens — Revenue: €72B | Employees: 311,000 | Score: 88
  [3/3] LVMH — Revenue: €86B | Employees: 213,000 | Score: 95

  Done! 3/3 companies enriched

It auto-detects the company name column (works with company, name, business_name, firma, empresa, azienda, etc.), and adds:

Column                                 Example
revenue                                12500000
employees                              340
health_score                           78.5
nace_code                              56.10
legal_form                             S.r.l.
status                                 active
vat_number                             IT12345678901
founded                                2015-03-12
city, country, website, phone, email   ...

No API key needed. Free tier: 50 lookups/month.
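
To make the auto-detection idea concrete, here is a minimal sketch of how a name column could be picked from a CSV header. It is an illustration only, not the tool's actual logic: the candidate list reuses the column names mentioned above, and `detect_name_column` is a hypothetical helper.

```python
import csv
import io

# Hypothetical candidate list, taken from the column names mentioned above.
CANDIDATES = ("company", "name", "business_name", "firma", "empresa", "azienda")

def detect_name_column(header):
    """Return the first header whose normalized form matches a candidate, else None."""
    # Normalize so that "Business Name" and "business_name" compare equal.
    normalized = {h.strip().lower().replace(" ", "_"): h for h in header}
    for candidate in CANDIDATES:
        if candidate in normalized:
            return normalized[candidate]
    return None

# Small in-memory CSV standing in for companies.csv.
sample = io.StringIO("id,Azienda,city\n1,Ferrero,Alba\n")
reader = csv.DictReader(sample)
column = detect_name_column(reader.fieldnames)
print(column)  # Azienda
```

Checking candidates in a fixed order keeps the choice predictable when a file happens to contain more than one matching column.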

npm: enrich-companies
PyPI: enrich-companies
GitHub: Alessandro114/enrich-companies
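
Once you have an enriched file, the extra columns can be filtered with the standard library alone. The sketch below assumes the `revenue`, `employees`, and `status` columns from the table above; the `company` column name, the sample rows, and the `dissolved` status value are made up for the example.

```python
import csv
import io

def filter_companies(rows, min_revenue=0, min_employees=0):
    """Yield active rows meeting both thresholds; skip rows with missing or non-numeric values."""
    for row in rows:
        try:
            revenue = float(row["revenue"])
            employees = int(row["employees"])
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed row, leave it out
        if row.get("status") == "active" and revenue >= min_revenue and employees >= min_employees:
            yield row

# In-memory stand-in for enriched.csv (hypothetical data).
sample = io.StringIO(
    "company,revenue,employees,status\n"
    "Alpha,12500000,340,active\n"
    "Beta,,12,active\n"
    "Gamma,900000,8,dissolved\n"
)
selected = list(filter_companies(csv.DictReader(sample), min_revenue=1_000_000))
print([row["company"] for row in selected])  # ['Alpha']
```

To run it on a real file, replace the `io.StringIO` sample with `open("enriched.csv", newline="", encoding="utf-8")`.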


2. MCP Server (for AI agents)

If you use Claude, ChatGPT, or any other AI agent that supports MCP, this server lets the AI search and look up company data directly.

npx scala-mcp-server

Add to your Claude Desktop config:

{
  "mcpServers": {
    "scala-score": {
      "command": "npx",
      "args": ["-y", "scala-mcp-server"]
    }
  }
}

Then ask Claude: "Find all active restaurants in Milan with more than 50 employees" — and it queries the database directly.
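
If you already have other MCP servers configured, pasting the snippet by hand risks overwriting them. A small script can merge the entry instead. This is a sketch under assumptions: `add_mcp_server` is a hypothetical helper, and the demo writes to a temporary file, so you would need to pass the real location of your Claude Desktop config yourself.

```python
import json
import tempfile
from pathlib import Path

# Same entry as in the JSON snippet above.
SERVER_ENTRY = {"command": "npx", "args": ["-y", "scala-mcp-server"]}

def add_mcp_server(config_path, name="scala-score", entry=SERVER_ENTRY):
    """Merge one server entry into the config file, preserving existing entries."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Copy the entry so later edits to the config do not touch the shared default.
    config.setdefault("mcpServers", {})[name] = dict(entry)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config

# Demo against a temporary file rather than a real config (placeholder file name).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text('{"mcpServers": {"other": {"command": "node"}}}', encoding="utf-8")
    merged = add_mcp_server(demo_path)
    print(sorted(merged["mcpServers"]))  # ['other', 'scala-score']
```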

npm: scala-mcp-server
GitHub: Alessandro114/scala-mcp-server


3. Python SDK

from scala_score import ScalaScore

score = ScalaScore("your-api-key")
results = score.search("Ferrero", country="IT")

for company in results.companies:
    print(f"{company.name} — Revenue: {company.revenue}")

PyPI: scala-score
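
Raw revenue figures are hard to scan in a terminal, so a small formatting step helps when printing results. The sketch below only relies on the `name` and `revenue` attributes used in the loop above. It assumes that `revenue` is a plain number (or `None` when unknown) and that the figure is in euros; neither is confirmed by the SDK, and both helpers are hypothetical.

```python
from types import SimpleNamespace

def format_revenue(value):
    """Render a raw revenue figure compactly, e.g. 12500000 -> '€12.5M'."""
    if value is None:
        return "n/a"
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"€{value / threshold:.1f}{suffix}"
    return f"€{value:.0f}"

def summarize(companies):
    """Return one display line per company, largest revenue first."""
    ranked = sorted(companies, key=lambda c: c.revenue or 0, reverse=True)
    return [f"{c.name}: {format_revenue(c.revenue)}" for c in ranked]

# Stand-in objects; real results would come from score.search(...).companies.
demo = [
    SimpleNamespace(name="Small Co", revenue=None),
    SimpleNamespace(name="Big Co", revenue=12_500_000),
]
for line in summarize(demo):
    print(line)
```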


4. Chrome Extension

Right-click any company name on a webpage → instant lookup with revenue, employees, credit score.

Score Company Lookup on Chrome Web Store


5. Bulk Dataset

The full dataset is available for download:


Data coverage

  • 250M+ companies across 50+ countries
  • Strong coverage: Italy, Germany, France, Spain, UK, Netherlands, Belgium, Austria, Switzerland, Czech Republic, Poland, Romania, US, and more
  • Sources: official business registries, financial filings, public records
  • Updated regularly from government open data portals

What's next

I'm actively adding more countries and improving data quality. Contributions are welcome — especially for countries where registry data is hard to access.

If you try any of these tools, I'd love to hear your feedback. And if you find them useful, a GitHub star helps more than you think.


All tools are MIT licensed. The API has a free tier (50 lookups/month) with no signup required.
