DEV Community

getregdata

Automated Lead Feed: Get New Spanish Company Incorporations Daily from BORME

Spain's official corporate gazette, BORME (Boletín Oficial del Registro Mercantil), publishes every new company incorporation in the country. Every single working day. Company name, tax ID, founding officers, registered capital, province. All of it. Public record, freely available.

There's just one problem: it is published as PDFs. Nobody wants to open 50 PDFs every morning and copy-paste names into a CSV.

Here's a Python pipeline that turns the daily gazette into a CRM-ready lead feed. Run it every morning, get yesterday's incorporations in a CSV. Zero manual work.

What you get

Every morning, a CSV with these fields for each new Spanish company:

  • Company name
  • NIF (Spanish tax ID)
  • Province
  • Founding officers (names and roles)
  • Registered capital
  • Legal form (SL, SA, etc.)
  • BORME publication date

Expect 500-1,000 new incorporations per publishing day across Spain, all of them structured and ready for your CRM.

The pipeline (3 steps)

1. Scrape the latest BORME Section A

Section A of the daily gazette contains all corporate acts: incorporations, officer appointments, capital changes, dissolutions. We filter for incorporations only.

The actor that handles this is the BORME Corporate Acts Parser. It parses the official PDFs into structured JSON - company name, NIF, officers, capital. No browser automation is needed; it works over plain HTTP.

Input: the publication date you want (the previous day's, because of the 1-day lag), acts filter set to "incorporations".
Output: JSON array with every new Spanish company published on that date.

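import csv
import os
from datetime import date, timedelta

import requests

# Read the token from the environment so it never lands in version control
APIFY_TOKEN = os.environ.get("APIFY_TOKEN", "your-apify-token")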
ACTOR_ID = "uBS46fLD6LVZwaxCc"

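# Request the previous working day's incorporations (the gazette publishes
# with a 1-day lag, and Monday's run has to reach back to Friday)
today = date.today()
days_back = {0: 3, 6: 2}.get(today.weekday(), 1)  # Mon -> Fri, Sun -> Fri
target_date = (today - timedelta(days=days_back)).isoformat()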

run = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    json={
        "date": target_date,
        "actsFilter": ["incorporations"],
    },
).json()

run_id = run["data"]["id"]
print(f"Run started: {run_id}")

2. Wait, then fetch the results

The run takes 30-90 seconds depending on how many incorporations were published. Poll for completion:

import time

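# Apify run statuses that mean the run will never succeed
FAILED_STATES = {"FAILED", "ABORTED", "TIMED-OUT"}

while True:
    status = requests.get(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    ).json()
    state = status["data"]["status"]
    if state == "SUCCEEDED":
        break
    if state in FAILED_STATES:
        # Without this check a failed run would keep the loop polling forever
        raise RuntimeError(f"Run {run_id} ended with status {state}")
    time.sleep(5)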

# Fetch structured results
results = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}/dataset/items",
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
).json()

print(f"Extracted {len(results)} new incorporations")

# Sample result:
# {
#   "companyName": "TECHSOLUTIONS MADRID SL",
#   "nif": "B12345678",
#   "province": "Madrid",
#   "officers": [
#     {"name": "GARCIA LOPEZ MARIA", "role": "Administrador unico"},
#     {"name": "RODRIGUEZ PEREZ JUAN", "role": "Apoderado"}
#   ],
#   "registeredCapital": "3000.00 EUR",
#   "legalForm": "Sociedad Limitada",
#   "actType": "Constitucion",
#   "publicationDate": "2026-06-17"
# }

3. Flatten and export to CSV

The actor's JSON output includes arrays (officers, capital details). Flatten them into CSV rows:

def flatten_incorporation(inc):
    """Convert a BORME incorporation to a flat CSV row."""
    officers = inc.get("officers", [])
    admin = next((o["name"] for o in officers if "admin" in o.get("role", "").lower()), "")
    all_officers = "; ".join(f"{o['name']} ({o['role']})" for o in officers)

    return {
        "company_name": inc.get("companyName", ""),
        "nif": inc.get("nif", ""),
        "province": inc.get("province", ""),
        "legal_form": inc.get("legalForm", ""),
        "registered_capital": inc.get("registeredCapital", ""),
        "admin_name": admin,
        "all_officers": all_officers,
        "publication_date": inc.get("publicationDate", ""),
        "act_type": inc.get("actType", ""),
    }

rows = [flatten_incorporation(inc) for inc in results]

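# Fixed column list, so the header is still written when a day has no results
# (rows[0].keys() would raise IndexError on an empty list)
FIELDNAMES = [
    "company_name", "nif", "province", "legal_form", "registered_capital",
    "admin_name", "all_officers", "publication_date", "act_type",
]

# Write CSV
with open("borme_leads.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)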
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} leads to borme_leads.csv")
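The sample result shows registeredCapital as a string such as "3000.00 EUR". If you want to sort or filter leads by capital in your CRM, a numeric column is easier to work with. Here is a minimal sketch, assuming every value follows the "<amount> <currency>" shape of the sample. parse_capital is a hypothetical helper, not part of the actor, and it returns None for anything it does not recognize instead of guessing.

```python
from decimal import Decimal, InvalidOperation


def parse_capital(raw):
    """Split a string like '3000.00 EUR' into (Decimal amount, currency).

    Assumes the '<amount> <currency>' shape shown in the sample result;
    returns None for empty or unrecognized values.
    """
    parts = (raw or "").split()
    if len(parts) != 2:
        return None
    amount_text, currency = parts
    try:
        amount = Decimal(amount_text)
    except InvalidOperation:
        return None
    if not amount.is_finite():
        # Reject special values such as 'NaN' or 'Infinity'
        return None
    return amount, currency.upper()
```

Decimal is used instead of float so that amounts survive a round trip to CSV without rounding noise.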

Run it daily

Wrap this in a cron job or a scheduled GitHub Action and you have a hands-free lead feed:

# .github/workflows/borme-daily.yml
name: BORME Daily Lead Feed
on:
  schedule:
    - cron: "0 7 * * 1-5"  # 7 AM UTC, Mon-Fri
jobs:
  fetch:
    runs-on: ubuntu-latest
    steps:
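      - uses: actions/checkout@v4
      - run: pip install requests
      - run: python borme_feed.py
        env:
          APIFY_TOKEN: ${{ secrets.APIFY_TOKEN }}  # repository secret; the script must read it from the environment
      - uses: actions/upload-artifact@v4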
        with:
          name: borme-leads
          path: borme_leads.csv

Use it as an AI agent skill

If you're using agent-based tools, the workflow is packaged as an installable skill in the getregdata repo. Install it once and trigger it with natural language:

git clone https://github.com/Nolpak14/getregdata

The regdata-lead-gen skill covers BORME daily incorporation feeds plus director extraction from KRS (Poland's National Court Register), the WKO Austrian business directory, and Spanish Registro Mercantil company profiles. One skill, four sources of lead-gen data across three countries.

"Pull yesterday's new Spanish incorporations and save them as a CSV"

The skill handles date calculation, actor selection, result fetching, and CSV export - you just describe what you want.

What this actually costs

The BORME actor costs $0.003 per corporate act extracted (the gazette itself is free). Spain averages 500-1000 new incorporations per day. That's $1.50-$3.00 per day for a complete feed of every new Spanish company.

For comparison, B2B lead databases charge $500-$2,000/month for Spanish company data that's typically refreshed quarterly, not daily. Against roughly $30-$65 a month for the daily feed (assuming about 21 publishing days), that's anywhere from 8x to 60x more for stale data.

And because BORME publishes the same corporate acts for ALL companies - not just the ones that paid to be listed - you get 100% coverage. No "premium tier" companies. Every incorporation, every officer appointment, every capital change.

From lead to qualified prospect

A new incorporation is a signal, but it's a thin signal. You know the company exists and who runs it, but not much else.

For deeper qualification, the Spain Company Directory Scraper pulls full profiles from the official Spanish company registry: CNAE industry codes, full officer history, company status, registered address.

Pipeline logic:

  1. BORME feed identifies new incorporations (daily)
  2. For promising leads, look up the full registry profile (one-off)
  3. Enrich with CNAE code, verify the company is still active, get full officer list
  4. Load into your CRM with industry classification and contact data
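
Step 2 depends on deciding which leads count as "promising" before you pay for a registry lookup. The filter below is a minimal sketch of that decision, working on the raw actor results from earlier. select_promising and its criteria (province and legal form) are illustrative assumptions; replace them with whatever signals matter for your business.

```python
def select_promising(incorporations, provinces, legal_forms):
    """Keep incorporations whose province and legal form are both wanted.

    The criteria are illustrative only. Matching is case-insensitive
    because the gazette data is often upper-cased.
    """
    wanted_provinces = {p.casefold() for p in provinces}
    wanted_forms = {f.casefold() for f in legal_forms}
    return [
        inc for inc in incorporations
        if inc.get("province", "").casefold() in wanted_provinces
        and inc.get("legalForm", "").casefold() in wanted_forms
    ]


# Usage: keep only limited companies registered in Madrid or Barcelona
sample = [
    {"companyName": "TECHSOLUTIONS MADRID SL", "province": "Madrid",
     "legalForm": "Sociedad Limitada"},
    {"companyName": "EJEMPLO SEVILLA SA", "province": "Sevilla",
     "legalForm": "Sociedad Anonima"},
]
shortlist = select_promising(sample, ["Madrid", "Barcelona"], ["Sociedad Limitada"])
```

Only the shortlist then goes to the directory actor, which keeps the per-result cost of the deep dive small.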

The BORME actor is $0.003 per result. The directory actor is $0.005 per result. You can feed 1,000 daily incorporations and deep-dive 50 promising ones for about $3.25 total.
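
To redo that arithmetic for your own volumes, here is a small sketch using the two per-result prices quoted above. daily_cost is a hypothetical helper name, and the prices are hard-coded from this article, so check current pricing before budgeting on it.

```python
from decimal import Decimal

FEED_PRICE = Decimal("0.003")       # USD per BORME result, as quoted above
DIRECTORY_PRICE = Decimal("0.005")  # USD per directory result, as quoted above


def daily_cost(feed_results, deep_dives):
    """Return the USD cost of one day's feed plus registry lookups."""
    return feed_results * FEED_PRICE + deep_dives * DIRECTORY_PRICE


print(daily_cost(1000, 50))  # 3.250
```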

What this replaces

If you're currently doing any of these, this pipeline makes them obsolete:

  • Manually checking the official gazette website every morning
  • Paying a lead database $500+/month for quarterly-refreshed Spanish company data
  • Missing new incorporations because you only check once a week
  • Buying lists that include dissolved companies (happens more than you'd think - the gazette publishes dissolutions too, but lead databases are slow to remove them)

Gotchas

  • The gazette publishes with a 1-day lag and only on working days, so Monday's run should cover Friday's publications, not Sunday's.
  • Provincial mercantile registries feed the gazette on slightly different schedules. Madrid and Barcelona account for ~40% of all incorporations.
  • Some incorporations appear in Section A, some in Section B. The actor handles Section A (the main corporate acts section) and is the most reliable for new company data.
  • NIF format is validated by the parser, but always spot-check a few against the official registry if you're using this for compliance purposes.
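
For the NIF spot-check, a quick local test can catch obvious typos before you go to the registry. The sketch below checks the shape of a company NIF (entity letter, seven digits, control character) and recomputes the control character with the widely documented checksum scheme. looks_like_company_nif is a hypothetical helper, it is deliberately lenient (it accepts either the digit or the letter form of the control character for every entity type), and passing it says nothing about whether the company actually exists.

```python
import re

# Letter form of the control character, indexed by the computed control digit
CONTROL_LETTERS = "JABCDEFGHI"
NIF_PATTERN = re.compile(r"([ABCDEFGHJNPQRSUVW])(\d{7})([0-9A-J])")


def looks_like_company_nif(nif):
    """Check the shape and control character of a company NIF."""
    match = NIF_PATTERN.fullmatch((nif or "").strip().upper())
    if not match:
        return False
    _, digits, control = match.groups()
    # Digits in even positions (2nd, 4th, 6th) are added as they are
    even_sum = sum(int(d) for d in digits[1::2])
    # Digits in odd positions are doubled, then the digits of each product are summed
    odd_sum = sum(sum(divmod(int(d) * 2, 10)) for d in digits[0::2])
    expected = (10 - (even_sum + odd_sum) % 10) % 10
    # Lenient: accept the digit or its letter equivalent
    return control in (str(expected), CONTROL_LETTERS[expected])
```

Note that the "B12345678" value in the sample result above is a placeholder and does not pass this check.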

Full script

The complete pipeline is on GitHub: Nolpak14/getregdata - look under skills/regdata-lead-gen/.

Everything above is one pip install requests away from your CRM having fresh Spanish leads every morning.
