DEV Community

getregdata

Build a Daily B2B Lead Feed from Spanish Corporate Filings in 3 API Calls

Every new company incorporation in Spain appears in the official corporate gazette before it hits any private database. If you are selling to Spanish businesses -- accounting software, payment processing, office space, legal services -- that gazette is your earliest possible signal. A company registered today needs everything tomorrow.

The problem: the gazette publishes PDFs, not structured data. No official API. No CSV export. No RSS feed.

Here is how to turn those daily PDFs into a structured B2B lead feed with three API calls and a cron job.

Why the Official Gazette Matters for B2B Sales

Spain requires every company incorporation, board appointment, capital change, and dissolution to be published in the official corporate gazette, the Boletín Oficial del Registro Mercantil (BORME), under Section A. The filings are published daily, and they are public by law.

The key insight for B2B sales: a new incorporation is a buying-intent signal. A newly registered company often has not yet settled on a business bank, an accountant, an office lease, or a payroll provider. If you reach it in week one, you face far less competition than you will a few months later.

Private databases (Informa, Axesor, eInforma) eventually pick up these filings, but with a delay. The gazette is the source. Why pay a subscription for stale data when you can go to the source?

The 3-Step Pipeline

DAILY CRON → Scrape today's gazette → Filter incorporations → Push to CRM

Here is the full pipeline in Python.

Step 1: Scrape today's corporate acts

The gazette publishes daily PDFs organized by province and section. Section A contains incorporations, appointments, dismissals, capital changes, dissolutions, and mergers.

We use an Apify actor that parses these PDFs into structured JSON. One API call covers all provinces for a given date.

import requests
from datetime import datetime, timedelta

APIFY_TOKEN = "your_apify_token_here"
ACTOR_ID = "uBS46fLD6LVZwaxCc"  # BORME Corporate Acts scraper

def scrape_borme(date_str=None, province=None):
    """Scrape BORME Section A for a given date. Returns dataset ID."""
    if date_str is None:
        date_str = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    payload = {
        "dateFrom": date_str,
        "dateTo": date_str,
    }
    if province:
        payload["province"] = province

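    # The actor input goes in the request body; Apify replies with the run object.
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json=payload,
        timeout=30,  # fail fast instead of hanging the cron job
    )
    resp.raise_for_status()
    return resp.json()["data"]["defaultDatasetId"]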

Step 2: Wait for results, then fetch

The actor runs asynchronously. Poll for completion, then pull results as JSON.

import time

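def wait_for_run(run_id, max_wait=1800):
    """Poll until the actor run finishes or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        resp = requests.get(
            f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
            headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()  # surface auth or network errors early
        status = resp.json()["data"]["status"]
        if status == "SUCCEEDED":
            return
        if status in ("FAILED", "ABORTED", "TIMED-OUT"):
            raise RuntimeError(f"Run failed: {status}")
        time.sleep(10)  # run is still in progress
    raise TimeoutError(f"Run {run_id} did not finish within {max_wait} seconds")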


def fetch_results(dataset_id):
    """Pull all results from the dataset as a list of dicts."""
    resp = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        params={"format": "json"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # a JSON array, one object per corporate act

Step 3: Filter for incorporations and push to CRM

Each result contains act_type, company_name, province, and metadata. Filter for incorporations ("Constitucion") and map them into your lead format.

CRM_WEBHOOK = "https://your-crm.com/api/leads"
INCORPORATION_TYPE = "Constitucion"  # Spanish for incorporation

def push_incorporations(items, source_date):
    """Filter incorporations and push to CRM."""
    leads = []
    for item in items:
        if item.get("act_type") == INCORPORATION_TYPE:
            lead = {
                "company_name": item.get("company_name"),
                "province": item.get("province"),
                "cif": item.get("cif"),
                "legal_form": item.get("legal_form"),
                "registration_date": item.get("registration_date"),
                "source": "official_gazette",
                "source_date": source_date,
            }
            leads.append(lead)

    if leads:
        # Push to your CRM, Slack, Google Sheets, or wherever leads go
        resp = requests.post(
            CRM_WEBHOOK,
            json={"leads": leads, "date": source_date, "count": len(leads)},
            timeout=30,
        )
        resp.raise_for_status()  # do not report success if the push was rejected
        print(f"Pushed {len(leads)} new incorporations for {source_date}")
    else:
        print(f"No incorporations found for {source_date}")

    return leads
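The exact-match filter above depends on the actor emitting the act type without an accent. The gazette itself spells the word "Constitución", so a tolerant comparison may be safer. A minimal sketch, assuming act_type is a plain string; the helper names normalize_act_type and is_incorporation are illustrative and not part of the actor's API:

```python
import unicodedata


def normalize_act_type(value):
    """Lowercase a label and strip accents, so 'Constitución' matches 'constitucion'."""
    if not value:
        return ""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", str(value))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.strip().lower()


def is_incorporation(item, target="Constitucion"):
    """True when the item's act_type matches the target, ignoring case and accents."""
    return normalize_act_type(item.get("act_type")) == normalize_act_type(target)
```

Swapping the equality check in push_incorporations for is_incorporation(item) would keep the rest of the function unchanged.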

Tying it together

def daily_lead_feed(date_str=None):
    if date_str is None:
        date_str = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    # Step 1: Trigger scrape
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json={"dateFrom": date_str, "dateTo": date_str},
        timeout=30,
    )
    resp.raise_for_status()  # stop here if the run could not be started
    run = resp.json()

    dataset_id = run["data"]["defaultDatasetId"]
    run_id = run["data"]["id"]

    # Step 2: Wait and fetch
    wait_for_run(run_id)
    items = fetch_results(dataset_id)

    # Step 3: Filter and push
    leads = push_incorporations(items, date_str)

    return leads

# Run it
if __name__ == "__main__":
    daily_lead_feed()

Production Considerations

Schedule it daily. The gazette publishes Monday through Friday. A cron job at 9:00 CET catches the previous day's filings. Most incorporations appear within 24 hours.
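Because nothing is published at weekends, a Monday run that looks back one calendar day would query a Sunday and find nothing. A small helper can step back to the most recent weekday instead. This is a sketch that only skips Saturdays and Sundays; it makes no attempt to handle public holidays, and the name previous_publishing_day is illustrative.

```python
from datetime import date, timedelta


def previous_publishing_day(today=None):
    """Return the most recent Monday-Friday date strictly before `today`, as YYYY-MM-DD."""
    day = (today or date.today()) - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day.strftime("%Y-%m-%d")
```

Calling daily_lead_feed(previous_publishing_day()) from the cron entry point would then pick up Friday's filings on a Monday morning.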

Deduplicate. The same company can appear in multiple BORME filings over time (incorporation, then capital change, then board appointment). Track CIFs you have already seen so you do not re-push existing leads.
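One way to track seen CIFs is a small JSON file on disk. A minimal sketch, assuming each lead dict carries a cif key as in the pipeline above; the file name seen_cifs.json and the function name filter_new_leads are placeholders, and a database table would serve the same purpose at larger volume.

```python
import json
from pathlib import Path


def filter_new_leads(leads, seen_path="seen_cifs.json"):
    """Return only leads whose CIF has not been seen before, and record the new ones."""
    path = Path(seen_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()

    fresh = []
    for lead in leads:
        cif = lead.get("cif")
        if not cif:
            fresh.append(lead)  # nothing to dedupe on, so keep the lead
            continue
        if cif in seen:
            continue  # already pushed on an earlier run (or earlier in this batch)
        seen.add(cif)
        fresh.append(lead)

    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Running the leads through this filter just before the CRM push keeps repeat appearances of the same company out of the feed.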

Enrich before pushing. The raw BORME data gives you company name, CIF, legal form, and province. For a richer lead, combine it with a second lookup against the commercial registry for industry codes (CNAE), address, and officer names. That is a separate API call -- worth it if you are qualifying leads before sending them to sales.
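The merge step can be kept independent of whichever registry lookup you choose. A sketch, assuming a caller-supplied lookup function that takes a CIF and returns a dict of extra fields (or None when nothing is found); that function and field names such as cnae or address are hypothetical stand-ins, not a real API.

```python
def enrich_leads(leads, lookup):
    """Merge registry fields into each lead without overwriting gazette data."""
    enriched = []
    for lead in leads:
        # Only call the lookup when there is a CIF to query by.
        extra = lookup(lead["cif"]) if lead.get("cif") else None
        merged = dict(lead)
        for key, value in (extra or {}).items():
            merged.setdefault(key, value)  # gazette fields win on conflict
        enriched.append(merged)
    return enriched
```

Passing the lookup in as an argument makes it easy to swap providers or to stub the call out in tests.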

Add a Slack alert for volume. If BORME publishes 500 incorporations one day and 10 the next, your sales team should know. A simple Slack notification with a count gives them context.
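A sketch of the message for such an alert, assuming a Slack incoming webhook, which accepts a JSON body with a text field. The 50% threshold and the function name build_volume_message are placeholders chosen for illustration.

```python
def build_volume_message(count, previous_count, source_date, threshold=0.5):
    """Build a Slack payload, flagging swings of at least `threshold` vs. the prior day."""
    text = f"{count} new incorporations for {source_date}"
    if previous_count:  # skip the comparison when there is no usable baseline
        change = (count - previous_count) / previous_count
        if abs(change) >= threshold:
            text += f" ({change:+.0%} vs. previous day)"
    return {"text": text}
```

The returned dict can be sent with requests.post(webhook_url, json=payload, timeout=10), where webhook_url is the incoming-webhook URL from your own Slack workspace.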

What This Replaces

Most B2B teams targeting Spain either:

  • Buy static lists from brokers (stale, expensive, $0.50-2.00/lead)
  • Subscribe to enterprise databases (Informa, Axesor -- $500+/month contracts)
  • Manually browse filings (unscalable past 10 companies)

This pipeline is billed per result with no subscription, and it pulls from the source before the filings reach private databases. A typical day yields 50-200 new incorporations across all provinces.

The Actor

The BORME Corporate Acts scraper parses the official daily gazette into structured JSON with fields for act type, company name, CIF, province, legal form, and registration date. Results are available as JSON or CSV.

View on Apify Store


For more European compliance and business data workflows, see the KYC onboarding pipeline for Polish companies and the cross-border insolvency watchlist for Poland and Austria.
