Build a Daily B2B Lead Feed from Spanish Corporate Filings in 3 API Calls
Every new company incorporation in Spain appears in the official corporate gazette before it hits any private database. If you are selling to Spanish businesses -- accounting software, payment processing, office space, legal services -- that gazette is your earliest possible signal. A company registered today needs everything tomorrow.
The problem: the gazette publishes PDFs, not structured data. No official API. No CSV export. No RSS feed.
Here is how to turn those daily PDFs into a structured B2B lead feed with three API calls and a cron job.
Why the Official Gazette Matters for B2B Sales
Spain requires every company incorporation, board appointment, capital change, and dissolution to be published in the official corporate gazette, the Boletín Oficial del Registro Mercantil (BORME), under Section A. The filings go up daily, and they are public by law.
The key insight for B2B sales: a new incorporation is a buying-intent signal. The company has likely not yet settled on a bank, an accountant, an office lease, or a payroll provider. If you reach them in week one, you are competing with very few other vendors.
Private databases (Informa, Axesor, eInforma) eventually pick up these filings, but with a delay. The gazette is the source. Why pay a subscription for stale data when you can go to the source?
The 3-Step Pipeline
DAILY CRON → Scrape today's gazette → Filter incorporations → Push to CRM
Here is the full pipeline in Python.
Step 1: Scrape today's corporate acts
The gazette publishes daily PDFs organized by province and section. Section A contains incorporations, appointments, dismissals, capital changes, dissolutions, and mergers.
We use an Apify actor that parses these PDFs into structured JSON. One API call covers all provinces for a given date.
import requests
from datetime import datetime, timedelta

APIFY_TOKEN = "your_apify_token_here"
ACTOR_ID = "uBS46fLD6LVZwaxCc"  # BORME Corporate Acts scraper


def scrape_borme(date_str=None, province=None):
    """Start a scrape of BORME Section A for a given date. Returns the dataset ID."""
    if date_str is None:
        # Default to yesterday's issue
        date_str = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
    payload = {
        "dateFrom": date_str,
        "dateTo": date_str,
    }
    if province:
        payload["province"] = province
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json=payload,
        timeout=30,  # do not hang the cron job on a stalled connection
    )
    resp.raise_for_status()
    return resp.json()["data"]["defaultDatasetId"]
Step 2: Wait for results, then fetch
The actor runs asynchronously. Poll for completion, then pull results as JSON.
import time


def wait_for_run(run_id, poll_seconds=10):
    """Poll until the actor run finishes; raise if it ends unsuccessfully."""
    while True:
        resp = requests.get(
            f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/{run_id}",
            headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()  # surface auth or network errors instead of a KeyError
        status = resp.json()["data"]["status"]
        if status == "SUCCEEDED":
            return
        if status in ("FAILED", "ABORTED", "TIMED-OUT"):
            raise RuntimeError(f"Run failed: {status}")
        time.sleep(poll_seconds)


def fetch_results(dataset_id):
    """Pull the dataset items as a list of dicts."""
    resp = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items?format=json",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
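For days with many filings it may be safer to read the dataset in pages rather than in one response. The Apify dataset items endpoint accepts `offset` and `limit` query parameters; the sketch below shows only the paging loop, with the HTTP call injected as a function so the logic can be checked offline. The names `collect_pages` and `get_page` are hypothetical and not part of the actor or of the Apify client.

```python
def collect_pages(get_page, page_size=1000):
    """Gather all items by calling get_page(offset, limit) until a short page arrives.

    get_page is any callable returning a list of items for the given window,
    for example a wrapper around GET /v2/datasets/{id}/items with
    offset and limit passed as query parameters.
    """
    items = []
    offset = 0
    while True:
        page = get_page(offset, page_size)
        items.extend(page)
        # A page shorter than the requested size means the dataset is exhausted.
        if len(page) < page_size:
            return items
        offset += len(page)


if __name__ == "__main__":
    # In-memory stand-in for the HTTP call, used only to demonstrate the loop.
    fake_dataset = [{"n": i} for i in range(25)]
    result = collect_pages(lambda off, lim: fake_dataset[off:off + lim], page_size=10)
    print(len(result))
```

To use it against the real dataset, pass a function that performs the same `requests.get` call as `fetch_results`, adding `offset` and `limit` to the query string.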
Step 3: Filter for incorporations and push to CRM
Each result contains act_type, company_name, province, and metadata. Filter for incorporations ("Constitucion") and map them into your lead format.
CRM_WEBHOOK = "https://your-crm.com/api/leads"
INCORPORATION_TYPE = "Constitucion"  # Spanish for incorporation


def push_incorporations(items, source_date):
    """Filter incorporations and push them to the CRM. Returns the leads."""
    leads = []
    for item in items:
        if item.get("act_type") == INCORPORATION_TYPE:
            lead = {
                "company_name": item.get("company_name"),
                "province": item.get("province"),
                "cif": item.get("cif"),
                "legal_form": item.get("legal_form"),
                "registration_date": item.get("registration_date"),
                "source": "official_gazette",
                "source_date": source_date,
            }
            leads.append(lead)
    if leads:
        # Push to your CRM, Slack, Google Sheets, or wherever leads go
        resp = requests.post(
            CRM_WEBHOOK,
            json={"leads": leads, "date": source_date, "count": len(leads)},
            timeout=30,
        )
        resp.raise_for_status()  # do not report success if the CRM rejected the batch
        print(f"Pushed {len(leads)} new incorporations for {source_date}")
    else:
        print(f"No incorporations found for {source_date}")
    return leads
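The exact-match filter depends on the actor emitting the act type spelled exactly as "Constitucion". Whether the output can also carry the accented Spanish spelling ("Constitución") or different casing is not stated here, so the following is a defensive sketch under that assumption. The helper names `normalize_act_type` and `is_incorporation` are hypothetical.

```python
import unicodedata


def normalize_act_type(value):
    """Lowercase, trim, and strip accents so spelling variants compare equal."""
    if not value:
        return ""
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop combining marks (the accent part of a decomposed character).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.strip().lower()


def is_incorporation(item, target="Constitucion"):
    """True when the item's act_type matches the target, ignoring accents and case."""
    return normalize_act_type(item.get("act_type")) == normalize_act_type(target)


if __name__ == "__main__":
    print(is_incorporation({"act_type": "Constitución"}))
    print(is_incorporation({"act_type": "Disolución"}))
```

Swapping `is_incorporation(item)` in for the equality check in `push_incorporations` keeps the rest of the function unchanged.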
Tying it together
def daily_lead_feed(date_str=None):
    if date_str is None:
        date_str = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    # Step 1: Trigger scrape (same request as scrape_borme, repeated here
    # because the run ID is needed as well as the dataset ID)
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json={"dateFrom": date_str, "dateTo": date_str},
        timeout=30,
    )
    resp.raise_for_status()
    run = resp.json()
    dataset_id = run["data"]["defaultDatasetId"]
    run_id = run["data"]["id"]

    # Step 2: Wait and fetch
    wait_for_run(run_id)
    items = fetch_results(dataset_id)

    # Step 3: Filter and push
    leads = push_incorporations(items, date_str)
    return leads


# Run it
if __name__ == "__main__":
    daily_lead_feed()
Production Considerations
Schedule it daily. The gazette publishes Monday through Friday. A cron job at 9:00 CET catches the previous day's filings. Most incorporations appear within 24 hours.
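Since there are no weekend issues, a run on Monday needs to look back to Friday rather than to Sunday, which a fixed one-day offset does not do. A minimal sketch of that date logic follows; `previous_publication_day` is a hypothetical helper, and it only skips Saturdays and Sundays, not public holidays.

```python
from datetime import date, timedelta


def previous_publication_day(today=None):
    """Return the most recent weekday strictly before `today`."""
    today = today or date.today()
    day = today - timedelta(days=1)
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    return day


if __name__ == "__main__":
    # Formatted the same way the pipeline expects its date_str argument
    print(previous_publication_day().strftime("%Y-%m-%d"))
```

The result can be passed straight in as `daily_lead_feed(previous_publication_day().strftime("%Y-%m-%d"))`.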
Deduplicate. The same company can appear in multiple BORME filings over time (incorporation, then capital change, then board appointment), and re-running the job for a date repeats its results. Track CIFs you have already seen so you do not re-push existing leads.
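One way to track seen CIFs is a small JSON file next to the script. This is a sketch, not a recommendation for high volumes, where a database table would be a better fit. The function names and the file layout are assumptions, and leads without a CIF fall back to the company name as their key.

```python
import json
from pathlib import Path


def load_seen(path):
    """Read the set of already-pushed keys, or an empty set if the file is missing."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def save_seen(path, seen):
    """Persist the set of keys as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(seen)), encoding="utf-8")


def filter_new_leads(leads, seen):
    """Return leads whose key is not in `seen`, adding new keys to `seen` in place."""
    fresh = []
    for lead in leads:
        key = (lead.get("cif") or lead.get("company_name") or "").strip().upper()
        if not key:
            fresh.append(lead)  # nothing to dedupe on, so let it through
            continue
        if key in seen:
            continue
        seen.add(key)
        fresh.append(lead)
    return fresh
```

In the pipeline this would sit between filtering and pushing: load the set, pass the leads through `filter_new_leads`, push only the returned list, then save the set once the push has succeeded.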
Enrich before pushing. The raw BORME data gives you company name, CIF, legal form, and province. For a richer lead, combine it with a second lookup against the commercial registry for industry codes (CNAE), address, and officer names. That is a separate API call -- worth it if you are qualifying leads before sending them to sales.
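The enrichment step can be kept separate from whichever registry source you choose. In the sketch below, `lookup_registry` is a hypothetical callable that you supply: it takes a CIF and returns a dict, or `None` when nothing is found. No specific registry API is assumed, and the field names `cnae`, `address`, and `officers` are illustrative.

```python
def enrich_lead(lead, lookup_registry):
    """Return a copy of the lead with registry fields added (None when unavailable)."""
    enriched = dict(lead)  # leave the original lead untouched
    cif = lead.get("cif")
    extra = lookup_registry(cif) if cif else None
    for field in ("cnae", "address", "officers"):
        enriched[field] = (extra or {}).get(field)
    return enriched


if __name__ == "__main__":
    # Stand-in lookup used only to show the call shape; replace with a real source.
    def fake_lookup(cif):
        return {"cnae": "6201", "address": "Calle Ejemplo 1, Madrid"}

    print(enrich_lead({"company_name": "ACME SL", "cif": "B12345678"}, fake_lookup))
```

Keeping the lookup injectable means a failed or missing enrichment yields empty fields rather than a dropped lead.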
Add a Slack alert for volume. If BORME publishes 500 incorporations one day and 10 the next, your sales team should know. A simple Slack notification with a count gives them context.
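A sketch of what that notification could look like: the function below builds the JSON body for a Slack incoming webhook (a dict with a `text` key) and flags days whose count is far from the recent median. The function name, the factor-of-three threshold, and the message wording are all assumptions to adjust.

```python
from statistics import median


def build_volume_alert(source_date, count, recent_counts, ratio=3.0):
    """Build a Slack message payload, marking counts far above or below the recent median."""
    text = f"BORME {source_date}: {count} new incorporations"
    if recent_counts:
        baseline = median(recent_counts)
        too_high = count >= baseline * ratio
        too_low = count * ratio <= baseline
        if baseline > 0 and (too_high or too_low):
            text += f" (unusual: recent median is {baseline:g})"
    return {"text": text}


if __name__ == "__main__":
    print(build_volume_alert("2024-06-07", 10, [100, 120, 90]))
```

Sending it is one more POST of the returned dict as JSON to your own incoming-webhook URL, in the same style as the CRM push.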
What This Replaces
Most B2B teams targeting Spain either:
- Buy static lists from brokers (stale, expensive, $0.50-2.00/lead)
- Subscribe to enterprise databases (Informa, Axesor -- $500+/month contracts)
- Manually browse filings (unscalable past 10 companies)
This pipeline is billed per result with no subscription, and it pulls from the source before private databases pick the filings up. A typical day yields 50-200 new incorporations across all provinces.
The Actor
The BORME Corporate Acts scraper parses the official daily gazette into structured JSON with fields for act type, company name, CIF, province, legal form, and registration date. Results are available as JSON or CSV.
For more European compliance and business data workflows, see the KYC onboarding pipeline for Polish companies and the cross-border insolvency watchlist for Poland and Austria.