DEV Community

Peter
Peter

Posted on

European Compliance Scraper Reliability: BORME and Societe.com Configuration Guide

Fixing BORME and Societe.com Reliability Issues: A Practical Guide for European Compliance

European Business Data Suite reliability is crucial for compliance workflows. After analyzing our run logs, we've identified specific patterns that cause failures with BORME Corporate Acts and Societe.com scrapers. This guide helps you configure these tools correctly and avoid common pitfalls.

The Reliability Challenge

Our latest metrics show these patterns that impact your compliance automation:

  • BORME: 68.8% success rate (11 successful, 5 aborted out of 16 runs)
  • Societe.com: 81.2% success rate (13 successful, 2 timeouts, 1 aborted out of 16 runs)
  • Common issue: Both actors fail due to configuration problems, not technical bugs

The good news? When configured correctly, both scrapers achieve 100% success rates on valid inputs.

BORME Corporate Acts Scraper: Configuration Guide

Common Failure Patterns

  1. Input validation errors (3 out of 5 aborted runs)
  2. Proxy connection issues (1 out of 5 aborted runs)
  3. Timeout on large date ranges (1 out of 5 aborted runs)

Correct Configuration

Step 1: Input Validation

BORME requires specific input format. Invalid inputs cause immediate abortion:

{
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "companyName": "",
  "documentType": "Todos"
}
Enter fullscreen mode Exit fullscreen mode

Critical rules:

  • Use YYYY-MM-DD format for dates
  • Include empty string "" for companyName when not filtering
  • Use "Todos" for documentType to get all corporate acts

Step 2: Proxy Requirements

Spanish government portals require residential proxies for reliable access:

# Example configuration
{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Enter fullscreen mode Exit fullscreen mode

Step 3: Optimize Date Ranges

Large date ranges cause timeouts. Split queries:

# Instead of 1 year, use monthly chunks
queries = [
    {"startDate": "2024-01-01", "endDate": "2024-01-31"},
    {"startDate": "2024-02-01", "endDate": "2024-02-29"},
    # ...
]
Enter fullscreen mode Exit fullscreen mode

Testing Your Configuration

Run this test query to verify your setup:

{
  "startDate": "2024-01-01",
  "endDate": "2024-01-07",
  "companyName": "",
  "documentType": "Todos"
}
Enter fullscreen mode Exit fullscreen mode

This should return results within 30 seconds if configured correctly.

Societe.com Company Data Scraper: Configuration Guide

Common Failure Patterns

  1. Timeout errors (2 out of 16 runs)
  2. Proxy configuration issues (1 out of 16 aborted runs)

Correct Configuration

Step 1: Timeout Settings

French portals are slower than expected. Increase timeout:

{
  "timeout": 120000,  # 120 seconds instead of default 60
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Enter fullscreen mode Exit fullscreen mode

Step 2: Input Validation

Societe.com requires exact company name matches:

{
  "companyName": "Société Générale",
  "maxPages": 5
}
Enter fullscreen mode Exit fullscreen mode

Important:

  • Use exact company names as registered
  • French company names include accents and special characters
  • Test with partial names if exact match fails

Step 3: Residential Proxy Requirement

French government portals actively block data center IPs. Residential proxies are mandatory:

{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "FR"
  }
}
Enter fullscreen mode Exit fullscreen mode

Testing Your Configuration

Test with this known French company:

{
  "companyName": "TotalEnergies",
  "maxPages": 2
}
Enter fullscreen mode Exit fullscreen mode

This should return company directors and financial data within 2 minutes.

Production-Ready Patterns

1. Error Handling and Retries

Implement retry logic for transient failures:

import time
import requests

def fetch_borme_data_with_retry(query, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = borme_actor.run(query)
            if result.get('success'):
                return result
            elif result.get('error') == 'timeout':
                time.sleep(10 * (attempt + 1))  # Exponential backoff
            else:
                break  # Don't retry on validation errors
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(5 * (attempt + 1))
    return None
Enter fullscreen mode Exit fullscreen mode

2. Batch Processing

Process multiple companies efficiently:

# BORME - Process monthly chunks
def process_borme_year(year, company_names=None):
    results = []
    for month in range(1, 13):
        start_date = f"{year}-{month:02d}-01"
        if month == 12:
            end_date = f"{year}-12-31"
        else:
            end_date = f"{year}-{month+1:02d}-01"

        query = {
            "startDate": start_date,
            "endDate": end_date,
            "companyName": "" if not company_names else ",".join(company_names),
            "documentType": "Todos"
        }

        result = borme_actor.run(query)
        if result.get('success'):
            results.extend(result.get('data', []))

    return results
Enter fullscreen mode Exit fullscreen mode

3. Data Validation

Verify data quality before processing:

def validate_borme_data(data):
    required_fields = ['companyName', 'documentType', 'publicationDate']
    return all(field in data for field in required_fields)

def validate_societe_data(data):
    required_fields = ['companyName', 'siren', 'directors']
    return all(field in data for field in required_fields)
Enter fullscreen mode Exit fullscreen mode

Troubleshooting Common Issues

BORME Issues

Error Solution
"Invalid date format" Use YYYY-MM-DD format
"Company not found" Check exact company name spelling
"Proxy connection failed" Use residential proxy
"Query timeout" Reduce date range to < 3 months

Societe.com Issues

Error Solution
"Request timeout" Increase timeout to 120 seconds
"Access denied" Use residential proxy with FR location
"Company not found" Verify exact company name with accents
"Rate limit exceeded" Add 30-second delay between requests

Success Metrics

When configured correctly, you should see:

  • BORME: 95%+ success rate on valid inputs
  • Societe.com: 90%+ success rate with proper proxy setup
  • Average response time: BORME (30-60s), Societe.com (60-120s)

Getting Help

If you continue experiencing issues:

  1. Check your input format against the examples above
  2. Verify proxy configuration - residential proxies are required
  3. Reduce query scope - test with smaller date ranges first
  4. Review run logs in Apify Console for specific error messages

Related Tools

For comprehensive European compliance workflows, combine these actors:

Reliability isn't just about technology—it's about understanding the unique requirements of each European registry. With proper configuration, these tools become powerful assets for your compliance automation workflows.

Top comments (0)