The European PSD2 directive opened banking data to third-party providers, creating a massive opportunity for fintech developers. But what happens when official APIs fall short? Screen scraping fills the gap.
Understanding the PSD2 Landscape
PSD2 requires banks to expose Account Information Services (AIS) and Payment Initiation Services (PIS) through APIs. However, many banks provide inconsistent implementations, rate-limit aggressively, or lag behind on compliance. Scraping public-facing banking portals can supplement official channels for market research and competitive analysis.
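Where a bank does expose a dedicated PSD2 API, fetching account data typically looks something like the sketch below. This assumes a Berlin Group NextGenPSD2-style interface; the base URL, consent ID, and token are hypothetical placeholders, and every bank's sandbox differs in detail:

```python
import uuid
import requests

# Hypothetical sandbox base URL -- replace with your bank's actual endpoint
BASE_URL = "https://sandbox.example-bank.com/v1"

def build_ais_request(consent_id, access_token):
    """Assemble the URL and headers for a GET /accounts call.
    Header names follow the Berlin Group NextGenPSD2 convention."""
    headers = {
        "X-Request-ID": str(uuid.uuid4()),          # unique ID per request
        "Consent-ID": consent_id,                   # obtained via the consent flow
        "Authorization": f"Bearer {access_token}",  # OAuth2 access token
    }
    return f"{BASE_URL}/accounts", headers

def list_accounts(consent_id, access_token):
    """Fetch the account list the customer has consented to share."""
    url, headers = build_ais_request(consent_id, access_token)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json().get("accounts", [])
```

Note that an AIS call only works after the consent and strong-customer-authentication flows have completed; the scraping techniques below are for public product pages, not account data.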
Setting Up the Scraper
First, install the required libraries (`schedule` is used later for the daily pipeline):

```bash
pip install requests beautifulsoup4 pandas schedule
```
Here's a scraper that collects publicly available banking product data:
```python
import time

import requests
import pandas as pd
from bs4 import BeautifulSoup


class OpenBankingScraper:
    def __init__(self, api_key):
        self.session = requests.Session()
        self.base_url = "https://api.scraperapi.com"
        self.api_key = api_key

    def scrape_bank_products(self, bank_url):
        """Fetch a bank's product page through the proxy API and parse product cards."""
        params = {"api_key": self.api_key, "url": bank_url}
        response = self.session.get(self.base_url, params=params, timeout=60)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        products = []
        for card in soup.select(".product-card, .account-type"):
            name = card.select_one("h2, h3")
            rate = card.select_one(".rate, .apr")
            fees = card.select_one(".fee, .monthly-cost")
            if name:
                products.append({
                    "name": name.text.strip(),
                    "rate": rate.text.strip() if rate else "N/A",
                    "fees": fees.text.strip() if fees else "N/A",
                })
        return products

    def compare_rates(self, bank_urls):
        """Scrape each URL in turn, pausing between requests to stay polite."""
        all_products = []
        for url in bank_urls:
            all_products.extend(self.scrape_bank_products(url))
            time.sleep(2)
        return pd.DataFrame(all_products)


# Usage
scraper = OpenBankingScraper("YOUR_SCRAPERAPI_KEY")
banks = [
    "https://example-bank.com/savings-accounts",
    "https://example-bank.com/checking-accounts",
]
df = scraper.compare_rates(banks)
print(df.to_string(index=False))
```
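The scraped `rate` values are raw strings (e.g. "4.50% APY", formats vary by site), so numeric comparison needs a normalization step. A small helper along these lines handles the common cases:

```python
import re

def parse_rate(text):
    """Extract the first percentage in a scraped rate string as a float.
    Returns None when no percentage is present (e.g. "N/A")."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None
```

For example, `parse_rate("4.50% APY")` yields `4.5`, while `parse_rate("N/A")` yields `None`, which keeps missing values out of any sorting or averaging you do downstream.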
Handling Anti-Bot Protections
Banks invest heavily in bot detection. A proxy rotation service like ScraperAPI can handle JavaScript rendering, CAPTCHA solving, and IP rotation for you. For high-volume collection, ThorData provides residential proxies that mimic real user traffic patterns.
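If you manage your own proxy pool instead of a hosted service, the core idea is simple round-robin rotation with retry. The proxy addresses below are placeholders, a real pool would come from your provider:

```python
import itertools
import requests

# Placeholder proxy endpoints -- substitute real ones from your provider
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]

_pool = itertools.cycle(PROXIES)  # endless round-robin iterator

def next_proxy():
    """Return the next proxy in rotation, in the dict format requests expects."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}

def fetch_with_rotation(url, retries=3):
    """Try the URL through successive proxies until one succeeds."""
    last_error = None
    for _ in range(retries):
        try:
            response = requests.get(url, proxies=next_proxy(), timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException as err:
            last_error = err  # rotate to the next proxy and retry
    raise last_error
```

This sketch rotates blindly; production pools usually also track per-proxy failure rates and retire bad exits.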
Building a Rate Comparison Pipeline
The pipeline below reuses the scraper class, snapshots each day's rates to CSV, and alerts on changes. The `load_bank_urls` and `send_alert` helpers are minimal stand-ins to make it runnable:

```python
import json
import time
import pandas as pd
import schedule

def load_bank_urls(path):
    with open(path) as f:  # banks.json holds a JSON list of product-page URLs
        return json.load(f)

def send_alert(records):
    print(f"Rate changes detected: {records}")  # placeholder: wire up email/Slack

def daily_rate_check():
    scraper = OpenBankingScraper("YOUR_KEY")
    df = scraper.compare_rates(load_bank_urls("banks.json"))
    try:
        previous = pd.read_csv("previous_rates.csv")
    except FileNotFoundError:
        previous = pd.DataFrame(columns=["name", "rate", "fees"])  # first run
    # Detect rate changes between today's scrape and the last snapshot
    changes = df.merge(previous, on="name", suffixes=("_new", "_old"))
    changes = changes[changes["rate_new"] != changes["rate_old"]]
    if not changes.empty:
        send_alert(changes.to_dict("records"))
    df.to_csv("previous_rates.csv", index=False)

schedule.every().day.at("08:00").do(daily_rate_check)
while True:  # schedule only registers jobs; this loop actually runs them
    schedule.run_pending()
    time.sleep(60)
```
Monitoring with ScrapeOps
Track your scraper's health with ScrapeOps: monitor success rates, response times, and costs across all your banking data jobs.
Legal Considerations
Only collect data that is publicly visible without authentication. The PSD2 RTS does include a contingency ("fallback") mechanism letting licensed TPPs access accounts through the bank's customer interface when the dedicated API is unavailable (Article 33(4) of Commission Delegated Regulation (EU) 2018/389), but that applies to regulated providers, not to general-purpose scrapers. Respect robots.txt and each site's terms of service, and use the data for analysis and research, never for unauthorized account access.
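Checking robots.txt before each fetch is easy to automate with the standard library. The user-agent string below is a placeholder for whatever name your bot identifies itself with:

```python
import requests
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="RateResearchBot"):
    """Check one URL against already-fetched robots.txt rules.
    "RateResearchBot" is a placeholder user-agent for illustration."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def fetch_robots_txt(url):
    """Download robots.txt for the site hosting the given URL."""
    parts = urlparse(url)
    response = requests.get(f"{parts.scheme}://{parts.netloc}/robots.txt", timeout=10)
    response.raise_for_status()
    return response.text
```

Calling `is_allowed(fetch_robots_txt(url), url)` before each scrape, and skipping disallowed paths, keeps the collector on the right side of the site's published rules.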
Conclusion
Combining PSD2 APIs with targeted scraping creates a powerful toolkit for fintech market research. Start with official APIs where available, fall back to scraping for gaps, and always prioritize data accuracy and compliance.