The European PSD2 directive opened banking data to third-party providers, creating a massive opportunity for fintech developers. But what happens when official APIs fall short? Screen scraping fills the gap.
Understanding the PSD2 Landscape
PSD2 requires banks to expose Account Information Services (AIS) and Payment Initiation Services (PIS) through APIs. However, many banks provide inconsistent implementations, rate-limit aggressively, or lag behind on compliance. Scraping public-facing banking portals can supplement official channels for market research and competitive analysis.
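Where a bank does expose a dedicated PSD2 API, fetching account data typically looks something like the sketch below. This assumes a Berlin Group NextGenPSD2-style interface; the base URL, consent ID, and token are hypothetical placeholders, and every bank's sandbox differs in detail:

```python
import uuid
import requests

# Hypothetical sandbox base URL -- replace with your bank's actual endpoint
BASE_URL = "https://sandbox.example-bank.com/v1"

def build_ais_request(consent_id, access_token):
    """Assemble the URL and headers for a GET /accounts call.
    Header names follow the Berlin Group NextGenPSD2 convention."""
    headers = {
        "X-Request-ID": str(uuid.uuid4()),          # unique ID per request
        "Consent-ID": consent_id,                   # obtained via the consent flow
        "Authorization": f"Bearer {access_token}",  # OAuth2 access token
    }
    return f"{BASE_URL}/accounts", headers

def list_accounts(consent_id, access_token):
    """Fetch the account list the customer has consented to share."""
    url, headers = build_ais_request(consent_id, access_token)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json().get("accounts", [])
```

Note that an AIS call only works after the consent and strong-customer-authentication flows have completed; the scraping techniques below are for public product pages, not account data.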
Setting Up the Scraper
First, install the required libraries (`schedule` is used later for the daily pipeline):

```bash
pip install requests beautifulsoup4 pandas schedule
```
Here's a scraper that collects publicly available banking product data:
```python
import time

import requests
import pandas as pd
from bs4 import BeautifulSoup


class OpenBankingScraper:
    def __init__(self, api_key):
        self.session = requests.Session()
        self.base_url = "https://api.scraperapi.com"
        self.api_key = api_key

    def scrape_bank_products(self, bank_url):
        """Fetch a bank's product page through the proxy API and parse product cards."""
        params = {"api_key": self.api_key, "url": bank_url}
        response = self.session.get(self.base_url, params=params, timeout=60)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        products = []
        for card in soup.select(".product-card, .account-type"):
            name = card.select_one("h2, h3")
            rate = card.select_one(".rate, .apr")
            fees = card.select_one(".fee, .monthly-cost")
            if name:
                products.append({
                    "name": name.text.strip(),
                    "rate": rate.text.strip() if rate else "N/A",
                    "fees": fees.text.strip() if fees else "N/A",
                })
        return products

    def compare_rates(self, bank_urls):
        """Scrape each URL in turn, pausing between requests to stay polite."""
        all_products = []
        for url in bank_urls:
            all_products.extend(self.scrape_bank_products(url))
            time.sleep(2)
        return pd.DataFrame(all_products)


# Usage
scraper = OpenBankingScraper("YOUR_SCRAPERAPI_KEY")
banks = [
    "https://example-bank.com/savings-accounts",
    "https://example-bank.com/checking-accounts",
]
df = scraper.compare_rates(banks)
print(df.to_string(index=False))
```
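The scraped `rate` values are raw strings (e.g. "4.50% APY", formats vary by site), so numeric comparison needs a normalization step. A small helper along these lines handles the common cases:

```python
import re

def parse_rate(text):
    """Extract the first percentage in a scraped rate string as a float.
    Returns None when no percentage is present (e.g. "N/A")."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None
```

For example, `parse_rate("4.50% APY")` yields `4.5`, while `parse_rate("N/A")` yields `None`, which keeps missing values out of any sorting or averaging you do downstream.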
Handling Anti-Bot Protections
Banks invest heavily in bot detection. A proxy rotation service like ScraperAPI can handle JavaScript rendering, CAPTCHA solving, and IP rotation for you. For high-volume collection, ThorData provides residential proxies that mimic real user traffic patterns.
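If you manage your own proxy pool instead of a hosted service, the core idea is simple round-robin rotation with retry. The proxy addresses below are placeholders, a real pool would come from your provider:

```python
import itertools
import requests

# Placeholder proxy endpoints -- substitute real ones from your provider
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]

_pool = itertools.cycle(PROXIES)  # endless round-robin iterator

def next_proxy():
    """Return the next proxy in rotation, in the dict format requests expects."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}

def fetch_with_rotation(url, retries=3):
    """Try the URL through successive proxies until one succeeds."""
    last_error = None
    for _ in range(retries):
        try:
            response = requests.get(url, proxies=next_proxy(), timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException as err:
            last_error = err  # rotate to the next proxy and retry
    raise last_error
```

This sketch rotates blindly; production pools usually also track per-proxy failure rates and retire bad exits.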
Building a Rate Comparison Pipeline
The pipeline below reuses the scraper class, snapshots each day's rates to CSV, and alerts on changes. The `load_bank_urls` and `send_alert` helpers are minimal stand-ins to make it runnable:

```python
import json
import time
import pandas as pd
import schedule

def load_bank_urls(path):
    with open(path) as f:  # banks.json holds a JSON list of product-page URLs
        return json.load(f)

def send_alert(records):
    print(f"Rate changes detected: {records}")  # placeholder: wire up email/Slack

def daily_rate_check():
    scraper = OpenBankingScraper("YOUR_KEY")
    df = scraper.compare_rates(load_bank_urls("banks.json"))
    try:
        previous = pd.read_csv("previous_rates.csv")
    except FileNotFoundError:
        previous = pd.DataFrame(columns=["name", "rate", "fees"])  # first run
    # Detect rate changes between today's scrape and the last snapshot
    changes = df.merge(previous, on="name", suffixes=("_new", "_old"))
    changes = changes[changes["rate_new"] != changes["rate_old"]]
    if not changes.empty:
        send_alert(changes.to_dict("records"))
    df.to_csv("previous_rates.csv", index=False)

schedule.every().day.at("08:00").do(daily_rate_check)
while True:  # schedule only registers jobs; this loop actually runs them
    schedule.run_pending()
    time.sleep(60)
```
Monitoring with ScrapeOps
Track your scraper's health with ScrapeOps: monitor success rates, response times, and costs across all your banking data jobs.
Legal Considerations
Only collect data that is publicly visible without authentication. The PSD2 RTS does include a contingency ("fallback") mechanism letting licensed TPPs access accounts through the bank's customer interface when the dedicated API is unavailable (Article 33(4) of Commission Delegated Regulation (EU) 2018/389), but that applies to regulated providers, not to general-purpose scrapers. Respect robots.txt and each site's terms of service, and use the data for analysis and research, never for unauthorized account access.
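Checking robots.txt before each fetch is easy to automate with the standard library. The user-agent string below is a placeholder for whatever name your bot identifies itself with:

```python
import requests
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="RateResearchBot"):
    """Check one URL against already-fetched robots.txt rules.
    "RateResearchBot" is a placeholder user-agent for illustration."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def fetch_robots_txt(url):
    """Download robots.txt for the site hosting the given URL."""
    parts = urlparse(url)
    response = requests.get(f"{parts.scheme}://{parts.netloc}/robots.txt", timeout=10)
    response.raise_for_status()
    return response.text
```

Calling `is_allowed(fetch_robots_txt(url), url)` before each scrape, and skipping disallowed paths, keeps the collector on the right side of the site's published rules.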
Conclusion
Combining PSD2 APIs with targeted scraping creates a powerful toolkit for fintech market research. Start with official APIs where available, fall back to scraping for gaps, and always prioritize data accuracy and compliance.