Sports analytics is booming. From fantasy leagues to professional scouting, access to comprehensive athlete stats across multiple sports creates valuable insights. Here's how to build a multi-sport stats scraper.
The Challenge
Sports data is scattered across dozens of sites, each with different structures. ESPN, official league sites, and stats aggregators all organize data differently. A unified scraper abstracts this complexity.
Multi-Sport Scraper
```shell
pip install requests beautifulsoup4 pandas lxml
```
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from abc import ABC, abstractmethod
from urllib.parse import quote


class SportScraper(ABC):
    """Base class: shared fetching logic, sport-specific parsing."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.session = requests.Session()

    def fetch(self, url):
        # Percent-encode the target so its own query string survives the proxy call
        proxy = f"http://api.scraperapi.com?api_key={self.api_key}&url={quote(url, safe='')}"
        resp = self.session.get(proxy, timeout=30)
        resp.raise_for_status()
        return resp

    @abstractmethod
    def scrape_stats(self, url):
        ...


class BasketballScraper(SportScraper):
    def scrape_stats(self, url):
        soup = BeautifulSoup(self.fetch(url).text, "html.parser")
        players = []
        for row in soup.select("table.stats tbody tr"):
            cols = row.select("td")
            if len(cols) >= 8:
                players.append({
                    "name": cols[0].text.strip(),
                    "team": cols[1].text.strip(),
                    "ppg": float(cols[2].text.strip() or 0),
                    "rpg": float(cols[3].text.strip() or 0),
                    "apg": float(cols[4].text.strip() or 0),
                    "sport": "basketball",
                })
        return players


class SoccerScraper(SportScraper):
    def scrape_stats(self, url):
        soup = BeautifulSoup(self.fetch(url).text, "html.parser")
        players = []
        for row in soup.select("table.stats tbody tr"):
            cols = row.select("td")
            if len(cols) >= 6:
                players.append({
                    "name": cols[0].text.strip(),
                    "team": cols[1].text.strip(),
                    "goals": int(cols[2].text.strip() or 0),
                    "assists": int(cols[3].text.strip() or 0),
                    "sport": "soccer",
                })
        return players


class MultiSportAggregator:
    """Routes each sport's URLs to the matching scraper and merges the results."""

    def __init__(self, api_key):
        self.scrapers = {
            "basketball": BasketballScraper(api_key),
            "soccer": SoccerScraper(api_key),
        }

    def collect_all(self, urls_by_sport):
        all_stats = []
        for sport, urls in urls_by_sport.items():
            scraper = self.scrapers.get(sport)
            if scraper:
                for url in urls:
                    all_stats.extend(scraper.scrape_stats(url))
        return pd.DataFrame(all_stats)


# Usage
agg = MultiSportAggregator("YOUR_SCRAPERAPI_KEY")
urls = {
    "basketball": ["https://example.com/nba/stats"],
    "soccer": ["https://example.com/epl/stats"],
}
df = agg.collect_all(urls)
print(df.groupby("sport").size())
```
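Because each sport contributes different columns, the combined DataFrame has NaN in sport-specific fields (e.g. `ppg` for soccer rows), so per-sport analysis should filter first. A small sketch with made-up sample rows standing in for scraped data:

```python
import pandas as pd

def top_by(df: pd.DataFrame, sport: str, col: str) -> str:
    """Name of the leader in `col` among rows for `sport`."""
    subset = df[df["sport"] == sport]
    return subset.nlargest(1, col)["name"].iloc[0]

# Hypothetical rows mirroring the aggregator's output schema
rows = [
    {"name": "Player A", "team": "X", "ppg": 27.1, "sport": "basketball"},
    {"name": "Player B", "team": "Y", "ppg": 21.4, "sport": "basketball"},
    {"name": "Player C", "team": "Z", "goals": 18, "sport": "soccer"},
]
df = pd.DataFrame(rows)
print(top_by(df, "basketball", "ppg"))  # Player A
```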
Handling Anti-Scraping Measures
Sports sites are often heavily protected with JavaScript rendering and bot detection, both of which ScraperAPI can handle. ThorData residential proxies work well for geo-restricted league content, and a tool like ScrapeOps helps monitor scraper health over time.
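For JavaScript-heavy or geo-restricted pages, the proxy URL built in `fetch` can carry extra query parameters. The sketch below assumes ScraperAPI's `render` and `country_code` parameters (verify against the current docs before relying on them) and uses `urlencode` so the target URL's own query string survives the round trip:

```python
from urllib.parse import urlencode

def scraperapi_url(api_key: str, target_url: str,
                   render: bool = False, country: str = "") -> str:
    """Build a ScraperAPI request URL with optional rendering/geo options."""
    params = {"api_key": api_key, "url": target_url}
    if render:
        params["render"] = "true"          # ask the service to execute JavaScript
    if country:
        params["country_code"] = country   # route through a specific geography
    # urlencode escapes ?, &, and = inside the target URL
    return "http://api.scraperapi.com/?" + urlencode(params)

print(scraperapi_url("KEY", "https://example.com/stats?page=2", render=True, country="us"))
```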
Conclusion
A multi-sport stats aggregator turns scattered data into structured analytics. The abstract base class pattern makes adding new sports straightforward. Start with the leagues that matter most to your use case and expand from there.