DEV Community

Santhosh

I Scraped 120 Years of Olympic History — and You Can Too

I’ve always been fascinated by the Olympics.

The stories, the records, the triumphs… but when I went looking for a clean dataset of every athlete in history, I hit a wall.

Sure, there’s Olympedia.org — an incredible resource — but no “Download” button.

So I decided:

If the dataset doesn’t exist, I’ll build it myself.

The result? A Python scraper that can pull every athlete profile from 1896 to today — perfect for data analysis and visualization projects.


📌 What This Script Does

With one command, you get:
✅ Name, gender, height, weight
✅ Birth & death info (date, city, country)
✅ National Olympic Committee (NOC)
✅ Last Olympic Games and sport
✅ Medal counts (gold, silver, bronze)

Saved neatly in a CSV ready for Pandas or Excel.


📊 What You Can Do With It

This isn’t just about scraping.

Once you have the data, you can:

  • Visualize medal trends over decades
  • Explore which sports certain countries dominate
  • Study athlete physique trends (height/weight) over time
  • Map birthplaces of medalists with GeoPandas
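
As a starting point for the first idea, here is a minimal sketch that totals medals per decade using only the standard library. It assumes the column names shown in the data format section (`year`, `gold_medal`, `silver_medal`, `bronze_medal`). The function name `medals_by_decade` and the inline sample rows are made up for illustration. Since each row lists an athlete's last Games, the decade here is the decade of that last appearance, not of each individual medal.

```python
import csv
import io
from collections import Counter


def medals_by_decade(rows):
    """Sum gold, silver and bronze counts per decade of each athlete's listed Games."""
    totals = Counter()
    for row in rows:
        if not row["year"]:
            continue  # skip athletes with no recorded Games year
        decade = int(row["year"]) // 10 * 10
        totals[decade] += sum(
            int(row[col] or 0)
            for col in ("gold_medal", "silver_medal", "bronze_medal")
        )
    return dict(sorted(totals.items()))


# Hypothetical sample rows; with the real file, pass
# csv.DictReader(open("olympedia.csv", newline="", encoding="utf-8")) instead.
sample = io.StringIO(
    "year,gold_medal,silver_medal,bronze_medal\n"
    "1912,0,2,0\n"
    "1920,1,0,1\n"
    "1924,0,0,0\n"
)
decade_totals = medals_by_decade(csv.DictReader(sample))
print(decade_totals)  # prints {1910: 2, 1920: 2}
```

The resulting dictionary can go straight into any plotting library as x and y values.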

⚡ How Fast Is It?

With 10 threads and a 0.4-second delay between requests,
you can scrape thousands of athletes in under an hour without hammering the site.
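
To put rough numbers on that claim: it is not stated here whether the 0.4-second delay is shared by all threads or applied inside each one, so the back-of-the-envelope sketch below covers both readings. This is plain arithmetic, not a measurement, and the helper `requests_per_hour` is a made-up name. Real throughput will be lower once network latency and server response time are included.

```python
def requests_per_hour(delay_s, threads=1):
    """Upper bound on requests per hour if each of `threads` workers
    waits `delay_s` seconds between its own requests."""
    return round(threads * 3600 / delay_s)


# Reading 1 (assumption): one shared 0.4 s delay across all threads.
shared = requests_per_hour(0.4)
print(shared)  # prints 9000

# Reading 2 (assumption): each of the 10 threads sleeps 0.4 s on its own.
per_thread = requests_per_hour(0.4, threads=10)
print(per_thread)  # prints 90000
```

Either reading is consistent with "thousands of athletes in under an hour"; the shared-delay reading is the gentler one for the site.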


🚀 Quick Start

1️⃣ Clone the repo

git clone https://github.com/Wydoinn/Olympedia-Athlete-Scraper.git
cd Olympedia-Athlete-Scraper
pip install -r requirements.txt

2️⃣ Run the scraper

# Start fresh
python scraper.py --start 1 --concurrency 10 --delay 0.4 --csv olympedia.csv

# Or resume where you left off
python scraper.py --resume

3️⃣ Open olympedia.csv and start exploring.


📂 The Data Format

Header and example row (wrapped here for readability; each is a single line in the file):

athlete_id,name,sex,height_cm,weight_kg,born_date,died_date,
born_city,born_region,born_country,died_city,died_region,died_country,
noc,games,year,sport,gold_medal,silver_medal,bronze_medal
19,Maurice Germot,M,178,68,1882-11-15,1958-01-06,
Vichy,Allier,FRA,Vichy,Allier,FRA,FRA,
Stockholm 1912,1912,Tennis,0,2,0
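
If you would rather not reach for Pandas right away, the standard library is enough to read rows like this one. The sketch below joins the header and the example row into single lines and converts the numeric columns. It assumes those columns hold whole numbers, as in the example; the real file may contain decimals or blanks, so treat `INT_FIELDS` and `parse_row` as illustrative names rather than part of the scraper.

```python
import csv
import io

# Columns assumed to be whole numbers, based on the example row only.
INT_FIELDS = ("athlete_id", "height_cm", "weight_kg", "year",
              "gold_medal", "silver_medal", "bronze_medal")


def parse_row(row):
    """Convert numeric columns to int, mapping empty cells to None."""
    parsed = dict(row)
    for field in INT_FIELDS:
        parsed[field] = int(row[field]) if row[field] else None
    return parsed


# The header and example row from the article, each on a single line.
text = (
    "athlete_id,name,sex,height_cm,weight_kg,born_date,died_date,"
    "born_city,born_region,born_country,died_city,died_region,died_country,"
    "noc,games,year,sport,gold_medal,silver_medal,bronze_medal\n"
    "19,Maurice Germot,M,178,68,1882-11-15,1958-01-06,"
    "Vichy,Allier,FRA,Vichy,Allier,FRA,FRA,"
    "Stockholm 1912,1912,Tennis,0,2,0\n"
)
athlete = parse_row(next(csv.DictReader(io.StringIO(text))))
print(athlete["name"], athlete["games"], athlete["silver_medal"])
```

Mapping empty cells to `None` keeps missing heights and weights from being mistaken for zeros in later averages.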

🧠 How It Works (in 20 Seconds)

  • Multi-threaded with ThreadPoolExecutor
  • Resumable with a progress.json checkpoint
  • Auto-stops after 1000 consecutive missing IDs
  • Parses HTML using BeautifulSoup
  • Writes CSV as it runs (so you can peek mid-scrape)
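
The pieces above can be sketched together in a few dozen lines. This is not the code from `scraper.py`, only a minimal illustration under stated assumptions: `fetch` is a hypothetical callable standing in for the HTTP request plus BeautifulSoup parsing, the checkpoint is assumed to hold a single `next_id` key, and the column list is trimmed to two fields. The demo uses a fake `fetch` so it runs without touching the network.

```python
import csv
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["athlete_id", "name"]  # trimmed column list for the sketch


def scrape(fetch, csv_path, progress_path, start=1, concurrency=10,
           max_missing=1000):
    """Fetch IDs in batches until `max_missing` consecutive IDs are missing.

    `fetch(athlete_id)` is a hypothetical callable returning a dict of
    fields, or None when the profile does not exist.
    """
    # Resume from the checkpoint if one exists (assumed format: {"next_id": N}).
    if os.path.exists(progress_path):
        with open(progress_path) as f:
            start = json.load(f)["next_id"]

    missing = 0
    next_id = start
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as out, \
            ThreadPoolExecutor(max_workers=concurrency) as pool:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        while missing < max_missing:
            batch = range(next_id, next_id + concurrency)
            # pool.map yields results in input order, so misses are counted by ID.
            for athlete_id, record in zip(batch, pool.map(fetch, batch)):
                if record is None:
                    missing += 1
                    if missing >= max_missing:
                        break
                    continue
                missing = 0
                writer.writerow(record)
            out.flush()  # make rows visible to anyone peeking mid-scrape
            next_id = batch.stop
            with open(progress_path, "w") as f:
                json.dump({"next_id": next_id}, f)
    return next_id


# Demo with a fake fetch: IDs 1-5 exist, everything after is missing.
def fake_fetch(athlete_id):
    if athlete_id <= 5:
        return {"athlete_id": athlete_id, "name": f"Athlete {athlete_id}"}
    return None


with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "olympedia.csv")
    demo_progress = os.path.join(tmp, "progress.json")
    last_id = scrape(fake_fetch, demo_csv, demo_progress,
                     concurrency=4, max_missing=3)
    with open(demo_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    print(len(rows), last_id)  # prints 5 9
```

Working in batches keeps the miss counter simple, at the cost of fetching up to one extra batch before the stop condition is noticed.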

🧹 A Note on Responsible Scraping

Please be respectful:

  • Keep a delay between requests
  • Don’t flood the server
  • Always credit the source (Olympedia)
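
One way to keep a delay between requests even with several threads running is a small shared throttle. The sketch below is an assumption about how such a limit could be enforced, not a description of what `scraper.py` does; the class name `Throttle` is made up. Each caller reserves the next free time slot under a lock, then sleeps outside the lock until that slot arrives.

```python
import threading
import time


class Throttle:
    """Allow at most one request every `min_interval` seconds across all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self):
        """Block until this caller's reserved time slot arrives."""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        time.sleep(max(0.0, slot - now))


throttle = Throttle(min_interval=0.05)  # the article's example uses 0.4
started = time.monotonic()
for _ in range(3):
    throttle.wait()  # call this right before each HTTP request
elapsed = time.monotonic() - started
print(f"{elapsed:.2f}s for 3 requests")
```

Because the sleep happens outside the lock, threads queue up for slots without blocking each other from reserving one.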

💬 What would you want to analyze first?
Drop a comment and let’s brainstorm!


Top comments (1)

jim ross

awesome