I’ve always been fascinated by the Olympics.
The stories, the records, the triumphs… but when I went looking for a clean dataset of every athlete in history, I hit a wall.
Sure, there’s Olympedia.org — an incredible resource — but no “Download” button.
So I decided:
If the dataset doesn’t exist, I’ll build it myself.
The result? A Python scraper that can pull every athlete profile from 1896 to today — perfect for data analysis and visualization projects.
📌 What This Script Does
With one command, you get:
✅ Name, gender, height, weight
✅ Birth & death info (date, city, country)
✅ National Olympic Committee (NOC)
✅ Last Olympic Games and sport
✅ Medal counts (gold, silver, bronze)
Saved neatly in a CSV ready for Pandas or Excel.
📊 What You Can Do With It
This isn’t just about scraping.
Once you have the data, you can:
- Visualize medal trends over decades
- Explore which sports certain countries dominate
- Study athlete physique trends (height/weight) over time
- Map birthplaces of medalists with GeoPandas
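To make the first idea concrete, here is a minimal sketch of a decade-level medal chart. It assumes the olympedia.csv produced by the scraper and the column names shown later in this post, and it attributes each athlete's career medal total to the decade of their last Games, which is a simplification.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the scraped CSV (column names as listed in "The Data Format" below).
df = pd.read_csv("olympedia.csv")

# Career medal total per athlete, bucketed by the decade of their last Games.
df["total_medals"] = df[["gold_medal", "silver_medal", "bronze_medal"]].sum(axis=1)
df["decade"] = (df["year"] // 10) * 10

medals_by_decade = df.groupby("decade")["total_medals"].sum()

medals_by_decade.plot(kind="bar", title="Career medals by decade of last Games")
plt.xlabel("Decade")
plt.ylabel("Medals")
plt.tight_layout()
plt.show()
```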
⚡ How Fast Is It?
With 10 threads and a 0.4-second delay per request, you can scrape thousands of athletes in under an hour, without hammering the site.
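A rough sanity check, assuming the 0.4 s delay is applied per worker thread: 10 workers each pausing 0.4 s between requests can issue at most 10 / 0.4 = 25 requests per second, or about 90,000 per hour before network latency, so even a fraction of that rate covers thousands of profiles comfortably.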
🚀 Quick Start
1️⃣ Clone the repo and install the dependencies
git clone https://github.com/Wydoinn/Olympedia-Athlete-Scraper.git
cd Olympedia-Athlete-Scraper
pip install -r requirements.txt
2️⃣ Run the scraper
# Start fresh
python scraper.py --start 1 --concurrency 10 --delay 0.4 --csv olympedia.csv
# Or resume where you left off
python scraper.py --resume
3️⃣ Open olympedia.csv and start exploring.
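A first look in pandas can be as simple as this (just a sketch; the column names are the ones documented in the next section):

```python
import pandas as pd

df = pd.read_csv("olympedia.csv")

print(df.shape)                            # how many athletes made it into the file
print(df.head())                           # a first glance at the rows
print(df["noc"].value_counts().head(10))   # most common National Olympic Committees
```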
📂 The Data Format
Header and an example row:
athlete_id,name,sex,height_cm,weight_kg,born_date,died_date,born_city,born_region,born_country,died_city,died_region,died_country,noc,games,year,sport,gold_medal,silver_medal,bronze_medal
19,Maurice Germot,M,178,68,1882-11-15,1958-01-06,Vichy,Allier,FRA,Vichy,Allier,FRA,FRA,Stockholm 1912,1912,Tennis,0,2,0
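The date columns in the example are ISO-formatted and the medal columns are plain integers, so a typed load is straightforward. This is a sketch of my own, with errors="coerce" to tolerate partial or missing dates:

```python
import pandas as pd

df = pd.read_csv("olympedia.csv")

# Coerce anything that is not a full ISO date (or is empty) to NaT.
for col in ("born_date", "died_date"):
    df[col] = pd.to_datetime(df[col], errors="coerce")

df["total_medals"] = df[["gold_medal", "silver_medal", "bronze_medal"]].sum(axis=1)
df["lifespan_years"] = (df["died_date"] - df["born_date"]).dt.days / 365.25

print(df.loc[df["name"] == "Maurice Germot",
             ["born_date", "died_date", "lifespan_years", "total_medals"]])
```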
🧠 How It Works (in 20 Seconds)
- Multi-threaded with ThreadPoolExecutor
- Resumable via a progress.json checkpoint
- Auto-stops after 1000 consecutive missing IDs
- Parses HTML using BeautifulSoup
- Writes the CSV as it runs (so you can peek mid-scrape)
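For anyone curious what that loop looks like in code, here is a stripped-down sketch of the same pattern. It is not the repo's actual implementation: the athlete URL pattern, the progress.json fields, and the two-column output are assumptions made for brevity.

```python
import csv
import json
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.olympedia.org/athletes/{}"  # assumed URL pattern
CHECKPOINT = Path("progress.json")                  # assumed layout: {"next_id": ...}
DELAY = 0.4        # seconds each worker sleeps before a request
WORKERS = 10
MISS_LIMIT = 1000  # stop after this many consecutive missing IDs


def fetch(athlete_id):
    """Fetch one profile and pull a couple of fields; return None if the ID is missing."""
    time.sleep(DELAY)
    resp = requests.get(BASE_URL.format(athlete_id), timeout=30)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    heading = soup.find("h1")
    return {"athlete_id": athlete_id,
            "name": heading.get_text(strip=True) if heading else ""}


def main():
    next_id = 1
    if CHECKPOINT.exists():  # resume where the last run stopped
        next_id = json.loads(CHECKPOINT.read_text())["next_id"]

    misses = 0
    with open("olympedia.csv", "a", newline="", encoding="utf-8") as f, \
            ThreadPoolExecutor(max_workers=WORKERS) as pool:
        writer = csv.DictWriter(f, fieldnames=["athlete_id", "name"])
        if f.tell() == 0:  # brand-new file: write the header once
            writer.writeheader()

        while misses < MISS_LIMIT:
            ids = range(next_id, next_id + WORKERS)
            for row in pool.map(fetch, ids):  # results come back in ID order
                if row is None:
                    misses += 1
                else:
                    misses = 0
                    writer.writerow(row)      # CSV grows while the scrape runs
            f.flush()
            next_id += WORKERS
            CHECKPOINT.write_text(json.dumps({"next_id": next_id}))


if __name__ == "__main__":
    main()
```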
🧹 A Note on Responsible Scraping
Please be respectful:
- Keep a delay between requests
- Don’t flood the server
- Always credit the source (Olympedia)
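If you fork the scraper or roll your own, a small wrapper like this is an easy way to honor the first two points; the User-Agent string is only an example, so put your own contact details there:

```python
import time

import requests

session = requests.Session()
# Identify yourself so the site operators can reach you if needed (example value).
session.headers["User-Agent"] = "olympedia-hobby-scraper (contact: you@example.com)"


def polite_get(url, delay=0.4):
    """GET with a built-in pause so requests never hammer the server."""
    time.sleep(delay)
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    return resp
```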
💬 What would you want to analyze first?
Drop a comment and let’s brainstorm!