Finding scholarships is tedious — students spend hours browsing dozens of websites. What if you could automate that? In this tutorial, we'll build a Python scholarship finder that scrapes multiple scholarship databases and aggregates results.
The Problem
Scholarship information is scattered across hundreds of websites: Fastweb, Scholarships.com, university portals, and government databases. Each has different formats, search interfaces, and update schedules. A scraper can check them all in minutes.
Architecture Overview
Our scholarship finder will:
- Scrape multiple scholarship listing sites
- Extract key details (name, amount, deadline, eligibility)
- Filter by criteria (field of study, GPA, location)
- Store results in a structured format
- Send email alerts for new matches
Setting Up
```shell
pip install requests beautifulsoup4 pandas schedule
```
Building the Scraper
Let's start with a base scraper class that handles common patterns:
```python
import requests
from bs4 import BeautifulSoup
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class Scholarship:
    name: str
    amount: str
    deadline: Optional[str]
    eligibility: str
    url: str
    source: str


class ScholarshipScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; ScholarshipBot/1.0)"
        })
        self.proxy_url = proxy_url

    def fetch(self, url):
        if self.proxy_url:
            # URL-encode the target so its own query string survives the proxy hop
            api_url = f"{self.proxy_url}&url={quote(url, safe='')}"
            return self.session.get(api_url)
        return self.session.get(url)

    def parse_scholarships(self, html):
        raise NotImplementedError
```
Scraping Scholarship Listings
```python
class ScholarshipsComScraper(ScholarshipScraper):
    BASE_URL = "https://www.scholarships.com/financial-aid/college-scholarships/scholarship-directory"

    def search(self, category="computer-science"):
        url = f"{self.BASE_URL}/{category}"
        response = self.fetch(url)
        soup = BeautifulSoup(response.text, "html.parser")

        scholarships = []
        listings = soup.find_all("div", class_="scholarship-item")
        for listing in listings:
            title_el = listing.find("h3") or listing.find("a", class_="title")
            amount_el = listing.find("span", class_="amount")
            deadline_el = listing.find("span", class_="deadline")
            if title_el:
                # title_el may itself be the <a>, or contain one
                link = title_el if title_el.name == "a" else title_el.find("a")
                scholarships.append(Scholarship(
                    name=title_el.get_text(strip=True),
                    amount=amount_el.get_text(strip=True) if amount_el else "Varies",
                    deadline=deadline_el.get_text(strip=True) if deadline_el else None,
                    eligibility=category,
                    url=link["href"] if link and link.has_attr("href") else url,
                    source="scholarships.com"
                ))
        return scholarships
```
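Adding a second source is just another subclass with its own parsing logic: point it at a different site and map that site's markup onto the shared `Scholarship` dataclass. Below is a sketch of the parsing half of such a scraper. The `award`/`award-amount` selectors and the Fastweb-style markup are illustrative assumptions, not the real site's structure; inspect the live page and adjust. (The dataclass is redefined here so the sketch runs on its own.)

```python
from bs4 import BeautifulSoup
from dataclasses import dataclass
from typing import Optional


# Minimal copy of the Scholarship dataclass so this sketch is self-contained;
# in the real project, reuse the one defined above.
@dataclass
class Scholarship:
    name: str
    amount: str
    deadline: Optional[str]
    eligibility: str
    url: str
    source: str


def parse_fastweb_listings(html, category):
    """Map a (hypothetical) Fastweb-style results page onto Scholarship.

    The "award" / "award-amount" class names are assumptions for
    illustration, not Fastweb's real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.find_all("li", class_="award"):
        title = card.find("a")
        if not title:
            continue
        amount = card.find("span", class_="award-amount")
        results.append(Scholarship(
            name=title.get_text(strip=True),
            amount=amount.get_text(strip=True) if amount else "Varies",
            deadline=None,
            eligibility=category,
            url=title.get("href", ""),
            source="fastweb.com",
        ))
    return results


# Quick check against a hand-written snippet of that assumed markup:
sample = """
<ul>
  <li class="award"><a href="/a/1">STEM Leaders Award</a>
      <span class="award-amount">$2,500</span></li>
  <li class="award"><a href="/a/2">First-Gen Grant</a></li>
</ul>
"""
parsed = parse_fastweb_listings(sample, "computer-science")
print(len(parsed), parsed[0].amount, parsed[1].amount)  # 2 $2,500 Varies
```

Wrapping this in a `search()` method that calls `self.fetch()` follows the same shape as `ScholarshipsComScraper` above.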
Multi-Source Aggregation
The real power comes from combining multiple sources:
```python
import time


class ScholarshipAggregator:
    def __init__(self, scraper_api_key=None):
        proxy = f"http://api.scraperapi.com?api_key={scraper_api_key}" if scraper_api_key else None
        self.scrapers = [
            ScholarshipsComScraper(proxy_url=proxy),
        ]
        self.all_scholarships = []

    def search_all(self, category):
        for scraper in self.scrapers:
            try:
                results = scraper.search(category)
                self.all_scholarships.extend(results)
                time.sleep(2)  # be polite between sources
            except Exception as e:
                print(f"Error with {scraper.__class__.__name__}: {e}")
        return self.all_scholarships

    def filter_by_amount(self, min_amount=1000):
        filtered = []
        for s in self.all_scholarships:
            try:
                amount = int(s.amount.replace("$", "").replace(",", ""))
                if amount >= min_amount:
                    filtered.append(s)
            except ValueError:
                filtered.append(s)  # keep "Varies" entries
        return filtered


agg = ScholarshipAggregator(scraper_api_key="YOUR_KEY")
results = agg.search_all("computer-science")
print(f"Found {len(results)} scholarships")

high_value = agg.filter_by_amount(5000)
for s in high_value:
    print(f"{s.name} - {s.amount} - Deadline: {s.deadline}")
```
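One wrinkle: `filter_by_amount` pushes ranges like "$1,000 - $5,000" into the `ValueError` branch, because `int()` can't parse them. A small helper that pulls every dollar figure out of the string handles ranges and "Varies" more gracefully. This is a sketch, not tied to any particular site's formatting:

```python
import re


def parse_amount(text):
    """Return the largest dollar figure found in an amount string, or None.

    Handles single amounts ("$2,500"), ranges ("$1,000 - $5,000"), and
    non-numeric values ("Varies") better than a bare int() conversion.
    """
    figures = [
        int(m.replace(",", ""))
        for m in re.findall(r"[\d,]+", text)
        if m.replace(",", "").isdigit()
    ]
    return max(figures) if figures else None


print(parse_amount("$1,000 - $5,000"))  # 5000
print(parse_amount("Varies"))           # None
```

Inside `filter_by_amount` you could then compare `parse_amount(s.amount)` against `min_amount`, still keeping `None` ("Varies") entries.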
Adding Email Alerts
```python
import smtplib
from email.mime.text import MIMEText
import schedule
import time


def send_alert(scholarships, recipient):
    body = "New Scholarships Found:\n\n"
    for s in scholarships:
        body += f"- {s.name}: {s.amount} (Deadline: {s.deadline})\n  {s.url}\n\n"

    msg = MIMEText(body)
    msg["Subject"] = f"{len(scholarships)} New Scholarships Found"
    msg["From"] = "alerts@yourapp.com"
    msg["To"] = recipient

    # Port 25 works for a local relay; for a real provider on port 587,
    # add server.starttls() and server.login() with your credentials.
    with smtplib.SMTP("localhost", 25) as server:
        server.send_message(msg)


def daily_check():
    agg = ScholarshipAggregator()
    results = agg.search_all("computer-science")
    if results:
        send_alert(results, "student@example.com")


schedule.every().day.at("08:00").do(daily_check)

# schedule only registers the job; a loop is needed to actually run it
while True:
    schedule.run_pending()
    time.sleep(60)
```
Storing Results
```python
import pandas as pd
import json


def save_results(scholarships, filename="scholarships"):
    data = [vars(s) for s in scholarships]
    df = pd.DataFrame(data)
    df.to_csv(f"{filename}.csv", index=False)
    with open(f"{filename}.json", "w") as f:
        json.dump(data, f, indent=2)
    print(f"Saved {len(scholarships)} scholarships")
```
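For daily alerts to stay useful, you only want to email scholarships that weren't there yesterday. One minimal approach, sketched below, persists (name, source) keys to a small JSON file between runs; `find_new` and the `seen.json` filename are illustrative names, not part of any library:

```python
import json
import os


def find_new(scholarships, seen_file="seen.json"):
    """Return only scholarships not seen on a previous run.

    Identity is the (name, source) pair, persisted to a small JSON file
    between runs so repeat listings do not trigger repeat alerts.
    """
    seen = set()
    if os.path.exists(seen_file):
        with open(seen_file) as f:
            seen = {tuple(pair) for pair in json.load(f)}

    new = [s for s in scholarships if (s.name, s.source) not in seen]
    seen.update((s.name, s.source) for s in new)
    with open(seen_file, "w") as f:
        json.dump(sorted(seen), f)
    return new


# Demo with stand-in records (only .name and .source matter here):
from types import SimpleNamespace
import tempfile

demo = [
    SimpleNamespace(name="STEM Award", source="scholarships.com"),
    SimpleNamespace(name="Arts Grant", source="scholarships.com"),
]
path = os.path.join(tempfile.mkdtemp(), "seen.json")
first = find_new(demo, path)   # both new on the first run
second = find_new(demo, path)  # none new on the second
print(len(first), len(second))  # 2 0
```

Wiring `find_new` into `daily_check` means `send_alert` fires only when there is something genuinely new to report.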
Handling Anti-Bot Measures
Many scholarship sites use Cloudflare or similar protection. A proxy service like ScraperAPI handles JavaScript rendering, CAPTCHA solving, and IP rotation automatically. For residential proxies, ThorData offers geo-targeted IPs that blend in with regular traffic.
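Even behind a proxy, individual requests still fail occasionally (timeouts, 429s, 5xx responses). A retry-with-backoff wrapper keeps one flaky request from sinking a whole run. This is a sketch; the retryable status codes and delays are reasonable defaults, not a standard:

```python
import random
import time

import requests

RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt):
    # 2s, 4s, 8s... plus jitter so parallel retries do not synchronize
    return 2 ** (attempt + 1) + random.random()


def fetch_with_retries(session, url, max_retries=3):
    """Retry transient failures with exponential backoff.

    Returns the response on success or a non-retryable error (e.g. 404);
    returns None if every attempt failed.
    """
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=30)
            if response.status_code not in RETRYABLE:
                return response
        except requests.RequestException:
            pass
        time.sleep(backoff_delay(attempt))
    return None
```

You could call this from `ScholarshipScraper.fetch` in place of the bare `session.get` to make every scraper more resilient.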
Scaling Tips
- Run on a schedule — check daily for new scholarships
- Deduplicate — use scholarship name + source as a unique key
- Track deadlines — remove expired scholarships automatically
- Monitor with ScrapeOps — use ScrapeOps to track success rates across your scrapers
- Cache aggressively — most scholarship pages update weekly at most
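The deduplication and deadline tips above can be sketched as two small helpers. The MM/DD/YYYY deadline format is an assumption; adjust `fmt` to whatever each source actually emits:

```python
from datetime import datetime


def dedupe(scholarships):
    """Drop duplicates, using (name, source) as the unique key."""
    seen, unique = set(), []
    for s in scholarships:
        key = (s.name, s.source)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def prune_expired(scholarships, today=None, fmt="%m/%d/%Y"):
    """Remove entries whose deadline has passed.

    Entries with no deadline, or one that will not parse, are kept
    rather than silently dropped.
    """
    today = today or datetime.now()
    kept = []
    for s in scholarships:
        if s.deadline:
            try:
                if datetime.strptime(s.deadline, fmt) < today:
                    continue
            except ValueError:
                pass
        kept.append(s)
    return kept


# Quick check with stand-in records:
from types import SimpleNamespace

rows = [
    SimpleNamespace(name="A", source="x", deadline="01/15/2020"),
    SimpleNamespace(name="A", source="x", deadline=None),
    SimpleNamespace(name="B", source="x", deadline=None),
]
current = prune_expired(dedupe(rows), today=datetime(2024, 1, 1))
print([r.name for r in current])  # ['B']
```

Running both after `search_all` keeps the stored results, and the email alerts, free of repeats and dead deadlines.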
Conclusion
A scholarship finder is a practical project that combines web scraping fundamentals with real-world utility. Students can save hours of manual searching, and you'll learn patterns that apply to any aggregation project. The key is building modular scrapers that can adapt as sites change their HTML structure.
Happy scraping!