How to Build a Scholarship Finder with Web Scraping

Finding scholarships is tedious — students spend hours browsing dozens of websites. What if you could automate that? In this tutorial, we'll build a Python scholarship finder that scrapes multiple scholarship databases and aggregates results.

The Problem

Scholarship information is scattered across hundreds of websites: Fastweb, Scholarships.com, university portals, and government databases. Each has different formats, search interfaces, and update schedules. A scraper can check them all in minutes.

Architecture Overview

Our scholarship finder will:

  1. Scrape multiple scholarship listing sites
  2. Extract key details (name, amount, deadline, eligibility)
  3. Filter by criteria (field of study, GPA, location)
  4. Store results in a structured format
  5. Send email alerts for new matches

Setting Up

pip install requests beautifulsoup4 pandas schedule

Building the Scraper

Let's start with a base scraper class that handles common patterns:

import requests
from bs4 import BeautifulSoup
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

@dataclass
class Scholarship:
    name: str
    amount: str
    deadline: Optional[str]
    eligibility: str
    url: str
    source: str

class ScholarshipScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; ScholarshipBot/1.0)"
        })
        self.proxy_url = proxy_url

    def fetch(self, url):
        # Route through the proxy API when one is configured;
        # URL-encode the target so its own query string survives
        if self.proxy_url:
            url = f"{self.proxy_url}&url={quote(url, safe='')}"
        response = self.session.get(url, timeout=30)
        response.raise_for_status()
        return response

    def parse_scholarships(self, html):
        raise NotImplementedError

Scraping Scholarship Listings

class ScholarshipsComScraper(ScholarshipScraper):
    BASE_URL = "https://www.scholarships.com/financial-aid/college-scholarships/scholarship-directory"

    def search(self, category="computer-science"):
        url = f"{self.BASE_URL}/{category}"
        response = self.fetch(url)
        soup = BeautifulSoup(response.text, "html.parser")

        scholarships = []
        # Class names here reflect the page at the time of writing;
        # re-inspect the markup if results come back empty
        listings = soup.find_all("div", class_="scholarship-item")

        for listing in listings:
            title_el = listing.find("h3") or listing.find("a", class_="title")
            amount_el = listing.find("span", class_="amount")
            deadline_el = listing.find("span", class_="deadline")

            if title_el:
                link = title_el.find("a")
                scholarships.append(Scholarship(
                    name=title_el.get_text(strip=True),
                    amount=amount_el.get_text(strip=True) if amount_el else "Varies",
                    deadline=deadline_el.get_text(strip=True) if deadline_el else None,
                    eligibility=category,
                    url=link["href"] if link and link.has_attr("href") else url,
                    source="scholarships.com"
                ))
        return scholarships
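You can smoke-test this scraper on its own before wiring up the aggregator (with the caveat that the selectors above need to match the live markup, so an empty list usually means they've drifted):

scraper = ScholarshipsComScraper()
for s in scraper.search("computer-science")[:5]:
    print(f"{s.name} ({s.amount}) - {s.url}")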

Multi-Source Aggregation

The real power comes from combining multiple sources:

import time

class ScholarshipAggregator:
    def __init__(self, scraper_api_key=None):
        proxy = f"http://api.scraperapi.com?api_key={scraper_api_key}" if scraper_api_key else None
        self.scrapers = [
            ScholarshipsComScraper(proxy_url=proxy),
        ]
        self.all_scholarships = []

    def search_all(self, category):
        for scraper in self.scrapers:
            try:
                results = scraper.search(category)
                self.all_scholarships.extend(results)
                time.sleep(2)
            except Exception as e:
                print(f"Error with {scraper.__class__.__name__}: {e}")
        return self.all_scholarships

    def filter_by_amount(self, min_amount=1000):
        filtered = []
        for s in self.all_scholarships:
            try:
                amount = int(s.amount.replace("$", "").replace(",", ""))
                if amount >= min_amount:
                    filtered.append(s)
            except ValueError:
                filtered.append(s)  # Keep "Varies" entries
        return filtered

agg = ScholarshipAggregator(scraper_api_key="YOUR_KEY")
results = agg.search_all("computer-science")
print(f"Found {len(results)} scholarships")

high_value = agg.filter_by_amount(5000)
for s in high_value:
    print(f"{s.name} - {s.amount} - Deadline: {s.deadline}")
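The scrapers list above has only one entry, but adding a source is just another subclass. Here's a hypothetical FastwebScraper stub; the URL path and CSS selectors are placeholders you'd replace after inspecting the real pages:

class FastwebScraper(ScholarshipScraper):
    BASE_URL = "https://www.fastweb.com/college-scholarships"

    def search(self, category):
        response = self.fetch(f"{self.BASE_URL}?query={category}")
        soup = BeautifulSoup(response.text, "html.parser")
        results = []
        # Placeholder selectors: inspect the live page and adjust
        for listing in soup.find_all("article", class_="scholarship"):
            title = listing.find("h2")
            if title:
                results.append(Scholarship(
                    name=title.get_text(strip=True),
                    amount="Varies",
                    deadline=None,
                    eligibility=category,
                    url=self.BASE_URL,
                    source="fastweb.com",
                ))
        return results

Append an instance to self.scrapers in ScholarshipAggregator.__init__ and search_all picks it up automatically.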

Adding Email Alerts

import smtplib
import time
from email.mime.text import MIMEText
import schedule

def send_alert(scholarships, recipient):
    body = "New Scholarships Found:\n\n"
    for s in scholarships:
        body += f"- {s.name}: {s.amount} (Deadline: {s.deadline})\n  {s.url}\n\n"

    msg = MIMEText(body)
    msg["Subject"] = f"{len(scholarships)} New Scholarships Found"
    msg["From"] = "alerts@yourapp.com"
    msg["To"] = recipient

    # Assumes a local SMTP relay on the default port; for a real
    # provider, connect on port 587 and call server.starttls() and
    # server.login() before sending
    with smtplib.SMTP("localhost", 25) as server:
        server.send_message(msg)

def daily_check():
    agg = ScholarshipAggregator()
    results = agg.search_all("computer-science")
    if results:
        send_alert(results, "student@example.com")

schedule.every().day.at("08:00").do(daily_check)

# schedule only fires inside run_pending(), so keep the process alive
while True:
    schedule.run_pending()
    time.sleep(60)

Storing Results

import pandas as pd
import json

def save_results(scholarships, filename="scholarships"):
    data = [vars(s) for s in scholarships]
    df = pd.DataFrame(data)
    df.to_csv(f"{filename}.csv", index=False)

    with open(f"{filename}.json", "w") as f:
        json.dump(data, f, indent=2)

    print(f"Saved {len(scholarships)} scholarships")
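Step 5 of the architecture calls for alerting only on new matches, not everything found. A minimal sketch that builds on save_results: load the previous JSON snapshot, key each entry by name + source, and flag anything unseen. find_new is a helper of my own, and results is the list from the aggregator example above:

import json
import os

def find_new(scholarships, filename="scholarships"):
    seen = set()
    if os.path.exists(f"{filename}.json"):
        with open(f"{filename}.json") as f:
            seen = {(s["name"], s["source"]) for s in json.load(f)}
    # Anything absent from the previous snapshot counts as new
    return [s for s in scholarships if (s.name, s.source) not in seen]

new_matches = find_new(results)
if new_matches:
    send_alert(new_matches, "student@example.com")
save_results(results)  # snapshot for the next comparison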

Handling Anti-Bot Measures

Many scholarship sites use Cloudflare or similar protection. A proxy service like ScraperAPI handles JavaScript rendering, CAPTCHA solving, and IP rotation automatically. For residential proxies, ThorData offers geo-targeted IPs that blend in with regular traffic.
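Even with a proxy in front, individual requests still fail now and then. A small retry-with-backoff wrapper (my own helper, not part of either service's API) keeps one transient error from killing a whole run:

import time
import requests

def fetch_with_retry(scraper, url, retries=3, backoff=2.0):
    # Exponential backoff: waits 2s, then 4s, then 8s between attempts
    for attempt in range(retries):
        try:
            return scraper.fetch(url)
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))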

Scaling Tips

  1. Run on a schedule — check daily for new scholarships
  2. Deduplicate — use scholarship name + source as a unique key (see the sketch after this list)
  3. Track deadlines — remove expired scholarships automatically (also covered in the sketch below)
  4. Monitor with ScrapeOps — track success rates across your scrapers
  5. Cache aggressively — most scholarship pages update weekly at most
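Tips 2 and 3 fit in a few lines each. A sketch, assuming deadlines parse as e.g. "June 1, 2025" (real listings vary, so unparseable dates are kept rather than silently dropped):

from datetime import datetime

def dedupe(scholarships):
    # Tip 2: scholarship name + source as the unique key
    seen, unique = set(), []
    for s in scholarships:
        key = (s.name, s.source)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def drop_expired(scholarships, fmt="%B %d, %Y"):
    # Tip 3: keep entries with no deadline or a future one
    keep = []
    for s in scholarships:
        if s.deadline:
            try:
                if datetime.strptime(s.deadline, fmt) < datetime.now():
                    continue
            except ValueError:
                pass  # unparseable deadline: keep it
        keep.append(s)
    return keep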

Conclusion

A scholarship finder is a practical project that combines web scraping fundamentals with real-world utility. Students can save hours of manual searching, and you'll learn patterns that apply to any aggregation project. The key is building modular scrapers that can adapt as sites change their HTML structure.

Happy scraping!
