DEV Community

agenthustler
agenthustler

Posted on

How to Build a University Ranking Tracker with Web Scraping

Introduction

University rankings from QS, Times Higher Education, and US News shift every year — influencing student decisions, institutional funding, and academic reputation. Building a ranking tracker lets you monitor changes, spot trends, and compare institutions systematically.

In this guide, we'll create a Python scraper that tracks university rankings across multiple sources.

Project Setup

import requests
from bs4 import BeautifulSoup
import pandas as pd
import sqlite3
from datetime import datetime
import time

# Handle anti-bot protection on ranking sites
# Get your API key at https://www.scraperapi.com?fp_ref=the52
SCRAPER_API_KEY = "your_key_here"  # NOTE: replace with a real ScraperAPI key before running
BASE_URL = "http://api.scraperapi.com"  # every scrape request is proxied through this endpoint
Enter fullscreen mode Exit fullscreen mode

Scraping QS World Rankings

QS publishes rankings with detailed methodology scores:

def scrape_qs_rankings(year=2026):
    """Scrape the QS World University Rankings for one edition year.

    Args:
        year: Edition year used in the topuniversities.com URL path.

    Returns:
        List of dicts with keys rank, name, country, overall_score,
        source, year, scraped_at — one per university row found.

    Raises:
        requests.HTTPError: if the proxy API returns an error status.
        requests.Timeout: if the request exceeds the timeout.
    """
    url = f"https://www.topuniversities.com/university-rankings/world-university-rankings/{year}"

    params = {
        "api_key": SCRAPER_API_KEY,
        "url": url,
        "render": "true"  # table is built client-side, so JS rendering is needed
    }

    # Timeout keeps a stalled proxy from hanging the pipeline forever;
    # raise_for_status surfaces auth/quota errors instead of parsing an error page.
    response = requests.get(BASE_URL, params=params, timeout=120)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    def cell_text(row, selector):
        # select_one returns None for a missing cell; avoid AttributeError.
        node = row.select_one(selector)
        return node.text.strip() if node else ""

    universities = []
    for row in soup.select(".uni-row"):
        universities.append({
            "rank": cell_text(row, ".rank"),
            "name": cell_text(row, ".uni-name"),
            "country": cell_text(row, ".country"),
            "overall_score": cell_text(row, ".overall-score"),
            "source": "QS",
            "year": year,
            "scraped_at": datetime.now().isoformat()
        })

    return universities
Enter fullscreen mode Exit fullscreen mode

Scraping Times Higher Education Rankings

THE rankings use a different methodology and scoring:

def scrape_the_rankings(year=2026):
    """Scrape the Times Higher Education World University Rankings.

    Args:
        year: Edition year used in the timeshighereducation.com URL path.

    Returns:
        List of dicts with keys rank, name, country, teaching_score,
        research_score, source, year, scraped_at.

    Raises:
        requests.HTTPError: if the proxy API returns an error status.
        requests.Timeout: if the request exceeds the timeout.
    """
    url = f"https://www.timeshighereducation.com/world-university-rankings/{year}/world-ranking"

    # Residential proxies help with geo-restricted content
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": url,
        "render": "true"  # rankings table is rendered by JavaScript
    }

    # Bound the request and fail loudly on HTTP errors rather than
    # silently parsing an error page into zero rows.
    response = requests.get(BASE_URL, params=params, timeout=120)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    universities = []
    for row in soup.select("table.ranking-table tbody tr"):
        cols = row.select("td")
        # Skip malformed/spacer rows that don't have all expected cells.
        if len(cols) >= 5:
            universities.append({
                "rank": cols[0].text.strip(),
                "name": cols[1].text.strip(),
                "country": cols[2].text.strip(),
                "teaching_score": cols[3].text.strip(),
                "research_score": cols[4].text.strip(),
                "source": "THE",
                "year": year,
                "scraped_at": datetime.now().isoformat()
            })

    return universities
Enter fullscreen mode Exit fullscreen mode

Subject-Specific Rankings

def scrape_subject_rankings(subject="computer-science", year=2026):
    """Scrape QS subject-specific rankings for one academic field.

    Args:
        subject: Friendly subject key (see mapping below) or a raw QS
            URL slug for subjects not in the mapping.
        year: Edition year used in the URL (new parameter; defaults to
            the previously hard-coded 2026 for backward compatibility).

    Returns:
        List of dicts with keys rank, university, subject, score.

    Raises:
        requests.HTTPError: if the proxy API returns an error status.
        requests.Timeout: if the request exceeds the timeout.
    """
    # Friendly names -> QS URL slugs; unknown keys pass through as slugs.
    subjects = {
        "computer-science": "computer-science-and-information-systems",
        "engineering": "engineering-and-technology",
        "business": "business-and-management-studies",
        "medicine": "medicine"
    }

    slug = subjects.get(subject, subject)
    url = f"https://www.topuniversities.com/university-rankings/university-subject-rankings/{year}/{slug}"

    params = {"api_key": SCRAPER_API_KEY, "url": url, "render": "true"}
    # Timeout + status check: fail fast instead of hanging or parsing an error page.
    response = requests.get(BASE_URL, params=params, timeout=120)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    def cell_text(row, selector):
        # Missing cells yield None from select_one; return "" instead of crashing.
        node = row.select_one(selector)
        return node.text.strip() if node else ""

    results = []
    for row in soup.select(".uni-row"):
        results.append({
            "rank": cell_text(row, ".rank"),
            "university": cell_text(row, ".uni-name"),
            "subject": subject,
            "score": cell_text(row, ".overall-score")
        })

    return results
Enter fullscreen mode Exit fullscreen mode

Storing and Comparing Rankings

def init_database(db_path="rankings.db"):
    """Open (creating if necessary) the ranking-history SQLite database.

    Args:
        db_path: Filesystem path of the SQLite database file.

    Returns:
        An open sqlite3.Connection with the `rankings` table present.
    """
    schema = """
        CREATE TABLE IF NOT EXISTS rankings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            rank INTEGER,
            university TEXT,
            country TEXT,
            score REAL,
            source TEXT,
            year INTEGER,
            scraped_at TEXT
        )
    """
    connection = sqlite3.connect(db_path)
    # CREATE TABLE IF NOT EXISTS makes repeated initialization a no-op.
    connection.execute(schema)
    connection.commit()
    return connection

def compare_year_over_year(university, db_path="rankings.db"):
    """Report how a university's rank moved between its first and last
    recorded year, per ranking source.

    Args:
        university: Substring matched (SQL LIKE) against the university column.
        db_path: Path to the SQLite database created by init_database().

    Returns:
        DataFrame with columns year, source, rank, score ordered by
        source then year (may be empty if no rows match).
    """
    conn = sqlite3.connect(db_path)
    try:
        query = """
            SELECT year, source, rank, score
            FROM rankings
            WHERE university LIKE ?
            ORDER BY source, year
        """
        df = pd.read_sql(query, conn, params=[f"%{university}%"])
    finally:
        # Close even if the query fails (original leaked the connection on error).
        conn.close()

    # BUG FIX: scraped ranks are often text like "=12" or "601-650";
    # extract the first numeric run so the subtraction below can't crash.
    numeric_rank = pd.to_numeric(
        df["rank"].astype(str).str.extract(r"(\d+)")[0], errors="coerce"
    )

    for source in df["source"].unique():
        ranks = numeric_rank[df["source"] == source].dropna()
        if len(ranks) < 2:
            # Need at least two data points to compare year-over-year.
            continue
        change = ranks.iloc[-1] - ranks.iloc[0]
        if change == 0:
            print(f"{source}: unchanged")
        else:
            # Lower rank number is better, so a negative change is an improvement.
            direction = "improved" if change < 0 else "declined"
            print(f"{source}: {direction} by {abs(change):.0f} positions")

    return df
Enter fullscreen mode Exit fullscreen mode

Automated Tracking Pipeline

def run_tracking_pipeline():
    """Collect QS and THE rankings and append them to the SQLite history.

    BUG FIX: the scrapers emit dict keys ("name", "overall_score",
    "teaching_score", ...) that do not match the columns created by
    init_database() ("university", "score", ...); to_sql with
    if_exists="append" raises on unknown columns. Rows are now renamed
    and projected onto the table schema before inserting.
    """
    conn = init_database()
    try:
        # Columns of the `rankings` table created by init_database()
        # (the autoincrement id is filled in by SQLite).
        table_columns = ["rank", "university", "country", "score", "source", "year", "scraped_at"]

        def store(records, rename):
            # Map scraper keys onto table columns and drop anything
            # the table has no column for (e.g. THE's pillar scores).
            df = pd.DataFrame(records).rename(columns=rename)
            df = df[[c for c in table_columns if c in df.columns]]
            df.to_sql("rankings", conn, if_exists="append", index=False)

        print("Collecting QS rankings...")
        qs_data = scrape_qs_rankings()
        store(qs_data, {"name": "university", "overall_score": "score"})
        time.sleep(5)  # polite pause between sources

        print("Collecting THE rankings...")
        the_data = scrape_the_rankings()
        store(the_data, {"name": "university"})

        print(f"Stored {len(qs_data) + len(the_data)} rankings")
    finally:
        # Close the connection even if a scrape or insert fails.
        conn.close()

# Run the full collection pipeline only when executed as a script,
# not when imported as a module.
if __name__ == "__main__":
    run_tracking_pipeline()
Enter fullscreen mode Exit fullscreen mode

Conclusion

A university ranking tracker provides valuable data for students, researchers, and institutions. By collecting data from multiple sources and tracking changes over time, you can identify trends that single-year snapshots miss. Use ScraperAPI to handle the JavaScript rendering and anti-bot protections these sites employ.

Top comments (0)