Introduction
University rankings from QS, Times Higher Education, and US News shift every year — influencing student decisions, institutional funding, and academic reputation. Building a ranking tracker lets you monitor changes, spot trends, and compare institutions systematically.
In this guide, we'll create a Python scraper that tracks university rankings across multiple sources.
Project Setup
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
Scraping QS World Rankings
QS publishes its rankings together with the detailed scores behind its methodology.
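The QS scraper itself is not included in this guide. As a rough illustration of the parsing step only, here is a minimal sketch that turns an already-rendered HTML table into rows matching the columns of the `rankings` table used later. The table layout (rank, university, country, and score cells, in that order), the `RankingTableParser` class, and the `parse_ranking_table` function are assumptions made for this example; the real QS markup is not described here and will differ.

```python
from html.parser import HTMLParser


class RankingTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_ranking_table(html, source, year):
    """Convert a four-column ranking table into a list of row dicts."""
    parser = RankingTableParser()
    parser.feed(html)
    parser.close()
    records = []
    for rank, university, country, score in parser.rows:
        records.append({
            "rank": int(rank),
            "university": university,
            "country": country,
            "score": float(score),
            "source": source,
            "year": year,
        })
    return records


# Hypothetical markup and placeholder institutions, for illustration only.
sample_html = """
<table>
  <tr><th>Rank</th><th>University</th><th>Country</th><th>Score</th></tr>
  <tr><td>1</td><td>Example University</td><td>Exampleland</td><td>98.5</td></tr>
  <tr><td>2</td><td>Sample Institute</td><td>Samplestan</td><td>97.0</td></tr>
</table>
"""
for record in parse_ranking_table(sample_html, source="QS", year=2024):
    print(record)
```

Using the standard-library parser keeps the sketch dependency-free. It only works on HTML that has already been rendered, so pages built by JavaScript would need a rendering step first.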
Scraping Times Higher Education Rankings
Times Higher Education (THE) rankings use a different methodology and scoring, so THE scores should not be compared directly with QS scores.
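Because each source describes its results differently, records usually need to be mapped onto one shared shape before they go into the same table. The sketch below shows one way to do that. The field names in `the_fields`, the helpers `parse_rank`, `parse_score`, and `normalize_record`, and the sample record are all hypothetical; the assumption that ranks can arrive as strings such as `=5` (a tie) or `201-250` (a band) should be checked against the data you actually collect.

```python
import re

COLUMNS = ("rank", "university", "country", "score")


def parse_rank(text):
    """Return the first integer in a rank string such as '=5' or '201-250'."""
    match = re.search(r"\d+", str(text))
    return int(match.group()) if match else None


def parse_score(value):
    """Return the score as a float, or None if it is not a single number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def normalize_record(raw, field_map, source, year):
    """Map a source-specific record onto the shared ranking columns."""
    record = {column: raw.get(field_map[column]) for column in COLUMNS}
    record["rank"] = parse_rank(record["rank"])
    record["score"] = parse_score(record["score"])
    record["source"] = source
    record["year"] = year
    return record


# Hypothetical field names for one source; adjust to the real payload.
the_fields = {
    "rank": "rank_display",
    "university": "name",
    "country": "location",
    "score": "overall_score",
}
raw_record = {
    "rank_display": "=5",
    "name": "Example University",
    "location": "Exampleland",
    "overall_score": "91.2",
}
print(normalize_record(raw_record, the_fields, source="THE", year=2024))
```

Taking the first number of a band stores `201-250` as 201, which keeps the column sortable but loses the width of the band. Keeping the original string in a separate column would be a reasonable alternative.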
Subject-Specific Rankings
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
Storing and Comparing Rankings
import sqlite3

import pandas as pd


def init_database(db_path="rankings.db"):
    """Initialize SQLite database for ranking history."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS rankings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            rank INTEGER,
            university TEXT,
            country TEXT,
            score REAL,
            source TEXT,
            year INTEGER,
            scraped_at TEXT
        )
    """)
    conn.commit()
    return conn


def compare_year_over_year(university, db_path="rankings.db"):
    """Track how a university's ranking changed over years."""
    conn = sqlite3.connect(db_path)
    query = """
        SELECT year, source, rank, score
        FROM rankings
        WHERE university LIKE ?
        ORDER BY source, year
    """
    # LIKE with wildcards matches partial names, so a short search term
    # can pull in more than one institution.
    df = pd.read_sql(query, conn, params=[f"%{university}%"])
    conn.close()

    for source in df["source"].unique():
        subset = df[df["source"] == source]
        # Rows are ordered by year, so the first and last rows are the
        # earliest and latest entries for this source.
        change = subset["rank"].iloc[-1] - subset["rank"].iloc[0]
        if change == 0:
            print(f"{source}: no change in position")
        else:
            # A lower rank number is better, so a negative change is an improvement.
            direction = "improved" if change < 0 else "declined"
            print(f"{source}: {direction} by {abs(change)} positions")
    return df
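Year-over-year changes cover one institution at a time. For a side-by-side view of how different sources rank institutions in a single year, a minimal sketch might look like the following. `compare_sources` is a hypothetical helper, not part of the original code. It assumes a `rankings` table with the columns defined above and that every source spells a university's name identically, which may not hold for real data.

```python
import sqlite3


def compare_sources(conn, year):
    """Return {university: {source: rank}} for a single ranking year."""
    rows = conn.execute(
        """
        SELECT university, source, rank
        FROM rankings
        WHERE year = ?
        ORDER BY university, source
        """,
        (year,),
    ).fetchall()
    table = {}
    for university, source, rank in rows:
        table.setdefault(university, {})[source] = rank
    return table


# Demo on an in-memory database with placeholder rows (not real ranking data).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE rankings (rank INTEGER, university TEXT, country TEXT, "
    "score REAL, source TEXT, year INTEGER, scraped_at TEXT)"
)
demo_conn.executemany(
    "INSERT INTO rankings (rank, university, source, year) VALUES (?, ?, ?, ?)",
    [
        (1, "Example University", "QS", 2024),
        (3, "Example University", "THE", 2024),
        (2, "Sample Institute", "QS", 2024),
    ],
)
for university, ranks in compare_sources(demo_conn, 2024).items():
    print(university, ranks)
demo_conn.close()
```

An institution that appears in only one source simply has a single entry in its inner dictionary, which makes gaps in coverage easy to spot.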
Automated Tracking Pipeline
import time


def run_tracking_pipeline():
    """Run full ranking collection pipeline."""
    # Monitor scraping performance
    # Track success rates: https://scrapeops.io/?fpr=the-data28
    conn = init_database()
    try:
        # scrape_qs_rankings() and scrape_the_rankings() are not defined in
        # this guide; each is expected to return a list of dicts keyed by
        # the rankings table columns.
        print("Collecting QS rankings...")
        qs_data = scrape_qs_rankings()
        pd.DataFrame(qs_data).to_sql("rankings", conn, if_exists="append", index=False)

        # Pause between sources to keep the request rate low.
        time.sleep(5)

        print("Collecting THE rankings...")
        the_data = scrape_the_rankings()
        pd.DataFrame(the_data).to_sql("rankings", conn, if_exists="append", index=False)

        print(f"Stored {len(qs_data) + len(the_data)} rankings")
    finally:
        # Close the connection even if a scraper raises.
        conn.close()


if __name__ == "__main__":
    run_tracking_pipeline()
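The pipeline appends to the table on every run, so running it twice for the same ranking year stores the same rows twice. One possible way to make re-runs safe is to skip rows that are already present, as in this minimal sketch. `store_rankings` is a hypothetical helper, not part of the original pipeline. It assumes each record is a dict with at least `rank`, `university`, `source`, and `year` keys, and it treats the combination of university, source, and year as the identity of a row.

```python
import sqlite3
from datetime import datetime, timezone


def store_rankings(conn, records):
    """Insert records, skipping any (university, source, year) already stored.

    Returns the number of rows actually inserted.
    """
    scraped_at = datetime.now(timezone.utc).isoformat()
    inserted = 0
    for record in records:
        exists = conn.execute(
            "SELECT 1 FROM rankings "
            "WHERE university = ? AND source = ? AND year = ?",
            (record["university"], record["source"], record["year"]),
        ).fetchone()
        if exists:
            continue
        conn.execute(
            "INSERT INTO rankings "
            "(rank, university, country, score, source, year, scraped_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                record["rank"],
                record["university"],
                record.get("country"),
                record.get("score"),
                record["source"],
                record["year"],
                scraped_at,
            ),
        )
        inserted += 1
    conn.commit()
    return inserted
```

A `UNIQUE` constraint on those three columns combined with `INSERT OR IGNORE` would achieve the same result inside SQLite itself. The explicit check is used here only because it works with the table definition as it stands.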
Conclusion
A university ranking tracker provides valuable data for students, researchers, and institutions. By collecting data from multiple sources and tracking changes over time, you can identify trends that single-year snapshots miss. A service such as ScraperAPI can handle the JavaScript rendering and anti-bot protections these sites employ.