DEV Community

agenthustler
agenthustler

Posted on

Scraping Online Learning Outcomes: Course Completion and Reviews

Scraping Online Learning Outcomes: Course Completion and Reviews

The online education market is massive, but choosing the right course is a gamble. What if you could scrape actual completion rates, student reviews, and outcome data to make data-driven learning decisions?

What Data Matters

  • Student ratings and review text
  • Enrollment numbers and completion signals
  • Instructor track records
  • Skill outcomes mentioned in reviews

Setup

import requests
from bs4 import BeautifulSoup
import json
import re
from datetime import datetime
from collections import Counter

PROXY_URL = "https://api.scraperapi.com"
API_KEY = "YOUR_SCRAPERAPI_KEY"
Enter fullscreen mode Exit fullscreen mode

Education platforms use aggressive anti-scraping. ScraperAPI handles the JavaScript rendering and bot detection.

Scraping Udemy Course Data

def scrape_udemy_course(course_slug):
    params = {
        "api_key": API_KEY,
        "url": f"https://www.udemy.com/course/{course_slug}/",
        "render": "true"
    }
    response = requests.get(PROXY_URL, params=params)
    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.select_one("h1")
    rating = soup.select_one("[data-purpose='rating-number']")
    students = soup.select_one("[data-purpose='enrollment']")

    return {
        "platform": "udemy",
        "title": title.text.strip() if title else "",
        "rating": float(rating.text.strip()) if rating else None,
        "students": parse_number(students.text if students else "0"),
        "url": f"https://www.udemy.com/course/{course_slug}/"
    }

def parse_number(text):
    nums = re.findall(r'[\d,]+', text)
    return int(nums[0].replace(",", "")) if nums else 0
Enter fullscreen mode Exit fullscreen mode

Analyzing Learning Outcomes from Reviews

OUTCOME_KEYWORDS = {
    "got_job": ["got a job", "hired", "landed a role", "career change"],
    "built_project": ["built", "created", "deployed", "launched"],
    "skill_gained": ["learned", "understand", "confident"],
    "disappointed": ["waste", "outdated", "poor quality", "refund"]
}

def analyze_outcomes(reviews):
    outcome_counts = Counter()
    for review in reviews:
        text = review["text"].lower()
        for outcome, keywords in OUTCOME_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                outcome_counts[outcome] += 1

    total = len(reviews)
    return {
        "total_reviews": total,
        "outcomes": {
            k: {"count": v, "pct": round(v/total*100, 1)}
            for k, v in outcome_counts.items()
        }
    }
Enter fullscreen mode Exit fullscreen mode

Course Comparison

from statistics import mean

def compare_courses(course_slugs):
    comparisons = []
    for slug in course_slugs:
        course = scrape_udemy_course(slug)
        reviews = scrape_course_reviews(slug, pages=3)
        outcomes = analyze_outcomes(reviews)

        course["outcomes"] = outcomes
        course["value_score"] = calculate_value(course, outcomes)
        comparisons.append(course)
        import time; time.sleep(5)

    return sorted(comparisons, key=lambda x: x["value_score"], reverse=True)

def calculate_value(course, outcomes):
    score = (course["rating"] or 0) * 2
    positive = outcomes["outcomes"].get("got_job", {}).get("pct", 0)
    score += positive * 0.5
    disappointed = outcomes["outcomes"].get("disappointed", {}).get("pct", 0)
    score -= disappointed * 0.3
    return round(score, 2)
Enter fullscreen mode Exit fullscreen mode

Infrastructure

  • ScraperAPI — JavaScript rendering for dynamic course pages
  • ThorData — residential proxies to browse like a real student
  • ScrapeOps — track scraping performance across education platforms

Conclusion

Scraping learning outcomes transforms course selection from guesswork into data science. Compare on actual student results, not just marketing. This pipeline helps learners and EdTech companies make data-driven decisions.

Top comments (0)