DEV Community

agenthustler

How to Scrape LinkedIn Learning for Course Trend Analysis

LinkedIn Learning hosts thousands of professional courses — tracking which topics trend over time reveals workforce skill shifts before they hit job postings. Here's how to build a course trend analyzer.

Why Track LinkedIn Learning Trends?

When a new course category takes off on LinkedIn Learning, hiring demand tends to follow 3-6 months later. Tracking course releases and popularity gives you a leading indicator for tech hiring trends.

Setting Up the Scraper

We'll use Python with requests and BeautifulSoup to extract course metadata from LinkedIn Learning's public catalog pages.

import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from urllib.parse import quote_plus

API_KEY = "YOUR_SCRAPERAPI_KEY"  # Get one at https://www.scraperapi.com?fp_ref=the52

def scrape_linkedin_learning(category, page=1):
    target = f"https://www.linkedin.com/learning/search?keywords={category}&page={page}"
    # URL-encode the target so its own query string isn't parsed as ScraperAPI parameters
    proxy_url = f"http://api.scraperapi.com?api_key={API_KEY}&url={quote_plus(target)}&render=true"

    response = requests.get(proxy_url, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    courses = []
    # Class names below are illustrative -- inspect the rendered HTML and adjust if LinkedIn changes its markup
    for card in soup.select('.search-card'):
        title = card.select_one('.card-title')
        author = card.select_one('.card-author')
        duration = card.select_one('.card-duration')
        if title:
            courses.append({
                'title': title.text.strip(),
                'author': author.text.strip() if author else 'N/A',
                'duration': duration.text.strip() if duration else 'N/A',
                'category': category,
                'scraped_at': datetime.now().isoformat()
            })
    return courses

import time

categories = ['python', 'generative-ai', 'cloud-computing', 'cybersecurity', 'data-engineering']
all_courses = []
for cat in categories:
    courses = scrape_linkedin_learning(cat)
    all_courses.extend(courses)
    print(f"Found {len(courses)} courses in {cat}")
    time.sleep(5)  # polite pause between categories to reduce rate-limit hits

df = pd.DataFrame(all_courses)
df.to_csv('linkedin_learning_trends.csv', index=False)
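The weekly comparison in the next section expects both a current and a previous snapshot on disk. One way to manage that (a sketch — the `snapshots/` directory and filename pattern are my own convention, not part of the scraper above) is to write date-stamped CSVs and pick the two newest:

```python
from datetime import date
from pathlib import Path

def snapshot_path(base_dir='snapshots', day=None):
    """Build a dated CSV path, e.g. snapshots/linkedin_learning_2024-05-06.csv."""
    day = day or date.today()
    Path(base_dir).mkdir(exist_ok=True)
    return Path(base_dir) / f"linkedin_learning_{day.isoformat()}.csv"

def latest_two(base_dir='snapshots'):
    """Return (current, previous) snapshot paths; ISO dates sort lexicographically."""
    files = sorted(Path(base_dir).glob('linkedin_learning_*.csv'), reverse=True)
    if len(files) < 2:
        raise FileNotFoundError('need at least two snapshots to compare')
    return files[0], files[1]
```

With this in place, `df.to_csv(snapshot_path(), index=False)` replaces the fixed filename, and the analysis step can load `current_file, previous_file = latest_two()`.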

Analyzing Trends Over Time

Run this weekly and compare snapshots to detect emerging topics:

import pandas as pd
import matplotlib.pyplot as plt

current = pd.read_csv('linkedin_learning_trends.csv')
previous = pd.read_csv('linkedin_learning_trends_last_week.csv')

current_counts = current['category'].value_counts()
previous_counts = previous['category'].value_counts()

growth = ((current_counts - previous_counts) / previous_counts * 100).fillna(0)
growth.sort_values(ascending=False).plot(kind='bar', title='Weekly Course Growth by Category')
plt.ylabel('Growth %')
plt.tight_layout()
plt.savefig('trend_chart.png')
print(growth.sort_values(ascending=False))
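One caveat with the pandas division above: a category that appears for the first time this week divides by a missing value, and `fillna(0)` silently reports it as 0% growth — hiding exactly the signal you're hunting for. A small plain-Python helper (the name `category_growth` is mine) makes new entries explicit:

```python
def category_growth(current, previous):
    """Percent growth per category; categories absent last week are flagged 'new'."""
    growth = {}
    for cat, count in current.items():
        prev = previous.get(cat)
        growth[cat] = 'new' if not prev else round((count - prev) / prev * 100, 1)
    return growth
```

Feed it `current['category'].value_counts().to_dict()` and the previous week's equivalent.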

Handling Anti-Bot Protection

LinkedIn uses aggressive rate limiting. A proxy rotation service like ScraperAPI handles JavaScript rendering and IP rotation automatically. For higher-volume scraping, ThorData provides residential proxies that are harder to detect, and ScrapeOps can monitor your scraper's health.
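Even behind a proxy service, individual requests will still occasionally come back as 429s or blocks. A retry wrapper with exponential backoff and jitter (a stdlib-only sketch; the function and parameter names are my own) keeps the weekly job from dying on a transient failure:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0):
    """Call fetch(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrap each page fetch, e.g. `fetch_with_backoff(lambda: scrape_linkedin_learning(cat))`.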

Building a Dashboard

Store results in SQLite and serve with Flask for a simple trend dashboard:

import sqlite3

conn = sqlite3.connect('course_trends.db')
df.to_sql('courses', conn, if_exists='append', index=False)

# Aggregate course counts per category per week
query = '''
SELECT category, strftime('%Y-%W', scraped_at) AS week, COUNT(*) AS count
FROM courses
GROUP BY category, week
ORDER BY week DESC
'''
trends = pd.read_sql(query, conn)
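To serve this from Flask, pull the query into a helper the route can call. The route name and JSON shape here are assumptions, and the Flask wiring is left in comments so the helper stays runnable on its own:

```python
import sqlite3

def weekly_trends(db_path='course_trends.db'):
    """Return per-category, per-week course counts as a list of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows behave like dicts
    rows = conn.execute(
        "SELECT category, strftime('%Y-%W', scraped_at) AS week, COUNT(*) AS count "
        "FROM courses GROUP BY category, week ORDER BY week DESC"
    ).fetchall()
    conn.close()
    return [dict(r) for r in rows]

# Minimal Flask wiring (assumes Flask is installed):
# from flask import Flask, jsonify
# app = Flask(__name__)
#
# @app.route('/trends')
# def trends():
#     return jsonify(weekly_trends())
```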

Key Takeaways

  • LinkedIn Learning catalog changes reflect industry skill demand shifts
  • Weekly snapshots reveal trends 3-6 months before hiring data confirms them
  • Use proxy services to handle LinkedIn's bot detection reliably
  • Store historical data to build longitudinal trend analysis

This approach works for any learning platform — Coursera, Udemy, Pluralsight. The pattern is the same: scrape catalog, track changes, spot signals before the market catches on.
