Salary data and company reviews are among the most sought-after datasets in the job market. Glassdoor has built an empire on crowdsourced compensation data, company reviews, interview experiences, and CEO approval ratings. For HR professionals, recruiters, compensation analysts, and job seekers, having programmatic access to this data is transformative.
In this guide, we'll walk through how to extract salary reports, company reviews, interview data, and CEO ratings from Glassdoor using web scraping techniques and the Apify cloud platform.
Why Glassdoor Data Matters
Glassdoor hosts one of the largest collections of employee-contributed workplace data:
- Salary Reports: Over 100 million salary reports across every industry and role
- Company Reviews: Detailed employee reviews with pros, cons, and ratings across multiple dimensions
- Interview Experiences: Step-by-step interview process descriptions with difficulty ratings
- CEO Approval Ratings: Leadership ratings that correlate with company performance
- Benefits Reviews: Detailed breakdowns of company benefits packages
This data powers critical business decisions:
- HR/Compensation teams benchmark salaries against market rates
- Recruiters understand candidate expectations before making offers
- Job seekers negotiate from a position of knowledge
- Investors use employee sentiment as a leading indicator
- Researchers study labor market dynamics and workplace trends
Understanding Glassdoor's Data Structure
Glassdoor organizes data around companies, with each company having multiple data sections.
Company Overview
Each company profile at glassdoor.com/Overview/company-overview-{id}.htm contains:
- Company Name
- Overall Rating (1-5 stars)
- Number of Reviews
- CEO Name and Approval Rating
- Recommend to Friend %
- Industry
- Company Size (employees)
- Revenue Range
- Headquarters Location
- Founded Year
- Company Type (Public, Private, etc.)
- Website
- Competitors
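Pulled together into JSON, a single scraped overview record might look like this. The shape mirrors the field list above, but every value here is invented for illustration:

```javascript
// Illustrative shape of one scraped company-overview record (all values invented)
const companyOverview = {
    companyName: 'Example Corp',
    overallRating: 4.1,
    reviewCount: 12843,
    ceo: { name: 'Jane Doe', approvalPct: 87 },
    recommendToFriendPct: 82,
    industry: 'Information Technology',
    companySize: '10000+ employees',
    revenueRange: '$1B to $5B (USD)',
    headquarters: 'San Francisco, CA',
    founded: 1999,
    companyType: 'Public',
    website: 'example.com',
    competitors: ['Acme Inc', 'Globex'],
};
```

Deciding on a record shape like this up front makes the downstream analysis code much easier to write.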
Salary Data
Salary reports at glassdoor.com/Salary/company-salaries-{id}.htm include:
- Job Title
- Base Pay (range: low, average, high)
- Additional Pay (bonuses, stock, tips)
- Total Pay Range
- Number of Salaries Reported
- Pay by Experience Level
- Pay by Location
- Pay Trend (year over year)
Company Reviews
Reviews at glassdoor.com/Reviews/company-reviews-{id}.htm contain:
- Overall Rating (1-5)
- Title/Summary
- Pros (text)
- Cons (text)
- Advice to Management (text)
- Rating Breakdown:
  - Work/Life Balance
  - Culture & Values
  - Diversity & Inclusion
  - Career Opportunities
  - Compensation & Benefits
  - Senior Management
- Employment Status (Current/Former)
- Job Title
- Location
- Date Posted
- Helpful Count
Interview Data
Interview reviews at glassdoor.com/Interview/company-interview-{id}.htm:
- Job Title Applied For
- Application Method
- Interview Experience (Positive/Neutral/Negative)
- Interview Difficulty (1-5)
- Offer Status (Accepted/Declined/No Offer)
- Interview Questions
- Interview Process Description
- Date of Interview
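Notice that the three section URLs differ from the overview URL only in the path segment and the slug. Deriving them can be factored into a small helper — a sketch that assumes the URL patterns shown above, since Glassdoor's real slugs can vary:

```javascript
// Derive a section URL from a company overview URL.
// Assumes the /Overview/...-overview-{id}.htm pattern described in this guide.
const SECTION_PATTERNS = {
    salaries:   { path: '/Salary/',    slug: '-salaries-' },
    reviews:    { path: '/Reviews/',   slug: '-reviews-' },
    interviews: { path: '/Interview/', slug: '-interview-' },
};

function deriveSectionUrl(overviewUrl, section) {
    const pattern = SECTION_PATTERNS[section];
    if (!pattern) throw new Error('Unknown section: ' + section);
    return overviewUrl
        .replace('/Overview/', pattern.path)
        .replace('-overview-', pattern.slug);
}
```

Keeping the patterns in one table means a single place to update when Glassdoor changes its URL scheme.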
Technical Approach
Glassdoor is a JavaScript-heavy application with sophisticated anti-bot measures. A browser-based approach with proper session management is essential.
Project Setup
```bash
mkdir glassdoor-scraper
cd glassdoor-scraper
npm init -y
npm pkg set type=module   # the examples below use ES module imports
npm install crawlee puppeteer apify
```
Salary Data Scraper
Here's a comprehensive scraper for extracting salary information:
```javascript
import { PuppeteerCrawler, Dataset } from 'crawlee';

const crawler = new PuppeteerCrawler({
    maxConcurrency: 1,
    maxRequestsPerMinute: 10,
    navigationTimeoutSecs: 60,
    async requestHandler({ page, request, log }) {
        const { label } = request.userData;
        if (label === 'SALARY_LIST') {
            log.info('Scraping salary page: ' + request.url);
            await page.waitForSelector('[data-test="salaries-list"]', { timeout: 20000 });

            const salaries = await page.evaluate(() => {
                const rows = document.querySelectorAll('[data-test="salary-row"]');
                return Array.from(rows).map(row => {
                    const jobTitle = row.querySelector('[data-test="salary-job-title"]')?.textContent?.trim();
                    const basePay = row.querySelector('[data-test="base-pay-amount"]')?.textContent?.trim();
                    const additionalPay = row.querySelector('[data-test="additional-pay"]')?.textContent?.trim();
                    const totalPay = row.querySelector('[data-test="total-pay-amount"]')?.textContent?.trim();
                    const numSalaries = row.querySelector('[data-test="num-salaries"]')?.textContent?.trim();
                    const payRange = row.querySelector('[data-test="pay-range"]')?.textContent?.trim();
                    const detailUrl = row.querySelector('a[data-test="salary-detail-link"]')?.href;
                    return { jobTitle, basePay, additionalPay, totalPay, numSalaries, payRange, detailUrl };
                });
            });

            const companyName = await page.$eval(
                '[data-test="employer-name"]',
                el => el.textContent?.trim()
            ).catch(() => 'Unknown');

            for (const salary of salaries) {
                await Dataset.pushData({
                    ...salary,
                    companyName,
                    sourceUrl: request.url,
                    scrapedAt: new Date().toISOString()
                });
            }

            const nextPage = await page.$('[data-test="pagination-next"]:not([disabled])');
            if (nextPage) {
                const nextUrl = await nextPage.evaluate(el => el.href);
                await crawler.addRequests([{
                    url: nextUrl,
                    userData: { label: 'SALARY_LIST' }
                }]);
            }

            log.info('Extracted ' + salaries.length + ' salary entries for ' + companyName);
        }
    }
});

await crawler.run([{
    url: 'https://www.glassdoor.com/Salary/Google-Salaries-E9079.htm',
    userData: { label: 'SALARY_LIST' }
}]);
```
Company Reviews Scraper
Extracting detailed company reviews with rating breakdowns:
```javascript
import { PuppeteerCrawler, Dataset } from 'crawlee';

const reviewsCrawler = new PuppeteerCrawler({
    maxConcurrency: 1,
    maxRequestsPerMinute: 8,
    async requestHandler({ page, request, log }) {
        log.info('Scraping reviews: ' + request.url);
        await page.waitForSelector('[data-test="review-list"]', { timeout: 20000 });

        const companyRatings = await page.evaluate(() => {
            const ratingSection = document.querySelector('[data-test="rating-info"]');
            if (!ratingSection) return {};
            return {
                overallRating: ratingSection.querySelector('[data-test="rating-headline"]')?.textContent?.trim(),
                recommendPercent: ratingSection.querySelector('[data-test="recommend-pct"]')?.textContent?.trim(),
                ceoApproval: ratingSection.querySelector('[data-test="ceo-approval"]')?.textContent?.trim(),
                ceoName: ratingSection.querySelector('[data-test="ceo-name"]')?.textContent?.trim(),
            };
        });

        const reviews = await page.evaluate(() => {
            const reviewElements = document.querySelectorAll('[data-test="review-list-item"]');
            return Array.from(reviewElements).map(review => {
                const subRatings = {};
                const ratingBars = review.querySelectorAll('[class*="subRating"]');
                ratingBars.forEach(bar => {
                    const label = bar.querySelector('[class*="ratingLabel"]')?.textContent?.trim();
                    const value = bar.querySelector('[class*="ratingValue"]')?.textContent?.trim();
                    if (label && value) subRatings[label] = parseFloat(value);
                });
                return {
                    rating: review.querySelector('[class*="ratingNumber"]')?.textContent?.trim(),
                    title: review.querySelector('[data-test="review-title"]')?.textContent?.trim(),
                    pros: review.querySelector('[data-test="review-pros"]')?.textContent?.trim(),
                    cons: review.querySelector('[data-test="review-cons"]')?.textContent?.trim(),
                    advice: review.querySelector('[data-test="review-advice"]')?.textContent?.trim(),
                    employeeStatus: review.querySelector('[data-test="employee-status"]')?.textContent?.trim(),
                    jobTitle: review.querySelector('[data-test="review-job-title"]')?.textContent?.trim(),
                    location: review.querySelector('[data-test="review-location"]')?.textContent?.trim(),
                    date: review.querySelector('[data-test="review-date"]')?.textContent?.trim(),
                    helpfulCount: review.querySelector('[data-test="helpful-count"]')?.textContent?.trim(),
                    subRatings
                };
            });
        });

        for (const review of reviews) {
            await Dataset.pushData({
                ...review,
                companyRatings,
                sourceUrl: request.url,
                scrapedAt: new Date().toISOString()
            });
        }

        log.info('Extracted ' + reviews.length + ' reviews');

        const nextBtn = await page.$('[data-test="pagination-next"]:not([disabled])');
        if (nextBtn) {
            const nextUrl = await nextBtn.evaluate(el => el.href);
            await reviewsCrawler.addRequests([{
                url: nextUrl,
                userData: { label: 'REVIEWS' }
            }]);
        }
    }
});

await reviewsCrawler.run([
    'https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm'
]);
```
Interview Experience Scraper
```javascript
import { PuppeteerCrawler, Dataset } from 'crawlee';

const interviewCrawler = new PuppeteerCrawler({
    maxConcurrency: 1,
    maxRequestsPerMinute: 8,
    async requestHandler({ page, request, log }) {
        log.info('Scraping interviews: ' + request.url);
        await page.waitForSelector('[data-test="interview-list"]', { timeout: 20000 });

        const interviewStats = await page.evaluate(() => {
            return {
                experienceBreakdown: {
                    positive: document.querySelector('[data-test="positive-pct"]')?.textContent?.trim(),
                    neutral: document.querySelector('[data-test="neutral-pct"]')?.textContent?.trim(),
                    negative: document.querySelector('[data-test="negative-pct"]')?.textContent?.trim(),
                },
                averageDifficulty: document.querySelector('[data-test="avg-difficulty"]')?.textContent?.trim(),
                applicationSources: Array.from(
                    document.querySelectorAll('[data-test="app-source"]')
                ).map(el => ({
                    source: el.querySelector('.source-name')?.textContent?.trim(),
                    percentage: el.querySelector('.source-pct')?.textContent?.trim()
                }))
            };
        });

        const interviews = await page.evaluate(() => {
            return Array.from(
                document.querySelectorAll('[data-test="interview-item"]')
            ).map(item => ({
                jobTitle: item.querySelector('[data-test="interview-job-title"]')?.textContent?.trim(),
                date: item.querySelector('[data-test="interview-date"]')?.textContent?.trim(),
                experience: item.querySelector('[data-test="interview-experience"]')?.textContent?.trim(),
                difficulty: item.querySelector('[data-test="difficulty-rating"]')?.textContent?.trim(),
                offer: item.querySelector('[data-test="offer-status"]')?.textContent?.trim(),
                applicationMethod: item.querySelector('[data-test="app-method"]')?.textContent?.trim(),
                process: item.querySelector('[data-test="interview-process"]')?.textContent?.trim(),
                questions: Array.from(
                    item.querySelectorAll('[data-test="interview-question"]')
                ).map(q => q.textContent?.trim()),
                helpfulCount: item.querySelector('[data-test="helpful-count"]')?.textContent?.trim()
            }));
        });

        for (const interview of interviews) {
            await Dataset.pushData({
                ...interview,
                interviewStats,
                sourceUrl: request.url,
                scrapedAt: new Date().toISOString()
            });
        }

        log.info('Extracted ' + interviews.length + ' interview reviews');
    }
});

await interviewCrawler.run([
    'https://www.glassdoor.com/Interview/Google-Interview-Questions-E9079.htm'
]);
```
Deploying as an Apify Actor
Here's a complete Apify Actor that combines all scrapers with configurable input:
```javascript
import { Actor } from 'apify';
import { PuppeteerCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput() ?? {};
const {
    companyUrl = '',
    dataTypes = ['salaries', 'reviews', 'interviews'],
    maxPages = 10,
} = input;

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
});

const startUrls = [];
if (dataTypes.includes('salaries') && companyUrl) {
    const salaryUrl = companyUrl.replace('/Overview/', '/Salary/').replace('-overview-', '-salaries-');
    startUrls.push({ url: salaryUrl, userData: { label: 'SALARIES', page: 1 } });
}
if (dataTypes.includes('reviews') && companyUrl) {
    const reviewUrl = companyUrl.replace('/Overview/', '/Reviews/').replace('-overview-', '-reviews-');
    startUrls.push({ url: reviewUrl, userData: { label: 'REVIEWS', page: 1 } });
}
if (dataTypes.includes('interviews') && companyUrl) {
    const interviewUrl = companyUrl.replace('/Overview/', '/Interview/').replace('-overview-', '-interview-');
    startUrls.push({ url: interviewUrl, userData: { label: 'INTERVIEWS', page: 1 } });
}

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    maxConcurrency: 1,
    maxRequestsPerMinute: 8,
    navigationTimeoutSecs: 60,
    launchContext: {
        launchOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    },
    async requestHandler({ page, request, log }) {
        const { label, page: pageNum } = request.userData;
        log.info('Processing ' + label + ' page ' + pageNum + ': ' + request.url);
        switch (label) {
            case 'SALARIES':
                await scrapeSalaries(page, request, log, maxPages);
                break;
            case 'REVIEWS':
                await scrapeReviews(page, request, log, maxPages);
                break;
            case 'INTERVIEWS':
                await scrapeInterviews(page, request, log, maxPages);
                break;
        }
    },
    failedRequestHandler({ request, log }) {
        log.error('Failed: ' + request.url + ' - ' + request.errorMessages);
    }
});

// NOTE: the handlers below accept maxPages so a pagination cap can be applied;
// the pagination enqueueing itself is omitted here for brevity (see the
// standalone scrapers above for the next-page pattern).
async function scrapeSalaries(page, request, log, maxPages) {
    await page.waitForSelector('[data-test="salaries-list"]', { timeout: 20000 });
    const salaries = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('[data-test="salary-row"]')).map(row => ({
            type: 'salary',
            jobTitle: row.querySelector('[data-test="salary-job-title"]')?.textContent?.trim(),
            basePay: row.querySelector('[data-test="base-pay-amount"]')?.textContent?.trim(),
            totalPay: row.querySelector('[data-test="total-pay-amount"]')?.textContent?.trim(),
            numReports: row.querySelector('[data-test="num-salaries"]')?.textContent?.trim(),
        }));
    });
    await Dataset.pushData(salaries.map(s => ({
        ...s, sourceUrl: request.url, scrapedAt: new Date().toISOString()
    })));
    log.info('Got ' + salaries.length + ' salary entries');
}

async function scrapeReviews(page, request, log, maxPages) {
    await page.waitForSelector('[data-test="review-list"]', { timeout: 20000 });
    const reviews = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('[data-test="review-list-item"]')).map(r => ({
            type: 'review',
            rating: r.querySelector('[class*="ratingNumber"]')?.textContent?.trim(),
            title: r.querySelector('[data-test="review-title"]')?.textContent?.trim(),
            pros: r.querySelector('[data-test="review-pros"]')?.textContent?.trim(),
            cons: r.querySelector('[data-test="review-cons"]')?.textContent?.trim(),
            jobTitle: r.querySelector('[data-test="review-job-title"]')?.textContent?.trim(),
            date: r.querySelector('[data-test="review-date"]')?.textContent?.trim(),
        }));
    });
    await Dataset.pushData(reviews.map(r => ({
        ...r, sourceUrl: request.url, scrapedAt: new Date().toISOString()
    })));
    log.info('Got ' + reviews.length + ' reviews');
}

async function scrapeInterviews(page, request, log, maxPages) {
    await page.waitForSelector('[data-test="interview-list"]', { timeout: 20000 });
    const interviews = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('[data-test="interview-item"]')).map(item => ({
            type: 'interview',
            jobTitle: item.querySelector('[data-test="interview-job-title"]')?.textContent?.trim(),
            experience: item.querySelector('[data-test="interview-experience"]')?.textContent?.trim(),
            difficulty: item.querySelector('[data-test="difficulty-rating"]')?.textContent?.trim(),
            offer: item.querySelector('[data-test="offer-status"]')?.textContent?.trim(),
            process: item.querySelector('[data-test="interview-process"]')?.textContent?.trim(),
            questions: Array.from(
                item.querySelectorAll('[data-test="interview-question"]')
            ).map(q => q.textContent?.trim()),
            date: item.querySelector('[data-test="interview-date"]')?.textContent?.trim(),
        }));
    });
    await Dataset.pushData(interviews.map(i => ({
        ...i, sourceUrl: request.url, scrapedAt: new Date().toISOString()
    })));
    log.info('Got ' + interviews.length + ' interviews');
}

await crawler.run(startUrls);
await Actor.exit();
```
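A run of this Actor could be configured with input like the following (the companyUrl is a placeholder; substitute a real overview URL):

```json
{
    "companyUrl": "https://www.glassdoor.com/Overview/Example-overview-E12345.htm",
    "dataTypes": ["salaries", "reviews"],
    "maxPages": 5
}
```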
Data Analysis Examples
Once you've collected the data, here are practical analyses you can perform:
Salary Benchmarking
```javascript
function benchmarkSalaries(salaryData, targetRole) {
    const roleData = salaryData.filter(s =>
        s.jobTitle.toLowerCase().includes(targetRole.toLowerCase())
    );
    if (roleData.length === 0) return null;

    const basePays = roleData
        .map(s => parseCurrency(s.basePay))
        .filter(v => v > 0)
        .sort((a, b) => a - b);

    return {
        role: targetRole,
        sampleSize: basePays.length,
        percentile25: basePays[Math.floor(basePays.length * 0.25)],
        median: basePays[Math.floor(basePays.length * 0.5)],
        percentile75: basePays[Math.floor(basePays.length * 0.75)],
        average: basePays.reduce((a, b) => a + b, 0) / basePays.length,
    };
}

function parseCurrency(str) {
    if (!str) return 0;
    return parseInt(str.replace(/[$,K]/gi, '')) * (str.toLowerCase().includes('k') ? 1000 : 1);
}
```
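A quick sanity check of the two helpers with made-up numbers (their definitions are repeated here so the snippet runs standalone):

```javascript
// Repeated from above so this snippet is self-contained
function parseCurrency(str) {
    if (!str) return 0;
    return parseInt(str.replace(/[$,K]/gi, '')) * (str.toLowerCase().includes('k') ? 1000 : 1);
}

function benchmarkSalaries(salaryData, targetRole) {
    const roleData = salaryData.filter(s =>
        s.jobTitle.toLowerCase().includes(targetRole.toLowerCase())
    );
    if (roleData.length === 0) return null;
    const basePays = roleData
        .map(s => parseCurrency(s.basePay))
        .filter(v => v > 0)
        .sort((a, b) => a - b);
    return {
        role: targetRole,
        sampleSize: basePays.length,
        percentile25: basePays[Math.floor(basePays.length * 0.25)],
        median: basePays[Math.floor(basePays.length * 0.5)],
        percentile75: basePays[Math.floor(basePays.length * 0.75)],
        average: basePays.reduce((a, b) => a + b, 0) / basePays.length,
    };
}

// Invented sample data
const sample = [
    { jobTitle: 'Software Engineer', basePay: '$100K' },
    { jobTitle: 'Software Engineer', basePay: '$120K' },
    { jobTitle: 'Senior Software Engineer', basePay: '$140K' },
    { jobTitle: 'Software Engineer II', basePay: '$160K' },
    { jobTitle: 'Product Manager', basePay: '$130K' },
];

const stats = benchmarkSalaries(sample, 'software engineer');
// stats.sampleSize is 4, stats.median is 140000, stats.average is 130000
```

Note that substring matching pulls in "Senior Software Engineer" and "Software Engineer II" as well; tighten the filter if you need exact titles.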
Sentiment Analysis on Reviews
```javascript
function analyzeReviewSentiment(reviews) {
    const ratings = reviews.map(r => parseFloat(r.rating));
    const avgRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;

    // Assumes review dates were normalized to ISO format (YYYY-MM-DD) during
    // scraping, so the first 7 characters are the YYYY-MM month key.
    const trends = {};
    reviews.forEach(review => {
        const month = review.date?.substring(0, 7);
        if (month) {
            if (!trends[month]) trends[month] = { sum: 0, count: 0 };
            trends[month].sum += parseFloat(review.rating);
            trends[month].count++;
        }
    });

    const monthlyAvg = Object.entries(trends)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([month, data]) => ({
            month,
            averageRating: (data.sum / data.count).toFixed(2),
            reviewCount: data.count
        }));

    const prosWords = extractKeyPhrases(reviews.map(r => r.pros).filter(Boolean));
    const consWords = extractKeyPhrases(reviews.map(r => r.cons).filter(Boolean));

    return {
        averageRating: avgRating.toFixed(2),
        totalReviews: reviews.length,
        monthlyTrend: monthlyAvg,
        topProsThemes: prosWords.slice(0, 10),
        topConsThemes: consWords.slice(0, 10)
    };
}

function extractKeyPhrases(texts) {
    const wordCount = {};
    const stopWords = new Set([
        'the', 'a', 'an', 'is', 'are', 'was', 'and', 'or', 'but',
        'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'very',
        'that', 'this', 'it', 'not', 'no', 'can', 'you', 'your',
        'they', 'we', 'i'
    ]);
    texts.forEach(text => {
        const words = text.toLowerCase()
            .replace(/[^a-z\s]/g, '')
            .split(/\s+/)
            .filter(w => w.length > 3 && !stopWords.has(w));
        words.forEach(w => { wordCount[w] = (wordCount[w] || 0) + 1; });
    });
    return Object.entries(wordCount)
        .sort(([, a], [, b]) => b - a)
        .map(([word, count]) => ({ word, count }));
}
```
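A quick check of extractKeyPhrases with invented review snippets (the definition is repeated so the snippet runs standalone):

```javascript
// Repeated from above so this snippet is self-contained
function extractKeyPhrases(texts) {
    const wordCount = {};
    const stopWords = new Set([
        'the', 'a', 'an', 'is', 'are', 'was', 'and', 'or', 'but',
        'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'very',
        'that', 'this', 'it', 'not', 'no', 'can', 'you', 'your',
        'they', 'we', 'i'
    ]);
    texts.forEach(text => {
        const words = text.toLowerCase()
            .replace(/[^a-z\s]/g, '')
            .split(/\s+/)
            .filter(w => w.length > 3 && !stopWords.has(w));
        words.forEach(w => { wordCount[w] = (wordCount[w] || 0) + 1; });
    });
    return Object.entries(wordCount)
        .sort(([, a], [, b]) => b - a)
        .map(([word, count]) => ({ word, count }));
}

// Invented review snippets
const themes = extractKeyPhrases([
    'Great culture and great benefits',
    'Great culture, flexible hours',
]);
// themes[0] is { word: 'great', count: 3 }, themes[1] is { word: 'culture', count: 2 }
```

Simple word counting like this is crude but surprisingly effective for spotting recurring themes; swap in a proper NLP library if you need phrases rather than single words.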
Best Practices and Ethical Guidelines
Rate Limiting is Critical
Glassdoor actively monitors for automated access. Always implement conservative rate limits:
```javascript
const crawler = new PuppeteerCrawler({
    maxConcurrency: 1,
    maxRequestsPerMinute: 8,
    requestHandlerTimeoutSecs: 120,
    navigationTimeoutSecs: 60,
});
```
Session Management
Glassdoor requires login for some data. Use persistent sessions:
```javascript
async function setupSession(page) {
    // A realistic viewport and user agent make the session look like a normal browser
    await page.setViewport({ width: 1920, height: 1080 });
    await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );
    // page.waitForTimeout was removed in recent Puppeteer releases; use a plain delay
    await new Promise(resolve => setTimeout(resolve, 2000 + Math.random() * 3000));
}
```
Proxy Rotation
Using residential proxies is highly recommended for Glassdoor:
```javascript
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
});
```
Data Privacy Compliance
- Never scrape individual user profiles or personal information
- Aggregate salary data rather than storing individual reports
- Comply with GDPR if processing data from EU users
- Use the data for legitimate research and benchmarking purposes
- Do not republish raw review text without proper attribution
Handling CAPTCHAs
Glassdoor may present CAPTCHAs during heavy scraping. Strategies to minimize this:
- Low request rates: Stay under 10 requests per minute
- Residential proxies: Appear as regular users
- Session persistence: Reuse browser sessions
- Human-like behavior: Add random delays and mouse movements
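When a block or CAPTCHA does slip through, backing off is better than hammering the same URL. A minimal retry wrapper with exponential backoff and jitter (the helper name and delay values are illustrative, not from any library):

```javascript
// Retry an async operation with exponential backoff plus random jitter.
// The default baseDelayMs is illustrative; tune it to your own rate limits.
async function withBackoff(fn, { maxRetries = 3, baseDelayMs = 5000 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= maxRetries) throw err;
            const delayMs = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
            await new Promise(resolve => setTimeout(resolve, delayMs));
        }
    }
}
```

You can wrap individual `page.goto()` calls or an entire page scrape in this helper; Crawlee also has its own built-in retry handling, so use one mechanism or the other, not both.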
Real-World Applications
For Compensation Teams
Build a dashboard that tracks salary trends for your key roles across competitors. Update weekly to stay ahead of market movements. This helps ensure your offers are competitive without overpaying.
For Recruiters
Aggregate review sentiment data to pitch candidates on culture. When a candidate mentions they care about work-life balance, having data showing your client company rates 4.2/5.0 on that dimension is powerful.
For Due Diligence
Investors increasingly use employee sentiment as a signal. A company with declining review scores and increasing cons around leadership may be heading for trouble, even if revenue looks healthy.
For Job Seekers
Build a personalized dashboard that tracks companies you're interested in. Monitor for new salary reports in your role, read recent interview experiences, and track CEO approval trends over time.
Conclusion
Glassdoor contains some of the most valuable workplace data on the internet. By using Puppeteer-based scraping with Crawlee and deploying on Apify's infrastructure, you can build reliable data pipelines that power salary benchmarking, sentiment analysis, and competitive intelligence.
Remember to always scrape responsibly: respect rate limits, use the data ethically, and comply with privacy regulations. The goal is to build sustainable data collection workflows that provide long-term value, not to overwhelm servers with aggressive scraping.
Start with a single company and data type, validate your selectors, and scale gradually. The code examples in this guide give you a solid foundation to build upon for your specific use case. Whether you're benchmarking compensation packages, monitoring company sentiment, or preparing for your next interview, programmatic access to Glassdoor data gives you an information advantage that manual browsing simply cannot match.