Web scraping job boards has become an essential skill for data analysts, recruiters, and job seekers looking to gain a competitive edge in the employment market. Monster.com, one of the pioneering online job platforms, launched in 1999 and remains a significant source of employment data, with millions of job listings across industries and geographies.
In this comprehensive guide, we'll explore how to extract job listings, salary information, and career data from Monster.com using modern scraping techniques and Apify's cloud platform.
Understanding Monster.com's Structure
Monster.com organizes its content around several key data types that are valuable for extraction:
Job Listings
The core of Monster's platform consists of job postings that include:
- Job title and description
- Company name and logo
- Location (city, state, country)
- Salary range (when disclosed)
- Job type (full-time, part-time, contract, temporary)
- Experience level required
- Posted date and application deadline
- Required skills and qualifications
Company Profiles
Monster maintains employer profiles containing:
- Company overview and description
- Industry classification
- Company size and revenue range
- Headquarters location
- Benefits and culture information
- Active job listings count
Career Resources
Monster also provides career advice content, salary tools, and resume resources that contain structured data about:
- Industry salary benchmarks
- Career path progressions
- Skills demand trends
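Before writing any selectors, it helps to pin down the record shape you want out of each listing. A minimal sketch of one possible schema (the field names here are our own convention, not anything Monster.com defines):

```javascript
// Target shape for one scraped job listing.
// Field names are our own choice, not Monster.com's — adjust to taste.
function emptyJobRecord() {
  return {
    title: null,               // e.g. "Senior Software Engineer"
    company: null,
    location: null,            // raw "City, ST" string as shown on the card
    salary: null,              // { min, max, period, currency } when disclosed
    jobType: null,             // full-time | part-time | contract | temporary
    experienceLevel: null,
    postedDate: null,
    applicationDeadline: null,
    skills: [],                // required skills and qualifications
    sourceUrl: null,
    scrapedAt: new Date().toISOString(),
  };
}

const record = emptyJobRecord();
```

Starting from an explicit record like this makes it obvious which fields each selector is responsible for, and which ones stay `null` when the posting doesn't disclose them.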
Setting Up Your Scraping Environment
Before diving into the actual scraping, let's set up a proper development environment. We'll use Node.js with Crawlee, which is Apify's open-source web scraping library.
// package.json dependencies
{
  "dependencies": {
    "crawlee": "^3.7.0",
    "apify": "^3.1.0",
    "cheerio": "^1.0.0-rc.12"
  }
}
Install the dependencies:
npm init -y
npm install crawlee apify cheerio
Scraping Job Listings by Category
Monster.com organizes jobs into categories like Technology, Healthcare, Finance, and more. Here's how to build a scraper that navigates these categories and extracts job data:
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 500,
  maxConcurrency: 5,
  async requestHandler({ request, $, enqueueLinks, log }) {
    const pageType = request.userData.pageType || 'listing';
    if (pageType === 'listing') {
      // Extract job cards from search results
      const jobs = [];
      $('[data-testid="svx-job-card"]').each((index, element) => {
        const $el = $(element);
        const href = $el.find('a').attr('href');
        const job = {
          title: $el.find('.job-cardstyle__JobCardTitle').text().trim(),
          company: $el.find('.company-name').text().trim(),
          location: $el.find('.job-cardstyle__JobCardLocation').text().trim(),
          postedDate: $el.find('.job-cardstyle__JobCardDate').text().trim(),
          summary: $el.find('.job-cardstyle__JobCardBody').text().trim(),
          // Resolve relative hrefs against the site root
          url: href ? new URL(href, 'https://www.monster.com').href : null,
          scrapedAt: new Date().toISOString(),
        };
        jobs.push(job);
      });
      if (jobs.length > 0) {
        await Dataset.pushData(jobs);
        log.info(`Extracted ${jobs.length} jobs from ${request.url}`);
      }
      // Follow pagination
      await enqueueLinks({
        selector: 'a.pagination__next',
        userData: { pageType: 'listing' },
      });
    }
  },
});

// Start with one search URL per role keyword
await crawler.run([
  { url: 'https://www.monster.com/jobs/search?q=software+engineer', userData: { pageType: 'listing' } },
  { url: 'https://www.monster.com/jobs/search?q=data+analyst', userData: { pageType: 'listing' } },
  { url: 'https://www.monster.com/jobs/search?q=project+manager', userData: { pageType: 'listing' } },
]);
This crawler extracts basic job information from Monster's search results pages. The maxConcurrency setting limits parallel requests to avoid overwhelming the server.
Extracting Detailed Job Information
Search results provide summary data, but individual job pages contain much richer information. Let's enhance our scraper to visit each job detail page:
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 1000,
  maxConcurrency: 3,
  async requestHandler({ request, $, enqueueLinks, log }) {
    const pageType = request.userData.pageType;
    if (pageType === 'search') {
      // Collect job detail URLs from search results
      const jobLinks = [];
      $('[data-testid="svx-job-card"] a').each((i, el) => {
        const href = $(el).attr('href');
        if (href && href.includes('/job/')) {
          jobLinks.push({
            url: new URL(href, 'https://www.monster.com').href,
            userData: { pageType: 'detail' },
          });
        }
      });
      log.info(`Found ${jobLinks.length} job links on search page`);
      await crawler.addRequests(jobLinks);
      // Handle pagination
      await enqueueLinks({
        selector: 'a[data-testid="svx-pagination-next"]',
        userData: { pageType: 'search' },
      });
    } else if (pageType === 'detail') {
      // Extract comprehensive job details
      const jobDetail = {
        title: $('h1.job-title').text().trim(),
        company: $('.employer-name').text().trim(),
        location: $('.location-text').text().trim(),
        salary: extractSalary($),
        jobType: $('.job-type').text().trim(),
        experienceLevel: $('.experience-level').text().trim(),
        description: $('.job-description').text().trim(),
        requirements: extractRequirements($),
        benefits: extractBenefits($),
        skills: extractSkills($),
        postedDate: $('.posted-date').text().trim(),
        applicationUrl: $('a.apply-button').attr('href'),
        sourceUrl: request.url,
        scrapedAt: new Date().toISOString(),
      };
      await Dataset.pushData(jobDetail);
      log.info(`Scraped detail: ${jobDetail.title} at ${jobDetail.company}`);
    }
  },
});

function extractSalary($) {
  const salaryText = $('.salary-info, .compensation').text().trim();
  if (!salaryText) return null;
  const match = salaryText.match(/\$(\d[\d,]+)\s*-?\s*\$?(\d[\d,]+)?/);
  return {
    raw: salaryText,
    min: match ? parseInt(match[1].replace(/,/g, ''), 10) : null,
    max: match && match[2] ? parseInt(match[2].replace(/,/g, ''), 10) : null,
    period: salaryText.toLowerCase().includes('year') ? 'annual'
      : salaryText.toLowerCase().includes('hour') ? 'hourly' : 'unknown',
  };
}

function extractRequirements($) {
  const requirements = [];
  $('.job-requirements li, .qualifications li').each((i, el) => {
    requirements.push($(el).text().trim());
  });
  return requirements;
}

function extractBenefits($) {
  const benefits = [];
  $('.benefits li, .perks li').each((i, el) => {
    benefits.push($(el).text().trim());
  });
  return benefits;
}

function extractSkills($) {
  const skills = [];
  $('.skill-tag, .required-skill').each((i, el) => {
    skills.push($(el).text().trim());
  });
  return skills;
}

await crawler.run([
  { url: 'https://www.monster.com/jobs/search?q=developer&where=remote', userData: { pageType: 'search' } },
]);
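One wrinkle with the `postedDate` field above: job boards typically render it as a relative string ("Posted 3 days ago"), which is useless for sorting or trend analysis. A hedged normalizer sketch — the exact phrasings handled here are assumptions about Monster's UI text, so extend the patterns to match what the site actually renders:

```javascript
// Convert a relative "Posted N days ago"-style string to an ISO date (YYYY-MM-DD).
// The phrasings matched here are assumptions about the site's UI text.
function parsePostedDate(text, now = new Date()) {
  if (!text) return null;
  const t = text.toLowerCase();
  if (t.includes('today') || t.includes('just posted')) {
    return now.toISOString().slice(0, 10);
  }
  const match = t.match(/(\d+)\s*(hour|day|week|month)s?\s*ago/);
  if (!match) return null; // keep the raw string around if parsing fails
  const n = parseInt(match[1], 10);
  const msPer = { hour: 36e5, day: 864e5, week: 6048e5, month: 2592e6 };
  const then = new Date(now.getTime() - n * msPer[match[2]]);
  return then.toISOString().slice(0, 10);
}
```

Passing `now` explicitly keeps the function deterministic and easy to test; in the crawler you would call it with the default and store both the raw text and the parsed date.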
Location-Based Job Search Scraping
Monster supports geographic filtering, which is invaluable for location-specific job market analysis:
const locations = [
  'New York, NY',
  'San Francisco, CA',
  'Austin, TX',
  'Chicago, IL',
  'Seattle, WA',
  'Remote',
];
const searchTerms = ['software engineer', 'data scientist', 'product manager'];

const startUrls = [];
for (const location of locations) {
  for (const term of searchTerms) {
    const encodedTerm = encodeURIComponent(term);
    const encodedLocation = encodeURIComponent(location);
    startUrls.push({
      url: `https://www.monster.com/jobs/search?q=${encodedTerm}&where=${encodedLocation}`,
      userData: {
        pageType: 'search',
        searchTerm: term,
        location: location,
      },
    });
  }
}

console.log(`Generated ${startUrls.length} search combinations`);
await crawler.run(startUrls);
This creates a matrix of search queries, allowing you to compare job availability and salary ranges across different cities for the same roles.
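A side effect of the matrix is duplication: the same posting can match both "software engineer" in Austin and the "Remote" search. Deduplicating before pushing keeps the dataset clean. One sketch, keyed on the URL with its query string stripped (on job boards the query usually carries only tracking parameters — verify that assumption for your URLs):

```javascript
// Deduplicate jobs collected across overlapping search queries.
// Assumes the query string holds only tracking params, so it can be dropped.
function dedupeJobs(jobs) {
  const seen = new Set();
  const unique = [];
  for (const job of jobs) {
    if (!job.url) continue;
    const u = new URL(job.url);
    const key = u.origin + u.pathname; // drop ?query and #hash
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(job);
  }
  return unique;
}
```

Run this over the collected results before `Dataset.pushData`, or apply the same key when enqueueing detail pages so each job is only visited once.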
Salary Data Extraction and Analysis
Salary information is one of the most valuable data points. Here's a dedicated module for salary extraction and normalization:
class SalaryExtractor {
  static normalize(salaryText) {
    if (!salaryText) return null;
    // Strip noise characters, keeping digits, separators, and the letters
    // that appear in "k", "year", "hour", and "annum"
    const cleaned = salaryText.replace(/[^\d\$\-\.,kKmM\/yearhourannum]/gi, ' ').trim();
    // Handle "k" notation (e.g., "$80k - $120k")
    const kMatch = cleaned.match(/\$(\d+)k\s*-?\s*\$?(\d+)?k?/i);
    if (kMatch) {
      return {
        min: parseInt(kMatch[1], 10) * 1000,
        max: kMatch[2] ? parseInt(kMatch[2], 10) * 1000 : null,
        period: 'annual',
        currency: 'USD',
      };
    }
    // Handle standard ranges (e.g., "$80,000 - $120,000")
    const rangeMatch = cleaned.match(/\$(\d[\d,]+)\s*-\s*\$?(\d[\d,]+)/);
    if (rangeMatch) {
      const min = parseInt(rangeMatch[1].replace(/,/g, ''), 10);
      const max = parseInt(rangeMatch[2].replace(/,/g, ''), 10);
      return {
        min,
        max,
        // Figures this small are almost certainly hourly rates
        period: min > 500 ? 'annual' : 'hourly',
        currency: 'USD',
      };
    }
    // Handle single values
    const singleMatch = cleaned.match(/\$(\d[\d,]+)/);
    if (singleMatch) {
      const value = parseInt(singleMatch[1].replace(/,/g, ''), 10);
      return {
        min: value,
        max: null,
        period: value > 500 ? 'annual' : 'hourly',
        currency: 'USD',
      };
    }
    return { raw: salaryText, min: null, max: null, period: 'unknown', currency: 'USD' };
  }

  static toAnnual(salary) {
    if (!salary || !salary.min) return null;
    if (salary.period === 'annual') return salary;
    if (salary.period === 'hourly') {
      // 2080 = 40 hours/week x 52 weeks
      return {
        ...salary,
        min: salary.min * 2080,
        max: salary.max ? salary.max * 2080 : null,
        period: 'annual',
      };
    }
    return salary;
  }
}
Company Profile Extraction
Understanding employers is crucial for job seekers and recruiters alike. Here's how to scrape company profiles from Monster:
async function scrapeCompanyProfile($, companyUrl) {
  return {
    name: $('h1.company-name').text().trim(),
    industry: $('.company-industry').text().trim(),
    size: $('.company-size').text().trim(),
    founded: $('.company-founded').text().trim(),
    headquarters: $('.company-hq').text().trim(),
    description: $('.company-description').text().trim(),
    website: $('.company-website a').attr('href'),
    rating: parseFloat($('.company-rating .score').text()) || null,
    reviewCount: parseInt($('.review-count').text().replace(/\D/g, ''), 10) || 0,
    activeJobs: parseInt($('.active-jobs-count').text().replace(/\D/g, ''), 10) || 0,
    benefits: [], // not exposed on every profile; populate if the selectors exist
    culture: [],  // same — left as a placeholder
    profileUrl: companyUrl,
    scrapedAt: new Date().toISOString(),
  };
}
Deploying on Apify Platform
The real power comes when you deploy your scraper to Apify's cloud platform. This gives you scheduled runs, proxy rotation, and automatic data storage.
Creating an Apify Actor
import { Actor } from 'apify';
import { CheerioCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput() ?? {};
const {
  searchTerms = ['software engineer'],
  locations = ['United States'],
  maxResults = 100,
  includeDetails = true,
  proxyConfig = { useApifyProxy: true },
} = input;

const proxyConfiguration = await Actor.createProxyConfiguration(proxyConfig);

const crawler = new CheerioCrawler({
  proxyConfiguration,
  maxRequestsPerCrawl: maxResults,
  maxConcurrency: 5,
  async requestHandler({ request, $, log }) {
    // ... scraping logic from above ...
    const results = extractJobsFromPage($);
    await Dataset.pushData(results);
  },
  async failedRequestHandler({ request, log }) {
    log.error(`Request failed: ${request.url}`);
  },
});

const startUrls = buildSearchUrls(searchTerms, locations);
await crawler.run(startUrls);

await Actor.exit();
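The `buildSearchUrls` helper above is left unimplemented; a minimal sketch that mirrors the earlier search-matrix logic might look like this:

```javascript
// One start URL per (term, location) pair, matching the search matrix above.
// The query parameter names (q, where) follow the URLs used earlier in this guide.
function buildSearchUrls(searchTerms, locations) {
  const urls = [];
  for (const term of searchTerms) {
    for (const location of locations) {
      const params = new URLSearchParams({ q: term, where: location });
      urls.push({
        url: `https://www.monster.com/jobs/search?${params}`,
        userData: { pageType: 'search', searchTerm: term, location },
      });
    }
  }
  return urls;
}
```

`URLSearchParams` handles the percent-encoding, so terms with spaces or commas are safe to pass straight through.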
Input Schema Configuration
Create an INPUT_SCHEMA.json for your Actor to make it user-friendly:
{
"title": "Monster.com Job Scraper",
"description": "Extracts job listings, salary data, and company information from Monster.com",
"type": "object",
"schemaVersion": 1,
"properties": {
"searchTerms": {
"title": "Search Terms",
"type": "array",
"description": "Job titles or keywords to search for",
"editor": "stringList",
"default": ["software engineer"]
},
"locations": {
"title": "Locations",
"type": "array",
"description": "Cities, states, or countries to search in",
"editor": "stringList",
"default": ["United States"]
},
"maxResults": {
"title": "Maximum Results",
"type": "integer",
"description": "Maximum number of job listings to scrape",
"default": 100,
"minimum": 1,
"maximum": 10000
},
"includeDetails": {
"title": "Include Job Details",
"type": "boolean",
"description": "Visit individual job pages for full details",
"default": true
}
},
"required": ["searchTerms"]
}
Data Export and Analysis
Once your scraper runs, you can export data in multiple formats. Here's how to process the results for analysis:
import { Dataset } from 'crawlee';

async function exportAndAnalyze() {
  const dataset = await Dataset.open();
  const { items } = await dataset.getData();

  // Salary analysis by location
  const salaryByLocation = {};
  items.forEach(job => {
    if (job.salary && job.salary.min && job.location) {
      const loc = job.location.split(',')[0].trim();
      if (!salaryByLocation[loc]) {
        salaryByLocation[loc] = [];
      }
      salaryByLocation[loc].push(job.salary.min);
    }
  });

  // Calculate averages (sort a copy so the source arrays stay untouched)
  const analysis = Object.entries(salaryByLocation).map(([location, salaries]) => {
    const sorted = [...salaries].sort((a, b) => a - b);
    return {
      location,
      avgSalary: Math.round(sorted.reduce((a, b) => a + b, 0) / sorted.length),
      medianSalary: sorted[Math.floor(sorted.length / 2)],
      jobCount: sorted.length,
      minSalary: sorted[0],
      maxSalary: sorted[sorted.length - 1],
    };
  });

  console.table(analysis.sort((a, b) => b.avgSalary - a.avgSalary));
  return analysis;
}
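For getting that per-location table out of Node and into a spreadsheet, a small CSV serializer is enough. (The Apify platform can also export datasets to CSV directly, so treat this as a local-analysis convenience, not the canonical export path.)

```javascript
// Serialize an array of flat objects (like the analysis rows) to CSV.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (v) => {
    const s = String(v ?? '');
    // Quote fields containing commas, quotes, or newlines per RFC 4180
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}
```

Write the result with `fs.writeFileSync('salaries.csv', toCsv(analysis))` and it opens cleanly in any spreadsheet tool.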
Handling Anti-Scraping Measures
Monster.com employs various protections. Here are ethical strategies to handle them:
const crawlerConfig = {
  // Respect rate limits
  maxConcurrency: 3,
  maxRequestRetries: 3,
  // Start conservatively and give each request ample time
  minConcurrency: 1,
  requestHandlerTimeoutSecs: 60,
  // Rotate user agents
  preNavigationHooks: [
    async ({ request }) => {
      const userAgents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
      ];
      request.headers = {
        ...request.headers,
        'User-Agent': userAgents[Math.floor(Math.random() * userAgents.length)],
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9',
        'Accept-Language': 'en-US,en;q=0.5',
      };
    },
  ],
  // Session pool for managing cookies
  useSessionPool: true,
  sessionPoolOptions: {
    maxPoolSize: 10,
    sessionOptions: {
      maxUsageCount: 50,
    },
  },
};
Best Practices for Monster.com Scraping
Respect robots.txt: Always check and follow Monster's robots.txt directives before scraping.
Rate limiting: Keep your request rate reasonable. A maximum of 3-5 concurrent requests with delays is recommended for ethical scraping.
Data freshness: Job listings change frequently. Schedule your scraper to run daily or weekly to maintain current data.
Error handling: Implement robust error handling for network timeouts, page structure changes, and CAPTCHA challenges.
Legal compliance: Review Monster.com's Terms of Service before scraping. Consider using their official API if available for your use case.
Data storage: Use structured formats (JSON, CSV) and normalize salary data for consistent analysis.
Proxy rotation: Use Apify's built-in proxy infrastructure to distribute requests across multiple IP addresses and avoid rate limiting.
Monitoring: Set up alerts for scraper failures or significant changes in data volume that might indicate structural changes on the site.
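The monitoring point above can start as simply as comparing each run's item count against the previous run and flagging large swings, which are a common symptom of a selector that silently stopped matching. A sketch (the 50% threshold is an arbitrary default to tune for your data):

```javascript
// Flag runs whose item count deviates sharply from the previous run —
// often the first sign that the site changed its markup.
function volumeAlert(previousCount, currentCount, tolerance = 0.5) {
  if (currentCount === 0) return true;   // an empty run is always suspicious
  if (previousCount === 0) return false; // no baseline to compare against
  const change = Math.abs(currentCount - previousCount) / previousCount;
  return change > tolerance;             // e.g. a >50% swing triggers an alert
}
```

On Apify, you could persist the previous count in a key-value store between scheduled runs and send a notification whenever this returns `true`.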
Use Cases for Monster.com Data
The extracted data supports numerous practical applications:
Job market analysis: Track hiring trends across industries and regions to understand which sectors are growing, which skills are in demand, and how compensation evolves over time.
Salary benchmarking: Compare compensation packages across roles, locations, and company sizes. This helps both job seekers negotiate better offers and employers stay competitive with their compensation packages.
Competitive intelligence: Monitor competitors' hiring patterns to understand their strategic direction. A sudden burst of AI engineering postings might signal a pivot into machine learning.
Recruitment optimization: Recruiters can identify talent pools, understand candidate expectations, and optimize job posting language based on what performs well on the platform.
Academic research: Researchers studying labor economics, skills gaps, or workforce trends use job board data to quantify employment market dynamics at scale.
Conclusion
Scraping Monster.com provides valuable insights into the job market when done responsibly and ethically. By combining Crawlee's powerful scraping capabilities with Apify's cloud infrastructure, you can build robust, scalable data pipelines that deliver actionable employment market intelligence.
Remember to always respect the website's terms of service, implement reasonable rate limits, and handle the extracted data responsibly. The techniques covered here — from basic job listing extraction to salary normalization and company profiling — form a solid foundation for any job market data project.
Whether you're building a job aggregator, conducting market research, or analyzing hiring trends, these tools and techniques will help you extract maximum value from Monster.com's vast employment data.
Looking for a ready-made solution? Check out job scraping actors on Apify Store for pre-built, maintained scrapers that handle all the complexity for you.