Web scraping startup job boards and company profiles has become an essential skill for recruiters, investors, market researchers, and job seekers who want to stay ahead in the competitive startup ecosystem. Wellfound (formerly AngelList Talent) is one of the richest sources of startup job data on the internet, hosting thousands of positions from early-stage startups to well-funded unicorns.
In this comprehensive guide, we'll explore how to extract job listings, company profiles, funding information, and remote job opportunities from Wellfound using modern web scraping techniques and Apify's cloud platform.
Why Scrape Wellfound?
Wellfound is unique among job boards for several reasons:
- Startup-focused: Unlike LinkedIn or Indeed, Wellfound exclusively features startup jobs, making the data highly targeted
- Rich company profiles: Each company listing includes funding stage, team size, market/industry, tech stack, and investor information
- Salary transparency: Most job listings display salary ranges upfront
- Remote job emphasis: Wellfound has been a leader in remote job listings since before the pandemic
- Investor connections: Company profiles often include notable investors and board members
This makes Wellfound data valuable for:
- Recruiters building talent pipelines for startup clients
- Investors tracking hiring trends as a signal for company health
- Job seekers aggregating opportunities across multiple platforms
- Market researchers analyzing startup ecosystem trends
- Competitive analysts monitoring hiring patterns in specific verticals
Understanding Wellfound's Structure
Before writing any scraping code, let's understand how Wellfound organizes its data.
Company Profiles
Each company on Wellfound has a profile page at wellfound.com/company/{slug} containing:
- Company Name
- Tagline / One-liner
- Description (often multiple paragraphs)
- Location(s)
- Company Size (range)
- Funding Stage (Seed, Series A, B, C, etc.)
- Total Raised
- Markets/Industries
- Tech Stack
- Founded Year
- Website URL
- Social Links
- Team Members
- Open Jobs
Job Listings
Job listings are found at wellfound.com/jobs and individual listings at wellfound.com/jobs/{id}. Each listing includes:
- Job Title
- Company (linked to profile)
- Location / Remote status
- Salary Range (min-max)
- Equity Range
- Experience Level
- Job Type (Full-time, Part-time, Contract)
- Visa Sponsorship availability
- Skills/Tags
- Posted Date
- Job Description (full HTML)
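Put together, a single scraped listing might end up as a record like the one below. The field values are purely illustrative, not real Wellfound data:

```javascript
// Illustrative shape of one scraped job listing (values are made up)
const exampleJob = {
    title: 'Senior Backend Engineer',
    company: 'Acme Robotics',
    location: 'Remote',
    salary: '$140k – $180k',
    equity: '0.1% – 0.5%',
    jobType: 'Full-time',
    visaSponsorship: false,
    tags: ['Node.js', 'PostgreSQL', 'AWS'],
    postedAt: '2024-01-15',
    jobUrl: 'https://wellfound.com/jobs/1234567',
};
```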
Search and Filtering
Wellfound supports filtering by:
- Role type (Engineering, Design, Product, Marketing, etc.)
- Location
- Remote preference
- Company size
- Funding stage
- Salary range
- Experience level
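These filters map onto query-string parameters in the search URL. The parameter names below are assumptions based on the UI, so verify them against your browser's address bar before relying on them:

```javascript
// Build a filtered search URL; parameter names are assumed, not confirmed
const params = new URLSearchParams({
    role: 'software-engineer',
    remote: 'true',
    salaryMin: '120000',
});
const searchUrl = 'https://wellfound.com/jobs?' + params.toString();
console.log(searchUrl);
// → https://wellfound.com/jobs?role=software-engineer&remote=true&salaryMin=120000
```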
Technical Approach to Scraping Wellfound
Wellfound is a modern React-based single-page application (SPA), which means traditional HTTP scraping won't capture dynamically rendered content. You'll need a browser-based approach.
Setting Up Your Environment
First, let's set up a Node.js project with Crawlee, Apify's open-source crawling library:
mkdir wellfound-scraper
cd wellfound-scraper
npm init -y
npm install crawlee puppeteer apify
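The examples in this guide use ES module imports and top-level await, so add `"type": "module"` to the generated package.json (the version ranges shown are illustrative):

```json
{
  "name": "wellfound-scraper",
  "type": "module",
  "dependencies": {
    "crawlee": "^3.0.0",
    "puppeteer": "^21.0.0",
    "apify": "^3.0.0"
  }
}
```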
Basic Job Listing Scraper
Here's a scraper that extracts job listings from Wellfound search results:
import { PuppeteerCrawler, Dataset } from 'crawlee';

// NOTE: Wellfound uses hashed CSS-module class names (e.g. styles_jobCard__xyz),
// so these [class*="..."] selectors are best-effort and may need updating
// whenever the site ships a new build.
const crawler = new PuppeteerCrawler({
    maxRequestsPerCrawl: 200,
    requestHandlerTimeoutSecs: 120,

    async requestHandler({ page, request, enqueueLinks, log }) {
        const { url } = request;

        if (url.includes('/jobs') && !url.includes('/jobs/')) {
            // Search results page
            log.info('Scraping job listings from: ' + url);
            await page.waitForSelector('[class*="styles_jobCard"]', {
                timeout: 15000,
            });

            // Scroll to the bottom so lazily loaded cards render
            await autoScroll(page);

            const jobs = await page.evaluate(() => {
                const cards = document.querySelectorAll('[class*="styles_jobCard"]');
                return Array.from(cards).map((card) => {
                    const titleEl = card.querySelector('[class*="styles_jobTitle"] a');
                    const companyEl = card.querySelector('[class*="styles_companyName"] a');
                    const locationEl = card.querySelector('[class*="styles_location"]');
                    const salaryEl = card.querySelector('[class*="styles_compensation"]');
                    const tagsEls = card.querySelectorAll('[class*="styles_tag"]');
                    return {
                        title: titleEl?.textContent?.trim() || '',
                        jobUrl: titleEl?.href || '',
                        company: companyEl?.textContent?.trim() || '',
                        companyUrl: companyEl?.href || '',
                        location: locationEl?.textContent?.trim() || '',
                        salary: salaryEl?.textContent?.trim() || '',
                        tags: Array.from(tagsEls).map((t) => t.textContent?.trim()),
                        scrapedAt: new Date().toISOString(),
                    };
                });
            });

            log.info('Found ' + jobs.length + ' job listings');
            await Dataset.pushData(jobs);

            // Follow pagination links to further result pages
            await enqueueLinks({
                selector: '[class*="styles_pagination"] a',
                label: 'LISTING',
            });
        } else if (url.includes('/jobs/')) {
            // Individual job detail page
            log.info('Scraping job detail: ' + url);
            await page.waitForSelector('[class*="styles_jobDetail"]', { timeout: 15000 });

            const jobDetail = await page.evaluate(() => ({
                title: document.querySelector('h1')?.textContent?.trim(),
                description: document.querySelector('[class*="styles_description"]')?.innerHTML,
                requirements: document.querySelector('[class*="styles_requirements"]')?.innerHTML,
                benefits: document.querySelector('[class*="styles_benefits"]')?.innerHTML,
                applicationUrl: document.querySelector('a[class*="styles_applyButton"]')?.href,
            }));

            await Dataset.pushData({
                ...jobDetail,
                url,
                scrapedAt: new Date().toISOString(),
            });
        }
    },

    failedRequestHandler({ request, log }) {
        log.error('Request failed: ' + request.url);
    },
});

// Scroll down in increments so infinite-scroll content has time to load
async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            let totalHeight = 0;
            const distance = 500;
            const timer = setInterval(() => {
                const scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 300);
        });
    });
}

await crawler.run(['https://wellfound.com/jobs']);
Company Profile Scraper
Now let's build a scraper specifically for company profiles:
import { PuppeteerCrawler, Dataset } from 'crawlee';

const crawler = new PuppeteerCrawler({
    maxRequestsPerCrawl: 100,

    async requestHandler({ page, request, log }) {
        log.info('Scraping company: ' + request.url);
        await page.waitForSelector('[class*="styles_company"]', { timeout: 15000 });

        const company = await page.evaluate(() => {
            const name = document.querySelector('h1')?.textContent?.trim();
            const tagline = document.querySelector('[class*="styles_tagline"]')?.textContent?.trim();
            const description = document.querySelector('[class*="styles_description"]')?.textContent?.trim();
            const website = document.querySelector('a[class*="styles_websiteLink"]')?.href;
            const logoUrl = document.querySelector('img[class*="styles_logo"]')?.src;

            // Label/value detail rows (funding stage, company size, founded year, ...)
            const details = {};
            const detailRows = document.querySelectorAll('[class*="styles_detailRow"]');
            detailRows.forEach((row) => {
                const label = row.querySelector('[class*="styles_label"]')?.textContent?.trim();
                const value = row.querySelector('[class*="styles_value"]')?.textContent?.trim();
                if (label && value) {
                    details[label.toLowerCase().replace(/\s+/g, '_')] = value;
                }
            });

            const techStack = Array.from(
                document.querySelectorAll('[class*="styles_techTag"]')
            ).map((el) => el.textContent?.trim());

            const team = Array.from(
                document.querySelectorAll('[class*="styles_teamMember"]')
            ).map((member) => ({
                name: member.querySelector('[class*="styles_memberName"]')?.textContent?.trim(),
                role: member.querySelector('[class*="styles_memberRole"]')?.textContent?.trim(),
                profileUrl: member.querySelector('a')?.href,
            }));

            const openJobs = document.querySelectorAll('[class*="styles_jobListing"]').length;

            return { name, tagline, description, website, logoUrl, ...details, techStack, team, openJobs };
        });

        await Dataset.pushData({
            ...company,
            profileUrl: request.url,
            scrapedAt: new Date().toISOString(),
        });
    },
});

const companyUrls = [
    'https://wellfound.com/company/stripe',
    'https://wellfound.com/company/figma',
    'https://wellfound.com/company/notion',
];

await crawler.run(companyUrls);
Extracting Funding Data
Funding information is particularly valuable for investors and market researchers:
async function extractFundingData(page) {
    return await page.evaluate(() => {
        const fundingSection = document.querySelector('[class*="styles_funding"]');
        if (!fundingSection) return null;

        const rounds = Array.from(
            fundingSection.querySelectorAll('[class*="styles_fundingRound"]')
        ).map((round) => ({
            type: round.querySelector('[class*="styles_roundType"]')?.textContent?.trim(),
            amount: round.querySelector('[class*="styles_roundAmount"]')?.textContent?.trim(),
            date: round.querySelector('[class*="styles_roundDate"]')?.textContent?.trim(),
            investors: Array.from(
                round.querySelectorAll('[class*="styles_investor"]')
            ).map((inv) => inv.textContent?.trim()),
        }));

        const totalRaised = fundingSection.querySelector('[class*="styles_totalRaised"]')?.textContent?.trim();
        const fundingStage = fundingSection.querySelector('[class*="styles_stage"]')?.textContent?.trim();

        return { totalRaised, fundingStage, rounds, lastRoundDate: rounds[0]?.date || null };
    });
}
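The amounts come back as display strings like "$12.5M". A small helper can normalize them to plain numbers for analysis. This is a sketch: the input formats it handles are assumptions about how funding pages typically render amounts:

```javascript
// Convert display strings like "$12.5M" or "$750K" to plain numbers.
// Handled formats are assumed from typical funding-page rendering.
function parseFundingAmount(str) {
    if (!str) return null;
    const match = str.match(/\$?([\d.,]+)\s*([KkMmBb])?/);
    if (!match) return null;
    const base = parseFloat(match[1].replace(/,/g, ''));
    const suffix = (match[2] || '').toUpperCase();
    const multiplier = { K: 1e3, M: 1e6, B: 1e9 }[suffix] ?? 1;
    return Math.round(base * multiplier);
}

console.log(parseFundingAmount('$12.5M')); // 12500000
console.log(parseFundingAmount('$750K')); // 750000
```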
Filtering for Remote Jobs
One of the most popular use cases is extracting remote job listings:
import { PuppeteerCrawler, Dataset } from 'crawlee';

const remoteJobsCrawler = new PuppeteerCrawler({
    async requestHandler({ page, log }) {
        log.info('Scraping remote jobs...');
        // The crawler has already navigated to the request URL,
        // so we only wait for the job cards to render.
        await page.waitForSelector('[class*="styles_jobCard"]', { timeout: 15000 });

        let allJobs = [];
        let hasMore = true;
        let pageNum = 1;

        while (hasMore && pageNum <= 20) {
            log.info('Processing page ' + pageNum);

            const jobs = await page.evaluate(() => {
                const cards = document.querySelectorAll('[class*="styles_jobCard"]');
                return Array.from(cards).map((card) => ({
                    title: card.querySelector('[class*="jobTitle"]')?.textContent?.trim(),
                    company: card.querySelector('[class*="companyName"]')?.textContent?.trim(),
                    salary: card.querySelector('[class*="compensation"]')?.textContent?.trim(),
                    equity: card.querySelector('[class*="equity"]')?.textContent?.trim(),
                    location: 'Remote',
                    tags: Array.from(card.querySelectorAll('[class*="styles_tag"]')).map((t) => t.textContent?.trim()),
                }));
            });
            allJobs = [...allJobs, ...jobs];

            const nextButton = await page.$('[class*="pagination"] [class*="next"]');
            if (nextButton) {
                // Start waiting before clicking so a fast transition isn't missed
                await Promise.all([
                    page.waitForNavigation({ waitUntil: 'networkidle0' }),
                    nextButton.click(),
                ]);
                pageNum++;
            } else {
                hasMore = false;
            }
        }

        log.info('Total remote jobs found: ' + allJobs.length);
        await Dataset.pushData(allJobs);
    },
});

await remoteJobsCrawler.run(['https://wellfound.com/jobs?remote=true']);
Deploying on Apify
To run your scraper reliably in the cloud, deploy it as an Apify Actor:
import { Actor } from 'apify';
import { PuppeteerCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput() ?? {};
const {
    searchQuery = '',
    location = '',
    remoteOnly = false,
    maxResults = 100,
    fundingStage = '',
    companySize = ''
} = input;

const params = new URLSearchParams();
if (searchQuery) params.set('q', searchQuery);
if (location) params.set('location', location);
if (remoteOnly) params.set('remote', 'true');
if (fundingStage) params.set('fundingStage', fundingStage);
if (companySize) params.set('companySize', companySize);
const searchUrl = 'https://wellfound.com/jobs?' + params.toString();

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl: maxResults,
    launchContext: {
        launchOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox'],
        },
    },

    async requestHandler({ page, request, log }) {
        log.info('Processing: ' + request.url);
        await page.waitForSelector('[class*="styles_jobCard"]', { timeout: 20000 });

        const jobs = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('[class*="styles_jobCard"]')).map((card) => ({
                title: card.querySelector('[class*="jobTitle"]')?.textContent?.trim(),
                company: card.querySelector('[class*="companyName"]')?.textContent?.trim(),
                location: card.querySelector('[class*="location"]')?.textContent?.trim(),
                salary: card.querySelector('[class*="compensation"]')?.textContent?.trim(),
                equity: card.querySelector('[class*="equity"]')?.textContent?.trim(),
                jobUrl: card.querySelector('[class*="jobTitle"] a')?.href,
                tags: Array.from(card.querySelectorAll('[class*="styles_tag"]')).map((t) => t.textContent?.trim()),
            }));
        });

        await Dataset.pushData(jobs);
    },
});

await crawler.run([searchUrl]);
await Actor.exit();
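An input for this Actor might look like the following. The field names mirror the destructured input above; the values are just examples:

```json
{
  "searchQuery": "machine learning engineer",
  "remoteOnly": true,
  "maxResults": 50,
  "fundingStage": "series-a"
}
```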
Data Processing and Output
Once you've scraped the data, clean and structure it:
function processJobData(rawJobs) {
    return rawJobs
        .filter((job) => job.title && job.company)
        .map((job) => ({
            ...job,
            salaryMin: parseSalary(job.salary, 'min'),
            salaryMax: parseSalary(job.salary, 'max'),
            equityMin: parseEquity(job.equity, 'min'),
            equityMax: parseEquity(job.equity, 'max'),
            isRemote: job.location?.toLowerCase().includes('remote') ?? false,
            normalizedTitle: normalizeJobTitle(job.title),
        }))
        .sort((a, b) => (b.salaryMax || 0) - (a.salaryMax || 0));
}

function parseSalary(salaryStr, type) {
    if (!salaryStr) return null;
    // Match both "$120k – $160k" and "$90,000 - $120,000" (hyphen or en dash)
    const matches = salaryStr.match(/\$?([\d,]+)[kK]?\s*[-\u2013]\s*\$?([\d,]+)[kK]?/);
    if (!matches) return null;
    const multiplier = salaryStr.includes('k') || salaryStr.includes('K') ? 1000 : 1;
    return type === 'min'
        ? parseInt(matches[1].replace(/,/g, ''), 10) * multiplier
        : parseInt(matches[2].replace(/,/g, ''), 10) * multiplier;
}

function parseEquity(equityStr, type) {
    if (!equityStr) return null;
    const matches = equityStr.match(/([\d.]+)%\s*[-\u2013]\s*([\d.]+)%/);
    if (!matches) return null;
    return type === 'min' ? parseFloat(matches[1]) : parseFloat(matches[2]);
}

function normalizeJobTitle(title) {
    // Expand common abbreviations; extend this map as needed
    const titleMap = {
        'swe': 'Software Engineer',
        'sde': 'Software Development Engineer',
        'fe': 'Frontend Engineer',
        'be': 'Backend Engineer',
        'fs': 'Full Stack Engineer',
    };
    const lower = title.toLowerCase();
    for (const [abbr, full] of Object.entries(titleMap)) {
        if (lower === abbr) return full;
    }
    return title;
}
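It is worth sanity-checking the salary parser against a few real-world formats before running it over thousands of rows. Here is a self-contained version of the same logic with a couple of checks:

```javascript
// Self-contained copy of the salary parser for quick testing
function parseSalary(salaryStr, type) {
    if (!salaryStr) return null;
    // Match "$120k – $160k" or "$90,000 - $120,000" (hyphen or en dash)
    const matches = salaryStr.match(/\$?([\d,]+)[kK]?\s*[-\u2013]\s*\$?([\d,]+)[kK]?/);
    if (!matches) return null;
    const multiplier = /[kK]/.test(salaryStr) ? 1000 : 1;
    return type === 'min'
        ? parseInt(matches[1].replace(/,/g, ''), 10) * multiplier
        : parseInt(matches[2].replace(/,/g, ''), 10) * multiplier;
}

console.log(parseSalary('$120k – $160k', 'min')); // 120000
console.log(parseSalary('$90,000 - $120,000', 'max')); // 120000
```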
Best Practices and Ethical Considerations
Rate Limiting
Always implement proper rate limiting to avoid overloading servers:
const crawler = new PuppeteerCrawler({
    maxConcurrency: 2,           // at most 2 pages open in parallel
    maxRequestsPerMinute: 15,    // throttle the overall request rate
    navigationTimeoutSecs: 60,
    // ...requestHandler and other options
});
Respect robots.txt
Check wellfound.com/robots.txt before scraping and respect any disallowed paths.
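One way to honor this programmatically is to parse the Disallow rules and check each path before enqueueing it. The sketch below is deliberately simplified (it ignores User-agent grouping and wildcards) and runs on an invented sample; in practice you would fetch wellfound.com/robots.txt and use its actual rules:

```javascript
// Minimal robots.txt Disallow check (sample rules below are invented)
function getDisallowedPaths(robotsTxt) {
    return robotsTxt
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.toLowerCase().startsWith('disallow:'))
        .map((line) => line.slice('disallow:'.length).trim())
        .filter(Boolean);
}

function isAllowed(path, disallowed) {
    return !disallowed.some((rule) => path.startsWith(rule));
}

const sampleRobots = 'User-agent: *\nDisallow: /admin\nDisallow: /api/';
const rules = getDisallowedPaths(sampleRobots);
console.log(isAllowed('/jobs', rules)); // true
console.log(isAllowed('/api/internal', rules)); // false
```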
Data Privacy
- Don't scrape personal contact information from user profiles
- Be mindful of GDPR and CCPA when storing scraped data
- Use the data for legitimate business purposes only
Handling Anti-Bot Measures
Using Apify's proxy infrastructure with residential IPs helps maintain reliable access:
const proxyConfiguration = await Actor.createProxyConfiguration({
groups: ['RESIDENTIAL'],
countryCode: 'US',
});
Use Cases and Applications
For Recruiters
Build a pipeline that scrapes new job listings daily, filters by tech stack and seniority, and pushes matches to a CRM. This gives you a head start on sourcing candidates before listings get widely circulated.
For Investors
Monitor hiring velocity as a signal. A startup that suddenly posts 20 engineering roles likely just closed a round. Track this across your portfolio companies and competitors.
For Job Seekers
Aggregate listings from Wellfound alongside LinkedIn, Indeed, and other boards. Set up alerts for specific keywords, salary ranges, or companies.
For Market Researchers
Analyze trends in startup hiring: which roles are most in demand, what salary ranges are trending upward, which tech stacks are gaining popularity.
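For instance, a simple tally over the scraped tags surfaces which technologies appear most often across listings. The sample data here is illustrative:

```javascript
// Count tag frequency across scraped jobs, most common first
function tagFrequency(jobs) {
    const counts = {};
    for (const job of jobs) {
        for (const tag of job.tags ?? []) {
            counts[tag] = (counts[tag] ?? 0) + 1;
        }
    }
    return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

const sample = [
    { tags: ['React', 'TypeScript'] },
    { tags: ['React', 'Node.js'] },
    { tags: ['React'] },
];
console.log(tagFrequency(sample));
// [ [ 'React', 3 ], [ 'TypeScript', 1 ], [ 'Node.js', 1 ] ]
```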
Conclusion
Wellfound is a goldmine of startup job and company data that can power recruiting pipelines, investment analysis, job searches, and market research. By using Puppeteer-based scraping with Crawlee and deploying on Apify, you can build reliable, scalable data extraction workflows that keep you ahead of the competition.
The key is to start with a focused use case, implement proper rate limiting and error handling, and respect ethical guidelines. Whether you're tracking the next unicorn's hiring spree or building the ultimate startup job aggregator, the tools and techniques in this guide will get you there.
Start with the basic examples above, customize the selectors for your specific needs, and scale up gradually. Happy scraping!