agenthustler

Wellfound (AngelList) Scraping: Extract Startup Jobs and Company Profiles

Web scraping startup job boards and company profiles has become an essential skill for recruiters, investors, market researchers, and job seekers who want to stay ahead in the competitive startup ecosystem. Wellfound (formerly AngelList Talent) is one of the richest sources of startup job data on the internet, hosting thousands of positions from early-stage startups to well-funded unicorns.

In this comprehensive guide, we'll explore how to extract job listings, company profiles, funding information, and remote job opportunities from Wellfound using modern web scraping techniques and Apify's cloud platform.

Why Scrape Wellfound?

Wellfound is unique among job boards for several reasons:

  • Startup-focused: Unlike LinkedIn or Indeed, Wellfound exclusively features startup jobs, making the data highly targeted
  • Rich company profiles: Each company listing includes funding stage, team size, market/industry, tech stack, and investor information
  • Salary transparency: Most job listings display salary ranges upfront
  • Remote job emphasis: Wellfound has been a leader in remote job listings since before the pandemic
  • Investor connections: Company profiles often include notable investors and board members

This makes Wellfound data valuable for:

  • Recruiters building talent pipelines for startup clients
  • Investors tracking hiring trends as a signal for company health
  • Job seekers aggregating opportunities across multiple platforms
  • Market researchers analyzing startup ecosystem trends
  • Competitive analysts monitoring hiring patterns in specific verticals

Understanding Wellfound's Structure

Before writing any scraping code, let's understand how Wellfound organizes its data.

Company Profiles

Each company on Wellfound has a profile page at wellfound.com/company/{slug} containing:

Company Name
Tagline / One-liner
Description (often multiple paragraphs)
Location(s)
Company Size (range)
Funding Stage (Seed, Series A, B, C, etc.)
Total Raised
Markets/Industries
Tech Stack
Founded Year
Website URL
Social Links
Team Members
Open Jobs

Job Listings

Job listings are found at wellfound.com/jobs and individual listings at wellfound.com/jobs/{id}. Each listing includes:

Job Title
Company (linked to profile)
Location / Remote status
Salary Range (min-max)
Equity Range
Experience Level
Job Type (Full-time, Part-time, Contract)
Visa Sponsorship availability
Skills/Tags
Posted Date
Job Description (full HTML)
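Put together, a single scraped listing might look like the record below. This is purely illustrative — the field names and values are assumptions about a sensible output shape, not Wellfound's actual API response:

```javascript
// Illustrative shape of one scraped job record (all values made up)
const exampleJob = {
    title: 'Senior Backend Engineer',
    company: 'Acme Robotics',
    location: 'Remote',
    salaryRange: { min: 140000, max: 180000, currency: 'USD' },
    equityRange: { min: 0.05, max: 0.25 }, // percent
    jobType: 'Full-time',
    visaSponsorship: false,
    tags: ['Node.js', 'PostgreSQL', 'AWS'],
    postedDate: '2024-01-15',
};

console.log(Object.keys(exampleJob).join(', '));
```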

Search and Filtering

Wellfound supports filtering by:

  • Role type (Engineering, Design, Product, Marketing, etc.)
  • Location
  • Remote preference
  • Company size
  • Funding stage
  • Salary range
  • Experience level

Technical Approach to Scraping Wellfound

Wellfound is a modern React-based single-page application (SPA), which means traditional HTTP scraping won't capture dynamically rendered content. You'll need a browser-based approach.

Setting Up Your Environment

First, let's set up a Node.js project with Crawlee, Apify's open-source crawling library:

mkdir wellfound-scraper
cd wellfound-scraper
npm init -y
npm install crawlee puppeteer apify

Basic Job Listing Scraper

Here's a scraper that extracts job listings from Wellfound search results:

import { PuppeteerCrawler, Dataset } from 'crawlee';

const crawler = new PuppeteerCrawler({
    maxRequestsPerCrawl: 200,
    requestHandlerTimeoutSecs: 120,

    async requestHandler({ page, request, enqueueLinks, log }) {
        const { url } = request;

        if (url.includes('/jobs') && !url.includes('/jobs/')) {
            log.info('Scraping job listings from: ' + url);

            // Wellfound uses hashed CSS-module class names, so we match on the
            // stable "styles_" prefix; expect to update these selectors as the
            // site's markup changes
            await page.waitForSelector('[class*="styles_jobCard"]', {
                timeout: 15000
            });

            await autoScroll(page);

            const jobs = await page.evaluate(() => {
                const cards = document.querySelectorAll('[class*="styles_jobCard"]');
                return Array.from(cards).map(card => {
                    const titleEl = card.querySelector('[class*="styles_jobTitle"] a');
                    const companyEl = card.querySelector('[class*="styles_companyName"] a');
                    const locationEl = card.querySelector('[class*="styles_location"]');
                    const salaryEl = card.querySelector('[class*="styles_compensation"]');
                    const tagsEls = card.querySelectorAll('[class*="styles_tag"]');

                    return {
                        title: titleEl?.textContent?.trim() || '',
                        jobUrl: titleEl?.href || '',
                        company: companyEl?.textContent?.trim() || '',
                        companyUrl: companyEl?.href || '',
                        location: locationEl?.textContent?.trim() || '',
                        salary: salaryEl?.textContent?.trim() || '',
                        tags: Array.from(tagsEls).map(t => t.textContent?.trim()),
                        scrapedAt: new Date().toISOString()
                    };
                });
            });

            log.info('Found ' + jobs.length + ' job listings');
            await Dataset.pushData(jobs);

            // Follow job detail pages as well as pagination links
            await enqueueLinks({
                selector: '[class*="styles_jobTitle"] a',
                label: 'DETAIL'
            });
            await enqueueLinks({
                selector: '[class*="styles_pagination"] a',
                label: 'LISTING'
            });

        } else if (url.includes('/jobs/')) {
            log.info('Scraping job detail: ' + url);

            await page.waitForSelector('[class*="styles_jobDetail"]', { timeout: 15000 });

            const jobDetail = await page.evaluate(() => {
                return {
                    title: document.querySelector('h1')?.textContent?.trim(),
                    description: document.querySelector('[class*="styles_description"]')?.innerHTML,
                    requirements: document.querySelector('[class*="styles_requirements"]')?.innerHTML,
                    benefits: document.querySelector('[class*="styles_benefits"]')?.innerHTML,
                    applicationUrl: document.querySelector('a[class*="styles_applyButton"]')?.href,
                };
            });

            await Dataset.pushData({
                ...jobDetail,
                url,
                scrapedAt: new Date().toISOString()
            });
        }
    },

    failedRequestHandler({ request, log }) {
        log.error('Request failed: ' + request.url);
    },
});

async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            let totalHeight = 0;
            const distance = 500;
            const timer = setInterval(() => {
                const scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 300);
        });
    });
}

await crawler.run(['https://wellfound.com/jobs']);

Company Profile Scraper

Now let's build a scraper specifically for company profiles:

import { PuppeteerCrawler, Dataset } from 'crawlee';

const crawler = new PuppeteerCrawler({
    maxRequestsPerCrawl: 100,

    async requestHandler({ page, request, log }) {
        log.info('Scraping company: ' + request.url);

        await page.waitForSelector('[class*="styles_company"]', { timeout: 15000 });

        const company = await page.evaluate(() => {
            const name = document.querySelector('h1')?.textContent?.trim();
            const tagline = document.querySelector('[class*="styles_tagline"]')?.textContent?.trim();
            const description = document.querySelector('[class*="styles_description"]')?.textContent?.trim();
            const website = document.querySelector('a[class*="styles_websiteLink"]')?.href;
            const logoUrl = document.querySelector('img[class*="styles_logo"]')?.src;

            const details = {};
            const detailRows = document.querySelectorAll('[class*="styles_detailRow"]');
            detailRows.forEach(row => {
                const label = row.querySelector('[class*="styles_label"]')?.textContent?.trim();
                const value = row.querySelector('[class*="styles_value"]')?.textContent?.trim();
                if (label && value) {
                    details[label.toLowerCase().replace(/\s+/g, '_')] = value;
                }
            });

            const techStack = Array.from(
                document.querySelectorAll('[class*="styles_techTag"]')
            ).map(el => el.textContent?.trim());

            const team = Array.from(
                document.querySelectorAll('[class*="styles_teamMember"]')
            ).map(member => ({
                name: member.querySelector('[class*="styles_memberName"]')?.textContent?.trim(),
                role: member.querySelector('[class*="styles_memberRole"]')?.textContent?.trim(),
                profileUrl: member.querySelector('a')?.href
            }));

            const openJobs = document.querySelectorAll('[class*="styles_jobListing"]').length;

            return { name, tagline, description, website, logoUrl, ...details, techStack, team, openJobs };
        });

        await Dataset.pushData({
            ...company,
            profileUrl: request.url,
            scrapedAt: new Date().toISOString()
        });
    }
});

const companyUrls = [
    'https://wellfound.com/company/stripe',
    'https://wellfound.com/company/figma',
    'https://wellfound.com/company/notion',
];

await crawler.run(companyUrls);

Extracting Funding Data

Funding information is particularly valuable for investors and market researchers:

async function extractFundingData(page) {
    return await page.evaluate(() => {
        const fundingSection = document.querySelector('[class*="styles_funding"]');
        if (!fundingSection) return null;

        const rounds = Array.from(
            fundingSection.querySelectorAll('[class*="styles_fundingRound"]')
        ).map(round => ({
            type: round.querySelector('[class*="styles_roundType"]')?.textContent?.trim(),
            amount: round.querySelector('[class*="styles_roundAmount"]')?.textContent?.trim(),
            date: round.querySelector('[class*="styles_roundDate"]')?.textContent?.trim(),
            investors: Array.from(
                round.querySelectorAll('[class*="styles_investor"]')
            ).map(inv => inv.textContent?.trim())
        }));

        const totalRaised = fundingSection.querySelector('[class*="styles_totalRaised"]')?.textContent?.trim();
        const fundingStage = fundingSection.querySelector('[class*="styles_stage"]')?.textContent?.trim();

        // Assumes funding rounds are listed newest-first on the page
        return { totalRaised, fundingStage, rounds, lastRoundDate: rounds[0]?.date || null };
    });
}

Filtering for Remote Jobs

One of the most popular use cases is extracting remote job listings:

import { PuppeteerCrawler, Dataset } from 'crawlee';

const remoteJobsCrawler = new PuppeteerCrawler({
    async requestHandler({ page, log }) {
        log.info('Scraping remote jobs...');

        // The crawler has already navigated to the request URL,
        // so we only need to wait for the job cards to render
        await page.waitForSelector('[class*="styles_jobCard"]', { timeout: 15000 });

        let allJobs = [];
        let hasMore = true;
        let pageNum = 1;

        while (hasMore && pageNum <= 20) {
            log.info('Processing page ' + pageNum);

            const jobs = await page.evaluate(() => {
                const cards = document.querySelectorAll('[class*="styles_jobCard"]');
                return Array.from(cards).map(card => ({
                    title: card.querySelector('[class*="jobTitle"]')?.textContent?.trim(),
                    company: card.querySelector('[class*="companyName"]')?.textContent?.trim(),
                    salary: card.querySelector('[class*="compensation"]')?.textContent?.trim(),
                    equity: card.querySelector('[class*="equity"]')?.textContent?.trim(),
                    location: 'Remote',
                    tags: Array.from(card.querySelectorAll('[class*="styles_tag"]')).map(t => t.textContent?.trim())
                }));
            });

            allJobs = [...allJobs, ...jobs];

            const nextButton = await page.$('[class*="pagination"] [class*="next"]');
            if (nextButton) {
                // Start waiting for the navigation before clicking, so the
                // navigation event can't be missed
                await Promise.all([
                    page.waitForNavigation({ waitUntil: 'networkidle0' }),
                    nextButton.click(),
                ]);
                pageNum++;
            } else {
                hasMore = false;
            }
        }

        log.info('Total remote jobs found: ' + allJobs.length);
        await Dataset.pushData(allJobs);
    }
});

await remoteJobsCrawler.run(['https://wellfound.com/jobs?remote=true']);

Deploying on Apify

To run your scraper reliably in the cloud, deploy it as an Apify Actor:

import { Actor } from 'apify';
import { PuppeteerCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput() ?? {};
const {
    searchQuery = '',
    location = '',
    remoteOnly = false,
    maxResults = 100,
    fundingStage = '',
    companySize = ''
} = input;

const params = new URLSearchParams();
if (searchQuery) params.set('q', searchQuery);
if (location) params.set('location', location);
if (remoteOnly) params.set('remote', 'true');
if (fundingStage) params.set('fundingStage', fundingStage);
if (companySize) params.set('companySize', companySize);
const query = params.toString();
const searchUrl = 'https://wellfound.com/jobs' + (query ? '?' + query : '');

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl: maxResults,
    launchContext: {
        launchOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    },
    async requestHandler({ page, request, log }) {
        log.info('Processing: ' + request.url);

        await page.waitForSelector('[class*="styles_jobCard"]', { timeout: 20000 });

        const jobs = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('[class*="styles_jobCard"]')).map(card => ({
                title: card.querySelector('[class*="jobTitle"]')?.textContent?.trim(),
                company: card.querySelector('[class*="companyName"]')?.textContent?.trim(),
                location: card.querySelector('[class*="location"]')?.textContent?.trim(),
                salary: card.querySelector('[class*="compensation"]')?.textContent?.trim(),
                equity: card.querySelector('[class*="equity"]')?.textContent?.trim(),
                jobUrl: card.querySelector('[class*="jobTitle"] a')?.href,
                tags: Array.from(card.querySelectorAll('[class*="styles_tag"]')).map(t => t.textContent?.trim())
            }));
        });

        await Dataset.pushData(jobs);
    }
});

await crawler.run([searchUrl]);
await Actor.exit();
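When you call the Actor via Apify's API or console, you pass an input object whose keys mirror the destructuring at the top of the Actor code. An illustrative input (the values are examples, not required settings):

```javascript
// Example Actor input; keys mirror the destructuring in the Actor code above
const exampleInput = {
    searchQuery: 'machine learning engineer',
    location: 'San Francisco',
    remoteOnly: true,
    maxResults: 50,
};

console.log(JSON.stringify(exampleInput, null, 2));
```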

Data Processing and Output

Once you've scraped the data, clean and structure it:

function processJobData(rawJobs) {
    return rawJobs
        .filter(job => job.title && job.company)
        .map(job => ({
            ...job,
            salaryMin: parseSalary(job.salary, 'min'),
            salaryMax: parseSalary(job.salary, 'max'),
            equityMin: parseEquity(job.equity, 'min'),
            equityMax: parseEquity(job.equity, 'max'),
            isRemote: job.location?.toLowerCase().includes('remote') ?? false,
            normalizedTitle: normalizeJobTitle(job.title),
        }))
        .sort((a, b) => (b.salaryMax || 0) - (a.salaryMax || 0));
}

function parseSalary(salaryStr, type) {
    if (!salaryStr) return null;
    const matches = salaryStr.match(/\$?([\d,]+)k?\s*[-\u2013]\s*\$?([\d,]+)k?/);
    if (!matches) return null;
    const multiplier = salaryStr.includes('k') || salaryStr.includes('K') ? 1000 : 1;
    return type === 'min'
        ? parseInt(matches[1].replace(/,/g, '')) * multiplier
        : parseInt(matches[2].replace(/,/g, '')) * multiplier;
}

function parseEquity(equityStr, type) {
    if (!equityStr) return null;
    const matches = equityStr.match(/([\d.]+)%\s*[-\u2013]\s*([\d.]+)%/);
    if (!matches) return null;
    return type === 'min' ? parseFloat(matches[1]) : parseFloat(matches[2]);
}

function normalizeJobTitle(title) {
    const titleMap = {
        'swe': 'Software Engineer',
        'sde': 'Software Development Engineer',
        'fe': 'Frontend Engineer',
        'be': 'Backend Engineer',
        'fs': 'Full Stack Engineer',
    };
    // Direct lookup replaces the loop; falls back to the raw title
    return titleMap[title.toLowerCase()] ?? title;
}
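It's worth sanity-checking the parsers on representative strings before running them over a full dataset. The snippet below copies `parseSalary` from above so it can run standalone:

```javascript
// Sanity check of the salary parser on representative inputs.
// parseSalary is copied verbatim from the processing code above.
function parseSalary(salaryStr, type) {
    if (!salaryStr) return null;
    const matches = salaryStr.match(/\$?([\d,]+)k?\s*[-\u2013]\s*\$?([\d,]+)k?/);
    if (!matches) return null;
    const multiplier = salaryStr.includes('k') || salaryStr.includes('K') ? 1000 : 1;
    return type === 'min'
        ? parseInt(matches[1].replace(/,/g, '')) * multiplier
        : parseInt(matches[2].replace(/,/g, '')) * multiplier;
}

console.log(parseSalary('$120k – $160k', 'min'));       // 120000
console.log(parseSalary('$95,000 - $130,000', 'max'));  // 130000
console.log(parseSalary('Competitive', 'min'));         // null (no range found)
```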

Best Practices and Ethical Considerations

Rate Limiting

Always implement proper rate limiting to avoid overloading servers:

const crawler = new PuppeteerCrawler({
    maxConcurrency: 2,
    maxRequestsPerMinute: 15,
    navigationTimeoutSecs: 60,
});

Respect robots.txt

Check wellfound.com/robots.txt before scraping and respect any disallowed paths.
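You can automate this check with a small helper. The sketch below is deliberately simplified — it only handles `Disallow` rules as plain prefixes and ignores `Allow` rules and wildcard patterns, so treat it as a starting point rather than a full robots.txt parser:

```javascript
// Naive robots.txt check: collect Disallow rules for the matching
// user-agent group and test whether a path falls under any of them.
// Ignores Allow rules and wildcard syntax -- a simplification.
function isDisallowed(robotsTxt, path, userAgent = '*') {
    const disallowed = [];
    let applies = false;
    for (const rawLine of robotsTxt.split('\n')) {
        const line = rawLine.trim();
        const [rawKey, ...rest] = line.split(':');
        const key = rawKey.toLowerCase();
        const value = rest.join(':').trim();
        if (key === 'user-agent') applies = value === userAgent || value === '*';
        else if (applies && key === 'disallow' && value) disallowed.push(value);
    }
    return disallowed.some(prefix => path.startsWith(prefix));
}

const sample = 'User-agent: *\nDisallow: /admin\nDisallow: /api/';
console.log(isDisallowed(sample, '/api/jobs')); // true
console.log(isDisallowed(sample, '/jobs'));     // false
```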

Data Privacy

  • Don't scrape personal contact information from user profiles
  • Be mindful of GDPR and CCPA when storing scraped data
  • Use the data for legitimate business purposes only

Handling Anti-Bot Measures

Using Apify's proxy infrastructure with residential IPs helps maintain reliable access:

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
});

Use Cases and Applications

For Recruiters

Build a pipeline that scrapes new job listings daily, filters by tech stack and seniority, and pushes matches to a CRM. This gives you a head start on sourcing candidates before listings get widely circulated.

For Investors

Monitor hiring velocity as a signal. A startup that suddenly posts 20 engineering roles likely just closed a round. Track this across your portfolio companies and competitors.

For Job Seekers

Aggregate listings from Wellfound alongside LinkedIn, Indeed, and other boards. Set up alerts for specific keywords, salary ranges, or companies.

For Market Researchers

Analyze trends in startup hiring: which roles are most in demand, what salary ranges are trending upward, which tech stacks are gaining popularity.
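A simple starting point for this kind of analysis is counting how often each skill tag appears across the scraped listings, assuming job records shaped like the scraper output above:

```javascript
// Sketch: rank skill tags by how often they appear across scraped jobs
function tagFrequencies(jobs) {
    const counts = new Map();
    for (const job of jobs) {
        for (const tag of job.tags ?? []) {
            counts.set(tag, (counts.get(tag) ?? 0) + 1);
        }
    }
    // Sort descending by count; ties keep insertion order (stable sort)
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sampleJobs = [
    { title: 'Backend Engineer', tags: ['Go', 'PostgreSQL'] },
    { title: 'Platform Engineer', tags: ['Go', 'Kubernetes'] },
    { title: 'Data Engineer', tags: ['Python'] },
];
console.log(tagFrequencies(sampleJobs));
// [['Go', 2], ['PostgreSQL', 1], ['Kubernetes', 1], ['Python', 1]]
```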

Conclusion

Wellfound is a goldmine of startup job and company data that can power recruiting pipelines, investment analysis, job searches, and market research. By using Puppeteer-based scraping with Crawlee and deploying on Apify, you can build reliable, scalable data extraction workflows that keep you ahead of the competition.

The key is to start with a focused use case, implement proper rate limiting and error handling, and respect ethical guidelines. Whether you're tracking the next unicorn's hiring spree or building the ultimate startup job aggregator, the tools and techniques in this guide will get you there.

Start with the basic examples above, customize the selectors for your specific needs, and scale up gradually. Happy scraping!
