agenthustler

Job Posting Intelligence: Using Web Scrapers to Track Competitor Hiring Signals

Every company's hiring page is a window into their strategy. When a competitor starts hiring five machine learning engineers, they're building an AI product. When a startup posts for enterprise sales reps across three new cities, they're expanding territory. When a company you sell to posts for a "Procurement Manager," there's budget being allocated.

Job posting intelligence is one of the most underused signals in B2B sales and competitive analysis. In this guide, I'll show you how to build an automated system that monitors job postings across LinkedIn, Indeed, and ZipRecruiter — turning raw hiring data into actionable business intelligence.

Why Job Postings Are Strategic Gold

Before we dive into the technical implementation, let's understand why job data matters:

1. Growth Signals

A company posting 50+ roles in a quarter is growing. If those roles are in engineering, they're building. If they're in sales, they're scaling revenue. If they're in compliance, they might be preparing for an IPO or regulatory change.

2. Technology Stack Reveals

Job descriptions are remarkably specific about technology. A posting for a "Senior Kubernetes Engineer with Terraform experience" tells you exactly what infrastructure they're running. This is invaluable for selling developer tools, cloud services, or consulting.

3. Budget Indicators

Hiring is expensive. A company posting for senior roles with competitive salaries has budget. Companies that are cutting back reduce postings first — often before any public announcement.

4. Timing Signals

The timing of postings reveals urgency. A role reposted three times in two months? They're struggling to fill it and might be open to contractor solutions or tooling that reduces the need for that hire.

Architecture Overview

Here's what we're building:

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   LinkedIn   │    │    Indeed    │    │ ZipRecruiter │
│   Jobs API   │    │   Scraper    │    │   Scraper    │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └─────────┬─────────┴───────────────────┘
                 │
         ┌───────▼────────┐
         │  Data Pipeline │
         │  (Normalize +  │
         │   Deduplicate) │
         └───────┬────────┘
                 │
         ┌───────▼────────┐
         │  Signal Engine │
         │ (Detect hiring │
         │   patterns)    │
         └───────┬────────┘
                 │
      ┌──────────┼──────────┐
      │          │          │
 ┌────▼────┐ ┌───▼────┐ ┌───▼───────┐
 │   CRM   │ │ Alerts │ │ Dashboard │
 │ Update  │ │ System │ │  & API    │
 └─────────┘ └────────┘ └───────────┘

Step 1: Collecting Job Posting Data

The foundation of any intelligence system is reliable data collection. We'll use Apify actors to scrape job boards at scale without managing infrastructure.

LinkedIn Jobs Collection

LinkedIn is the richest source for B2B hiring intelligence. Here's how to set up automated collection using the LinkedIn Jobs Scraper actor on Apify:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
});

async function scrapeLinkedInJobs(companies) {
    const results = [];

    for (const company of companies) {
        const run = await client.actor('hMvNSpz3JnHgl5jkh').call({
            searchUrl: `https://www.linkedin.com/jobs/search/?keywords=${encodeURIComponent(company)}&location=United%20States&f_TPR=r604800`,
            maxItems: 100,
            proxy: {
                useApifyProxy: true,
                apifyProxyGroups: ['RESIDENTIAL'],
            },
        });

        const { items } = await client.dataset(run.defaultDatasetId).listItems();
        results.push({
            company,
            jobs: items,
            scrapedAt: new Date().toISOString(),
        });
    }

    return results;
}

// Track competitors and key accounts
const watchList = [
    'Salesforce',
    'HubSpot',
    'Snowflake',
    'Databricks',
    'Stripe',
];

const data = await scrapeLinkedInJobs(watchList);
console.log(`Collected ${data.reduce((sum, d) => sum + d.jobs.length, 0)} job postings`);

Indeed Jobs Collection

Indeed provides broader coverage, including roles that never appear on LinkedIn, particularly at mid-market and SMB companies:

async function scrapeIndeedJobs(queries) {
    const allResults = [];

    for (const query of queries) {
        const run = await client.actor('misceres/indeed-scraper').call({
            position: query.title,
            country: 'US',
            location: query.location || '',
            maxItems: 200,
            parseCompanyDetails: true,
        });

        const { items } = await client.dataset(run.defaultDatasetId).listItems();

        allResults.push({
            query: query.title,
            results: items.map(item => ({
                title: item.positionName,
                company: item.company,
                location: item.location,
                description: item.description,
                salary: item.salary,
                postedAt: item.postedAt,
                url: item.url,
            })),
        });
    }

    return allResults;
}

// Search for signals that indicate tool/service needs
const queries = [
    { title: 'DevOps Engineer', location: 'Remote' },
    { title: 'Data Engineer Snowflake', location: '' },
    { title: 'Salesforce Administrator', location: '' },
    { title: 'Head of Procurement', location: '' },
];

ZipRecruiter for SMB Coverage

ZipRecruiter captures small and medium businesses that often don't post on LinkedIn:

async function scrapeZipRecruiter(searchTerms) {
    const run = await client.actor('epctex/ziprecruiter-scraper').call({
        search: searchTerms,
        location: 'United States',
        maxItems: 150,
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return items;
}

Step 2: Data Normalization Pipeline

Raw job data from different sources has different schemas. We need a unified format:

function normalizeJobPosting(raw, source) {
    return {
        id: generateId(raw.company, raw.title, source),
        title: cleanTitle(raw.title || raw.positionName),
        company: normalizeCompanyName(raw.company),
        location: normalizeLocation(raw.location),
        description: raw.description || '',
        salary: parseSalary(raw.salary || raw.compensation),
        source: source,
        postedAt: parseDate(raw.postedAt || raw.date),
        scrapedAt: new Date().toISOString(),
        url: raw.url || raw.link,
        // Extracted signals
        technologies: extractTechnologies(raw.description),
        seniorityLevel: detectSeniority(raw.title),
        department: classifyDepartment(raw.title),
        isRemote: detectRemote(raw.location, raw.description),
    };
}

function extractTechnologies(description) {
    if (!description) return [];

    const techPatterns = {
        cloud: ['AWS', 'Azure', 'GCP', 'Google Cloud'],
        databases: ['PostgreSQL', 'MongoDB', 'Redis', 'Snowflake', 'BigQuery'],
        languages: ['Python', 'JavaScript', 'TypeScript', 'Go', 'Rust', 'Java'],
        frameworks: ['React', 'Next.js', 'Django', 'FastAPI', 'Spring Boot'],
        devops: ['Kubernetes', 'Docker', 'Terraform', 'Jenkins', 'GitHub Actions'],
        data: ['Spark', 'Airflow', 'dbt', 'Kafka', 'Flink'],
        crm: ['Salesforce', 'HubSpot', 'Dynamics 365'],
    };

    const found = [];

    for (const [category, techs] of Object.entries(techPatterns)) {
        for (const tech of techs) {
            // Word-boundary match so "Go" doesn't hit "Google" and
            // "Java" doesn't hit "JavaScript"
            const escaped = tech.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
            if (new RegExp(`\\b${escaped}\\b`, 'i').test(description)) {
                found.push({ name: tech, category });
            }
        }
    }

    return found;
}

function detectSeniority(title) {
    const titleLower = title.toLowerCase();
    if (titleLower.match(/\b(vp|vice president|director|head of|chief)\b/)) return 'executive';
    if (titleLower.match(/\b(senior|sr\.?|lead|principal|staff)\b/)) return 'senior';
    if (titleLower.match(/\b(junior|jr\.?|associate|entry)\b/)) return 'junior';
    if (titleLower.match(/\b(intern|internship|co-op)\b/)) return 'intern';
    return 'mid';
}

function classifyDepartment(title) {
    const titleLower = title.toLowerCase();
    const deptMap = {
        // Data roles must be checked before engineering, or "Data Engineer"
        // would match /engineer/ first
        data: /data scientist|data engineer|analytics|ml engineer|machine learning/,
        engineering: /engineer|developer|architect|devops|sre|platform/,
        sales: /sales|account executive|business development|sdr|bdr/,
        marketing: /marketing|growth|content|seo|brand/,
        product: /product manager|product owner|ux|designer/,
        operations: /operations|procurement|supply chain|logistics/,
        finance: /finance|accounting|controller|treasury/,
        hr: /recruiter|people|talent|human resources/,
    };

    for (const [dept, pattern] of Object.entries(deptMap)) {
        if (titleLower.match(pattern)) return dept;
    }
    return 'other';
}
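The normalization code above leans on several helpers (`cleanTitle`, `normalizeCompanyName`, `parseSalary`, `generateId`, and so on) that aren't shown. Here's a minimal sketch of two of them. The salary formats handled are assumptions based on typical Indeed-style strings like `"$120,000 - $150,000 a year"`, so expect to extend the regex for your actual sources:

```javascript
import crypto from 'crypto';

// Parse salary strings like "$120,000 - $150,000 a year", "$90k - $110k",
// or "$45 an hour" into a structured range. Formats vary by board, so
// treat this as a starting point, not a complete parser.
function parseSalary(raw) {
    if (!raw || typeof raw !== 'string') return null;
    const numbers = raw.match(/\$([\d,]+(?:\.\d+)?)(k)?/gi);
    if (!numbers) return null;
    const values = numbers.map(n => {
        const isK = /k$/i.test(n);
        const num = parseFloat(n.replace(/[$,k]/gi, ''));
        return isK ? num * 1000 : num;
    });
    const period = /hour/i.test(raw) ? 'hourly'
        : /year|annual/i.test(raw) ? 'yearly'
        : 'unknown';
    return {
        min: Math.min(...values),
        max: Math.max(...values),
        period,
    };
}

// Stable ID from company + title + source, so the same posting gets
// the same ID on every run (which the dedup and history steps rely on).
function generateId(company, title, source) {
    return crypto
        .createHash('sha1')
        .update(`${company}|${title}|${source}`.toLowerCase())
        .digest('hex')
        .slice(0, 16);
}
```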

Deduplication

The same job often appears across multiple platforms. We need smart deduplication:

function deduplicateJobs(jobs) {
    const seen = new Map();

    for (const job of jobs) {
        // Create a fuzzy key based on company + title + location
        const key = [
            job.company.toLowerCase().replace(/[^a-z0-9]/g, ''),
            job.title.toLowerCase().replace(/[^a-z0-9]/g, '').substring(0, 30),
            job.location.toLowerCase().split(',')[0].trim(),
        ].join('|');

        if (seen.has(key)) {
            // Keep the one with more data
            const existing = seen.get(key);
            if ((job.description?.length || 0) > (existing.description?.length || 0)) {
                job.sources = [...(existing.sources || [existing.source]), job.source];
                seen.set(key, job);
            } else {
                existing.sources = [...(existing.sources || [existing.source]), job.source];
            }
        } else {
            seen.set(key, job);
        }
    }

    return Array.from(seen.values());
}

Step 3: The Signal Engine

This is where raw data becomes intelligence. We're looking for patterns that indicate opportunity:

class HiringSignalEngine {
    constructor(historicalData) {
        this.history = historicalData; // Previous scrape results
    }

    analyzeCompany(company, currentJobs) {
        const signals = [];
        const previousJobs = this.history.filter(j => j.company === company);
        const previousCount = previousJobs.length;
        const currentCount = currentJobs.length;

        // Signal 1: Hiring Surge
        if (currentCount > previousCount * 1.5 && currentCount > 5) {
            // Guard the percentage: previousCount can be 0 for a newly tracked company
            const pct = previousCount > 0
                ? ` (+${Math.round(((currentCount - previousCount) / previousCount) * 100)}%)`
                : '';
            signals.push({
                type: 'HIRING_SURGE',
                severity: 'high',
                message: `${company} increased postings from ${previousCount} to ${currentCount}${pct}`,
                actionable: true,
            });
        }

        // Signal 2: New Department
        const prevDepts = new Set(previousJobs.map(j => j.department));
        const currDepts = new Set(currentJobs.map(j => j.department));
        const newDepts = [...currDepts].filter(d => !prevDepts.has(d));

        if (newDepts.length > 0) {
            signals.push({
                type: 'NEW_DEPARTMENT',
                severity: 'medium',
                message: `${company} is hiring in new departments: ${newDepts.join(', ')}`,
                departments: newDepts,
            });
        }

        // Signal 3: Executive Hiring
        const execRoles = currentJobs.filter(j => j.seniorityLevel === 'executive');
        if (execRoles.length > 0) {
            signals.push({
                type: 'EXECUTIVE_HIRE',
                severity: 'high',
                message: `${company} is hiring ${execRoles.length} executive roles: ${execRoles.map(j => j.title).join(', ')}`,
                roles: execRoles,
            });
        }

        // Signal 4: Technology Shift
        const prevTech = new Set(previousJobs.flatMap(j => j.technologies.map(t => t.name)));
        const currTech = new Set(currentJobs.flatMap(j => j.technologies.map(t => t.name)));
        const newTech = [...currTech].filter(t => !prevTech.has(t));

        if (newTech.length >= 2) {
            signals.push({
                type: 'TECH_SHIFT',
                severity: 'medium',
                message: `${company} is adopting new technologies: ${newTech.join(', ')}`,
                technologies: newTech,
            });
        }

        // Signal 5: Hiring Freeze (inverse signal)
        if (currentCount < previousCount * 0.5 && previousCount > 10) {
            signals.push({
                type: 'HIRING_SLOWDOWN',
                severity: 'low',
                message: `${company} reduced postings by ${Math.round(((previousCount - currentCount) / previousCount) * 100)}%`,
                actionable: false,
            });
        }

        return signals;
    }

    generateWeeklyReport(companies) {
        const report = {
            generatedAt: new Date().toISOString(),
            totalCompanies: Object.keys(companies).length,
            signals: [],
        };

        for (const [company, jobs] of Object.entries(companies)) {
            const companySignals = this.analyzeCompany(company, jobs);
            if (companySignals.length > 0) {
                report.signals.push({
                    company,
                    jobCount: jobs.length,
                    signals: companySignals,
                });
            }
        }

        // Sort by signal severity
        report.signals.sort((a, b) => {
            const severityOrder = { high: 0, medium: 1, low: 2 };
            // Lower value = more severe; rank companies by their most severe signal
            const aTop = Math.min(...a.signals.map(s => severityOrder[s.severity]));
            const bTop = Math.min(...b.signals.map(s => severityOrder[s.severity]));
            return aTop - bTop;
        });

        return report;
    }
}

Step 4: CRM Integration

Intelligence is only valuable if it reaches the right people at the right time. Here's how to push signals into your CRM:

HubSpot Integration

import { Client } from '@hubspot/api-client';

const hubspot = new Client({ accessToken: process.env.HUBSPOT_TOKEN });

async function pushSignalToHubspot(signal) {
    // Search for the company in HubSpot
    const searchResponse = await hubspot.crm.companies.searchApi.doSearch({
        filterGroups: [{
            filters: [{
                propertyName: 'name',
                operator: 'CONTAINS_TOKEN',
                value: signal.company,
            }],
        }],
    });

    if (searchResponse.results.length === 0) {
        console.log(`Company ${signal.company} not found in HubSpot, skipping`);
        return;
    }

    const companyId = searchResponse.results[0].id;

    // Update company properties
    await hubspot.crm.companies.basicApi.update(companyId, {
        properties: {
            hiring_signal_type: signal.signals.map(s => s.type).join(', '),
            hiring_signal_date: new Date().toISOString().split('T')[0],
            active_job_count: signal.jobCount.toString(),
            hiring_signal_detail: signal.signals.map(s => s.message).join('\n'),
        },
    });

    // Create a note for the sales team
    if (signal.signals.some(s => s.severity === 'high')) {
        await hubspot.crm.objects.notesApi.create({
            properties: {
                hs_note_body: formatSignalNote(signal),
                hs_timestamp: Date.now(),
            },
            associations: [{
                to: { id: companyId },
                types: [{ associationCategory: 'HUBSPOT_DEFINED', associationTypeId: 190 }],
            }],
        });
    }
}

function formatSignalNote(signal) {
    let note = `🎯 **Hiring Intelligence Alert** — ${signal.company}\n\n`;
    note += `Active Postings: ${signal.jobCount}\n\n`;

    for (const s of signal.signals) {
        const icon = s.severity === 'high' ? '🔴' : s.severity === 'medium' ? '🟡' : '🟢';
        note += `${icon} **${s.type}**: ${s.message}\n`;
    }

    note += `\n---\n_Auto-generated by Job Posting Intelligence System_`;
    return note;
}

Salesforce Integration

import jsforce from 'jsforce';

async function pushToSalesforce(signal) {
    const conn = new jsforce.Connection({
        loginUrl: process.env.SF_LOGIN_URL,
    });

    await conn.login(process.env.SF_USERNAME, process.env.SF_PASSWORD + process.env.SF_TOKEN);

    // Find the account (escape quotes so company names can't break the SOQL)
    const safeName = signal.company.replace(/'/g, "\\'");
    const accounts = await conn.query(
        `SELECT Id, Name FROM Account WHERE Name LIKE '%${safeName}%' LIMIT 1`
    );

    if (accounts.records.length === 0) return;

    const accountId = accounts.records[0].Id;

    // Create a Task for the account owner
    if (signal.signals.some(s => s.actionable)) {
        await conn.sobject('Task').create({
            Subject: `Hiring Signal: ${signal.signals[0].type} at ${signal.company}`,
            Description: signal.signals.map(s => s.message).join('\n'),
            WhatId: accountId,
            Priority: signal.signals.some(s => s.severity === 'high') ? 'High' : 'Normal',
            Status: 'Not Started',
            ActivityDate: new Date(Date.now() + 3 * 86400000).toISOString().split('T')[0],
        });
    }
}

Step 5: Scheduling and Automation

Set up the entire pipeline to run automatically on Apify's scheduling system:

// main.js — Apify Actor that orchestrates the full pipeline
import { Actor } from 'apify';

await Actor.init();

const input = await Actor.getInput();
const {
    watchList = [],
    hubspotToken,
    slackWebhook,
    schedule = 'weekly',
} = input;

// 1. Collect from all sources
console.log('Collecting job postings...');
const linkedInJobs = await scrapeLinkedInJobs(watchList);
const indeedJobs = await scrapeIndeedJobs(
    watchList.map(c => ({ title: c, location: '' }))
);

// 2. Normalize and deduplicate
const allJobs = [
    ...linkedInJobs.flatMap(r => r.jobs.map(j => normalizeJobPosting(j, 'linkedin'))),
    ...indeedJobs.flatMap(r => r.results.map(j => normalizeJobPosting(j, 'indeed'))),
];
const uniqueJobs = deduplicateJobs(allJobs);
console.log(`${uniqueJobs.length} unique postings after dedup`);

// 3. Load historical data and generate signals
const store = await Actor.openKeyValueStore('job-intelligence-history');
const history = (await store.getValue('previous-scan')) || [];
const engine = new HiringSignalEngine(history);

// Group by company
const byCompany = {};
for (const job of uniqueJobs) {
    if (!byCompany[job.company]) byCompany[job.company] = [];
    byCompany[job.company].push(job);
}

const report = engine.generateWeeklyReport(byCompany);

// 4. Push to CRM and notify
for (const signal of report.signals) {
    if (hubspotToken) await pushSignalToHubspot(signal);
    if (slackWebhook) await notifySlack(slackWebhook, signal);
}

// 5. Save current scan as history for next run
await store.setValue('previous-scan', uniqueJobs);
await Actor.pushData(report);

console.log(`Generated ${report.signals.length} signals for ${report.totalCompanies} companies`);
await Actor.exit();
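The orchestrator calls `notifySlack`, which wasn't defined above. Here's a minimal version against a standard Slack incoming webhook; the mrkdwn layout and emoji choices are just one option:

```javascript
// Format one company's signal report entry as Slack mrkdwn.
function formatSlackMessage(signal) {
    const lines = [`*Hiring signals: ${signal.company}* (${signal.jobCount} active postings)`];
    for (const s of signal.signals) {
        const icon = s.severity === 'high' ? ':red_circle:'
            : s.severity === 'medium' ? ':large_yellow_circle:'
            : ':large_green_circle:';
        lines.push(`${icon} *${s.type}*: ${s.message}`);
    }
    return lines.join('\n');
}

// Post to a Slack incoming webhook. Node 18+ ships a global fetch,
// so no extra HTTP client is needed.
async function notifySlack(webhookUrl, signal) {
    const res = await fetch(webhookUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text: formatSlackMessage(signal) }),
    });
    if (!res.ok) {
        console.error(`Slack notification failed: ${res.status}`);
    }
}
```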

Practical Use Cases

For Sales Teams

Track your ICP companies. When a target account starts hiring for roles related to your product category, that's an intent signal stronger than any website visit. A company hiring three "Data Engineers with Snowflake experience" is probably about to expand their data stack — perfect timing to reach out about complementary tooling.
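To make that concrete, here's a small filter over the normalized postings from Step 2. The keyword list is hypothetical; in practice it would come from your own product category:

```javascript
// Flag postings whose title or extracted technologies overlap with
// keywords tied to your product. Matching is a simple case-insensitive
// substring check over title + tech names.
function findIntentSignals(jobs, productKeywords) {
    const keywords = productKeywords.map(k => k.toLowerCase());
    return jobs.filter(job => {
        const haystack = [job.title, ...(job.technologies || []).map(t => t.name)]
            .join(' ')
            .toLowerCase();
        return keywords.some(k => haystack.includes(k));
    });
}

// Illustrative: a data-tooling vendor watching for Snowflake-adjacent hires
const sampleJobs = [
    { title: 'Senior Data Engineer', company: 'Acme', technologies: [{ name: 'Snowflake', category: 'databases' }] },
    { title: 'Account Executive', company: 'Acme', technologies: [] },
];
const hits = findIntentSignals(sampleJobs, ['snowflake', 'dbt', 'data engineer']);
// hits contains only the Data Engineer posting
```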

For Competitive Intelligence

Monitor your competitors' postings weekly. If your main competitor starts aggressively hiring in a new city or for a new product category, you'll know about it weeks before any press release.

For Investors and Analysts

Job posting velocity is a leading indicator of company health. Track portfolio companies, potential investments, or public companies for signals that precede earnings reports.
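One simple way to quantify velocity from the history you're already storing is an average scan-over-scan change in posting count. This is a rough sketch (the `scanCounts` shape is an assumption, not part of the pipeline above):

```javascript
// scanCounts: posting counts for one company across past scans, oldest
// first, e.g. [{ date: '2024-01-01', count: 12 }, ...].
// Returns the average change in posting count between consecutive scans;
// positive means accelerating hiring, negative means slowing.
function postingVelocity(scanCounts) {
    if (scanCounts.length < 2) return 0;
    let totalDelta = 0;
    for (let i = 1; i < scanCounts.length; i++) {
        totalDelta += scanCounts[i].count - scanCounts[i - 1].count;
    }
    return totalDelta / (scanCounts.length - 1);
}
```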

For Recruiting Firms

Know which companies are struggling to fill roles (reposted 3+ times) — those are warm leads for staffing agencies and recruiting services.
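Repost detection falls out of the scan history the pipeline already keeps. A sketch that counts how many separate scans each company + title combination appeared in (the threshold of 3 mirrors the rule of thumb above):

```javascript
// scans: array of past scan results, each an array of normalized jobs
// with company and title fields. Returns postings that appeared in at
// least minAppearances separate scans — likely hard-to-fill roles.
function findReposts(scans, minAppearances = 3) {
    const counts = new Map();
    for (const scan of scans) {
        const seenThisScan = new Set();
        for (const job of scan) {
            const key = `${job.company}|${job.title}`.toLowerCase();
            if (seenThisScan.has(key)) continue; // count once per scan
            seenThisScan.add(key);
            counts.set(key, (counts.get(key) || 0) + 1);
        }
    }
    return [...counts.entries()]
        .filter(([, n]) => n >= minAppearances)
        .map(([key, appearances]) => ({ key, appearances }));
}
```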

Cost Considerations

Running this system on Apify is surprisingly affordable:

  • LinkedIn Jobs scraper: ~$5-10/run for 500 postings
  • Indeed scraper: ~$3-5/run for 500 postings
  • Weekly schedule for 50 companies: ~$50-80/month
  • Apify platform free tier: 100 actor runs/month included

Compare that to commercial job intelligence platforms like Thinknum or Revelio Labs, which charge $10,000-50,000/year. Building your own gives you the same data at a fraction of the cost with full customization.

Next Steps

  1. Start small: Pick 10 companies you care about and run weekly scrapes
  2. Build your signal library: Add custom signals relevant to your industry
  3. Connect to your workflow: CRM, Slack, email — wherever your team lives
  4. Iterate on accuracy: Track which signals actually correlate with deals closed

Job posting intelligence won't replace your sales intuition, but it will make sure you never miss a signal hiding in plain sight. The companies that systematically monitor hiring data will consistently outperform those relying on gut feel and Google Alerts.


Want to try this yourself? Check out the Apify marketplace for ready-to-use job scraping actors, or build your own with the Apify SDK.
