DEV Community

agenthustler

LinkedIn Profile Scraping: Extract Individual Professional Profiles and Career Data

LinkedIn is the world's largest professional network, with over 1 billion members. Individual profiles hold structured career data: work history, skills, education, certifications, endorsements, and more. If you're building a recruiting pipeline, researching talent pools, or enriching a CRM with professional data, public LinkedIn profiles are one of the few sources that collect this information in one place.

In this guide, we'll cover the structure of LinkedIn public profiles, what data you can extract, how to handle LinkedIn's anti-scraping defenses, and how to build a production-ready profile scraper using Apify.

Important disclaimer: Always respect LinkedIn's Terms of Service, applicable privacy laws (GDPR, CCPA), and robots.txt. Only scrape publicly available data. This guide is for educational purposes and legitimate business use cases like recruiting and market research.

Understanding LinkedIn Profile Structure

A LinkedIn public profile is a structured document containing several distinct sections. Understanding this structure is essential for building an effective scraper.

The Profile Header

The top section of every profile contains:

  • Full name — First and last name
  • Headline — Professional tagline (e.g., "Senior Software Engineer at Google")
  • Location — City, state/region, country
  • Profile photo URL — The member's headshot
  • Background/banner image — Optional custom banner
  • Connection count — Number of connections (often shown as "500+" for large networks)
  • Current company and title — Extracted from the headline or current experience

Work Experience Section

Each experience entry includes:

  • Job title — The role held
  • Company name — The employer
  • Company LinkedIn URL — Link to the company page
  • Employment type — Full-time, part-time, contract, freelance, internship
  • Date range — Start and end dates (or "Present" for current roles)
  • Duration — Calculated time in role
  • Location — Where the role was based
  • Description — Free-text description of responsibilities and achievements

Education Section

Education entries contain:

  • School name — University, college, or institution
  • Degree — Bachelor's, Master's, PhD, etc.
  • Field of study — Major or concentration
  • Date range — Start and end years
  • Grade/GPA — Sometimes included
  • Activities and societies — Extracurricular involvement
  • Description — Additional context about the education

Skills and Endorsements

The skills section shows:

  • Skill name — e.g., "Python", "Project Management"
  • Endorsement count — Number of connections who endorsed this skill
  • Top endorsers — Names of people who endorsed (on full profiles)

Additional Sections

Profiles may also include:

  • Certifications — Professional certificates with issuing organization and dates
  • Publications — Articles, papers, books
  • Languages — Spoken languages with proficiency levels
  • Volunteer experience — Non-profit roles
  • Recommendations — Written testimonials from connections
  • Honors and awards — Professional recognition
  • Projects — Named projects with descriptions and collaborators
  • Contact info — Email, phone, website (if made public)
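
The sections above map naturally onto one record per profile. Here is a minimal sketch of that shape; the function names and field names are illustrative assumptions, not an official LinkedIn schema:

```javascript
// Hypothetical record shape for one scraped profile; field names are
// illustrative and do not come from any official LinkedIn schema.
function createEmptyProfile(profileUrl) {
    return {
        profileUrl,
        // Header
        fullName: '',
        headline: '',
        location: '',
        connectionCount: '',   // kept as text because the page may show "500+"
        // Repeating sections, one object per entry
        experience: [],        // { title, company, dateRange, duration, location, description }
        education: [],         // { school, degree, fieldOfStudy, dateRange }
        skills: [],            // { name, endorsements }
        certifications: [],    // { name, issuer, date }
        languages: [],         // { language, proficiency }
        // Contact details exist only when the member made them public
        contact: { email: null, phone: null, websites: [] },
    };
}

// List the top-level fields that are still empty, which helps spot
// selectors that silently stopped matching.
function missingFields(profile) {
    return Object.keys(profile).filter((key) => {
        const value = profile[key];
        if (Array.isArray(value)) return value.length === 0;
        if (value && typeof value === 'object') return false; // nested objects are checked separately
        return value === '' || value === null || value === undefined;
    });
}
```

Starting every scrape from the same empty record keeps the output columns stable even when a profile lacks a section.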

Contact Information Patterns

LinkedIn profiles can reveal contact information through several patterns:

Directly Available

  • Public email — Some users list their email in the contact info section
  • Website URLs — Personal sites, portfolios, blogs
  • Twitter/X handle — Social media links
  • Phone number — Rarely shared publicly

Derivable Patterns

Professional email addresses are not scraped from LinkedIn itself, but they often follow predictable patterns built from the person's name and the company domain:

function generateEmailPatterns(firstName, lastName, domain) {
    const f = firstName.toLowerCase();
    const l = lastName.toLowerCase();
    return [
        `${f}.${l}@${domain}`,        // john.doe@company.com
        `${f}${l}@${domain}`,          // johndoe@company.com
        `${f[0]}${l}@${domain}`,       // jdoe@company.com
        `${f}@${domain}`,              // john@company.com
        `${f}_${l}@${domain}`,         // john_doe@company.com
        `${f}-${l}@${domain}`,         // john-doe@company.com
        `${l}.${f}@${domain}`,         // doe.john@company.com
        `${l}${f[0]}@${domain}`,       // doej@company.com
    ];
}

Treat these addresses as candidates only. Verify them with an email verification API before use, and never send email to unverified addresses.
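
Real names often contain accents, spaces, or apostrophes that rarely appear in mailbox names. The sketch below normalizes each name part before building candidates. The helper names are hypothetical, and the ASCII-only rule is an assumption about how the target company formats its addresses:

```javascript
// Normalize one name part before building address candidates.
// Assumption: target mailboxes use plain ASCII letters only.
function normalizeNamePart(name) {
    return name
        .normalize('NFD')                 // split accented letters into base + combining mark
        .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
        .toLowerCase()
        .replace(/[^a-z]/g, '');          // drop spaces, hyphens, apostrophes, digits
}

// Build de-duplicated candidates from raw profile names.
// The three patterns here are a subset of the longer list above.
function candidateEmails(firstName, lastName, domain) {
    const f = normalizeNamePart(firstName);
    const l = normalizeNamePart(lastName);
    if (!f || !l) return [];              // nothing sensible to build from
    const d = domain.trim().toLowerCase();
    return [...new Set([`${f}.${l}@${d}`, `${f}${l}@${d}`, `${f[0]}${l}@${d}`])];
}
```

Letters that do not decompose into a base letter plus an accent (for example "ø" or "ß") are simply dropped by this sketch, so names that contain them need a transliteration table instead.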

Setting Up Your LinkedIn Profile Scraper

Method 1: Public Profile Scraping with Cheerio

For public LinkedIn profiles (those visible without login), you can use a lightweight HTTP-based approach:

const Apify = require('apify');
const cheerio = require('cheerio');

Apify.main(async () => {
    // getInput() resolves to null when no input is supplied, so fall back to {}
    const input = (await Apify.getInput()) || {};
    const { profileUrls = [] } = input;

    const proxyConfig = await Apify.createProxyConfiguration({
        groups: ['RESIDENTIAL']
    });

    const requestList = await Apify.openRequestList('linkedin-profiles',
        profileUrls.map(url => ({
            url: url.replace(/\/$/, '') + '/', // Ensure trailing slash
            userData: { type: 'profile' }
        }))
    );

    const crawler = new Apify.CheerioCrawler({
        requestList,
        proxyConfiguration: proxyConfig,
        additionalMimeTypes: ['application/json'],
        prepareRequestFunction: ({ request }) => {
            request.headers = {
                'Accept': 'text/html,application/xhtml+xml',
                'Accept-Language': 'en-US,en;q=0.9',
                'Cache-Control': 'no-cache',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
            };
        },
        handlePageFunction: async ({ request, $ }) => {
            // LinkedIn embeds structured data in JSON-LD
            const jsonLd = $('script[type="application/ld+json"]').html();
            let structuredData = {};
            if (jsonLd) {
                try {
                    structuredData = JSON.parse(jsonLd);
                } catch (e) {
                    console.log('Failed to parse JSON-LD');
                }
            }

            // Extract from meta tags (reliable for public profiles)
            const profile = {
                name: $('meta[property="og:title"]').attr('content')
                    || $('h1').first().text().trim(),
                headline: $('meta[name="description"]').attr('content')?.split(' | ')[0],
                location: $('.top-card-layout__second-subline span').first().text().trim(),
                profileUrl: request.url,
                profileImage: $('meta[property="og:image"]').attr('content'),
                connectionCount: $('.top-card-layout__connections')
                    .text().trim().replace(/[^0-9+]/g, ''),
            };

            // Extract experience section
            profile.experience = [];
            $('section.experience .experience-item').each((i, el) => {
                profile.experience.push({
                    title: $(el).find('.experience-item__title').text().trim(),
                    company: $(el).find('.experience-item__subtitle').text().trim(),
                    dateRange: $(el).find('.experience-item__duration span')
                        .first().text().trim(),
                    duration: $(el).find('.experience-item__duration span')
                        .last().text().trim(),
                    location: $(el).find('.experience-item__location').text().trim(),
                    description: $(el).find('.experience-item__description')
                        .text().trim()
                });
            });

            // Extract education section
            profile.education = [];
            $('section.education .education__item').each((i, el) => {
                profile.education.push({
                    school: $(el).find('.education__item--school').text().trim(),
                    degree: $(el).find('.education__item--degree').text().trim(),
                    fieldOfStudy: $(el).find('.education__item--field')
                        .text().trim(),
                    dateRange: $(el).find('.education__item--dates')
                        .text().trim()
                });
            });

            // Extract skills
            profile.skills = [];
            $('section.skills .skill-categories__skill').each((i, el) => {
                profile.skills.push($(el).text().trim());
            });

            // Keep the parsed JSON-LD so downstream steps can use it
            profile.structuredData = structuredData;
            profile.scrapedAt = new Date().toISOString();
            await Apify.pushData(profile);
        }
    });

    await crawler.run();
});
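
The handler above parses the page's JSON-LD but reads its fields from meta tags and CSS selectors. As a sketch of how the parsed JSON-LD could be used directly, the helpers below look for a schema.org `Person` node. The function names are hypothetical, and the payload layout (a single object, an array, or an `@graph` wrapper) is an assumption that may not match what LinkedIn serves:

```javascript
// Find the first node whose @type is (or includes) "Person" in parsed JSON-LD.
// Assumption: the payload is a single object, an array of objects, or an
// object wrapping its nodes in "@graph".
function findPersonNode(jsonLd) {
    if (!jsonLd || typeof jsonLd !== 'object') return null;
    const nodes = Array.isArray(jsonLd)
        ? jsonLd
        : Array.isArray(jsonLd['@graph']) ? jsonLd['@graph'] : [jsonLd];
    const isPerson = (node) => {
        const type = node && node['@type'];
        return Array.isArray(type) ? type.includes('Person') : type === 'Person';
    };
    return nodes.find(isPerson) || null;
}

// Map schema.org Person properties onto flat profile fields.
function personToProfileFields(person) {
    if (!person) return {};
    const address = person.address || {};
    return {
        name: person.name || '',
        // jobTitle and worksFor may be a single value or an array
        jobTitle: [].concat(person.jobTitle || []).join(', '),
        location: address.addressLocality || '',
        worksFor: [].concat(person.worksFor || [])
            .map((org) => org && org.name)
            .filter(Boolean),
    };
}
```

Structured data tends to survive page redesigns better than class names, so it is worth checking before falling back to selectors.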

Method 2: Browser-Based Extraction with Playwright

For richer data extraction, especially from profiles that require JavaScript rendering:

const Apify = require('apify');

Apify.main(async () => {
    // getInput() resolves to null when no input is supplied, so fall back to {}
    const input = (await Apify.getInput()) || {};
    const { profileUrls = [], includeSkills = true, includeRecommendations = false } = input;

    const proxyConfig = await Apify.createProxyConfiguration({
        groups: ['RESIDENTIAL']
    });

    // Crawlers configured with handlePageFunction take their request source
    // at construction time, so build the request list first
    const requestList = await Apify.openRequestList('linkedin-profiles-browser',
        profileUrls.map(url => ({ url, userData: { type: 'profile' } }))
    );

    const crawler = new Apify.PlaywrightCrawler({
        requestList,
        proxyConfiguration: proxyConfig,
        maxConcurrency: 2,  // LinkedIn is strict on concurrency
        navigationTimeoutSecs: 45,
        handlePageFunction: async ({ request, page }) => {
            // Wait for main profile content; continue even if it never appears
            await page.waitForSelector('.pv-text-details__left-panel',
                { timeout: 20000 }).catch(() => {});

            // Scroll to load lazy sections
            await autoScroll(page);

            const profileData = await page.evaluate(() => {
                const getText = (sel) =>
                    document.querySelector(sel)?.textContent?.trim() || '';

                // Header information
                const profile = {
                    fullName: getText('.pv-text-details__left-panel h1'),
                    headline: getText('.pv-text-details__left-panel .text-body-medium'),
                    location: getText('.pv-text-details__left-panel .text-body-small.inline'),
                    about: getText('#about ~ div .pv-shared-text-with-see-more span'),
                    followerCount: getText('.pv-recent-activity-section__follower-count'),
                };

                // Work experience
                profile.experience = [];
                document.querySelectorAll('#experience ~ div .pvs-list__paged-list-item')
                    .forEach(item => {
                    const expEntry = {
                        title: item.querySelector('.t-bold span')
                            ?.textContent?.trim() || '',
                        company: item.querySelector('.t-normal span')
                            ?.textContent?.trim() || '',
                        dateRange: item.querySelector('.pvs-entity__caption-wrapper')
                            ?.textContent?.trim() || '',
                        description: item.querySelector('.pvs-list__outer-container .pv-shared-text-with-see-more span')
                            ?.textContent?.trim() || ''
                    };
                    if (expEntry.title) profile.experience.push(expEntry);
                });

                // Education
                profile.education = [];
                document.querySelectorAll('#education ~ div .pvs-list__paged-list-item')
                    .forEach(item => {
                    const eduEntry = {
                        school: item.querySelector('.t-bold span')
                            ?.textContent?.trim() || '',
                        degree: item.querySelector('.t-normal span')
                            ?.textContent?.trim() || '',
                        dates: item.querySelector('.pvs-entity__caption-wrapper')
                            ?.textContent?.trim() || ''
                    };
                    if (eduEntry.school) profile.education.push(eduEntry);
                });

                // Certifications
                profile.certifications = [];
                document.querySelectorAll('#certifications ~ div .pvs-list__paged-list-item')
                    .forEach(item => {
                    profile.certifications.push({
                        name: item.querySelector('.t-bold span')
                            ?.textContent?.trim() || '',
                        issuer: item.querySelector('.t-normal span')
                            ?.textContent?.trim() || '',
                        date: item.querySelector('.pvs-entity__caption-wrapper')
                            ?.textContent?.trim() || ''
                    });
                });

                // Languages
                profile.languages = [];
                document.querySelectorAll('#languages ~ div .pvs-list__paged-list-item')
                    .forEach(item => {
                    profile.languages.push({
                        language: item.querySelector('.t-bold span')
                            ?.textContent?.trim() || '',
                        proficiency: item.querySelector('.t-normal span')
                            ?.textContent?.trim() || ''
                    });
                });

                return profile;
            });

            // Extract skills (requires clicking "Show all skills")
            if (includeSkills) {
                const showAllSkills = await page.$('text="Show all skills"');
                if (showAllSkills) {
                    await showAllSkills.click();
                    await page.waitForTimeout(2000);

                    profileData.skills = await page.evaluate(() => {
                        const skills = [];
                        document.querySelectorAll('.pv-skill-category-entity__name')
                            .forEach(el => {
                            const endorsementCount = el.closest('.pv-skill-category-entity')
                                ?.querySelector('.pv-skill-category-entity__endorsement-count')
                                ?.textContent?.trim();
                            skills.push({
                                name: el.textContent.trim(),
                                // Explicit radix; a missing count falls back to 0
                                endorsements: parseInt(endorsementCount, 10) || 0
                            });
                        });
                        return skills;
                    });

                    // Close the modal
                    const closeBtn = await page.$('[aria-label="Dismiss"]');
                    if (closeBtn) await closeBtn.click();
                }
            }

            // Add metadata
            profileData.profileUrl = request.url;
            profileData.scrapedAt = new Date().toISOString();

            await Apify.pushData(profileData);
        }
    });

    await crawler.run();
});

// Helper function to scroll page and trigger lazy loading
async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            let totalHeight = 0;
            const distance = 400;
            const timer = setInterval(() => {
                const scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 300);
        });
    });
    await page.waitForTimeout(2000);
}
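
The `dateRange` value captured above is raw caption text. As a sketch of how it could be split into usable parts, the hypothetical helper below assumes the caption looks like "Jan 2020 - Present · 4 yrs 2 mos", with a middle dot between the range and the duration. Other locales and layouts will need different rules:

```javascript
// Split a caption such as "Jan 2020 - Present · 4 yrs 2 mos" into parts.
// Assumption: range and duration are separated by a middle dot (U+00B7) and
// the range uses a hyphen or an en dash (U+2013).
function parseDateCaption(caption) {
    const [rangePart = '', durationPart = ''] = (caption || '')
        .split('\u00b7')
        .map((part) => part.trim());
    const [start = '', end = ''] = rangePart
        .split(/\s+[-\u2013]\s+/)
        .map((s) => s.trim());
    const years = parseInt((durationPart.match(/(\d+)\s*yrs?/) || [])[1], 10) || 0;
    const months = parseInt((durationPart.match(/(\d+)\s*mos?/) || [])[1], 10) || 0;
    return {
        start,
        end,
        isCurrent: /^present$/i.test(end),
        durationMonths: years * 12 + months,
    };
}
```

Keeping the duration in months avoids rounding errors when tenures are summed or averaged later.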

Extracting Skills and Endorsement Data

Skills and endorsements reveal what a professional is known for and how their network validates their expertise. Here's a dedicated extraction function:

async function extractSkillsDetailed(page) {
    const skills = await page.evaluate(() => {
        const skillData = {
            topSkills: [],
            industryKnowledge: [],
            toolsTechnologies: [],
            interpersonalSkills: [],
            otherSkills: []
        };

        // Skills are often categorized
        document.querySelectorAll('.pv-skill-category-list')
            .forEach(category => {
            const categoryName = category.querySelector('h3')
                ?.textContent?.trim()?.toLowerCase() || 'other';

            const categorySkills = [];
            category.querySelectorAll('.pv-skill-category-entity')
                .forEach(skill => {
                categorySkills.push({
                    name: skill.querySelector('.pv-skill-category-entity__name span')
                        ?.textContent?.trim(),
                    endorsementCount: parseInt(
                        skill.querySelector('.pv-skill-category-entity__endorsement-count')
                            ?.textContent?.trim()
                    ) || 0
                });
            });

            if (categoryName.includes('industry')) {
                skillData.industryKnowledge = categorySkills;
            } else if (categoryName.includes('tools') || categoryName.includes('tech')) {
                skillData.toolsTechnologies = categorySkills;
            } else if (categoryName.includes('interpersonal')) {
                skillData.interpersonalSkills = categorySkills;
            } else {
                skillData.otherSkills.push(...categorySkills);
            }
        });

        return skillData;
    });

    return skills;
}
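
Once skills are grouped by category, a common next step is a single ranked list. The hypothetical `rankSkills` helper below is a sketch that flattens the object returned by `extractSkillsDetailed` and orders it by endorsement count:

```javascript
// Flatten categorized skills (the shape returned by extractSkillsDetailed)
// into one list ranked by endorsement count.
function rankSkills(skillData, limit = 10) {
    const ranked = [];
    for (const [category, skills] of Object.entries(skillData || {})) {
        for (const skill of skills || []) {
            if (!skill || !skill.name) continue;   // skip entries whose selector missed
            ranked.push({
                name: skill.name,
                endorsementCount: skill.endorsementCount || 0,
                category,
            });
        }
    }
    // Highest endorsement count first, name as the tie-breaker
    ranked.sort((a, b) =>
        b.endorsementCount - a.endorsementCount || a.name.localeCompare(b.name));
    return ranked.slice(0, limit);
}
```

The category is kept on each entry so the ranked list can still be filtered by group afterwards.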

Handling LinkedIn's Anti-Scraping Measures

LinkedIn has some of the most aggressive anti-scraping defenses on the web. Here's how to handle them:

Session and Cookie Management

// Run inside Apify.main(). openSessionPool() creates the pool and initializes it;
// a pool built with `new Apify.SessionPool()` must be initialized before use.
const sessionPool = await Apify.openSessionPool({
    maxPoolSize: 20,
    sessionOptions: {
        maxUsageCount: 5,  // Retire sessions after 5 uses
        maxErrorScore: 1    // Retire on first error
    },
    createSessionFunction: async (pool) => {
        const session = new Apify.Session({ sessionPool: pool });
        // Set LinkedIn-specific cookies
        session.setCookies([
            { name: 'li_at', value: '', domain: '.linkedin.com' },
            { name: 'JSESSIONID', value: `"ajax:${Date.now()}"`, domain: '.linkedin.com' }
        ], 'https://www.linkedin.com');
        return session;
    }
});

Rate Limiting Strategy

const rateLimiter = {
    requestCount: 0,
    lastReset: Date.now(),
    maxRequestsPerMinute: 10,

    async throttle() {
        // Start a fresh window once the previous minute has passed
        if (Date.now() - this.lastReset >= 60000) {
            this.requestCount = 0;
            this.lastReset = Date.now();
        }

        // Window is full: wait for it to end (plus jitter), then start a new one
        if (this.requestCount >= this.maxRequestsPerMinute) {
            const elapsed = Date.now() - this.lastReset;
            const waitTime = 60000 - elapsed + Math.random() * 5000;
            console.log(`Rate limit reached. Waiting ${Math.round(waitTime/1000)}s`);
            await new Promise(resolve => setTimeout(resolve, waitTime));
            this.requestCount = 0;
            this.lastReset = Date.now();
        }

        // Count the request that is about to be sent
        this.requestCount++;
    }
};
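
A per-minute cap controls steady-state traffic, but it does not say what to do after a request fails. One common complement is exponential backoff with jitter. The sketch below uses hypothetical helper names, and its `baseMs` and `maxMs` defaults are illustrative values, not limits LinkedIn documents:

```javascript
// Exponential backoff with full jitter: the delay ceiling doubles per attempt
// and the actual delay is drawn uniformly below that ceiling.
function backoffDelayMs(attempt, { baseMs = 2000, maxMs = 120000, random = Math.random } = {}) {
    const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
    return Math.floor(random() * ceiling);
}

// Retry an async task, sleeping between failed attempts.
// `sleep` is injectable so the helper can be exercised without real delays.
async function withRetries(task, {
    retries = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    ...backoff
} = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await task(attempt);
        } catch (err) {
            if (attempt >= retries) throw err;   // out of attempts: surface the error
            await sleep(backoffDelayMs(attempt, backoff));
        }
    }
}
```

A call such as `await withRetries(() => fetchProfile(url), { retries: 4 })` would wrap a hypothetical `fetchProfile` function. The jitter spreads retries out so that several blocked sessions do not all retry at the same moment.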

Proxy Rotation

const proxyConfig = await Apify.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US'
});

Data Processing and Enrichment

Raw scraped data needs processing before it's useful. Here's a post-processing pipeline:

function processProfile(rawProfile) {
    const processed = {
        // Clean and normalize name
        firstName: rawProfile.fullName?.trim().split(/\s+/)[0] || '',
        lastName: rawProfile.fullName?.trim().split(/\s+/).slice(1).join(' ') || '',
        fullName: rawProfile.fullName?.trim(),

        // Parse headline into components
        ...parseHeadline(rawProfile.headline),

        // Normalize location
        location: normalizeLocation(rawProfile.location),

        // Calculate career metrics
        totalYearsExperience: calculateTotalExperience(rawProfile.experience),
        currentTenure: calculateCurrentTenure(rawProfile.experience),
        numberOfCompanies: new Set(
            rawProfile.experience?.map(e => e.company)
        ).size,
        averageTenure: calculateAverageTenure(rawProfile.experience),

        // Skill analysis
        topSkills: [...(rawProfile.skills || [])]   // copy so the input array is not reordered
            .sort((a, b) => b.endorsements - a.endorsements)
            .slice(0, 10)
            .map(s => s.name),
        skillCount: rawProfile.skills?.length || 0,

        // Education level
        highestDegree: determineHighestDegree(rawProfile.education),

        // Raw data preserved
        experience: rawProfile.experience,
        education: rawProfile.education,
        certifications: rawProfile.certifications,
        languages: rawProfile.languages,

        // Metadata
        profileUrl: rawProfile.profileUrl,
        scrapedAt: rawProfile.scrapedAt,
        dataQuality: assessDataQuality(rawProfile)
    };

    return processed;
}

function parseHeadline(headline) {
    if (!headline) return { currentTitle: '', currentCompany: '' };
    const atMatch = headline.match(/^(.+?)\s+at\s+(.+)$/i);
    if (atMatch) {
        return { currentTitle: atMatch[1].trim(), currentCompany: atMatch[2].trim() };
    }
    const pipeMatch = headline.match(/^(.+?)\s*[|]\s*(.+)$/);
    if (pipeMatch) {
        return { currentTitle: pipeMatch[1].trim(), currentCompany: pipeMatch[2].trim() };
    }
    return { currentTitle: headline, currentCompany: '' };
}

function calculateTotalExperience(experience) {
    if (!experience?.length) return 0;
    // Sums each role's stated duration, so overlapping roles are counted twice
    let totalMonths = 0;
    experience.forEach(exp => {
        const duration = exp.duration || exp.dateRange;
        const yearsMatch = duration?.match(/(\d+)\s*yr/);
        const monthsMatch = duration?.match(/(\d+)\s*mo/);
        totalMonths += (parseInt(yearsMatch?.[1], 10) || 0) * 12;
        totalMonths += parseInt(monthsMatch?.[1], 10) || 0;
    });
    return Math.round(totalMonths / 12 * 10) / 10;
}

function assessDataQuality(profile) {
    let score = 0;
    if (profile.fullName) score += 20;
    if (profile.headline) score += 15;
    if (profile.experience?.length > 0) score += 25;
    if (profile.education?.length > 0) score += 15;
    if (profile.skills?.length > 0) score += 15;
    if (profile.location) score += 10;
    return { score, grade: score >= 80 ? 'A' : score >= 60 ? 'B' : score >= 40 ? 'C' : 'D' };
}
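
`processProfile` calls four helpers that are not defined above: `normalizeLocation`, `calculateCurrentTenure`, `calculateAverageTenure`, and `determineHighestDegree`. The versions below are hypothetical sketches, not the author's implementations. They assume durations are written like "2 yrs 3 mos" and that degrees can be recognized by simple keywords:

```javascript
// Collapse whitespace and tidy comma spacing in a location string.
function normalizeLocation(location) {
    return (location || '')
        .split(',')
        .map((part) => part.replace(/\s+/g, ' ').trim())
        .filter(Boolean)
        .join(', ');
}

// Months encoded in strings like "2 yrs 3 mos".
function durationToMonths(text) {
    const source = text || '';
    const years = parseInt((source.match(/(\d+)\s*yr/) || [])[1], 10) || 0;
    const months = parseInt((source.match(/(\d+)\s*mo/) || [])[1], 10) || 0;
    return years * 12 + months;
}

// Years in the first role whose date range mentions "Present".
function calculateCurrentTenure(experience) {
    const current = (experience || []).find((e) => /present/i.test(e.dateRange || ''));
    if (!current) return 0;
    const months = durationToMonths(current.duration || current.dateRange);
    return Math.round((months / 12) * 10) / 10;
}

// Mean years per role, rounded to one decimal place.
function calculateAverageTenure(experience) {
    if (!experience || experience.length === 0) return 0;
    const total = experience.reduce(
        (sum, e) => sum + durationToMonths(e.duration || e.dateRange), 0);
    return Math.round((total / experience.length / 12) * 10) / 10;
}

// Highest degree by keyword match: 'PhD', 'Master', 'Bachelor', 'Associate' or ''.
// Heuristic only: any "doctor..." degree is reported as 'PhD'.
function determineHighestDegree(education) {
    const levels = [
        ['PhD', /ph\.?\s?d\b|doctor/i],
        ['Master', /master|\bm\.?s\.?c?\b|\bmba\b/i],
        ['Bachelor', /bachelor|\bb\.?s\.?c?\b|\bb\.?a\b/i],
        ['Associate', /associate/i],
    ];
    const degrees = (education || []).map((e) => e.degree || '');
    const found = levels.find(([, pattern]) => degrees.some((d) => pattern.test(d)));
    return found ? found[0] : '';
}
```

The degree labels match the `'Master'` and `'PhD'` strings that the candidate scoring later in this guide compares against.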

Using Pre-Built Apify Actors

For production use, consider leveraging existing Apify Store actors that have already solved these challenges:

const Apify = require('apify');

// `await` is not allowed at the top level of a CommonJS script,
// so the calls run inside Apify.main()
Apify.main(async () => {
    // Read the token from the environment instead of hard-coding it
    const client = Apify.newClient({ token: process.env.APIFY_TOKEN });

    // Example actor ID and input fields; confirm both against the actor
    // you choose in the Apify Store
    const run = await client.actor('apify/linkedin-profile-scraper').call({
        profileUrls: [
            'https://www.linkedin.com/in/example-profile-1/',
            'https://www.linkedin.com/in/example-profile-2/'
        ],
        proxyConfig: { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] },
        maxRequestRetries: 3,
        includeSkills: true,
        includeEducation: true,
        includeCertifications: true
    });

    // Download results
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Scraped ${items.length} profiles`);

    // Shape one flat row per profile, ready to be serialized as CSV
    const csvContent = items.map(item => ({
        name: item.fullName,
        title: item.currentTitle,
        company: item.currentCompany,
        location: item.location,
        experience_years: item.totalYearsExperience,
        skills: item.topSkills?.join('; ')
    }));
});
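
The `csvContent` array above holds row objects, not CSV text. A minimal sketch of the serialization step, following the RFC 4180 quoting rules, could look like this (`csvField` and `toCsv` are hypothetical helper names):

```javascript
// Quote a single CSV field following RFC 4180: wrap it in double quotes when
// it contains a comma, a quote, or a line break, and double any inner quotes.
function csvField(value) {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize an array of flat objects; the header comes from the first row's keys.
function toCsv(rows) {
    if (rows.length === 0) return '';
    const columns = Object.keys(rows[0]);
    const lines = [columns.map(csvField).join(',')];
    for (const row of rows) {
        lines.push(columns.map((col) => csvField(row[col])).join(','));
    }
    return lines.join('\r\n');
}
```

Quoting matters here because headlines and locations routinely contain commas, as in "San Francisco, California".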

Using a pre-built actor saves you from handling LinkedIn's anti-scraping defenses, session management, and CAPTCHAs yourself. Widely used LinkedIn actors in the Apify Store already cover many edge cases, but input field names differ between actors, so check each actor's documentation and input schema before relying on it.

Building a Talent Pipeline

Here's how to combine profile scraping with search to build a complete talent sourcing pipeline:

const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();
    const {
        searchKeywords = 'senior software engineer',
        targetLocation = 'San Francisco Bay Area',
        targetCompanies = [],
        maxProfiles = 100
    } = input;

    const dataset = await Apify.openDataset();
    const kvStore = await Apify.openKeyValueStore();

    // Step 1: Search for profiles matching criteria
    const searchUrl = `https://www.linkedin.com/search/results/people/?keywords=${encodeURIComponent(searchKeywords)}&location=${encodeURIComponent(targetLocation)}`;

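    // Step 2: Extract profile URLs from search results (not implemented in this outline)
    // Step 3: Scrape each individual profile (for example with one of the crawlers above)
    // Step 4: Score and rank candidates

    // Fill this array with processed profiles from steps 2 and 3;
    // it stays empty in this outline, so the run produces no candidates as written
    const candidates = [];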

    // Scoring function
    function scoreCandidate(profile) {
        let score = 0;

        // Years of experience
        const years = profile.totalYearsExperience || 0;
        if (years >= 5) score += 30;
        else if (years >= 3) score += 20;
        else score += 10;

        // Target company experience
        const hasTargetCompany = profile.experience?.some(exp =>
            targetCompanies.some(tc =>
                exp.company?.toLowerCase().includes(tc.toLowerCase())
            )
        );
        if (hasTargetCompany) score += 25;

        // Skill match
        const relevantSkills = ['javascript', 'python', 'react', 'node.js', 'aws'];
        const matchedSkills = profile.skills?.filter(s =>
            relevantSkills.includes(s.name?.toLowerCase())
        ).length || 0;
        score += matchedSkills * 5;

        // Education
        if (profile.highestDegree === 'Master' || profile.highestDegree === 'PhD') {
            score += 10;
        }

        return score;
    }

    // Output ranked candidates
    const ranked = candidates
        .map(c => ({ ...c, score: scoreCandidate(c) }))
        .sort((a, b) => b.score - a.score);

    await dataset.pushData(ranked);
    await kvStore.setValue('summary', {
        totalFound: ranked.length,
        topCandidates: ranked.slice(0, 10).map(c => ({
            name: c.fullName,
            title: c.currentTitle,
            score: c.score
        }))
    });
});
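
Step 2 of the outline tends to return the same person several times, through regional subdomains or tracking parameters. A sketch of URL canonicalization before scraping follows. The helper names are hypothetical, and the sketch assumes public profiles live under `/in/<slug>` and that slugs can be compared case-insensitively:

```javascript
// Reduce a LinkedIn profile link to one canonical form so the same person is
// not scraped twice. Returns null for anything that is not a profile link.
function normalizeProfileUrl(rawUrl) {
    let url;
    try {
        url = new URL(rawUrl);
    } catch (err) {
        return null;                                   // not a parseable absolute URL
    }
    // Accept linkedin.com and its subdomains, reject look-alike hosts
    if (!/(^|\.)linkedin\.com$/i.test(url.hostname)) return null;
    const match = url.pathname.match(/^\/in\/([^/]+)/);
    if (!match) return null;                           // search pages, company pages, etc.
    return `https://www.linkedin.com/in/${match[1].toLowerCase()}/`;
}

// Canonicalize a list of links and drop duplicates and non-profile URLs.
function dedupeProfileUrls(urls) {
    const normalized = urls.map(normalizeProfileUrl).filter(Boolean);
    return [...new Set(normalized)];
}
```

Running the list through `dedupeProfileUrls` before step 3 keeps the request count, and therefore the load on the rate limiter, as low as possible.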

Ethical Considerations and Legal Compliance

When scraping LinkedIn profiles, keep these guidelines in mind:

  1. Only scrape public data — Never attempt to bypass login walls or access private profile sections
  2. Respect rate limits — Aggressive scraping harms the platform and other users
  3. GDPR compliance — If you process personal data of people in the EU, make sure you have a lawful basis for doing so
  4. CCPA compliance — California residents have rights over their personal data
  5. Data retention — Don't store personal data longer than necessary
  6. Purpose limitation — Only use data for the stated purpose
  7. Opt-out mechanisms — Provide a way for individuals to request data deletion
  8. No discrimination — Never use scraped data for discriminatory purposes
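
Two of these guidelines, data retention and opt-out, can be enforced in code. The sketch below filters stored records against a retention window and an opt-out list. The 90-day default, the option names, and the helper name are hypothetical policy choices, not legal advice:

```javascript
// Drop records that are past the retention window or whose owner opted out.
// maxAgeDays and optOutUrls are hypothetical policy inputs.
function applyRetentionPolicy(records, { maxAgeDays = 90, optOutUrls = [], now = new Date() } = {}) {
    const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
    const optedOut = new Set(optOutUrls);
    return records.filter((record) => {
        if (optedOut.has(record.profileUrl)) return false;
        const scrapedAt = Date.parse(record.scrapedAt);
        // Records with a missing or unparseable timestamp are dropped as well,
        // since their age cannot be established
        return Number.isFinite(scrapedAt) && scrapedAt >= cutoff;
    });
}
```

Running a filter like this on a schedule, and before every export, keeps stale or withdrawn records from leaking into downstream systems.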

Conclusion

LinkedIn profile scraping unlocks powerful capabilities for recruiting, market research, and professional networking at scale. The key challenges — anti-scraping defenses, dynamic content loading, and data normalization — are all solvable with the right tools and approach.

Whether you build a custom scraper using the Apify SDK or leverage pre-built actors from the Apify Store, the combination of residential proxies, session management, and careful rate limiting will give you reliable access to professional profile data.

Start with a small batch of profiles to validate your pipeline, then scale gradually. Always prioritize data quality over quantity, and make sure your scraping practices comply with applicable privacy laws and platform terms.


Looking for ready-made LinkedIn scraping solutions? Browse the Apify Store for battle-tested profile scrapers, or build your own with the Apify SDK.
