LinkedIn is the world's largest professional network with over 1 billion members. Individual LinkedIn profiles contain a goldmine of structured career data — work history, skills, education, certifications, endorsements, and more. Whether you're building a recruiting pipeline, conducting market research on talent pools, or enriching your CRM with professional data, scraping LinkedIn profiles gives you access to data that's nearly impossible to get elsewhere.
In this guide, we'll cover the structure of LinkedIn public profiles, what data you can extract, how to handle LinkedIn's anti-scraping defenses, and how to build a production-ready profile scraper using Apify.
Important disclaimer: Always respect LinkedIn's Terms of Service, applicable privacy laws (GDPR, CCPA), and robots.txt. Only scrape publicly available data. This guide is for educational purposes and legitimate business use cases like recruiting and market research.
Understanding LinkedIn Profile Structure
A LinkedIn public profile is a structured document containing several distinct sections. Understanding this structure is essential for building an effective scraper.
The Profile Header
The top section of every profile contains:
- Full name — First and last name
- Headline — Professional tagline (e.g., "Senior Software Engineer at Google")
- Location — City, state/region, country
- Profile photo URL — The member's headshot
- Background/banner image — Optional custom banner
- Connection count — Number of connections (often shown as "500+" for large networks)
- Current company and title — Extracted from the headline or current experience
Work Experience Section
Each experience entry includes:
- Job title — The role held
- Company name — The employer
- Company LinkedIn URL — Link to the company page
- Employment type — Full-time, part-time, contract, freelance, internship
- Date range — Start and end dates (or "Present" for current roles)
- Duration — Calculated time in role
- Location — Where the role was based
- Description — Free-text description of responsibilities and achievements
Education Section
Education entries contain:
- School name — University, college, or institution
- Degree — Bachelor's, Master's, PhD, etc.
- Field of study — Major or concentration
- Date range — Start and end years
- Grade/GPA — Sometimes included
- Activities and societies — Extracurricular involvement
- Description — Additional context about the education
Skills and Endorsements
The skills section shows:
- Skill name — e.g., "Python", "Project Management"
- Endorsement count — Number of connections who endorsed this skill
- Top endorsers — Names of people who endorsed (on full profiles)
Additional Sections
Profiles may also include:
- Certifications — Professional certificates with issuing organization and dates
- Publications — Articles, papers, books
- Languages — Spoken languages with proficiency levels
- Volunteer experience — Non-profit roles
- Recommendations — Written testimonials from connections
- Honors and awards — Professional recognition
- Projects — Named projects with descriptions and collaborators
- Contact info — Email, phone, website (if made public)
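Put together, these sections normalize into a single profile record. The shape below is this guide's own convention (illustrative field names and sample values, not an official LinkedIn schema):

```javascript
// Illustrative shape of a fully scraped profile record.
// Field names and values are this guide's convention, not LinkedIn's.
const exampleProfile = {
  fullName: 'Jane Doe',
  headline: 'Senior Software Engineer at ExampleCorp',
  location: 'Austin, Texas, United States',
  connectionCount: '500+',
  experience: [
    {
      title: 'Senior Software Engineer',
      company: 'ExampleCorp',
      dateRange: 'Jan 2020 - Present',
      location: 'Austin, TX',
      description: 'Leads the platform team.'
    }
  ],
  education: [
    {
      school: 'University of Texas at Austin',
      degree: "Bachelor's",
      fieldOfStudy: 'Computer Science',
      dateRange: '2012 - 2016'
    }
  ],
  skills: [{ name: 'Python', endorsements: 42 }],
  certifications: [],
  languages: [{ language: 'English', proficiency: 'Native' }]
};

console.log(exampleProfile.experience.length); // 1
```

Designing this target shape up front makes the extraction code in the following sections easier to write and validate.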
Contact Information Patterns
LinkedIn profiles can reveal contact information through several patterns:
Directly Available
- Public email — Some users list their email in the contact info section
- Website URLs — Personal sites, portfolios, blogs
- Twitter/X handle — Social media links
- Phone number — Rarely shared publicly
Derivable Patterns
Professional email addresses can't be scraped from LinkedIn directly, but they often follow predictable patterns based on the company domain:
function generateEmailPatterns(firstName, lastName, domain) {
  const f = firstName.toLowerCase();
  const l = lastName.toLowerCase();
  return [
    `${f}.${l}@${domain}`,   // john.doe@company.com
    `${f}${l}@${domain}`,    // johndoe@company.com
    `${f[0]}${l}@${domain}`, // jdoe@company.com
    `${f}@${domain}`,        // john@company.com
    `${f}_${l}@${domain}`,   // john_doe@company.com
    `${f}-${l}@${domain}`,   // john-doe@company.com
    `${l}.${f}@${domain}`,   // doe.john@company.com
    `${l}${f[0]}@${domain}`, // doej@company.com
  ];
}
These patterns can then be verified using email verification APIs — never send emails to unverified addresses.
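Before spending verification-API credits, it's worth pre-filtering candidate addresses syntactically and deduplicating them. A minimal sketch (the actual verification call, whatever API you use, would plug in after this step):

```javascript
// Syntactic pre-filter and dedupe for candidate email addresses.
// Anything that survives this filter would then go to a verification API.
const EMAIL_RE = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

function filterCandidates(patterns) {
  const seen = new Set();
  return patterns.filter(addr => {
    const normalized = addr.toLowerCase().trim();
    if (!EMAIL_RE.test(normalized) || seen.has(normalized)) return false;
    seen.add(normalized);
    return true;
  });
}

const candidates = filterCandidates([
  'john.doe@company.com',
  'John.Doe@company.com', // duplicate after normalization — dropped
  'not-an-email@'         // fails the syntax check — dropped
]);
console.log(candidates); // [ 'john.doe@company.com' ]
```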
Setting Up Your LinkedIn Profile Scraper
Method 1: Public Profile Scraping with Cheerio
For public LinkedIn profiles (those visible without login), you can use a lightweight HTTP-based approach:
const Apify = require('apify');

Apify.main(async () => {
  const input = await Apify.getInput();
  const { profileUrls = [] } = input;

  const proxyConfig = await Apify.createProxyConfiguration({
    groups: ['RESIDENTIAL']
  });

  const requestList = await Apify.openRequestList('linkedin-profiles',
    profileUrls.map(url => ({
      url: url.replace(/\/$/, '') + '/', // ensure trailing slash
      userData: { type: 'profile' }
    }))
  );

  const crawler = new Apify.CheerioCrawler({
    requestList,
    proxyConfiguration: proxyConfig,
    additionalMimeTypes: ['application/json'],
    prepareRequestFunction: ({ request }) => {
      request.headers = {
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
        'Cache-Control': 'no-cache',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
      };
    },
    handlePageFunction: async ({ request, $ }) => {
      // LinkedIn embeds structured data in JSON-LD
      const jsonLd = $('script[type="application/ld+json"]').html();
      let structuredData = {};
      if (jsonLd) {
        try {
          structuredData = JSON.parse(jsonLd);
        } catch (e) {
          console.warn('Failed to parse JSON-LD');
        }
      }

      // Extract from meta tags (reliable for public profiles)
      const profile = {
        name: $('meta[property="og:title"]').attr('content')
          || $('h1').first().text().trim(),
        headline: $('meta[name="description"]').attr('content')?.split(' | ')[0],
        location: $('.top-card-layout__second-subline span').first().text().trim(),
        profileUrl: request.url,
        profileImage: $('meta[property="og:image"]').attr('content'),
        connectionCount: $('.top-card-layout__connections')
          .text().trim().replace(/[^0-9+]/g, ''),
        jsonLd: structuredData // keep the raw structured data for downstream processing
      };

      // Extract experience section
      profile.experience = [];
      $('section.experience .experience-item').each((i, el) => {
        profile.experience.push({
          title: $(el).find('.experience-item__title').text().trim(),
          company: $(el).find('.experience-item__subtitle').text().trim(),
          dateRange: $(el).find('.experience-item__duration span').first().text().trim(),
          duration: $(el).find('.experience-item__duration span').last().text().trim(),
          location: $(el).find('.experience-item__location').text().trim(),
          description: $(el).find('.experience-item__description').text().trim()
        });
      });

      // Extract education section
      profile.education = [];
      $('section.education .education__item').each((i, el) => {
        profile.education.push({
          school: $(el).find('.education__item--school').text().trim(),
          degree: $(el).find('.education__item--degree').text().trim(),
          fieldOfStudy: $(el).find('.education__item--field').text().trim(),
          dateRange: $(el).find('.education__item--dates').text().trim()
        });
      });

      // Extract skills
      profile.skills = [];
      $('section.skills .skill-categories__skill').each((i, el) => {
        profile.skills.push($(el).text().trim());
      });

      profile.scrapedAt = new Date().toISOString();
      await Apify.pushData(profile);
    }
  });

  await crawler.run();
});
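An input payload for this actor might look like the following (the profile URLs are placeholders, not real profiles):

```javascript
// Example actor input; the URLs are placeholders.
const input = {
  profileUrls: [
    'https://www.linkedin.com/in/example-profile-1/',
    'https://www.linkedin.com/in/example-profile-2/'
  ]
};

console.log(input.profileUrls.length); // 2
```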
Method 2: Browser-Based Extraction with Playwright
For richer data extraction, especially from profiles that require JavaScript rendering:
const Apify = require('apify');

Apify.main(async () => {
  const input = await Apify.getInput();
  const { profileUrls = [], includeSkills = true } = input;

  const proxyConfig = await Apify.createProxyConfiguration({
    groups: ['RESIDENTIAL']
  });

  const crawler = new Apify.PlaywrightCrawler({
    proxyConfiguration: proxyConfig,
    maxConcurrency: 2, // LinkedIn is strict on concurrency
    navigationTimeoutSecs: 45,
    handlePageFunction: async ({ request, page }) => {
      // Wait for main profile content
      await page.waitForSelector('.pv-text-details__left-panel',
        { timeout: 20000 }).catch(() => {});

      // Scroll to load lazy sections
      await autoScroll(page);

      const profileData = await page.evaluate(() => {
        const getText = (sel) =>
          document.querySelector(sel)?.textContent?.trim() || '';

        // Header information
        const profile = {
          fullName: getText('.pv-text-details__left-panel h1'),
          headline: getText('.pv-text-details__left-panel .text-body-medium'),
          location: getText('.pv-text-details__left-panel .text-body-small.inline'),
          about: getText('#about ~ div .pv-shared-text-with-see-more span'),
          followerCount: getText('.pv-recent-activity-section__follower-count'),
        };

        // Work experience
        profile.experience = [];
        document.querySelectorAll('#experience ~ div .pvs-list__paged-list-item')
          .forEach(item => {
            const expEntry = {
              title: item.querySelector('.t-bold span')
                ?.textContent?.trim() || '',
              company: item.querySelector('.t-normal span')
                ?.textContent?.trim() || '',
              dateRange: item.querySelector('.pvs-entity__caption-wrapper')
                ?.textContent?.trim() || '',
              description: item.querySelector('.pvs-list__outer-container .pv-shared-text-with-see-more span')
                ?.textContent?.trim() || ''
            };
            if (expEntry.title) profile.experience.push(expEntry);
          });

        // Education
        profile.education = [];
        document.querySelectorAll('#education ~ div .pvs-list__paged-list-item')
          .forEach(item => {
            const eduEntry = {
              school: item.querySelector('.t-bold span')
                ?.textContent?.trim() || '',
              degree: item.querySelector('.t-normal span')
                ?.textContent?.trim() || '',
              dates: item.querySelector('.pvs-entity__caption-wrapper')
                ?.textContent?.trim() || ''
            };
            if (eduEntry.school) profile.education.push(eduEntry);
          });

        // Certifications
        profile.certifications = [];
        document.querySelectorAll('#certifications ~ div .pvs-list__paged-list-item')
          .forEach(item => {
            profile.certifications.push({
              name: item.querySelector('.t-bold span')
                ?.textContent?.trim() || '',
              issuer: item.querySelector('.t-normal span')
                ?.textContent?.trim() || '',
              date: item.querySelector('.pvs-entity__caption-wrapper')
                ?.textContent?.trim() || ''
            });
          });

        // Languages
        profile.languages = [];
        document.querySelectorAll('#languages ~ div .pvs-list__paged-list-item')
          .forEach(item => {
            profile.languages.push({
              language: item.querySelector('.t-bold span')
                ?.textContent?.trim() || '',
              proficiency: item.querySelector('.t-normal span')
                ?.textContent?.trim() || ''
            });
          });

        return profile;
      });

      // Extract skills (requires clicking "Show all skills")
      if (includeSkills) {
        const showAllSkills = await page.$('text="Show all skills"');
        if (showAllSkills) {
          await showAllSkills.click();
          await page.waitForTimeout(2000);

          profileData.skills = await page.evaluate(() => {
            const skills = [];
            document.querySelectorAll('.pv-skill-category-entity__name')
              .forEach(el => {
                const endorsementCount = el.closest('.pv-skill-category-entity')
                  ?.querySelector('.pv-skill-category-entity__endorsement-count')
                  ?.textContent?.trim();
                skills.push({
                  name: el.textContent.trim(),
                  endorsements: parseInt(endorsementCount, 10) || 0
                });
              });
            return skills;
          });

          // Close the modal
          const closeBtn = await page.$('[aria-label="Dismiss"]');
          if (closeBtn) await closeBtn.click();
        }
      }

      // Add metadata
      profileData.profileUrl = request.url;
      profileData.scrapedAt = new Date().toISOString();
      await Apify.pushData(profileData);
    }
  });

  await crawler.addRequests(
    profileUrls.map(url => ({ url, userData: { type: 'profile' } }))
  );
  await crawler.run();
});

// Helper function to scroll the page and trigger lazy loading
async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 400;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 300);
    });
  });
  await page.waitForTimeout(2000);
}
Extracting Skills and Endorsement Data
Skills and endorsements reveal what a professional is known for and how their network validates their expertise. Here's a dedicated extraction function:
async function extractSkillsDetailed(page) {
  const skills = await page.evaluate(() => {
    const skillData = {
      topSkills: [],
      industryKnowledge: [],
      toolsTechnologies: [],
      interpersonalSkills: [],
      otherSkills: []
    };

    // Skills are often categorized
    document.querySelectorAll('.pv-skill-category-list')
      .forEach(category => {
        const categoryName = category.querySelector('h3')
          ?.textContent?.trim()?.toLowerCase() || 'other';

        const categorySkills = [];
        category.querySelectorAll('.pv-skill-category-entity')
          .forEach(skill => {
            categorySkills.push({
              name: skill.querySelector('.pv-skill-category-entity__name span')
                ?.textContent?.trim(),
              endorsementCount: parseInt(
                skill.querySelector('.pv-skill-category-entity__endorsement-count')
                  ?.textContent?.trim(),
                10
              ) || 0
            });
          });

        if (categoryName.includes('industry')) {
          skillData.industryKnowledge = categorySkills;
        } else if (categoryName.includes('tools') || categoryName.includes('tech')) {
          skillData.toolsTechnologies = categorySkills;
        } else if (categoryName.includes('interpersonal')) {
          skillData.interpersonalSkills = categorySkills;
        } else {
          skillData.otherSkills.push(...categorySkills);
        }
      });

    return skillData;
  });
  return skills;
}
Handling LinkedIn's Anti-Scraping Measures
LinkedIn has some of the most aggressive anti-scraping defenses on the web. Here's how to handle them:
Session and Cookie Management
// The Apify SDK opens a session pool via Apify.openSessionPool()
const sessionPool = await Apify.openSessionPool({
  maxPoolSize: 20,
  sessionOptions: {
    maxUsageCount: 5, // retire sessions after 5 uses
    maxErrorScore: 1  // retire on first error
  },
  createSessionFunction: async (sessionPool) => {
    const session = new Apify.Session({ sessionPool });
    // Set LinkedIn-specific cookies (li_at stays empty for anonymous, public-only scraping)
    session.setCookies([
      { name: 'li_at', value: '', domain: '.linkedin.com' },
      { name: 'JSESSIONID', value: `"ajax:${Date.now()}"`, domain: '.linkedin.com' }
    ], 'https://www.linkedin.com');
    return session;
  }
});
Rate Limiting Strategy
const rateLimiter = {
  requestCount: 0,
  lastReset: Date.now(),
  maxRequestsPerMinute: 10,

  async throttle() {
    this.requestCount++;
    const elapsed = Date.now() - this.lastReset;

    if (elapsed < 60000 && this.requestCount >= this.maxRequestsPerMinute) {
      const waitTime = 60000 - elapsed + Math.random() * 5000;
      console.log(`Rate limit reached. Waiting ${Math.round(waitTime / 1000)}s`);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      this.requestCount = 0;
      this.lastReset = Date.now();
    }

    if (elapsed >= 60000) {
      this.requestCount = 0;
      this.lastReset = Date.now();
    }
  }
};
Proxy Rotation
const proxyConfig = await Apify.createProxyConfiguration({
  groups: ['RESIDENTIAL'],
  countryCode: 'US'
});
Data Processing and Enrichment
Raw scraped data needs processing before it's useful. Here's a post-processing pipeline:
function processProfile(rawProfile) {
  const processed = {
    // Clean and normalize name
    firstName: rawProfile.fullName?.split(' ')[0] || '',
    lastName: rawProfile.fullName?.split(' ').slice(1).join(' ') || '',
    fullName: rawProfile.fullName?.trim(),

    // Parse headline into components
    ...parseHeadline(rawProfile.headline),

    // Normalize location
    location: normalizeLocation(rawProfile.location),

    // Calculate career metrics
    totalYearsExperience: calculateTotalExperience(rawProfile.experience),
    currentTenure: calculateCurrentTenure(rawProfile.experience),
    numberOfCompanies: new Set(
      rawProfile.experience?.map(e => e.company)
    ).size,
    averageTenure: calculateAverageTenure(rawProfile.experience),

    // Skill analysis
    topSkills: rawProfile.skills
      ?.sort((a, b) => b.endorsements - a.endorsements)
      .slice(0, 10)
      .map(s => s.name),
    skillCount: rawProfile.skills?.length || 0,

    // Education level
    highestDegree: determineHighestDegree(rawProfile.education),

    // Raw data preserved
    experience: rawProfile.experience,
    education: rawProfile.education,
    certifications: rawProfile.certifications,
    languages: rawProfile.languages,

    // Metadata
    profileUrl: rawProfile.profileUrl,
    scrapedAt: rawProfile.scrapedAt,
    dataQuality: assessDataQuality(rawProfile)
  };
  return processed;
}

function parseHeadline(headline) {
  if (!headline) return { currentTitle: '', currentCompany: '' };

  const atMatch = headline.match(/^(.+?)\s+at\s+(.+)$/i);
  if (atMatch) {
    return { currentTitle: atMatch[1].trim(), currentCompany: atMatch[2].trim() };
  }

  const pipeMatch = headline.match(/^(.+?)\s*[|]\s*(.+)$/);
  if (pipeMatch) {
    return { currentTitle: pipeMatch[1].trim(), currentCompany: pipeMatch[2].trim() };
  }

  return { currentTitle: headline, currentCompany: '' };
}

function calculateTotalExperience(experience) {
  if (!experience?.length) return 0;

  let totalMonths = 0;
  experience.forEach(exp => {
    const duration = exp.duration || exp.dateRange;
    const yearsMatch = duration?.match(/(\d+)\s*yr/);
    const monthsMatch = duration?.match(/(\d+)\s*mo/);
    totalMonths += (parseInt(yearsMatch?.[1], 10) || 0) * 12;
    totalMonths += parseInt(monthsMatch?.[1], 10) || 0;
  });
  return Math.round(totalMonths / 12 * 10) / 10;
}

function assessDataQuality(profile) {
  let score = 0;
  if (profile.fullName) score += 20;
  if (profile.headline) score += 15;
  if (profile.experience?.length > 0) score += 25;
  if (profile.education?.length > 0) score += 15;
  if (profile.skills?.length > 0) score += 15;
  if (profile.location) score += 10;
  return { score, grade: score >= 80 ? 'A' : score >= 60 ? 'B' : score >= 40 ? 'C' : 'D' };
}
Using Pre-Built Apify Actors
For production use, consider leveraging existing Apify Store actors that have already solved these challenges:
const Apify = require('apify');
const client = Apify.newClient({ token: 'YOUR_APIFY_TOKEN' });

// Run a LinkedIn profile scraper actor
const run = await client.actor('apify/linkedin-profile-scraper').call({
  profileUrls: [
    'https://www.linkedin.com/in/example-profile-1/',
    'https://www.linkedin.com/in/example-profile-2/'
  ],
  proxyConfig: { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] },
  maxRequestRetries: 3,
  includeSkills: true,
  includeEducation: true,
  includeCertifications: true
});

// Download results
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} profiles`);

// Map results into flat rows ready for CSV export
const csvContent = items.map(item => ({
  name: item.fullName,
  title: item.currentTitle,
  company: item.currentCompany,
  location: item.location,
  experience_years: item.totalYearsExperience,
  skills: item.topSkills?.join('; ')
}));
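The mapping above produces plain objects; turning them into an actual CSV string takes a small serializer. A minimal sketch that quotes fields containing commas, quotes, or newlines:

```javascript
// Minimal CSV serializer for an array of flat objects.
// Headers are taken from the keys of the first row.
function toCsv(rows) {
  if (!rows.length) return '';
  const headers = Object.keys(rows[0]);
  const escape = v => {
    const s = String(v ?? '');
    // Quote the field and double any embedded quotes when needed
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map(h => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

const csv = toCsv([
  { name: 'Jane Doe', title: 'Engineer, Senior', company: 'ExampleCorp' }
]);
console.log(csv);
// name,title,company
// Jane Doe,"Engineer, Senior",ExampleCorp
```

For large exports, a streaming CSV library is a better fit, but this is enough for a few thousand rows.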
Using a pre-built actor saves you from dealing with LinkedIn's complex anti-scraping defenses, session management, and CAPTCHA handling. Many Apify actors for LinkedIn are battle-tested with thousands of users and handle edge cases automatically.
Building a Talent Pipeline
Here's how to combine profile scraping with search to build a complete talent sourcing pipeline:
const Apify = require('apify');

Apify.main(async () => {
  const input = await Apify.getInput();
  const {
    searchKeywords = 'senior software engineer',
    targetLocation = 'San Francisco Bay Area',
    targetCompanies = [],
    maxProfiles = 100
  } = input;

  const dataset = await Apify.openDataset();
  const kvStore = await Apify.openKeyValueStore();

  // Step 1: Build a people-search URL for the target criteria
  const searchUrl = `https://www.linkedin.com/search/results/people/?keywords=${encodeURIComponent(searchKeywords)}&location=${encodeURIComponent(targetLocation)}`;

  // Step 2: Extract profile URLs from the search results
  //         (omitted here — crawl searchUrl with the techniques from Method 1 or 2)
  // Step 3: Scrape each individual profile, pushing results into `candidates`
  // Step 4: Score and rank candidates
  const candidates = [];

  // Scoring function
  function scoreCandidate(profile) {
    let score = 0;

    // Years of experience
    const years = profile.totalYearsExperience || 0;
    if (years >= 5) score += 30;
    else if (years >= 3) score += 20;
    else score += 10;

    // Target company experience
    const hasTargetCompany = profile.experience?.some(exp =>
      targetCompanies.some(tc =>
        exp.company?.toLowerCase().includes(tc.toLowerCase())
      )
    );
    if (hasTargetCompany) score += 25;

    // Skill match
    const relevantSkills = ['javascript', 'python', 'react', 'node.js', 'aws'];
    const matchedSkills = profile.skills?.filter(s =>
      relevantSkills.includes(s.name?.toLowerCase())
    ).length || 0;
    score += matchedSkills * 5;

    // Education
    if (profile.highestDegree === 'Master' || profile.highestDegree === 'PhD') {
      score += 10;
    }
    return score;
  }

  // Output ranked candidates, capped at maxProfiles
  const ranked = candidates
    .map(c => ({ ...c, score: scoreCandidate(c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxProfiles);

  await dataset.pushData(ranked);
  await kvStore.setValue('summary', {
    totalFound: ranked.length,
    topCandidates: ranked.slice(0, 10).map(c => ({
      name: c.fullName,
      title: c.currentTitle,
      score: c.score
    }))
  });
});
Ethical Considerations and Legal Compliance
When scraping LinkedIn profiles, keep these guidelines in mind:
- Only scrape public data — Never attempt to bypass login walls or access private profile sections
- Respect rate limits — Aggressive scraping harms the platform and other users
- GDPR compliance — If processing EU citizens' data, ensure you have a lawful basis
- CCPA compliance — California residents have rights over their personal data
- Data retention — Don't store personal data longer than necessary
- Purpose limitation — Only use data for the stated purpose
- Opt-out mechanisms — Provide a way for individuals to request data deletion
- No discrimination — Never use scraped data for discriminatory purposes
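Data retention in particular is easy to enforce mechanically. Records carrying the `scrapedAt` timestamp set by the scrapers above can be pruned after a chosen retention window (the 90-day window here is an arbitrary example, not legal advice):

```javascript
// Drop records older than a retention window, keyed on their scrapedAt timestamp.
function pruneExpired(records, retentionDays, now = Date.now()) {
  const cutoff = now - retentionDays * 24 * 60 * 60 * 1000;
  return records.filter(r => new Date(r.scrapedAt).getTime() >= cutoff);
}

const kept = pruneExpired(
  [
    { fullName: 'Old Record', scrapedAt: '2020-01-01T00:00:00.000Z' },
    { fullName: 'Fresh Record', scrapedAt: new Date().toISOString() }
  ],
  90 // arbitrary example window
);
console.log(kept.length); // 1
```

Running a job like this on a schedule keeps stored datasets from silently outliving their purpose.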
Conclusion
LinkedIn profile scraping unlocks powerful capabilities for recruiting, market research, and professional networking at scale. The key challenges — anti-scraping defenses, dynamic content loading, and data normalization — are all solvable with the right tools and approach.
Whether you build a custom scraper using the Apify SDK or leverage pre-built actors from the Apify Store, the combination of residential proxies, session management, and careful rate limiting will give you reliable access to professional profile data.
Start with a small batch of profiles to validate your pipeline, then scale gradually. Always prioritize data quality over quantity, and make sure your scraping practices comply with applicable privacy laws and platform terms.
Looking for ready-made LinkedIn scraping solutions? Browse the Apify Store for proven profile scrapers, or build your own with the Apify SDK.