Introduction: The LinkedIn Data Goldmine
LinkedIn is the world's largest professional network, with over 1 billion members and 67 million company pages. For anyone in sales, recruiting, market research, or competitive intelligence, LinkedIn company data is incredibly valuable — employee counts, industry classifications, recent activity, funding information, and organizational structure.
But LinkedIn's official API is heavily restricted. The Marketing API requires partner approval, and even then, the data you can access is limited. If you need comprehensive company profile data at scale, scraping public LinkedIn pages is often the only practical option.
In this guide, I'll cover how to extract business profiles and employee data from LinkedIn's public company pages, with practical code examples and a production-ready approach using Apify.
What Data Can You Extract from LinkedIn Company Pages?
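LinkedIn company pages are surprisingly data-rich. Here's what they can expose, though not everything below is visible without logging in. The employee insights in particular are generally shown only to signed-in users, and some only with LinkedIn Premium: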
Company Overview
- Company name and tagline
- Industry classification
- Company size (employee range: 1-10, 11-50, 51-200, etc.)
- Headquarters location
- Founded year
- Company type (Public, Private, Nonprofit, etc.)
- Website URL
- Specialties (keywords the company lists)
- About/Description text
Employee Insights
- Total employee count on LinkedIn
- Employee distribution by function (Engineering, Sales, Marketing, etc.)
- Employee growth rate (visible on some pages)
- Notable employees (people with high follower counts)
- New hires vs. departures trend
Activity Data
- Recent posts by the company page
- Post engagement (likes, comments, shares)
- Posting frequency
- Content themes (what topics they post about)
Jobs
- Active job listings count
- Job locations and types
- Growth signals (lots of hiring = growing)
Understanding LinkedIn's Page Structure
LinkedIn company pages follow a consistent URL pattern:
```
https://www.linkedin.com/company/{company-slug}/
https://www.linkedin.com/company/{company-slug}/about/
https://www.linkedin.com/company/{company-slug}/people/
https://www.linkedin.com/company/{company-slug}/posts/
https://www.linkedin.com/company/{company-slug}/jobs/
```
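Because every section hangs off the same slug, it can help to normalize whatever URL you're given before building sub-page URLs. A minimal sketch, assuming input URLs follow the pattern above but may carry extra path segments or query strings; `companyPageUrls` is a hypothetical helper, not part of any library:

```javascript
// Hypothetical helper: derive the canonical sub-page URLs for a company
// from any URL whose path starts with /company/{slug}.
function companyPageUrls(inputUrl) {
    const { pathname } = new URL(inputUrl);
    // The slug is the path segment right after "company"
    const match = pathname.match(/^\/company\/([^/]+)/);
    if (!match) {
        throw new Error(`Not a LinkedIn company URL: ${inputUrl}`);
    }
    const base = `https://www.linkedin.com/company/${match[1]}`;
    return {
        slug: match[1],
        main: `${base}/`,
        about: `${base}/about/`,
        people: `${base}/people/`,
        posts: `${base}/posts/`,
        jobs: `${base}/jobs/`,
    };
}

// Usage: sub-paths and query strings are dropped
const urls = companyPageUrls('https://www.linkedin.com/company/example-company/posts/?utm_source=share');
console.log(urls.about); // https://www.linkedin.com/company/example-company/about/
```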
The main page shows an overview, while sub-pages provide detailed sections. For scraping, the /about/ page is the most data-dense — it contains structured fields in a consistent format.
The Public vs. Authenticated View
LinkedIn shows different data depending on whether you're logged in:
| Data Point | Public View | Logged-in View |
|---|---|---|
| Company name/description | Yes | Yes |
| Employee count | Approximate | Exact |
| Industry | Yes | Yes |
| Employee list | No | Yes (limited) |
| Follower count | Yes | Yes |
| Recent posts | Limited (3-5) | Full feed |
| Job listings | Count only | Full listings |
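For this guide, we'll focus on public data extraction, which doesn't require authentication and avoids the account-related risks of scraping while logged in. It doesn't take you outside LinkedIn's terms, though; see Legal and Ethical Considerations below.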
Building a LinkedIn Company Scraper
LinkedIn's public pages are server-side rendered and include structured data in JSON-LD and microdata formats. Here's how to extract it:
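```javascript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    maxConcurrency: 2, // LinkedIn is aggressive about rate limiting
    preNavigationHooks: [
        // For HTTP-based crawlers the hook receives (crawlingContext, gotOptions);
        // headers set on gotOptions are sent with the request.
        async (crawlingContext, gotOptions) => {
            // Essential: mimic a real browser (these override the browser-like
            // headers Crawlee generates by default)
            gotOptions.headers = {
                ...gotOptions.headers,
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language': 'en-US,en;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Cache-Control': 'no-cache',
            };

            // Random delay: 3-8 seconds before each request
            const delay = 3000 + Math.random() * 5000;
            await new Promise((resolve) => setTimeout(resolve, delay));
        },
    ],
    async requestHandler({ request, $, body }) {
        // extractCompanyData is defined in the next section
        const companyData = await extractCompanyData($, body, request.url);
        await Dataset.pushData(companyData);
    },
});

// Example start URL; replace with the company pages you want to scrape
await crawler.run(['https://www.linkedin.com/company/example-company/about/']);
```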
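With `maxConcurrency: 2`, each request sleeps independently, so two requests can still fire almost back to back. If you want a guaranteed minimum gap across all concurrent requests, one option is a shared spacer that hands out start slots. A minimal sketch; `createRequestSpacer` is a hypothetical helper, and the defaults (3 seconds plus up to 5 seconds of jitter) simply mirror the delay above:

```javascript
// Hypothetical helper: hands out request start times so that consecutive
// requests (across all concurrent workers) are at least minGapMs apart,
// plus a random jitter of up to jitterMs.
function createRequestSpacer({ minGapMs = 3000, jitterMs = 5000, random = Math.random } = {}) {
    let nextSlot = 0; // earliest timestamp (ms) at which the next request may start

    // Returns how long the caller should wait from `now`, and reserves that
    // slot so the next caller queues behind it.
    function reserve(now = Date.now()) {
        const start = Math.max(now, nextSlot);
        nextSlot = start + minGapMs + random() * jitterMs;
        return start - now;
    }

    // Convenience wrapper for use inside a preNavigationHook
    async function wait() {
        const ms = reserve();
        if (ms > 0) await new Promise((resolve) => setTimeout(resolve, ms));
    }

    return { reserve, wait };
}

// Usage inside the crawler:
// const spacer = createRequestSpacer();
// preNavigationHooks: [async () => { await spacer.wait(); }]
```

Keep the resulting waits well under the crawler's timeouts; with two concurrent requests the queue never grows long, but it would at higher concurrency.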
Extracting Structured Data
LinkedIn embeds JSON-LD structured data in company pages. This is the most reliable extraction method:
```javascript
async function extractCompanyData($, body, url) {
    // Method 1: JSON-LD (most reliable)
    const jsonLd = extractJsonLd($);
    // Method 2: Meta tags (fallback)
    const metaData = extractMetaTags($);
    // Method 3: Page content parsing (for data not in structured formats)
    const pageData = extractFromPage($);

    return {
        ...mergeData(jsonLd, metaData, pageData),
        // The crawled LinkedIn URL; placed after the spread so a `url`
        // field from JSON-LD can't overwrite it
        url,
        scrapedAt: new Date().toISOString(),
    };
}

// Merge sources in priority order: the first non-empty value for each key wins
function mergeData(...sources) {
    const merged = {};
    for (const source of sources) {
        for (const [key, value] of Object.entries(source)) {
            if (merged[key] == null && value != null && value !== '') {
                merged[key] = value;
            }
        }
    }
    return merged;
}

function extractJsonLd($) {
    const scripts = $('script[type="application/ld+json"]');
    const results = {};

    scripts.each((_, el) => {
        try {
            const data = JSON.parse($(el).html());
            if (data['@type'] === 'Organization') {
                results.name = data.name;
                results.description = data.description;
                results.url = data.url;
                results.logo = data.logo?.url || data.logo;
                results.foundingDate = data.foundingDate;
                results.numberOfEmployees = data.numberOfEmployees?.value;

                if (data.address) {
                    results.headquarters = {
                        street: data.address.streetAddress,
                        city: data.address.addressLocality,
                        state: data.address.addressRegion,
                        country: data.address.addressCountry,
                    };
                }

                if (data.member) {
                    results.memberCount = Array.isArray(data.member)
                        ? data.member.length
                        : null;
                }
            }
        } catch (e) {
            // Invalid JSON-LD, skip
        }
    });

    return results;
}

function extractMetaTags($) {
    return {
        title: $('meta[property="og:title"]').attr('content'),
        description: $('meta[property="og:description"]').attr('content'),
        image: $('meta[property="og:image"]').attr('content'),
        type: $('meta[property="og:type"]').attr('content'),
        twitterTitle: $('meta[name="twitter:title"]').attr('content'),
        twitterDescription: $('meta[name="twitter:description"]').attr('content'),
    };
}
```
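The `data['@type'] === 'Organization'` check above assumes each script tag holds a single object. JSON-LD also allows a top-level array or an `@graph` wrapper, and `@type` can itself be an array, so a more defensive lookup walks those shapes. A sketch that assumes nothing about which shape LinkedIn uses at any given time; `findOrganizationNodes` is a hypothetical helper:

```javascript
// Hypothetical helper: collect every node whose @type includes "Organization"
// from a parsed JSON-LD value, whether it is a single object, an array,
// or an object wrapping its nodes in @graph.
function findOrganizationNodes(jsonLd) {
    const found = [];
    const visit = (node) => {
        if (Array.isArray(node)) {
            node.forEach(visit);
            return;
        }
        if (node === null || typeof node !== 'object') return;
        // @type may be a string or an array of strings
        const types = [].concat(node['@type'] ?? []);
        if (types.includes('Organization')) found.push(node);
        if (node['@graph']) visit(node['@graph']);
    };
    visit(jsonLd);
    return found;
}

// Usage with the raw text of a <script type="application/ld+json"> tag
const scriptText = '{"@context":"https://schema.org","@graph":[{"@type":"WebPage"},{"@type":"Organization","name":"Example Co"}]}';
console.log(findOrganizationNodes(JSON.parse(scriptText)).map((o) => o.name)); // [ 'Example Co' ]
```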
Parsing the Company About Page
The /about/ page contains structured fields that aren't always in JSON-LD:
```javascript
function extractFromPage($) {
    const data = {};

    // Company details are rendered as a definition list (<dt> label, <dd> value).
    // LinkedIn's class names change often, so verify selectors against the live page.
    $('dl.overflow-hidden').each((_, dl) => {
        const terms = $(dl).find('dt');
        const definitions = $(dl).find('dd');

        terms.each((i, dt) => {
            // Collapse runs of whitespace so labels and values compare cleanly
            const label = $(dt).text().replace(/\s+/g, ' ').trim().toLowerCase();
            const value = $(definitions[i]).text().replace(/\s+/g, ' ').trim();

            switch (label) {
                case 'website':
                    data.website = value;
                    break;
                case 'industry':
                    data.industry = value;
                    break;
                case 'company size':
                    data.companySizeRange = value;
                    data.employeeCount = parseEmployeeCount(value);
                    break;
                case 'headquarters':
                    data.headquartersText = value;
                    break;
                case 'type':
                    data.companyType = value;
                    break;
                case 'founded':
                    data.founded = parseInt(value, 10) || null;
                    break;
                case 'specialties':
                    // e.g. "Web Scraping, Automation, and Data Extraction"
                    data.specialties = value
                        .split(',')
                        .map((s) => s.trim().replace(/^and\s+/i, ''))
                        .filter(Boolean);
                    break;
            }
        });
    });

    // Extract follower count, e.g. "12,345 followers"
    const followerText = $('[data-test-id="about-us__followers"]').text();
    const followerMatch = followerText.match(/([\d,]+)/);
    data.followers = followerMatch
        ? parseInt(followerMatch[1].replace(/,/g, ''), 10)
        : null;

    // Extract the about text
    data.about = $('section.about p').text().trim() || null;

    return data;
}

function parseEmployeeCount(sizeText) {
    // "10,001+ employees"     → { min: 10001, max: null }
    // "51-200 employees"      → { min: 51, max: 200 }
    // "1,001-5,000 employees" → { min: 1001, max: 5000 }
    const plusMatch = sizeText.match(/([\d,]+)\+/);
    if (plusMatch) {
        return { min: parseInt(plusMatch[1].replace(/,/g, ''), 10), max: null };
    }

    // Accept a hyphen or an en dash between the bounds
    const rangeMatch = sizeText.match(/([\d,]+)\s*[-–]\s*([\d,]+)/);
    if (rangeMatch) {
        return {
            min: parseInt(rangeMatch[1].replace(/,/g, ''), 10),
            max: parseInt(rangeMatch[2].replace(/,/g, ''), 10),
        };
    }

    return null;
}
```
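Because company size is a range, sorting or comparing companies needs a single number. One simple convention, assumed here rather than taken from LinkedIn, is the midpoint of a closed range and the lower bound of an open-ended one like `10,001+`. A sketch with a hypothetical `estimateHeadcount` helper that takes the `{ min, max }` shape returned by `parseEmployeeCount`:

```javascript
// Hypothetical helper: collapse a { min, max } size range into one estimate
// for sorting and comparison.
function estimateHeadcount(range) {
    if (!range || range.min == null) return null;
    // Open-ended ranges ("10,001+") have no upper bound, so use the minimum
    if (range.max == null) return range.min;
    return Math.round((range.min + range.max) / 2);
}

// Usage: sort companies by estimated size, largest first; unknown sizes go last
const companies = [
    { name: 'A', employeeCount: { min: 51, max: 200 } },
    { name: 'B', employeeCount: { min: 10001, max: null } },
    { name: 'C', employeeCount: null },
];
const bySize = [...companies].sort(
    (a, b) => (estimateHeadcount(b.employeeCount) ?? -1) - (estimateHeadcount(a.employeeCount) ?? -1)
);
console.log(bySize.map((c) => c.name)); // [ 'B', 'A', 'C' ]
```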
Extracting Employee Distribution Data
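LinkedIn can also show employee distribution by department, which is valuable for understanding organizational structure. This breakdown generally requires a signed-in (often Premium) view rather than the public page, so treat the selectors below as illustrative placeholders and check them against the HTML you actually receive: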
```javascript
async function extractEmployeeInsights($) {
    const insights = {
        byFunction: [],
        growth: null,
        totalOnLinkedIn: null,
    };

    // Employee count on LinkedIn
    const countText = $('[data-test-id="about-us__employees-on-linkedin"]').text();
    const countMatch = countText.match(/([\d,]+)/);
    insights.totalOnLinkedIn = countMatch
        ? parseInt(countMatch[1].replace(/,/g, ''), 10)
        : null;

    // Department breakdown
    $('[data-test-id="employee-distribution"] li').each((_, li) => {
        const functionName = $(li).find('.function-name').text().trim();
        const percentage = $(li).find('.function-percentage').text().trim(); // e.g. "35%"
        const count = $(li).find('.function-count').text().trim(); // e.g. "4,200"

        if (functionName) {
            insights.byFunction.push({
                department: functionName,
                percentage: parseFloat(percentage) || null,
                estimatedCount: parseInt(count.replace(/[^\d]/g, ''), 10) || null,
            });
        }
    });

    return insights;
}
```
This gives you data like:
```json
{
  "byFunction": [
    { "department": "Engineering", "percentage": 35, "estimatedCount": 4200 },
    { "department": "Sales", "percentage": 18, "estimatedCount": 2160 },
    { "department": "Marketing", "percentage": 12, "estimatedCount": 1440 },
    { "department": "Operations", "percentage": 10, "estimatedCount": 1200 },
    { "department": "Human Resources", "percentage": 8, "estimatedCount": 960 }
  ],
  "totalOnLinkedIn": 12000
}
```
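The counts in that example are each department's percentage applied to `totalOnLinkedIn` (35% of 12,000 is 4,200). When a page exposes only one of the two numbers, you can derive the other the same way. A sketch, assuming percentages are on a 0-100 scale; `fillDistribution` is a hypothetical helper:

```javascript
// Hypothetical helper: fill in whichever of percentage / estimatedCount is
// missing for each department, using the company's total on LinkedIn.
function fillDistribution(byFunction, totalOnLinkedIn) {
    if (!totalOnLinkedIn) return byFunction; // nothing to scale against
    return byFunction.map((entry) => {
        const filled = { ...entry };
        if (filled.estimatedCount == null && filled.percentage != null) {
            filled.estimatedCount = Math.round((totalOnLinkedIn * filled.percentage) / 100);
        }
        if (filled.percentage == null && filled.estimatedCount != null) {
            // Round the share to one decimal place
            filled.percentage = Math.round((filled.estimatedCount / totalOnLinkedIn) * 1000) / 10;
        }
        return filled;
    });
}

const filled = fillDistribution(
    [
        { department: 'Engineering', percentage: 35, estimatedCount: null },
        { department: 'Sales', percentage: null, estimatedCount: 2160 },
    ],
    12000
);
// filled[0].estimatedCount → 4200, filled[1].percentage → 18
```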
Extracting Recent Company Posts
Company posts reveal marketing strategy, product launches, and company culture:
```javascript
async function extractCompanyPosts($) {
    const posts = [];

    $('div[data-test-id="update-card"]').each((_, card) => {
        const post = {
            text: $(card).find('.update-text').text().trim(),
            // Prefer the machine-readable datetime; fall back to the display text
            timestamp: $(card).find('time').attr('datetime') ||
                $(card).find('.update-date').text().trim(),
            likes: parseEngagement($(card).find('.likes-count').text()),
            comments: parseEngagement($(card).find('.comments-count').text()),
            shares: parseEngagement($(card).find('.shares-count').text()),
            hasImage: $(card).find('img.update-image').length > 0,
            hasVideo: $(card).find('video').length > 0,
            hasArticle: $(card).find('.article-card').length > 0,
        };

        // Total engagement (parseEngagement returns 0 for missing counts)
        post.totalEngagement = post.likes + post.comments + post.shares;

        posts.push(post);
    });

    return posts;
}

function parseEngagement(text) {
    if (!text) return 0;

    // Remove thousands separators first, so "1,234" isn't read as 1
    const normalized = text.trim().toLowerCase().replace(/,/g, '');

    // Handle abbreviated formats such as "1.2K" and "3.5M"; the lookahead
    // keeps "12 members" from being read as 12 million
    const multipliers = { k: 1_000, m: 1_000_000 };
    const match = normalized.match(/(\d+(?:\.\d+)?)\s*([km])?(?![a-z])/);
    if (!match) return 0;

    const num = parseFloat(match[1]);
    const mult = multipliers[match[2]] ?? 1;
    return Math.round(num * mult);
}
```
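Total engagement isn't comparable across pages of very different sizes. A common normalization is engagement rate: total interactions divided by follower count, expressed as a percentage. A sketch, assuming you pair each page's posts with the `followers` value scraped from its /about/ page; both helper names are hypothetical:

```javascript
// Hypothetical helper: engagement rate (%) = (likes + comments + shares) / followers × 100
function engagementRate(post, followers) {
    if (!followers) return null; // unknown or zero followers: the rate is undefined
    const total = (post.likes || 0) + (post.comments || 0) + (post.shares || 0);
    // Percentage rounded to two decimal places
    return Math.round((total / followers) * 10000) / 100;
}

// Average rate across a page's posts
function averageEngagementRate(posts, followers) {
    const rates = posts.map((p) => engagementRate(p, followers)).filter((r) => r != null);
    if (rates.length === 0) return null;
    return Math.round((rates.reduce((sum, r) => sum + r, 0) / rates.length) * 100) / 100;
}

const posts = [
    { likes: 120, comments: 20, shares: 10 }, // 150 interactions → 1.5%
    { likes: 40, comments: 5, shares: 5 }, // 50 interactions → 0.5%
];
console.log(averageEngagementRate(posts, 10000)); // 1
```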
Using Apify for Production LinkedIn Scraping
LinkedIn is one of the most challenging sites to scrape at scale. Here's how Apify's infrastructure addresses the main obstacles:
Challenge 1: Aggressive Rate Limiting
LinkedIn can block an IP after only a few dozen requests, typically by answering with its non-standard HTTP 999 status or redirecting to a login page. Apify's residential proxy pool rotates IPs automatically:
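```javascript
import { Actor } from 'apify';
import { CheerioCrawler, Dataset } from 'crawlee';

await Actor.init();

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxConcurrency: 2,
    maxRequestRetries: 5,
    // Each session keeps its own cookies and proxy IP
    sessionPoolOptions: {
        maxPoolSize: 20,
        sessionOptions: {
            maxUsageCount: 5, // Retire session after 5 uses
        },
    },
    async requestHandler({ request, $, body }) {
        // Same extraction logic as in the earlier example
        await Dataset.pushData(await extractCompanyData($, body, request.url));
    },
});

// Example start URL; replace with your own list
await crawler.run(['https://www.linkedin.com/company/example-company/about/']);
await Actor.exit();
```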
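Session retirement works best when you can tell a block from an ordinary failure. A sketch of that decision logic, assuming LinkedIn signals blocks with HTTP 999, 429 or 403; the helper names and backoff constants are illustrative and not part of Crawlee's API:

```javascript
// Hypothetical helper: decide what to do with a response based on its status.
// 999 is LinkedIn's non-standard "request denied" code; 429 is standard rate limiting.
function classifyStatus(statusCode) {
    if (statusCode === 999 || statusCode === 429 || statusCode === 403) return 'blocked';
    if (statusCode >= 500) return 'retry'; // transient server error
    if (statusCode >= 200 && statusCode < 300) return 'ok';
    return 'fail';
}

// Exponential backoff with "full jitter": a random wait in [0, min(maxMs, baseMs × 2^attempt))
function backoffMs(attempt, { baseMs = 5000, maxMs = 120000, random = Math.random } = {}) {
    const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
    return Math.floor(random() * ceiling);
}
```

How you wire this in depends on your setup; the idea is to retire the session (which rotates the IP) on a block, and to back off rather than hammer the site on other transient errors.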
Challenge 2: Authentication Walls
Some LinkedIn data requires being logged in. For public company data, you can often get what you need from the unauthenticated view by targeting specific endpoints and structured data.
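In practice, a request for gated content often doesn't fail outright: LinkedIn redirects it to a login or "authwall" page and returns HTML that your selectors will quietly parse as empty fields. Checking the final URL catches this early. A sketch, assuming the redirect targets have paths starting with `/authwall`, `/login` or `/signup` (observations that may change); `isAuthWall` is a hypothetical helper:

```javascript
// Hypothetical check: did a request for a company page end up on a login wall?
// `finalUrl` is the URL after redirects (in Crawlee, request.loadedUrl).
function isAuthWall(finalUrl) {
    let url;
    try {
        url = new URL(finalUrl);
    } catch {
        return false; // not a valid URL, so not a recognizable login wall
    }
    const gatedPaths = ['/authwall', '/login', '/signup'];
    return gatedPaths.some((p) => url.pathname.startsWith(p));
}

console.log(isAuthWall('https://www.linkedin.com/authwall?trk=example')); // true
console.log(isAuthWall('https://www.linkedin.com/company/example-company/about/')); // false
```

Throwing an error when this returns true lets the crawler retry the request, ideally with a fresh session.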
Ready-Made Solution on Apify Store
If you don't want to build from scratch, there are ready-made LinkedIn company scrapers on the Apify Store. For example, you can find scrapers that accept a list of company URLs or search queries and return structured company profile data — no coding required.
These actors handle proxy rotation, anti-bot detection, and data normalization out of the box. You just provide the input (company names or URLs) and get clean JSON output.
Building a Complete Company Intelligence Pipeline
Here's how to combine everything into a useful data pipeline:
```javascript
import { Actor } from 'apify';
import { CheerioCrawler, Dataset } from 'crawlee';
// extractCompanyData, extractEmployeeInsights and extractCompanyPosts
// are the functions defined earlier in this guide.

await Actor.init();

// getInput() returns null when the Actor is started without input
const input = (await Actor.getInput()) ?? {};
const {
    companyUrls = [],
    companyNames = [],
    includeEmployeeInsights = true,
    includePosts = true,
    maxPostsPerCompany = 10,
} = input;

// Normalize ".../acme", ".../acme/", ".../acme/about" and ".../acme/about/" to ".../acme/about/"
const toAboutUrl = (url) => `${url.replace(/\/+$/, '').replace(/\/about$/, '')}/about/`;

const startUrls = [
    ...companyUrls.map((url) => ({
        url: toAboutUrl(url),
        userData: { label: 'ABOUT' },
    })),
    // Build URLs from company names. This guesses the slug; real slugs
    // don't always follow the "lowercase-with-dashes" pattern.
    ...companyNames.map((name) => ({
        url: `https://www.linkedin.com/company/${name.trim().toLowerCase().replace(/\s+/g, '-')}/about/`,
        userData: { label: 'ABOUT', originalName: name },
    })),
];

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxConcurrency: 2,
    async requestHandler({ request, $, body, enqueueLinks }) {
        const { label } = request.userData;

        if (label === 'ABOUT') {
            const companyData = await extractCompanyData($, body, request.url);
            const companyName = companyData.name ?? request.userData.originalName;
            const baseUrl = request.url.replace(/\/about\/?$/, '');

            // Enqueue sub-pages if requested (logged-out visitors are often
            // redirected to a login page on these)
            if (includeEmployeeInsights) {
                await enqueueLinks({
                    urls: [`${baseUrl}/people/`],
                    label: 'PEOPLE',
                    userData: { companyName },
                });
            }
            if (includePosts) {
                await enqueueLinks({
                    urls: [`${baseUrl}/posts/`],
                    label: 'POSTS',
                    userData: { companyName },
                });
            }

            await Dataset.pushData({
                ...companyData,
                dataType: 'company_profile',
            });
        } else if (label === 'PEOPLE') {
            const insights = await extractEmployeeInsights($);
            await Dataset.pushData({
                companyName: request.userData.companyName,
                ...insights,
                dataType: 'employee_insights',
            });
        } else if (label === 'POSTS') {
            const posts = await extractCompanyPosts($);
            await Dataset.pushData({
                companyName: request.userData.companyName,
                posts: posts.slice(0, maxPostsPerCompany),
                dataType: 'company_posts',
            });
        }
    },
});

await crawler.run(startUrls);
await Actor.exit();
```
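The pipeline stores profiles, employee insights and posts as separate dataset items tagged with `dataType`, while the use cases below work on one record per company. A minimal sketch of that join, assuming company names are consistent across the three item types (profile items carry `name`, the others `companyName`); `mergeByCompany` is a hypothetical helper:

```javascript
// Hypothetical helper: fold the three dataType records into one object per company.
function mergeByCompany(items) {
    const byName = new Map();
    for (const item of items) {
        const key = item.dataType === 'company_profile' ? item.name : item.companyName;
        if (!key) continue; // can't attribute this record to a company

        const fields = { ...item };
        delete fields.dataType; // the tag isn't needed once records are merged

        const merged = byName.get(key) ?? { name: key };
        // Later records only add fields; they never overwrite existing ones
        for (const [field, value] of Object.entries(fields)) {
            if (merged[field] == null) merged[field] = value;
        }
        byName.set(key, merged);
    }
    return [...byName.values()];
}

const merged = mergeByCompany([
    { dataType: 'company_profile', name: 'Example Co', industry: 'Software Development' },
    { dataType: 'employee_insights', companyName: 'Example Co', totalOnLinkedIn: 120 },
    { dataType: 'company_posts', companyName: 'Example Co', posts: [] },
]);
// merged[0] → { name: 'Example Co', industry: 'Software Development',
//               companyName: 'Example Co', totalOnLinkedIn: 120, posts: [] }
```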
Practical Use Cases for LinkedIn Company Data
1. Sales Prospecting and Lead Generation
Filter companies by industry, size, and location to build targeted prospect lists:
```javascript
function qualifyLead(company) {
    const score = {
        total: 0,
        factors: [],
    };
    const size = company.employeeCount; // { min, max } from parseEmployeeCount

    // Company size scoring. Open-ended ranges ("10,001+") have max === null,
    // and `null <= 500` is true in JavaScript, so check for it explicitly.
    if (size?.min >= 50 && size.max != null && size.max <= 500) {
        score.total += 30;
        score.factors.push('Ideal company size (50-500)');
    }

    // Industry fit
    const targetIndustries = ['Technology', 'Software', 'SaaS', 'Information Technology'];
    if (targetIndustries.some((ind) => company.industry?.includes(ind))) {
        score.total += 25;
        score.factors.push('Target industry match');
    }

    // Growth signals (activeJobCount has to come from a jobs scrape;
    // the pipeline above doesn't collect it)
    if ((company.activeJobCount ?? 0) > 10) {
        score.total += 20;
        score.factors.push('Actively hiring (growth signal)');
    }

    // Engagement level
    if (company.followers > 5000) {
        score.total += 15;
        score.factors.push('Strong LinkedIn presence');
    }

    // Has website (can research further)
    if (company.website) {
        score.total += 10;
        score.factors.push('Website available for research');
    }

    return { ...company, leadScore: score.total, scoringFactors: score.factors };
}
```
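Hard-coding weights inside `qualifyLead` makes them awkward to tune. An alternative design keeps the rules as data, so weights and thresholds can change without touching the scoring loop. A sketch with a hypothetical `scoreCompany` helper; the two rules shown mirror the size and industry checks above:

```javascript
// Hypothetical rule format: each rule has a point value, a label, and a predicate.
const leadRules = [
    {
        points: 30,
        label: 'Ideal company size (50-500)',
        test: (c) => c.employeeCount?.min >= 50 && c.employeeCount.max != null && c.employeeCount.max <= 500,
    },
    {
        points: 25,
        label: 'Target industry match',
        test: (c) => ['Technology', 'Software'].some((ind) => c.industry?.includes(ind)),
    },
];

// Sum the points of every rule the company satisfies
function scoreCompany(company, rules) {
    const matched = rules.filter((rule) => rule.test(company));
    return {
        leadScore: matched.reduce((sum, rule) => sum + rule.points, 0),
        scoringFactors: matched.map((rule) => rule.label),
    };
}

const result = scoreCompany(
    { industry: 'Software Development', employeeCount: { min: 51, max: 200 } },
    leadRules
);
// result → { leadScore: 55, scoringFactors: ['Ideal company size (50-500)', 'Target industry match'] }
```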
2. Competitive Intelligence Dashboard
Track competitors' growth, hiring patterns, and content strategy:
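```javascript
function buildCompetitorReport(companies) {
    return companies.map((company) => ({
        name: company.name,

        // Growth metrics
        employeeCount: company.totalOnLinkedIn,
        hiringIntensity: company.activeJobCount,
        // Largest departments by current headcount share. Copy before sorting
        // so the input array isn't mutated.
        largestDepartments: [...(company.byFunction ?? [])]
            .sort((a, b) => (b.percentage || 0) - (a.percentage || 0))
            .slice(0, 3)
            .map((f) => f.department),

        // Content metrics
        postFrequency: calculatePostFrequency(company.posts),
        avgEngagement: calculateAvgEngagement(company.posts),
        topContentThemes: extractThemes(company.posts),

        // Company details
        industry: company.industry,
        size: company.companySizeRange,
        specialties: company.specialties,
    }));
}

function calculatePostFrequency(posts) {
    if (!posts || posts.length < 2) return null;

    const dates = posts
        .map((p) => new Date(p.timestamp))
        .filter((d) => !Number.isNaN(d.getTime()))
        .sort((a, b) => b - a);
    if (dates.length < 2) return null;

    const daySpan = (dates[0] - dates[dates.length - 1]) / (1000 * 60 * 60 * 24);
    if (daySpan === 0) return null; // all posts share one timestamp

    return {
        // Only posts with parseable dates count toward the rate
        postsPerWeek: Math.round((dates.length / daySpan) * 7 * 10) / 10,
        period: `${dates.length} posts over ${Math.round(daySpan)} days`,
    };
}

function calculateAvgEngagement(posts) {
    if (!posts?.length) return null;
    const total = posts.reduce((sum, p) => sum + (p.totalEngagement || 0), 0);
    return Math.round(total / posts.length);
}

// Naive theme extraction: the most frequent words of 5+ letters across post texts.
// A placeholder; swap in proper keyword extraction for real analysis.
function extractThemes(posts, limit = 5) {
    const counts = {};
    for (const post of posts ?? []) {
        for (const word of (post.text ?? '').toLowerCase().match(/[a-z]{5,}/g) ?? []) {
            counts[word] = (counts[word] || 0) + 1;
        }
    }
    return Object.entries(counts)
        .sort(([, a], [, b]) => b - a)
        .slice(0, limit)
        .map(([word]) => word);
}
```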
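Tracking growth means comparing scrapes over time rather than reading a single snapshot. A sketch, assuming you keep dated report records per company; the `compareSnapshots` helper is hypothetical:

```javascript
// Hypothetical helper: change in a numeric metric between two report snapshots.
function compareSnapshots(previous, current, metric = 'employeeCount') {
    const before = previous?.[metric];
    const after = current?.[metric];
    if (before == null || after == null) return null;
    return {
        metric,
        before,
        after,
        change: after - before,
        // Percent change to one decimal place; undefined when starting from zero
        percentChange: before === 0 ? null : Math.round(((after - before) / before) * 1000) / 10,
    };
}

const lastMonth = { name: 'Example Co', employeeCount: 1000 };
const thisMonth = { name: 'Example Co', employeeCount: 1100 };
console.log(compareSnapshots(lastMonth, thisMonth).percentChange); // 10
```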
3. Market Research and Industry Analysis
Aggregate company data across an industry to spot trends:
```javascript
function analyzeIndustry(companies) {
    // Only companies with a parsed size range can be bucketed; open-ended
    // ranges ("10,001+") have max === null
    const sized = companies.filter((c) => c.employeeCount != null);

    const stats = {
        totalCompanies: companies.length,

        // Size distribution
        sizeDistribution: {
            startup: sized.filter((c) => c.employeeCount.max != null && c.employeeCount.max <= 50).length,
            smb: sized.filter((c) =>
                c.employeeCount.min >= 51 && c.employeeCount.max != null && c.employeeCount.max <= 500
            ).length,
            enterprise: sized.filter((c) => c.employeeCount.min > 500).length,
        },

        // Common specialties
        topSpecialties: getTopItems(
            companies.flatMap((c) => c.specialties || []),
            20
        ),

        // Geographic distribution
        topLocations: getTopItems(
            companies.map((c) => c.headquartersText).filter(Boolean),
            10
        ),

        // Hiring intensity
        averageJobCount: average(
            companies.map((c) => c.activeJobCount).filter((n) => n != null)
        ),
    };

    return stats;
}

function average(numbers) {
    if (numbers.length === 0) return null;
    return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

function getTopItems(items, limit = 10) {
    const counts = {};
    items.forEach((item) => {
        const normalized = item.toLowerCase().trim();
        counts[normalized] = (counts[normalized] || 0) + 1;
    });

    return Object.entries(counts)
        .sort(([, a], [, b]) => b - a)
        .slice(0, limit)
        .map(([item, count]) => ({ item, count }));
}
```
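An average like `averageJobCount` can be pulled far off by one or two heavy hirers in the sample. Reporting the median alongside it is a cheap robustness check. A sketch with a plain `median` helper (not from any library):

```javascript
// Median of a list of numbers: the middle value, or the mean of the two middle values.
function median(numbers) {
    if (numbers.length === 0) return null;
    const sorted = [...numbers].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Job counts where one company dominates
const jobCounts = [2, 3, 4, 5, 250];
console.log(jobCounts.reduce((s, n) => s + n, 0) / jobCounts.length); // 52.8
console.log(median(jobCounts)); // 4
```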
Handling Edge Cases and Data Quality
LinkedIn company data isn't always clean. Here's how to handle common issues:
```javascript
function cleanCompanyData(raw) {
    const cleaned = { ...raw };

    // Normalize company names (remove "| LinkedIn" suffix)
    if (cleaned.name) {
        cleaned.name = cleaned.name.replace(/\s*\|\s*LinkedIn\s*$/i, '').trim();
    }

    // Validate URLs (bare domains like "example.com" are checked as https://)
    if (cleaned.website) {
        try {
            new URL(cleaned.website.startsWith('http')
                ? cleaned.website
                : `https://${cleaned.website}`);
        } catch {
            cleaned.website = null;
            cleaned.websiteInvalid = raw.website;
        }
    }

    // Normalize industry names
    const industryAliases = {
        'information technology & services': 'Information Technology',
        'computer software': 'Software Development',
        'internet': 'Technology',
    };
    if (cleaned.industry) {
        cleaned.industry = industryAliases[cleaned.industry.toLowerCase()]
            || cleaned.industry;
    }

    // Flag data completeness. Headquarters can come from JSON-LD (headquarters)
    // or from the About page (headquartersText), so either one counts.
    const requiredFields = [
        ['name'],
        ['industry'],
        ['employeeCount'],
        ['headquarters', 'headquartersText'],
    ];
    const presentFields = requiredFields.filter((alternatives) =>
        alternatives.some((field) => cleaned[field] != null)
    );
    cleaned.dataCompleteness = Math.round(
        (presentFields.length / requiredFields.length) * 100
    );

    return cleaned;
}
```
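When the same company arrives through several inputs (say, once by URL and once by name), you end up with duplicate records. One approach is to key records on the LinkedIn slug and keep the most complete copy, using the `dataCompleteness` score computed above. A sketch; `dedupeCompanies` is hypothetical and assumes each record's `url` is its LinkedIn company URL:

```javascript
// Hypothetical helper: keep one record per LinkedIn slug, preferring the most complete.
function dedupeCompanies(records) {
    const bySlug = new Map();
    for (const record of records) {
        // Slug = the path segment after /company/ (case-insensitive); fall back to the raw URL
        const slug = record.url?.match(/\/company\/([^/?#]+)/)?.[1]?.toLowerCase() ?? record.url;
        if (!slug) continue; // no URL, nothing to key on

        const existing = bySlug.get(slug);
        if (!existing || (record.dataCompleteness ?? 0) > (existing.dataCompleteness ?? 0)) {
            bySlug.set(slug, record);
        }
    }
    return [...bySlug.values()];
}
```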
Legal and Ethical Considerations
LinkedIn scraping exists in a complex legal landscape:
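- hiQ v. LinkedIn (2022): In a preliminary-injunction ruling, the Ninth Circuit held that scraping publicly accessible LinkedIn profiles likely doesn't violate the CFAA. However, the ruling is narrow and binding only within the Ninth Circuit, and the district court later found that hiQ had breached LinkedIn's User Agreement before the case settled.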
- LinkedIn's ToS: Explicitly prohibits scraping. While ToS violations aren't criminal, they can result in account restrictions or civil action.
- GDPR/CCPA: Employee data may be considered personal data under privacy regulations. Ensure your use case has a legitimate basis.
Best practices:
- Only scrape publicly available data (no login required)
- Respect robots.txt directives
- Implement rate limiting to avoid impacting LinkedIn's servers
- Don't store personal employee data without a legitimate purpose
- Provide opt-out mechanisms if you're building a public tool
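One way to act on "don't store personal employee data without a legitimate purpose" is to whitelist the company-level fields you keep before anything is written to storage, so person-level fields never persist by accident. A sketch; the field list is an assumption to adapt to your use case, and `minimizeRecord` is a hypothetical helper:

```javascript
// Company-level fields this pipeline is allowed to persist (adjust per use case).
const ALLOWED_FIELDS = [
    'name', 'url', 'website', 'industry', 'companyType', 'companySizeRange',
    'employeeCount', 'headquartersText', 'founded', 'specialties', 'followers',
    'totalOnLinkedIn', 'byFunction', 'scrapedAt',
];

// Return a copy of the record that contains only allowed fields.
function minimizeRecord(record, allowed = ALLOWED_FIELDS) {
    return Object.fromEntries(
        Object.entries(record).filter(([field]) => allowed.includes(field))
    );
}

const safe = minimizeRecord({
    name: 'Example Co',
    industry: 'Software Development',
    notableEmployees: ['Jane Doe'], // person-level data: dropped
});
// safe → { name: 'Example Co', industry: 'Software Development' }
```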
Output and Integration
Once you've collected company data, here's how to make it useful:
```javascript
// Export enriched company profiles. `companies` is assumed to hold one
// merged record per company (profile, employee insights and posts combined).
const enrichedCompanies = companies.map((company) => ({
    // Core identity
    name: company.name,
    linkedinUrl: company.url,
    website: company.website,

    // Classification
    industry: company.industry,
    companyType: company.companyType,
    specialties: company.specialties,

    // Size and growth
    employeeRange: company.companySizeRange,
    linkedinEmployees: company.totalOnLinkedIn,
    activeJobs: company.activeJobCount,
    isHiring: (company.activeJobCount || 0) > 0,

    // Engagement
    followers: company.followers,
    recentPostCount: company.posts?.length || 0,
    avgPostEngagement: company.posts?.length
        ? Math.round(
            company.posts.reduce((sum, p) => sum + (p.totalEngagement || 0), 0) / company.posts.length
        )
        : null,

    // Metadata
    dataCompleteness: company.dataCompleteness,
    scrapedAt: company.scrapedAt,
}));

// Access via the Apify API (datasetId is the ID of the dataset your run wrote to)
const datasetUrl = `https://api.apify.com/v2/datasets/${datasetId}/items`;
// Choose the export format with a query parameter: ?format=json, ?format=csv, ?format=xlsx
```
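The export format and a few other options are plain query parameters on that endpoint. A sketch that builds the URL with the standard `URLSearchParams`; `format`, `clean`, `fields` and `limit` are parameters of the Apify dataset items endpoint (check the API reference for the full list, and note that private datasets also need your API token), while `datasetItemsUrl` itself is a hypothetical helper:

```javascript
// Hypothetical helper: build an Apify dataset export URL.
function datasetItemsUrl(datasetId, { format = 'json', clean = true, fields, limit } = {}) {
    const params = new URLSearchParams({ format, clean: String(clean) });
    if (fields?.length) params.set('fields', fields.join(',')); // export only these columns
    if (limit != null) params.set('limit', String(limit));
    return `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items?${params}`;
}

console.log(datasetItemsUrl('abc123', { format: 'csv', fields: ['name', 'industry'] }));
// https://api.apify.com/v2/datasets/abc123/items?format=csv&clean=true&fields=name%2Cindustry
```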
Conclusion
LinkedIn company scraping is one of the most valuable data extraction tasks in the B2B space. The combination of company profiles, employee insights, and activity data gives you a comprehensive view of any business — useful for sales prospecting, competitive intelligence, market research, and investment analysis.
The key challenges are LinkedIn's aggressive anti-scraping measures and the legal nuances around professional data. Using Apify's infrastructure with residential proxies and session management addresses most of the technical challenges, while focusing on publicly available company-level data (not individual profiles) keeps you on the right side of ethics.
Check out the Apify Store for ready-to-use LinkedIn company scrapers, or build your own using the patterns in this guide. For job-specific LinkedIn data, scrapers like the LinkedIn Jobs Scraper can extract public job listings with structured salary and location data.
The professional data landscape is evolving rapidly — LinkedIn continues to tighten access while the demand for business intelligence data only grows. Building a robust, ethical scraping pipeline now positions you well for the future.
Questions about LinkedIn company scraping? Share your use case in the comments — I'd love to hear what you're building with this data.