LinkedIn has over 67 million company pages. Every B2B sales team, investor, and recruiter needs company data from LinkedIn. And yet getting that data programmatically is genuinely difficult — not because the data is hidden, but because LinkedIn has built one of the most aggressive anti-scraping systems on the web, and their official API is priced for enterprise budgets only.
This post covers what data is actually available on LinkedIn company pages, why it's hard to get at scale, who needs it and why, and how to run our actor to extract it without building or maintaining any scraping infrastructure.
Why LinkedIn company data is hard to get
The official API is not a real option for most teams. LinkedIn's Marketing Developer Platform costs $15,000+/year and requires a partner application process. The data endpoints available through official channels are primarily designed for ad targeting and HR software integrations — not bulk company research. For a solo founder or a small data team doing ICP analysis or competitive research, the API is effectively unavailable.
The anti-scraping stack is serious. LinkedIn runs browser fingerprinting, behavioral analysis, IP reputation scoring, and bot challenge pages. A naive Python requests script gets blocked within minutes. Even headless browsers get flagged quickly without significant investment in evasion infrastructure. High-volume extraction requires residential proxies — which add meaningful cost — and constant maintenance as LinkedIn updates its detection methods.
Terms of service add legal ambiguity. LinkedIn's ToS restricts automated data collection. The hiQ Labs v. LinkedIn ruling (affirmed by the Ninth Circuit) established that scraping publicly available data is not a Computer Fraud and Abuse Act violation, but companies still need to assess their own risk tolerance. The data on public company pages — the kind visible to any logged-out visitor — sits in the clearest legal territory.
The result: most teams either pay for expensive data vendors (ZoomInfo, Clearbit), build fragile in-house scrapers that need constant maintenance, or just do it manually. None of these scale.
Who actually needs this data
B2B sales and ICP research. Building an ideal customer profile requires enriching company lists with industry, headcount, HQ location, and founding year. Teams doing outbound at scale need to filter thousands of companies down to the 200 that actually fit their ICP. LinkedIn company pages are the canonical source for this data — more accurate and more current than most third-party databases.
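As a sketch of that filtering step, assuming records with the actor's `industry` and `employee_count` fields (the target industries and headcount bands below are illustrative assumptions, not recommendations):

```python
# Filter enriched company records down to an example ICP.
# Field names match the actor's output schema; target values are made up.
TARGET_INDUSTRIES = {"Software Development", "Financial Services"}
TARGET_BANDS = {"51-200", "201-500", "501-1000"}

def matches_icp(company: dict) -> bool:
    """True if a company record fits the example ICP."""
    return (
        company.get("industry") in TARGET_INDUSTRIES
        and company.get("employee_count") in TARGET_BANDS
    )

companies = [
    {"name": "Acme Dev", "industry": "Software Development", "employee_count": "201-500"},
    {"name": "MegaRetail", "industry": "Retail", "employee_count": "10001+"},
]
icp_matches = [c for c in companies if matches_icp(c)]
# icp_matches keeps only "Acme Dev"
```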
Investor due diligence. Before a call, investors verify headcount growth signals (employee count vs. last quarter), check the company description for pivot signals, and confirm website and contact details. LinkedIn is the ground truth that other sources pull from. Automating this enrichment across a deal pipeline saves hours per week.
Competitive landscape analysis. Mapping a competitive landscape means collecting industry, size, HQ, founding year, and specialties for 20-100 companies. Doing this manually in a spreadsheet is an afternoon of copy-paste. Automated extraction turns it into a 5-minute job.
Recruitment targeting. Identifying companies in a specific industry, headcount band, and city before sourcing candidates from those companies is a standard recruiting workflow. LinkedIn company data is the filter layer.
Market research and data products. Research teams building industry reports, data enrichment services, or market intelligence products need bulk company data as a raw material. The same fields that power sales enrichment also power competitive benchmarking tools and market maps.
What data you actually get
Our actor extracts the following fields from public LinkedIn company pages — no login required:
- name — official company name as listed on LinkedIn
- industry — LinkedIn industry classification (e.g., "Software Development", "Financial Services")
- employee_count — headcount range (e.g., "501-1000", "10001+")
- follower_count — LinkedIn follower count
- headquarters — city, state/region, country
- founded_year — year the company was founded
- website — official company website URL
- company_url — canonical LinkedIn company page URL
- description — full company description text
- tagline — short company tagline
- specialties — list of self-reported specialty areas
- logo_url — URL to the company logo image
- scraped_at — timestamp of extraction
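Note that `employee_count` comes back as a range string, not a number. A small helper (hypothetical, not part of the actor) turns it into numeric bounds for sorting and filtering:

```python
def parse_band(band: str) -> tuple:
    """Convert a LinkedIn headcount band such as "501-1000" or "10001+"
    into (low, high) integer bounds; high is None for open-ended bands."""
    if band.endswith("+"):
        return int(band[:-1]), None
    low, high = band.split("-")
    return int(low), int(high)

low, high = parse_band("501-1000")  # (501, 1000)
```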
How to run the actor
Via Apify Console (no code needed):
- Go to apify.com/cryptosignals/linkedin-company-scraper
- Click Try for free
- Paste your company list into the `companies` field — accepts LinkedIn slugs (e.g., `stripe`) or full URLs
- Set `max_results` if you want to cap the run
- Click Start and download results as JSON or CSV
Input JSON:
```json
{
  "companies": [
    "stripe",
    "https://www.linkedin.com/company/shopify",
    "notion"
  ],
  "max_results": 50
}
```
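The `companies` field accepts both forms, so normalization on your side is optional. If you want consistent slugs in your own records anyway, a small helper like this (hypothetical, not part of the actor) does it:

```python
from urllib.parse import urlparse

def to_slug(entry: str) -> str:
    """Normalize a company entry to a bare LinkedIn slug.
    Accepts either "stripe" or a full company-page URL."""
    if "://" not in entry:
        return entry
    path = urlparse(entry).path.rstrip("/")
    return path.rsplit("/", 1)[-1]

to_slug("https://www.linkedin.com/company/shopify")  # "shopify"
```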
Via Apify API:
```bash
curl -X POST "https://api.apify.com/v2/acts/cryptosignals~linkedin-company-scraper/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "companies": ["stripe", "shopify"],
    "max_results": 10
  }'
```
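The same call from Python, using only the standard library. This sketch uses Apify's `run-sync-get-dataset-items` endpoint, which starts the run and returns the dataset items directly once it finishes (fine for small lists; for large runs, start an async run and poll instead):

```python
import json
import os
import urllib.request

ACTOR = "cryptosignals~linkedin-company-scraper"

def build_request(companies, max_results, token):
    """Assemble a POST request for a synchronous actor run that
    returns dataset items in the response body."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR}"
        f"/run-sync-get-dataset-items?token={token}"
    )
    body = json.dumps({"companies": companies, "max_results": max_results}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    req = build_request(["stripe", "shopify"], 10, os.environ["APIFY_TOKEN"])
    with urllib.request.urlopen(req, timeout=300) as resp:
        records = json.load(resp)  # one dict per company
    for r in records:
        print(r["name"], r.get("employee_count"))
```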
Sample output record:
```json
{
  "company_id": "stripe",
  "name": "Stripe",
  "tagline": "Financial infrastructure for the internet",
  "description": "Stripe is a financial infrastructure platform for businesses...",
  "industry": "Software Development",
  "employee_count": "5001-10000",
  "follower_count": "1240000",
  "headquarters": "San Francisco, California, US",
  "founded_year": 2010,
  "website": "https://stripe.com",
  "specialties": ["Payments", "Financial Infrastructure", "Developer Tools"],
  "logo_url": "https://media.licdn.com/dms/image/...",
  "company_url": "https://www.linkedin.com/company/stripe",
  "scraped_at": "2026-05-04T09:00:00+00:00"
}
```
Pricing
The actor uses pay-per-event pricing: $0.008 per company starting May 17, 2026. The first 5 results are free so you can verify output quality before committing. For a list of 1,000 companies, that's $8.
For high-volume runs (10,000+ companies), residential proxy coverage becomes important for reliability. Oxylabs is the proxy infrastructure we've tested and trust for this kind of workload — their residential network handles LinkedIn's IP reputation checks without constant rotation failures that plague datacenter proxies.
What you don't get
Company pages don't include employee email addresses, phone numbers, or individual employee profiles. For contact-level data, you need a separate enrichment step. The actor extracts company-level public metadata — the data visible to any unauthenticated visitor on a public company page.
LinkedIn also rate-limits aggressively on certain pages. The actor handles this, but very large runs (5,000+ companies) benefit from Bright Data's residential network as a proxy layer to maintain throughput.
The alternative
You can build this yourself. The engineering work involves: handling LinkedIn's anti-bot detection, managing proxy rotation, parsing the structured data out of the page (LinkedIn embeds JSON-LD in company pages), dealing with partial responses and retry logic, and maintaining the scraper when LinkedIn changes its page structure — which happens several times per year.
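As a taste of just one of those steps, here is a deliberately naive sketch of pulling JSON-LD out of fetched HTML. A production scraper would use a real HTML parser, handle multiple script tags, and survive markup changes; the regex and sample markup here are illustrative only:

```python
import json
import re

# Public pages can embed Organization metadata in an
# application/ld+json script tag; this extracts the first one found.
JSONLD_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
)

def extract_jsonld(html: str) -> dict:
    """Return the first embedded JSON-LD object, or {} if none found."""
    match = JSONLD_RE.search(html)
    return json.loads(match.group(1)) if match else {}

sample = (
    '<html><script type="application/ld+json">'
    '{"@type": "Organization", "name": "Stripe"}'
    '</script></html>'
)
data = extract_jsonld(sample)  # {"@type": "Organization", "name": "Stripe"}
```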
That's 2-4 weeks of engineering time to build, and ongoing maintenance after that. At $0.008 per company, you'd need to scrape over 1 million companies before the build-vs-buy math favors building.
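The break-even point is simple arithmetic. Only the per-company price below comes from this post; the engineering-cost figures are illustrative assumptions:

```python
PRICE_PER_COMPANY = 0.008      # USD, the actor's pay-per-event rate
HOURLY_RATE = 100              # assumed fully loaded engineer cost
BUILD_HOURS = 3 * 40           # assumed ~3 weeks to build in-house

build_cost = HOURLY_RATE * BUILD_HOURS       # 12,000 USD
breakeven = build_cost / PRICE_PER_COMPANY   # 1,500,000 companies
# Ongoing maintenance is excluded, which pushes break-even even higher.
```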
For most teams, the answer is clear.
Actor: apify.com/cryptosignals/linkedin-company-scraper
By: Web Data Labs — data infrastructure for B2B teams.