I Tried Scraping LinkedIn Posts with Puppeteer and It Fought Back
I thought this would take 20 minutes.
Open Puppeteer.
Grab some text.
Save it to JSON. Done.
Instead, LinkedIn reminded me that modern web apps are not just pages.
They are systems.
They react.
They delay.
They break your assumptions.
And suddenly, a simple script turns into a late night debugging session.
This is what actually worked.
Table of Contents

- Setting up Puppeteer
- Launching a real browser
- Logging into LinkedIn
- Handling the security check
- Providing the post links
- Visiting each post
- Scrolling to load content
- Waiting for images
- Extracting content and images
- Storing the data
- Saving it to a file
- Final thoughts
- Full working script
Setting up Puppeteer
We start simple.
```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');
```
- `puppeteer` controls the browser
- `fs` stores the data
Clean. Minimal. Enough to begin.
Then we wrap everything so we can use async and await properly.
```javascript
(async () => {
  // ...everything below runs inside this async IIFE
})();
```
Launching a real browser
```javascript
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
```
I kept headless mode off on purpose, because LinkedIn is very sensitive to automation.
Headless browsers get flagged faster.
A visible browser behaves more like a real user.
And when you are debugging, visibility is everything.
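If you want to lean into the "look like a real user" idea, a few more launch options help. These are all real Puppeteer options, but the specific values below are my own guesses worth tuning, not documented LinkedIn thresholds:

```javascript
// Sketch of less-detectable launch options (values are assumptions,
// not LinkedIn-specific magic numbers).
const launchOptions = {
  headless: false,         // visible window: easier to debug, flagged less often
  defaultViewport: null,   // use the real window size instead of Puppeteer's default
  slowMo: 50,              // delay every action by 50ms, closer to human pacing
  args: ['--start-maximized']
};
// Usage: const browser = await puppeteer.launch(launchOptions);
```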
Logging into LinkedIn
```javascript
await page.goto('https://www.linkedin.com/login', {
  waitUntil: 'networkidle2',
  timeout: 60000
});

await page.type('#username', 'YOUR_EMAIL');
await page.type('#password', 'YOUR_PASSWORD');

// Start waiting for navigation before clicking, so we can't miss it.
await Promise.all([
  page.waitForNavigation({ timeout: 60000 }),
  page.click('button[type="submit"]')
]);
```
At this point, everything felt done.
I thought I was finished.
I was not.
LinkedIn often throws:
- security checks
- captchas
- verification screens
So even after login, we pause.
Handling the security check
```javascript
console.log('Please complete the security check...');
await new Promise(resolve => setTimeout(resolve, 12000));
```
No hacks. No bypass.
Just wait.
Sometimes the most reliable solution is the least clever one.
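That said, a fixed 12-second sleep is fragile: sometimes the check takes longer, sometimes there is no check at all. One hedged alternative is a tiny polling helper: pass it any condition you like (for example, that `page.url()` now contains `/feed`) and it waits until the condition holds or a deadline passes. This helper is my own sketch, not part of Puppeteer's API:

```javascript
// Poll an async predicate until it returns true or the timeout expires.
// Returns true on success, false on timeout (caller decides what to do).
async function waitUntil(predicate, { intervalMs = 500, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return true;
    await new Promise(r => setTimeout(r, intervalMs));
  }
  return false; // timed out
}
// Usage: await waitUntil(async () => page.url().includes('/feed'));
```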
Providing the post links
```javascript
const postLinks = [ /* ... */ ];
```
Instead of crawling everything, I used a fixed list of URLs.
Why?
Predictable input gives predictable output.
LinkedIn's infinite scroll can get messy very quickly.
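Since the whole run depends on that list, a small guard can catch typos before the browser ever opens. This check is a sketch based on the usual `/posts/` URL shape, which LinkedIn does not formally guarantee:

```javascript
// Hypothetical sanity check for the input list: keep only URLs that
// look like LinkedIn post links (linkedin.com host + /posts/ path).
function isLinkedInPostUrl(url) {
  try {
    const u = new URL(url);
    return u.hostname.endsWith('linkedin.com') && u.pathname.includes('/posts/');
  } catch {
    return false; // not even a valid URL
  }
}
// Usage: const validLinks = postLinks.filter(isLinkedInPostUrl);
```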
Visiting each post
```javascript
for (const link of postLinks) {
  await page.goto(link, {
    waitUntil: 'domcontentloaded',
    timeout: 60000
  });
```
Notice I used domcontentloaded.
LinkedIn keeps making background requests forever, so waiting for network idle can hang your script.
This small change saved a lot of frustration.
Scrolling to load content
This is where things started breaking.
```javascript
await page.evaluate(async () => {
  // Scroll in steps until the page height stops growing,
  // then resolve so the outer script can continue.
  await new Promise(resolve => {
    let lastHeight = 0;
    const scrollInterval = setInterval(() => {
      window.scrollBy(0, 1000);
      const newHeight = document.body.scrollHeight;
      if (newHeight === lastHeight) {
        clearInterval(scrollInterval);
        resolve();
      }
      lastHeight = newHeight;
    }, 300);
  });
});
```
LinkedIn loads content lazily.
No scroll.
No content.
So we simulate a real user:
Scroll
Pause
Repeat
Until nothing new loads.
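The stop condition boils down to: keep going while the page height grows, and stop on the first tick where it doesn't. Extracted as a pure function (my own illustration, not code from the script), it looks like this:

```javascript
// Given the page height observed after each scroll tick, return how
// many ticks the loop runs before the height stops growing.
function scrollSteps(heights) {
  let last = 0;
  let steps = 0;
  for (const h of heights) {
    steps++;
    if (h === last) break; // nothing new loaded: stop
    last = h;
  }
  return steps;
}
```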
Waiting for images
```javascript
// Some posts have no images; don't let a timeout kill the whole run.
await page.waitForSelector('img[src*="media.licdn.com"]', {
  timeout: 15000
}).catch(() => {});
```
Without this step, the images may not be in the DOM yet and your data will be incomplete.
Note that waitForSelector waits for the element to appear, not for the image file to finish downloading, but since we only grab src URLs, that is enough.
Extracting content and images
```javascript
const postData = await page.evaluate(() => {
  const contentElement =
    document.querySelector('[data-test-post-container] .break-words') ||
    document.querySelector('.feed-shared-update-v2__description') ||
    document.querySelector('.feed-shared-text__text-view') ||
    document.querySelector('span.break-words');

  const images = [];
  const imageElements = document.querySelectorAll('img[src*="media.licdn.com/"]');
  if (imageElements.length > 1) {
    images.push(imageElements[1].src);
  }

  return {
    content: contentElement
      ? contentElement.innerText.trim()
      : 'No content found',
    images: images.length > 0 ? images : ['No post image found']
  };
});
```
This is real world scraping.
There is no single selector that always works.
So we try multiple fallbacks until one matches.
Also:
- the first image is usually the profile picture
- the second image is usually the actual post
Not perfect.
But practical.
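That "skip the first image" heuristic is worth isolating into one small function, which also makes the fallback explicit. This helper is my own sketch of the logic above, not something LinkedIn's markup guarantees:

```javascript
// Given all media.licdn.com image URLs found on a post page, drop the
// first one (assumed to be the author's avatar) and keep the rest.
function pickPostImages(srcs) {
  const postImages = srcs.slice(1); // index 0 is assumed to be the profile photo
  return postImages.length > 0 ? postImages : ['No post image found'];
}
```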
Storing the data
```javascript
scrapedPosts.push({
  link,
  content: postData.content,
  images: postData.images
});
```
We collect everything step by step.
Simple. Clean. Reliable.
Saving it to a file
```javascript
fs.writeFileSync(
  'linkedInScrapedPosts.json',
  JSON.stringify(scrapedPosts, null, 2)
);
```
Now everything is structured and ready to use.
Final thoughts
This script is not perfect.
But it is real.
It reflects how scraping actually works:
You try something
It breaks
You tweak
You retry
And slowly, it starts working.
The real lesson?
Web pages are not static.
They are systems reacting to user behavior.
Once you understand that, everything changes.
A small note
Always respect platform rules and use scraping responsibly.
Full working script
If you just want it to work, copy this and run it.
```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://www.linkedin.com/login', {
    waitUntil: 'networkidle2',
    timeout: 60000
  });

  await page.type('#username', 'YOUR_EMAIL');
  await page.type('#password', 'YOUR_PASSWORD');

  // Start waiting for navigation before clicking, so we can't miss it.
  await Promise.all([
    page.waitForNavigation({ timeout: 60000 }),
    page.click('button[type="submit"]')
  ]);

  console.log('Complete the security check if prompted...');
  await new Promise(resolve => setTimeout(resolve, 12000));

  const postLinks = [
    // add your links here
  ];

  const scrapedPosts = [];

  for (const link of postLinks) {
    await page.goto(link, {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    // Scroll until the page height stops growing.
    await page.evaluate(async () => {
      await new Promise(resolve => {
        let lastHeight = 0;
        const scrollInterval = setInterval(() => {
          window.scrollBy(0, 1000);
          const newHeight = document.body.scrollHeight;
          if (newHeight === lastHeight) {
            clearInterval(scrollInterval);
            resolve();
          }
          lastHeight = newHeight;
        }, 300);
      });
    });

    // Some posts have no images; don't let a timeout abort the run.
    await page.waitForSelector('img[src*="media.licdn.com"]', {
      timeout: 15000
    }).catch(() => {});

    const postData = await page.evaluate(() => {
      const contentElement =
        document.querySelector('[data-test-post-container] .break-words') ||
        document.querySelector('.feed-shared-update-v2__description') ||
        document.querySelector('.feed-shared-text__text-view') ||
        document.querySelector('span.break-words');

      const images = [];
      const imageElements = document.querySelectorAll('img[src*="media.licdn.com/"]');
      if (imageElements.length > 1) {
        images.push(imageElements[1].src);
      }

      return {
        content: contentElement
          ? contentElement.innerText.trim()
          : 'No content found',
        images: images.length > 0 ? images : ['No post image found']
      };
    });

    scrapedPosts.push({
      link,
      content: postData.content,
      images: postData.images
    });
  }

  fs.writeFileSync(
    'linkedInScrapedPosts.json',
    JSON.stringify(scrapedPosts, null, 2)
  );

  console.log(scrapedPosts);
  await browser.close();
})();
```