DEV Community

Web Scraping — Scrape data from your instagram page with Nodejs, Playwright and Firebase.

Divine Hycenth on June 24, 2020

An introduction to web scraping with playwright, nodejs and firebase. Prerequisites If you want to follow along this tutorial...

Aleccc • Jul 1 '20 • Edited

Worth noting that the documentation has alternatives to hard-coding the wait times (page.waitForTimeout). Some commands like fill and click have auto waits built-in. Or you can explicitly wait for an object to appear in the DOM.

// Playwright waits for #search element to be in the DOM
await page.fill('#search', 'query');

// Wait for #search to appear in the DOM.
await page.waitForSelector('#search', { state: 'attached' });

https://playwright.dev/path=docs%2Fcore-concepts.md&q=auto-waiting#version=master
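
As a rough sketch of how the two ideas combine, the helper below takes an existing Playwright page and runs a search without any fixed sleep. The `#search` selector comes from the snippet above; the `#results` selector and the `searchFor` name are assumptions made up for illustration, not part of the article.

```javascript
// Hypothetical helper: `page` is a Playwright Page created elsewhere,
// e.g. with `const page = await browser.newPage()`.
async function searchFor(page, query) {
  // fill() auto-waits until #search is ready for input.
  await page.fill('#search', query);

  // press() also auto-waits for the element before sending the key.
  await page.press('#search', 'Enter');

  // Explicitly wait for the (assumed) results container instead of
  // sleeping for a fixed time with page.waitForTimeout().
  await page.waitForSelector('#results', { state: 'visible' });

  // Read the text once the element is known to be visible.
  return page.textContent('#results');
}
```

Because the page is passed in, the same function can be reused across browsers or exercised with a stub object in tests.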

Divine Hycenth • Mar 5 '21

Hi Aleccc,

I've updated the article to use this approach as recommended in the docs. Thank you for pointing that out :)

Johnny Dev • Oct 8 '20

Hi Divine,
Just note that this approach currently only works on localhost with firebase serve; it fails when you deploy it to the cloud. In my observation, Firebase can't figure out where the browser binaries used for scraping are stored, and therefore can't initialize the browsers. I am still looking for a way to change this behaviour. Do you have any ideas?

Divine Hycenth • Mar 5 '21

Hi Johnny,

I apologize for my late response.
What you said is true, and I haven't figured out a way to make it run on Firebase in the cloud. I would be glad to know if you've figured that out :).

Thank you for your patience.

amm297 • Jul 19 '22

Hi, did any of you find a solution for this bug?

restyler • Nov 9 '21

Nice writeup! If someone decides to run this script in a datacenter, I would definitely recommend using clean (preferably residential) proxies to avoid getting your accounts flagged, and saving your cookies to re-use them later (this was actually mentioned here in the comments).
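
For anyone wondering what the proxy part could look like, here is a minimal sketch that builds Playwright launch options from environment variables. The variable names PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD and the `buildLaunchOptions` helper are hypothetical; only the `proxy` launch option itself comes from Playwright.

```javascript
// Build launch options, adding a proxy only when one is configured.
// The environment variable names are assumptions for this sketch.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true };

  if (env.PROXY_SERVER) {
    // e.g. PROXY_SERVER=http://proxy.example.com:3128
    options.proxy = { server: env.PROXY_SERVER };

    // Credentials are optional; add them only when a username is set.
    if (env.PROXY_USERNAME) {
      options.proxy.username = env.PROXY_USERNAME;
      options.proxy.password = env.PROXY_PASSWORD || '';
    }
  }

  return options;
}

// Usage (requires the `playwright` package):
// const { chromium } = require('playwright');
// const browser = await chromium.launch(buildLaunchOptions());
```

Keeping the proxy details in the environment means the credentials stay out of the source code.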

I've recently published a simple tutorial on Instagram scraping and discovering micro-influencers via Node.js and MySQL.

How to scrape Instagram followers with Node.js, put results to MySQL, and discover micro-influencers

restyler ・ Oct 3 ・ 9 min read

#javascript #scraping #productivity

Good luck!

andyajhis • Jul 10 '20

Do you know how to save the login session after the browser closes, so that I can scrape again and again?

Cicada1033➿ • Jul 8 '21

Go to your project directory.
Using the terminal, run the command below:
npx playwright open --save-storage websitename.json

A browser will open. Navigate to the website and sign in (solving the captcha if there is one),
then close the browser. You will notice that a file "websitename.json" has been created.

Now, in Playwright, set your browser context using the code below:

const context = await browser.newContext({
  storageState: "websitename.json"
});

You are now automatically logged in. :)
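
The same idea can also be sketched in code, so that a script reuses the saved session when the file exists and writes it back after each run. The helper names `openContext` and `saveSession` are made up for this example, and it assumes the site still accepts the saved cookies (sessions can expire).

```javascript
const fs = require('fs');

// Hypothetical file name, matching the CLI example above.
const STATE_FILE = 'websitename.json';

// Create a context, reusing the saved session only if the file exists.
// `browser` is a Playwright Browser launched elsewhere.
async function openContext(browser, statePath = STATE_FILE) {
  const options = fs.existsSync(statePath) ? { storageState: statePath } : {};
  return browser.newContext(options);
}

// Write the context's current cookies and localStorage back to disk,
// so the next run starts from the same logged-in state.
async function saveSession(context, statePath = STATE_FILE) {
  await context.storageState({ path: statePath });
}
```

A typical run would call `openContext(browser)` at the start, do the scraping, then call `saveSession(context)` before closing the browser.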

Divine Hycenth • Jul 10 '20

I haven't tried that. It may be possible, but I doubt it will work if you spin up a fresh browser with Playwright on every run.
Let me know if you are able to do that :)