DEV Community

Optimizing and Deploying Puppeteer Web Scraper

Waqasabi on February 08, 2020

In this post, we'll look into how we can optimize and improve our Puppeteer web scraping API. We'll also look into several Puppeteer plugins to imp...
 
Aman Panjwani (apanjwani0)

Hi, I am building an Instagram scraping tool.
Instagram's divs weren't loading in headless: true mode, so I switched to puppeteer-extra and added the stealth plugin. After that everything worked fine on localhost, thanks to you.

Unfortunately, when deployed to Heroku, the divs stop loading again; even page.waitForSelector fails with a timeout error.

PS:
1) I've added args: ['--no-sandbox'].
2) I've also added the github.com/jontewks/puppeteer-hero... buildpack in my Heroku app settings.

Link to my project: github.com/apanjwani0/Scrape-Insta...

Thanks in advance!
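A minimal sketch of the setup described in this comment, assuming the packages puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth are installed. It registers the stealth plugin once and passes --no-sandbox at launch. The helper names buildLaunchOptions, getStealthPuppeteer and launchStealthBrowser, and the extra --disable-setuid-sandbox flag, are illustrative assumptions rather than code from the linked project, and the sketch does not by itself explain the Heroku timeout.

```javascript
// Sketch only: assumes "puppeteer", "puppeteer-extra" and
// "puppeteer-extra-plugin-stealth" are installed via npm.
// Helper names here are hypothetical, not from the original project.

// Builds the options object passed to puppeteer.launch(). Kept pure so it
// can be inspected or tested without starting Chrome.
function buildLaunchOptions({ headless = true, extraArgs = [] } = {}) {
  return {
    headless,
    // "--no-sandbox" is the flag from the comment above; the setuid variant
    // is an extra, commonly paired flag (an assumption, not required).
    args: ["--no-sandbox", "--disable-setuid-sandbox", ...extraArgs],
  };
}

// Cached puppeteer-extra instance so the stealth plugin is registered once.
let stealthPuppeteer = null;

function getStealthPuppeteer() {
  if (stealthPuppeteer === null) {
    // Required lazily: buildLaunchOptions() works even without the packages.
    const puppeteer = require("puppeteer-extra");
    const StealthPlugin = require("puppeteer-extra-plugin-stealth");
    puppeteer.use(StealthPlugin());
    stealthPuppeteer = puppeteer;
  }
  return stealthPuppeteer;
}

// Launches Chrome through puppeteer-extra with the stealth plugin active.
async function launchStealthBrowser(options = buildLaunchOptions()) {
  return getStealthPuppeteer().launch(options);
}
```

Inside an async function, const browser = await launchStealthBrowser(); followed by browser.newPage() then works as with plain Puppeteer. Keeping the options builder separate makes it easy to compare what is actually passed on localhost versus on Heroku.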

Olcay Gören (olcay97)

Did you find a solution for that?

Aman Panjwani (apanjwani0)

No. I thought dockerizing the project might solve the issue (that way it could also run with headless: false), but I never continued with the project.
Do let me know if it works for you or if you find another solution.

Alireza (alireza4130)

await page.goto(BASE_URL, { waitUntil: "networkidle0" })

waitUntil: "networkidle0" is necessary for this issue. Also set headless to "new".
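A minimal sketch of this suggestion, assuming page is an already-open Puppeteer Page. The helper name loadAndWait, its selector argument and the 60-second default timeout are hypothetical choices for illustration; the headless setting mentioned in the reply belongs in the puppeteer.launch() options and is not shown here.

```javascript
// Sketch only: "page" is assumed to be an already-open Puppeteer Page.
// loadAndWait, its selector argument and the 60 s default are hypothetical.
async function loadAndWait(page, url, selector, timeoutMs = 60000) {
  // "networkidle0" treats navigation as finished once there have been no
  // network connections for at least 500 ms, giving late content time to load.
  await page.goto(url, { waitUntil: "networkidle0", timeout: timeoutMs });
  // Then wait explicitly for the element the scraper depends on; this
  // resolves to its element handle or rejects with a timeout error.
  return page.waitForSelector(selector, { timeout: timeoutMs });
}
```

Passing the same explicit timeout to both calls keeps a slow remote host from failing on Puppeteer's default timeout before the page has finished loading.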

Margarita (riittagirl)

Great article!
Have you run into Heroku's IP being blocked by the website being scraped? If so, how did you get around it? Example: stackoverflow.com/questions/143289...