DEV Community

kathir b


Mastering Web Scraping with Puppeteer


Introduction

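Web scraping is a powerful technique for extracting data from websites. Puppeteer, a Node.js library, provides a high-level API for controlling a real browser (Chrome by default), so you can automate browser tasks such as loading pages, clicking elements, and reading content that is rendered by JavaScript.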

Step 1: Install Puppeteer

Run the following command in your project directory. Besides the library itself, the default install downloads a compatible browser for Puppeteer to control:

npm install puppeteer

Step 2: Create a Scraper Script

Use Puppeteer to navigate a webpage and extract data:

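const puppeteer = require('puppeteer');

(async () => {
    // Launch a browser without a visible window.
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        await page.goto('https://example.com');

        // This callback runs inside the page, so it can use the DOM directly.
        // Optional chaining yields null instead of throwing if there is no <h1>.
        const data = await page.evaluate(() => {
            return document.querySelector('h1')?.innerText ?? null;
        });

        console.log("Extracted Data:", data);
    } finally {
        // Always close the browser, even if navigation or extraction fails.
        await browser.close();
    }
})();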
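Step 2 reads a single element. To collect many elements at once, page.$$eval runs a callback in the page over every element that matches a selector. The sketch below assumes you want every link on a page: it gathers each link's text and raw href, then cleans the list in Node. cleanLinks and scrapeLinks are hypothetical helper names, and Puppeteer is required inside scrapeLinks only so that cleanLinks can be reused without launching a browser.

```javascript
// Sketch: collect all links with page.$$eval, then clean them in Node.
// cleanLinks and scrapeLinks are hypothetical helpers, not part of Puppeteer.

// Trim link text, resolve relative hrefs against baseUrl, keep only
// http(s) links, and drop duplicate URLs (first occurrence wins).
function cleanLinks(rawLinks, baseUrl) {
    const seen = new Set();
    const result = [];
    for (const { text, href } of rawLinks) {
        if (!href) continue; // anchor without an href attribute
        let url;
        try {
            url = new URL(href, baseUrl); // "/about" -> "https://example.com/about"
        } catch {
            continue; // href that cannot be parsed as a URL
        }
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // mailto:, javascript:, ...
        if (seen.has(url.href)) continue;
        seen.add(url.href);
        result.push({ text: (text || '').trim(), href: url.href });
    }
    return result;
}

async function scrapeLinks(targetUrl) {
    // Required lazily so cleanLinks works even where Puppeteer is not installed.
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        await page.goto(targetUrl);
        // $$eval passes an array of all matching elements to the callback,
        // which runs in the page; only serializable data comes back to Node.
        const rawLinks = await page.$$eval('a', (anchors) =>
            anchors.map((a) => ({ text: a.textContent, href: a.getAttribute('href') }))
        );
        return cleanLinks(rawLinks, page.url());
    } finally {
        await browser.close();
    }
}
```

For example, scrapeLinks('https://example.com').then(console.log) would print the cleaned list of links found on that page. Keeping the cleanup in a plain function means it can be checked against sample data without a browser.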

Step 3: Handling Dynamic Content

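If the content is rendered by JavaScript after the initial page load, wait for it to appear before reading it. page.waitForSelector resolves once a matching element exists in the DOM and rejects with a timeout error (after 30 seconds by default) if it never appears:

await page.waitForSelector('.dynamic-content');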
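A bare waitForSelector call aborts the script if the element never shows up. The following is a minimal sketch of a more forgiving wait, assuming you would rather get null than an exception when the content is missing. readWhenReady is a hypothetical helper that takes an already-open Puppeteer Page, and it assumes the timeout rejection has the error name 'TimeoutError', as Puppeteer's TimeoutError class does.

```javascript
// Sketch: wait for dynamically rendered content, returning null on timeout.
// readWhenReady is a hypothetical helper; `page` is a Puppeteer Page.
async function readWhenReady(page, selector, timeoutMs = 5000) {
    try {
        // Resolves once a matching element is in the DOM and visible;
        // rejects if that does not happen within timeoutMs.
        await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
    } catch (err) {
        // Assumption: Puppeteer's timeout rejection is named 'TimeoutError'.
        if (err && err.name === 'TimeoutError') return null;
        throw err; // any other failure (e.g. a closed page) is re-thrown
    }
    // Read the element's text inside the page context.
    return page.$eval(selector, (el) => el.textContent.trim());
}
```

In a script you would call it after navigation, for example await readWhenReady(page, '.dynamic-content'). Passing { waitUntil: 'networkidle2' } to page.goto additionally waits until the page has at most two open network connections for 500 ms, which can help when data arrives through several requests.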

Step 4: Optimizing Scraping

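  • Use page.setUserAgent to present the user-agent string of a regular desktop browser.
  • Reduce the chance of being blocked by rotating request headers (for example with page.setExtraHTTPHeaders) and proxies (passed to Chrome through the --proxy-server launch argument).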
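The two tips above can be combined into a per-session setup. This sketch assumes you run several scraping sessions numbered from 0 and cycle through fixed lists in round-robin order; the user-agent strings, Accept-Language values, and proxy URLs are hypothetical placeholders to replace with your own.

```javascript
// Sketch: rotate user agents, headers, and proxies across scraping sessions.
// All list entries below are hypothetical placeholders.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];
const LANGUAGES = ['en-US,en;q=0.9', 'en-GB,en;q=0.8'];
const PROXIES = ['http://proxy-a.example:8080', 'http://proxy-b.example:8080'];

// Round-robin: session n uses entry n modulo the list length.
function pick(list, sessionIndex) {
    return list[sessionIndex % list.length];
}

// Options for puppeteer.launch(); Chrome reads the proxy from --proxy-server.
function launchOptions(sessionIndex) {
    return {
        headless: true,
        args: [`--proxy-server=${pick(PROXIES, sessionIndex)}`],
    };
}

// Apply the session's user agent and extra headers to a Puppeteer Page.
async function configurePage(page, sessionIndex) {
    await page.setUserAgent(pick(USER_AGENTS, sessionIndex));
    await page.setExtraHTTPHeaders({ 'Accept-Language': pick(LANGUAGES, sessionIndex) });
}
```

Each session would call puppeteer.launch(launchOptions(i)), open a page, and run configurePage(page, i) before page.goto. Because the proxy is a browser-level launch argument, switching proxies means launching a new browser, while the user agent and headers can be changed per page.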

Conclusion

Puppeteer is a powerful tool for web scraping, automation, and testing. Experiment with different techniques and optimize based on your needs.
