DEV Community

Asfia Aiman

Step-by-Step Guide to Scraping JavaScript-Rich Websites in Laravel with PuPHPeteer

Scraping JavaScript-heavy websites is hard with a plain HTTP client because much of the content is rendered in the browser after the initial HTML arrives. PuPHPeteer, a PHP bridge for Puppeteer, solves this by driving a real headless browser from PHP. This tutorial walks through setting up a web scraper in Laravel using PuPHPeteer.

Prerequisites

Ensure you have the following installed:

  1. PHP 8.0.2+ (the minimum that Laravel 9 supports)
  2. Node.js
  3. Composer
  4. Laravel 9+

Step 1: Set Up Laravel Project

First, create a new Laravel project or navigate to your existing project directory:

laravel new puphpeteer-scraper
cd puphpeteer-scraper

Step 2: Install PuPHPeteer

Install the PuPHPeteer PHP package via Composer and its Node.js counterpart via npm (the commands below use the zoonru fork):

composer require zoonru/puphpeteer
npm install github:zoonru/puphpeteer

Step 3: Create a Scraper Command

Laravel Artisan commands are perfect for creating scrapers. Generate a new command:

php artisan make:command ScrapeWebsite

Open the newly created command file at app/Console/Commands/ScrapeWebsite.php and update it:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

class ScrapeWebsite extends Command
{
    protected $signature = 'scrape:website';
    protected $description = 'Scrape data from a JavaScript-heavy website';

    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
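        // Start the Node.js bridge and launch a browser.
        $puppeteer = new Puppeteer;
        $browser = $puppeteer->launch();
        $page = $browser->newPage();

        // Load the page and wait until the network has gone idle.
        $page->goto('https://example.com', ['waitUntil' => 'networkidle0']);

        // Wait for the JavaScript-rendered element to appear in the DOM.
        $page->waitForSelector('#element-id');

        // Run JavaScript in the page to collect the text of each matching element.
        $data = $page->evaluate(JsFunction::createWithBody("
            const elements = document.querySelectorAll('.data-class');
            return Array.from(elements).map(element => element.innerText);
        "));

        print_r($data);

        // Close the browser so no browser process is left running.
        $browser->close();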
    }
}

Explanation

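Command Setup: The $signature and $description properties define how the command is invoked and described. The __construct() method only calls the parent constructor, so it can be omitted. The handle() method contains the scraping logic.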

Launching Puppeteer: Puppeteer is instantiated, and a browser instance is launched.

Navigating to the Website: The goto method loads the specified URL. With waitUntil set to networkidle0, it returns once there have been no network connections for at least 500 ms.

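Waiting for Elements: waitForSelector pauses until an element matching the selector appears in the DOM, which signals that the JavaScript-generated content has rendered. By default, Puppeteer gives up after 30 seconds.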

Extracting Data: evaluate executes JavaScript in the browser context and returns the result to PHP. Here it collects the innerText of every element with the data-class class.
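
As a rough sketch of what can happen once evaluate hands its result back to PHP, the helper below tidies the extracted strings before they are used. The function names cleanScrapedText and extractTexts are made up for this illustration, the .data-class selector is the placeholder from the command above, and the code assumes the bridge returns the JavaScript array as a PHP array.

```php
<?php

use Nesk\Rialto\Data\JsFunction;

/**
 * Trim whitespace from scraped text values and drop the empty ones.
 */
function cleanScrapedText(array $values): array
{
    $trimmed = array_map(fn ($value) => trim((string) $value), $values);

    // array_values() re-indexes the list after empty strings are filtered out.
    return array_values(array_filter($trimmed, fn ($value) => $value !== ''));
}

/**
 * Hypothetical wrapper: $page is the object returned by $browser->newPage().
 */
function extractTexts($page): array
{
    $raw = $page->evaluate(JsFunction::createWithBody("
        return Array.from(document.querySelectorAll('.data-class'))
            .map(element => element.innerText);
    "));

    // Assumption: the JavaScript array arrives as a PHP array; the cast
    // keeps the helper safe if nothing was returned.
    return cleanScrapedText((array) $raw);
}
```

Keeping the cleanup in PHP rather than in the JavaScript body means it can be checked without launching a browser.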

Closing the Browser: The close method shuts down the browser instance and frees its resources.

Step 4: Run the Scraper Command

Run the scraper command using Artisan:

php artisan scrape:website

This command opens the specified website in a browser, waits for the JavaScript-rendered content to appear, extracts the data, and prints it.

Additional Tips

Error Handling: Add error handling to manage navigation failures or element selection issues.
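
One possible shape for this, as a minimal sketch rather than the definitive approach: a small helper that always closes the browser, even when navigation or a selector wait throws. The names withBrowser and scrapeExample are hypothetical, the URL and selectors are the placeholders used earlier, and the 10-second timeout is an arbitrary value chosen for illustration.

```php
<?php

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

/**
 * Hypothetical helper: run $work with a launched browser and always close it.
 * $launch is any callable returning an object with a close() method.
 */
function withBrowser(callable $launch, callable $work)
{
    $browser = $launch();

    try {
        return $work($browser);
    } finally {
        try {
            $browser->close();
        } catch (\Throwable $e) {
            // Assumption: closing can itself fail if the Node process has
            // already stopped; ignore that so the original error surfaces.
        }
    }
}

/**
 * Example use with the placeholders from the tutorial command.
 */
function scrapeExample(): array
{
    return withBrowser(
        fn () => (new Puppeteer)->launch(),
        function ($browser) {
            $page = $browser->newPage();
            $page->goto('https://example.com', ['waitUntil' => 'networkidle0']);

            // Fail after 10 seconds instead of Puppeteer's default 30.
            $page->waitForSelector('#element-id', ['timeout' => 10000]);

            return (array) $page->evaluate(JsFunction::createWithBody("
                return Array.from(document.querySelectorAll('.data-class'))
                    .map(element => element.innerText);
            "));
        }
    );
}
```

Inside handle(), a call such as scrapeExample() could be wrapped in try/catch (\Throwable $e), reporting the message with $this->error() and returning Command::FAILURE so the failure is visible to whatever runs the command.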

Dynamic Interaction: You can add more interaction with the page, like clicking buttons or filling forms, before extracting data.
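
For instance, a search form could be filled in and submitted before the extraction step. The sketch below is only an illustration: submitSearch is a made-up function name, all three selectors are placeholders to replace with ones from the target page, and it assumes the results render in place without a full page navigation.

```php
<?php

/**
 * Hypothetical interaction: type a query into a search box, submit it,
 * and wait for the results to render.
 *
 * $page is the object returned by $browser->newPage().
 */
function submitSearch($page, string $query): void
{
    // Type the query into the input field.
    $page->type('#search-input', $query);

    // Click the submit button.
    $page->click('#search-button');

    // Assumption: results are rendered in place by JavaScript, so waiting
    // for the first result element is enough.
    $page->waitForSelector('.search-result');
}
```

After submitSearch($page, 'laravel') returns, the same evaluate call used in the command can read the rendered results.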

Conclusion

PuPHPeteer makes it straightforward to scrape JavaScript-heavy websites from PHP within Laravel. By following the steps above, you can set up a scraper that handles JavaScript-rendered content.

Happy scraping!

For more information, visit the PuPHPeteer GitHub page.
