DEV Community

Custodia-Admin

Posted on • Originally published at pagebolt.dev

Generate 1,000 Website Screenshots 10x Faster Than Puppeteer

You need to screenshot 1,000 websites. Quality assurance team. Website monitoring. Visual regression testing. Design snapshots for a marketplace.

So you write a Puppeteer script. Spin up a Node process. Iterate through your URL list. Capture. Save. Next URL.

If you're doing this on one machine, 1,000 screenshots take 15–30 minutes. You're limited by local Chromium resource contention.

There's a faster way. And it doesn't require managing browser infrastructure.


Why Puppeteer Gets Slow for Bulk Screenshots

Puppeteer is designed for reliability and DOM manipulation, not bulk-parallel execution. When you're screenshotting 1,000 URLs:

  1. Chromium overhead: One browser instance processes URLs sequentially. Initialization is amortized, but each navigation has latency (DNS, TLS, page load, render).
  2. Single-machine bottleneck: You can spawn multiple processes, but you quickly hit CPU/memory limits. Running 10 parallel Chromium instances on a standard server exhausts resources.
  3. Resource contention: Screenshot capture, file I/O, and network requests all compete for the same machine's hardware.
  4. Retry logic: Failed screenshots require re-running. Timeouts add up.

The math: 1,000 URLs × 3 seconds per screenshot (avg) = 50 minutes if serial. Even with 4 parallel processes, you're looking at 15–20 minutes and significant infrastructure.
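The back-of-envelope math above generalizes to any level of parallelism. A tiny helper (the function name is illustrative) makes the serial-vs-parallel comparison concrete:

```python
def estimate_minutes(n_urls: int, secs_per_shot: float, parallelism: int) -> float:
    """Rough wall-clock estimate for a bulk screenshot job:
    total work divided by how many captures run at once."""
    return (n_urls * secs_per_shot) / parallelism / 60

# 1,000 URLs at ~3 s each:
print(estimate_minutes(1000, 3, 1))   # serial: 50.0 minutes
print(estimate_minutes(1000, 3, 4))   # 4 local browsers: 12.5 minutes
print(estimate_minutes(1000, 3, 50))  # ~50 cloud workers: 1.0 minute
```

The estimate ignores startup and file I/O, which is why the real-world numbers later in this post are slightly higher.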


The PageBolt Approach: Cloud-Hosted, Parallel, Simple

PageBolt's screenshot API works differently:

  1. Cloud execution: No local Chromium. Just send URLs to the API.
  2. Automatic parallelization: Cloud infrastructure handles 100+ concurrent requests.
  3. No infrastructure management: No Chromium versions to manage, no memory limits to hit.
  4. Simple SDK: One-liner per screenshot. No browser lifecycle management.

Result: 1,000 screenshots in 2–3 minutes (depending on page load times), not 15–30 minutes.


Practical Example: Bulk Screenshots with Puppeteer

Here's the browser-automation approach (slow, infrastructure-heavy). The snippet uses Playwright's Python API, which shares Puppeteer's architecture and has the same bottlenecks:

import asyncio
from playwright.async_api import async_playwright
import time

async def screenshot_bulk_puppeteer(urls):
    """
    Screenshot 1,000 URLs using Puppeteer/Playwright.
    Manage browser instances, handle retries, save files locally.
    """
    async with async_playwright() as p:
        # Spawn 4 browser instances for parallelization
        browsers = [await p.chromium.launch(headless=True) for _ in range(4)]

        # Cap concurrent open pages so Chromium doesn't exhaust memory
        semaphore = asyncio.Semaphore(8)
        tasks = []

        start = time.time()

        # Round-robin URLs across browsers so none are dropped
        for idx, url in enumerate(urls):
            browser = browsers[idx % len(browsers)]
            tasks.append(screenshot_url(browser, url, idx, semaphore))

        # Execute all tasks
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Close browsers
        for browser in browsers:
            await browser.close()

        elapsed = time.time() - start
        print(f"Completed {len(urls)} screenshots in {elapsed:.1f}s")
        return results

async def screenshot_url(browser, url, idx, semaphore):
    """Screenshot one URL with error handling."""
    async with semaphore:
        try:
            page = await browser.new_page()
            await page.set_viewport_size({"width": 1280, "height": 720})
            await page.goto(url, timeout=10000)

            filename = f"screenshot_{idx}.png"
            await page.screenshot(path=filename, full_page=False)

            await page.close()
            return {"url": url, "file": filename, "status": "ok"}
        except Exception as e:
            return {"url": url, "status": "error", "error": str(e)}

# Usage: Screenshot 1,000 URLs
if __name__ == "__main__":
    urls = [
        "https://example.com/page-1",
        "https://example.com/page-2",
        # ... 998 more URLs
    ]

    results = asyncio.run(screenshot_bulk_puppeteer(urls))

    failed = [r for r in results if r.get("status") == "error"]
    print(f"Completed: {len(results) - len(failed)} | Failed: {len(failed)}")

What you're managing:

  • Multiple browser instances (memory, CPU allocation)
  • Retry logic for timeouts and failures
  • File I/O (writing 1,000 PNGs to disk)
  • Process management and error handling

Timeline for 1,000 URLs:

  • Browser startup (4 instances): 3–5 seconds
  • Navigation + screenshot per URL: 3 seconds avg × 1,000 URLs = 3,000 seconds
  • Divided by 4 parallel processes: ~750 seconds (12–13 minutes)
  • Total: 12–15 minutes + file I/O overhead

The PageBolt Approach (Fast)

import urllib.request
import json
import time
from concurrent.futures import ThreadPoolExecutor

def screenshot_one(idx, url, api_key):
    """One API call per URL. That's it."""
    req = urllib.request.Request(
        "https://pagebolt.dev/api/v1/screenshot",
        data=json.dumps({
            "url": url,
            "width": 1280,
            "height": 720,
            "fullPage": False
        }).encode('utf-8'),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as response:
            filename = f"screenshot_{idx}.png"
            with open(filename, 'wb') as f:
                f.write(response.read())
        return {"url": url, "file": filename, "status": "ok"}
    except Exception as e:
        return {"url": url, "status": "error", "error": str(e)}

def screenshot_bulk_pagebolt(urls):
    """
    Screenshot 1,000 URLs using the PageBolt API.
    Simple, parallel, no infrastructure management.
    """
    pagebolt_api_key = "YOUR_API_KEY"  # Get free at pagebolt.dev/try

    start = time.time()
    print(f"Screenshotting {len(urls)} URLs...")

    # 50 concurrent requests -- the cloud side handles the browsers
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(
            lambda pair: screenshot_one(pair[0], pair[1], pagebolt_api_key),
            enumerate(urls)
        ))

    total_elapsed = time.time() - start
    print(f"\n✓ Completed {len(urls)} screenshots in {total_elapsed:.1f}s")

    return results

# Usage: Screenshot 1,000 URLs
if __name__ == "__main__":
    urls = [
        "https://example.com/page-1",
        "https://example.com/page-2",
        # ... 998 more URLs
    ]

    results = screenshot_bulk_pagebolt(urls)

    failed = [r for r in results if r.get("status") == "error"]
    print(f"Success: {len(results) - len(failed)} | Failed: {len(failed)}")

    # Process results
    for result in results:
        if result["status"] == "ok":
            print(f"{result['url']} -> {result['file']}")

What you're NOT managing:

  • Browser instances
  • Memory/CPU allocation
  • Chromium lifecycle, timeouts, and rendering retries — you just save the returned PNG (locally or to your storage)

Timeline for 1,000 URLs:

  • API calls execute in parallel (cloud infrastructure handles scaling)
  • Network latency dominates (average page load: 2–3 seconds per URL)
  • Cloud parallelization: ~50 concurrent requests
  • 1,000 URLs ÷ 50 parallel ≈ 20 API batches × 3 seconds/batch ≈ 60 seconds
  • Total: 1–2 minutes + minimal overhead

Performance Comparison

| Task | Puppeteer (4 parallel) | PageBolt (cloud parallel) |
| --- | --- | --- |
| Browser initialization | 3–5s | — (managed) |
| Screenshot capture (1,000 URLs) | ~750s (12–13 min) | ~60s (1–2 min) |
| File I/O + storage | ~30s | Minimal (save locally) |
| Error handling/retries | Manual | Handled by the API |
| Total | 12–15 min | 1–2 min |
| Speedup | — | 10–15x faster |

The massive difference comes from:

  • Parallelization: Puppeteer maxes out at 4–8 parallel processes. PageBolt runs 50–100 concurrent requests.
  • Infrastructure: PageBolt's cloud handles resource contention automatically.
  • Operational overhead: Puppeteer requires retry logic, error handling, and file management, all of which you implement and maintain yourself.

Real Use Cases

1. Website Monitoring — Screenshot competitor sites daily. Detect visual changes. PageBolt makes this trivial (1,000 competitors in 2 minutes).

2. QA Visual Regression — Test 500 page variants on every code change. Compare screenshots. Fast bulk capture is essential.

3. SEO Auditing — Snapshot 10,000 pages from your sitemap monthly. Detect broken layouts, missing images, rendering issues.
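Feeding an audit like this usually starts with the sitemap itself. A sketch of extracting the URL list with the standard library (the sample sitemap is illustrative):

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Pull every <loc> URL out of a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-1</loc></url>
  <url><loc>https://example.com/page-2</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
# ['https://example.com/page-1', 'https://example.com/page-2']
```

The returned list drops straight into either bulk-screenshot function above.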

4. Marketplace Thumbnails — Generate preview images for 5,000+ listings on upload. Speed matters for user experience.


When to Use PageBolt Over Puppeteer

Use PageBolt if:

  • You're taking 100+ screenshots in a single job
  • You need fast turnaround (QA, monitoring, auditing)
  • You want to eliminate infrastructure management
  • Parallel execution is critical

Stick with Puppeteer if:

  • You need fine-grained DOM interaction (form filling, JavaScript execution)
  • You're doing a few screenshots as part of a larger workflow
  • Your screenshots are deeply customized (specific viewport sizes, device emulation per URL)

Get Started

Step 1: Sign up free at pagebolt.dev — 100 API requests/month, no credit card.

Step 2: Get your API key.

Step 3: Replace your Puppeteer loop with the PageBolt example above.

Step 4: Screenshot 1,000 websites in 2 minutes instead of 15.


Puppeteer is the industry standard for browser automation. But for bulk parallel jobs, it's infrastructure-heavy and slow. PageBolt's cloud API gives you the parallelization Puppeteer can't match — without managing Chromium instances, memory pools, or retry logic.

Ready to 10x your bulk screenshot speed? Try PageBolt free →
