Generate 1,000 Website Screenshots 10x Faster Than Puppeteer
You need to screenshot 1,000 websites. Quality assurance. Website monitoring. Visual regression testing. Design snapshots for a marketplace.
So you write a Puppeteer script. Spin up a Node process. Iterate through your URL list. Capture. Save. Next URL.
If you're doing this on one machine, 1,000 screenshots takes 12–30 minutes, depending on how many processes you run. You're limited by local Chromium resource contention.
There's a faster way. And it doesn't require managing browser infrastructure.
Why Puppeteer Gets Slow for Bulk Screenshots
Puppeteer is designed for reliability and DOM manipulation, not bulk-parallel execution. When you're screenshotting 1,000 URLs:
- Chromium overhead: One browser instance processes URLs sequentially. Initialization is amortized, but each navigation has latency (DNS, TLS, page load, render).
- Single-machine bottleneck: You can spawn multiple processes, but you quickly hit CPU/memory limits. Running 10 parallel Chromium instances on a standard server exhausts resources.
- Resource contention: Screenshot capture, file I/O, and network requests all compete for the same machine's hardware.
- Retry logic: Failed screenshots require re-running. Timeouts add up.
The math: 1,000 URLs × 3 seconds per screenshot (avg) = 50 minutes if serial. Even with 4 parallel processes, you're looking at 12–15 minutes plus significant infrastructure overhead.
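The back-of-envelope math above is easy to reproduce. A minimal sketch (the 3-second average per screenshot is the assumption; your pages may be faster or slower):

```python
def bulk_screenshot_minutes(n_urls, secs_per_shot=3.0, parallelism=1):
    """Rough wall-clock estimate for a bulk screenshot job."""
    return n_urls * secs_per_shot / parallelism / 60

print(bulk_screenshot_minutes(1000))                 # serial: 50.0 minutes
print(bulk_screenshot_minutes(1000, parallelism=4))  # 4 processes: 12.5 minutes
```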
The PageBolt Approach: Cloud-Hosted, Parallel, Simple
PageBolt's screenshot API works differently:
- Cloud execution: No local Chromium. Just send URLs to the API.
- Automatic parallelization: Cloud infrastructure handles 100+ concurrent requests.
- No infrastructure management: No Chromium versions to manage, no memory limits to hit.
- Simple SDK: One-liner per screenshot. No browser lifecycle management.
Result: 1,000 screenshots in 2–3 minutes (depending on page load times), not 15–30 minutes.
Practical Example: Bulk Screenshots with Puppeteer
Here's the browser-automation approach (slow, infrastructure-heavy), shown here with Playwright's Python API; the architecture is the same as an equivalent Puppeteer script:
```python
import asyncio
import time

from playwright.async_api import async_playwright


async def screenshot_bulk_puppeteer(urls, parallelism=4):
    """
    Screenshot 1,000 URLs Puppeteer-style (via Playwright).
    You manage browser instances, concurrency limits, retries, and local files.
    """
    async with async_playwright() as p:
        # Spawn 4 browser instances for parallelization
        browsers = [await p.chromium.launch(headless=True) for _ in range(parallelism)]
        # Cap in-flight pages so 1,000 tabs don't open at once
        semaphore = asyncio.Semaphore(parallelism * 2)

        start = time.time()
        # Round-robin URLs across browsers; no remainder URLs are dropped
        tasks = [
            screenshot_url(browsers[idx % parallelism], url, idx, semaphore)
            for idx, url in enumerate(urls)
        ]
        results = await asyncio.gather(*tasks)

        for browser in browsers:
            await browser.close()

        elapsed = time.time() - start
        print(f"Completed {len(urls)} screenshots in {elapsed:.1f}s")
        return results


async def screenshot_url(browser, url, idx, semaphore):
    """Screenshot one URL with error handling."""
    async with semaphore:
        try:
            page = await browser.new_page(viewport={"width": 1280, "height": 720})
            await page.goto(url, timeout=10_000)
            filename = f"screenshot_{idx}.png"
            await page.screenshot(path=filename, full_page=False)
            await page.close()
            return {"url": url, "file": filename, "status": "ok"}
        except Exception as e:
            return {"url": url, "status": "error", "error": str(e)}


# Usage: Screenshot 1,000 URLs
if __name__ == "__main__":
    urls = [
        "https://example.com/page-1",
        "https://example.com/page-2",
        # ... 998 more URLs
    ]
    results = asyncio.run(screenshot_bulk_puppeteer(urls))
    failed = [r for r in results if r.get("status") == "error"]
    print(f"Completed: {len(results) - len(failed)} | Failed: {len(failed)}")
```
What you're managing:
- Multiple browser instances (memory, CPU allocation)
- Retry logic for timeouts and failures
- File I/O (writing 1,000 PNGs to disk)
- Process management and error handling
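That retry logic is entirely on you. A minimal sketch of the kind of wrapper you end up writing around a capture coroutine like `screenshot_url` above (`with_retries` and the `flaky` stub are illustrative names, not part of any library):

```python
import asyncio


async def with_retries(capture, url, attempts=3, backoff=2.0):
    """Retry a flaky screenshot coroutine with exponential backoff."""
    for attempt in range(attempts):
        result = await capture(url)
        if result.get("status") == "ok":
            return result
        if attempt < attempts - 1:
            await asyncio.sleep(backoff * 2 ** attempt)  # wait 2s, 4s, ...
    return result  # last failure after all attempts


# Demo with a capture stub that times out twice, then succeeds
async def demo():
    calls = {"n": 0}

    async def flaky(url):
        calls["n"] += 1
        if calls["n"] < 3:
            return {"url": url, "status": "error", "error": "timeout"}
        return {"url": url, "status": "ok"}

    return await with_retries(flaky, "https://example.com", backoff=0.01)


print(asyncio.run(demo()))  # {'url': 'https://example.com', 'status': 'ok'}
```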
Timeline for 1,000 URLs:
- Browser startup (4 instances): 3–5 seconds
- Navigation + screenshot per URL: 3 seconds avg × 1,000 URLs = 3,000 seconds
- Divided by 4 parallel processes: ~750 seconds (12–13 minutes)
- Total: 12–15 minutes + file I/O overhead
The PageBolt Approach (Fast)
```python
import concurrent.futures
import json
import time
import urllib.request

PAGEBOLT_API_KEY = "YOUR_API_KEY"  # Get free at pagebolt.dev/try


def screenshot_one(idx, url):
    """One API call per URL. That's it."""
    req = urllib.request.Request(
        "https://pagebolt.dev/api/v1/screenshot",
        data=json.dumps({
            "url": url,
            "width": 1280,
            "height": 720,
            "fullPage": False,
        }).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {PAGEBOLT_API_KEY}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as response:
            filename = f"screenshot_{idx}.png"
            with open(filename, "wb") as f:
                f.write(response.read())
        return {"url": url, "file": filename, "status": "ok"}
    except Exception as e:
        return {"url": url, "status": "error", "error": str(e)}


def screenshot_bulk_pagebolt(urls, concurrency=50):
    """
    Screenshot 1,000 URLs using the PageBolt API.
    Simple, parallel, no infrastructure management.
    """
    start = time.time()
    print(f"Screenshotting {len(urls)} URLs...")

    # Fire up to 50 requests at once; the cloud handles the browsers
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(screenshot_one, range(len(urls)), urls))

    elapsed = time.time() - start
    print(f"\n✓ Completed {len(urls)} screenshots in {elapsed:.1f}s")
    return results


# Usage: Screenshot 1,000 URLs
if __name__ == "__main__":
    urls = [
        "https://example.com/page-1",
        "https://example.com/page-2",
        # ... 998 more URLs
    ]
    results = screenshot_bulk_pagebolt(urls)
    failed = [r for r in results if r.get("status") == "error"]
    print(f"Success: {len(results) - len(failed)} | Failed: {len(failed)}")

    # Process results
    for result in results:
        if result["status"] == "ok":
            print(f"{result['url']} -> {result['file']}")
```
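Because every result carries its status, re-running only the failures is trivial. A sketch using the result format above (the `failed_urls` helper name is mine); feed the returned list back into `screenshot_bulk_pagebolt` for a second pass:

```python
def failed_urls(results):
    """Pull out the URLs that need a second pass."""
    return [r["url"] for r in results if r.get("status") == "error"]


results = [
    {"url": "https://example.com/page-1", "status": "ok", "file": "screenshot_0.png"},
    {"url": "https://example.com/page-2", "status": "error", "error": "timeout"},
]
print(failed_urls(results))  # ['https://example.com/page-2']
```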
What you're NOT managing:
- Browser instances
- Memory/CPU allocation
- Browser lifecycle and retry logic; file I/O shrinks to writing the returned PNG bytes (or uploading them to your storage)
Timeline for 1,000 URLs:
- API calls execute in parallel (cloud infrastructure handles scaling)
- Network latency dominates (average page load: 2–3 seconds per URL)
- Cloud parallelization: ~50 concurrent requests
- 1,000 URLs ÷ 50 parallel ≈ 20 API batches × 3 seconds/batch ≈ 60 seconds
- Total: 1–2 minutes + minimal overhead
Performance Comparison
| Task | Puppeteer (4 parallel) | PageBolt (cloud parallel) |
|---|---|---|
| Browser initialization | 3–5s | — (managed) |
| Screenshot capture (1,000 URLs) | ~750s (12–13 min) | ~60s (1–2 min) |
| File I/O + storage | ~30s | Minimal (save locally) |
| Error handling/retries | Manual | — |
| Total | 12–15 min | 1–2 min |
| Speedup | — | 10–15x faster |
The massive difference comes from:
- Parallelization: Puppeteer maxes out at 4–8 parallel processes. PageBolt runs 50–100 concurrent requests.
- Infrastructure: PageBolt's cloud handles resource contention automatically.
- Operational overhead: with Puppeteer, you write and maintain the retry logic, error handling, and file management yourself.
Real Use Cases
1. Website Monitoring — Screenshot competitor sites daily. Detect visual changes. PageBolt makes this trivial (1,000 competitors in 2 minutes).
2. QA Visual Regression — Test 500 page variants on every code change. Compare screenshots. Fast bulk capture is essential.
3. SEO Auditing — Snapshot 10,000 pages from your sitemap monthly. Detect broken layouts, missing images, rendering issues.
4. Marketplace Thumbnails — Generate preview images for 5,000+ listings on upload. Speed matters for user experience.
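For the visual-regression case, the comparison step can start as simple as hashing the two captures. A minimal sketch of byte-level change detection (a real pipeline would use a perceptual diff library to tolerate anti-aliasing noise; the temp files stand in for yesterday's and today's screenshots):

```python
import hashlib
import tempfile


def images_differ(path_a, path_b):
    """Byte-level change detection between two screenshot files."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return digest(path_a) != digest(path_b)


# Demo with stand-in files (real use: previous vs current capture)
with tempfile.NamedTemporaryFile(delete=False) as a, \
     tempfile.NamedTemporaryFile(delete=False) as b:
    a.write(b"png-bytes-v1")
    b.write(b"png-bytes-v2")

print(images_differ(a.name, b.name))  # True
```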
When to Use PageBolt Over Puppeteer
Use PageBolt if:
- You're taking 100+ screenshots in a single job
- You need fast turnaround (QA, monitoring, auditing)
- You want to eliminate infrastructure management
- Parallel execution is critical
Stick with Puppeteer if:
- You need fine-grained page interaction (form filling, JavaScript execution)
- You're doing a few screenshots as part of a larger workflow
- Your screenshots are deeply customized (scripted interactions before capture, per-URL device emulation)
Get Started
Step 1: Sign up free at pagebolt.dev — 100 API requests/month, no credit card.
Step 2: Get your API key.
Step 3: Replace your Puppeteer loop with the PageBolt example above.
Step 4: Screenshot 1,000 websites in 2 minutes instead of 15.
Puppeteer is the industry standard for browser automation. But for bulk parallel jobs, it's infrastructure-heavy and slow. PageBolt's cloud API gives you the parallelization Puppeteer can't match — without managing Chromium instances, memory pools, or retry logic.
Ready to 10x your bulk screenshot speed? Try PageBolt free →