DEV Community

Custodia-Admin

Originally published at pagebolt.dev

Headless browser API: Self-hosted vs managed, when each makes sense

You need to automate browser tasks — screenshots, PDFs, form fills, testing. You have two paths:

  1. Self-hosted — Run Puppeteer/Playwright on your servers
  2. Hosted API — Call a managed headless browser service

Each has tradeoffs, and picking the wrong one is expensive to undo.

The self-hosted trap

Self-hosting a headless browser sounds simple: npm install puppeteer, write a script, deploy. In reality:

// This looks easy... (ES module: .mjs or "type": "module", so top-level await works)
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const screenshot = await page.screenshot();
await browser.close();

But production is messy.

Hidden costs of self-hosting

Infrastructure

  • Each browser instance needs roughly 300-500MB RAM
  • 10 concurrent requests (one browser each) = 3-5GB RAM minimum
  • Add margin for spikes and you need an 8GB+ instance
  • EC2 instance: $50-150/month just for browser capacity
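The sizing bullets above are straightforward arithmetic. A minimal sketch that reproduces them, assuming one browser per concurrent request and a hypothetical 1.5× spike headroom (the article gives no exact margin); `estimateRamGB` and `pickInstanceGB` are illustrative helpers, not part of any library:

```javascript
// Rough RAM sizing for a self-hosted browser fleet.
// Assumptions: one browser instance per concurrent request, and a 1.5x
// headroom factor for spikes (an illustrative value, not a measured one).
function estimateRamGB(concurrency, mbPerBrowser, headroom = 1.5) {
  const baseMB = concurrency * mbPerBrowser; // steady-state browser memory
  return (baseMB * headroom) / 1000;         // add headroom, convert MB to GB (decimal)
}

// Smallest listed instance memory size (GB) that fits; null means none of the listed sizes fits.
function pickInstanceGB(requiredGB, sizesGB = [2, 4, 8, 16, 32]) {
  return sizesGB.find((size) => size >= requiredGB) ?? null;
}

const ramLow = estimateRamGB(10, 300);  // 10 x 300MB x 1.5
const ramHigh = estimateRamGB(10, 500); // 10 x 500MB x 1.5
console.log(ramLow, ramHigh, pickInstanceGB(ramHigh)); // 4.5 7.5 8
```

With that headroom, 10 concurrent browsers land at 4.5-7.5GB, which is why an 8GB instance is the practical floor.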

Orchestration

  • Browser pools fail silently
  • Connection timeouts need retry logic
  • Memory leaks require process recycling
  • You're now managing lifecycle, health checks, auto-restart
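The retry and recycling bullets above typically turn into code like the following. A minimal sketch, assuming jobs run one at a time and a `launch` function you supply (for example `() => puppeteer.launch()`) that resolves to an object with an async `close()`. `RecyclingBrowser`, `maxUses`, and `withRetry` are hypothetical helper names, not Puppeteer APIs:

```javascript
// Relaunch the browser after a fixed number of uses to contain slow memory leaks.
// Assumption: one job at a time. A concurrent pool would also need to wait for
// in-flight pages to finish before closing the old browser.
class RecyclingBrowser {
  constructor(launch, maxUses = 50) {
    this.launch = launch;   // async () => browser-like object with an async close()
    this.maxUses = maxUses; // uses before the browser is closed and relaunched
    this.browser = null;
    this.uses = 0;
  }

  async acquire() {
    if (!this.browser || this.uses >= this.maxUses) {
      // Best-effort shutdown of the old process; a crashed browser may reject here.
      if (this.browser) await this.browser.close().catch(() => {});
      this.browser = await this.launch();
      this.uses = 0;
    }
    this.uses += 1;
    return this.browser;
  }
}

// Retry a flaky async operation with linear backoff, rethrowing the last error.
async function withRetry(fn, attempts = 3, delayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
      }
    }
  }
  throw lastError;
}
```

Health checks and automatic restart after a crash still have to be layered on top of this; the sketch only covers the recycling and retry parts.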

Scaling

  • Vertical scaling hits ceiling (instance size limit)
  • Horizontal scaling adds complexity (load balancing, session affinity)
  • 100 concurrent users = multiple servers, Kubernetes cluster management

Maintenance

  • Chrome versions change → tests break
  • Security patches → deployments
  • Dependency updates → regression testing
  • 5+ hours/month firefighting

Monitoring & debugging

  • Timeouts are silent failures (what actually went wrong?)
  • OOM kills are catastrophic (no graceful degradation)
  • Performance degradation is hard to diagnose
  • On-call stress when things fail at 2 AM
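One way to make timeouts less silent is to give every step its own deadline and a name. A minimal sketch; `withTimeout` is a hypothetical helper, and it only stops waiting, it does not cancel the underlying operation (Puppeteer's navigation methods also accept their own `timeout` option, as in `page.goto(url, { timeout })`):

```javascript
// Race a step against a timer so a hang reports which step hung.
// Assumption: stopping the wait is enough; the underlying operation keeps running until it settles.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer whichever side wins, so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example with Puppeteer (page and url assumed to exist):
// await withTimeout(page.goto(url), 30000, `goto ${url}`);
// await withTimeout(page.screenshot(), 10000, 'screenshot');
```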

Real cost (often hidden):

  • Infrastructure: $50-300/month
  • DevOps time: 5-10 hours/month (~$1,000-2,000 at an effective rate of about $200/hour)
  • Opportunity cost: time spent firefighting instead of building features
  • Total: $1,500-2,500/month once opportunity cost is included ($1,050-2,300 in direct infrastructure and DevOps costs)
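The direct-cost lines add up with simple arithmetic. A minimal sketch, assuming the roughly $200/hour effective rate that "5-10 hours ≈ $1,000-2,000" implies; `selfHostedMonthlyCost` is a hypothetical helper, and opportunity cost is left out because it has no fixed number:

```javascript
// Effective monthly cost of self-hosting = infrastructure + engineering time.
// Assumption: ~$200/hour, implied by "5-10 hours/month (~$1,000-2,000)".
function selfHostedMonthlyCost(infraUSD, devopsHours, hourlyRateUSD = 200) {
  return infraUSD + devopsHours * hourlyRateUSD;
}

const lowEstimate = selfHostedMonthlyCost(50, 5);    // 50 + 5 * 200
const highEstimate = selfHostedMonthlyCost(300, 10); // 300 + 10 * 200
console.log(`$${lowEstimate}-$${highEstimate}/month before opportunity cost`);
// prints "$1050-$2300/month before opportunity cost"
```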

When self-hosting makes sense

Self-hosting is worth it if:

✅ You're running 1,000+ screenshots/day (economies of scale)
✅ You have a dedicated DevOps engineer anyway
✅ You need the lowest possible latency (a warm, local browser pool skips the network round-trip to an external API)
✅ You have strict data residency requirements (EU data never leaves EU)
✅ Your use case is internal-only (no user-facing latency pressure)

For most teams: not worth it.

The hosted API approach

Hosted headless browser APIs (like PageBolt) invert the tradeoff:

curl -X POST https://api.pagebolt.dev/take_screenshot \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  -o screenshot.png

# Response: PNG in 2-3 seconds (saved to screenshot.png)
# No infrastructure. No servers. Done.
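The same call from Node.js, as a minimal sketch assuming Node 18+ (for the built-in `fetch`) and that the endpoint returns the raw PNG bytes, as the comment in the curl example says. The endpoint and the `url` field come from that example; `buildScreenshotRequest` and `takeScreenshot` are hypothetical helper names, and PageBolt's documentation is the authority on any other parameters:

```javascript
const { writeFile } = require('node:fs/promises');

// Build the POST request used in the curl example: bearer token plus a JSON body.
function buildScreenshotRequest(apiKey, url) {
  return {
    endpoint: 'https://api.pagebolt.dev/take_screenshot',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url }),
    },
  };
}

// Assumption: a successful response body is the PNG image itself.
async function takeScreenshot(apiKey, url, outFile) {
  const { endpoint, init } = buildScreenshotRequest(apiKey, url);
  const res = await fetch(endpoint, init);
  if (!res.ok) throw new Error(`Screenshot failed: HTTP ${res.status}`);
  await writeFile(outFile, Buffer.from(await res.arrayBuffer()));
}

// Usage: await takeScreenshot(process.env.PAGEBOLT_API_KEY, 'https://example.com', 'example.png');
```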

Advantages of hosted

Zero infrastructure

  • No servers to manage
  • No scaling to worry about
  • No DevOps work for the browser layer

Fast

  • API latency: 2-3 seconds
  • Global CDN for edge locations
  • Consistent performance

Reliable

  • 99.9% uptime SLA
  • Automatic failover
  • Managed by specialists

Scalable

  • 1 request or 10,000/day — same API
  • No performance degradation
  • Auto-scaling built-in

Cost-predictable

  • Per-request pricing
  • No surprise infrastructure bills
  • Scale down anytime

Self-hosted vs hosted: Direct comparison

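Factor | Self-hosted | Hosted API
--- | --- | ---
Setup | 2-3 hours | 10 minutes
Infra cost/month | $50-300 | $0 (usage billed per request)
DevOps time/month | 5-10 hours | 0 hours
Latency | 5-10s cold start (less with a warm pool) | 2-3s
Scaling | Vertical (capped) unless you build horizontal scaling | Handled by the provider (within plan limits)
Uptime | 99% (if lucky) | 99.9% SLA
On-call stress | High | Low
Per-screenshot cost | $0.05-0.20 (infra; varies with volume) | $0.01-0.03 (API; varies with plan)
Best for | Internal tools, high volume | User-facing, unpredictable load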

Real-world example: E-commerce screenshots

Scenario: Capture product page screenshots for every listing (500 new products/day).

Self-hosted approach

const puppeteer = require('puppeteer');

// Launch browser pool
const POOL_SIZE = 5;
const browsers = [];

async function initPool() {
  for (let i = 0; i < POOL_SIZE; i++) {
    browsers.push(await puppeteer.launch({
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    }));
  }
}

// Round-robin over the pool (call initPool() once at startup)
let currentBrowser = 0;
async function captureScreenshot(url) {
  const browser = browsers[currentBrowser++ % POOL_SIZE];
  let page;
  try {
    page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
    // Encode as JPEG at quality 90
    return await page.screenshot({ type: 'jpeg', quality: 90 });
  } catch (error) {
    console.error(`Failed to capture ${url}:`, error);
    // Retry? Fallback? Manual intervention?
    return null;
  } finally {
    // Always close the page, or it leaks memory inside the long-lived browser
    if (page) await page.close().catch(() => {});
  }
}

// Run on 5x EC2 t3.large (~$0.083/hour each, on-demand in us-east-1)
// ~730 hours/month = ~$300/month infrastructure
// Plus: monitoring, alerting, debugging, scale planning

Cost: $300+ infrastructure + 5-10 hours DevOps ≈ $1,300-2,300/month effective cost (at roughly $200/hour).

Hosted API approach

# Daily cron job: capture 500 screenshots
# Assumes the products endpoint returns whitespace-separated product IDs
for product_id in $(curl -s https://api.example.com/products/new); do
  curl -s -X POST https://api.pagebolt.dev/take_screenshot \
    -H "Authorization: Bearer $PAGEBOLT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"https://store.example.com/product/${product_id}\"}" \
    -o "product-${product_id}.png"
done

# Cost: 500/day × 30 days = 15,000 requests/month → Growth plan = $79/month
# DevOps: 0 hours
# Infra: $0
# Total: $79/month (no hidden costs)
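The loop above sends one request at a time, so 500 captures at 2-3 seconds each take roughly 17-25 minutes. If that is too slow, a bounded number of parallel requests helps. A minimal sketch in Node.js; `mapWithConcurrency` is a hypothetical helper, `capture` stands for whatever async function performs one API call, and the right limit depends on your plan's rate limits, which this article does not cover:

```javascript
// Run `capture` over many items with at most `limit` calls in flight.
// Assumption: capture is any async (item) => result, e.g. a wrapper around one HTTP call.
async function mapWithConcurrency(items, limit, capture) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index; safe because no await sits between check and increment
      try {
        results[i] = { ok: true, value: await capture(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error }; // record the failure instead of aborting the batch
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage:
// const urls = ids.map((id) => `https://store.example.com/product/${id}`);
// const results = await mapWithConcurrency(urls, 5, capture);
// const failed = results.filter((r) => !r.ok);
```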

Cost: $79/month (Growth plan), $0 infrastructure, $0 DevOps = $79/month total.
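Dividing each monthly total by volume gives the per-screenshot cost for this scenario. A minimal sketch using the article's own estimates (about $300 infrastructure plus 5 DevOps hours at an assumed $200/hour for self-hosting, and $79 for the hosted plan); swap in your own numbers:

```javascript
// Per-screenshot cost = monthly cost / monthly volume.
function costPerScreenshot(monthlyCostUSD, screenshotsPerMonth) {
  return monthlyCostUSD / screenshotsPerMonth;
}

const screenshotsPerMonth = 500 * 30;    // 15,000
const selfHostedMonthly = 300 + 5 * 200; // infra + 5 DevOps hours (assumed $200/hour)
const hostedMonthly = 79;                // Growth plan figure quoted above

console.log(costPerScreenshot(selfHostedMonthly, screenshotsPerMonth).toFixed(4)); // 0.0867
console.log(costPerScreenshot(hostedMonthly, screenshotsPerMonth).toFixed(4));     // 0.0053
```

Infrastructure alone works out to $0.02 per screenshot at this volume ($300 / 15,000), so any per-screenshot figure depends heavily on how many captures you run.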

Decision tree: Self-hosted or hosted?

Ask these questions:

  1. Volume: 1,000+ requests/day?

    • Yes → Self-hosted might pay off (if you have DevOps)
    • No → Hosted API is cheaper
  2. Predictability: Do you know your peak load?

    • No → Hosted API (no scaling surprises)
    • Yes → Could go either way
  3. Data sensitivity: Must data stay in-region?

    • Yes → Self-hosted (or check API provider's data residency)
    • No → Hosted API
  4. DevOps capacity: Do you have someone dedicated?

    • No → Hosted API (essential)
    • Yes → Self-hosted becomes viable
  5. Time-to-market: Do you need this running today?

    • Yes → Hosted API (10 minutes vs 2-3 hours)
    • No → Could go either way

If you answer "No" to 3+ questions above, use a hosted API.
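The checklist can be written down as a small function, mostly to make the rule explicit. A minimal sketch; the field names are hypothetical, and treating "no dedicated DevOps" as decisive is one reading of the word "essential" in question 4:

```javascript
// Each field is the yes/no answer to one question in the checklist.
function recommend({ highVolume, knownPeakLoad, mustStayInRegion, dedicatedDevOps, needItToday }) {
  // Assumption: question 4 is decisive ("No" -> hosted API is essential).
  if (!dedicatedDevOps) return 'hosted';

  // Rule of thumb from the checklist: 3+ "No" answers -> hosted API.
  const answers = [highVolume, knownPeakLoad, mustStayInRegion, dedicatedDevOps, needItToday];
  const noCount = answers.filter((answer) => !answer).length;
  return noCount >= 3 ? 'hosted' : 'consider self-hosted';
}

console.log(recommend({
  highVolume: false, knownPeakLoad: false, mustStayInRegion: false,
  dedicatedDevOps: true, needItToday: true,
})); // hosted (three "No" answers)
```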

For most teams (especially startups, small teams, unpredictable load): hosted wins.

Hybrid approach: Best of both?

Some teams try hybrid:

  • Internal dashboards/tools: self-hosted Puppeteer (full control, no external dependency or extra network hop)
  • User-facing features: hosted API (reliability, scaling, no DevOps)

This works if you have 2+ distinct use cases with very different requirements. For most teams, the added complexity is overkill.
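In code, a hybrid split usually comes down to one interface with two implementations behind it. A minimal sketch; `makeScreenshotRouter`, `selfHosted`, `hosted`, and the `userFacing` flag are all hypothetical names, with each backend assumed to be any async `(url) => imageBytes` function:

```javascript
// Route each capture to a backend based on who consumes the result.
// Assumption: both backends share the same async (url) => result signature.
function makeScreenshotRouter({ selfHosted, hosted }) {
  return async function capture(url, { userFacing = false } = {}) {
    // User-facing features go to the managed API; internal tools use the local pool.
    return userFacing ? hosted(url) : selfHosted(url);
  };
}

// Usage with stub backends:
const capture = makeScreenshotRouter({
  selfHosted: async (url) => `local:${url}`,
  hosted: async (url) => `api:${url}`,
});
capture('https://example.com', { userFacing: true }).then(console.log); // api:https://example.com
```

Keeping both backends behind one signature makes moving a use case from one side to the other a small change; the price is running and monitoring both.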

Getting started with hosted

  1. Sign up at pagebolt.dev (free: 100 requests/month)
  2. Get API key (1 minute)
  3. Make first API call (5 minutes)
  4. Evaluate: Does it solve your use case?
  5. Migrate if it does (or keep self-hosted if self-hosting was the right call)

The real question

Self-hosting isn't about "better control" or "not trusting external APIs." It's about: Can you afford 5+ hours/month of DevOps overhead plus the on-call stress?

If yes: self-hosted is viable.
If no: hosted API solves the problem immediately.

Most teams say "no."

Start free — 100 screenshots/month, no credit card. See if hosted makes sense for your use case.
