Headless browser API: Self-hosted vs managed, when each makes sense
You need to automate browser tasks — screenshots, PDFs, form fills, testing. You have two paths:
- Self-hosted — Run Puppeteer/Playwright on your servers
- Hosted API — Call a managed headless browser service
Each has tradeoffs. Most teams pick wrong and regret it.
The self-hosted trap
Self-hosting a headless browser sounds simple: npm install puppeteer, write a script, deploy. The first script really is short:
// This looks easy...
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const screenshot = await page.screenshot();
  await browser.close();
})();
But production is messy.
Hidden costs of self-hosting
Infrastructure
- Each browser instance needs 300-500MB RAM
- 10 concurrent requests = 3-5GB RAM minimum
- Add margin for spikes = you need 8GB+ instance
- EC2 instance: $50-150/month just for browser capacity
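The sizing arithmetic above can be sketched as a small helper. The function name and the 1.6× headroom factor are illustrative assumptions, chosen only to reproduce the 8GB figure; measure real memory use for your own pages before sizing anything.

```javascript
// Rough RAM estimate for a pool of headless browsers.
// mbPerBrowser and headroomFactor are assumptions, not measured values.
function estimateRamGb(concurrentBrowsers, mbPerBrowser, headroomFactor = 1) {
  const totalMb = concurrentBrowsers * mbPerBrowser * headroomFactor;
  return Math.round(totalMb / 100) / 10; // GB, rounded to one decimal place
}

console.log(estimateRamGb(10, 300));      // 3  (low end for 10 concurrent browsers)
console.log(estimateRamGb(10, 500));      // 5  (high end)
console.log(estimateRamGb(10, 500, 1.6)); // 8  (high end plus 60% headroom for spikes)
```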
Orchestration
- Browser pools fail silently
- Connection timeouts need retry logic
- Memory leaks require process recycling
- You're now managing lifecycle, health checks, auto-restart
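To give a sense of what "retry logic" means in practice, here is a minimal sketch of a retry wrapper with exponential backoff. The name `withRetry` and its options are hypothetical, and a production version would likely also decide which errors are worth retrying.

```javascript
// Retry an async task a fixed number of times, doubling the wait between attempts.
// `attempts` and `baseDelayMs` are illustrative defaults, not recommendations.
async function withRetry(task, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}

// Usage (inside an async function, with a Puppeteer page):
//   await withRetry(() => page.goto(url, { timeout: 30000 }));
```

This only covers transient failures. It does nothing for a browser process that has leaked memory or died, which is why recycling and health checks end up on the list as well.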
Scaling
- Vertical scaling hits ceiling (instance size limit)
- Horizontal scaling adds complexity (load balancing, session affinity)
- 100 concurrent users = multiple servers, Kubernetes cluster management
Maintenance
- Chrome versions change → tests break
- Security patches → deployments
- Dependency updates → regression testing
- 5+ hours/month firefighting
Monitoring & debugging
- Timeouts are silent failures (what actually went wrong?)
- OOM kills are catastrophic (no graceful degradation)
- Performance degradation is hard to diagnose
- On-call stress when things fail at 2 AM
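One way to make timeouts less silent is to wrap each stage in its own timer, so the error says which step stalled. This is a sketch under that assumption; `withStageTimeout` is a hypothetical helper, not part of Puppeteer.

```javascript
// Reject with a labeled error if `promise` does not settle within `ms` milliseconds.
function withStageTimeout(stage, promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${stage} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (inside an async function, with a Puppeteer page):
//   await withStageTimeout('page.goto', page.goto(url), 30000);
//   const image = await withStageTimeout('page.screenshot', page.screenshot(), 10000);
```

Note that the wrapped operation keeps running after the timeout fires, so the caller still has to close the page or browser to free resources.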
Real cost (often hidden):
- Infrastructure: $50-300/month
- DevOps time: 5-10 hours/month (~$1,000-2,000 at a loaded rate of about $200/hour)
- Opportunity cost: time spent firefighting vs building features
- Total: roughly $1,050-2,300/month, before counting opportunity cost
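The infrastructure and DevOps line items can be added up with a small helper. The $200/hour default is an assumption inferred from the figures above (5 hours costing about $1,000); substitute your own loaded rate.

```javascript
// Monthly self-hosting cost = infrastructure + DevOps hours × hourly rate.
// hourlyRateUsd = 200 is an assumption implied by "5-10 hours ≈ $1,000-2,000".
function monthlySelfHostedCost({ infraUsd, devOpsHours, hourlyRateUsd = 200 }) {
  return infraUsd + devOpsHours * hourlyRateUsd;
}

console.log(monthlySelfHostedCost({ infraUsd: 50, devOpsHours: 5 }));   // 1050
console.log(monthlySelfHostedCost({ infraUsd: 300, devOpsHours: 10 })); // 2300
```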
When self-hosting makes sense
Self-hosting is worth it if:
✅ You're running 1,000+ screenshots/day (economies of scale)
✅ You have a dedicated DevOps engineer anyway
✅ You need the lowest possible latency and can't afford a network round trip to an external API
✅ You have strict data residency requirements (EU data never leaves EU)
✅ Your use case is internal-only (no user-facing latency pressure)
For most teams: not worth it.
The hosted API approach
Hosted headless browser APIs (like PageBolt) invert the tradeoff:
curl -X POST https://api.pagebolt.dev/take_screenshot \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png
# Response: PNG in 2-3 seconds, saved to screenshot.png
# No infrastructure. No servers. Done.
Advantages of hosted
Zero infrastructure
- No servers to manage
- No scaling to worry about
- No DevOps work
Fast
- API latency: 2-3 seconds
- Global CDN for edge locations
- Consistent performance
Reliable
- 99.9% uptime SLA
- Automatic failover
- Managed by specialists
Scalable
- 1 request or 10,000/day — same API
- No performance degradation
- Auto-scaling built-in
Cost-predictable
- Per-request pricing
- No surprise infrastructure bills
- Scale down anytime
Self-hosted vs hosted: Direct comparison
| Factor | Self-hosted | Hosted API |
|---|---|---|
| Setup | 2-3 hours | 10 minutes |
| Infra cost/month | $50-300 | $0 |
| DevOps time/month | 5-10 hours | 0 hours |
| Latency | 5-10s (cold start) | 2-3s |
| Scaling | Vertical (capped) | Unlimited |
| Uptime | 99% (if lucky) | 99.9% SLA |
| On-call stress | High | None |
| Per-screenshot cost | $0.05-0.20 (infra) | $0.01-0.03 (API) |
| Best for | Internal tools, high volume | User-facing, unpredictable load |
Real-world example: E-commerce screenshots
Scenario: Capture product page screenshots for every listing (500 new products/day).
Self-hosted approach
const puppeteer = require('puppeteer');

// Launch browser pool
const POOL_SIZE = 5;
const browsers = [];

async function initPool() {
  for (let i = 0; i < POOL_SIZE; i++) {
    browsers.push(await puppeteer.launch({
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    }));
  }
}

let currentBrowser = 0;

async function captureScreenshot(url) {
  // Round-robin across the pool
  const browser = browsers[currentBrowser++ % POOL_SIZE];
  let page;
  try {
    page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
    // `type` selects the image format; `quality` applies to JPEG output
    return await page.screenshot({ type: 'jpeg', quality: 90 });
  } catch (error) {
    console.error(`Failed to capture ${url}:`, error);
    // Retry? Fallback? Manual intervention?
    return null;
  } finally {
    // Close the page even on failure so tabs don't leak
    if (page) await page.close().catch(() => {});
  }
}
// Run on 5x EC2 t3.large ($0.10/hour each), always on
// 5 instances × ~720 hours/month × $0.10 ≈ $360/month infrastructure
// Plus: monitoring, alerting, debugging, scale planning
Cost: $300+ infrastructure + 5-10 hours DevOps = ~$1,500/month effective cost.
Hosted API approach
# Daily cron job: capture 500 screenshots
# Assumes the products endpoint returns one product ID per line
for product_id in $(curl -s https://api.example.com/products/new); do
  curl -s -X POST https://api.pagebolt.dev/take_screenshot \
    -H "Authorization: Bearer $PAGEBOLT_API_KEY" \
    -d "{\"url\": \"https://store.example.com/product/${product_id}\"}" \
    -o "${product_id}.png"
done
# Cost: 500/day × 30 days = 15,000 requests/month → Growth plan = $79/month
# DevOps: 0 hours
# Infra: $0
# Total: $79/month (no hidden costs)
Cost: $79/month (Growth plan), $0 infrastructure, $0 DevOps = $79/month total.
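If the rest of your stack is Node.js, the same loop can be sketched with `fetch` (built into Node 18+). The endpoint, bearer token, and JSON body mirror the curl call above. The function names, the `Content-Type` header, and the handling of the response as raw image bytes are assumptions, so check the provider's documentation before relying on them.

```javascript
// Build the same request the curl loop sends. Function names are hypothetical.
function buildScreenshotRequest(productId, apiKey) {
  return {
    url: 'https://api.pagebolt.dev/take_screenshot',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json', // assumption: not shown in the curl example
      },
      body: JSON.stringify({
        url: `https://store.example.com/product/${encodeURIComponent(productId)}`,
      }),
    },
  };
}

// Capture each product in turn, recording failures instead of aborting the batch.
// `fetchImpl` is injectable so the loop can be exercised without network access.
async function captureAll(productIds, apiKey, fetchImpl = globalThis.fetch) {
  const results = [];
  for (const productId of productIds) {
    const { url, options } = buildScreenshotRequest(productId, apiKey);
    try {
      const response = await fetchImpl(url, options);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      // Assumes the body is the image itself, per "Response: PNG" earlier in the article.
      const image = Buffer.from(await response.arrayBuffer());
      results.push({ productId, image });
    } catch (error) {
      results.push({ productId, error: error.message });
    }
  }
  return results;
}

// Usage (inside an async function):
//   const results = await captureAll(['1001', '1002'], process.env.PAGEBOLT_API_KEY);
```

Recording failures per product keeps one bad URL from stopping the whole daily batch, and the failed IDs can be retried on the next run.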
Decision tree: Self-hosted or hosted?
Ask these questions:
- Volume: 1,000+ requests/day?
  - Yes → Self-hosted might pay off (if you have DevOps)
  - No → Hosted API is cheaper
- Predictability: Do you know your peak load?
  - No → Hosted API (no scaling surprises)
  - Yes → Could go either way
- Data sensitivity: Must data stay in-region?
  - Yes → Self-hosted (or check API provider's data residency)
  - No → Hosted API
- DevOps capacity: Do you have someone dedicated?
  - No → Hosted API (essential)
  - Yes → Self-hosted becomes viable
- Time-to-market: Do you need this running today?
  - Yes → Hosted API (10 minutes vs 2-3 hours)
  - No → Could go either way
If three or more of your answers point to a hosted API, use one. (Note that for the last question it is "Yes", not "No", that points to hosted.)
For most teams (especially startups, small teams, unpredictable load): hosted wins.
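The questions above can be encoded as a small scoring function. This is one interpretation, not a formal rule: it counts answers that point toward a hosted API and applies the threshold of three from the article. The key names and the wording of the recommendations are hypothetical.

```javascript
// For each question, the answer that points toward a hosted API.
const HOSTED_ANSWER = {
  volumeOver1000PerDay: false,
  peakLoadKnown: false,
  dataMustStayInRegion: false,
  dedicatedDevOps: false,
  neededToday: true, // the one question where "Yes" points to hosted
};

function recommend(answers) {
  const hostedVotes = Object.keys(HOSTED_ANSWER)
    .filter((key) => answers[key] === HOSTED_ANSWER[key]).length;
  return {
    hostedVotes,
    recommendation: hostedVotes >= 3 ? 'hosted API' : 'self-hosting may be viable',
  };
}

// A small team with low volume, unknown peaks, and an EU residency requirement:
console.log(recommend({
  volumeOver1000PerDay: false,
  peakLoadKnown: false,
  dataMustStayInRegion: true,
  dedicatedDevOps: false,
  neededToday: true,
}));
// { hostedVotes: 4, recommendation: 'hosted API' }
```

A hard requirement such as data residency may still override the count, so treat the output as a starting point for the conversation.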
Hybrid approach: Best of both?
Some teams try hybrid:
- Internal dashboards/tools: self-hosted Puppeteer (full control, no external network hop)
- User-facing features: hosted API (reliability, scaling, no DevOps)
This works if you have 2+ distinct use cases with very different requirements. For most: overkill complexity.
Getting started with hosted
- Sign up at pagebolt.dev (free: 100 requests/month)
- Get API key (1 minute)
- Make first API call (5 minutes)
- Evaluate: Does it solve your use case?
- Migrate if it does (or keep self-hosted if self-hosting was the right call)
The real question
Self-hosting isn't about "better control" or "not trusting external APIs." It's about: Can you afford 5+ hours/month of DevOps overhead plus the on-call stress?
If yes: self-hosted is viable.
If no: hosted API solves the problem immediately.
Most teams say "no."
Start free — 100 screenshots/month, no credit card. See if hosted makes sense for your use case.