How to convert a webpage to PDF (any language, no browser required)

#pdf #webdev #node #python

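Most approaches to webpage-to-PDF conversion eventually hit the same wall: you need a rendering engine on your own machine. Puppeteer and Playwright drive a headless Chromium, and wkhtmltopdf embeds an old Qt WebKit build. WeasyPrint avoids the browser by using its own layout engine, but it does not execute JavaScript, so dynamic pages come out incomplete.

An API call moves the rendering to a hosted service, so there is nothing to install or keep running on your side.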

curl

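# --fail makes curl exit with an error on HTTP 4xx/5xx instead of saving the error body as page.pdf
curl --fail -X POST https://api.pagebolt.dev/v1/pdf \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4", "printBackground": true}' \
  --output page.pdf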

Node.js

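import fs from 'node:fs';

// Uses the global fetch built into Node 18+; top-level await requires an ES module.
const res = await fetch('https://api.pagebolt.dev/v1/pdf', {
  method: 'POST',
  headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: 'https://example.com', format: 'A4', printBackground: true })
});

// Fail loudly instead of writing an error response to page.pdf.
if (!res.ok) {
  throw new Error(`PDF request failed: ${res.status} ${await res.text()}`);
}

fs.writeFileSync('page.pdf', Buffer.from(await res.arrayBuffer()));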

Python

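import os

import requests

resp = requests.post(
    'https://api.pagebolt.dev/v1/pdf',
    # Read the key from the environment rather than hard-coding it.
    headers={'x-api-key': os.environ['PAGEBOLT_API_KEY']},
    json={'url': 'https://example.com', 'format': 'A4', 'printBackground': True},
    timeout=60,  # without a timeout, requests waits indefinitely
)
# Stop on 4xx/5xx instead of saving an error body as page.pdf.
resp.raise_for_status()

with open('page.pdf', 'wb') as f:
    f.write(resp.content)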

Options that matter

{
  "url": "https://example.com",
  "format": "A4",
  "printBackground": true,
  "blockBanners": true,
  "blockAds": true,
  "margin": { "top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm" },
  "landscape": false,
  "waitForSelector": "#content-loaded"
}

waitForSelector is useful for JS-heavy pages — it waits for a specific element before capturing, so dynamic content is included in the PDF.
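
As a rough sketch of how these options might be assembled in Python, the helper below builds the request body and adds waitForSelector only when a selector is supplied. The function name build_pdf_payload and its parameters are illustrative and not part of the API; the field names are the ones listed above.

```python
def build_pdf_payload(url, wait_for_selector=None, landscape=False, margin_mm=None):
    """Build the JSON body for the PDF endpoint (hypothetical helper)."""
    payload = {
        "url": url,
        "format": "A4",
        "printBackground": True,
        "landscape": landscape,
    }
    if margin_mm is not None:
        # Apply one value to all four sides, written as "20mm"-style strings.
        sides = ("top", "bottom", "left", "right")
        payload["margin"] = {side: f"{margin_mm}mm" for side in sides}
    if wait_for_selector:
        # Only worth sending for pages that render their content with JavaScript.
        payload["waitForSelector"] = wait_for_selector
    return payload
```

The returned dict can be passed directly as the json= argument of requests.post. Leaving waitForSelector out for static pages keeps the request from waiting on an element that may never appear.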

Pages that block scrapers

For pages behind bot detection, add stealth: true:

{
  "url": "https://example.com",
  "stealth": true,
  "format": "A4"
}

This masks browser fingerprints so the page renders normally before capture.
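
If only some of the sites you capture sit behind bot detection, one option is to switch stealth on per host. The sketch below assumes you maintain your own list of such hosts; STEALTH_HOSTS, its example entry, and pdf_request_body are hypothetical names used for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts that you have seen block plain captures.
STEALTH_HOSTS = {"shop.example.com"}


def pdf_request_body(url, stealth_hosts=STEALTH_HOSTS):
    """Return the request body, adding stealth only for listed hosts."""
    host = urlparse(url).hostname or ""
    body = {"url": url, "format": "A4"}
    if host in stealth_hosts:
        body["stealth"] = True
    return body
```

Keeping the decision in one function means the rest of the code sends the same request either way.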

Why not just use Puppeteer?

Puppeteer works — but it requires a Chromium binary (~300MB), careful memory management for concurrent requests, and a persistent process that needs to be kept alive. In serverless environments (Lambda, Cloud Run, Vercel), Chromium is a deployment headache.

An API call works the same everywhere fetch or requests works. No binary, no process management, no memory tuning.
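
To illustrate the point about dependencies, here is a minimal sketch that uses only the Python standard library, so it needs no third-party packages at all. The helper names build_request and save_pdf are illustrative; the endpoint, header, and body fields are the ones used in the examples above.

```python
import json
import urllib.request

API_URL = "https://api.pagebolt.dev/v1/pdf"


def build_request(url, api_key):
    """Prepare the POST request without sending it."""
    body = json.dumps({"url": url, "format": "A4", "printBackground": True})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def save_pdf(url, path, api_key):
    """Send the request and write the response body to path."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(build_request(url, api_key), timeout=60) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
```

Calling save_pdf('https://example.com', 'page.pdf', api_key) with a valid key would then write the file, whether it runs in a serverless function or on a laptop.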

Try it free — 100 requests/month, no credit card. → pagebolt.dev