DEV Community: PDFops

Migrating off wkhtmltopdf on AWS Lambda

PDFops — Thu, 23 Jul 2026 13:07:58 +0000

wkhtmltopdf's GitHub repo was archived in January 2023, its organization followed in July 2024, and its last release still carries an unpatched, CVSS-9.8 server-side request forgery. None of that has stopped it from running inside a Lambda Layer behind thousands of invoice and receipt endpoints — because "swap out the rendering engine" sounds like a multi-week project. Usually it isn't. If what you're generating is the same layout with different data every time, the fix is smaller than a rendering migration: stop rendering HTML and fill a form instead.

The state of wkhtmltopdf, plainly

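Three facts, so you can decide how urgent this is without taking my word for it: the GitHub repo was archived in January 2023, the organization was archived in July 2024, and the last release still carries an unpatched server-side request forgery rated CVSS 9.8.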

None of this is a knock on the original project — it did its job for over a decade. But "archived + unpatched critical CVE" is the point where a security review stops asking "is this still fine" and starts asking "why is this still here."

Why it's worse specifically on Lambda

wkhtmltopdf isn't a library. It's a compiled binary wrapping a decade-old fork of QtWebKit, so running it in a function means shipping the binary alongside your code, not importing a package.

When this migration doesn't apply

Be honest about what you're actually rendering before doing this. PDFops fills fields in a PDF template — it doesn't render arbitrary HTML and CSS. If your output genuinely varies in layout — long-form articles, portfolio pages, dashboards with charts, anything where the structure changes per document, not just the values — you still need a real browser renderer. That's a different migration (Puppeteer or Playwright in a proper sandbox, not a Lambda Layer binary), and this guide isn't it.

But check what you're actually generating. Most wkhtmltopdf endpoints render the same HTML template every time — an invoice, a receipt, a monthly statement, a contract — with only the data changing. That's a form problem wearing an HTML renderer's clothes, and it's the case this guide covers.

The swap

Re-author the HTML template once as a real PDF with AcroForm fields: where a Liquid/Handlebars variable used to sit, there's now a named form field. Acrobat's Prepare Form tool and LibreOffice Draw's Form Controls toolbar both work for authoring, and pdftk's dump_data_fields operation will confirm the field names on an existing PDF. It's a one-time job per template, not a per-request cost. Don't have one handy to test against? Grab the sample invoice-template.pdf, which ships with customer_name and total fields already wired up.
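Before re-authoring, it can help to list which variables the old HTML template actually used, since each one becomes a named form field. Here is a minimal sketch, assuming Handlebars/Liquid-style `{{ name }}` placeholders. `listTemplateVariables` is a hypothetical helper, not part of any PDFops tooling, and it deliberately ignores block tags; bare keywords such as `{{else}}` would still need filtering by hand.

```javascript
// Hypothetical helper: collect the distinct {{ variable }} names in an
// HTML template so you know which AcroForm fields to create.
function listTemplateVariables(html) {
  const names = new Set();
  // Matches {{ name }} and {{ name | filter }}. Block tags such as
  // {{#each}} or {{/if}} are skipped because "#" and "/" cannot start
  // an identifier.
  const pattern = /\{\{\s*([A-Za-z_][\w.]*)\s*(?:\|[^}]*)?\}\}/g;
  for (const match of html.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names];
}

const html = `
  <h1>Invoice for {{ customer_name }}</h1>
  <p>Total due: {{total | money}}</p>
  <p>Thanks, {{ customer_name }}!</p>
`;
// Logs the two distinct names in first-seen order: customer_name, total.
console.log(listTemplateVariables(html));
```

Each name in that list is a field to create in the PDF, ideally with the same spelling so the data object you already build needs no renaming.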

Once the template has real fields, POST /api/fill-form replaces the entire render step: upload the template plus a JSON object of field values, get back a filled PDF. No HTML string assembly, no CSS, no rendering engine — just data going into boxes that already exist.

The Lambda handler

Node.js 22.x runtime, ESM, no Layer. The template ships inside the deployment zip and is read once at cold start — outside the handler — so a warm invocation does zero filesystem work before calling the API:

// index.mjs — AWS Lambda (Node.js 22.x), Function URL
import { readFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';
import { PdfOps, PdfOpsError } from 'pdfops-sdk';

const here = dirname(fileURLToPath(import.meta.url));
// Bundled in the deployment package — no Layer, no /tmp write.
const template = await readFile(join(here, 'invoice-template.pdf'));

const pdfops = new PdfOps({ apiKey: process.env.PDFOPS_API_KEY });

export const handler = async (event) => {
  if (event.requestContext?.http?.method !== 'POST') {
    return { statusCode: 405, body: 'POST a JSON body: { customer_name, total }' };
  }

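  // A malformed body should be a 400 from us, not an unhandled 500.
  let invoice;
  try {
    invoice = JSON.parse(event.body ?? '{}');
  } catch {
    return { statusCode: 400, body: 'request body must be valid JSON' };
  }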

  try {
    const filled = await pdfops.fillForm(template, {
      customer_name: String(invoice.customer_name ?? ''),
      total: String(invoice.total ?? ''),
    });

    return {
      statusCode: 200,
      headers: { 'content-type': 'application/pdf' },
      body: Buffer.from(filled).toString('base64'),
      isBase64Encoded: true,
    };
  } catch (e) {
    if (e instanceof PdfOpsError) {
      return { statusCode: e.status, body: JSON.stringify({ error: e.code, details: e.message }) };
    }
    throw e;
  }
};

npm install pdfops-sdk, zip index.mjs, node_modules, and invoice-template.pdf together, set PDFOPS_API_KEY as an environment variable, done. Compare that deployment package, a small bundle of pure JavaScript plus one template PDF, to a Lambda Layer carrying a 50-80MB native binary. The size difference alone is most of why cold starts get faster, and on every invocation after that there's no subprocess to spin up before a PDF comes back.

What you get beyond removing the CVE

Filling a form is a narrower operation than rendering HTML, and narrower means more predictable. The same input bytes and the same field values produce byte-identical output every time — no font-substitution drift, no headless-renderer version bump silently reflowing last month's invoices. If that determinism matters for hashing, caching, or diffing PDFs in CI, that's the whole argument in the case for deterministic PDF filling.
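If you want to lean on that determinism in CI, a byte-level comparison is enough. A minimal sketch, assuming you already hold two filled outputs as Uint8Array (for example from two fills with identical inputs). `firstDifference` is a hypothetical helper, not an SDK method.

```javascript
// Return the index of the first byte where two outputs differ,
// or -1 when they are byte-identical.
function firstDifference(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i += 1) {
    if (a[i] !== b[i]) return i;
  }
  // Same prefix: they differ only if one is longer than the other.
  return a.length === b.length ? -1 : shared;
}

const runA = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"
const runB = new Uint8Array([0x25, 0x50, 0x44, 0x46]);
console.log(firstDifference(runA, runB)); // -1
```

Returning the offset rather than a boolean gives a failing CI job something concrete to report.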

Getting a key

The handler above needs a key to run past the anonymous trial. POST /api/signup with an email gets a free one — 250 requests/month, no card, delivered to your inbox (never in the response body, so inbox possession doubles as ownership proof):

curl -X POST https://pdfops.dev/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com"}'

Anonymous calls (no X-API-Key header) still work at 100 requests/IP/month — enough to run the example below before deciding whether to wire in a key.

Try it

Prove the fill step from a terminal before touching Lambda:

curl -X POST https://pdfops.dev/api/fill-form \
  -F "pdf=@invoice-template.pdf" \
  -F 'fields={"customer_name":"Acme Co","total":"$1,250.00"}' \
  -o filled.pdf

filled.pdf opens with both fields populated — that curl call is exactly what the Lambda handler above does under the hood, just wrapped in the SDK's typed fillForm method instead of raw multipart.
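For reference, the same multipart request can be built in JavaScript with only Web-standard globals. This is a sketch of an equivalent request, not the SDK's source. `buildFillRequest` is a hypothetical name, and the form field names (pdf, fields) and the X-API-Key header are taken from the curl example and the signup notes above.

```javascript
const FILL_URL = 'https://pdfops.dev/api/fill-form';

// Build the URL and fetch() options for a fill-form call.
// apiKey is optional: without it the call is anonymous.
function buildFillRequest(templateBytes, fields, apiKey) {
  const body = new FormData();
  body.append('pdf', new Blob([templateBytes], { type: 'application/pdf' }), 'template.pdf');
  body.append('fields', JSON.stringify(fields));

  // Do not set Content-Type by hand: fetch adds the multipart boundary.
  const headers = apiKey ? { 'X-API-Key': apiKey } : {};
  return { url: FILL_URL, init: { method: 'POST', headers, body } };
}

// Usage (the network call stays commented out so the sketch runs offline).
const { url, init } = buildFillRequest(new Uint8Array([1, 2, 3]), { total: '$1,250.00' });
// const resp = await fetch(url, init);
console.log(url, Object.keys(init.headers).length); // anonymous call: no headers set
```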

Assembling a KYC packet on Deno Deploy

PDFops — Tue, 21 Jul 2026 21:12:02 +0000

Compliance owns the KYC application template, not engineering — and they revise it without filing a ticket. Hardcode the field names once and the next template swap either breaks silently (a field goes unfilled) or throws unknown_field in production. The fix is to stop assuming the shape of a PDF you don't control: ask it, at request time, what fields it actually has, fill whatever comes back, then merge the filled application with the applicant's uploaded ID scan into one packet. Here's the whole flow in ~45 lines on Deno Deploy, using the new pdfops-sdk instead of raw fetch calls.

Why inspect first

Every other example on this blog hardcodes a JSON object of field names, which is reasonable when the template PDF is a file you committed to your own repo. A KYC application template isn't that. It's owned by a compliance or ops team who edit it in Acrobat, rename a field, add a new disclosure checkbox, and re-upload it to wherever your app fetches it from, all without a code review on your side. Filling from a hardcoded key list against a template you don't control is a bug waiting for the next revision.

POST /api/inspect answers "what fields does this PDF have right now?" — every AcroForm field's name, type, and (for dropdowns/radios) its options, plus a ready-to-fill fillTemplate object keyed by name. Call it once per request against the live template, and the fill step only ever writes into fields that actually exist. A template revision changes what gets filled, not whether the request throws.
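The "only write into fields that exist" rule can be sketched as a pure function. This assumes fillTemplate is a plain object keyed by field name, as described above. `mapToTemplate` and its missing/unused lists are hypothetical additions for logging template drift, not part of the SDK.

```javascript
// Map submitted data onto the fields a template currently exposes.
// - values:  what to send to the fill step
// - missing: template fields the caller sent no data for
// - unused:  submitted keys the template no longer has
function mapToTemplate(fillTemplate, data) {
  const values = {};
  const missing = [];
  for (const name of Object.keys(fillTemplate)) {
    if (Object.hasOwn(data, name)) values[name] = String(data[name]);
    else missing.push(name);
  }
  const unused = Object.keys(data).filter((key) => !Object.hasOwn(fillTemplate, key));
  return { values, missing, unused };
}

const result = mapToTemplate(
  { applicant_name: '', country: '' },             // shape assumed from inspect
  { applicant_name: 'Ada Lovelace', nickname: 'Ada' },
);
console.log(result.missing, result.unused); // one entry each: country, nickname
```

Logging the missing and unused lists on every request is a cheap way to notice a template revision before a reviewer does.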

Install the SDK, get a key

Every earlier post here uses fetch and FormData directly against the HTTP API — that still works and always will. pdfops-sdk is a thin typed wrapper around the same endpoints: no dependencies, edge-safe (Workers, Vercel Edge, Deno, Bun, Node 18+, browsers), and it turns the multipart plumbing into three method calls.

npm install pdfops-sdk

You need a key to run more than the anonymous trial. A free one costs nothing and needs no card — email in, key out (the key arrives by email, not in the response body, so inbox possession is the verification step):

import { PdfOps } from 'pdfops-sdk';

const trial = new PdfOps(); // no key: 100 requests/IP/month
await trial.signup('you@example.com'); // free key, 250/month, no card — check your inbox

Once the key lands, wire it in and every call meters against your own quota instead of your IP's:

const pdfops = new PdfOps({ apiKey: Deno.env.get('PDFOPS_API_KEY') });

The Deno Deploy handler

The endpoint takes a multipart POST — the applicant's data as a JSON field, their ID scan as a file — pulls the current application template, inspects it, fills only the fields it finds, and merges in the ID scan. No browser, no second runtime, ~45 lines.

// main.ts — Deno Deploy
// Deno needs the npm: prefix unless deno.json maps the bare package name.
import { PdfOps, PdfOpsError } from 'npm:pdfops-sdk';

const pdfops = new PdfOps({ apiKey: Deno.env.get('PDFOPS_API_KEY') });

Deno.serve(async (req: Request) => {
  if (req.method !== 'POST') {
    return new Response('POST multipart/form-data: applicant, id_scan', { status: 405 });
  }

  const form = await req.formData();
  const idScan = form.get('id_scan');
  const applicantRaw = form.get('applicant');
  if (!(idScan instanceof File) || typeof applicantRaw !== 'string') {
    return new Response('expected fields "applicant" (JSON) and "id_scan" (file)', { status: 400 });
  }
  let applicant: Record<string, unknown>;
  try {
    applicant = JSON.parse(applicantRaw);
  } catch {
    return new Response('"applicant" must be valid JSON', { status: 400 });
  }

  // 1. Pull the CURRENT template. Compliance swaps this file on their
  //    own schedule, so the fetch (not a bundled asset) is deliberate.
  const templateResp = await fetch(Deno.env.get('TEMPLATE_URL')!);
  if (!templateResp.ok) {
    return new Response('template fetch failed', { status: 502 });
  }
  const template = new Uint8Array(await templateResp.arrayBuffer());

  try {
    // 2. Ask the template what fields it has right now, instead of
    //    assuming a fixed shape. Inside the try so an inspect failure
    //    is mapped to a status like every other API error.
    const { fillTemplate } = await pdfops.inspect(template);

    // 3. Map applicant data onto whatever fields actually exist. A field
    //    the template dropped is silently skipped; a field the applicant
    //    didn't send stays at the template's default.
    const values: Record<string, string> = {};
    for (const name of Object.keys(fillTemplate)) {
      if (name in applicant) values[name] = String(applicant[name]);
    }

    // 4. Fill the application.
    const filled = await pdfops.fillForm(template, values);

    // 5. Merge the filled application with the uploaded ID scan into
    //    one packet — page order follows array order.
    const idBytes = new Uint8Array(await idScan.arrayBuffer());
    const packet = await pdfops.merge([filled, idBytes]);

    return new Response(packet, { headers: { 'content-type': 'application/pdf' } });
  } catch (e) {
    if (e instanceof PdfOpsError) {
      return new Response(JSON.stringify({ error: e.code, details: e.message }), { status: e.status });
    }
    throw e;
  }
});

deployctl deploy main.ts, set PDFOPS_API_KEY and TEMPLATE_URL as environment variables, and you have a live endpoint. The fillForm and merge calls both return Uint8Array PDF bytes, so you can hand them to storage, a Response, or an email attachment as-is. The handler does hold the template, the filled application, and the ID scan in memory together, which is worth remembering if applicants upload large scans.

Why merge goes last, and what else it can hold

merge([filled, idBytes]) concatenates pages in array order, so the filled application always lands as page 1 — the reviewer opens the packet and reads the application first, attachments after. The array isn't capped at two: a real KYC packet is usually merge([filled, idFront, idBack, proofOfAddress]) — every attachment the applicant uploaded, in the order you want a reviewer to see them, still one POST /api/merge call.
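Since applicants rarely upload every attachment, the array can be built defensively. A minimal sketch: `packetParts` and the attachment keys (idFront, idBack, proofOfAddress) are hypothetical names chosen to match the example above, and the result is what you would pass to merge.

```javascript
// Put the filled application first, then whichever attachments were
// actually uploaded, in the order a reviewer should see them.
function packetParts(filled, attachments) {
  const order = ['idFront', 'idBack', 'proofOfAddress'];
  const extras = order
    .map((key) => attachments[key])
    .filter((bytes) => bytes instanceof Uint8Array && bytes.length > 0);
  return [filled, ...extras];
}

// Usage sketch: const packet = await pdfops.merge(packetParts(filled, uploads));
```

Keeping the order in one array means a change in reviewer preference is a one-line edit rather than a change to the merge call.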

Errors and watching your quota

Every non-2xx response the SDK sees throws a PdfOpsError with the API's stable error slug — the catch block above turns unknown_field, invalid_pdf, or a 429 rate_limited straight into the right HTTP status for the caller, instead of a generic 500. And because the free tier has a real ceiling (250/month per key), check it before you're surprised by one:

const { used, remaining, resets_at } = await pdfops.usage();

That's GET /api/usage under the hood — it reads the same counter the 429 fires on, so the number you show a user is never out of sync with the number that blocks them.
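One way to use that number is to refuse a packet up front when the quota cannot cover it. A sketch under two assumptions that the post does not confirm: each of inspect, fill, and merge counts as one request against the quota, and resets_at is a timestamp string. `packetsLeft` and `canBuildPacket` are hypothetical helpers.

```javascript
// Assumption: one packet = three metered calls (inspect, fill, merge).
const CALLS_PER_PACKET = 3;

// usage is the object returned by pdfops.usage(): { used, remaining, resets_at }.
function packetsLeft(usage) {
  return Math.max(0, Math.floor(usage.remaining / CALLS_PER_PACKET));
}

function canBuildPacket(usage) {
  return packetsLeft(usage) >= 1;
}

const sample = { used: 244, remaining: 6, resets_at: '2026-08-01T00:00:00Z' };
console.log(packetsLeft(sample), canBuildPacket(sample)); // 2 true
```

Checking before the first call avoids the awkward case where inspect and fill succeed and merge is the call that hits the 429.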

Where this fits a real app

Swap the runtime and the trigger, and the same three-call shape (inspect, fill, merge) covers most "combine a filled form with something a user uploaded" flows.

The part that's new here is inspect in the chain — reach for it whenever the template isn't a file you fully control, not just for KYC packets.

Try it

All three endpoints are live. Prove the chain from your terminal before writing a line of Deno:

# 1. What fields does this template have?
curl -X POST https://pdfops.dev/api/inspect -F "pdf=@application.pdf"

# 2. Fill it
curl -X POST https://pdfops.dev/api/fill-form \
  -F "pdf=@application.pdf" \
  -F 'fields={"applicant_name":"Ada Lovelace","country":"UK"}' \
  -o filled.pdf

# 3. Merge with the ID scan
curl -X POST https://pdfops.dev/api/merge \
  -F "pdf=@filled.pdf" -F "pdf=@id-scan.pdf" \
  -o packet.pdf

You get 100 keyless requests per IP per month, or pdfops.signup('you@example.com') / the form at /pricing for a free key (250/mo, no card — delivered by email).

Building a KYC or onboarding flow and something about inspect's field-type coverage doesn't fit yours? Drop a note on the feedback form — it's the fastest way to influence what ships next.

Fill a PDF form inside a Cloudflare Worker — no Chromium, no Lambda

PDFops — Tue, 30 Jun 2026 22:50:40 +0000

You're building on Cloudflare Workers and you need to write values into a PDF form — an invoice, a contract, a government form. You reach for the thing you used last time: headless Chrome, or a PDF library bundled into a Lambda. On Workers, neither one fits. Chrome won't run in a V8 isolate, and standing up a Lambda just to render a PDF drags a whole second runtime into an app that didn't need one. The fill itself is one HTTP call. Here's the minimal Worker, and the reason the hosting substrate — not the PDF code — is the part that actually matters.

Why headless Chrome doesn't fit Workers

The default way to "make a PDF in JavaScript" for the last decade has been Puppeteer driving headless Chrome: render HTML, call page.pdf(), done. It works on a normal Node server or a fat Lambda with a Chromium layer. It does not work on Cloudflare Workers, and the reason is architectural, not a missing flag. A Worker runs in a V8 isolate — the same engine Chrome uses, but with no operating system underneath it. There's no filesystem, no ability to spawn a subprocess, and a hard memory and CPU-time budget per request. Headless Chrome is an entire browser binary that expects all of those things. You can't bundle a ~150 MB browser into a Worker, and even if you could, there's nothing to exec() it.

AcroForm filling — writing values into a form-enabled PDF's existing fields — doesn't need a browser at all. The fields are already defined in the PDF; you're setting their values and flattening, which is pure byte manipulation. The work is a poor match for a render engine and a perfect match for a stateless function. The only question is where that function runs.

Why not just put it on Lambda

The usual escape hatch is "keep the PDF work on AWS Lambda and call it from the Worker." That works, but look at what it costs you. You're now running two runtimes for one feature: the Worker that owns the request, and a Lambda that exists only to hold a PDF library. You inherit Lambda's cold starts on a path your edge app was specifically built to keep fast. You're routing edge → us-east-1 → edge for every document, so a user in Sydney pays a trans-Pacific round trip to fill a one-page form. And you've split your deploy: two log streams, two IAM surfaces, two things to keep in sync. None of that is about PDFs. It's all substrate drag, bolted onto an app that chose Workers to avoid exactly this.

The alternative is to treat the fill as what it is — a stateless transform — and call a hosted endpoint that runs on the same kind of globally-distributed substrate your Worker already lives on. The Worker stays the only thing you deploy.

The Worker

Here's the whole thing. It takes a JSON body of field values, fetches an AcroForm template from R2, calls /api/fill-form, and returns the filled PDF. No browser, no second runtime, ~35 lines.

// src/index.ts — Cloudflare Worker (module syntax)
export interface Env {
  TEMPLATES: R2Bucket;        // bucket holding your blank AcroForm PDFs
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    if (req.method !== 'POST') {
      return new Response('POST a JSON body of field values', { status: 405 });
    }

    // 1. The values to write into the form's named fields.
    const fields = await req.json<Record<string, string>>();

    // 2. Pull the blank template from R2 (a binding read, not cached for you).
    const obj = await env.TEMPLATES.get('invoice-template.pdf');
    if (!obj) return new Response('template missing', { status: 500 });
    const templatePdf = await obj.arrayBuffer();

    // 3. One call to PDFops. Field keys must match the PDF's AcroForm
    //    field names — use /tools/inspect to list them if you're unsure.
    const fd = new FormData();
    fd.append('pdf', new Blob([templatePdf], { type: 'application/pdf' }), 'template.pdf');
    fd.append('fields', JSON.stringify(fields));

    const resp = await fetch('https://pdfops.dev/api/fill-form', { method: 'POST', body: fd });
    if (!resp.ok) {
      return new Response(`fill failed: ${await resp.text()}`, { status: 502 });
    }

    // 4. Stream the filled PDF straight back to the caller.
    return new Response(resp.body, {
      headers: { 'Content-Type': 'application/pdf' },
    });
  },
};

That's the production shape. Bind an R2 bucket named TEMPLATES in wrangler.toml, drop a form-enabled PDF into it, wrangler deploy, and POST a JSON object of field values. You get a filled PDF back with no second service in the picture. The resp.body stream means the Worker never buffers the whole document in memory — it pipes PDFops' response through, which keeps you well inside the isolate's memory budget even for large forms.

Two things worth knowing. The field keys have to match the names baked into the PDF's AcroForm; if you're not sure what they are, the free Form-Field Inspector lists every field name in any PDF you drop on it. And a get() through the R2 binding reads from the bucket on every request rather than from Cloudflare's cache, so if the template read shows up in your latency, put the Cache API or a module-scope copy in front of it.
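One way to keep the template read off the steady-state path is to hold the bytes in module scope, which survives between requests served by the same isolate. A minimal sketch: `memoizeAsync` is a hypothetical helper, and the commented usage assumes the TEMPLATES binding from the Worker above.

```javascript
// Wrap an async loader so it runs once and every later call reuses the
// same promise. A failed load clears the slot, so the next request retries.
function memoizeAsync(load) {
  let pending;
  return (...args) => {
    if (!pending) {
      pending = load(...args).catch((err) => {
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical use in the Worker (env comes from the fetch handler):
// const getTemplate = memoizeAsync(async (env) => {
//   const obj = await env.TEMPLATES.get('invoice-template.pdf');
//   if (!obj) throw new Error('template missing');
//   return obj.arrayBuffer();
// });
// const templatePdf = await getTemplate(env);
```

The trade-off is staleness: a template replaced in the bucket is only picked up when the isolate is recycled, so this fits templates you version by file name better than ones edited in place.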

Where this fits a real app

The bare Worker above is the primitive. In practice the trigger is usually a webhook or a queue, and the output goes to storage plus an email. Those are the same pattern with more wiring around the fill step.

In every one of these the fill is the same single fetch. What changes is the trigger and the destination — never the PDF substrate.

When this pattern doesn't fit

Try it

The endpoint is live and works against any AcroForm PDF. Before you even write the Worker, prove the fill from your terminal:

curl -X POST https://pdfops.dev/api/fill-form \
  -F "pdf=@invoice-template.pdf" \
  -F 'fields={"customer_name":"Acme Corp","invoice_no":"INV-1042","amount_due":"$2,400.00"}' \
  -o filled-invoice.pdf

You'll get the filled PDF back. From there the Worker above is just that same call wrapped in a fetch handler. During beta it's 100 requests per IP per month, free, no signup.

Workers-specific questions, a binding that's fighting you, or an endpoint you wish existed? Drop a note on the waitlist form — the message field is the fastest way to influence what ships next.

Fill a PDF in JavaScript — in the browser or via API

PDFops — Sat, 13 Jun 2026 18:03:40 +0000

JavaScript gives you a choice no other language quite does: you can fill a PDF entirely in the browser, so the file never leaves the user’s machine, or you can fill it server-side when the output must be authoritative. This page shows both with runnable code, and is honest about when each is the right call.

In the browser, client-side (the file never uploads)

With pdf-lib the whole fill runs in the page. Read a chosen file into bytes, set the AcroForm fields, and offer the result as a download — nothing is sent anywhere:

import { PDFDocument } from "pdf-lib";

const input = document.querySelector("#pdf");      // <input type="file">
input.addEventListener("change", async () => {
  const bytes = await input.files[0].arrayBuffer();
  const doc = await PDFDocument.load(bytes);
  const form = doc.getForm();
  form.getTextField("customer_name").setText("Acme Co");
  form.getTextField("invoice_total").setText("$1,250.00");
  // form.flatten();  // optional: bake values in

  const out = await doc.save();
  const url = URL.createObjectURL(new Blob([out], { type: "application/pdf" }));
  Object.assign(document.createElement("a"), { href: url, download: "filled.pdf" }).click();
});

This is the right default for anything sensitive — tax forms, contracts, medical intake — because the document stays on the device. You can see exactly this pattern, fill and merge, running live in the PDFops playground, which does all its work client-side.

Server-side, via one `fetch`

When the filled PDF needs to be authoritative — generated from data the browser doesn’t have, or produced where the client can’t be trusted to make the canonical copy — fill it through the API. The same fetch works from a Worker, an edge function, or a Node backend:

const form = new FormData();
form.append("pdf", fileBlob, "template.pdf");      // a Blob/File of the template
form.append("fields", JSON.stringify({
  customer_name: "Acme Co",
  invoice_total: "$1,250.00",
}));

const resp = await fetch("https://pdfops.dev/api/fill-form", {
  method: "POST",
  body: form,
});
if (!resp.ok) throw new Error(`fill failed: ${resp.status}`);
const filled = await resp.arrayBuffer();           // the filled PDF bytes

In production, call this from your own backend or edge function rather than directly from the browser — that keeps rate limits and keys (post-beta) under your control and out of end-user hands. During beta there’s no key, so a direct browser call is fine for a prototype. Use the Form-Field Inspector to get the exact field names for your fields object.

Browser or server: how to choose

	Client-side (pdf-lib in browser)	Server-side (PDFops API)
File leaves the device	No — stays in the browser	Yes — sent to the API to fill
Authoritative output	Client-produced (trust the client)	Server-produced (canonical)
Fill from server-only data	No — only what the page has	Yes
Form internals you manage	Typed accessors, flatten, appearances	None — handled server-side
Merge available	Yes — pdf-lib copyPages	Yes — /api/merge
Determinism	Deterministic (no AI)	Deterministic, audit-safe
Best for	Privacy-first, zero-upload, instant preview	Canonical records, server data, store/sign/merge

The honest summary: if privacy or zero-upload is the point, fill client-side with pdf-lib — it’s free and the file never moves. If the output must be the system-of-record copy, or you fill from data the browser doesn’t hold, fill server-side. A common shape is both: a fast client-side preview, then a server-side canonical fill on submit.

Merging PDFs in JavaScript too

Server-side, merge is the same one-call primitive:

const form = new FormData();
for (const blob of pdfBlobs) form.append("pdfs", blob);
const merged = await (await fetch("https://pdfops.dev/api/merge", {
  method: "POST", body: form,
})).arrayBuffer();

Client-side, pdf-lib merges with copyPages into a fresh document. Either way the fill-then-merge flow stays deterministic end to end. Endpoint details: fill-form docs, merge docs.

Frequently asked

How do I fill a PDF form in the browser with JavaScript?

Read the file into an ArrayBuffer, then use pdf-lib client-side: load the bytes, getForm(), set field values, save() back to bytes you offer as a download. Everything runs in the browser, so the PDF never leaves the device. The playground does exactly this, live.

Should I fill in the browser or on the server?

Browser when privacy or zero-upload matters — the file stays on the device. Server when the output must be authoritative, when you fill from data the browser lacks, or when you also store/sign/merge server-side. Many apps do both: client-side preview, server-side canonical fill.

Can I call the PDFops API from browser JavaScript?

Yes — it's a normal fetch with FormData. For production, call it from your own backend or edge function so you control rate limits and keys (post-beta) and don't expose usage to end users. During beta there's no key, so a direct browser call works for prototypes.

Is pdf-lib enough to fill a PDF in JavaScript?

Often yes — pdf-lib is pure JS, runs in browsers and Node, and fills AcroForm fields well. You own the form internals (typed accessors, flatten, appearance/font edge cases). PDFops is built on a modernized pdf-lib fork and offers that engine as a deterministic hosted API for when you'd rather not own those edge cases or need server-authoritative output.

Does filling change the bytes unpredictably?

It shouldn't. Both pdf-lib and the PDFops API fill deterministically — same template plus same values yields the same field-level output, no AI in the path. Outputs stay diffable and audit-safe. The argument is in this essay.

Try it in 30 seconds

Fill and merge a real PDF client-side, right now, in the playground — no signup. On another stack? See fill a PDF in Python and fill a PDF in Node.

If the deterministic fill + merge primitive fits your usage, join the waitlist and tell me the forms you fill most — that signal is what the pricing tiers and the in-function library get built around.

Fill a PDF in Node.js — one fetch, no headless browser

PDFops — Sat, 13 Jun 2026 17:57:32 +0000

To fill a PDF form from Node you can run it in-process with pdf-lib, or POST the template to an HTTP API and get the filled PDF back. On Node 18+ the HTTP path needs zero dependencies (fetch and FormData are global), and the same code runs unchanged on Workers, Lambda, and edge functions. Here are both, with an honest read on which fits.

The fastest path: one `fetch`

PDFops fills AcroForm fields server-side. From modern Node it is a few lines, no packages installed:

import { readFile, writeFile } from "node:fs/promises";

const form = new FormData();
form.append("pdf", new Blob([await readFile("template.pdf")]), "template.pdf");
form.append("fields", JSON.stringify({
  customer_name: "Acme Co",
  invoice_total: "$1,250.00",
  paid: "Yes",
}));

const resp = await fetch("https://pdfops.dev/api/fill-form", {
  method: "POST",
  body: form,
});
if (!resp.ok) throw new Error(`fill failed: ${resp.status}`);
await writeFile("filled.pdf", Buffer.from(await resp.arrayBuffer()));

The keys in the fields object must match the AcroForm field names in the template. Run the PDF through the Form-Field Inspector to see the exact names and types. No API key or signup during beta.

Doing it in-process: pdf-lib

The local route is pure JavaScript and keeps everything in-process — no network call:

import { readFile, writeFile } from "node:fs/promises";
import { PDFDocument } from "pdf-lib";

const doc = await PDFDocument.load(await readFile("template.pdf"));
const form = doc.getForm();
form.getTextField("customer_name").setText("Acme Co");
form.getTextField("invoice_total").setText("$1,250.00");
// form.flatten();  // optional: bake values in, drop interactivity
await writeFile("filled.pdf", await doc.save());

This is a genuinely good library and for many backends it is the right answer. The work you take on is the form internals: getting the right typed accessor per field (getTextField vs getCheckBox vs getRadioGroup), deciding when to flatten(), and handling appearance/font cases on unusual templates. PDFops is built on a modernized fork of pdf-lib, so the API path is essentially that same engine offered as a managed, deterministic service — you trade a network call for not owning the edge cases.
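The typed-accessor work mentioned above can be sketched as one dispatch function. It assumes `form` is a pdf-lib PDFForm (from doc.getForm()) and that you record a `kind` for each field yourself, for example from a field inspector. `setField` and the kind labels are hypothetical; the accessor and setter names are pdf-lib's.

```javascript
// Set one field using the pdf-lib accessor that matches its kind.
function setField(form, { name, kind, value }) {
  switch (kind) {
    case 'text':
      form.getTextField(name).setText(String(value));
      break;
    case 'checkbox':
      // Checkboxes are toggled, not given a text value.
      if (value) form.getCheckBox(name).check();
      else form.getCheckBox(name).uncheck();
      break;
    case 'radio':
      form.getRadioGroup(name).select(String(value));
      break;
    case 'dropdown':
      form.getDropdown(name).select(String(value));
      break;
    default:
      throw new Error(`unsupported field kind "${kind}" for "${name}"`);
  }
}

// Usage sketch with a real document:
// const form = doc.getForm();
// setField(form, { name: 'customer_name', kind: 'text', value: 'Acme Co' });
// setField(form, { name: 'paid', kind: 'checkbox', value: true });
```

Failing loudly on an unknown kind is deliberate: a silently skipped field is the harder bug to find in a filled record.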

Why this matters on serverless and the edge

The reason the fetch version is interesting isn’t brevity — it’s portability. Because the call is plain fetch + FormData, both Web-standard, the identical code runs on Cloudflare Workers, Vercel Edge, Deno Deploy, and Bun, not only Node. There is no headless Chromium to bundle and no native addon to compile, which is exactly what tends to block PDF tooling on those platforms.

	PDFops (HTTP)	pdf-lib (in-process)
Dependencies on Node 18+	None — global fetch/FormData	pdf-lib (pure JS, no native addons)
Runs on Workers / Edge / Deno / Bun	Yes — same code, unchanged	Yes — pure JS, but you bundle it
Network call required	Yes — one POST per fill	No — fully in-process
Form internals you manage	None — handled server-side	Typed accessors, flatten, appearances
Merge in the same tool	Yes — /api/merge	Yes — copyPages + save
Determinism	Deterministic, audit-safe	Deterministic (no AI either)
Best for	No-dependency serverless, fill+merge as a service	In-process control, no-network constraints

The honest summary: if you want the fill fully in-process and don’t mind owning form internals, pdf-lib is an excellent, free choice. If you want one deterministic API for fill and merge, identical behavior from local Node to the edge, and nothing to bundle or compile, the HTTP call is worth the round-trip.

Merging PDFs from Node too

The same primitive merges several PDFs into one in a single call:

const form = new FormData();
for (const path of ["a.pdf", "b.pdf", "c.pdf"]) {
  form.append("pdfs", new Blob([await readFile(path)]), path);
}
const resp = await fetch("https://pdfops.dev/api/merge", { method: "POST", body: form });
if (!resp.ok) throw new Error(`merge failed: ${resp.status}`);
await writeFile("merged.pdf", Buffer.from(await resp.arrayBuffer()));

A common backend shape is fill-then-merge: fill several templates, then concatenate the results into one document to deliver. Both halves are the same deterministic primitive, so the combined output is reproducible. Endpoint details: fill-form docs, merge docs.

Frequently asked

How do I fill a PDF form in Node.js?

Either run pdf-lib in-process to set AcroForm values, or build a FormData with the template plus a JSON field map and POST it. On Node 18+, fetch/FormData/Blob are global, so the HTTP fill needs no dependencies and runs unchanged on Workers, Lambda, and edge functions.

Should I use pdf-lib or PDFops?

pdf-lib when you want the fill fully in-process with no network call and are comfortable managing form internals. PDFops when you want one deterministic API for fill and merge, identical behavior across Node and edge, and nothing to bundle. PDFops is built on a modernized pdf-lib fork, so the API gives you that engine as a managed service.

Can I fill a PDF on Cloudflare Workers or Vercel Edge?

Yes — the call is plain fetch + FormData, both Web-standard, so the same code runs on Workers, Vercel Edge, Deno Deploy, and Bun. No headless browser, no native addon — which is the usual blocker for PDF tooling on those platforms.

Do I need a library to call PDFops from Node?

No. On Node 18+, fetch/FormData/Blob are global, so a fill is a few lines with zero dependencies. On older Node, add undici or node-fetch plus form-data, or just upgrade the runtime. There is no SDK to install during beta — the API is plain HTTP.

Is the fill deterministic and audit-safe?

Yes — same template plus same field values yields the same field-level output, no AI inference in the path. Outputs are diffable and reproducible, which matters when a filled PDF is a record you may need to defend. The argument is in this essay.

Try it in 30 seconds

No API key, no signup during beta. Paste the snippet above into a Node script, or explore fields in the playground. On another stack? See fill a PDF in Python and fill a PDF in JavaScript.

Fill a PDF in Python — one HTTP call, no native deps

PDFops — Sat, 13 Jun 2026 17:57:12 +0000

If you need to fill a PDF form from Python, you have two honest options: drive a native library like pypdf or fillpdf, or POST the template to an HTTP API and get the filled PDF back. This page shows the HTTP path in full, then makes the case for when the native library is the better call — because sometimes it is.

The fastest path: one `requests.post`

PDFops fills AcroForm fields server-side on the edge. From Python it is one request — no pdftk, no poppler, nothing to install beyond requests:

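import json

import requests

# Keys must match the AcroForm field names in the template.
fields = {
    "customer_name": "Acme Co",
    "invoice_total": "$1,250.00",
    "paid": "Yes",
}

with open("template.pdf", "rb") as f:
    resp = requests.post(
        "https://pdfops.dev/api/fill-form",
        files={"pdf": f},                     # the template, sent as multipart
        data={"fields": json.dumps(fields)},  # the field map, JSON-encoded
        timeout=30,                           # requests has no default timeout
    )

resp.raise_for_status()  # fail on 4xx/5xx instead of saving an error body as a PDF
with open("filled.pdf", "wb") as out:
    out.write(resp.content)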

The field names in the dict must match the AcroForm field names in the template. If you are not sure what those are, drop the PDF into the Form-Field Inspector — it lists every field name and type, so your keys line up on the first try. No API key or signup is required during beta.
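A small guard before the POST can catch mismatched keys early. The sketch below assumes you have copied the template's field names out of the Inspector by hand; `check_field_keys` and the example names are hypothetical helpers for illustration, not part of PDFops.

```python
def check_field_keys(template_fields, values):
    """Compare the keys you plan to send against the template's field names.

    template_fields: iterable of AcroForm field names (e.g. copied from the Inspector).
    values: dict of field name -> value that will be sent as the `fields` JSON.
    Returns (unknown, unfilled): keys the template lacks, and fields left without a value.
    """
    known = set(template_fields)
    sent = set(values)
    unknown = sorted(sent - known)
    unfilled = sorted(known - sent)
    return unknown, unfilled


# Hypothetical field names for an invoice template.
template_fields = {"customer_name", "invoice_total", "paid"}
values = {"customer_name": "Acme Co", "invoice_totl": "$1,250.00"}  # note the typo

unknown, unfilled = check_field_keys(template_fields, values)
print(unknown)   # ['invoice_totl']
print(unfilled)  # ['invoice_total', 'paid']
```

Whether an unfilled field is an error depends on the form, so the helper only reports both lists and leaves the decision to the caller.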

Doing it natively: pypdf and fillpdf

The pure-Python route is real and worth knowing. pypdf fills AcroForm fields without any system binaries:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("template.pdf")
writer = PdfWriter()
writer.append(reader)
writer.update_page_form_field_values(
    writer.pages[0],
    {"customer_name": "Acme Co", "invoice_total": "$1,250.00"},
    auto_regenerate=False,
)
with open("filled.pdf", "wb") as out:
    writer.write(out)

This works, and for many templates it is all you need. The friction shows up at the edges: some viewers won’t render filled values until you set the NeedAppearances flag or regenerate appearance streams; checkboxes and radio groups need the exact on-state name, not True; and embedded-font edge cases can drop characters. None of these are dealbreakers — they are simply PDF-internals work you now own.
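The checkbox point is the easiest of these to trip over, so here is a minimal sketch of one way to handle it: convert Python booleans to on-state names before filling. The helper name, the `on_states` table, and the use of "Off" for unchecked boxes are assumptions for illustration; the real on-state names come from your template, and the exact string a given tool expects (for example with or without a leading slash) is worth checking against its docs.

```python
def to_field_values(record, on_states, off_state="Off"):
    """Convert Python values into the strings a form fill expects.

    record: dict of field name -> Python value (str, number, or bool).
    on_states: dict of checkbox field name -> that field's on-state name
               (template-specific; look it up, do not guess).
    off_state: string used for unchecked boxes (assumed "Off" here).
    """
    out = {}
    for name, value in record.items():
        if isinstance(value, bool):
            if name not in on_states:
                raise KeyError(f"no on-state known for checkbox field {name!r}")
            out[name] = on_states[name] if value else off_state
        else:
            out[name] = str(value)
    return out


# Hypothetical template: 'paid' and 'overdue' are checkboxes with different on-states.
fields = to_field_values(
    {"customer_name": "Acme Co", "paid": True, "overdue": False},
    on_states={"paid": "Yes", "overdue": "On"},
)
print(fields)  # {'customer_name': 'Acme Co', 'paid': 'Yes', 'overdue': 'Off'}
```

Raising on an unknown checkbox is deliberate: a silent `True` written into a checkbox field is the failure the paragraph above describes.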

fillpdf wraps pdfrw with a friendlier API and can flatten, but flattening pulls in pdftk (and often poppler), which is the usual snag inside containers, Lambda, and other serverless runtimes where shipping system binaries is awkward.

When PDFops fits, when a local library fits

	PDFops (HTTP)	pypdf / fillpdf (local)
Native dependencies	None — just `requests`	pypdf: none; fillpdf: pdftk/poppler to flatten
Serverless / edge friendly	Yes — nothing to provision	pypdf: yes; fillpdf: painful (binaries)
Network call required	Yes — one POST per fill	No — fully local/offline
Appearance-stream quirks	Handled server-side	You own NeedAppearances, fonts, checkboxes
Merge in the same tool	Yes — /api/merge	pypdf can merge; fillpdf cannot
Determinism	Deterministic, audit-safe	Deterministic (no AI either)
Best for	Serverless backends, no-binary stacks, fill+merge at scale	Offline jobs, no-network constraints, full local control

The honest summary: if your code already runs somewhere you control with no network constraint and you don’t mind owning PDF internals, pypdf is a fine, free, dependency-light choice. If you are on serverless/edge, want fill and merge behind one deterministic API, or simply don’t want to debug appearance streams, the HTTP call earns its keep.

Merging PDFs in Python too

The same primitive merges. Concatenate several PDFs into one in a single call:

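from contextlib import ExitStack

import requests

paths = ["a.pdf", "b.pdf", "c.pdf"]
with ExitStack() as stack:
    # Open every input; ExitStack closes them all once the request is done.
    files = [("pdfs", stack.enter_context(open(p, "rb"))) for p in paths]
    resp = requests.post("https://pdfops.dev/api/merge", files=files)

resp.raise_for_status()
with open("merged.pdf", "wb") as out:
    out.write(resp.content)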

A common shape is fill-then-merge: fill several AcroForm templates from your Python backend, then merge the results into one document to deliver. Both halves are the same deterministic primitive, so the combined output is reproducible end to end. The endpoint details are in the fill-form docs and the merge docs.
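The fill-then-merge shape can be sketched as one function. This is an illustration under assumptions: it reuses the two endpoints and multipart field names from the snippets above, `fill_then_merge` is a hypothetical helper, and the HTTP call is passed in as `post` so the flow can be exercised without a network.

```python
import json


def fill_then_merge(jobs, post, base_url="https://pdfops.dev/api"):
    """Fill each (template_bytes, fields) job, then merge the results in order.

    post: a callable with the same shape as requests.post (injected so the
          flow can be tested with a stub).
    Returns the merged PDF as bytes.
    """
    filled = []
    for template_bytes, fields in jobs:
        resp = post(
            f"{base_url}/fill-form",
            files={"pdf": ("template.pdf", template_bytes, "application/pdf")},
            data={"fields": json.dumps(fields)},
            timeout=30,
        )
        resp.raise_for_status()
        filled.append(resp.content)

    # One "pdfs" part per filled document, in the order the jobs were given.
    parts = [
        ("pdfs", (f"part-{i}.pdf", pdf, "application/pdf"))
        for i, pdf in enumerate(filled)
    ]
    resp = post(f"{base_url}/merge", files=parts, timeout=30)
    resp.raise_for_status()
    return resp.content
```

With requests installed, a call looks like `fill_then_merge(jobs, requests.post)`, where `jobs` is a list of `(template_bytes, fields_dict)` pairs. Any failed fill raises before the merge is attempted, so a partial packet is never produced.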

Frequently asked

How do I fill a PDF form in Python?

Either drive a native library (pypdf, fillpdf) that writes AcroForm values from Python, or POST the template plus a JSON map of field values to an HTTP API and get the filled PDF back. The native route is fully local; the HTTP route needs no system binaries, which is why it suits serverless and edge runtimes where installing pdftk or poppler is painful.

Should I use pypdf or PDFops?

pypdf when you want zero network calls and full local control and are happy handling appearance-stream quirks yourself. PDFops when you are on serverless/edge with no easy way to ship binaries, want one deterministic API for fill and merge, or would rather not own the PDF-internals edge cases. Many teams prototype with pypdf and move the hot path to the API once the quirks cost time.

Can I fill a PDF in Python without pdftk or poppler?

Yes. pypdf is pure Python and needs no binaries to fill. fillpdf needs pdftk/poppler to flatten, which is the usual container/serverless snag. The PDFops API needs nothing installed locally — the fill runs server-side — so it is a common pick exactly when pdftk and poppler are awkward to provision.

Is the fill deterministic?

Yes — the same template plus the same field values yields the same field-level output, with no AI inference in the path. Outputs are diffable and audit-safe, which matters for regulated documents where identical inputs must produce identical PDFs. The longer argument is in this essay.

What if my PDF has no form fields?

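Filling requires AcroForm fields to exist. If a template has none, add them once in a form editor such as Acrobat or LibreOffice Draw (exporting with PDF form creation enabled), then fill that template repeatedly. Tools that only fill existing fields, such as Mac Preview and pdftk, cannot create them. The free Inspector lists the field names a PDF exposes so your Python dict keys match them.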

Try it in 30 seconds

No API key, no signup during beta. Fill a real template right now with the snippet above, or explore fields visually in the playground. Filling from another stack? See fill a PDF in Node and fill a PDF in JavaScript.

The case for deterministic PDF filling

PDFops — Sat, 13 Jun 2026 05:28:20 +0000

AI can read almost any document now. The harder question is what writes the answer back, and for anything an auditor might ever look at, that write step should not be a language model.

A document workflow has two halves

Most real document automation is a loop: read data out of one document, then write it into another. Read a scanned invoice, write the numbers into your ledger. Read an onboarding packet, write the values into a W-9. Read a claim, write an ACORD form.

The read half is having its moment. Vision-language models are genuinely good at pulling structured data out of messy, never-before-seen documents, and a wave of strong APIs (Extend, Reducto, LlamaParse, the hyperscalers’ document-AI services) has made it a solved-enough problem. If you need to understand an arbitrary PDF, reach for one of those.

The write half is a different problem with a different failure mode — and it’s the half people are quietly bolting an LLM onto because it’s adjacent. That’s the mistake.

Why an LLM shouldn’t fill your W-9

A model that fills a form “mostly” right is worse than useless on the documents that matter. It can misread a field label, conflate two values, or put the correct number in the wrong box. On a marketing one-pager, who cares. On a 1099, an insurance ACORD form, a healthcare pre-authorization, a tax filing — that’s not a typo, it’s a compliance incident.

And here’s the part that doesn’t get said enough: if a filled value can’t be traced to a deterministic rule, it can’t be defended in an audit. “The model was 97% confident” is not an answer when a regulator asks why field 14b says what it says. A probabilistic write step turns every filled form into something you have to trust rather than verify.

Determinism is a feature, not a limitation

A deterministic fill is boring on purpose: field customer_name maps to value "Acme Co", every single time, and you can point at the exact mapping that produced it. Same input, same output, forever — reviewable, diffable, testable, defensible.
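One way to make "point at the exact mapping" concrete is to log a fingerprint of each fill's inputs next to the filled PDF. This is a local sketch, not a PDFops feature: the function name and the choice of SHA-256 over canonical JSON are assumptions made for illustration.

```python
import hashlib
import json


def fill_fingerprint(template_bytes, fields):
    """Return a stable SHA-256 hex digest identifying one fill's inputs.

    The field map is serialized canonically (sorted keys, fixed separators)
    so that key order in the dict does not change the digest.
    """
    canonical = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    h = hashlib.sha256()
    h.update(hashlib.sha256(template_bytes).digest())  # which template
    h.update(canonical.encode("utf-8"))                # which values
    return h.hexdigest()


template = b"%PDF-1.7 placeholder template bytes"
a = fill_fingerprint(template, {"customer_name": "Acme Co", "paid": "Yes"})
b = fill_fingerprint(template, {"paid": "Yes", "customer_name": "Acme Co"})
c = fill_fingerprint(template, {"customer_name": "Acme Inc", "paid": "Yes"})

print(a == b)  # True: same inputs, same fingerprint regardless of key order
print(a == c)  # False: one changed value changes the fingerprint
```

Stored alongside the output, the digest lets a reviewer confirm later which template and which values produced a given document without trusting anyone's memory.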

The tell is that even the AI-fill vendors know this. The same platforms shipping “fill any form with AI” also ship a deterministic, template-based mode — precisely because the instruction/LLM mode isn’t trusted for the forms where being wrong is expensive. When the stakes are real, everyone reaches for the deterministic path.

The write step the AI wave actually needs

The clean architecture isn’t “AI does everything.” It’s a division of labor that matches each half to the right tool:

Extract with AI — probabilistic, flexible, great for unseen and messy documents. This is where the model earns its keep.
Fill deterministically — a template plus a JSON of field → value, applied exactly, with no model anywhere in the fill path. The output is auditable by construction.

That second step is what PDFops is. You hand it an AcroForm template and a JSON object; it fills the fields exactly as specified, merges the result with any other PDFs you need, and returns the bytes — running on the V8 edge, no headless browser, no model in the loop. It’s the deliberately boring write hand that the clever AI read step can hand off to.
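The hand-off between the two halves can be sketched as an explicit rule table. Everything here is hypothetical: the extracted keys, the `FIELD_RULES` mapping, and `build_fields` are illustrative names, and the extraction step is stood in for by a plain dict.

```python
# Hypothetical rule table: AcroForm field name -> key in the extracted record.
FIELD_RULES = {
    "name": "legal_name",
    "tin": "tax_id",
    "tax_classification": "entity_type",
}


def build_fields(extracted, rules=FIELD_RULES):
    """Apply the rule table exactly; fail loudly instead of guessing."""
    missing = sorted(src for src in rules.values() if src not in extracted)
    if missing:
        raise ValueError(f"extracted record is missing: {missing}")
    return {field: str(extracted[src]) for field, src in rules.items()}


# Pretend this dict came from an extraction step; keys with no rule are ignored.
extracted = {
    "legal_name": "Acme Co",
    "tax_id": "12-3456789",
    "entity_type": "C Corporation",
    "confidence": 0.97,
}
print(build_fields(extracted))
# {'name': 'Acme Co', 'tin': '12-3456789', 'tax_classification': 'C Corporation'}
```

The model's output only ever reaches the form through a rule a reviewer can read, which is the property the division of labor is meant to protect.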

When you should reach for AI fill

To be fair to the other side: if you’re filling arbitrary, never-seen forms with no template — a long tail of one-off PDFs you can’t pre-map — a vision model is the only thing that works, and the AI-fill APIs are good at it. The deterministic path assumes you have, or can make, a template for the form.

But most of what businesses actually fill is not a long tail. It’s the same few dozen recurring, regulated, high-stakes forms — tax, insurance, HR, healthcare, real estate — over and over. For those, you already have the template, and the right write step is the deterministic one.

See it on your own PDF

The fastest way to feel the difference: drop one of your form PDFs into the Form-Field Inspector. It lists every AcroForm field — name, type, options — and hands you the exact fields JSON and API call to fill it. No signup, no model, no guessing:

curl -X POST https://pdfops.dev/api/fill-form \
  -F "pdf=@w9-template.pdf" \
  -F 'fields={"name":"Acme Co","tin":"12-3456789","tax_classification":"C Corporation"}' \
  -o filled.pdf

Same fields in, same PDF out, every run. If that’s the write step your pipeline needs, the fill-form docs are the next stop — and the waitlist is where to tell me about your volume and the forms you fill most.
