You generated a "fillable" PDF. It looks perfect in Chrome. Then a user opens it in Firefox and the checkboxes are blank, the text fields show nothing until clicked, and one field renders its value an inch to the left. You did not change anything. Two PDF viewers just disagreed about what your form means.
That is the first trap with PDF forms: a field is not "filled" until it renders the same way across viewers. The PDF spec lets a viewer either trust the appearance stream you baked in or regenerate its own, and the two paths drift apart constantly. If you do not control the appearance stream, you are at the mercy of whichever engine opens the file.
The second trap is licensing. Reach for the well-known Python PDF tools and you hit a wall fast: the strongest ones are AGPL (viral copyleft you cannot ship inside a closed product without consequences), or they are paid, or they are a cloud API you have to send documents to. For a lot of teams, "send our forms to someone else's server" is a non-starter, and "relicense our whole app" is worse.
I wanted a small library that does one job: take a flat, non-interactive PDF and turn it into a real, fillable AcroForm, deterministically, locally, with a permissive license. That is acroforge.
pip install acroforge
Apache-2.0, Python 3.11+, no network calls, no AI, no cloud.
The 5-line core
Three functions. They all take bytes and return bytes, so they compose into any pipeline.
import acroforge as af
from acroforge import FieldSpec, FieldType
fields = [FieldSpec(type=FieldType.TEXT, page=0, rect=(200, 700, 450, 730), name="full_name")]
fillable = af.build(flat_pdf_bytes, fields) # inject real AcroForm fields
filled = af.fill(fillable, {"full_name": "Jane Doe"}) # set values by name
final = af.flatten(filled) # bake appearances, lock the form
-
buildinjects standards-compliant interactive fields at the coordinates you specify. -
fillsets field values by name. -
flattenbakes the field appearances into the page content, removes interactivity, and locks the result. The flattened PDF is the one you hand to anyone, because there is no interactive layer left to render differently.
Describing fields
A FieldSpec is a plain pydantic model:
class FieldSpec(BaseModel):
type: FieldType
page: int # 0-indexed
rect: tuple[float, float, float, float] # (x0, y0, x1, y1) in PDF points
name: str # AcroForm field name
options: list[str] | None = None # radio group members
maxlen: int | None = None # TEXT cap / COMB cell count
export_value: str | None = None # checkbox/radio on-value
confidence: float = 1.0 # 1.0 = you authored it
The field types cover the forms people actually fill:
-
TEXTsingle-line text, optionalmaxlento cap length. -
COMBthe segmented box style, wheremaxlenis the number of cells. An SSN field ismaxlen=9. -
CHECKBOXwith anexport_valuefor the on-state (default"Yes"). -
RADIOoneFieldSpecper button, sharing aname, each with its ownexport_value. -
SIGNATUREa placeholder signature widget.
Here is text plus a checkbox plus a radio group:
fields = [
FieldSpec(type=FieldType.TEXT, page=0, rect=(200, 700, 450, 730), name="full_name"),
FieldSpec(type=FieldType.COMB, page=0, rect=(200, 660, 360, 690), name="ssn", maxlen=9),
FieldSpec(type=FieldType.CHECKBOX, page=0, rect=(200, 620, 220, 640), name="agree", export_value="Yes"),
FieldSpec(type=FieldType.RADIO, page=0, rect=(200, 580, 220, 600), name="plan", export_value="basic"),
FieldSpec(type=FieldType.RADIO, page=0, rect=(260, 580, 280, 600), name="plan", export_value="pro"),
]
fillable = af.build(flat_pdf_bytes, fields)
final = af.flatten(af.fill(fillable, {"full_name": "Jane Doe", "ssn": "123456789", "agree": True, "plan": "pro"}))
The honest two-layer design
acroforge is deliberately split into two layers, and the split is the whole point.
Layer one is the deterministic engine: build, fill, flatten. You tell it exactly where fields go, and it puts them there, fills them, and flattens them reliably. This works on any PDF, vector or scanned, because it does not need to understand the document. It only needs coordinates. This is the part you should depend on.
Layer two is a best-effort detector, and it is labeled best-effort everywhere on purpose. af.detect(pdf) reads the PDF's vector geometry and nearby text labels and guesses where fields belong, returning a FormManifest where every field carries confidence < 1.0 to flag it as a guess. af.make_fillable(pdf) runs detect then build in one step.
manifest = af.detect(pdf) # guesses, each confidence < 1.0
for f in manifest.fields:
print(f.type, f.name, f.rect, f.confidence)
fillable = af.make_fillable(pdf) # detect() then build(), one call
The detector handles underline-style forms (write-on rules become text fields), bordered table and grid forms (cells become text fields, label-aware), vector checkbox squares, and font-glyph checkboxes like the box and check characters. It is vector-only: scanned or image-only PDFs are refused with ScannedPDFError, because there is no OCR and I would rather refuse than pretend.
I make no accuracy promises about detection. It will miss fields and invent spurious ones, and quality varies wildly by document. The intended workflow is: run detect, review the draft manifest, correct it, and hand the corrected specs to the engine. The fuzzy layer bootstraps you; the deterministic layer is what ships. Keeping that boundary sharp is what stops the guessing from contaminating the guaranteed path.
Cross-viewer correctness is the contract
Back to the first trap. The way acroforge avoids the Chrome-vs-Firefox split is by treating cross-viewer rendering as the actual test contract, not an afterthought.
Every field type has golden-image render tests in two engines: pdfium, which is what Chrome uses, and pdf.js, which is what Firefox uses. A change that makes a field render differently in either engine fails CI. Adobe Reader is a manual spot-check on top of that. The rule I hold the library to is simple: a field does not "work" until it renders correctly across viewers, so I test before claiming it.
It has been exercised on real documents, including IRS forms W-9 and 1040 and a 43-page credentialing packet, which is where comb fields, duplicate field names, and multi-page layouts stop being theoretical.
The zero-copyleft story
The runtime dependency tree is strictly permissive, by design and by enforcement:
| Package | License | Role |
|---|---|---|
| reportlab | BSD | field widget rendering |
| pypdf | BSD-3-Clause | PDF read / merge / flatten |
| pdfplumber | MIT | geometry utilities |
| PyPDFForm | MIT | fill helpers |
| pydantic | MIT | model validation |
No GPL, AGPL, LGPL, or SSPL anywhere in the runtime tree, not even MPL. This is not a promise I am asking you to take on faith. CI enforces it on every push with pip-licenses --fail-on='GPL;AGPL;LGPL;SSPL', so a copyleft dependency cannot sneak in through a transitive bump without turning the build red. You can drop acroforge into a commercial product without a licensing conversation.
read_fields: the inverse of build
read_fields(pdf) ingests the AcroForm fields already in a fillable PDF back into FieldSpecs (real registered fields, so confidence = 1.0). It is the exact inverse of build, which means the two round-trip:
specs = af.read_fields(open("fillable.pdf", "rb").read()) # -> list[FieldSpec]
# copy one form's field layout onto another PDF
af.build(other_pdf, af.read_fields(template_pdf))
You get one FieldSpec per widget with coordinates, type, name, and checkbox/radio on-states recovered. Dropdowns come back as text; pushbuttons are skipped. It is handy for inspecting an existing form, diffing layouts, or lifting a known-good field arrangement onto a new document.
Try it and tell me where it breaks
acroforge 0.2.0 is live on PyPI and the source is on GitHub. It is small, focused, and meant to stay that way.
If you run it against a form that renders wrong in some viewer, or a layout the detector mangles, that is exactly the report I want. Open an issue with the PDF (or a reproducer) and the viewer. And if it saves you from an AGPL dependency or a cloud round-trip, a star helps other people find it.
pip install acroforge
Top comments (0)