Hack WKHTMLTOPDF PDF to enable Adobe Acrobat Field Editing

#cfml #coldfusion

I recently integrated the auto-generation of survey results into a downloadable PDF using ColdFusion and WKHTMLTOPDF 0.12.6. Our client provided a pre-generated PDF cover page with some editable fields that we prepended to the PDF using PDFtk. Unfortunately, all auto-generated bookmarks became unusable after the cover page was prepended.

I generated the HTML for a new cover page by using PDF-XChange Editor Plus (portable, perpetual license) to determine the font size, font face and element spacing. In the end, this new made-with-WKHTMLTOPDF version looked far superior to the client-provided cover and seemed to work fine... until my business partner reviewed it. He reported that it worked in Edge, but not in Google Chrome or Adobe Acrobat. This seemed very odd since I had initially tested it using PDF-XChange. I then tested it using Firefox, Foxit PDF Reader and even the Total Commander List plugin. In every one of them, the PDF's input fields were 100% editable, printable & able to be saved. I then tried it on a workstation that has Adobe Acrobat Reader installed and, sure enough, none of the input fields were editable. $%@!

This issue was difficult to research, but I did find a complaint on GitHub. The problem is attributed to a missing "interactive" flag, but I don't believe that this is even a feature within WKHTMLTOPDF (apart from the --enable-forms option). WK also generates a version 1.4 PDF. Is the interactive option even available in v1.4 PDFs? The hack to make the PDFs compatible with Adobe Acrobat requires modifying the binary file with a search-and-replace. (You can use Notepad++ to do this manually.) Apparently, the Parent string in Annot /Parent nn needs to be changed to something other than Parent. (I chose Papent since it is the same length, which keeps the file size and byte offsets unchanged.) The regex pattern Annot\s.{1,5}Parent\s+\d+ identifies the Parent nn strings that need to be changed. My approach performs a simple text read of the file (to identify the exact strings to replace), then reads the PDF as binary, converts the binary data to HEX, performs the search-and-replace using HEX-encoded strings and converts the result back to binary.
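The source code linked at the end of this post is CFML. Purely as an illustration, here is a minimal Python sketch of the same search-and-replace idea. It is not the CFML from the gist: it works directly on the raw bytes instead of a HEX-encoded copy, the function names neutralize_annot_parent and patch_file are hypothetical, and the re.DOTALL flag is an assumption added so that "." can also cross a line break between the two entries.

```python
import re

# Regex quoted in the post, applied to raw bytes.
# ASSUMPTION: re.DOTALL is added here so "." may span a line break.
ANNOT_PARENT = re.compile(rb"Annot\s.{1,5}Parent\s+\d+", re.DOTALL)


def neutralize_annot_parent(pdf_bytes: bytes) -> tuple[bytes, int]:
    """Rename 'Parent' to 'Papent' inside every matched annotation entry.

    Both names are six bytes long, so the overall length and all byte
    offsets in the file stay exactly the same.
    Returns the patched bytes and the number of entries changed.
    """
    def rename(match: re.Match) -> bytes:
        # Only the first 'Parent' inside the matched span is renamed.
        return match.group(0).replace(b"Parent", b"Papent", 1)

    return ANNOT_PARENT.subn(rename, pdf_bytes)


def patch_file(src_path: str, dst_path: str) -> int:
    """Read a PDF as binary, patch it, and write the result to dst_path."""
    with open(src_path, "rb") as fh:
        original = fh.read()
    patched, count = neutralize_annot_parent(original)
    with open(dst_path, "wb") as fh:
        fh.write(patched)
    return count
```

Writing to a separate output path keeps the original file available for comparison if a viewer rejects the patched copy. A Parent entry that does not follow Annot (for example in a page tree node) is left untouched because the pattern requires the Annot prefix.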

SHORT STORY: This workaround of having to modify binary files manually transported me back to the 1980s, when I was a founding member of HACK (Hibbing Area Commodore Club). My parents purchased a C64 for Christmas, but refused to purchase any games. I was a teen and many of my early programming days required entering, searching and/or editing HEX code. I personally purchased books & magazines to "learn-by-doing" and sought out others in the area with similar interests. In 1990, I served as a Chaplain Assistant in the Army in Germany and used GeoWorks to desktop-publish all chapel-related printed publications so that our promoted events looked more appealing on the bulletin boards than those of the night club located next door. After that, I learned Lotus Approach at my job while attending college. I merged database entries to create a really nice resource book for my senior capstone project. I later used Microsoft Access to convert my capstone project into my first database-driven website using Allaire Cold Fusion 3.0 back in 1997. I've continued being a ColdFusion/CFML developer as I'm still able to get it to do most of what I need it to do and I enjoy the occasional challenge.

Generating PDFs via WKHTMLTOPDF

If you want to check out any CFML libraries to assist in generating PDFs using WKHTMLTOPDF, here are a couple of options:

https://github.com/inLeagueLLC/wkhtmltopdf
https://github.com/abramadams/wkhtml
https://gist.github.com/JamoCA/74e556e7d4f1a715a41d (my 2015 public CFFTag version; I have a newer internal version available that we use. I may need to release an updated version and blog post at some point.)
https://wkhtmltopdf.org/usage/wkhtmltopdf.txt (Here are the command-line arguments in case you want to roll your own solution.)
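For anyone rolling their own, the following is a rough Python sketch of assembling a WKHTMLTOPDF command line that keeps form fields editable. It only uses options named in the usage document above (--enable-forms, --header-html, --footer-html). The function names, file names and the assumption that the executable is on the PATH as "wkhtmltopdf" are all hypothetical, and none of this is taken from the libraries listed.

```python
import subprocess
from typing import List, Optional


def build_wkhtmltopdf_command(
    body_html: str,
    output_pdf: str,
    header_html: Optional[str] = None,
    footer_html: Optional[str] = None,
    enable_forms: bool = True,
    executable: str = "wkhtmltopdf",  # ASSUMPTION: binary is on the PATH
) -> List[str]:
    """Return the argument list: executable, input, page options, output."""
    cmd = [executable, body_html]
    if enable_forms:
        # Without this option, HTML form fields are flattened in the PDF.
        cmd.append("--enable-forms")
    if header_html:
        cmd += ["--header-html", header_html]
    if footer_html:
        cmd += ["--footer-html", footer_html]
    cmd.append(output_pdf)
    return cmd


def render_pdf(body_html: str, output_pdf: str, **options) -> None:
    """Run WKHTMLTOPDF and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_wkhtmltopdf_command(body_html, output_pdf, **options), check=True)
```

Keeping the command builder separate from the call that executes it makes it easy to log the exact arguments, or to write them to a BAT file and re-run them by hand while troubleshooting.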

Source Code

https://gist.github.com/JamoCA/607e2b1b28f2a55006ba9bdf26d4df9b

Top comments (2)

Harry Klein • Oct 14

Hi James, is wkhtmltopdf still active? I used it before, but found out that the latest release is 4 years old. Switched to gotenberg.dev/

James Moberg • Oct 15

Still active? It uses the Qt WebKit rendering engine and hasn't been updated since 2020. The project was archived by the owner in early 2023.

Are you able to "optionally" create editable fields using Gotenberg? (Normally fields are flattened with WKHTMLTOPDF unless the --enable-forms option is used.) I checked the documentation and am not entirely sure.

I currently prefer using WK because I can generate the necessary HTML files (header, body & footer) and the BAT file. When troubleshooting issues, it's easy to manually tweak the HTML directly and re-run the BAT file locally to see what could be done differently/better. It may be a black box, but the entire process is entirely portable (no install or configuration required) and easy to test.

Custom Headers & Footers w/page numbers: WKHTMLTOPDF can do this, but it requires adding JavaScript to the header/footer HTML files. I see that Gotenberg has header and footer support too, but there appear to be some limitations (only fonts installed in Docker, external resources are not loaded, JS is not executed, images must be base64-encoded). These are kinda show stoppers since it would be a pain to have to pass every webfont in order to use it. With that said, I have been meaning to install it to evaluate the conversion functions.

WK Issues: I've only encountered a single issue, where a third-party website started using JavaScript to perform feature tests before rendering the webpage. I had to use Node.js w/Puppeteer to take a screenshot of the rendered page, whereas previously we could just use an iFrame.

Gotenberg uses a number of different programs to support PDFs. (I already use ExifTool & PDFtk.) unoconv is used to convert to PDF/A & PDF/UA, but it was deprecated back in 2017 (7 years ago). Gotenberg's actual HTML-to-PDF conversion is performed by Chromium and uses similar protocols as Puppeteer for communication behind the scenes.

DEV Community
