DEV Community

boiledsteak


Automatic PDF Form Filler App Part 1: CLI

During my year 3 polytechnic internship, I was given an extremely mundane and repetitive task. I won't go into specifics, but I knew I could automate it.

This was my solution: a Python app that reads from a .csv file and automatically fills a PDF form with the data from each row.
Initially I wrote a simple local app to be run on the command line, but I later improved it into a full-fledged web app.

This post is part one of two: the command-line interface (CLI) app.

The app essentially combines the pdfjinja library with a function that reads the .csv file.

Disclaimer: I am a security student with no professional programming or software engineering experience, so my code may not follow best practices... but it works.


All the files needed can be found in my GitHub repository, together with higher-resolution versions of all the images used.


Prerequisites

Python

This post is targeted at novice Python programmers with some experience, so I assume you are already familiar with pip, virtual environments, and dependencies, to name but a few of the Python concepts involved.

The app is written in Python 3.9.6. You need to have Python installed together with all the libraries and dependencies used. You could create a virtual environment, but I didn't bother, so my dependencies are installed globally hahaha.
And of course, ensure you have pip installed as well.
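If you want to confirm your setup before going further, here is a minimal sketch of a check script. It is not part of my repo. It uses only the standard library, and the helper name missing_modules and the module list are my own assumptions for illustration.

```python
import importlib.util
import sys

# Assumed list: the third-party modules the CLI script imports (installed via pip)
REQUIRED_MODULES = ["pdfjinja", "pypdftk"]


def missing_modules(names):
    """Hypothetical helper: return the names that cannot be imported here."""
    # find_spec() returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


def main():
    print("Python", sys.version.split()[0])
    if sys.version_info < (3, 9):
        print("warning: the app was written with Python 3.9.6")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing modules: " + ", ".join(missing) + " (install them with pip)")
    else:
        print("all required modules found")


if __name__ == "__main__":
    main()
```

Run it with the same python command you will use for the app, so it checks the same interpreter and the same (global or virtual) environment.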


pdfjinja

Install pdfjinja with pip. It enables variable interpolation (Jinja templates) within PDF form fields.

pip install pdfjinja

PDFtk

PDFtk, short for PDF Toolkit, is listed as a requirement on the pdfjinja GitHub page, so download and install the PDFtk application.
Then install the pypdftk Python wrapper from pip as well:

pip install pypdftk
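pypdftk does not bundle PDFtk itself; it runs the pdftk executable. So the app has to be able to find that executable. A quick way to check this is with shutil.which. The sketch below assumes the executable is named pdftk and sits on your PATH (on Windows, which() also tries the PATHEXT extensions such as .exe). find_pdftk is a hypothetical helper, not part of pypdftk.

```python
import shutil
import sys


def find_pdftk(name="pdftk"):
    """Hypothetical helper: full path of the PDFtk executable if found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_pdftk()
    if path is None:
        print("PDFtk not found on PATH; install it or add its folder to PATH", file=sys.stderr)
    else:
        print("PDFtk found at " + path)
```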

PDF Form

And of course, you will need a PDF form to fill. I have included examples in my GitHub repo. The PDF doesn't have to be an actual form, such as an application form or a particulars form. It can be any document with fixed fields to fill, saved as a PDF form.
For example, the app can be used to create name cards filled with names from a list. It all depends on how the PDF form is designed.

However, to create a PDF form you need a form editor such as Adobe Acrobat Pro; Acrobat Reader cannot create PDF forms. There may be free alternatives out there, but I used Acrobat Pro.


Jinja Templates

After creating the PDF form, you will need to set "variable names" for the fields you want to programmatically fill.

This is how I did it with Adobe Acrobat Pro.

adding jinja template with Acrobat Pro

Right-click the form field and open its properties. In the "Tooltip" field, enter your desired variable name enclosed in double curly braces:

{{  }}

For instance, if I want to name the variable "name", I would insert:

{{name}}
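To double-check which fields actually got a template, you can ask PDFtk to list the form fields with "pdftk form1.pdf dump_data_fields". As far as I know, the tooltip text appears in that output as FieldNameAlt, with records separated by "---" lines. The sketch below parses that text and pulls out the variable names. The helper names and the sample text are illustrative assumptions, and it only recognises simple {{name}} placeholders, not full Jinja expressions.

```python
import re

# Matches simple Jinja placeholders such as {{name}} or {{ name }}
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def parse_records(dump_text):
    """Hypothetical helper: split dump_data_fields text into one dict per field."""
    records, current = [], {}
    for line in dump_text.splitlines():
        if line.strip() == "---":  # a line of dashes starts a new field record
            if current:
                records.append(current)
            current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    if current:
        records.append(current)
    return records


def template_variables(dump_text):
    """Map each field name to the variables found in its tooltip (FieldNameAlt)."""
    result = {}
    for record in parse_records(dump_text):
        variables = PLACEHOLDER.findall(record.get("FieldNameAlt", ""))
        if record.get("FieldName") and variables:
            result[record["FieldName"]] = variables
    return result


# Illustrative sample in the shape I expect from dump_data_fields
sample = """---
FieldType: Text
FieldName: Text1
FieldNameAlt: {{name}}
FieldFlags: 0
---
FieldType: Text
FieldName: Text2
FieldFlags: 0
"""
print(template_variables(sample))  # {'Text1': ['name']}
```

A field that is missing from the result has no template in its tooltip, so the app will leave it alone.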

Dataset

Next, create a .csv file whose column headers match the variable names you put in the tooltips (for example, a "name" column to fill {{name}}). Each row becomes one filled form. I have included examples in my GitHub repo.
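Each row of the CSV becomes a dictionary keyed by the header row, which is the shape the app passes to pdfjinja. This sketch reads an in-memory sample with csv.DictReader and reports template variables that have no matching column. The sample data and the helper names are hypothetical, for illustration only.

```python
import csv
import io

# Hypothetical example dataset: the header row holds the tooltip variable names
SAMPLE_CSV = """name,company
Alice Tan,Client A
Bob Lim,Client B
"""


def read_rows(csv_file):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(csv_file))


def missing_columns(rows, variables):
    """Hypothetical check: template variables with no matching CSV column, sorted."""
    if not rows:
        return sorted(variables)
    return sorted(set(variables) - set(rows[0].keys()))


rows = read_rows(io.StringIO(SAMPLE_CSV))
print(rows[0])                                  # {'name': 'Alice Tan', 'company': 'Client A'}
print(missing_columns(rows, {"name", "date"}))  # ['date']
```

Headers must match exactly, including case and spaces. Also, if Excel saves the file as "CSV UTF-8", it may start with a byte order mark that sticks to the first header. Opening the real file with open(path, newline="", encoding="utf-8-sig") avoids that.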


The Code

Oh boy, here comes the spaghetti. This part is a bit more complicated. You need to modify the code to specify where files are read from and written to. To keep input/output simple, run the .py file from the directory where you want the I/O to happen, because all the paths in the script are relative to the current working directory.

I developed this app on Windows 10 and ran it with PowerShell. You might run into issues on other operating systems. Contact me if you do and I'll try to help.

# Auto PDF filler
import os
import csv
import sys
import pprint
import shutil
import pathlib

import pypdftk
from pdfjinja import PdfJinja

# global variables: change these to match your own files
datasetPath = "ds1.csv"     # CSV dataset, one row per filled form
templatePath = "form1.pdf"  # PDF form with {{variable}} tooltips
group = "groupA"            # used to name the output folder and files



#### <-------[ START OF FUNCTIONS]------->
# create list of dictionaries. One dict would be one dataset (one form). The whole list carries all data
def lister(csvPath):
    try:
        print("\nreading CSV from\n" + csvPath, file=sys.stderr)
        # "with" closes the file afterwards; newline="" is what the csv module expects
        with open(csvPath, "r", newline="") as f:
            theList = list(csv.DictReader(f))

        pprint.pprint(theList)
        return theList

    except Exception as e:
        print(e, file=sys.stderr)
        return []

# takes in list and writes one filled PDF per dictionary
def PDFer(allData, pdfPath, group):
    try:
        print("\nreading PDF from\n" + pdfPath, file=sys.stderr)
        thePDF = PdfJinja(pdfPath)
        print("\ncreating filled PDFs ...")
        # will always overwrite if existing group name folder exists
        shutil.rmtree("./filled/" + group, ignore_errors=True)
        pathlib.Path("./filled/" + group).mkdir(parents=True)
        count = 0
        for count, x in enumerate(allData, start=1):
            pdfout = thePDF(x)  # fill the tooltip templates with this row's values
            outPath = "./filled/" + group + "/filled-" + group + "-" + str(count) + ".pdf"
            with open(outPath, "wb") as f:  # closes each file after writing
                pdfout.write(f)

        print(str(count) + " files created for " + group)

    except Exception as e:
        print(e, file=sys.stderr)

# Reads PDFs from specified directory and compiles them into single PDF with many pages
def masher(pdfdir, group):
    try:
        allPDF = []
        for file in os.listdir(pdfdir):
            if file.lower().endswith(".pdf"):
                allPDF.append(os.path.normpath(os.path.join(pdfdir, file)))
        # os.listdir() order is arbitrary and a plain sort puts "-10" before "-2",
        # so sort on the row number at the end of each name to keep the CSV order
        allPDF.sort(key=lambda p: int(pathlib.Path(p).stem.rsplit("-", 1)[-1]))

        # creates the "compiled" folder. If it already exists, do nothing
        pathlib.Path("./compiled").mkdir(parents=True, exist_ok=True)
        outFilePath = "./compiled/all-forms-" + group + ".pdf"
        pypdftk.concat(allPDF, outFilePath)
        print("\n\nCompleted compiling PDFs!")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        outFile = "all-forms-" + group + ".pdf"

        return outFile

    except Exception as e:
        print(e, file=sys.stderr)
        print("\nerror compiling PDFs\n", file=sys.stderr)
        return None

#### <-------[END OF FUNCTIONS]------->


# Calling the functions
PDFer(lister(datasetPath), templatePath, group)
outFileName = masher("./filled/" + group, group)
if outFileName:  # masher returns None if compiling failed
    print("output file name: " + outFileName + "\n\n")

First, place the PDF form and the CSV dataset in the same directory as the .py script. It should look something like this:

initial file structure
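Keeping everything next to the script works because fillerz.py uses relative paths. If you would rather not depend on the directory you run it from, one option is to build every path from the script's own folder with pathlib. This is only a sketch and not what fillerz.py does. build_paths is a hypothetical helper, and in the real script you would pass Path(__file__).resolve().parent as the base directory.

```python
from pathlib import Path


def build_paths(base_dir, dataset_name, template_name, group):
    """Hypothetical helper: the app's input/output paths, anchored at base_dir."""
    base = Path(base_dir)
    return {
        "dataset": base / dataset_name,
        "template": base / template_name,
        "filled_dir": base / "filled" / group,
        "compiled_pdf": base / "compiled" / ("all-forms-" + group + ".pdf"),
    }
```

For example, build_paths(Path(__file__).resolve().parent, "ds1.csv", "form1.pdf", "groupA") gives the same layout as the figure, no matter which directory PowerShell is in.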

Next, modify the global variables to match your PDF form and CSV dataset file names. You can also change the group name if you're using this app for several groups, for instance filling forms for Client A and then for Client B. Each group gets its own folder under "filled" and its own compiled file.

global variables

You could take the fillerz.py file from my GitHub repo instead.

I know, this is hardcoded and could be better. The web application improves on this!
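In the meantime, one way to avoid editing the file for every run is to read the names from the command line with argparse from the standard library. This is a hypothetical sketch, not code from my repo. The argument names and the groupA default are my own assumptions, chosen to mirror the global variables.

```python
import argparse


def parse_args(argv=None):
    """Parse the file names and group that fillerz.py otherwise hardcodes."""
    parser = argparse.ArgumentParser(description="Fill a PDF form once per CSV row.")
    parser.add_argument("dataset", help="CSV file, one row per form (e.g. ds1.csv)")
    parser.add_argument("template", help="PDF form with {{variable}} tooltips (e.g. form1.pdf)")
    parser.add_argument("--group", default="groupA",
                        help="group name used for the output folder and file names")
    return parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
```

You would then replace the three global variables with args = parse_args() and pass args.dataset, args.template and args.group to the functions. After that, running "python fillerz.py ds1.csv form1.pdf --group groupA" works without touching the code.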


Run!

With everything set up, you can finally run the app! I used PowerShell on Windows 10. I believe the code can run on Linux too: the modules it uses for files and directories (os, shutil, pathlib) are part of Python's standard library and work across platforms, but PDFtk has to be installed there as well. Contact me if you need help with this.

python fillerz.py

You should see the compiled output file in a folder called "compiled", with the individual filled PDFs in "filled/<group name>". I know the app and its process are quite clunky. This was my first attempt to get the job done. See part 2 for a beautified web app solution!
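To check the result without clicking through folders, a small sketch like this lists the individual PDFs in row order and confirms that the compiled file exists. The helper names are hypothetical. It assumes the naming used in fillerz.py (filled-<group>-<n>.pdf and all-forms-<group>.pdf) and that you run it from the same directory as the app.

```python
from pathlib import Path


def row_number(file_name):
    """Hypothetical helper: the row number at the end of e.g. filled-groupA-12.pdf."""
    return int(Path(file_name).stem.rsplit("-", 1)[-1])


def list_outputs(group, root="."):
    """Return (filled PDF names in row order, whether the compiled PDF exists)."""
    filled_dir = Path(root) / "filled" / group
    filled = []
    if filled_dir.is_dir():
        # sort numerically so filled-groupA-10.pdf comes after filled-groupA-2.pdf
        filled = sorted((p.name for p in filled_dir.glob("*.pdf")), key=row_number)
    compiled = Path(root) / "compiled" / ("all-forms-" + group + ".pdf")
    return filled, compiled.is_file()
```

For example, filled, ok = list_outputs("groupA") should give one name per CSV row and ok == True after a successful run.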


That's it!

Thank you for reading. As mentioned in my disclaimer, I'm still learning and I am definitely no expert, but this solved my problem and I hope it helps someone out there :)
