Python - PDF to JPG & back Converter

#tutorial #python #pdf2image #img2pdf

Motivation

Quite often, I have to submit the same documents in either .pdf or .jpg/.jpeg format, and I used online conversion sites to switch between the two. Recently, one concern kept nagging me: why was I uploading personal documents to the cloud when I could build a converter that keeps the whole process local on my device? So I decided to write a program to do the conversions.

Steps I Took

- Step 1: Decide framework/language for development

I'd like to include this as a documented step :) The reason is short: I do web development in my day-to-day work, so I decided that coding this small program in Python would be refreshing.

- Step 2: Search for packages

You can either implement the conversions yourself or look for packages that are well maintained and have a good download count. I chose the following packages:

pdf2image for PDF to JPG conversion.
img2pdf for JPG/JPEG to PDF conversion.

- Step 3: Make GUI

I used PAGE to build the static GUI. It is a simple, easy-to-use drag-and-drop GUI generator. I made two GUIs, one for the converter and the other for showing a success message. You can also handle both in a single window.

Note: tkinter applications look different on different operating systems. I use Windows. If you want the application to look the same across platforms, you can add conditional layouts.

After generating scripts from PAGE, I made the GUI dynamic on the basis of conversion_mode:

If conversion_mode == 'jpg', set the text accordingly and show an entry field for the name of the output PDF file.
Similarly, if conversion_mode == 'pdf', make the relevant text changes. No entry field is needed, as the output JPGs take the PDF's name followed by an index.
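
As a rough sketch of this switch, the mode-dependent settings can be kept in one small function that the generated GUI code reads from. The function name ui_settings and the label texts below are made up for illustration; only conversion_mode and its two values come from the description above.

```python
def ui_settings(conversion_mode):
    """Return the mode-dependent GUI settings (hypothetical helper)."""
    if conversion_mode == "jpg":
        # JPG -> PDF: the user has to name the output PDF, so show the entry.
        return {
            "title": "Select JPG(s) to convert to PDF",
            "button_text": "Convert to PDF",
            "show_name_entry": True,
        }
    if conversion_mode == "pdf":
        # PDF -> JPG: output names come from the PDF name plus an index.
        return {
            "title": "Select a PDF to convert to JPG",
            "button_text": "Convert to JPG",
            "show_name_entry": False,
        }
    raise ValueError(f"unknown conversion_mode: {conversion_mode!r}")
```

In the tkinter code, show_name_entry would then decide whether the entry widget is placed or left out of the layout.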

- Step 4: Upload Files

I used tkinter's filedialog module to upload files. I set initialdir to the current working directory, which is the script's folder when the script is launched from there: curr_dir = os.getcwd()

Upload PDF: Since only a single PDF file is uploaded, filedialog.askopenfilename is used.

filedialog.askopenfilename(initialdir=curr_dir, title="Select PDF",
                           filetypes=[("PDF File", "*.pdf")])

Upload JPGs: Multiple JPG files can be uploaded to create a PDF file, so filedialog.askopenfilenames is used.

filedialog.askopenfilenames(initialdir=curr_dir, title="Select JPG(s)",
                            filetypes=[("JPG File", "*.jpeg *.jpg")])
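
Putting the two dialogs together, a minimal sketch could look like the following. The helper names dialog_options and pick_files are hypothetical, and tkinter is imported inside pick_files only so that dialog_options can be used on a machine without a display. Both dialogs return an empty value when the user cancels, so the sketch normalises the result to a list.

```python
import os


def dialog_options(conversion_mode, start_dir=None):
    """Build keyword arguments for the file dialog (hypothetical helper)."""
    start_dir = start_dir or os.getcwd()
    if conversion_mode == "pdf":
        return {"initialdir": start_dir, "title": "Select PDF",
                "filetypes": [("PDF File", "*.pdf")]}
    return {"initialdir": start_dir, "title": "Select JPG(s)",
            "filetypes": [("JPG File", "*.jpeg *.jpg")]}


def pick_files(conversion_mode):
    """Open the dialog and always return a list of paths (empty on cancel)."""
    from tkinter import filedialog  # needs a display, so imported lazily

    options = dialog_options(conversion_mode)
    if conversion_mode == "pdf":
        path = filedialog.askopenfilename(**options)  # empty if cancelled
        return [path] if path else []
    return list(filedialog.askopenfilenames(**options))  # empty if cancelled
```

Returning a list in both modes means the calling code only needs one check (an empty list) before starting a conversion.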

- Step 5: Convert JPG to PDF

To convert JPG/JPEG files to a single PDF, pass file_paths, a list of image file paths, to img2pdf's convert function.

import img2pdf

with open(name + ".pdf", "wb") as file:
    file.write(img2pdf.convert(*file_paths))

name is the output PDF name provided by the user.
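
Because name comes straight from an entry field, it may be empty or already end in .pdf. Here is a small sketch of how that could be handled before writing the file. pdf_output_path and images_to_pdf are hypothetical helpers, not part of img2pdf, and the fallback name "output" is an assumption.

```python
from pathlib import Path


def pdf_output_path(name, folder="."):
    """Turn the user-supplied name into a .pdf path (hypothetical helper)."""
    name = name.strip()
    if name.lower().endswith(".pdf"):
        name = name[:-4]  # avoid ending up with "scan.pdf.pdf"
    name = name or "output"  # assumed fallback for an empty entry
    return Path(folder) / f"{name}.pdf"


def images_to_pdf(file_paths, name, folder="."):
    """Write all images in file_paths into a single PDF and return its path."""
    import img2pdf  # third-party: pip install img2pdf

    if not file_paths:
        raise ValueError("no image files selected")
    target = pdf_output_path(name, folder)
    with open(target, "wb") as file:
        # img2pdf.convert takes the image paths and returns the PDF as bytes.
        file.write(img2pdf.convert([str(p) for p in file_paths]))
    return target
```

The images end up in the PDF in the order of file_paths, so sorting the list first is an option if the page order matters.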

- Step 6: Convert PDF to JPG

To use pdf2image, first install poppler on your system. (Instructions). Since I'm a Windows user, I downloaded poppler and placed its bin/ folder in the same directory as the script:
poppler_path=r"./bin" if (sys.platform == "win32" or sys.platform == "cygwin") else None
The image files created are stored in output_folder. To avoid out-of-memory errors, I set paths_only=True so that convert_from_path(...) returns the paths of the saved images instead of PIL Image objects.

import sys

from pdf2image import convert_from_path
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

convert_from_path(
        file_path,
        output_folder=output_folder_path,
        output_file=output_folder_name,
        fmt="jpg",
        paths_only=True,
        poppler_path=r"./bin" if (sys.platform == "win32" or sys.platform == "cygwin") else None
)
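
The exception classes imported above are not used in the call itself. One way they could be put to work is to turn conversion failures into messages for the GUI, as in this sketch. The function names and message texts are hypothetical; the exception classes and the convert_from_path arguments are the ones already shown.

```python
import sys


def default_poppler_path(platform=sys.platform):
    """Use the bundled ./bin folder on Windows, the system poppler elsewhere."""
    return r"./bin" if platform in ("win32", "cygwin") else None


def pdf_to_jpgs(file_path, output_folder_path, output_name):
    """Return (image_paths, error_message); exactly one of them is None."""
    from pdf2image import convert_from_path  # third-party: pip install pdf2image
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFSyntaxError,
    )

    try:
        paths = convert_from_path(
            file_path,
            output_folder=output_folder_path,
            output_file=output_name,
            fmt="jpg",
            paths_only=True,
            poppler_path=default_poppler_path(),
        )
    except PDFInfoNotInstalledError:
        return None, "poppler was not found; check the bin/ folder or PATH."
    except (PDFPageCountError, PDFSyntaxError):
        return None, "The selected file could not be read as a PDF."
    return paths, None
```

Returning a message instead of raising keeps the GUI code simple: it can show the text in a label or message box and stay on the converter screen.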

You can run the function in chunks (see) if the PDF file has hundreds of pages, or if you want to work with Image objects and are concerned about running into an OOM error.

After conversion, I take the user to a different screen where a success message is displayed.
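
A minimal sketch of chunked conversion, assuming the first_page and last_page parameters of convert_from_path and the page count reported by pdf2image's pdfinfo_from_path. The helper names and the chunk size of 10 are arbitrary choices for illustration.

```python
def page_chunks(total_pages, chunk_size=10):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_in_chunks(file_path, output_folder_path, chunk_size=10,
                      poppler_path=None):
    """Convert a PDF a few pages at a time and return all image paths."""
    from pdf2image import convert_from_path, pdfinfo_from_path  # third-party

    info = pdfinfo_from_path(file_path, poppler_path=poppler_path)
    total = int(info["Pages"])
    paths = []
    for first, last in page_chunks(total, chunk_size):
        # Only pages first..last are rendered in this call.
        paths += convert_from_path(
            file_path,
            output_folder=output_folder_path,
            fmt="jpg",
            paths_only=True,
            first_page=first,
            last_page=last,
            poppler_path=poppler_path,
        )
    return paths
```

For example, list(page_chunks(25, 10)) gives [(1, 10), (11, 20), (21, 25)]. Chunking matters most when paths_only is dropped and each chunk's Image objects are processed and released before the next chunk is loaded.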

Get the full implementation from my GitHub.

So this is how I made a PDF ⇌ JPG converter that fulfills my requirements, and now I don't have to worry about security or privacy.

Share your thoughts. Feedback's always welcome :)