DEV Community

Discussion on: Converting Word to PDF Using A Python-Based Lambda

Abhinav Walia • Edited

A few things I changed in 2021 to make this work on the Python 3.8 runtime in Lambda:

  1. Per the brotlipy API documentation, change decompressor.process to decompressor.decompress.
  2. Build or copy the brotlipy dependency in a Linux environment, since the target Lambda runtime is Amazon Linux.
  3. Create fonts/fonts.conf in your dependency package with the following content (assuming LibreOffice is extracted under /tmp/instdir):

    <?xml version="1.0"?>
    <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
    <fontconfig>
    <dir>/tmp/instdir/share/fonts/truetype</dir>
    <cachedir>/tmp/fonts-cache/</cachedir>
    <config></config>
    </fontconfig>
    
  4. Environment variables:
    FONTCONFIG_FILE=/var/task/fonts/fonts.conf
    HOME=/tmp

  5. Update the return statement from '{}/program/soffice'... to '{}/program/soffice.bin'...
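
For step 3, here is a minimal sketch of generating fonts/fonts.conf at build time instead of writing it by hand. The function names build_fonts_conf and write_fonts_conf and the package_dir parameter are hypothetical helpers, not part of the original article; the paths are the ones assumed above.

```python
import os
import xml.etree.ElementTree as ET


def build_fonts_conf(install_dir="/tmp/instdir", cache_dir="/tmp/fonts-cache/"):
    """Return the fonts.conf content from step 3 as a string."""
    # Assumption: LibreOffice is extracted under install_dir at runtime.
    font_dir = os.path.join(install_dir, "share/fonts/truetype")
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
        "<fontconfig>\n"
        f"<dir>{font_dir}</dir>\n"
        f"<cachedir>{cache_dir}</cachedir>\n"
        "<config></config>\n"
        "</fontconfig>\n"
    )


def write_fonts_conf(package_dir, **kwargs):
    """Write fonts/fonts.conf under package_dir and return its path."""
    target = os.path.join(package_dir, "fonts", "fonts.conf")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    content = build_fonts_conf(**kwargs)
    ET.fromstring(content)  # sanity check: raises ParseError if the XML is malformed
    with open(target, "w", encoding="utf-8") as fh:
        fh.write(content)
    return target
```

Calling write_fonts_conf("package") before zipping the deployment package should leave the file at package/fonts/fonts.conf, which ends up at /var/task/fonts/fonts.conf once deployed.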

To run LibreOffice, I used subprocess in Python. Note that you may have to call the command twice to make it work (reason still unknown).

import subprocess

# load_libre_office() comes from the original article and returns the soffice.bin path
soffice_path = load_libre_office()
word_file_path = "/tmp/file.docx"
conv_cmd = f"{soffice_path} --headless --norestore --invisible --nodefault --nofirststartwizard --nolockcheck --nologo --convert-to pdf:writer_pdf_Export --outdir /tmp {word_file_path}"
# The first call can fail for reasons still unknown, so retry once before giving up
response = subprocess.run(conv_cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if response.returncode != 0:
    response = subprocess.run(conv_cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if response.returncode != 0:
        print("cannot convert this document to pdf")
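
If you'd rather not duplicate the subprocess.run call, the call-twice pattern can be wrapped in a small retry helper. This is only a sketch: run_with_retry and its attempts parameter are hypothetical names, not something from the article or from LibreOffice.

```python
import subprocess
import sys


def run_with_retry(args, attempts=2, timeout=None):
    """Run args up to `attempts` times and return the last CompletedProcess."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        if result.returncode == 0:
            break  # success, no further attempts needed
        print(f"attempt {attempt} exited with code {result.returncode}", file=sys.stderr)
    return result
```

With the snippet above, that would be response = run_with_retry(conv_cmd.split()), followed by a single check of response.returncode.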

One more note: I did not wrap pdf:writer_pdf_Export in quotes, as in ... --convert-to "pdf:writer_pdf_Export"..., because that does not work here. Many blog posts write the command with quotes, and the conversion then fails.
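
A likely explanation (my assumption, not something verified against LibreOffice itself) is that str.split() does no shell-style parsing, so the quote characters stay in the argument and get passed to soffice literally. A small sketch showing the difference from shlex.split(), with a shortened, illustrative command string:

```python
import shlex

# Illustrative command string with the filter name wrapped in quotes
quoted_cmd = 'soffice --convert-to "pdf:writer_pdf_Export" --outdir /tmp /tmp/file.docx'

# str.split() only splits on whitespace, so the quotes remain part of the argument
plain_args = quoted_cmd.split()

# shlex.split() parses the string like a POSIX shell and strips the quotes
shell_like_args = shlex.split(quoted_cmd)

print(plain_args[2])       # the filter argument, quotes included
print(shell_like_args[2])  # the filter argument, quotes removed
```

So if you do want to keep quotes in the command string, splitting it with shlex.split() instead of str.split() should produce the same argument list as the unquoted version.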

Enjoy serverless LibreOffice with Python. Cheers!