Raymond Camden for Foxit Developers

Posted on Aug 8 • Originally published at developer-api.foxit.com

How to Chain PDF Actions with Foxit

#pdf #api #python

When working with Foxit's PDF Services, you'll remember that the basic flow involves:

Uploading your document to Foxit to get an ID
Starting a job
Checking the job
Downloading the result

This is handy for one-off operations, such as converting a Word document to PDF, but what if you need to do two or more operations? Luckily, this is easy: hand the result of one operation off to the next. Let's take a look at how this can work.

Credentials

Remember, to start developing and testing with the APIs, you'll need to head over to our developer portal and grab a set of free credentials. These include the client ID and client secret you'll need to call the API.


Creating a Document Optimization Workflow

To demonstrate how to chain different operations together, we're going to build a basic document optimization workflow that will:

Compress the document by reducing image resolution and applying other compression algorithms.
Linearize the document so it displays better on the web.

Given the basic flow described above, you may be tempted to do this:

Upload the PDF
Kick off the Compress job
Check until done
Download the compressed PDF
Upload the compressed PDF
Kick off the Linearize job
Check until done
Download the compressed and linearized PDF

This wouldn't require much code, but we can skip the extra download and upload by using the result of the compress job, once it's complete, as the source for the linearize job. This gives us the following streamlined flow:

Upload the PDF
Kick off the Compress job
Check until done
Kick off the Linearize job
Check until done
Download the compressed and linearized PDF

Less is better! Alright, let's look at the code.

First, here's the typical code used to bring in our credentials from the environment and define the upload function:

import os
import requests
import sys
from time import sleep

CLIENT_ID = os.environ.get('CLIENT_ID')
CLIENT_SECRET = os.environ.get('CLIENT_SECRET')
HOST = os.environ.get('HOST')

def uploadDoc(path, id, secret):

    headers = {
        "client_id":id,
        "client_secret":secret
    }

    with open(path, 'rb') as f:
        files = {'file': f}

        request = requests.post(f"{HOST}/pdf-services/api/documents/upload", files=files, headers=headers)
        return request.json()
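
Because `os.environ.get` returns `None` for an unset variable, a missing `HOST` would only surface later as a confusing request error. As a minimal sketch (the `require_env` helper and `REQUIRED_VARS` name are hypothetical and not part of the original sample), you could fail early instead:

```python
import os

# Names read by the script above; the helper itself is an illustrative assumption.
REQUIRED_VARS = ("CLIENT_ID", "CLIENT_SECRET", "HOST")

def require_env(names=REQUIRED_VARS, env=None):
    """Return a dict of the named variables, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

Calling `require_env()` at the top of the script gives a clear message naming the variables that still need to be set.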

Next, here are two utility methods to wrap calling Compress and Linearize:

def compressPDF(doc, level, id, secret):

    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }

    body = {
        "documentId":doc,
        "compressionLevel":level    
    }

    request = requests.post(f"{HOST}/pdf-services/api/documents/modify/pdf-compress", json=body, headers=headers)
    return request.json()

def linearizePDF(doc, id, secret):

    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }

    body = {
        "documentId":doc
    }

    request = requests.post(f"{HOST}/pdf-services/api/documents/optimize/pdf-linearize", json=body, headers=headers)
    return request.json()

Note that the compressPDF method takes a required level argument that defines the level of compression. From the docs, we can see the supported values are LOW, MEDIUM, and HIGH.
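
Since only those three values are accepted, it can be worth checking the level before making the request. Here is a small sketch of that idea; `normalize_level` and `SUPPORTED_LEVELS` are hypothetical names introduced for illustration, and the upper-casing is a convenience assumption rather than documented API behavior:

```python
# The three values named above; the helper is illustrative, not part of the Foxit API.
SUPPORTED_LEVELS = ("LOW", "MEDIUM", "HIGH")

def normalize_level(level):
    """Upper-case the level and reject anything outside the supported values."""
    candidate = str(level).strip().upper()
    if candidate not in SUPPORTED_LEVELS:
        raise ValueError(
            f"Unsupported compression level {level!r}; expected one of {SUPPORTED_LEVELS}"
        )
    return candidate
```

You could then pass `normalize_level(level)` as the `compressionLevel` value inside `compressPDF`.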

Now, two more utility methods - one that checks the task returned by the API operations above and one that downloads a result to the file system:

def checkTask(task, id, secret):

    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }

    # Poll until the task reaches a terminal state
    while True:

        request = requests.get(f"{HOST}/pdf-services/api/tasks/{task}", headers=headers)
        status = request.json()
        if status["status"] == "COMPLETED":
            # really only need resultDocumentId, will address later
            return status
        elif status["status"] == "FAILED":
            print("Failure. Here is the last status:")
            print(status)
            # Exit with a non-zero code so callers can detect the failure
            sys.exit(1)
        else:
            print(f"Current status, {status['status']}, percentage: {status['progress']}")
            sleep(5)

def downloadResult(doc, path, id, secret):

    headers = {
        "client_id":id,
        "client_secret":secret
    }

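    # Stream the response to disk in chunks instead of holding it all in memory
    with requests.get(f"{HOST}/pdf-services/api/documents/{doc}/download", stream=True, headers=headers) as response:
        response.raise_for_status()
        with open(path, "wb") as output:
            for chunk in response.iter_content(chunk_size=8192):
                output.write(chunk)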
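
The `checkTask` loop polls until the task completes or fails, so a task that never reaches either state would keep it running forever. The sketch below shows one way to add a time limit. It is an assumption-laden variant, not the sample's code: `wait_for_task` and its parameters are hypothetical, and it takes any callable that returns the status dictionary so the polling logic can be tried without network access.

```python
import time

def wait_for_task(fetch_status, interval=5, timeout=300,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the task completes, fails, or the timeout passes.

    fetch_status is any callable returning the task's status dict,
    for example a lambda wrapping the GET request used in checkTask.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "COMPLETED":
            return status
        if state == "FAILED":
            raise RuntimeError(f"Task failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Task still {state} after {timeout} seconds")
        sleep(interval)
```

Raising exceptions instead of calling `sys.exit()` leaves it to the caller to decide whether to retry, log, or stop.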

Alright, so that's all the utility methods and setup. Time to actually do what we said we would:

# Named input_path to avoid shadowing Python's built-in input()
input_path = "../../inputfiles/input.pdf"
print(f"File size of input: {os.path.getsize(input_path)}")
doc = uploadDoc(input_path, CLIENT_ID, CLIENT_SECRET)
print(f"Uploaded doc to Foxit, id is {doc['documentId']}")

task = compressPDF(doc["documentId"], "HIGH", CLIENT_ID, CLIENT_SECRET)
print(f"Created task, id is {task['taskId']}")

result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)
print("Done compressing the PDF. Now doing linearize.")

task = linearizePDF(result["resultDocumentId"], CLIENT_ID, CLIENT_SECRET)
print(f"Created task, id is {task['taskId']}")

result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)
print("Done with linearize task.")

output = "../../output/really_optimized.pdf"
downloadResult(result["resultDocumentId"], output, CLIENT_ID, CLIENT_SECRET)
print(f"Done and saved to: {output}.")
print(f"File size of output: {os.path.getsize(output)}")

This code matches the flow described above, except that it also prints the file sizes as a handy way to see the effect of the compression call. When run on the sample input, the initial size is 355994 bytes and the final size is 16733 bytes. That's a great saving! You should, however, check that the result matches the quality you need and, if not, consider reducing the level of compression. Linearize doesn't impact the file size, but as stated above, it makes the PDF work better on the web.
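
To put those two numbers in perspective, here is a quick calculation of the saving. The `compression_savings` helper is a hypothetical name used only for this illustration:

```python
def compression_savings(original_bytes, compressed_bytes):
    """Return (percent_saved, ratio) for two file sizes in bytes."""
    percent_saved = (original_bytes - compressed_bytes) / original_bytes * 100
    ratio = original_bytes / compressed_bytes
    return percent_saved, ratio

# Sizes reported by the run described above
percent, ratio = compression_savings(355994, 16733)
print(f"Saved {percent:.1f}% ({ratio:.1f}x smaller)")
# prints: Saved 95.3% (21.3x smaller)
```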

For a complete listing, find the sample on our GitHub repo.

Next Steps

Obviously, you could do even more chaining based on the code above. For example, as part of your optimization flow, you could even split the PDF to return a 'sample' of a document that may be for sale. You could extract information to use for AI purposes and more. Dig more into our PDF Service APIs to get an idea and let us know what you build on our developer forums!
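
If you end up chaining more than two operations, the hand-off pattern can be written once as a loop. This is only a sketch under assumptions: `run_chain` is a hypothetical helper, and each step is expected to be a callable that takes a document ID, starts a job, waits for it, and returns the resulting document ID.

```python
def run_chain(document_id, steps):
    """Run each step in order, feeding the previous result's document ID to the next."""
    current = document_id
    for step in steps:
        current = step(current)
    return current

# Hypothetical wiring with the functions from this article:
# steps = [
#     lambda d: checkTask(compressPDF(d, "HIGH", CLIENT_ID, CLIENT_SECRET)["taskId"],
#                         CLIENT_ID, CLIENT_SECRET)["resultDocumentId"],
#     lambda d: checkTask(linearizePDF(d, CLIENT_ID, CLIENT_SECRET)["taskId"],
#                         CLIENT_ID, CLIENT_SECRET)["resultDocumentId"],
# ]
# final_id = run_chain(doc["documentId"], steps)
```

Only the final document ID needs to be downloaded, which keeps the upload-once, download-once shape of the streamlined flow.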
