When working with Foxit's PDF Services, you'll remember that the basic flow involves:
- Uploading your document to Foxit to get an ID
- Starting a job
- Checking the job
- Downloading the result
This is handy for one-off operations, such as converting a Word document to PDF, but what if you need to do two or more operations? Luckily, this is easy: simply hand the result of one operation off to the next. Let's take a look at how this can work.
Credentials
Remember, to start developing and testing with the APIs, you'll need to head over to our developer portal and grab a set of free credentials. These include the client ID and client secret values you'll need to make use of the API.
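As a small sketch of how you might fail early when credentials are missing, the helper below reads the values from the environment and raises a clear error if any are absent. The function name `load_settings` is hypothetical and not part of any Foxit SDK; the variable names `CLIENT_ID`, `CLIENT_SECRET`, and `HOST` are the ones used in the code later in this post.

```python
import os

# Environment variable names used by the sample code in this post.
REQUIRED_VARS = ("CLIENT_ID", "CLIENT_SECRET", "HOST")

def load_settings(env=os.environ):
    """Return the required settings as a dict, or raise if any are missing.

    `load_settings` is a hypothetical helper written for illustration.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` once at the top of a script means a missing value surfaces immediately rather than as a confusing HTTP error later on.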
Creating a Document Optimization Workflow
To demonstrate how to chain different operations together, we're going to build a basic document optimization workflow that will:
- Compress the document by reducing image resolution and applying other compression algorithms.
- Linearize the document so it displays better on the web.
Given the basic flow described above, you may be tempted to do this:
- Upload the PDF
- Kick off the Compress job
- Check until done
- Download the compressed PDF
- Upload the compressed PDF
- Kick off the Linearize job
- Check until done
- Download the compressed and linearized PDF
This wouldn't require much code, but we can simplify the process by using the result of the compress job, once it's complete, as the source for the linearize job. This gives us the following streamlined flow:
- Upload the PDF
- Kick off the Compress job
- Check until done
- Kick off the Linearize job
- Check until done
- Download the compressed and linearized PDF
Less is better! Alright, let's look at the code.
First, here's the typical code used to bring in our credentials from the environment and define the Upload job:
import os
import requests
import sys
from time import sleep

CLIENT_ID = os.environ.get('CLIENT_ID')
CLIENT_SECRET = os.environ.get('CLIENT_SECRET')
HOST = os.environ.get('HOST')

def uploadDoc(path, id, secret):
    headers = {
        "client_id":id,
        "client_secret":secret
    }
    with open(path, 'rb') as f:
        files = {'file': f}
        request = requests.post(f"{HOST}/pdf-services/api/documents/upload", files=files, headers=headers)
        return request.json()
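The rest of the script reads `documentId` straight out of the upload response. As a hedged sketch, a small guard like the one below could turn an unexpected response into a readable error instead of a `KeyError`. The helper name `extract_document_id` is hypothetical, and the assumption that a failed upload returns JSON without a `documentId` field is mine, not something the API docs are quoted on here.

```python
def extract_document_id(payload):
    """Return the documentId from an upload response, failing loudly if absent.

    Hypothetical helper: assumes a failed upload yields a payload
    that does not contain a usable "documentId" value.
    """
    doc_id = payload.get("documentId") if isinstance(payload, dict) else None
    if not doc_id:
        raise ValueError(f"Upload did not return a documentId: {payload!r}")
    return doc_id
```

It could be used as `doc_id = extract_document_id(uploadDoc(path, CLIENT_ID, CLIENT_SECRET))`.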
Next, here are two utility methods to wrap calling Compress and Linearize:
def compressPDF(doc, level, id, secret):
    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }
    body = {
        "documentId":doc,
        "compressionLevel":level
    }
    request = requests.post(f"{HOST}/pdf-services/api/documents/modify/pdf-compress", json=body, headers=headers)
    return request.json()

def linearizePDF(doc, id, secret):
    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }
    body = {
        "documentId":doc
    }
    request = requests.post(f"{HOST}/pdf-services/api/documents/optimize/pdf-linearize", json=body, headers=headers)
    return request.json()
Note that the compressPDF method takes a required level argument that defines the level of compression. From the docs, we can see the supported values are LOW, MEDIUM, and HIGH.
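Since the level is a plain string, a typo would only show up once the API rejects the request. As an illustrative sketch, you could validate it locally first. The helper name `normalize_level` is hypothetical; the three allowed values are the ones listed above.

```python
# Supported values as listed in the post: LOW, MEDIUM, and HIGH.
VALID_LEVELS = ("LOW", "MEDIUM", "HIGH")

def normalize_level(level):
    """Upper-case and validate a compression level before sending it.

    Hypothetical helper for illustration; not part of the Foxit API.
    """
    normalized = str(level).strip().upper()
    if normalized not in VALID_LEVELS:
        raise ValueError(f"Unsupported compression level {level!r}; expected one of {VALID_LEVELS}")
    return normalized
```

For example, `compressPDF(doc_id, normalize_level("high"), CLIENT_ID, CLIENT_SECRET)` would send `HIGH`.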
Now, two more utility methods - one that checks the task returned by the API operations above and one that downloads a result to the file system:
def checkTask(task, id, secret):
    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }
    done = False
    while done is False:
        request = requests.get(f"{HOST}/pdf-services/api/tasks/{task}", headers=headers)
        status = request.json()
        if status["status"] == "COMPLETED":
            done = True
            # really only need resultDocumentId, will address later
            return status
        elif status["status"] == "FAILED":
            print("Failure. Here is the last status:")
            print(status)
            sys.exit()
        else:
            print(f"Current status, {status['status']}, percentage: {status['progress']}")
            sleep(5)

def downloadResult(doc, path, id, secret):
    headers = {
        "client_id":id,
        "client_secret":secret
    }
    with open(path, "wb") as output:
        bits = requests.get(f"{HOST}/pdf-services/api/documents/{doc}/download", stream=True, headers=headers).content
        output.write(bits)
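The polling loop above runs until the task either completes or fails. If you would rather not wait forever on a stuck job, one possible variation is to add a timeout. The sketch below is an assumption-laden alternative, not the sample's own code: `wait_for_task` is a hypothetical name, it takes any function that returns the task's status dictionary, and it relies only on the `COMPLETED` and `FAILED` status values used above.

```python
import time

def wait_for_task(fetch_status, interval=5.0, timeout=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the task completes, fails, or times out.

    Hypothetical helper. fetch_status is any callable returning a dict
    with a "status" key; sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "COMPLETED":
            return status
        if state == "FAILED":
            raise RuntimeError(f"Task failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Task still {state!r} after {timeout} seconds")
        sleep(interval)
```

With the helpers from this post, `fetch_status` could be something like `lambda: requests.get(f"{HOST}/pdf-services/api/tasks/{task}", headers=headers).json()`. Raising exceptions instead of calling `sys.exit()` leaves the decision about what to do next to the caller.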
Alright, so that's all the utility methods and setup. Time to actually do what we said we would:
input = "../../inputfiles/input.pdf"
print(f"File size of input: {os.path.getsize(input)}")
doc = uploadDoc(input, CLIENT_ID, CLIENT_SECRET)
print(f"Uploaded doc to Foxit, id is {doc['documentId']}")
task = compressPDF(doc["documentId"], "HIGH", CLIENT_ID, CLIENT_SECRET)
print(f"Created task, id is {task['taskId']}")
result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)
print("Done compressing the PDF. Now doing linearize.")
task = linearizePDF(result["resultDocumentId"], CLIENT_ID, CLIENT_SECRET)
print(f"Created task, id is {task['taskId']}")
result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)
print("Done with linearize task.")
output = "../../output/really_optimized.pdf"
downloadResult(result["resultDocumentId"], output, CLIENT_ID, CLIENT_SECRET)
print(f"Done and saved to: {output}.")
print(f"File size of output: {os.path.getsize(output)}")
This code matches the flow described above, with the exception of outputting the file sizes as a handy way to see the result of the compression call. When run, the initial size is 355994 bytes and the final size is 16733 bytes, a reduction of roughly 95%. That's a great saving! You should, however, ensure the result matches the quality you desire and, if not, consider lowering the compression level. Linearize doesn't impact the file size, but as stated above, it will make the document work nicer on the web.
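To put the two sizes in perspective, here is a quick calculation of the saving as a percentage. The function name `percent_saved` is just an illustrative helper; the two byte counts are the ones reported above.

```python
def percent_saved(before_bytes, after_bytes):
    """Return how much smaller the output is, as a percentage of the input size."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return round((1 - after_bytes / before_bytes) * 100, 1)

# Sizes reported in this post: 355994 bytes in, 16733 bytes out.
print(percent_saved(355994, 16733))  # prints 95.3
```

You could add the same calculation to the end of the script using the two `os.path.getsize` values.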
For a complete listing, find the sample on our GitHub repo.
Next Steps
Obviously, you could do even more chaining based on the code above. For example, as part of your optimization flow, you could even split the PDF to return a 'sample' of a document that may be for sale. You could extract information to use for AI purposes and more. Dig more into our PDF Service APIs to get an idea and let us know what you build on our developer forums!
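If you do end up chaining more than two operations, the repeated "start a job, wait, pass the result along" pattern can be folded into a loop. The following is a minimal sketch under a couple of assumptions: `run_chain` is a hypothetical helper, each step is a function that takes a document ID and returns a task dictionary with a `taskId`, and the wait function returns a status dictionary with a `resultDocumentId`, as `checkTask` does in this post.

```python
def run_chain(document_id, steps, wait):
    """Run each step in order, feeding every result document into the next step.

    Hypothetical helper for illustration.
    steps: callables taking a document ID and returning {"taskId": ...}.
    wait:  callable taking a task ID and returning {"resultDocumentId": ...}.
    """
    current = document_id
    for start_job in steps:
        task = start_job(current)
        result = wait(task["taskId"])
        current = result["resultDocumentId"]
    return current

# Possible usage with the functions defined earlier in this post:
# final_id = run_chain(
#     doc["documentId"],
#     [
#         lambda d: compressPDF(d, "HIGH", CLIENT_ID, CLIENT_SECRET),
#         lambda d: linearizePDF(d, CLIENT_ID, CLIENT_SECRET),
#     ],
#     lambda t: checkTask(t, CLIENT_ID, CLIENT_SECRET),
# )
```

Adding another operation then means appending one more entry to the list rather than copying another block of task and check calls.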