George Moustakas



PDF file manipulation with Python 3 (Problem)

Hi, I have a problem I would like help with. I'm working on a Python 3 script that needs to process a large PDF file (50-60+ pages). I want to find a specific keyword in the file. The keyword appears multiple times, and each occurrence refers to a different data set. I then want to record how many times the keyword was found and on which pages, split those pages out of the original file, and merge them into a single new file.
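To make the steps concrete, here is a minimal sketch of what I have in mind, not a working solution. It assumes the `pypdf` library and a PDF with a real text layer (scanned pages would need OCR first). The function names `find_keyword_pages` and `extract_keyword_pages` are made up for illustration.

```python
def find_keyword_pages(page_texts, keyword, case_sensitive=False):
    """Return {page_number: hit_count} for pages whose text contains keyword.

    page_texts is any iterable of strings, one per page.
    Page numbers are 1-based.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    needle = keyword if case_sensitive else keyword.lower()
    hits = {}
    for number, text in enumerate(page_texts, start=1):
        haystack = text if case_sensitive else text.lower()
        count = haystack.count(needle)
        if count:
            hits[number] = count
    return hits


def extract_keyword_pages(src_path, dst_path, keyword):
    """Copy every page of src_path that mentions keyword into dst_path."""
    # Assumption: pypdf is installed. It is imported here so the search
    # helper above can be used and tested without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    # extract_text() may give no text for image-only pages, hence the `or ""`.
    texts = [(page.extract_text() or "") for page in reader.pages]
    hits = find_keyword_pages(texts, keyword)

    writer = PdfWriter()
    # Dicts keep insertion order, so pages stay in document order.
    for number in hits:
        writer.add_page(reader.pages[number - 1])
    if hits:
        with open(dst_path, "wb") as out:
            writer.write(out)
    return hits
```

The returned dictionary covers the bookkeeping part: `len(hits)` is the number of matching pages, `sum(hits.values())` is the total number of occurrences, and the keys are the page numbers. Because the search goes page by page, it should not matter that the keyword lands on different pages in each file.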

I will use multithreading of course, because this script will run alongside others on a small in-house server that is already running quite a lot of other scripts.
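For the threading part, I imagine something like the sketch below, using the standard library's `concurrent.futures`. The helper name `process_files` and the `worker` callable are hypothetical. One caveat I am aware of: in CPython, threads mostly help with overlapping file I/O, while text extraction itself is CPU-bound, so `ProcessPoolExecutor` (which has the same interface) might be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def process_files(paths, worker, max_workers=2):
    """Run worker(path) for each path on a small pool and return {path: result}."""
    paths = list(paths)  # materialise once, since it is used twice below
    # A low max_workers keeps the load down on a shared server.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(worker, paths))
    return dict(zip(paths, results))
```

Here `worker` would be whatever function handles a single PDF and returns its result.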

I found some things online, but nothing that matches my problem, apart from some Python libraries that might be able to do what I'm looking for. I have no idea how to find the keyword in the file, though, because it doesn't appear on the same pages in every file; the pages differ from file to file!

