Have you ever had multiple PDF files that you needed to merge into a single document? Merging two or more PDFs into one file is easier than you might think in Python with the PyPDF2 module.
PyPDF2 is a Python library for working with PDF files. You can use it to extract document information, split a document page by page, merge multiple pages, encrypt and decrypt files, and more. In this tutorial, you will learn how to merge multiple files using this module.
A program to merge multiple PDF files
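You first need to install the package using pip. The class names used in this tutorial (PdfFileMerger and PdfFileReader) were removed in PyPDF2 3.0, so install an earlier release:
pip install "PyPDF2<3.0"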
Open an editor of your choice and create a new file named "pdfMerger.py". Make sure the PDF files to be merged are in the same directory as the Python file.
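If you would like to confirm from Python that the input files really are where the script expects them, a small pre-flight check can help. The sketch below uses only the standard library; missing_files is a hypothetical helper name chosen for illustration (it is not part of PyPDF2), and file1.pdf and file2.pdf are the example names used in this tutorial.

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that are not existing files, in input order."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Example file names from this tutorial; replace them with your own.
    inputs = ["file1.pdf", "file2.pdf"]
    missing = missing_files(inputs)
    if missing:
        print("Cannot merge, missing:", ", ".join(missing))
    else:
        print("All input files found.")
```

Relative names such as file1.pdf are resolved against the directory you run the script from, which is why keeping the PDFs next to the Python file is the simplest setup.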
The following block of code allows you to merge two or more PDF files:
import PyPDF2
mergeFile = PyPDF2.PdfFileMerger()
mergeFile.append(PyPDF2.PdfFileReader('file1.pdf'))
mergeFile.append(PyPDF2.PdfFileReader('file2.pdf'))
mergeFile.write("NewMergedFile.pdf")
Line 1: Import the PyPDF2 module.
Line 2: Create an object of the PdfFileMerger class and assign it to mergeFile.
Lines 3 and 4: Use the append method to add all pages of each input file to the end of the merged document.
Line 5: Write the merged data to NewMergedFile.pdf.
The code above is simple, but what if you want to merge more than two files? You would have to repeat line 3 for every additional file, which quickly makes the program long. A for loop solves this.
The following block of code is another way to merge multiple PDF files:
import PyPDF2
def merge_pdfs(_pdfs):
    mergeFile = PyPDF2.PdfFileMerger()
    for _pdf in _pdfs:
        mergeFile.append(PyPDF2.PdfFileReader(_pdf))
    mergeFile.write("New_Merged_File.pdf")
if __name__ == '__main__':
    _pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']
    merge_pdfs(_pdfs)
Line 2: Define a function merge_pdfs that takes a list of file names, _pdfs, as a parameter.
Line 4: Loop through the list _pdfs and append the pages of each file.
Line 7: Check whether the file is being run as the main program rather than imported as a module.
Line 8: Specify the list of files.
Line 9: Call the function.
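If you would rather not type the file list by hand, the loop can be fed from the contents of a folder. This is only a sketch under a few assumptions: collect_pdfs and merge_folder are hypothetical names made up for this example, the files are merged in alphabetical order, and PyPDF2 is an earlier release than 3.0, as in the rest of this tutorial. PyPDF2 is imported inside merge_folder so that the file-collecting part can be tried on its own.

```python
from pathlib import Path


def collect_pdfs(folder, exclude=()):
    """Return the PDF paths in `folder`, sorted by name, skipping names in `exclude`."""
    skip = set(exclude)
    return [
        str(path)
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and path.suffix.lower() == ".pdf" and path.name not in skip
    ]


def merge_folder(folder, output_name="New_Merged_File.pdf"):
    """Merge every PDF found in `folder` into `output_name` in that same folder."""
    import PyPDF2  # imported here so collect_pdfs can run without PyPDF2 installed

    # Skip the output file so an earlier result is not merged into the new one.
    pdfs = collect_pdfs(folder, exclude=[output_name])
    merger = PyPDF2.PdfFileMerger()  # same class as above (PyPDF2 before 3.0)
    for pdf in pdfs:
        merger.append(pdf)  # append also accepts a plain file path
    merger.write(str(Path(folder) / output_name))
    merger.close()
    return pdfs


if __name__ == "__main__":
    # Show which files would be merged from the current directory.
    print(collect_pdfs(".", exclude=["New_Merged_File.pdf"]))
```

Calling merge_folder(".") would then merge every PDF in the current directory. Alphabetical order is a design choice of this sketch; if page order matters, pass an explicit list to merge_pdfs instead.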
I hope you enjoyed this short and simple tutorial! 😎
Top comments (1)
nice tutorial!
After merging, the bookmarks of the PDFs don't update to the correct page numbers. Have you faced this issue?