DEV Community

Grace Huynh

How to Convert a PDF File to a Word File in Python

To convert a PDF file to a Word document using Python, you can combine two libraries. PyPDF2 reads and writes PDF files, extracts their text, and performs other operations such as merging and splitting PDFs, but it cannot write Word files on its own. python-docx fills that gap by creating .docx documents.

First, install both libraries by running the following command:

pip install PyPDF2 python-docx

Once both libraries are installed, you can use them to convert a PDF file to a Word document. Here is an example of how you might do this:

# Requires: pip install PyPDF2 python-docx
# Import PdfReader to read the PDF and Document to build the Word file
from PyPDF2 import PdfReader
from docx import Document

# Open the PDF file for reading
pdf_reader = PdfReader("input.pdf")

# Create an empty Word document
document = Document()

# Loop through each page of the PDF file
for page in pdf_reader.pages:
    # Extract the page's text (may be empty for scanned or image-only pages)
    text = page.extract_text() or ""

    # Add the text to the Word document, one paragraph per line
    for line in text.splitlines():
        document.add_paragraph(line)

# Write the Word document
document.save("output.docx")

This code first imports PdfReader from PyPDF2 and Document from python-docx, then opens the input PDF file by passing its path to PdfReader.

Next, an empty Word document is created with Document(). The code then loops through each page of the PDF file, extracts its text using the extract_text() method, and adds each line to the Word document using the add_paragraph() method. Finally, the Word document is written to the output file using the save() method.
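A file's extension does not determine its format, so it can be worth confirming that the output really is a Word document. The following is a minimal sketch using only the standard library. The helper name sniff_format is hypothetical, and it assumes the usual layout in which a PDF begins with %PDF- and a .docx file is a ZIP archive containing word/document.xml.

```python
import zipfile


def sniff_format(path):
    """Guess whether a file is a PDF or a .docx by inspecting its contents.

    Hypothetical helper for illustration; it checks only common markers
    and does not validate the file.
    """
    # PDF files normally begin with the bytes "%PDF-"
    with open(path, "rb") as f:
        if f.read(5) == b"%PDF-":
            return "pdf"

    # A .docx file is a ZIP archive whose main part is usually word/document.xml
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            if "word/document.xml" in archive.namelist():
                return "docx"

    return "unknown"
```

Calling sniff_format("output.docx") should return "docx" for a real Word file. If it returns "pdf", the file is a PDF with a misleading name.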

Keep in mind that this code is just an example, and you may need to modify it to fit your specific needs. Additionally, converting a PDF file to a Word document in this way may lose much of the formatting and layout of the original PDF file, such as fonts, images, tables, and multi-column layouts.
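Text extracted from a PDF usually arrives as hard-wrapped lines rather than paragraphs, which is one reason the layout is lost. As a rough sketch, the lines can be regrouped before they are written to the Word document. The helper name group_paragraphs is hypothetical, and it assumes blank lines separate paragraphs, which is not true of every PDF.

```python
def group_paragraphs(text):
    """Join hard-wrapped lines into paragraphs, splitting on blank lines."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Non-blank line: it belongs to the paragraph being built
            current.append(stripped)
        elif current:
            # Blank line: close the paragraph collected so far
            paragraphs.append(" ".join(current))
            current = []
    # Keep the last paragraph if the text did not end with a blank line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "First line of a\nwrapped paragraph.\n\nSecond paragraph."
print(group_paragraphs(sample))
```

Each returned string could then be written out as one paragraph of the Word document instead of one paragraph per line.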

Top comments (3)

Matt Ellen • Edited

I'm not sure this does create a Word document. According to the PyPDF2 documentation, PdfWriter (previously called PdfFileWriter, but renamed in version 2) creates PDF files, not Word files.

Word can open PDF documents, so that's probably why it looks like a transformation.

Gabor Szabo

Nice post.

Are you aware that you can add the language after the opening three backticks to get syntax highlighting?

Writing this

```python
# Import the PyPDF2 library
import PyPDF2
with open("input.pdf", "rb") as input_file:
```

you'd get this:

# Import the PyPDF2 library
import PyPDF2
with open("input.pdf", "rb") as input_file:

GianCannavaro

What I like more is
pip install PyPDF2[crypto], which allows you to encrypt your PDF files.