Convert PDF to Editable DOCX with Python

Tilal Ahmad Sana ・ Originally published at blog.groupdocs.cloud ・ 2 min read

While working on a document conversion feature, you may come across a requirement to convert PDF to DOCX. For this purpose, I would like to introduce the GroupDocs.Conversion Cloud SDK for Python. It can also convert between all popular industry-standard document formats without depending on any third-party tool or software.

To convert PDF to DOCX in Python, follow these steps:

  • Before we begin coding, sign up at groupdocs.cloud to get your APP SID and APP Key.

  • Install the groupdocs-conversion-cloud package from PyPI with the following command.

>pip install groupdocs-conversion-cloud

  • Open your favorite editor and copy and paste the following code into a script file. The code performs these steps:
    1. Import the GroupDocs.Conversion Cloud Python package
    2. Initialize the API
    3. Upload the source PDF document to GroupDocs default storage
    4. Convert the PDF document to editable DOCX

# Import module
import groupdocs_conversion_cloud

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)

try:
    # Upload the source file to storage
    filename = 'Sample.pdf'
    remote_name = 'Sample.pdf'
    output_name = 'sample.docx'
    strformat = 'docx'

    request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
    response_upload = file_api.upload_file(request_upload)

    # Convert the PDF to a Word document
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = remote_name
    settings.format = strformat
    settings.output_path = output_name

    # Options applied while loading the source PDF
    load_options = groupdocs_conversion_cloud.PdfLoadOptions()
    load_options.hide_pdf_annotations = True
    load_options.remove_embedded_files = False
    load_options.flatten_all_fields = True

    settings.load_options = load_options

    # Convert one page, starting from page 1
    convert_options = groupdocs_conversion_cloud.DocxConvertOptions()
    convert_options.from_page = 1
    convert_options.pages_count = 1

    settings.convert_options = convert_options

    request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
    response = convert_api.convert_document(request)

    print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception while uploading or converting the document: {0}".format(e.message))
  • And that’s it. The PDF document is converted to DOCX, and the API response includes the URL of the resulting document.
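
The script above keeps the APP SID and APP Key as string literals, which is easy to leak if the file is shared or committed. One possible alternative is to read them from environment variables. The following is only a minimal sketch: the variable names GROUPDOCS_APP_SID and GROUPDOCS_APP_KEY and the helper load_credentials are assumptions made for this example and are not part of the SDK.

```python
import os


def load_credentials(env=None):
    """Return (app_sid, app_key) read from environment variables.

    GROUPDOCS_APP_SID and GROUPDOCS_APP_KEY are hypothetical names chosen
    for this sketch; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    names = ("GROUPDOCS_APP_SID", "GROUPDOCS_APP_KEY")

    # Collect every variable that is unset or empty so the error lists them all.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return env["GROUPDOCS_APP_SID"], env["GROUPDOCS_APP_KEY"]
```

The returned pair can then be passed to ConvertApi.from_keys(app_sid, app_key) and FileApi.from_keys(app_sid, app_key) exactly as in the script above.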


Discussion (3)

medhagupta098 • Edited

Hello @tilalahmad! I can't get the above code to run correctly. I've added the app_sid and app_key to the code from the app I created on groupdocs for test.pdf. The problem appears when I run python3 filename.py: it reports that the document was converted successfully, but when I go to the URL, there's an error. I'm sharing a screenshot of it.

Tilal Ahmad Sana Author

@medhagupta098
If your conversion is successful, you can either view/download your file from cloud storage using dashboard.groupdocs.cloud or download the file to a local drive as follows:

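# Download the converted document from storage
from shutil import copyfile

request_download = groupdocs_conversion_cloud.DownloadFileRequest(output_name)
response_download = file_api.download_file(request_download)

# copyfile expects a file path, so response_download is treated as the path of the downloaded file
copyfile(response_download, 'sample_copy.docx')
print("Result {}".format(response_download))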
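
If the downloaded file does not open in Word, it can help to check that what was saved is really a DOCX. A DOCX file is a ZIP archive containing a word/document.xml entry, so a small check like the one below may catch cases where something else was written to disk. This is only a sketch: save_docx is a hypothetical helper, and it assumes, as the copyfile call above implies, that the download gives you a local file path.

```python
import shutil
import zipfile


def save_docx(downloaded_path, destination):
    """Copy a downloaded file to destination and check that it looks like a DOCX.

    A DOCX file is a ZIP archive that contains word/document.xml, so anything
    else (for example an error page saved to disk) is rejected.
    """
    shutil.copyfile(downloaded_path, destination)

    if not zipfile.is_zipfile(destination):
        raise ValueError(destination + " is not a ZIP archive, so it is not a DOCX file")

    with zipfile.ZipFile(destination) as archive:
        if "word/document.xml" not in archive.namelist():
            raise ValueError(destination + " has no word/document.xml entry")

    return destination
```

With the snippet above, this could replace the copyfile line, for example save_docx(response_download, 'sample_copy.docx').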
medhagupta098

Unable to attach the screenshot.