DEV Community

Tilal Ahmad Sana

Convert PDF to Editable DOCX with Python

While working on a document conversion feature, you may come across a requirement to convert PDF to DOCX. I would like to introduce the GroupDocs.Conversion Cloud SDK for Python for this purpose. It can also convert between all popular industry-standard document formats without depending on any third-party tool or software.

To convert PDF to DOCX in Python, follow these steps:

  • Before we begin coding, sign up to get your APP SID and APP Key.

  • Install the groupdocs-conversion-cloud package from PyPI with the following command.

>pip install groupdocs-conversion-cloud

  • Open your favorite editor and copy and paste the following code into a script file:
    1. Import the GroupDocs.Conversion Cloud Python package
    2. Initialize the API
    3. Upload source PDF document to GroupDocs default storage
    4. Convert the PDF document to editable DOCX
# Import module
import groupdocs_conversion_cloud

# Get your app_sid and app_key first (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)


try:
    # Upload the source file to storage
    filename = 'Sample.pdf'
    remote_name = 'Sample.pdf'
    output_name = 'sample.docx'

    request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
    response_upload = file_api.upload_file(request_upload)

    # Convert the PDF to a Word document
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = remote_name
    settings.format = 'docx'
    settings.output_path = output_name

    # PDF-specific load options
    load_options = groupdocs_conversion_cloud.PdfLoadOptions()
    load_options.hide_pdf_annotations = True
    load_options.remove_embedded_files = False
    load_options.flatten_all_fields = True

    settings.load_options = load_options

    # Convert only the first page; adjust from_page and pages_count as needed
    convert_options = groupdocs_conversion_cloud.DocxConvertOptions()
    convert_options.from_page = 1
    convert_options.pages_count = 1

    settings.convert_options = convert_options
    request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
    response = convert_api.convert_document(request)

    print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document: {0}".format(e.message))
  • And that’s it. The PDF document is converted to DOCX, and the API response includes the URL of the resulting document.
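
The script above hard-codes app_sid and app_key. As a minimal sketch of one alternative, the helper below reads them from environment variables so the keys stay out of the script file. The variable names GROUPDOCS_APP_SID and GROUPDOCS_APP_KEY and the function load_credentials are assumptions made for this example; they are not part of the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (app_sid, app_key) from the environment, or raise if either is missing."""
    # GROUPDOCS_APP_SID / GROUPDOCS_APP_KEY are hypothetical names chosen for this sketch.
    names = ("GROUPDOCS_APP_SID", "GROUPDOCS_APP_KEY")
    values = [env.get(name, "").strip() for name in names]

    # Report every missing variable at once instead of failing on the first one.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return values[0], values[1]
```

The returned pair could then be passed to ConvertApi.from_keys and FileApi.from_keys in place of the literal strings.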

Top comments (3)

Medha • Edited

Hello @tilalahmad! I can't get the above code to run correctly. I've added the app_sid and app_key to the code as per the test.pdf app I created on GroupDocs. This error comes up when I run it with python3. It does report that the document was converted successfully, but when I go to the URL, there's an error. I'm sharing a screenshot of it.

Tilal Ahmad Sana

If your conversion is successful, you can either view/download your file from cloud storage or download the file to a local drive as follows:

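# copyfile is needed to copy the downloaded file to its final location
from shutil import copyfile

# Download the converted document from storage
request_download = groupdocs_conversion_cloud.DownloadFileRequest(output_name)
response_download = file_api.download_file(request_download)

# Copy the downloaded file to the local drive
copyfile(response_download, 'sample_copy.docx')
print("Result {}".format(response_download))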
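
After downloading, a quick local sanity check can help tell a real DOCX apart from an error page saved under a .docx name. The sketch below relies only on the fact that a DOCX file is a ZIP archive that normally contains word/document.xml; the helper name looks_like_docx is made up for this example and is not part of the SDK.

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive containing word/document.xml."""
    # A DOCX is a ZIP container; anything else (HTML error page, plain text) fails here.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # The main document part of a Word file normally lives at this path.
        return "word/document.xml" in archive.namelist()
```

For example, looks_like_docx('sample_copy.docx') should return True for the file copied above if the conversion produced a valid document.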
Medha

Unable to attach the screenshot.