Sohail

Convert Microsoft Word Document to Other Formats in C++

Sometimes we need to programmatically convert a Microsoft Word document to PDF, HTML, EPUB, or other formats. However, most existing libraries require a large amount of code. Moreover, complex documents are often converted incorrectly: the content of the resulting document is disturbed, fonts are missing, or tables and lists are rendered incorrectly.

Fortunately, Aspose.Words for C++, a native class library, lets us convert documents from one format to another easily and reliably. It requires just two lines of code:

  1. Load a document into a Document object using one of its constructors. By default, Aspose.Words will even auto-detect the file format for us.
  2. Invoke one of the Document.Save methods on the Document object and specify the desired output format.

Convert a Word Document to PDF

To convert a Microsoft Word document to PDF, simply invoke the Document.Save method and specify a file name with the “.pdf” extension. The code sample below converts a whole document from DOC to PDF using the default options.

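// inputDataDir and outputDataDir are assumed to be System::String paths
// to the input and output directories, defined elsewhere.
// Load the document from disk.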
System::SharedPtr<Document> doc = System::MakeObject<Document>(inputDataDir + u"Rendering.doc");
System::String outputPath = outputDataDir + u"Doc2Pdf.SaveDoc2Pdf.pdf";
// Save the document in PDF format.
doc->Save(outputPath);
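
Because the output format in this sample comes from the file name alone, it can help to see what extension-based selection looks like in isolation. The following is a minimal, self-contained sketch of that idea and not Aspose.Words code: `OutputFormat`, `LowerExtension`, and `FormatFromFileName` are hypothetical names invented for illustration, and the library's real detection logic may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum for this sketch only; it is NOT Aspose's SaveFormat.
enum class OutputFormat { Unknown, Pdf, Html, Epub, Docx };

// Returns the lower-cased extension of `path` including the dot,
// or an empty string if the file name has no extension.
std::string LowerExtension(const std::string& path) {
    const std::size_t slash = path.find_last_of("/\\");
    const std::size_t dot = path.find_last_of('.');
    // No dot at all, or the last dot belongs to a directory name.
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext;
}

// Maps a file name to an output format based on its extension (assumed mapping).
OutputFormat FormatFromFileName(const std::string& path) {
    const std::string ext = LowerExtension(path);
    if (ext == ".pdf") return OutputFormat::Pdf;
    if (ext == ".html" || ext == ".htm") return OutputFormat::Html;
    if (ext == ".epub") return OutputFormat::Epub;
    if (ext == ".docx") return OutputFormat::Docx;
    return OutputFormat::Unknown;
}
```

With this sketch, a name such as "Doc2Pdf.SaveDoc2Pdf.pdf" maps to the PDF case because only the text after the last dot is considered.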

Convert a Word Document to HTML

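To convert a Microsoft Word document to HTML, simply invoke the Document.Save method and specify a file name with the “.html” extension. The code sample below also passes an HtmlSaveOptions object to control whether round-trip information is written.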

// The path to the documents directories.
System::String inputDataDir = GetInputDataDir_LoadingAndSaving();
System::String outputDataDir = GetOutputDataDir_LoadingAndSaving();
// Load the document from disk.
System::SharedPtr<Document> doc = System::MakeObject<Document>(inputDataDir + u"Test File (doc).doc");
System::SharedPtr<HtmlSaveOptions> options = System::MakeObject<HtmlSaveOptions>();
// HtmlSaveOptions.ExportRoundtripInformation specifies whether to write
// round-trip information when saving to HTML, MHTML or EPUB.
// The default value is true for HTML and false for MHTML and EPUB.
options->set_ExportRoundtripInformation(true);
doc->Save(outputDataDir + u"ConvertDocumentToHtmlWithRoundtrip.html", options);

Export Fonts to HTML in Base64 Encoding

Using Aspose.Words, we can specify whether font resources should be embedded in the HTML in Base64 encoding. By default, the value is false and fonts are written to separate files. If this option is set to true, fonts are embedded into the document's CSS in Base64 encoding. The property affects only the HTML format; it has no effect on EPUB and MHTML. It extends the ExportFontResources option, so ExportFontsAsBase64 takes effect only when ExportFontResources is also set to true. The example below shows how to export fonts to HTML in Base64 encoding.

// The path to the documents directories.
System::String inputDataDir = GetInputDataDir_LoadingAndSaving();
System::String outputDataDir = GetOutputDataDir_LoadingAndSaving();
System::SharedPtr<Document> doc = System::MakeObject<Document>(inputDataDir + u"Document.doc");
System::SharedPtr<HtmlSaveOptions> saveOptions = System::MakeObject<HtmlSaveOptions>();
saveOptions->set_ExportFontResources(true);
saveOptions->set_ExportFontsAsBase64(true);
System::String outputPath = outputDataDir + u"ExportFontsAsBase64.html";
doc->Save(outputPath, saveOptions);
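
To make "embedded into the document's CSS in Base64 encoding" more concrete, here is a minimal, self-contained sketch of what such embedding amounts to: raw font bytes are Base64-encoded and placed in a `data:` URI inside an `@font-face` rule. This is an illustration under assumptions, not Aspose.Words code; `Base64Encode` and `FontFaceRule` are hypothetical helpers, the `font/ttf` MIME type is assumed, and the exact CSS the library emits may look different.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 encoding with '=' padding (hypothetical helper for this sketch).
std::string Base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Encode each full 3-byte group as four 6-bit symbols.
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // Encode the one or two leftover bytes and pad to a multiple of four.
    if (i + 1 == data.size()) {
        const std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == data.size()) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Builds a CSS @font-face rule that carries the font inline as a data URI.
// The rule layout and MIME type are assumptions made for illustration.
std::string FontFaceRule(const std::string& family,
                         const std::vector<std::uint8_t>& fontBytes) {
    return "@font-face { font-family: '" + family +
           "'; src: url(data:font/ttf;base64," + Base64Encode(fontBytes) + "); }";
}
```

The trade-off is the usual one for inline resources: the HTML becomes a single portable file, but Base64 inflates the font data by roughly a third.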

For more HtmlSaveOptions settings, please check the documentation and API Reference.

Convert a Document to EPUB

EPUB (electronic publication) is an HTML-based format commonly used for electronic book distribution. Aspose.Words fully supports this format for exporting electronic books compatible with the majority of reading devices. This code sample shows how to convert a simple MS Word document to EPUB with a few lines of code.

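// inputDataDir and outputDataDir are assumed to be System::String paths
// to the input and output directories, defined elsewhere.
// Load the document from disk.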
System::SharedPtr<Document> doc = System::MakeObject<Document>(inputDataDir + u"Document.EpubConversion.doc");
// Create a new instance of HtmlSaveOptions. This object allows us to set options
// that control how the output document is saved.
System::SharedPtr<HtmlSaveOptions> saveOptions = System::MakeObject<HtmlSaveOptions>();
// Specify the desired encoding.
saveOptions->set_Encoding(System::Text::Encoding::get_UTF8());
// Specify at which elements to split the internal HTML. Each split creates a new HTML part
// within the EPUB, which limits the size of each part. This is useful for readers that cannot
// open HTML files larger than a certain size, e.g. 300 KB.
saveOptions->set_DocumentSplitCriteria(DocumentSplitCriteria::HeadingParagraph);
// Specify that we want to export document properties.
saveOptions->set_ExportDocumentProperties(true);
// Specify that we want to save in EPUB format.
saveOptions->set_SaveFormat(SaveFormat::Epub);
// Export the document as an EPUB file.
doc->Save(outputDataDir + u"ConvertDocumentToEPUB.ConvertDocumentToEPUB.epub", saveOptions);
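
The split-at-headings behaviour requested above can be pictured with a toy model. The sketch below is self-contained and does not use Aspose.Words: `Paragraph`, `SplitAtHeadings`, and the `maxHeadingLevel` parameter are hypothetical, and the rule it applies (start a new part at every heading up to a chosen level) is an assumption about the general idea rather than a description of the library's internals.

```cpp
#include <string>
#include <vector>

// Hypothetical paragraph model for this sketch: headingLevel 0 means body text,
// 1 means "Heading 1", 2 means "Heading 2", and so on.
struct Paragraph {
    int headingLevel;
    std::string text;
};

// Groups paragraphs into parts, starting a new part at every heading whose
// level is between 1 and maxHeadingLevel. Text that precedes the first
// qualifying heading forms a part of its own.
std::vector<std::vector<Paragraph>> SplitAtHeadings(
    const std::vector<Paragraph>& paragraphs, int maxHeadingLevel) {
    std::vector<std::vector<Paragraph>> parts;
    for (const Paragraph& p : paragraphs) {
        const bool startsPart =
            p.headingLevel >= 1 && p.headingLevel <= maxHeadingLevel;
        if (parts.empty() || startsPart) {
            parts.emplace_back();
        }
        parts.back().push_back(p);
    }
    return parts;
}
```

In this model, a document with a preface followed by two chapters yields three parts, each of which would become a separate HTML file inside the EPUB container.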

Installation

You may be wondering how to install the Aspose.Words for C++ library. Please check this article, which provides step-by-step instructions for installing the library.

Conversion is just one feature of Aspose.Words for C++. Please check the Documentation to learn about its many other features.

If you need any assistance with Aspose.Words for C++, please visit Aspose.Forums. You can create a new topic in the Aspose.Words for C++ forum, and your post will be answered within a few hours.
