DEV Community

Gia


How to convert Word to HTML file format via Java

Introduction:

An HTML file is a text file written in Hypertext Markup Language (HTML) and used to create web pages. It describes the structure and content of a page and can carry style information as well. Converting a Word document to HTML maps its text, headings, paragraphs, lists, tables, images and other elements to the corresponding HTML tags and attributes. This makes the content easy to publish on the web and viewable in a browser on any platform or device. The method is described below.

Preparation

The steps below use IntelliJ IDEA 2018 as an example:

  • Download Free Spire.Doc for Java from this Link and unzip the package.
  • Create a new project in IDEA, then open “File” > “Project Structure” > “Modules” > “Dependencies”.
  • Click “+” and choose “JARs or Directories”.
  • Find “Spire.Doc.jar” in the lib folder of the unzipped package and import it into the project.

Code:

import com.spire.doc.*;

public class WordToHtml {
    public static void main(String[] args) {

        //Create a Document instance
        Document document = new Document();

        //Load a Word document
        document.loadFromFile("sample.docx");

        //Save the document as HTML 
        document.saveToFile("output/toHtml.html", FileFormat.Html);
    }
}

In this code, a Document instance is created first. The loadFromFile() method then loads the Word document, and the saveToFile(String fileName, FileFormat fileFormat) method saves it as an HTML file.
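The sample writes its result into an "output" subfolder. As a minimal sketch that uses only the Java standard library, the helper below derives the HTML file name from the Word file name and creates the output folder up front. The class and method names (OutputPathHelper, prepareHtmlTarget) are hypothetical and not part of Spire.Doc, and whether the library creates missing folders on its own is not stated here, so creating the folder is just a precaution.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputPathHelper {

    // Hypothetical helper: returns "<outputDir>/<input base name>.html"
    // and creates outputDir (including parents) if it does not exist yet.
    static Path prepareHtmlTarget(String inputFile, String outputDir) throws IOException {
        String name = Paths.get(inputFile).getFileName().toString();

        // Strip the last extension, e.g. "sample.docx" becomes "sample"
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;

        Path dir = Paths.get(outputDir);
        Files.createDirectories(dir); // does nothing if the folder already exists
        return dir.resolve(base + ".html");
    }

    public static void main(String[] args) throws IOException {
        Path target = prepareHtmlTarget("sample.docx", "output");
        System.out.println(target);
        // The path could then be handed to the conversion shown above, e.g.
        // document.saveToFile(target.toString(), FileFormat.Html);
    }
}
```

Deriving the name this way keeps the HTML file next to a predictable name when several Word documents are converted in a loop.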

The saveToFile(String fileName, FileFormat fileFormat) method provided by Free Spire.Doc for Java can also convert Word documents to other formats, such as PDF, image and XPS.

