DEV Community

Carinababy

One way to convert PDF to HTML in Java

PDF files work well for printing and publishing, but the format does not suit every type of document. For example, documents containing complex charts and graphs may not render well in PDF. HTML files, on the other hand, can be displayed on any device that runs a browser. HTML pages also tend to use fewer server resources and are easily indexed by search engines. This article shows how to convert a PDF document to HTML in a Java application. The steps I compiled are listed below, with Java code provided for reference.

Programming Environment

Method 1:
Import the JAR manually. Download Free Spire.PDF for Java, unzip it, and find the Spire.PDF.jar file in the lib folder. In IntelliJ IDEA, open File > Project Structure > Modules > Dependencies and add the JAR file from its local path to your Java project.
Method 2:
If you use Maven, you can import the JAR file into your application by adding the following repository and dependency to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf.free</artifactId>
        <version>5.1.0</version>
    </dependency>
</dependencies>
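
Whichever method you choose, it can help to confirm that the JAR is actually visible to your program before running the conversion. The following is a minimal sketch using only the standard library. SpireClasspathCheck and isOnClasspath are hypothetical names of my own, not part of Spire.PDF; the class name com.spire.pdf.PdfDocument is taken from the import used by the conversion code in this article.

```java
// Hypothetical helper (not part of Spire.PDF): checks whether a class is visible at runtime.
class SpireClasspathCheck {

    // Returns true if the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SpireClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // PdfDocument is the class the conversion code in this article relies on.
        boolean found = isOnClasspath("com.spire.pdf.PdfDocument");
        System.out.println(found
                ? "Spire.PDF found on the classpath"
                : "Spire.PDF missing: check the JAR import or the pom.xml entry");
    }
}
```

If the check reports the library as missing, the usual causes are a JAR that was not added to the module or a Maven dependency that has not been downloaded yet.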

Convert a PDF document to an HTML file in Java

Free Spire.PDF for Java provides the PdfDocument.saveToFile() method, which converts a PDF to HTML when called with an output file name and FileFormat.HTML. The detailed steps are as follows.

  • Create an object of PdfDocument.
  • Load a PDF file using PdfDocument.loadFromFile() method.
  • Save the PDF file as an HTML file using PdfDocument.saveToFile() method.

Full Code

import com.spire.pdf.*;

public class ConvertPDFToHTML {
    public static void main(String[] args) {

        // Create an object of PdfDocument
        PdfDocument pdf = new PdfDocument();

        // Load a PDF file
        pdf.loadFromFile("C:/Guide to a Foreign Past.pdf");

        // Save the PDF file as an HTML file
        pdf.saveToFile("PDFToHTML.html", FileFormat.HTML);

        // Release the resources held by the document
        pdf.close();
    }
}
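
The full code hard-codes both file names. As an optional sketch, the output name could instead be derived from the input path, so that report.pdf becomes report.html in the same folder. HtmlOutputPath and forPdf are hypothetical names of my own, not part of Spire.PDF, and the code uses only the standard library. The Spire.PDF calls from the article appear as comments to show where the two paths would be used.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Spire.PDF): derives the .html output path for a .pdf input.
class HtmlOutputPath {

    // Returns a path in the same folder with the file extension replaced by ".html".
    static Path forPdf(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".html");
    }

    public static void main(String[] args) {
        // Take the input path from the command line, or fall back to the article's sample name.
        Path input = Paths.get(args.length > 0 ? args[0] : "Guide to a Foreign Past.pdf");
        Path output = forPdf(input);
        System.out.println(input + " -> " + output);

        // With Free Spire.PDF on the classpath, the conversion from the article would be:
        //   PdfDocument pdf = new PdfDocument();
        //   pdf.loadFromFile(input.toString());
        //   pdf.saveToFile(output.toString(), FileFormat.HTML);
        //   pdf.close();
    }
}
```

Keeping the path logic separate from the conversion call makes it easy to reuse the same helper if you later loop over several PDF files.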


Conclusion:

In addition to PDF to HTML conversion, Spire.PDF for Java offers many other useful features. For example, you can use it to insert text or image watermarks into a PDF, or to convert PDF to Word and PDF to PDF/A. In short, Spire.PDF for Java is a useful tool for converting PDF to HTML, and its additional features cover many other PDF tasks.
