DEV Community

Pilalo Jovanitho

HTML to PDF in Java: A Practical Guide to Document Conversion

Converting HTML to PDF is a common requirement in enterprise applications. Whether it's generating invoices, archiving reports, or creating downloadable documents, the ability to reliably transform web content into portable document format is essential. This article explores practical approaches to HTML-to-PDF conversion in Java, focusing on implementation patterns, common pitfalls, and production-ready considerations.


1. Understanding the Challenge

Before diving into code, it's worth understanding why HTML to PDF conversion isn't as straightforward as it might seem.

| Challenge | Description |
| --- | --- |
| Layout Engines | Browsers and PDF renderers interpret CSS differently |
| Font Availability | Server environments often lack fonts used in HTML |
| Resource Management | Images, stylesheets, and external assets need proper handling |
| Performance | Large-scale conversion requires careful resource planning |

2. Implementation Approaches

Java developers have several options when implementing HTML to PDF conversion. Here's a comparison of the main approaches:

2.1 Document Model Approach

Libraries like Spire.Doc take an indirect approach: they parse HTML into an internal document model (similar to Word documents), then export to PDF. This works well when you need consistency across multiple document formats.

Basic HTML File Conversion:

import com.spire.doc.*;

public class HtmlToPdfDemo {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.loadFromFile("report.html", FileFormat.Html);
        doc.saveToFile("report.pdf", FileFormat.PDF);
        doc.dispose();
    }
}

Converting from HTML String:

import com.spire.doc.*;

public class InlineHtmlConverter {
    public static void main(String[] args) {
        Document doc = new Document();
        Section section = doc.addSection();

        // Text blocks (""") require Java 15 or later
        String html = """
            <!DOCTYPE html>
            <html>
            <head><title>Invoice</title></head>
            <body>
                <h1>Order Summary</h1>
                <p>Thank you for your purchase.</p>
                <ul>
                    <li>Item: Laptop</li>
                    <li>Quantity: 1</li>
                    <li>Price: $1,299.99</li>
                </ul>
            </body>
            </html>
            """;

        section.addParagraph().appendHTML(html);
        doc.saveToFile("invoice.pdf", FileFormat.PDF);
        doc.dispose();
    }
}

2.2 Native PDF Generation

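Libraries like iText (through its pdfHTML add-on) and OpenHTMLtoPDF (which renders through Apache PDFBox) generate PDF directly without an intermediate document model. Apache PDFBox on its own has no HTML parser. This approach offers finer control over PDF-specific features.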

2.3 Headless Browser Rendering

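A real browser engine can also do the rendering. Puppeteer is a Node.js library, so a Java application would run it as a separate process, while Selenium has Java bindings and can drive a headless browser directly. Either way, the browser renders the HTML and exports it to PDF. This provides the highest fidelity but introduces external dependencies.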
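
As a rough sketch of the external-process route, the class below launches a Chromium-based browser in headless mode with its `--print-to-pdf` switch. The class name, the 60-second timeout, and the browser binary name (for example `google-chrome` or `chromium`) are assumptions for illustration; the binary must already be installed on the server.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class HeadlessChromePdf {

    // Builds the command line. chromeBinary is an assumption, e.g. "google-chrome" or "chromium".
    public static List<String> buildCommand(String chromeBinary, Path html, Path pdf) {
        return List.of(
            chromeBinary,
            "--headless",
            "--disable-gpu",
            "--print-to-pdf=" + pdf.toAbsolutePath(),
            html.toAbsolutePath().toUri().toString()   // file:// URL of the input document
        );
    }

    // Runs the browser and fails if it times out or exits with a non-zero code.
    public static void convert(String chromeBinary, Path html, Path pdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(chromeBinary, html, pdf))
            .inheritIO()
            .start();
        if (!process.waitFor(60, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Browser timed out while rendering " + html);
        }
        if (process.exitValue() != 0) {
            throw new IOException("Browser exited with code " + process.exitValue());
        }
    }
}
```

Keeping the command construction separate from the process launch makes the flags easy to unit test without a browser installed.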


3. Advanced Configuration

Real-world applications often require more than basic conversion. Here are practical configuration patterns.

3.1 Handling External Resources

HTML documents frequently reference external resources. Proper resource loading ensures images and stylesheets appear correctly.

import com.spire.doc.*;
import com.spire.doc.documents.*;

public class ResourceAwareConverter {
    public static void main(String[] args) {
        Document doc = new Document();

        // Set base URL for resolving relative paths
        doc.setBaseUrl("https://yourdomain.com/assets/");

        // Configure HTML load options
        doc.loadFromFile(
            "complex.html", 
            FileFormat.Html, 
            XHTMLValidationType.None  // Looser validation for real-world HTML
        );

        doc.saveToFile("output.pdf", FileFormat.PDF);
        doc.dispose();
    }
}
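
To see what a base URL does to relative references, the sketch below uses only the JDK's `java.net.URI`. It illustrates standard URL resolution rules, not the internals of any particular PDF library, and the class name and sample paths are made up for the example.

```java
import java.net.URI;

public class BaseUrlResolution {

    // Resolves a reference found in HTML (a src or href value) against the base URL.
    public static String resolve(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "https://yourdomain.com/assets/";

        // https://yourdomain.com/assets/img/logo.png
        System.out.println(resolve(base, "img/logo.png"));

        // https://yourdomain.com/css/site.css
        System.out.println(resolve(base, "../css/site.css"));

        // https://yourdomain.com/favicon.ico
        System.out.println(resolve(base, "/favicon.ico"));

        // Without the trailing slash, "assets" is treated as a file name and dropped:
        // https://yourdomain.com/img/logo.png
        System.out.println(resolve("https://yourdomain.com/assets", "img/logo.png"));
    }
}
```

The last case is a frequent cause of missing images: a base URL that lacks its trailing slash silently points relative paths one directory too high.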

3.2 Font Management

Font-related issues are among the most common problems in PDF generation.

import com.spire.doc.*;

public class FontManagementExample {
    public static void main(String[] args) {
        Document doc = new Document();

        // Specify custom font directories
        String[] fontPaths = {
            "/usr/share/fonts",           // Linux
            "C:\\Windows\\Fonts",         // Windows
            "/System/Library/Fonts"       // macOS
        };
        doc.setCustomFontsFolders(fontPaths);

        // Font can also be specified in HTML
        String html = """
            <div style="font-family: 'Arial', 'Helvetica', sans-serif;">
                This text will use the first available font.
            </div>
            """;

        // A new Document has no sections yet, so add one before appending content
        doc.addSection().addParagraph().appendHTML(html);
        doc.saveToFile("output.pdf", FileFormat.PDF);
        doc.dispose();
    }
}
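
Before a deployment, it can help to check which font from a CSS font stack the server will actually find. The sketch below is one way to do that with the JDK alone; the class and method names are hypothetical, and the fonts visible to the JVM may differ from those a PDF library loads from its own custom folders.

```java
import java.awt.GraphicsEnvironment;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class FontStackCheck {

    // Font families visible to this JVM, compared case-insensitively.
    public static Set<String> installedFamilies() {
        Set<String> families = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        families.addAll(List.of(
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames()));
        return families;
    }

    // Returns the first family in the stack that is installed, if any.
    public static Optional<String> firstAvailable(List<String> stack, Set<String> installed) {
        return stack.stream().filter(installed::contains).findFirst();
    }

    public static void main(String[] args) {
        List<String> stack = List.of("Arial", "Helvetica");
        System.out.println(firstAvailable(stack, installedFamilies())
            .map(family -> "Will render with: " + family)
            .orElse("No font from the stack is installed; expect a fallback font"));
    }
}
```

Running this check at application startup surfaces missing fonts early, instead of as subtly wrong layouts in generated PDFs.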

4. Performance Optimization

When processing documents at scale, consider these optimization strategies.

4.1 Resource Management

Always dispose of Document objects, even when conversion fails, to prevent memory leaks:

public void convertWithResourceCleanup(String inputPath, String outputPath) {
    Document doc = null;
    try {
        doc = new Document();
        doc.loadFromFile(inputPath, FileFormat.Html);
        doc.saveToFile(outputPath, FileFormat.PDF);
    } finally {
        if (doc != null) {
            doc.dispose();
        }
    }
}

4.2 Batch Processing with Concurrency

For processing multiple documents, use a thread pool with controlled concurrency:

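import com.spire.doc.*;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class BatchConverter {
    public void convertAll(List<String> files, String outputDir) {
        // A bounded pool caps how many documents are held in memory at once.
        // Creating it per call keeps convertAll reusable after shutdown().
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();

        for (String file : files) {
            futures.add(executor.submit(() -> {
                Document doc = new Document();
                try {
                    doc.loadFromFile(file, FileFormat.Html);
                    // Swap only the trailing .html/.htm extension for .pdf
                    String name = new File(file).getName().replaceFirst("\\.html?$", ".pdf");
                    doc.saveToFile(new File(outputDir, name).getPath(), FileFormat.PDF);
                } finally {
                    // Same cleanup pattern as in section 4.1
                    doc.dispose();
                }
            }));
        }

        // Wait for completion
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt and stop waiting
                break;
            } catch (ExecutionException e) {
                // Log and handle the failure; e.getCause() holds the original error
            }
        }

        executor.shutdown();
    }
}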
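
The "log and handle failures" step is easier when each failure can be traced back to its input file. The following sketch shows one possible shape for that, with the conversion passed in as a `Consumer<String>` so the helper stays independent of any PDF library. The class name, method name, and parameters are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BatchFailureReport {

    // Runs converter on every file with at most `threads` conversions in flight
    // and returns the files that failed, mapped to the error that caused the failure.
    public static Map<String, Throwable> runAll(
            List<String> files, int threads, Consumer<String> converter) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        Map<String, Future<?>> pending = new LinkedHashMap<>();
        Map<String, Throwable> failures = new LinkedHashMap<>();
        try {
            for (String file : files) {
                pending.put(file, executor.submit(() -> converter.accept(file)));
            }
            for (Map.Entry<String, Future<?>> entry : pending.entrySet()) {
                try {
                    entry.getValue().get();
                } catch (ExecutionException e) {
                    // The task threw; keep the original exception next to its file
                    failures.put(entry.getKey(), e.getCause());
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop waiting
                    Thread.currentThread().interrupt();
                    failures.put(entry.getKey(), e);
                    break;
                }
            }
        } finally {
            executor.shutdown();
        }
        return failures;
    }
}
```

A caller could pass a lambda that wraps the load, save, and dispose steps for one file, then retry or report only the entries in the returned map.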

5. Library Comparison

Here's how different libraries compare for HTML to PDF conversion:

| Library | Approach | Strengths | Considerations |
| --- | --- | --- | --- |
| Spire.Doc | Document model | Simple API, consistent with Word conversion | Free version has page limits (10 pages) |
| iText 7 + pdfHTML | Native PDF | Excellent CSS support, industry standard | AGPL license for open source use |
| OpenHTMLtoPDF | Rendering engine | Lightweight, open source | CSS 2.1 level support |
| Headless Chrome | Browser-based | Highest fidelity, modern CSS support | External process, additional infrastructure |

6. Production Considerations

6.1 Error Handling

Implement robust error handling to manage conversion failures gracefully:

public ConversionResult safeConvert(String htmlPath, String pdfPath) {
    Document doc = new Document();
    try {
        doc.loadFromFile(htmlPath, FileFormat.Html);
        doc.saveToFile(pdfPath, FileFormat.PDF);
        return ConversionResult.success(pdfPath);
    } catch (Exception e) {
        logger.error("Conversion failed for {}: {}", htmlPath, e.getMessage());
        return ConversionResult.failure(e.getMessage());
    } finally {
        // Release the document whether the conversion succeeded or failed
        doc.dispose();
    }
}
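
The snippet above returns a `ConversionResult`, which the article does not define. One hypothetical shape for it is a small immutable class with two factory methods; the accessor names and the "unknown error" fallback below are assumptions for illustration.

```java
import java.util.Objects;
import java.util.Optional;

public final class ConversionResult {
    private final String outputPath;    // non-null only on success
    private final String errorMessage;  // non-null only on failure

    private ConversionResult(String outputPath, String errorMessage) {
        this.outputPath = outputPath;
        this.errorMessage = errorMessage;
    }

    public static ConversionResult success(String outputPath) {
        return new ConversionResult(Objects.requireNonNull(outputPath), null);
    }

    public static ConversionResult failure(String errorMessage) {
        // Exceptions may carry a null message, so fall back to a placeholder
        return new ConversionResult(null, errorMessage != null ? errorMessage : "unknown error");
    }

    public boolean isSuccess() {
        return outputPath != null;
    }

    public Optional<String> outputPath() {
        return Optional.ofNullable(outputPath);
    }

    public Optional<String> errorMessage() {
        return Optional.ofNullable(errorMessage);
    }
}
```

Returning a result object instead of throwing lets batch callers keep processing the remaining files and summarize failures at the end.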

6.2 Performance Benchmarks

Typical performance indicators for reference:

| Document Type | Page Count | Conversion Time | Memory Usage |
| --- | --- | --- | --- |
| Simple HTML | 1-2 pages | 1-2 seconds | 50-80 MB |
| Complex layout | 5-10 pages | 3-5 seconds | 120-180 MB |
| Large document | 50+ pages | 10-15 seconds | 300-500 MB |

Note: Actual performance varies by content complexity and hardware
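
Because these figures depend on content and hardware, it is worth measuring with your own documents. The helper below is a minimal sketch that uses only the JDK; the class name is hypothetical, the conversion is passed in as a `Runnable`, and the heap reading is a coarse approximation rather than a precise per-document cost.

```java
public class ConversionTimer {

    // Runs one conversion and returns the elapsed wall-clock time in milliseconds.
    public static long timeMillis(Runnable conversion) {
        long start = System.nanoTime();
        conversion.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Approximate heap currently in use by the JVM, in megabytes.
    public static long usedHeapMb() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Placeholder workload: replace the body with a real conversion call
        Runnable conversion = () -> { };

        long before = usedHeapMb();
        long elapsed = timeMillis(conversion);
        long after = usedHeapMb();
        System.out.println("Elapsed: " + elapsed + " ms, heap before/after: "
            + before + "/" + after + " MB");
    }
}
```

Run several conversions before recording numbers, since the first few calls include class loading and JIT warm-up.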

6.3 Free Version Limitations

If using a free version of any commercial library, be aware of limitations:

  • Page restrictions (typically 10 pages)
  • Watermarks or evaluation notices
  • Reduced performance or concurrent processing limits

For production deployments, evaluate whether these constraints impact your use case.


7. When to Choose Which Approach

| Scenario | Recommended Approach |
| --- | --- |
| Simple HTML, internal use | Document library with free version |
| Complex CSS, pixel-perfect required | Headless browser solution |
| High-volume batch processing | Native PDF library with optimization |
| Existing Word/Office workflow | Document model approach for consistency |

8. Conclusion

HTML to PDF conversion in Java offers multiple viable approaches, each with distinct trade-offs. The document model approach provides a balanced solution: simple API, consistent handling across formats, and no external process dependencies.

Key takeaways for implementation:

  • Test with real content: HTML structure varies widely; validate with actual documents
  • Plan for fonts: Missing fonts on the server are among the most common failure points
  • Manage resources: Proper cleanup prevents memory issues at scale
  • Understand limitations: Free versions have constraints; plan accordingly

Start with simple conversion and progressively add configuration as your requirements evolve. Most importantly, test thoroughly with the actual HTML content your application will process before committing to a specific approach.
