DEV Community

IronSoftware

iText 7 pdfHTML Memory Leak: OutOfMemoryError When Converting HTML

When converting HTML documents to PDF using iText 7's pdfHTML module, developers frequently encounter severe memory consumption issues. A document that should require 100MB of memory instead consumes 600-700MB, and in many cases triggers OutOfMemoryError exceptions that crash production applications. This article examines the root cause of these memory leaks, documents the community experience, and presents an alternative approach using IronPDF's Chrome-based rendering engine.

The Problem

iText 7's pdfHTML module suffers from a memory leak when processing HTML documents that contain deeply nested elements. The issue manifests most severely with:

  • Paragraphs containing many nested subelements
  • Tables with hundreds of cells spanning multiple pages
  • Deeply nested div structures
  • Long block elements containing hundreds of child elements

According to iText's own documentation, the root cause lies in the layout engine's handling of parent references:

"When processing paragraphs with a lot of nested subelements, iText would act suboptimally by not cleaning up certain parent links after laying out intermediate renderers."

The consequence is straightforward but severe: memory references that should be released are retained, causing memory consumption to grow with each nested element processed. For documents that span dozens of pages with complex nested structures, the memory footprint can become unmanageable.

Frequently Searched Error Messages

Developers experiencing this issue typically encounter one of several OutOfMemoryError variants. Stack traces like the following appear frequently in support forums and issue trackers:

java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.kernel.pdf.PdfReader.readStreamBytesRaw
    at com.itextpdf.kernel.pdf.PdfReader.readStreamBytes
    at com.itextpdf.kernel.pdf.PdfStream.getBytes

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at com.itextpdf.layout.renderer.BlockRenderer.layout

java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.html2pdf.attach.impl.layout.Html2PdfProperty

For .NET developers using iText 7:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at iText.Layout.Renderer.TableRenderer.Layout()
   at iText.Html2pdf.Attach.Impl.Tags.TableTagWorker.ProcessEnd()

Symptoms Beyond Exceptions

Even when the conversion completes without throwing an exception, applications exhibit problematic behavior:

  • Memory usage spikes to 600-700MB for documents that should require ~100MB
  • Memory is not properly released after the conversion completes
  • Garbage collection cycles become longer and more frequent
  • In web server environments, subsequent requests experience degraded performance
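
One rough way to check the second symptom is to compare heap in use before and after a conversion. The sketch below uses only the standard java.lang.management API. HeapProbe and its method names are made up for this illustration, and the figures are approximate because a GC request is only a hint to the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Locale;

/** Illustrative helper (hypothetical name): compares heap in use before and after a task. */
final class HeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Heap currently in use, in bytes, after requesting a GC (the JVM may ignore the request). */
    static long usedHeapBytes() {
        MEMORY.gc();
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Runs the task and returns the change in used heap; can be negative if the GC freed more than the task kept. */
    static long retainedBytes(Runnable task) {
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }

    /** Formats a byte count as mebibytes with one decimal place. */
    static String toMiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }
}
```

Wrapping your own conversion call, for example HeapProbe.retainedBytes(() -> convert(html)) where convert is whatever method in your application invokes pdfHTML, and repeating it several times gives a crude signal: a retained figure that keeps climbing across runs suggests references are being held.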

Who Is Affected

This memory leak affects any application using iText 7's pdfHTML module for HTML-to-PDF conversion, particularly in the following scenarios:

Operating Systems: The issue is platform-independent, affecting Windows, Linux, and macOS deployments equally, since it stems from references retained by iText's managed layout code (on the JVM or the .NET runtime) rather than from platform-specific native code.

Framework Versions: The bug was present in iText 7 versions 7.1.0 through 7.1.14, with particularly severe manifestations in:

  • iText 7.1.0 through 7.1.4 (initial implementations)
  • iText 7.1.5 through 7.1.8 (partial fixes that did not fully resolve the issue)
  • iText 7.1.9 through 7.1.14 (incremental improvements but still problematic for deeply nested content)

While a comprehensive fix was released in version 7.1.15 (April 2021), many enterprise deployments remain on older versions due to licensing considerations and upgrade restrictions.
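
To see whether a deployment falls in the range listed above, a small version check can help. This is a minimal sketch that assumes you already know the iText version string (for example from your build file); AffectedVersionCheck is a hypothetical helper, not part of iText.

```java
/** Hypothetical helper: checks a version string against the affected range stated in this article. */
final class AffectedVersionCheck {

    /** Returns true for 7.1.0 through 7.1.14 inclusive; suffixes such as -SNAPSHOT are ignored. */
    static boolean isAffected(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        // Drop anything after the leading digits of the patch component, e.g. "9-SNAPSHOT" -> "9"
        int patch = Integer.parseInt(parts[2].replaceAll("\\D.*$", ""));
        return major == 7 && minor == 1 && patch >= 0 && patch <= 14;
    }
}
```

For example, isAffected("7.1.14") is true and isAffected("7.1.15") is false.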

Java Heap Configuration: Affected developers commonly need to increase heap allocation significantly:

# Minimum heap size to process moderately complex documents
java -Xms512m -Xmx4g -jar myapp.jar

# For complex documents with deep nesting
java -Xms1g -Xmx8g -jar myapp.jar

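# Docker container configuration (JAVA_OPTS only takes effect if the image's
# entrypoint passes it to java; JAVA_TOOL_OPTIONS is read by the JVM itself)
docker run -e JAVA_OPTS="-Xmx8g" myimage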
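
Heap flags are easy to lose between a launch script, a container, and the JVM, so it can be worth confirming what the process actually received. The following sketch uses only Runtime.maxMemory(); the class name HeapCeiling is an assumption for illustration.

```java
/** Illustrative check (hypothetical name): reports the heap ceiling of the running JVM. */
final class HeapCeiling {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Logging this value once at startup makes it obvious when an -Xmx setting did not reach the JVM.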

Use Cases: The memory leak primarily affects:

  • Report generation systems producing multi-page documents
  • Invoice or statement generation with repeating table rows
  • Document assembly pipelines processing batches of HTML
  • Web applications generating PDFs from user-submitted content

Scale: The problem grows disproportionately with document complexity. A simple single-page document may show no symptoms, while a 50-page document with nested tables can consume gigabytes of heap space.

Evidence from the Developer Community

The iText pdfHTML memory leak has been documented across multiple platforms over several years. The following timeline tracks the issue from initial reports through the eventual fix in version 7.1.15.

Timeline

| Date    | Event                                            | Source              |
|---------|--------------------------------------------------|---------------------|
| 2015    | Early reports of slow multi-page generation      | iText Mailing List  |
| 2018-01 | Bug formally identified in iText 7 layout module | iText KB            |
| 2019-02 | Version 7.1.5 includes partial memory leak fix   | iText Release Notes |
| 2019    | GitHub issues document 8GB heap requirements     | GitHub (CERMINE)    |
| 2021-04 | iText 7.1.15 releases with comprehensive fix     | iText Blog          |

Community Reports

The impact of this issue extends beyond iText's own forums. On GitHub, developers working with PDF processing documented the severity of the memory requirements:

"Fatal java.lang.OutOfMemoryError thrown while processing document"
— GitHub Issue #58, CERMINE Project

The same issue thread revealed the workaround that many developers were forced to adopt:

"able to finally process this file after bumping Xmx memory to 8GB"
— GitHub, 2019

On Coderanch, developers noted the discrepancy between HTML and PDF memory requirements for the same content:

"OutOfMemoryError (Java heap space) when trying to run a report as a PDF, while the same report runs fine when using HTML"
— Coderanch Forums

Historical archives from the iText mailing list reveal that memory issues with table processing have plagued the library for over a decade. One thread documented the scale of the problem:

"205 MB were used and about 575 thousand objects were temporarily created for a result that was just a 53kb PDF."
— iText Mailing List Archive

These reports span from 2008 to 2021, indicating a persistent architectural challenge rather than an isolated bug.

Root Cause Analysis

The iText pdfHTML memory leak stems from the library's custom layout engine design. Understanding this architecture helps explain why the issue was difficult to resolve.

When iText processes HTML for PDF conversion, it builds an internal representation of the document using renderer objects. Each HTML element (div, paragraph, table cell) creates a corresponding renderer. These renderers maintain references to their parent elements to support layout calculations like margin collapsing and relative positioning.

The problem occurs in the cleanup phase. After a renderer has been laid out and is no longer needed for active layout calculations, iText failed to properly clear the parent references. This meant that child renderers kept their parent renderers alive in memory, even after those parents should have been eligible for garbage collection.

For a simple document structure, this overhead is negligible. But for documents with deeply nested structures, the retained references compound:

  1. A table with 100 rows creates 100+ row renderers
  2. With 5 cells per row, that adds 500+ cell renderers
  3. Each cell containing paragraphs with inline elements adds hundreds more
  4. All of these maintain references back up the tree

The result is that memory consumption scales non-linearly with document complexity. iText's measurements showed a 6-7x difference in memory usage before and after the fix for documents with hundreds of nested children.
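
The retention effect can be illustrated with a toy model. The classes below are hypothetical and are not iText's renderer classes; the sketch only shows how one surviving child with a parent link keeps the whole tree reachable, and how clearing that link releases it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model only: a node with a parent link and child links, loosely mimicking a renderer tree. */
final class ToyRenderer {
    ToyRenderer parent;
    final List<ToyRenderer> children = new ArrayList<>();

    ToyRenderer addChild() {
        ToyRenderer child = new ToyRenderer();
        child.parent = this;
        children.add(child);
        return child;
    }

    /** Builds table -> rows -> cells and returns the last cell created (rows and cellsPerRow must be >= 1). */
    static ToyRenderer lastCellOfTable(int rows, int cellsPerRow) {
        ToyRenderer table = new ToyRenderer();
        ToyRenderer last = table;
        for (int r = 0; r < rows; r++) {
            ToyRenderer row = table.addChild();
            for (int c = 0; c < cellsPerRow; c++) {
                last = row.addChild();
            }
        }
        return last;
    }

    /** Counts every node that stays reachable (and so cannot be collected) while start is referenced. */
    static int reachableFrom(ToyRenderer start) {
        Set<ToyRenderer> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ToyRenderer> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            ToyRenderer r = queue.poll();
            if (!seen.add(r)) {
                continue; // already visited
            }
            if (r.parent != null) {
                queue.add(r.parent);
            }
            queue.addAll(r.children);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        ToyRenderer cell = lastCellOfTable(100, 5);
        System.out.println(reachableFrom(cell)); // 601: 1 table + 100 rows + 500 cells
        cell.parent = null;                      // the cleanup step the article says was missing
        System.out.println(reachableFrom(cell)); // 1: only the cell itself
    }
}
```

In this model, holding a single finished cell keeps all 601 nodes alive until its parent link is cleared, which mirrors the pattern described above at a much smaller scale.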

Attempted Workarounds

Before the fix in version 7.1.15, developers attempted several approaches to work around the memory leak.

Increasing JVM Heap Size

Approach: Allocate more memory to the JVM using the -Xmx parameter.

java -Xmx8g -jar myapp.jar

Limitations: This approach delays rather than solves the problem. Developers reported needing 8GB or more of heap space for moderately sized documents. Additionally, the leaked memory is not released after conversion completes, so subsequent conversions continue to accumulate memory pressure until the application restarts.

Chunking Large Documents

Approach: Split large HTML documents into smaller sections, convert each separately, then merge the resulting PDFs.

// Pseudocode - not recommended for production.
// splitHtmlBySection, convertChunkToPdf, and mergeDocuments are hypothetical helpers.
List<String> htmlChunks = splitHtmlBySection(largeHtml);
List<PdfDocument> partialPdfs = new ArrayList<>();
for (String chunk : htmlChunks) {
    partialPdfs.add(convertChunkToPdf(chunk));
}
PdfDocument merged = mergeDocuments(partialPdfs);

Limitations: This adds significant complexity and may not be feasible for documents where content flow must remain continuous. Table rows cannot be split mid-table without breaking the layout. The merging step itself has memory overhead, potentially negating the benefits.
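
As a rough idea of what the splitting step might look like, the sketch below cuts a body fragment at an explicit marker comment. Both the marker text and the HtmlChunker class are assumptions made for this example: it presumes your HTML generator can emit the marker at safe boundaries (for instance between tables), and it does not carry over head content such as stylesheets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Naive splitter (hypothetical): cuts a body fragment at an agreed marker comment. */
final class HtmlChunker {
    /** Assumed marker that the HTML generator writes at safe split points. */
    static final String MARKER = "<!-- chunk -->";

    /** Returns one standalone HTML document per non-empty chunk. */
    static List<String> splitHtmlBySection(String bodyHtml) {
        List<String> chunks = new ArrayList<>();
        for (String part : bodyHtml.split(Pattern.quote(MARKER))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                chunks.add("<html><body>" + trimmed + "</body></html>");
            }
        }
        return chunks;
    }
}
```

Splitting on a marker rather than parsing tags keeps the sketch simple, but it also shows the limitation described above: a table that straddles a marker would be broken in two.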

Using MemoryLimitsAwareHandler

Approach: Configure iText to throw an exception when a decompressed PDF stream exceeds a size threshold.

MemoryLimitsAwareHandler handler = new MemoryLimitsAwareHandler();
handler.setMaxSizeOfSingleDecompressedPdfStream(1024 * 1024 * 50); // 50MB
PdfReader reader = new PdfReader(inputStream,
    new ReaderProperties().setMemoryLimitsAwareHandler(handler));

Limitations: This does not fix the memory leak. The handler guards PdfReader against oversized streams when reading existing PDFs, so at best it causes processing to fail earlier rather than consuming all available memory. For legitimate large documents, this results in conversion failures rather than completed output.

Upgrading to iText 7.1.15

Approach: Update to the fixed version where iText addressed the parent link cleanup issue.

Limitations: While this is the recommended solution from iText, it presents practical challenges:

  • The AGPL license requires releasing your application's source code under the AGPL unless you purchase a commercial license
  • Commercial licenses may need to be repurchased for new versions
  • Enterprise change management processes may restrict version upgrades
  • API changes between versions may require code modifications

A Different Approach: IronPDF

For developers who cannot upgrade to iText 7.1.15 or who prefer a different architectural approach to HTML-to-PDF conversion, IronPDF offers an alternative that sidesteps the memory leak issue entirely.

IronPDF uses an embedded Chromium rendering engine rather than a custom layout engine. This means HTML is rendered using the same engine that powers Google Chrome, with memory management handled by Chromium itself rather than by a renderer tree living on the application's heap.

Why IronPDF Avoids This Issue

The architectural difference is fundamental. iText builds a custom representation of the document using Java objects, requiring careful management of object lifecycles. IronPDF delegates rendering to Chromium, which:

  1. Has been optimized for rendering complex web pages since Chrome's release in 2008
  2. Uses a separate process for rendering, isolating memory management
  3. Releases memory predictably when the rendering process completes
  4. Handles deeply nested DOM structures as a core design requirement

This means the memory consumption profile is predictable and does not grow non-linearly with document nesting depth.
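
The general principle behind points 2 and 3, that work done in a separate process returns all of its memory to the operating system when the process exits, can be sketched in plain Java. This is not how IronPDF is implemented; IsolatedConversion and the com.example.ConvertMain class name used in the usage note are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Generic process-isolation sketch (hypothetical names): runs a task in a child JVM with its own heap limit. */
final class IsolatedConversion {

    /** Builds the command line for a child JVM that reuses the parent's classpath. */
    static List<String> buildCommand(int maxHeapMb, String mainClass, String... args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Starts the child, waits for it, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller might use run(buildCommand(512, "com.example.ConvertMain", "in.html", "out.pdf")) and treat a non-zero exit code as a failed conversion. The cost is process startup time on every call, which is the usual trade-off for this kind of isolation.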

Code Example

The following C# example demonstrates converting an HTML document with deeply nested elements using IronPDF:

using IronPdf;

public class HtmlToPdfConverter
{
    public void ConvertLargeHtmlDocument(string htmlContent, string outputPath)
    {
        // ChromePdfRenderer uses embedded Chromium for rendering
        // Memory is managed by the Chrome process, not the .NET heap
        var renderer = new ChromePdfRenderer();

        // Configure rendering options for large documents
        renderer.RenderingOptions.Timeout = 120; // seconds
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.WaitFor.RenderDelay(500);

        // Convert HTML to PDF; PdfDocument is IDisposable, so "using" releases it at scope exit
        using var pdf = renderer.RenderHtmlAsPdf(htmlContent);

        pdf.SaveAs(outputPath);
    }

    public void ConvertNestedTableDocument()
    {
        // Example: Generate a large table that would cause memory issues in iText
        var htmlBuilder = new System.Text.StringBuilder();
        htmlBuilder.Append("<html><body><table>");

        // Create 500 rows with 10 cells each: 5,000 cells, each wrapping a div and a p
        for (int row = 0; row < 500; row++)
        {
            htmlBuilder.Append("<tr>");
            for (int col = 0; col < 10; col++)
            {
                htmlBuilder.AppendFormat(
                    "<td><div><p>Row {0}, Cell {1}</p></div></td>",
                    row, col);
            }
            htmlBuilder.Append("</tr>");
        }

        htmlBuilder.Append("</table></body></html>");

        var renderer = new ChromePdfRenderer();
        using var pdf = renderer.RenderHtmlAsPdf(htmlBuilder.ToString());

        // The PdfDocument is disposed when this method returns
        pdf.SaveAs("large-table-output.pdf");
    }
}

Key points about this code:

  • The ChromePdfRenderer class handles all memory management internally
  • Chromium's layout engine is separate from iText's renderer tree, so the parent-link retention described above does not apply
  • Memory usage is expected to scale roughly linearly with document size rather than non-linearly with nesting depth
  • The rendering process releases memory when the operation completes

For Java developers, IronPDF also provides a Java API with the same Chrome-based architecture:

import com.ironsoftware.ironpdf.*;
import java.io.IOException;
import java.nio.file.Paths;

public class HtmlToPdfConverter {
    public static void main(String[] args) throws IOException {
        // Apply license key if using licensed version
        License.setLicenseKey("YOUR-LICENSE-KEY");

        // Render HTML to PDF using IronPDF's embedded Chromium engine
        PdfDocument pdf = PdfDocument.renderHtmlAsPdf(
            "<html><body><table>...</table></body></html>"
        );

        // Save output
        pdf.saveAs(Paths.get("output.pdf"));
    }
}

Migration Considerations

When evaluating a switch from iText to IronPDF, consider the following factors.

Licensing

iText 7 uses dual licensing (AGPL or commercial). IronPDF is commercial software with perpetual licensing options. A free trial is available for evaluation. For teams already paying for iText commercial licenses, the cost comparison should include upgrade fees for new iText versions.

API Differences

IronPDF's API is designed around the concept of rendering rather than document construction:

| iText 7                      | IronPDF                             |
|------------------------------|-------------------------------------|
| HtmlConverter.convertToPdf() | ChromePdfRenderer.RenderHtmlAsPdf() |
| Custom font providers        | CSS @font-face support              |
| Layout-based positioning     | CSS/HTML positioning                |

Migration effort depends on how extensively the application uses iText-specific features beyond HTML-to-PDF conversion.

What You Gain

  • Predictable memory consumption for nested documents
  • Full CSS3 and JavaScript support via Chrome
  • No dependency on custom layout engine
  • Regular updates matching Chrome rendering improvements

What to Consider

  • IronPDF requires a commercial license for production use
  • The Chrome rendering engine adds ~50MB to the deployment size
  • Rendering style may differ slightly from iText for edge cases in CSS interpretation

Conclusion

The iText pdfHTML memory leak when processing nested HTML elements has been a documented issue affecting versions 7.1.0 through 7.1.14. While iText addressed the issue in version 7.1.15, many deployments remain on affected versions due to licensing and upgrade constraints. For applications requiring reliable HTML-to-PDF conversion without memory management concerns, IronPDF's Chrome-based rendering architecture provides a fundamentally different approach that avoids the nested element memory leak by design.


Jacob Mellor is CTO at Iron Software, where he leads technical development and originally built IronPDF.


References

  1. pdfHTML: Processing of extremely long elements - Official iText documentation of the memory issue
  2. Release iText Core 7.1.15 - Release notes documenting the fix
  3. iText 7 Suite 7.1.15 Blog Post - Official announcement of the memory optimization
  4. Release iText 7.1.5 - Earlier partial fix release notes
  5. OutOfMemoryError during pdf report generation - Coderanch - Community discussion of the issue
  6. CERMINE GitHub Issue #58 - GitHub issue documenting 8GB memory requirements
  7. High memory allocation with PdfPTable - Historical mailing list discussion

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
