When converting HTML documents to PDF using iText 7's pdfHTML module, developers frequently encounter severe memory consumption issues. A document that should require 100MB of memory instead consumes 600-700MB, and in many cases triggers OutOfMemoryError exceptions that crash production applications. This article examines the root cause of these memory leaks, documents the community experience, and presents an alternative approach using IronPDF's Chrome-based rendering engine.
The Problem
In versions prior to 7.1.15, iText 7's pdfHTML module suffers from a memory leak when processing HTML documents that contain deeply nested elements. The issue manifests most severely with:
- Paragraphs containing many nested subelements
- Tables with hundreds of cells spanning multiple pages
- Deeply nested div structures
- Long block elements containing hundreds of child elements
According to iText's own documentation, the root cause lies in the layout engine's handling of parent references:
"When processing paragraphs with a lot of nested subelements, iText would act suboptimally by not cleaning up certain parent links after laying out intermediate renderers."
The consequence is straightforward but severe: memory references that should be released are retained, causing memory consumption to grow with each nested element processed. For documents that span dozens of pages with complex nested structures, the memory footprint can become unmanageable.
Frequently Searched Error Messages
Developers experiencing this issue typically encounter one of several OutOfMemoryError variants. Stack traces like the following appear frequently in support forums and issue trackers:
```
java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.kernel.pdf.PdfReader.readStreamBytesRaw
    at com.itextpdf.kernel.pdf.PdfReader.readStreamBytes
    at com.itextpdf.kernel.pdf.PdfStream.getBytes

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at com.itextpdf.layout.renderer.BlockRenderer.layout

java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.html2pdf.attach.impl.layout.Html2PdfProperty
```
For .NET developers using iText 7:
```
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
    at iText.Layout.Renderer.TableRenderer.Layout()
    at iText.Html2pdf.Attach.Impl.Tags.TableTagWorker.ProcessEnd()
```
Symptoms Beyond Exceptions
Even when the conversion completes without throwing an exception, applications exhibit problematic behavior:
- Memory usage spikes to 600-700MB for documents that should require ~100MB
- Memory is not properly released after the conversion completes
- Garbage collection cycles become longer and more frequent
- In web server environments, subsequent requests experience degraded performance
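One way to confirm the "memory not released" symptom is to compare used heap before and after a conversion. The sketch below uses only the JDK's `java.lang.management` API; `HeapProbe` and `retainedAfter` are hypothetical names, and because `System.gc()` is only a hint to the JVM, the numbers are a rough signal rather than a precise measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Requests a GC (only a hint to the JVM) and returns the heap still in use, in bytes.
    static long usedHeapAfterGc() {
        System.gc();
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    // Runs the task and returns how much more heap is in use afterwards.
    // The result is noisy and can even be negative; look at the trend across runs.
    static long retainedAfter(Runnable task) {
        long before = usedHeapAfterGc();
        task.run();
        return usedHeapAfterGc() - before;
    }

    public static void main(String[] args) {
        // Placeholder task; replace it with one HTML-to-PDF conversion.
        long delta = retainedAfter(() -> new StringBuilder("<p>test</p>").toString());
        System.out.printf("Retained after task: %d KiB%n", delta / 1024);
    }
}
```

In practice the task would wrap a single conversion (for example a call to iText's `HtmlConverter.convertToPdf`), repeated several times; a value that keeps climbing from one run to the next suggests objects are being retained.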
Who Is Affected
This memory leak affects any application using iText 7's pdfHTML module for HTML-to-PDF conversion, particularly in the following scenarios:
Operating Systems: The issue is platform-independent, affecting Windows, Linux, and macOS deployments equally, since it stems from object references retained by iText's layout engine rather than from the JVM or any platform-specific native code.
Framework Versions: The bug was present in iText 7 versions 7.1.0 through 7.1.14, with particularly severe manifestations in:
- iText 7.1.0 through 7.1.4 (initial implementations)
- iText 7.1.5 through 7.1.8 (partial fixes that did not fully resolve the issue)
- iText 7.1.9 through 7.1.14 (incremental improvements but still problematic for deeply nested content)
While a comprehensive fix was released in version 7.1.15 (April 2021), many enterprise deployments remain on older versions due to licensing considerations and upgrade restrictions.
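Because version strings compare incorrectly as plain text ("7.1.9" sorts after "7.1.15"), a numeric comparison is safer when auditing which deployments fall in the affected range. The helper below is a hypothetical sketch that flags the 7.1.0 through 7.1.14 range named above; how you obtain the version string (for example from your build's dependency list) is left as an assumption.

```java
import java.util.Arrays;

final class ITextVersionCheck {
    // Compares dotted numeric versions component by component (numeric, not lexicographic).
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    // Drops qualifiers such as "-SNAPSHOT", then splits "7.1.14" into {7, 1, 14}.
    private static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        return Arrays.stream(numeric.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // True for the range described as affected: 7.1.0 up to, but not including, 7.1.15.
    static boolean isAffected(String version) {
        return compare(version, "7.1.0") >= 0 && compare(version, "7.1.15") < 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"7.0.8", "7.1.9", "7.1.14", "7.1.15"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```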
Java Heap Configuration: Affected developers commonly need to increase heap allocation significantly:
```bash
# Initial heap 512MB, maximum 4GB, for moderately complex documents
java -Xms512m -Xmx4g -jar myapp.jar

# For complex documents with deep nesting
java -Xms1g -Xmx8g -jar myapp.jar

# Docker container configuration (only takes effect if the image's entrypoint passes $JAVA_OPTS to java)
docker run -e JAVA_OPTS="-Xmx8g" myimage
```
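When heap flags are passed through environment variables, it is worth confirming they actually reached the JVM. A minimal check, assuming nothing beyond the JDK, is to print `Runtime.maxMemory()` at startup; `HeapCheck` is a hypothetical class name. Depending on the garbage collector, the reported value can be somewhat lower than the `-Xmx` setting.

```java
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```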
Use Cases: The memory leak primarily affects:
- Report generation systems producing multi-page documents
- Invoice or statement generation with repeating table rows
- Document assembly pipelines processing batches of HTML
- Web applications generating PDFs from user-submitted content
Scale: The problem grows disproportionately with document complexity. A simple single-page document may show no symptoms, while a 50-page document with nested tables can consume gigabytes of heap space.
Evidence from the Developer Community
The iText pdfHTML memory leak has been documented across multiple platforms over several years. The following timeline tracks the issue from initial reports through the eventual fix.
Timeline
| Date | Event | Source |
|---|---|---|
| 2015 | Early reports of slow multi-page generation | iText Mailing List |
| 2018-01 | Bug formally identified in iText 7 layout module | iText KB |
| 2019-02 | Version 7.1.5 includes partial memory leak fix | iText Release Notes |
| 2019 | GitHub issues document 8GB heap requirements | GitHub (CERMINE) |
| 2021-04 | iText 7.1.15 releases with comprehensive fix | iText Blog |
Community Reports
The impact of this issue extends beyond iText's own forums. On GitHub, developers working with PDF processing documented the severity of the memory requirements:
"Fatal java.lang.OutOfMemoryError thrown while processing document"
— GitHub Issue #58, CERMINE Project
The same issue thread revealed the workaround that many developers were forced to adopt:
"able to finally process this file after bumping Xmx memory to 8GB"
— GitHub, 2019
On Coderanch, developers noted the discrepancy between HTML and PDF memory requirements for the same content:
"OutOfMemoryError (Java heap space) when trying to run a report as a PDF, while the same report runs fine when using HTML"
— Coderanch Forums
Historical archives from the iText mailing list reveal that memory issues with table processing have plagued the library for over a decade. One thread documented the scale of the problem:
"205 MB were used and about 575 thousand objects were temporarily created for a result that was just a 53kb PDF."
— iText Mailing List Archive
These reports span from 2008 to 2021, indicating a persistent architectural challenge rather than an isolated bug.
Root Cause Analysis
The iText pdfHTML memory leak stems from the library's custom layout engine design. Understanding this architecture helps explain why the issue was difficult to resolve.
When iText processes HTML for PDF conversion, it builds an internal representation of the document using renderer objects. Each HTML element (div, paragraph, table cell) creates a corresponding renderer. These renderers maintain references to their parent elements to support layout calculations like margin collapsing and relative positioning.
The problem occurs in the cleanup phase. After a renderer has been laid out and is no longer needed for active layout calculations, iText failed to properly clear the parent references. This meant that child renderers kept their parent renderers alive in memory, even after those parents should have been eligible for garbage collection.
For a simple document structure, this overhead is negligible. But for documents with deeply nested structures, the retained references compound:
- A table with 100 rows creates 100+ row renderers
- Each row with 5 cells creates 500+ cell renderers
- Each cell containing paragraphs with inline elements adds hundreds more
- All of these maintain references back up the tree
The result is that memory consumption scales non-linearly with document complexity. iText's measurements showed a 6-7x difference in memory usage before and after the fix for documents with hundreds of nested children.
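The effect of stale parent links can be illustrated with a deliberately simplified model. This is not iText's renderer code: `Renderer`, the "overflow copy" created on each layout pass, and `retainedSlots` are all hypothetical. Each pass creates an intermediate copy of the paragraph that lists the children still waiting to be placed; if placed children keep a link to that copy, every copy stays reachable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ParentLinkModel {
    // Simplified stand-in for a layout renderer (not an iText class).
    static final class Renderer {
        Renderer parent;
        final List<Renderer> children = new ArrayList<>();
    }

    // Places n children, perPass at a time, and returns how many child-list entries
    // remain reachable from the placed children (a proxy for retained memory).
    static long retainedSlots(int n, int perPass, boolean clearParentLinks) {
        List<Renderer> leaves = new ArrayList<>();
        for (int i = 0; i < n; i++) leaves.add(new Renderer());

        List<Renderer> placed = new ArrayList<>();
        for (int start = 0; start < n; start += perPass) {
            // Intermediate copy of the paragraph holding every child not yet placed.
            Renderer overflow = new Renderer();
            overflow.children.addAll(leaves.subList(start, n));
            int end = Math.min(start + perPass, n);
            for (Renderer leaf : leaves.subList(start, end)) {
                leaf.parent = overflow;                   // link needed during layout
                if (clearParentLinks) leaf.parent = null; // cleanup once the leaf is laid out
                placed.add(leaf);
            }
        }

        // Walk everything reachable from the placed children via parent/child links.
        Set<Renderer> seen = new HashSet<>();
        Deque<Renderer> stack = new ArrayDeque<>(placed);
        long slots = 0;
        while (!stack.isEmpty()) {
            Renderer r = stack.pop();
            if (!seen.add(r)) continue;
            slots += r.children.size();
            if (r.parent != null) stack.push(r.parent);
            for (Renderer c : r.children) stack.push(c);
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(retainedSlots(1000, 10, false)); // 50500
        System.out.println(retainedSlots(1000, 10, true));  // 0
    }
}
```

With 1,000 children placed 10 per pass, the stale links keep 50,500 child-list entries reachable; doubling to 2,000 children gives 201,000, roughly four times as many, which is the kind of non-linear growth described above. Clearing the link after layout leaves the intermediate copies unreachable, so the garbage collector can reclaim them.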
Attempted Workarounds
Before the fix in version 7.1.15, developers attempted several approaches to work around the memory leak.
Increasing JVM Heap Size
Approach: Allocate more memory to the JVM using the -Xmx parameter.
```bash
java -Xmx8g -jar myapp.jar
```
Limitations: This approach delays rather than solves the problem. Developers reported needing 8GB or more of heap space for moderately sized documents. Additionally, the leaked memory is not released after conversion completes, so subsequent conversions continue to accumulate memory pressure until the application restarts.
Chunking Large Documents
Approach: Split large HTML documents into smaller sections, convert each separately, then merge the resulting PDFs.
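```java
// Sketch, not production code. splitHtmlBySection is a placeholder for your own splitting logic.
// Uses HtmlConverter (com.itextpdf.html2pdf), PdfDocument/PdfReader/PdfWriter (com.itextpdf.kernel.pdf), PdfMerger (com.itextpdf.kernel.utils).
List<String> htmlChunks = splitHtmlBySection(largeHtml);
try (PdfDocument merged = new PdfDocument(new PdfWriter("merged.pdf"))) {
    PdfMerger merger = new PdfMerger(merged);
    for (String chunk : htmlChunks) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        HtmlConverter.convertToPdf(chunk, buffer);
        try (PdfDocument part = new PdfDocument(new PdfReader(new ByteArrayInputStream(buffer.toByteArray())))) {
            merger.merge(part, 1, part.getNumberOfPages());
        }
    }
}
```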
Limitations: This adds significant complexity and may not be feasible for documents where content flow must remain continuous. Table rows cannot be split mid-table without breaking the layout. The merging step itself has memory overhead, potentially negating the benefits.
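If chunking is unavoidable, one way to keep each chunk a valid table is to split on row boundaries and wrap every chunk in its own `<table>`. The sketch below is a hypothetical stand-in for `splitHtmlBySection`. It uses a regular expression and therefore assumes simple, well-formed rows with no nested tables; it keeps only the `<tr>` elements, so surrounding markup such as `<style>` blocks is dropped and column widths may still differ between chunks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableChunker {
    // Matches one <tr ...>...</tr> row, lazily, across line breaks.
    private static final Pattern ROW =
            Pattern.compile("<tr\\b.*?</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Splits the rows of a simple HTML table into standalone documents of at most rowsPerChunk rows.
    static List<String> splitRows(String html, int rowsPerChunk) {
        if (rowsPerChunk <= 0) throw new IllegalArgumentException("rowsPerChunk must be positive");
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) rows.add(m.group());

        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += rowsPerChunk) {
            StringBuilder sb = new StringBuilder("<html><body><table>");
            for (String row : rows.subList(i, Math.min(i + rowsPerChunk, rows.size()))) {
                sb.append(row);
            }
            chunks.add(sb.append("</table></body></html>").toString());
        }
        return chunks;
    }
}
```

Each returned string is a complete document, so it can be converted on its own and merged afterwards; the trade-off is that table headers and layout continuity across chunk boundaries are lost.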
Using MemoryLimitsAwareHandler
Approach: Configure iText to throw an exception when a decompressed PDF stream exceeds a size threshold.
```java
// Caps the size of any single decompressed stream that PdfReader will accept
MemoryLimitsAwareHandler handler = new MemoryLimitsAwareHandler();
handler.setMaxSizeOfSingleDecompressedPdfStream(1024 * 1024 * 50); // 50MB
PdfReader reader = new PdfReader(inputStream,
        new ReaderProperties().setMemoryLimitsAwareHandler(handler));
```
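Limitations: This does not fix the memory leak. The handler guards PdfReader against oversized decompressed streams when reading existing PDFs, so it does not limit the pdfHTML layout phase where the leak occurs. Where it does apply, it simply causes processing to fail earlier rather than consuming all available memory, and legitimate large documents fail instead of producing output.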
Upgrading to iText 7.1.15
Approach: Update to the fixed version where iText addressed the parent link cleanup issue.
Limitations: While this is the recommended solution from iText, it presents practical challenges:
- The AGPL license requires releasing the application's source code (including for software offered over a network) unless a commercial license is purchased
- Commercial licenses may need to be repurchased for new versions
- Enterprise change management processes may restrict version upgrades
- API changes between versions may require code modifications
A Different Approach: IronPDF
For developers who cannot upgrade to iText 7.1.15 or who prefer a different architectural approach to HTML-to-PDF conversion, IronPDF offers an alternative that sidesteps the memory leak issue entirely.
IronPDF uses an embedded Chromium rendering engine rather than a custom layout engine. This means HTML is rendered using the same engine that powers Google Chrome, with memory management handled by Chrome's mature garbage collection system.
Why IronPDF Avoids This Issue
The architectural difference is fundamental. iText builds a custom representation of the document using Java objects, requiring careful management of object lifecycles. IronPDF delegates rendering to Chromium, which:
- Has been optimized for rendering complex web pages for well over a decade
- Uses a separate process for rendering, isolating memory management
- Releases memory predictably when the rendering process completes
- Handles deeply nested DOM structures as a core design requirement
This means the memory consumption profile is predictable and does not grow non-linearly with document nesting depth.
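The benefit of process isolation is not specific to Chromium: any memory a process leaks is returned to the operating system when that process exits. As a hedged illustration of the idea on the Java side, the sketch below runs a conversion in a short-lived child JVM with its own heap cap. `IsolatedWorker` and the worker main class (`com.example.ConvertWorker` in the usage note) are hypothetical, and this is not how IronPDF itself is implemented.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IsolatedWorker {
    // Builds a command that starts workerMainClass in a fresh JVM with its own -Xmx limit.
    static List<String> buildCommand(String classpath, String workerMainClass, int maxHeapMb, String... args) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<String> cmd = new ArrayList<>(
                List.of(javaBin, "-Xmx" + maxHeapMb + "m", "-cp", classpath, workerMainClass));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Runs the worker and waits for it; its whole heap is released when the process exits.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, `IsolatedWorker.run(IsolatedWorker.buildCommand("app.jar", "com.example.ConvertWorker", 512, "in.html", "out.pdf"))` would start one worker per document, trading JVM start-up time for a memory ceiling that cannot accumulate across requests.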
Code Example
The following C# example demonstrates converting an HTML document with deeply nested elements using IronPDF:
```csharp
using System.Text;
using IronPdf;

public class HtmlToPdfConverter
{
    public void ConvertLargeHtmlDocument(string htmlContent, string outputPath)
    {
        // ChromePdfRenderer uses embedded Chromium for rendering
        // Layout memory is managed by the Chromium engine, not by .NET renderer objects
        var renderer = new ChromePdfRenderer();

        // Configure rendering options for large documents
        renderer.RenderingOptions.Timeout = 120; // seconds
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.WaitFor.RenderDelay(500); // milliseconds

        // Convert HTML to PDF - memory usage scales linearly with document size
        // `using` disposes the PdfDocument, releasing its memory, once the method exits
        using var pdf = renderer.RenderHtmlAsPdf(htmlContent);
        pdf.SaveAs(outputPath);
    }

    public void ConvertNestedTableDocument()
    {
        // Example: generate a large table of the kind that causes memory issues in affected iText versions
        var htmlBuilder = new StringBuilder();
        htmlBuilder.Append("<html><body><table>");

        // 500 rows x 10 cells = 5,000 cells, each wrapping a <div> and a <p>
        for (int row = 0; row < 500; row++)
        {
            htmlBuilder.Append("<tr>");
            for (int col = 0; col < 10; col++)
            {
                htmlBuilder.AppendFormat(
                    "<td><div><p>Row {0}, Cell {1}</p></div></td>",
                    row, col);
            }
            htmlBuilder.Append("</tr>");
        }
        htmlBuilder.Append("</table></body></html>");

        var renderer = new ChromePdfRenderer();

        // Conversion completes without the nested-element leak;
        // disposing the PdfDocument after SaveAs releases its memory
        using var pdf = renderer.RenderHtmlAsPdf(htmlBuilder.ToString());
        pdf.SaveAs("large-table-output.pdf");
    }
}
```
Key points about this code:
- The `ChromePdfRenderer` class handles rendering memory internally; disposing the returned `PdfDocument` releases the output
- Chromium's rendering engine processes nested elements without retaining parent references
- Memory usage scales linearly with document size rather than non-linearly with nesting depth
- The rendering process releases memory when the operation completes
For Java developers, IronPDF also provides a Java API with the same Chrome-based architecture:
```java
import com.ironsoftware.ironpdf.*;
import java.io.IOException;

public class HtmlToPdfConverter {
    public static void main(String[] args) throws IOException {
        // Apply license key if using licensed version
        License.setLicenseKey("YOUR-LICENSE-KEY");

        // Render HTML with the embedded Chromium engine
        PdfDocument pdf = PdfDocument.renderHtmlAsPdf(
            "<html><body><table>...</table></body></html>"
        );

        // Save output (saveAs can throw IOException)
        pdf.saveAs("output.pdf");
    }
}
```
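To reproduce the C# stress test from Java, for example to compare IronPDF's Java API against an affected iText version, the same table can be generated with the JDK alone. `NestedTableHtml` is a hypothetical helper name; the resulting string can be passed to `PdfDocument.renderHtmlAsPdf` or to iText's `HtmlConverter.convertToPdf`.

```java
final class NestedTableHtml {
    // Builds a rows x cols table in which every cell wraps a <div> and a <p>,
    // mirroring the markup of the C# ConvertNestedTableDocument example.
    static String build(int rows, int cols) {
        StringBuilder sb = new StringBuilder("<html><body><table>");
        for (int r = 0; r < rows; r++) {
            sb.append("<tr>");
            for (int c = 0; c < cols; c++) {
                sb.append(String.format("<td><div><p>Row %d, Cell %d</p></div></td>", r, c));
            }
            sb.append("</tr>");
        }
        return sb.append("</table></body></html>").toString();
    }

    public static void main(String[] args) {
        String html = build(500, 10);
        System.out.println("Generated " + html.length() + " characters of HTML");
    }
}
```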
API Reference
For detailed documentation on the methods used in these examples:
- ChromePdfRenderer - Main rendering class
- HTML to PDF Tutorial - Step-by-step guide
- HTML to PDF Examples - Additional code samples
Migration Considerations
When evaluating a switch from iText to IronPDF, consider the following factors.
Licensing
iText 7 uses dual licensing (AGPL or commercial). IronPDF is commercial software with perpetual licensing options. A free trial is available for evaluation. For teams already paying for iText commercial licenses, the cost comparison should include upgrade fees for new iText versions.
API Differences
IronPDF's API is designed around the concept of rendering rather than document construction:
| iText 7 | IronPDF |
|---|---|
| `HtmlConverter.convertToPdf()` | `ChromePdfRenderer.RenderHtmlAsPdf()` |
| Custom font providers | CSS @font-face support |
| Layout-based positioning | CSS/HTML positioning |
Migration effort depends on how extensively the application uses iText-specific features beyond HTML-to-PDF conversion.
What You Gain
- Predictable memory consumption for nested documents
- Full CSS3 and JavaScript support via Chrome
- No dependency on custom layout engine
- Regular updates matching Chrome rendering improvements
What to Consider
- IronPDF requires a commercial license for production use
- The Chrome rendering engine adds ~50MB to the deployment size
- Rendering style may differ slightly from iText for edge cases in CSS interpretation
Conclusion
The iText pdfHTML memory leak when processing nested HTML elements has been a documented issue affecting versions 7.1.0 through 7.1.14. While iText addressed the issue in version 7.1.15, many deployments remain on affected versions due to licensing and upgrade constraints. For applications requiring reliable HTML-to-PDF conversion without memory management concerns, IronPDF's Chrome-based rendering architecture provides a fundamentally different approach that avoids the nested element memory leak by design.
Jacob Mellor is CTO at Iron Software, where he leads technical development and originally built IronPDF.
References
- pdfHTML: Processing of extremely long elements - Official iText documentation of the memory issue
- Release iText Core 7.1.15 - Release notes documenting the fix
- iText 7 Suite 7.1.15 Blog Post - Official announcement of the memory optimization
- Release iText 7.1.5 - Earlier partial fix release notes
- OutOfMemoryError during pdf report generation - Coderanch - Community discussion of the issue
- CERMINE GitHub Issue #58 - GitHub issue documenting 8GB memory requirements
- High memory allocation with PdfPTable - Historical mailing list discussion
For the latest IronPDF documentation and tutorials, visit ironpdf.com.