IronSoftware

Posted on Apr 22 • Edited on Apr 24

Aspose HTML to PDF OutOfMemoryException: Why Conversion Uses 2GB RAM

#dotnet #csharp

Developers using Aspose.HTML or Aspose.PDF to convert HTML documents to PDF can hit out-of-memory exceptions even with moderately complex content. The library's memory consumption can spike to 2 GB or more during rendering, and the resulting failures tend to surface in production without warning. The issue has been documented on the Aspose forums since 2020 and continues to affect deployments. This article examines the root cause and presents an alternative with more predictable memory characteristics.

The Problem

When converting HTML to PDF using Aspose.HTML or Aspose.PDF's HTML conversion features, memory use can grow far beyond the size of the input and has no practical upper bound. Documents that render instantly in a browser can consume gigabytes of RAM during Aspose's conversion process.

The issue is particularly severe when:

HTML contains complex CSS layouts
Multiple images are embedded or referenced
Tables have many rows or columns
The document spans many pages
Multiple conversions run concurrently in a web application

Memory allocation grows rapidly during the conversion and may not be released promptly after completion, compounding the problem in production environments.

Error Messages and Symptoms

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at Aspose.Html.Converters.Converter.ConvertHTML(HTMLDocument document, PdfSaveOptions options, ICreateStreamProvider provider)

From developer reports:

ConvertHTML(htmlDocument, saveOptions, streamProvider); throws an out of memory exception because it uses 2GB of RAM

Symptoms include:

Application memory climbing to 2GB+ during conversion
Conversion succeeding on small documents but failing on larger ones
Cascading failures when multiple requests trigger OOM simultaneously
Container kills in memory-limited environments

Memory Profiler Analysis

When analyzing the issue with memory profiling tools like dotMemory or Visual Studio Diagnostic Tools, the following patterns emerge:

Snapshot during HTML-to-PDF conversion:
=====================================
Total Managed Heap: 1.8 GB
  - Large Object Heap: 1.2 GB
    - System.Byte[]: 890 MB (image buffers)
    - System.String: 180 MB (HTML content copies)
    - System.Char[]: 130 MB (CSS parsing)
  - Generation 2: 450 MB
    - Internal layout objects
    - Font cache entries
  - Generation 0/1: 150 MB
    - Temporary parsing objects

Native Memory (untracked by GC): ~400 MB
  - Image decoding buffers
  - Font rasterization cache

The memory profile reveals that image buffers and internal string copies account for the majority of allocations. These are not released during the conversion, and the Large Object Heap becomes fragmented.
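To check whether a deployment shows the same pattern, it can help to measure a single conversion in isolation before reaching for a full profiler. The following is a minimal sketch that uses only .NET base class library calls. `MemoryProbe`, `MemorySample`, and `Measure` are hypothetical names introduced here, and the delegate stands in for whatever conversion call the application makes. The working set figure includes native allocations, which the managed heap figure does not.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper types; the names are illustrative, not part of any library.
public sealed class MemorySample
{
    public MemorySample(long managedBefore, long managedAfter, long peakWorkingSet)
    {
        ManagedBytesBefore = managedBefore;
        ManagedBytesAfter = managedAfter;
        PeakWorkingSetBytes = peakWorkingSet;
    }

    public long ManagedBytesBefore { get; }
    public long ManagedBytesAfter { get; }
    public long PeakWorkingSetBytes { get; }
}

public static class MemoryProbe
{
    public static MemorySample Measure(Action conversion)
    {
        if (conversion == null) throw new ArgumentNullException(nameof(conversion));

        // Force a full collection first so the "before" figure is a stable baseline.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        conversion();

        // No forced collection here, so this figure includes objects that are
        // still referenced as well as garbage that has not been collected yet.
        long after = GC.GetTotalMemory(forceFullCollection: false);

        using (Process current = Process.GetCurrentProcess())
        {
            // Peak working set covers managed and native memory for the whole process.
            return new MemorySample(before, after, current.PeakWorkingSet64);
        }
    }
}
```

A call such as `MemoryProbe.Measure(() => ConvertOneDocument(html))`, where `ConvertOneDocument` is the application's own method, gives a rough per-conversion figure to compare against the container limit.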

Who Is Affected

This issue impacts production deployments using Aspose's HTML conversion:

Operating Systems: Windows and Linux, though memory limits are often stricter on containerized Linux deployments.

Affected Versions: Reports span from version 20.8 through current versions.

Use Cases: Report generation systems, invoice creation, document automation pipelines, any application converting user-provided or dynamically generated HTML.

Environments: Azure App Service, AWS ECS/Lambda, Kubernetes, Docker, and any environment with memory limits.

Evidence from the Developer Community

Timeline

Date	Event	Source
2020-03-18	Out of memory rendering HTML reported	Aspose Forums
2020-05-15	Issue escalated, marked under investigation	Aspose Forums
2020-07-28	Issue still unresolved, developer reports production impact	Aspose Forums

Community Reports

"ConvertHTML(htmlDocument, saveOptions, streamProvider); throws an out of memory exception because it uses 2gb of ram."
— Developer, Aspose Forums, March 2020

"As we had this problem in our production so for me is important in which time the problem can be resolved because it is a blocking error."
— Developer, Aspose Forums, May 2020

Official Response

The Aspose team acknowledged the issue:

"We regret to share that the issue is not yet resolved. However, it is under the phase of investigation and requires more time to get fixed. We have recorded your concerns and escalated the issue to next level."
— Aspose Support, July 2020

HTML Patterns That Trigger High Memory Usage

Certain HTML patterns cause disproportionate memory consumption in Aspose's converter:

Large Data Tables

Tables with hundreds of rows cause memory to scale non-linearly:

<!-- This pattern causes excessive memory allocation -->
<table>
  <thead><tr><th>Col1</th><th>Col2</th><th>Col3</th></tr></thead>
  <tbody>
    <!-- 500+ rows can consume well over 1 GB -->
    <tr><td>Data</td><td>Data</td><td>Data</td></tr>
    <!-- ... repeated hundreds of times ... -->
  </tbody>
</table>

Embedded Base64 Images

Inline images multiply memory usage, because the base64 text and the decoded pixel data can both be resident at the same time:

<!-- Each embedded image is decoded and held in memory -->
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..." />
<!-- Multiple embedded images compound the problem -->

Complex CSS Selectors

Deep selector chains increase style calculation memory:

/* Deep selectors increase memory during style resolution */
.container .wrapper .content .section .item .inner .text p span {
  color: #333;
}

Print Stylesheets with Media Queries

Complex @media print rules trigger additional layout calculations.

Memory Usage by HTML Pattern

HTML Pattern	Typical Memory Usage
Simple text (10 pages)	200-400 MB
Data table (100 rows)	400-600 MB
Data table (500 rows)	1.2-1.8 GB
10 embedded images (1MB each)	800 MB - 1.2 GB
Complex CSS with nested selectors	+200-400 MB overhead
Print media queries	+100-200 MB overhead

Root Cause Analysis

Aspose's HTML-to-PDF conversion does not use a browser rendering engine. Instead, it implements its own HTML parser and layout engine. This custom implementation has different memory characteristics than browser-based rendering:

Document Model Loading: The entire HTML document is parsed into memory before rendering begins
CSS Calculation: Style calculations are performed on the full document tree
Layout Computation: Layout passes may require multiple iterations for complex CSS
Image Processing: Images are decoded and held in memory during rendering
Font Loading: Font data is loaded for each font family used

These operations compound in ways that browser engines have optimized over decades but custom implementations have not. A document that Chrome renders in 50MB might consume 2GB in Aspose's converter.

The issue is architectural rather than a simple bug. Reducing memory consumption would require fundamental changes to how the converter processes documents.

Attempted Workarounds

Workaround 1: Increase Memory Limits

Approach: Configure the application or container with more available memory.

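<!-- App.config or Web.config (.NET Framework). Note: this setting permits arrays larger than 2 GB on 64-bit platforms; it does not raise the process or container memory limit. -->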
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

# Kubernetes
resources:
  limits:
    memory: 4Gi

Limitations:

Increases infrastructure costs
Does not solve the root cause
Memory usage is unbounded; larger documents still fail
May cause other applications on the same host to be memory-starved
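Because a larger limit only moves the failure point, one option is to check headroom before starting a conversion and reject or defer the request when there is not enough. The sketch below assumes .NET Core 3.0 or later, where `GC.GetGCMemoryInfo()` is available. `MemoryHeadroom` and the estimated peak figure are hypothetical; the peak would have to come from the application's own measurements.

```csharp
using System;

public static class MemoryHeadroom
{
    // Pure check, easy to unit test: is there room for one more conversion?
    public static bool HasHeadroom(long totalAvailableBytes, long usedBytes, long estimatedPeakBytes)
        => totalAvailableBytes - usedBytes >= estimatedPeakBytes;

    // Uses the runtime's view of memory. The values describe the state at the
    // most recent garbage collection, so they are approximate between collections.
    public static bool CanStartConversion(long estimatedPeakBytes)
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return HasHeadroom(info.TotalAvailableMemoryBytes, info.MemoryLoadBytes, estimatedPeakBytes);
    }
}
```

A failed check lets the application return a controlled error or queue the job, which is usually easier to handle than an OutOfMemoryException or a container kill partway through a conversion.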

Workaround 2: Split Large Documents

Approach: Break HTML into smaller chunks and convert separately.

// Convert in chunks, then merge the PDFs.
// SplitHtml, ConvertChunk, and MergePdfs are placeholders for application-specific helpers.
var chunks = new List<byte[]>();
foreach (string htmlChunk in SplitHtml(fullHtml))
{
    chunks.Add(ConvertChunk(htmlChunk));
}
byte[] merged = MergePdfs(chunks);

Limitations:

Complex implementation
Breaks page numbering, headers, footers
Tables and other elements cannot span chunks
Significant development effort
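For the common case of one large data table, the splitting step might look like the sketch below. It is an illustration only: `TableChunker` and `SplitRows` are hypothetical names, the rows are assumed to be pre-rendered `<tr>` strings, and the header row is repeated in every chunk. It does nothing about page numbering or merging, which are the harder parts listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TableChunker
{
    // Splits pre-rendered <tr> rows into standalone HTML documents
    // containing at most rowsPerChunk rows each.
    public static List<string> SplitRows(string headerRowHtml, IReadOnlyList<string> rowHtml, int rowsPerChunk)
    {
        if (rowHtml == null) throw new ArgumentNullException(nameof(rowHtml));
        if (rowsPerChunk <= 0) throw new ArgumentOutOfRangeException(nameof(rowsPerChunk));

        var chunks = new List<string>();
        for (int start = 0; start < rowHtml.Count; start += rowsPerChunk)
        {
            int end = Math.Min(start + rowsPerChunk, rowHtml.Count);
            var sb = new StringBuilder();

            // Repeat the header in every chunk so each PDF is readable on its own.
            sb.Append("<!DOCTYPE html><html><body><table><thead>")
              .Append(headerRowHtml)
              .Append("</thead><tbody>");

            for (int i = start; i < end; i++)
            {
                sb.Append(rowHtml[i]);
            }

            sb.Append("</tbody></table></body></html>");
            chunks.Add(sb.ToString());
        }
        return chunks;
    }
}
```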

Workaround 3: Queue with Limited Concurrency

Approach: Process conversions one at a time to prevent memory accumulation.

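// Requires: using System; using System.Threading; using System.Threading.Tasks;
private static readonly SemaphoreSlim _conversionSemaphore = new SemaphoreSlim(1, 1);

public async Task<byte[]> ConvertWithLimit(string html)
{
    await _conversionSemaphore.WaitAsync();
    try
    {
        return Convert(html); // Placeholder for the actual conversion call
    }
    finally
    {
        // Collect before releasing so the next conversion starts from a lower baseline
        GC.Collect();
        _conversionSemaphore.Release();
    }
}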

Limitations:

Reduces throughput significantly
Conversions queue up during peak load
Memory may still accumulate if GC doesn't release native resources
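A single slot is the most conservative setting. When the memory limit allows more, the slot count can be derived from a memory budget instead of being fixed at one. The sketch below is one way to do that. `BudgetedConverter` and `MaxConcurrency` are hypothetical names, the conversion itself is passed in as a delegate, and the peak-per-conversion figure is an assumption that has to come from measurement.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class BudgetedConverter
{
    private readonly SemaphoreSlim _slots;
    private readonly Func<string, byte[]> _convert;

    public BudgetedConverter(Func<string, byte[]> convert, int maxConcurrency)
    {
        _convert = convert ?? throw new ArgumentNullException(nameof(convert));
        _slots = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    // How many conversions fit in the memory left after the app's baseline usage.
    public static int MaxConcurrency(long limitBytes, long baselineBytes, long peakPerConversionBytes)
    {
        if (peakPerConversionBytes <= 0) throw new ArgumentOutOfRangeException(nameof(peakPerConversionBytes));

        long slots = (limitBytes - baselineBytes) / peakPerConversionBytes;

        // Always allow at least one conversion, and stay within int range.
        return (int)Math.Max(1, Math.Min(slots, int.MaxValue));
    }

    public async Task<byte[]> ConvertAsync(string html)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try
        {
            // Run the synchronous conversion on a thread-pool thread.
            return await Task.Run(() => _convert(html)).ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

With a 4 GiB limit, a 512 MiB baseline, and a measured peak of 1.8 GB per conversion, `MaxConcurrency` yields two slots. The throughput limitation above still applies; the budget only makes the trade-off explicit.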

A Different Approach: IronPDF

IronPDF uses an embedded Chromium browser engine with process isolation, providing predictable memory behavior that differs fundamentally from Aspose's architecture.

Why IronPDF Handles Memory Differently

IronPDF's rendering happens in a separate Chromium subprocess. This architecture provides several memory advantages:

Process Isolation: Chromium's memory is separate from the .NET application
OS Memory Management: The subprocess memory is managed by the operating system
Clean Termination: When rendering completes, subprocess memory is fully released
Battle-Tested Engine: Chromium's memory management has been optimized for years

The result is predictable memory consumption that scales with document complexity in a linear, manageable way.
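The general idea of process isolation can be illustrated with plain .NET, independent of any PDF library. The sketch below is not a description of IronPDF's internals. It assumes a hypothetical worker executable that takes an input HTML path and an output PDF path, and it uses `ProcessStartInfo.ArgumentList`, which is available on .NET Core 2.1 and later. `IsolatedWorker` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

public static class IsolatedWorker
{
    // Builds the start info for a hypothetical worker that reads HTML and writes a PDF.
    public static ProcessStartInfo BuildStartInfo(string workerPath, string inputHtmlPath, string outputPdfPath)
    {
        var info = new ProcessStartInfo(workerPath)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        info.ArgumentList.Add(inputHtmlPath);
        info.ArgumentList.Add(outputPdfPath);
        return info;
    }

    // Runs the worker and returns its exit code. Memory used by the worker
    // belongs to the child process and is returned to the OS when it exits.
    public static int Run(ProcessStartInfo info, TimeSpan timeout)
    {
        using (Process worker = Process.Start(info))
        {
            if (worker == null) throw new InvalidOperationException("Worker process did not start.");

            if (!worker.WaitForExit((int)timeout.TotalMilliseconds))
            {
                // A runaway conversion is stopped without taking the host process down.
                worker.Kill();
                throw new TimeoutException("Conversion worker timed out.");
            }
            return worker.ExitCode;
        }
    }
}
```

The design point is that a failure or memory spike in the child is contained: the host application sees a non-zero exit code or a timeout rather than its own OutOfMemoryException.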

Code Example

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using IronPdf;

public class HtmlConverter
{
    public byte[] ConvertHtmlToPdf(string html)
    {
        var renderer = new ChromePdfRenderer();

        // These options affect page layout, not memory consumption
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;

        // Render HTML - memory usage is predictable and bounded
        using var pdf = renderer.RenderHtmlAsPdf(html);

        return pdf.BinaryData;
    }

    public async Task<byte[]> ConvertComplexReportAsync(ReportData data)
    {
        var renderer = new ChromePdfRenderer();

        // Enable JavaScript for complex rendering
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.RenderDelay = 1000; // Wait for charts to render

        // Generate complex HTML with charts, tables, images
        string html = GenerateReportHtml(data);

        // Await the async overload so the method is genuinely asynchronous
        using var pdf = await renderer.RenderHtmlAsPdfAsync(html);

        return pdf.BinaryData;
    }

    private string GenerateReportHtml(ReportData data)
    {
        return $@"
<!DOCTYPE html>
<html>
<head>
    <style>
        body {{ font-family: Arial, sans-serif; }}
        table {{ width: 100%; border-collapse: collapse; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #4a90a4; color: white; }}
        tr:nth-child(even) {{ background-color: #f9f9f9; }}
    </style>
</head>
<body>
    <h1>{data.Title}</h1>
    <table>
        <thead>
            <tr>
                <th>Column 1</th>
                <th>Column 2</th>
                <th>Column 3</th>
            </tr>
        </thead>
        <tbody>
            {GenerateTableRows(data.Rows)}
        </tbody>
    </table>
</body>
</html>";
    }

    private string GenerateTableRows(IEnumerable<RowData> rows)
    {
        return string.Join("", rows.Select(r =>
            $"<tr><td>{r.Col1}</td><td>{r.Col2}</td><td>{r.Col3}</td></tr>"));
    }
}

Key points about this code:

Memory usage does not spike unpredictably
Large documents with many rows complete without OOM errors
The using statement ensures proper cleanup
No special configuration needed for memory management
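One hardening step is worth considering whichever converter is used: the example above interpolates data values directly into the markup, so a value containing `<` or `&` could break the table. A possible variant of `GenerateTableRows` that encodes each cell is sketched below, using `System.Net.WebUtility.HtmlEncode` from the base class library. `SafeRows` is a hypothetical name, and cells are passed as plain strings rather than the `RowData` type, which the original does not define.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class SafeRows
{
    // Encodes each cell so characters such as < and & in the data cannot alter the markup.
    public static string BuildRow(params string[] cells)
        => "<tr>"
           + string.Concat(cells.Select(c => "<td>" + WebUtility.HtmlEncode(c ?? string.Empty) + "</td>"))
           + "</tr>";

    // Concatenates the encoded rows in order.
    public static string BuildRows(IEnumerable<string[]> rows)
        => string.Concat(rows.Select(r => BuildRow(r)));
}
```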

API Reference

For more details on the methods used, see the IronPDF documentation for ChromePdfRenderer and RenderHtmlAsPdf.

Migration Considerations

Licensing

IronPDF is commercial software with perpetual licensing
Free trial available for evaluation
Licensing information is available at ironpdf.com

API Differences

Aspose: Converter.ConvertHTML() with HTMLDocument objects
IronPDF: ChromePdfRenderer.RenderHtmlAsPdf() with HTML strings
Migration involves replacing conversion calls, not changing HTML templates
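One way to keep the replacement of conversion calls contained is to put the converter behind a small interface, so call sites do not depend on either vendor's API. The sketch below is an assumption about how an application might structure this, not something either library provides. `IHtmlToPdfConverter`, `DelegateConverter`, and `InvoiceService` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical seam: call sites depend on this interface, not on a vendor API.
public interface IHtmlToPdfConverter
{
    byte[] Convert(string html);
}

// Wraps any vendor call behind the interface via a delegate.
public sealed class DelegateConverter : IHtmlToPdfConverter
{
    private readonly Func<string, byte[]> _render;

    public DelegateConverter(Func<string, byte[]> render)
        => _render = render ?? throw new ArgumentNullException(nameof(render));

    public byte[] Convert(string html) => _render(html);
}

// Example call site: it builds HTML and never references a PDF library directly.
public sealed class InvoiceService
{
    private readonly IHtmlToPdfConverter _converter;

    public InvoiceService(IHtmlToPdfConverter converter)
        => _converter = converter ?? throw new ArgumentNullException(nameof(converter));

    public byte[] CreateInvoice(string customer)
        => _converter.Convert("<h1>Invoice for " + WebUtility.HtmlEncode(customer) + "</h1>");
}
```

With this in place, switching libraries means changing the delegate passed at startup. For IronPDF that delegate could wrap the `RenderHtmlAsPdf` call and `BinaryData` property shown in the code example above, while the HTML templates and call sites stay unchanged.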

What You Gain

Predictable, bounded memory consumption
The same HTML templates render regardless of document size
No need to split documents or limit concurrency

What to Consider

Chromium binaries add to deployment size
Different licensing model
Slightly different API surface

Conclusion

Aspose's HTML-to-PDF conversion can exhaust memory on moderately complex documents due to its custom rendering implementation. For applications where memory predictability matters, especially containerized and serverless deployments, a Chromium-based converter offers stability that is hard to match with a custom HTML rendering engine.

Written by Jacob Mellor, CTO at Iron Software.

References

Aspose Forum Thread #210253 - Out of memory when rendering HTML

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
