IronSoftware

# Aspose HTML to PDF OutOfMemoryException: Why Conversion Uses 2GB RAM

Developers using Aspose.HTML or Aspose.PDF to convert HTML documents to PDF encounter out-of-memory exceptions even with moderately complex content. The library's memory consumption can spike to 2GB or more during rendering, causing production failures without warning. This issue has been documented since 2020 and continues to affect deployments. This article examines the root cause and presents an alternative with more predictable memory characteristics.

The Problem

When converting HTML to PDF using Aspose.HTML or Aspose.PDF's HTML conversion features, the library's memory allocation can grow far beyond what the size of the input suggests. Documents that render instantly in a browser can consume gigabytes of RAM during Aspose's conversion process.

The issue is particularly severe when:

  • HTML contains complex CSS layouts
  • Multiple images are embedded or referenced
  • Tables have many rows or columns
  • The document spans many pages
  • Multiple conversions run concurrently in a web application

Memory allocation grows rapidly during the conversion and may not be released promptly after completion, compounding the problem in production environments.

Error Messages and Symptoms

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at Aspose.Html.Converters.Converter.ConvertHTML(HTMLDocument document, PdfSaveOptions options, ICreateStreamProvider provider)

From developer reports:

ConvertHTML(htmlDocument, saveOptions, streamProvider); throws an out of memory exception
because it uses 2GB of RAM

Symptoms include:

  • Application memory climbing to 2GB+ during conversion
  • Conversion succeeding on small documents but failing on larger ones
  • Cascading failures when multiple requests trigger OOM simultaneously
  • Container kills in memory-limited environments

Memory Profiler Analysis

When analyzing the issue with memory profiling tools like dotMemory or Visual Studio Diagnostic Tools, the following patterns emerge:

Snapshot during HTML-to-PDF conversion:
=====================================
Total Managed Heap: 1.8 GB
  - Large Object Heap: 1.2 GB
    - System.Byte[]: 890 MB (image buffers)
    - System.String: 180 MB (HTML content copies)
    - System.Char[]: 130 MB (CSS parsing)
  - Generation 2: 450 MB
    - Internal layout objects
    - Font cache entries
  - Generation 0/1: 150 MB
    - Temporary parsing objects

Native Memory (untracked by GC): ~400 MB
  - Image decoding buffers
  - Font rasterization cache

The memory profile reveals that image buffers and internal string copies account for the majority of allocations. These are not released during the conversion, and the Large Object Heap becomes fragmented.
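When a full profiler is not available (for example inside a production container), a rough in-process measurement can still show whether a given conversion is the source of the growth. The following is a minimal sketch, not a replacement for dotMemory: `MemoryProbe` and `Measure` are hypothetical names, and the action passed in stands for whatever conversion call the application already makes. It reports the managed heap before and after the call, plus the process working set afterwards, which also covers native allocations.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // Runs the supplied action and returns three byte counts:
    // managed heap before, managed heap after, and the process
    // working set after the action (managed + native memory).
    public static (long ManagedBefore, long ManagedAfter, long WorkingSetAfter) Measure(Action conversion)
    {
        // Force a full collection first so the baseline excludes garbage
        // left over from earlier work.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        conversion();

        // No forced collection here: the goal is to see what the
        // conversion left behind on the heap.
        long after = GC.GetTotalMemory(forceFullCollection: false);

        using var process = Process.GetCurrentProcess();
        process.Refresh();
        return (before, after, process.WorkingSet64);
    }
}
```

These numbers are snapshots taken after the call returns, so they understate the peak reached during rendering. A large gap between the working set and the managed heap points at native buffers, which a forced garbage collection will not reclaim.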

Who Is Affected

This issue impacts production deployments using Aspose's HTML conversion:

Operating Systems: Windows and Linux, though memory limits are often stricter on containerized Linux deployments.

Affected Versions: Reports span from version 20.8 through current versions.

Use Cases: Report generation systems, invoice creation, document automation pipelines, any application converting user-provided or dynamically generated HTML.

Environments: Azure App Service, AWS ECS/Lambda, Kubernetes, Docker, and any environment with memory limits.

Evidence from the Developer Community

Timeline

| Date | Event | Source |
|------|-------|--------|
| 2020-03-18 | Out of memory rendering HTML reported | Aspose Forums |
| 2020-05-15 | Issue escalated, marked under investigation | Aspose Forums |
| 2020-07-28 | Issue still unresolved, developer reports production impact | Aspose Forums |

Community Reports

"ConvertHTML(htmlDocument, saveOptions, streamProvider); throws an out of memory exception because it uses 2gb of ram."
— Developer, Aspose Forums, March 2020

"As we had this problem in our production so for me is important in which time the problem can be resolved because it is a blocking error."
— Developer, Aspose Forums, May 2020

Official Response

The Aspose team acknowledged the issue:

"We regret to share that the issue is not yet resolved. However, it is under the phase of investigation and requires more time to get fixed. We have recorded your concerns and escalated the issue to next level."
— Aspose Support, July 2020

HTML Patterns That Trigger High Memory Usage

Certain HTML patterns cause disproportionate memory consumption in Aspose's converter:

Large Data Tables

Tables with hundreds of rows cause memory to scale non-linearly:

<!-- This pattern causes excessive memory allocation -->
<table>
  <thead><tr><th>Col1</th><th>Col2</th><th>Col3</th></tr></thead>
  <tbody>
    <!-- 500+ rows can push memory use toward 2 GB -->
    <tr><td>Data</td><td>Data</td><td>Data</td></tr>
    <!-- ... repeated hundreds of times ... -->
  </tbody>
</table>

Embedded Base64 Images

Inline images multiply memory usage:

<!-- Each embedded image is decoded and held in memory -->
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..." />
<!-- Multiple embedded images compound the problem -->
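Because inline images are easy to overlook in generated HTML, it can help to measure how much image data a document carries before handing it to the converter. The sketch below is an illustration only: `InlineImageEstimator` is a hypothetical helper, and it assumes the images use standard padded base64 data URIs. It sums the decoded byte count of every `data:image/...;base64,` payload.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineImageEstimator
{
    // Captures the base64 payload of image data URIs such as
    // src="data:image/png;base64,iVBORw0KGgo...".
    private static readonly Regex DataUri = new Regex(
        @"data:image/[a-zA-Z0-9.+-]+;base64,([A-Za-z0-9+/]+={0,2})",
        RegexOptions.Compiled);

    // Returns the total size in bytes of all inline images once
    // base64-decoded. This is the size of the encoded image files,
    // not of the bitmaps produced when those files are decoded.
    public static long EstimateDecodedBytes(string html)
    {
        long total = 0;
        foreach (Match match in DataUri.Matches(html))
        {
            string payload = match.Groups[1].Value;
            int padding = payload.EndsWith("==") ? 2 : payload.EndsWith("=") ? 1 : 0;
            // Every 4 base64 characters encode 3 bytes, minus padding.
            total += (long)payload.Length * 3 / 4 - padding;
        }
        return total;
    }
}
```

The result is a lower bound on what the converter will hold in memory, since a compressed PNG or JPEG expands considerably when decoded to pixels. It is still a cheap signal for routing unusually heavy documents to a queue or rejecting them early.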

Complex CSS Selectors

Deep selector chains increase style calculation memory:

/* Deep selectors increase memory during style resolution */
.container .wrapper .content .section .item .inner .text p span {
  color: #333;
}

Print Stylesheets with Media Queries

Complex @media print rules trigger additional layout calculations.

Memory Usage by HTML Pattern

| HTML Pattern | Typical Memory Usage |
|--------------|----------------------|
| Simple text (10 pages) | 200-400 MB |
| Data table (100 rows) | 400-600 MB |
| Data table (500 rows) | 1.2-1.8 GB |
| 10 embedded images (1 MB each) | 800 MB - 1.2 GB |
| Complex CSS with nested selectors | +200-400 MB overhead |
| Print media queries | +100-200 MB overhead |

Root Cause Analysis

Aspose's HTML-to-PDF conversion does not use a browser rendering engine. Instead, it implements its own HTML parser and layout engine. This custom implementation has different memory characteristics than browser-based rendering:

  1. Document Model Loading: The entire HTML document is parsed into memory before rendering begins
  2. CSS Calculation: Style calculations are performed on the full document tree
  3. Layout Computation: Layout passes may require multiple iterations for complex CSS
  4. Image Processing: Images are decoded and held in memory during rendering
  5. Font Loading: Font data is loaded for each font family used

These operations compound in ways that browser engines have optimized over decades but custom implementations have not. A document that Chrome renders in 50MB might consume 2GB in Aspose's converter.

The issue is architectural rather than a simple bug. Reducing memory consumption would require fundamental changes to how the converter processes documents.

Attempted Workarounds

Workaround 1: Increase Memory Limits

Approach: Configure the application or container with more available memory.

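<!-- App.config or Web.config -->
<!-- Note: gcAllowVeryLargeObjects permits arrays larger than 2 GB on 64-bit platforms; it does not raise the memory available to the process. -->
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

# Kubernetes: raise the container memory limit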
resources:
  limits:
    memory: 4Gi

Limitations:

  • Increases infrastructure costs
  • Does not solve the root cause
  • Memory usage is unbounded; larger documents still fail
  • May cause other applications on the same host to be memory-starved
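Whatever limit is configured, the application can ask the .NET runtime how much headroom it sees before starting a conversion. The following is a rough sketch under stated assumptions: it targets .NET Core 3.0 or later, `MemoryHeadroom` is a hypothetical helper, and the threshold a caller passes in is something each team would have to pick from its own measurements. The figures come from `GC.GetGCMemoryInfo()` and describe the state at the most recent garbage collection, so they are approximate.

```csharp
using System;

public static class MemoryHeadroom
{
    // Approximate bytes still free according to the runtime:
    // the memory the GC considers available to it (which respects a
    // container limit when one is detected) minus the memory load
    // observed at the last garbage collection.
    public static long AvailableBytes()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return info.TotalAvailableMemoryBytes - info.MemoryLoadBytes;
    }

    // True when at least requiredBytes of headroom appear to be free.
    public static bool HasHeadroom(long requiredBytes)
    {
        return AvailableBytes() >= requiredBytes;
    }
}
```

A caller might refuse or defer a conversion when `HasHeadroom` returns false, rather than letting the process run into an `OutOfMemoryException` or a container kill. This only reduces the chance of failure; it does not bound what a single conversion will allocate once it starts.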

Workaround 2: Split Large Documents

Approach: Break HTML into smaller chunks and convert separately.

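// Convert in chunks, then merge the resulting PDFs.
// SplitHtml, ConvertChunk, and MergePdfs are placeholders for
// application-specific helpers; they are not library calls.
// Requires: using System.Collections.Generic;
var chunks = new List<byte[]>();
foreach (var htmlChunk in SplitHtml(fullHtml))
{
    chunks.Add(ConvertChunk(htmlChunk));
}
byte[] merged = MergePdfs(chunks);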

Limitations:

  • Complex implementation
  • Breaks page numbering, headers, footers
  • Tables and other elements cannot span chunks
  • Significant development effort
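For the common case where the bulk of the document is one long table, the splitting step can be done on the row data before any HTML is assembled, which avoids parsing HTML after the fact. The sketch below shows one possible shape, assuming .NET 6 or later for `Enumerable.Chunk`. `TableChunker` and `BuildChunks` are hypothetical names, and the page skeleton is deliberately bare.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableChunker
{
    // Splits the rows into groups of at most rowsPerChunk and wraps each
    // group in its own complete HTML document, repeating the header row
    // so that every chunk can be converted independently.
    public static List<string> BuildChunks(
        string headerRowHtml, IEnumerable<string> rowHtml, int rowsPerChunk)
    {
        if (rowsPerChunk <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowsPerChunk));

        return rowHtml
            .Chunk(rowsPerChunk)
            .Select(rows =>
                "<!DOCTYPE html><html><body><table><thead>" + headerRowHtml +
                "</thead><tbody>" + string.Concat(rows) +
                "</tbody></table></body></html>")
            .ToList();
    }
}
```

Each returned string would then go through the conversion and merge steps separately. The limitations listed above still apply: page numbers restart per chunk unless they are fixed up during the merge, and any styling shared across chunks has to be repeated in each document.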

Workaround 3: Queue with Limited Concurrency

Approach: Process conversions one at a time to prevent memory accumulation.

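// Requires: using System; using System.Threading; using System.Threading.Tasks;

// One shared gate: only a single conversion runs at any time.
private static readonly SemaphoreSlim _conversionSemaphore = new SemaphoreSlim(1, 1);

public async Task<byte[]> ConvertWithLimit(string html)
{
    await _conversionSemaphore.WaitAsync();
    try
    {
        // Convert is the application's existing synchronous conversion method.
        return Convert(html);
    }
    finally
    {
        // Collect before releasing the gate so the next conversion starts
        // after cleanup. A forced GC frees managed memory only.
        GC.Collect();
        _conversionSemaphore.Release();
    }
}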

Limitations:

  • Reduces throughput significantly
  • Conversions queue up during peak load
  • Memory may still accumulate if GC doesn't release native resources
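The queueing problem can be softened by letting callers give up after a bounded wait instead of piling up indefinitely. The following is a sketch of that variation, not a drop-in fix: `ConversionGate` is a hypothetical class, the slot count and wait time are values the application would have to tune, and the `convert` delegate stands for the existing conversion call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConversionGate
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _maxWait;

    public ConversionGate(int maxConcurrent, TimeSpan maxWait)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
        _maxWait = maxWait;
    }

    // Runs convert once a slot is free. Throws TimeoutException when no
    // slot becomes available within maxWait, so callers can fail fast
    // (for example with an HTTP 503) instead of queueing without limit.
    public async Task<byte[]> RunAsync(
        Func<byte[]> convert, CancellationToken cancellationToken = default)
    {
        if (!await _slots.WaitAsync(_maxWait, cancellationToken))
            throw new TimeoutException("No conversion slot became available in time.");

        try
        {
            // Task.Run keeps the synchronous conversion off the caller's thread.
            return await Task.Run(convert, cancellationToken);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

A single shared instance, such as `new ConversionGate(2, TimeSpan.FromSeconds(30))`, would cap concurrent conversions at two. This trades the unbounded queue for explicit rejections under load; peak memory per conversion is unchanged.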

A Different Approach: IronPDF

IronPDF uses an embedded Chromium browser engine with process isolation, providing predictable memory behavior that differs fundamentally from Aspose's architecture.

Why IronPDF Handles Memory Differently

IronPDF's rendering happens in a separate Chromium subprocess. This architecture provides several memory advantages:

  1. Process Isolation: Chromium's memory is separate from the .NET application
  2. OS Memory Management: The subprocess memory is managed by the operating system
  3. Clean Termination: When rendering completes, subprocess memory is fully released
  4. Battle-Tested Engine: Chromium's memory management has been optimized for years

The result is predictable memory consumption that scales with document complexity in a linear, manageable way.

Code Example

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using IronPdf;

public class HtmlConverter
{
    public byte[] ConvertHtmlToPdf(string html)
    {
        var renderer = new ChromePdfRenderer();

        // These options affect rendering quality, not memory consumption
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;

        // Render HTML - memory usage is predictable and bounded
        using var pdf = renderer.RenderHtmlAsPdf(html);

        return pdf.BinaryData;
    }

    public async Task<byte[]> ConvertComplexReportAsync(ReportData data)
    {
        var renderer = new ChromePdfRenderer();

        // Enable JavaScript for complex rendering
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.RenderDelay = 1000; // Wait for charts to render

        // Generate complex HTML with charts, tables, images
        string html = GenerateReportHtml(data);

        // Await the async overload so this method is genuinely asynchronous
        using var pdf = await renderer.RenderHtmlAsPdfAsync(html);

        return pdf.BinaryData;
    }

    private string GenerateReportHtml(ReportData data)
    {
        return $@"
<!DOCTYPE html>
<html>
<head>
    <style>
        body {{ font-family: Arial, sans-serif; }}
        table {{ width: 100%; border-collapse: collapse; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #4a90a4; color: white; }}
        tr:nth-child(even) {{ background-color: #f9f9f9; }}
    </style>
</head>
<body>
    <h1>{data.Title}</h1>
    <table>
        <thead>
            <tr>
                <th>Column 1</th>
                <th>Column 2</th>
                <th>Column 3</th>
            </tr>
        </thead>
        <tbody>
            {GenerateTableRows(data.Rows)}
        </tbody>
    </table>
</body>
</html>";
    }

    private string GenerateTableRows(IEnumerable<RowData> rows)
    {
        return string.Join("", rows.Select(r =>
            $"<tr><td>{r.Col1}</td><td>{r.Col2}</td><td>{r.Col3}</td></tr>"));
    }
}

Key points about this code:

  • Memory usage does not spike unpredictably
  • Large documents with many rows complete without OOM errors
  • The using statement ensures proper cleanup
  • No special configuration needed for memory management

Migration Considerations

Licensing

  • IronPDF is commercial software with perpetual licensing
  • Free trial available for evaluation

API Differences

  • Aspose: Converter.ConvertHTML() with HTMLDocument objects
  • IronPDF: ChromePdfRenderer.RenderHtmlAsPdf() with HTML strings
  • Migration involves replacing conversion calls, not changing HTML templates
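One way to keep the replacement of conversion calls contained is to put a small interface between the application and whichever library does the rendering. The sketch below is only an illustration of that idea: `IHtmlToPdfConverter` and `DelegateHtmlToPdfConverter` are hypothetical names, and the block deliberately references neither library so that it compiles on its own.

```csharp
using System;

// Application code depends on this interface only, not on a vendor API.
public interface IHtmlToPdfConverter
{
    byte[] Convert(string html);
}

// Adapts any "HTML string in, PDF bytes out" function to the interface,
// so the underlying library can be swapped in one place.
public sealed class DelegateHtmlToPdfConverter : IHtmlToPdfConverter
{
    private readonly Func<string, byte[]> _convert;

    public DelegateHtmlToPdfConverter(Func<string, byte[]> convert)
    {
        _convert = convert ?? throw new ArgumentNullException(nameof(convert));
    }

    public byte[] Convert(string html)
    {
        if (html is null) throw new ArgumentNullException(nameof(html));
        return _convert(html);
    }
}
```

During a migration, the delegate passed to the constructor would wrap the existing Aspose call first and later the `RenderHtmlAsPdf` call shown in the code example above. Report generators and controllers that accept an `IHtmlToPdfConverter` would not need to change.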

What You Gain

  • Predictable, bounded memory consumption
  • The same HTML templates render without modification, regardless of document size
  • No need to split documents or limit concurrency

What to Consider

  • Chromium binaries add to deployment size
  • Different licensing model
  • Slightly different API surface

Conclusion

Aspose's HTML-to-PDF conversion can exhaust memory on moderately complex documents due to its custom rendering implementation. For applications where memory predictability is important, especially in containerized and serverless deployments, a Chromium-based converter provides stability that custom HTML parsers cannot match.


Written by Jacob Mellor, CTO at Iron Software.


References

  1. Aspose Forum Thread #210253 - Out of memory when rendering HTML

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
