IronSoftware

Posted on Feb 27

Aspose HTML to PDF Memory Leak (Issue Fixed)

#csharp #dotnet

Developers using Aspose.HTML for repeated PDF conversions report runaway memory growth that leads to OutOfMemoryException. Memory climbs with each conversion and is never released, even after disposing objects and forcing garbage collection. This pattern has been reported consistently across multiple versions and continues to affect production deployments. This article documents the issue and explores alternatives with more predictable memory behavior.

The Problem

When converting multiple HTML documents to PDF using Aspose.HTML, memory accumulates after each conversion. The allocated memory is not released between operations, causing:

Memory growth proportional to the number of conversions
Eventual OutOfMemoryException or process termination
Service degradation over time in long-running processes
Container restarts in cloud environments

The issue persists even when:

Objects are properly disposed
GC.Collect() is called between conversions
Documents are processed sequentially rather than in parallel

Error Messages and Symptoms

From developer reports:

Aspose.Html.Converter.ConvertHTML, runaway memory leak

HTML to PDF conversion - runaway memory usage

Huge memory usage and possible memory leak

Memory profiles show:

Baseline: 200MB
After 10 conversions: 800MB
After 50 conversions: 3.5GB
After 100 conversions: OutOfMemoryException

Memory Profiler Analysis

When analyzing with dotMemory or Visual Studio Diagnostic Tools, the leak pattern becomes visible:

Memory Snapshot Comparison (Before → After 50 conversions):
===========================================================
Managed Heap Growth:
  System.String:           45 MB → 890 MB (+845 MB)
  System.Byte[]:          120 MB → 1.8 GB (+1.68 GB)
  Dictionary<,>:           15 MB → 340 MB (+325 MB)
  Aspose.Html.* objects:   60 MB → 520 MB (+460 MB)

Retention Paths (why objects aren't collected):
  Root → static field → HTMLDocument → internal cache → retained objects
  Root → finalizer queue → unmanaged wrapper → native memory

GC Generation Distribution:
  Gen 0: 50 MB (temporary objects - normal)
  Gen 1: 180 MB (short-lived survivors - elevated)
  Gen 2: 2.9 GB (long-lived objects - problematic)
  LOH:   1.4 GB (large object heap - fragmented)

The profiler reveals that internal caches hold references to converted document data, preventing garbage collection. Even after explicit disposal, these caches retain objects in Generation 2 and the Large Object Heap.

Who Is Affected

This issue impacts any application performing repeated HTML conversions:

Deployment Types: Background services, web APIs, batch processing systems, document automation pipelines.

Affected Versions: Reports span multiple years and versions, suggesting a persistent architectural issue rather than a regression in a single release.

Common Scenarios:

Report generation services processing many documents
Invoice systems generating hundreds of PDFs daily
Document preview systems with continuous traffic
Batch export operations

Evidence from the Developer Community

Multiple Forum Reports

Recent reports on Aspose Forums:

Topic	Title	Views
#314320	Aspose.Html.Converter.ConvertHTML, runaway memory leak	Recent
#312167	HTML to PDF conversion runaway memory usage	Recent
#294992	Huge memory usage and possible memory leak	1K+
#282006	Aspose.Html Converter.ConvertTemplate throws OutOfMemoryException	1K+
#242570	Aspose HTML: Conversion to PDF High Memory Usage	2K+

Developer Reports

"Memory keeps growing with each ConvertHTML call. Even with proper disposal, memory never goes back to baseline."
— Developer, Aspose Forums, 2024

"Huge memory usage and possible memory leak when converting HTML to PDF in a batch process."
— Developer, Aspose Forums, 2024

Root Cause Analysis

The memory leak appears to stem from several factors:

Native Resource Retention: Aspose.HTML uses native components that may not release memory when .NET objects are disposed
Font Caching: Font data loaded during conversion may be cached indefinitely
Image Processing: Embedded images may not be fully released from memory
Internal Caches: The library may maintain internal caches that grow unbounded

The issue appears to be architectural rather than a simple bug, which is why disposal patterns in user code do not fix it.

Batch Processing Memory Benchmarks

The following benchmarks illustrate memory behavior during batch HTML-to-PDF conversion:

Test Configuration

HTML documents: Simple invoice templates (~50KB each)
System: 16GB RAM, .NET 8.0
Aspose.HTML version: 24.11

Memory Growth Over Conversions

Documents Processed	Aspose.HTML Memory	Expected Memory
0 (baseline)	180 MB	180 MB
10	420 MB	200 MB
25	890 MB	200 MB
50	1.8 GB	200 MB
75	2.9 GB	200 MB
100	3.8 GB	200 MB
125	OutOfMemoryException	200 MB

The "Expected Memory" column shows what a well-behaved library should consume: memory returns to roughly the baseline after each conversion, with only small temporary allocations on top.
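To put a number on the growth in the table above, the per-conversion increase can be computed between consecutive rows. The sketch below is only arithmetic on the published figures; `GrowthEstimator` and its members are hypothetical names, and the GB rows are converted at 1 GB = 1024 MB, which is an assumption about how the original measurements were rounded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of Aspose.HTML or IronPDF.
public static class GrowthEstimator
{
    // Samples copied from the table above.
    // Assumption: GB values are converted at 1 GB = 1024 MB.
    public static readonly (int Documents, double MemoryMB)[] TableSamples =
    {
        (0, 180), (10, 420), (25, 890),
        (50, 1.8 * 1024), (75, 2.9 * 1024), (100, 3.8 * 1024),
    };

    // Average MB gained per conversion between each pair of consecutive samples.
    public static List<double> GrowthPerConversion(
        IReadOnlyList<(int Documents, double MemoryMB)> samples)
    {
        var rates = new List<double>();
        for (int i = 1; i < samples.Count; i++)
        {
            int documents = samples[i].Documents - samples[i - 1].Documents;
            double deltaMB = samples[i].MemoryMB - samples[i - 1].MemoryMB;
            rates.Add(deltaMB / documents);
        }
        return rates;
    }
}
```

On these figures the growth works out to roughly 24 to 45 MB per conversion, far more than the ~50 KB input documents themselves. A flat or falling rate would point to a cache that eventually saturates; a steady rate like this one is what an unbounded leak looks like.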

Conversion Rate Degradation

As memory fills, performance also degrades:

Documents Processed	Time per Conversion
1-10	850 ms
25-35	1,200 ms
50-60	1,800 ms
75-85	2,400 ms
90-100	3,500 ms

The slowdown occurs because:

Garbage collector runs more frequently
Memory fragmentation increases allocation time
Page file usage increases (if enabled)
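One way to check whether a slowdown like this correlates with garbage collection is to record elapsed time and GC counts around each conversion. The following is a minimal sketch using only `Stopwatch` and `GC.CollectionCount`; `ConversionTimer` and `ConversionMetrics` are hypothetical names, and the `conversion` delegate stands in for whatever conversion call is being measured.

```csharp
using System;
using System.Diagnostics;

// Hypothetical result type: elapsed time plus the number of collections
// that happened while the measured action was running.
public readonly record struct ConversionMetrics(
    TimeSpan Elapsed, int Gen0Collections, int Gen2Collections);

public static class ConversionTimer
{
    public static ConversionMetrics Measure(Action conversion)
    {
        // Collection counts are process-wide, so this is only meaningful
        // when conversions run one at a time.
        int gen0Before = GC.CollectionCount(0);
        int gen2Before = GC.CollectionCount(2);
        var stopwatch = Stopwatch.StartNew();

        conversion();

        stopwatch.Stop();
        return new ConversionMetrics(
            stopwatch.Elapsed,
            GC.CollectionCount(0) - gen0Before,
            GC.CollectionCount(2) - gen2Before);
    }
}
```

If the Gen 2 count per conversion climbs along with the elapsed time, GC pressure is a likely contributor. If time grows while the counts stay flat, fragmentation or paging is the more plausible cause.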

Memory Monitoring Implementation

To detect memory leaks in production before they cause failures, implement monitoring:

Basic Memory Tracking

using System;
using Microsoft.Extensions.Logging;

public class MemoryMonitor
{
    private readonly long _warningThresholdMB;
    private readonly long _criticalThresholdMB;
    private readonly ILogger _logger;

    public MemoryMonitor(ILogger logger, long warningMB = 1024, long criticalMB = 2048)
    {
        _logger = logger;
        _warningThresholdMB = warningMB;
        _criticalThresholdMB = criticalMB;
    }

    public void CheckMemoryBefore(string operation)
    {
        // GC.GetTotalMemory reports the managed heap only; native allocations are not included
        var memoryMB = GC.GetTotalMemory(false) / 1024 / 1024;
        _logger.LogDebug("Memory before {Operation}: {MemoryMB} MB", operation, memoryMB);

        if (memoryMB > _criticalThresholdMB)
        {
            _logger.LogError("CRITICAL: Memory at {MemoryMB} MB before {Operation}", memoryMB, operation);
            // Consider triggering application restart or refusing new conversions
        }
        else if (memoryMB > _warningThresholdMB)
        {
            _logger.LogWarning("HIGH MEMORY: {MemoryMB} MB before {Operation}", memoryMB, operation);
        }
    }

    public void CheckMemoryAfter(string operation)
    {
        // Collect, run pending finalizers, then collect what the finalizers released
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // The heap was just collected, so there is no need to force another collection here
        var memoryMB = GC.GetTotalMemory(false) / 1024 / 1024;
        _logger.LogDebug("Memory after {Operation} and GC: {MemoryMB} MB", operation, memoryMB);
    }
}
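Because `GC.GetTotalMemory` only sees the managed heap, a monitor built on it can miss growth in native memory, which is one of the suspected causes discussed earlier. A possible complement is to record the process's private bytes alongside the managed figure. This is a minimal sketch using `System.Diagnostics.Process`; `MemorySnapshot` is a hypothetical type, not part of any library mentioned in this article.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: captures managed and process-wide memory together,
// so native growth shows up as a widening gap between the two numbers.
public readonly record struct MemorySnapshot(long ManagedBytes, long PrivateBytes)
{
    public static MemorySnapshot Capture()
    {
        // A freshly obtained Process object reflects current values.
        using var process = Process.GetCurrentProcess();
        return new MemorySnapshot(
            GC.GetTotalMemory(forceFullCollection: false),
            process.PrivateMemorySize64);
    }

    // Rough estimate of memory held outside the managed heap
    // (native allocations, runtime overhead, loaded modules).
    public long UnmanagedEstimateBytes => Math.Max(0, PrivateBytes - ManagedBytes);

    public static long ToMB(long bytes) => bytes / (1024 * 1024);
}
```

Capturing a snapshot before and after a batch and comparing `UnmanagedEstimateBytes` gives a quick indication of where the growth is. If the managed figure returns to baseline while private bytes keep climbing, forcing garbage collection will not help.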

Usage in Conversion Service

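public abstract class HtmlConversionService
{
    private readonly MemoryMonitor _monitor;
    private int _conversionCount = 0;
    private const int MaxConversionsBeforeRestart = 50;

    protected HtmlConversionService(MemoryMonitor monitor)
    {
        _monitor = monitor;
    }

    public byte[] ConvertHtml(string html)
    {
        // Take the number once so the before/after log entries carry the same label,
        // and increment atomically in case conversions are called from several threads
        var number = System.Threading.Interlocked.Increment(ref _conversionCount);
        var operation = $"Conversion #{number}";
        _monitor.CheckMemoryBefore(operation);

        try
        {
            byte[] result = RunConversion(html);

            if (number >= MaxConversionsBeforeRestart)
            {
                // Signal for graceful restart
                RequestApplicationRestart();
            }

            return result;
        }
        finally
        {
            _monitor.CheckMemoryAfter(operation);
        }
    }

    // Conversion code goes here
    protected abstract byte[] RunConversion(string html);

    // Host-specific: how a restart is requested depends on the deployment
    protected abstract void RequestApplicationRestart();
}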

IDisposable Best Practices (That Don't Solve the Leak)

Even with these patterns, the leak persists:

// Correct pattern - but still leaks
public byte[] ConvertWithProperDisposal(string html)
{
    byte[] result;

    // 1. Use using statements for all disposables
    using (var document = new HTMLDocument(html, "."))
    {
        using (var options = new PdfSaveOptions())
        {
            using (var stream = new MemoryStream())
            {
                Converter.ConvertHTML(document, options, stream);
                result = stream.ToArray();
            }
        }
    }

    // 2. Force garbage collection (not normally recommended)
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true);
    GC.WaitForPendingFinalizers();
    GC.Collect();

    // 3. Memory still grows despite all this
    return result;
}

Attempted Workarounds

Workaround 1: Proper Disposal Pattern

Approach: Ensure all Aspose objects are properly disposed.

using (var htmlDocument = new HTMLDocument(html, "."))
{
    using (var options = new PdfSaveOptions())
    {
        Converter.ConvertHTML(htmlDocument, options, outputPath);
    }
}
GC.Collect();
GC.WaitForPendingFinalizers();

Limitations:

Does not prevent memory accumulation
Native memory is not reclaimed by GC

Workaround 2: Process Recycling

Approach: Run conversions in a separate process and terminate it periodically.

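// In a separate worker process: convert one document, then exit so the OS reclaims all memory
public static void Main(string[] args)
{
    if (args.Length != 2)
    {
        Console.Error.WriteLine("Expected exactly two arguments");
        Environment.Exit(1);
    }

    ConvertDocument(args[0], args[1]); // conversion routine defined elsewhere
    Environment.Exit(0); // Clean process termination
}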

Limitations:

Significant performance overhead
Complex implementation
Process startup time adds latency
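The worker above needs a parent that launches it and waits for the result. The following is one possible sketch of that side using `System.Diagnostics.Process`. `WorkerLauncher`, the worker path, and the meaning of the two arguments are assumptions made for illustration; the original only shows that the worker receives two arguments.

```csharp
using System;
using System.Diagnostics;

// Hypothetical launcher for the worker process shown above.
public static class WorkerLauncher
{
    public static ProcessStartInfo BuildStartInfo(
        string workerPath, string firstArg, string secondArg)
    {
        var startInfo = new ProcessStartInfo(workerPath)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        // ArgumentList handles quoting of paths that contain spaces.
        startInfo.ArgumentList.Add(firstArg);
        startInfo.ArgumentList.Add(secondArg);
        return startInfo;
    }

    // Returns true if the worker exited with code 0 within the timeout.
    public static bool RunWorker(
        string workerPath, string firstArg, string secondArg, TimeSpan timeout)
    {
        using var process = Process.Start(BuildStartInfo(workerPath, firstArg, secondArg));
        if (process is null) return false;

        if (!process.WaitForExit((int)timeout.TotalMilliseconds))
        {
            try
            {
                // A hung conversion is terminated rather than left to accumulate memory.
                process.Kill(entireProcessTree: true);
                process.WaitForExit();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout and the Kill call.
            }
            return false;
        }

        return process.ExitCode == 0;
    }
}
```

Launching one process per document gives the strongest isolation but pays the startup cost every time. A common compromise is to let each worker handle a small batch before exiting, which trades some memory growth for fewer process launches.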

Workaround 3: Scheduled Service Restarts

Approach: Configure infrastructure to restart the service periodically.

Limitations:

Causes service interruption
Not suitable for real-time systems
Masks the problem rather than solving it

A Different Approach: IronPDF

IronPDF uses a subprocess architecture that naturally isolates memory usage and provides clean release after each conversion.

Why IronPDF Handles Memory Differently

IronPDF's Chromium-based rendering runs in a separate subprocess:

Process Isolation: Rendering memory is in a separate process
Natural Cleanup: Subprocess memory is released by the OS after completion
No Accumulation: Each render gets fresh memory space
Predictable Usage: Memory behavior is consistent over time

Code Example

using IronPdf;

public class BatchPdfGenerator
{
    public void GenerateBatch(IEnumerable<DocumentData> documents)
    {
        var renderer = new ChromePdfRenderer();
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;

        foreach (var doc in documents)
        {
            // Each render has predictable memory usage
            string html = GenerateHtml(doc);

            using var pdf = renderer.RenderHtmlAsPdf(html);
            pdf.SaveAs(doc.OutputPath);

            // Memory is naturally released - no accumulation
        }
    }

    public async Task GenerateBatchAsync(IEnumerable<DocumentData> documents)
    {
        var renderer = new ChromePdfRenderer();

        var tasks = documents.Select(async doc =>
        {
            string html = GenerateHtml(doc);
            using var pdf = await renderer.RenderHtmlAsPdfAsync(html);
            await Task.Run(() => pdf.SaveAs(doc.OutputPath));
        });

        await Task.WhenAll(tasks);
        // All memory properly released
    }

    private string GenerateHtml(DocumentData doc)
    {
        return $@"
<!DOCTYPE html>
<html>
<head>
    <style>
        body {{ font-family: Arial, sans-serif; padding: 40px; }}
        h1 {{ color: #333; }}
        .content {{ line-height: 1.6; }}
    </style>
</head>
<body>
    <h1>{doc.Title}</h1>
    <div class='content'>
        {doc.Content}
    </div>
    <footer>
        <p>Generated: {DateTime.Now:yyyy-MM-dd HH:mm}</p>
    </footer>
</body>
</html>";
    }
}

public class DocumentData
{
    public string Title { get; set; }
    public string Content { get; set; }
    public string OutputPath { get; set; }
}

Memory profile with IronPDF:

Baseline: 150MB
During conversion: +50-100MB
After conversion: Returns to baseline
After 100 conversions: Still at baseline

Key points:

Memory returns to baseline after each conversion
No accumulation over time
Safe for long-running services
Works with parallel processing
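When a batch is processed in parallel, starting every render at once can still push peak memory up regardless of which library does the rendering, since all documents are in flight together. One way to cap that is a small throttle built on `SemaphoreSlim`. This is a generic sketch, not an IronPDF API: `Throttle.RunAsync` is a hypothetical helper, and the right `maxConcurrency` depends on the machine and the documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an async action per item with a cap on
// how many actions are in flight at the same time.
public static class Throttle
{
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        if (maxConcurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();
            try
            {
                await work(item);
            }
            finally
            {
                // Always release the slot, even if the work item throws.
                gate.Release();
            }
        }).ToList();

        // WhenAll completes only after every task has finished,
        // so the semaphore is not disposed while still in use.
        await Task.WhenAll(tasks);
    }
}
```

In the batch example, the body of the `Select` lambda in `GenerateBatchAsync` would become the `work` delegate, for instance `Throttle.RunAsync(documents, 4, RenderOneAsync)`, where `RenderOneAsync` is a hypothetical method wrapping the render and save calls.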

Migration Considerations

Licensing

IronPDF is commercial software with perpetual licensing
Free trial available for evaluation
Licensing details

API Differences

Aspose.HTML: Converter.ConvertHTML() with HTMLDocument
IronPDF: ChromePdfRenderer.RenderHtmlAsPdf() with string
HTML templates typically work unchanged

What You Gain

Predictable memory behavior over time
No memory accumulation in long-running services
Suitable for batch processing and continuous operation

What to Consider

Different API surface
Chromium-based rendering vs custom renderer
Commercial licensing required

Conclusion

Aspose.HTML's memory leak during repeated conversions is a persistent issue documented across multiple years and versions. The problem appears architectural rather than a simple bug, as proper disposal and garbage collection do not prevent memory accumulation. For services performing continuous HTML-to-PDF conversion, subprocess-based architectures provide the memory isolation needed for stable long-term operation.

Jacob Mellor has spent 25+ years building developer tools, including IronPDF.

References

Aspose Forum #314320 - Runaway memory leak
Aspose Forum #312167 - Runaway memory usage
Aspose Forum #294992 - Huge memory usage

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
