Developers using PuppeteerSharp for HTML-to-PDF conversion in .NET applications often encounter memory leaks that accumulate over time, eventually causing out-of-memory crashes. The issue stems from the complexity of managing Chromium browser lifecycles, where improper disposal of browser instances and pages leads to orphaned Chrome processes consuming system resources. This article examines the root causes, documents common patterns that cause memory growth, and presents an alternative approach using a library that manages Chrome lifecycle internally.
The Problem
PuppeteerSharp is a .NET port of the popular Node.js Puppeteer library, providing programmatic control over headless Chromium browsers. While it offers powerful browser automation capabilities including PDF generation, it requires developers to manually manage browser and page lifecycles. This manual management introduces multiple opportunities for memory leaks.
Each Chromium browser instance spawned by PuppeteerSharp consumes 50-200MB or more depending on page complexity. Individual tabs and pages consume additional memory, and this memory is not automatically released when operations complete. Without explicit disposal calls in the correct order, Chrome processes remain alive indefinitely, accumulating memory until the application crashes or the system runs out of resources.
The problem is particularly severe in long-running processes like web servers, background services, and Docker containers where PDF generation occurs repeatedly. Memory growth is gradual and insidious - not a sudden spike that would be immediately obvious, but a slow accumulation that eventually triggers OOM (out-of-memory) errors after hours or days of operation.
Error Messages and Symptoms
Developers encountering PuppeteerSharp memory issues typically observe these patterns:
System.OutOfMemoryException: Out of memory.
   at PuppeteerSharp.Page.PdfDataAsync(PdfOptions options)

PuppeteerSharp.NavigationException: Timeout of 180000ms exceeded.
   at PuppeteerSharp.Page.PdfAsync(String file, PdfOptions options)

PuppeteerSharp.TargetClosedException: Protocol error (Target.activateTarget): Target closed. (Session closed. Most likely the page has been closed.)
Symptoms include:
- Memory usage climbing steadily with each PDF generation, never returning to baseline
- Dozens of orphaned Chrome processes visible in Task Manager or ps aux
- PDF generation operations timing out after working successfully for hours
- Docker containers being killed by the OOM killer
- DisposeAsync() calls hanging indefinitely
- Application becoming unresponsive after processing several hundred documents
- Kubernetes pods restarting due to memory limits
Who Is Affected
This issue impacts any .NET application using PuppeteerSharp for PDF generation at scale:
Operating Systems: Windows, Linux, and macOS deployments, with Docker containers being particularly susceptible due to constrained memory limits.
Framework Versions: .NET Core 3.1, .NET 5, .NET 6, .NET 7, and .NET 8. The issue is architectural rather than framework-specific.
Use Cases: Invoice generation services, report generation, HTML-to-PDF conversion APIs, document templating systems, certificate generation, and any high-volume PDF workflow.
Environments: Docker containers, Kubernetes clusters, AWS Lambda (though limited to 15 minutes), Azure Functions, and traditional server deployments. The problem is most visible in containerized environments where memory limits are enforced.
Evidence from the Developer Community
Memory management issues with Puppeteer and PuppeteerSharp have been documented extensively across GitHub issues, blog posts, and community discussions.
Timeline
| Date | Event | Source |
|---|---|---|
| 2019-03-01 | Managed memory leak in Connection.cs identified | GitHub Issue #640 |
| 2020-01-01 | Chrome memory leak pattern documented | GitHub Issue #5893 |
| 2020-05-01 | Docker memory increase issue reported | GitHub Issue #5645 |
| 2021-06-01 | IAsyncDisposable support discussion | GitHub Issue #1456 |
| 2021-10-01 | DisposeAsync hanging forever in Docker | GitHub Issue #1489 |
| 2022-07-01 | Docker container memory always increasing | GitHub Issue #8695 |
| 2023-01-01 | Browser requests leak memory | GitHub Issue #9283 |
| 2024-08-01 | PdfDataAsync timeout in Chromium v127+ | GitHub Issue #2718 |
| 2024-10-01 | Production memory leak journey documented | Medium article |
Community Reports
"This wasn't the classic scenario where memory spikes and then recovers. This was something more insidious - a gradual, implicit memory increase that accumulated over time, slowly and steadily killing the service."
— Developer, Medium, October 2024

"When running PuppeteerSharp with Docker, I'm finding quite a lot of zombie Chrome processes that never get killed. Even using tini as an entry point didn't resolve the issue. The logs showed that some DisposeAsync calls sometimes never complete."
— Developer, GitHub Issue #1489

"The callback needs to be removed from the _callbacks dictionary. This causes a managed leak of memory eventually resulting in OOM."
— Developer, GitHub Issue #640

"Having lack of RAM on server is a terrible thing because it activates operating system's out-of-memory killer who starts killing any processes randomly causing service downtime."
— DevForth Engineering Blog
Production teams have reported that a healthy deployment typically runs 2-3 Chrome processes, but when cleanup fails, dozens of orphaned Chrome instances accumulate. This simple count can reveal when disposal is failing before memory usage spikes catastrophically.
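This check can be automated with standard .NET APIs. The following is a minimal sketch; the process names and the threshold of five are illustrative assumptions, not an official metric:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Sketch: count live Chrome/Chromium processes as a cheap health signal
// that disposal may be failing before memory pressure becomes visible.
public static class ChromeProcessMonitor
{
    // Process names vary by platform and Chromium build; adjust as needed.
    private static readonly string[] Names = { "chrome", "chromium" };

    public static int CountChromeProcesses() =>
        Names.Sum(name => Process.GetProcessesByName(name).Length);

    public static void CheckHealth(int threshold = 5)
    {
        var count = CountChromeProcesses();
        if (count > threshold)
            Console.WriteLine(
                $"WARNING: {count} Chrome processes running - disposal may be failing.");
    }
}
```

Emitting this count to an existing metrics pipeline (rather than the console) makes the failure mode alertable rather than something discovered after an OOM kill.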
Root Cause Analysis
The memory leaks in PuppeteerSharp stem from several architectural factors:
1. Browser Lifecycle Complexity
PuppeteerSharp requires explicit management of multiple disposable resources:
- BrowserFetcher - downloads Chromium binaries
- Browser - the main Chromium process
- Page - individual tabs within the browser
- BrowserContext - incognito contexts for isolation
Each resource must be disposed in the correct order. Missing any disposal, or disposing in the wrong order, leaves resources orphaned.
2. Async Disposal Challenges
PuppeteerSharp implements IAsyncDisposable, but there are known issues where DisposeAsync() hangs indefinitely, particularly in Docker environments. The implementation routes DisposeAsync() to CloseAsync(), but the underlying task management has edge cases where completion is never signaled.
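One defensive pattern (not part of PuppeteerSharp's documented guidance) is to bound disposal with a timeout and fall back to killing the Chrome process tree directly. A sketch, assuming .NET 6+ for Task.WaitAsync and that the browser object exposes its underlying Process handle:

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch: prevent a hung DisposeAsync from stalling the caller forever.
public static class BrowserCleanup
{
    public static async Task DisposeWithTimeoutAsync(IBrowser browser, TimeSpan timeout)
    {
        try
        {
            // Convert the ValueTask to a Task so a timeout can be attached.
            await browser.DisposeAsync().AsTask().WaitAsync(timeout);
        }
        catch (TimeoutException)
        {
            // Disposal hung: kill the Chromium process tree directly so
            // no orphaned Chrome processes are left behind.
            browser.Process?.Kill(entireProcessTree: true);
        }
    }
}
```

This does not fix the underlying task-completion bug, but it converts an indefinite hang into a bounded cleanup path.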
3. Chrome Process Management
On Linux (especially in Docker), the process running as PID 1 is responsible for reaping orphaned child processes, and most applications do not perform this duty. Without proper process supervision (such as dumb-init or Docker's --init flag), Chrome child processes become zombies that are never reaped.
4. Callback Dictionary Leak
A documented bug in Connection.cs caused callbacks to accumulate in a dictionary without removal, leading to managed memory growth independent of the Chrome process issues.
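The general shape of this bug, and its fix, can be illustrated with a simplified request dispatcher. This is an illustrative sketch, not PuppeteerSharp's actual Connection.cs code:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Sketch: a pending-callback dictionary leaks if completed entries are
// read but never removed - each request adds an entry that lives forever.
public class MessageDispatcher
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<string>> _callbacks = new();
    private int _nextId;

    public Task<string> SendAsync(string message)
    {
        var id = Interlocked.Increment(ref _nextId);
        var tcs = new TaskCompletionSource<string>();
        _callbacks[id] = tcs;
        // ... transmit message with its id over the protocol connection ...
        return tcs.Task;
    }

    public void OnResponse(int id, string payload)
    {
        // The fix: TryRemove the entry instead of only looking it up,
        // so completed callbacks cannot accumulate indefinitely.
        if (_callbacks.TryRemove(id, out var tcs))
            tcs.TrySetResult(payload);
    }
}
```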
5. Tab Memory Growth
Even when reusing browser instances (a recommended optimization), individual tabs consume more memory over time and do not release it automatically. Eventually, tabs must be closed and recreated.
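A hedged sketch of page recycling under that constraint follows; the threshold of 50 renders is an arbitrary illustrative value, and a production version would also need synchronization for concurrent callers:

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch: reuse one Browser but recycle the Page every N renders,
// since tab memory grows over time even when the browser is shared.
public class PageRecycler
{
    private const int MaxRendersPerPage = 50;
    private readonly IBrowser _browser;
    private IPage _page;
    private int _renders;

    public PageRecycler(IBrowser browser) => _browser = browser;

    public async Task<byte[]> RenderAsync(string html)
    {
        if (_page == null || _renders >= MaxRendersPerPage)
        {
            if (_page != null)
            {
                await _page.CloseAsync(); // release the tab's memory
                _page.Dispose();
            }
            _page = await _browser.NewPageAsync();
            _renders = 0;
        }
        _renders++;
        await _page.SetContentAsync(html);
        return await _page.PdfDataAsync();
    }
}
```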
Attempted Workarounds
The PuppeteerSharp community has developed various approaches to mitigate memory issues.
Workaround 1: Explicit Try-Finally Disposal
Approach: Use try-finally blocks with explicit calls to both CloseAsync() and Dispose().
IPage page = null;
try
{
    page = await browser.NewPageAsync();
    await page.GoToAsync("https://example.com");
    var pdfBytes = await page.PdfDataAsync();
    // Process PDF...
}
finally
{
    if (page != null)
    {
        await page.CloseAsync();
        page.Dispose();
    }
}
Limitations:
- Still requires manual tracking of every resource
- Does not prevent the DisposeAsync hanging issue
- Developers must remember to implement this pattern everywhere
- Browser instance itself still needs separate management
Workaround 2: Use Synchronous Dispose Instead of DisposeAsync
Approach: Call Dispose() instead of DisposeAsync() to avoid the hanging issue.
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
// ... generate PDF
page.Dispose();    // synchronous Dispose avoids the DisposeAsync hang
browser.Dispose(); // instead of await browser.DisposeAsync()
Limitations:
- May not properly wait for Chrome shutdown
- Can leave orphaned processes in edge cases
- Not idiomatic for async .NET code
Workaround 3: Docker Process Supervision
Approach: Use Docker's --init flag or dumb-init to properly reap zombie processes.
FROM mcr.microsoft.com/dotnet/aspnet:8.0
# Install dumb-init for proper process supervision
RUN apt-get update && apt-get install -y dumb-init
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["dotnet", "YourApp.dll"]
Or with Docker run:
docker run --init your-image
Limitations:
- Only addresses zombie process cleanup, not memory leaks within Chrome
- Requires Docker 1.13.0 or later for --init
- Does not solve the DisposeAsync hanging issue
Workaround 4: Periodic Browser Restart
Approach: Track memory usage and restart the browser instance when it exceeds a threshold.
private IBrowser _browser;
private LaunchOptions _launchOptions;
private int _pdfCount = 0;
private const int MaxPdfsPerBrowser = 100;

public async Task<byte[]> GeneratePdf(string html)
{
    if (_pdfCount >= MaxPdfsPerBrowser)
    {
        await _browser.CloseAsync();
        _browser.Dispose();
        _browser = await Puppeteer.LaunchAsync(_launchOptions);
        _pdfCount = 0;
    }
    _pdfCount++;
    // Generate PDF...
}
Limitations:
- Adds latency when browser restarts (3+ seconds per restart)
- Complex to implement correctly with concurrent requests
- Arbitrary threshold may not match actual memory pressure
Workaround 5: Limit Concurrency
Approach: Restrict concurrent PDF generation to prevent memory spikes.
private static readonly SemaphoreSlim _semaphore =
    new(Math.Max(1, Environment.ProcessorCount - 1)); // guard against single-core hosts

public async Task<byte[]> GeneratePdfWithConcurrencyLimit(string html)
{
    await _semaphore.WaitAsync();
    try
    {
        // Generate PDF
    }
    finally
    {
        _semaphore.Release();
    }
}
Limitations:
- Does not prevent memory accumulation, only slows it
- Reduces throughput
- Does not address the root disposal issues
A Different Approach: IronPDF
For teams where managing the Chromium lifecycle consumes significant engineering effort, a library that handles the browser lifecycle internally eliminates this category of bugs entirely. IronPDF embeds a Chromium rendering engine but manages its lifecycle automatically, removing the burden of browser instance management from application code.
Why IronPDF Avoids This Issue
IronPDF's architecture differs from PuppeteerSharp in how it manages the Chrome rendering engine:
- Automatic lifecycle management: The Chrome engine is started, managed, and terminated internally without developer intervention
- No browser instance tracking: Developers do not need to track browser or page objects
- Memory efficiency: Built-in streaming support for large documents prevents memory spikes
- Proper cleanup: Resources are released when PdfDocument objects are disposed, using familiar .NET patterns
- No external dependencies: Chrome is embedded within the library - no separate download or installation required
The difference is architectural: PuppeteerSharp exposes browser automation as a general-purpose API where PDF generation is one feature. IronPDF is purpose-built for PDF operations, using Chrome rendering internally without exposing the complexity.
Code Example
The following example demonstrates high-volume PDF generation without the lifecycle management overhead:
using IronPdf;
using System;
using System.Threading.Tasks;

public class PdfGenerationService
{
    public PdfGenerationService()
    {
        // Optional: configure once at startup.
        // IronPDF manages the Chrome lifecycle automatically.
        Installation.LinuxAndDockerDependenciesAutoConfig = true;
    }

    public byte[] GeneratePdfFromHtml(string htmlContent)
    {
        // Create renderer - no browser launch delay
        var renderer = new ChromePdfRenderer();

        // Configure PDF options
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;
        renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;

        // Render HTML to PDF.
        // The Chrome engine is managed internally - no lifecycle to track.
        using (var pdf = renderer.RenderHtmlAsPdf(htmlContent))
        {
            // Memory is released when the using block exits
            return pdf.BinaryData;
        }
    }

    public void GenerateBatchPdfs(int count)
    {
        var renderer = new ChromePdfRenderer();

        // Process thousands of PDFs without memory accumulation
        for (int i = 0; i < count; i++)
        {
            string html = $@"
                <html>
                <head>
                    <style>
                        body {{ font-family: Arial, sans-serif; padding: 40px; }}
                        h1 {{ color: #333; }}
                        .invoice-number {{ font-size: 24px; color: #666; }}
                    </style>
                </head>
                <body>
                    <h1>Invoice</h1>
                    <p class='invoice-number'>INV-{i:D6}</p>
                    <p>Generated: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss}</p>
                </body>
                </html>";

            using (var pdf = renderer.RenderHtmlAsPdf(html))
            {
                pdf.SaveAs($"/output/invoice_{i:D6}.pdf");
            }
            // Memory returns to baseline after each iteration:
            // no browser restarts required, no orphaned Chrome processes.
        }
    }

    public byte[] GenerateFromUrl(string url)
    {
        var renderer = new ChromePdfRenderer();

        // Render an external URL - JavaScript executes automatically
        using (var pdf = renderer.RenderUrlAsPdf(url))
        {
            return pdf.BinaryData;
        }
    }
}
Key points about this code:
- No BrowserFetcher.DownloadAsync() - Chrome is embedded
- No Puppeteer.LaunchAsync() - browser lifecycle is automatic
- No browser or page disposal code - managed by the library
- Standard using blocks release memory predictably
- Same code works on Windows, Linux, and macOS without changes
- Docker containers work without the --init flag or process supervisors
Comparison: PuppeteerSharp vs IronPDF Setup
PuppeteerSharp approach:
// Download Chromium (required on each deployment)
var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();

// Launch browser (3+ second startup time)
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { "--no-sandbox", "--disable-dev-shm-usage" }
});

// Create page
await using var page = await browser.NewPageAsync();

// Navigate and generate PDF
await page.SetContentAsync(html);
var pdfBytes = await page.PdfDataAsync();

// Must dispose page, then browser, in that order.
// DisposeAsync may hang in Docker.
IronPDF approach:
var renderer = new ChromePdfRenderer();
using var pdf = renderer.RenderHtmlAsPdf(html);
var pdfBytes = pdf.BinaryData;
// Done - no lifecycle management
API Reference
For details on the methods used above:
- ChromePdfRenderer - Main rendering class
- RenderHtmlAsPdf - HTML to PDF conversion
- Docker and Linux Deployment - Container configuration guide
- IronPDF vs PuppeteerSharp Comparison - Detailed feature comparison
Migration Considerations
Licensing
IronPDF is commercial software with per-developer licensing. A free trial is available for evaluation. Teams should verify that IronPDF meets their requirements before committing to migration, particularly if PuppeteerSharp was chosen specifically for its open-source license.
API Differences
The APIs differ significantly in philosophy:
- PuppeteerSharp: General browser automation API with PDF as one capability
- IronPDF: Purpose-built PDF API using Chrome rendering internally
Migration involves replacing browser lifecycle code with direct PDF generation calls. For applications using PuppeteerSharp only for PDF generation, this simplifies the codebase. For applications using browser automation features beyond PDF (screenshots, testing, scraping), IronPDF would only replace the PDF portion.
What You Gain
- Elimination of browser lifecycle management code
- No Chrome process accumulation or zombie processes
- Predictable memory behavior without monitoring infrastructure
- Consistent behavior across Windows, Linux, and macOS
- Docker containers without process supervision requirements
- Faster PDF generation (no browser launch overhead per operation)
What to Consider
- Commercial licensing cost versus engineering time spent on memory management
- Migration effort for existing PuppeteerSharp codebases
- If using PuppeteerSharp for non-PDF browser automation, that code remains separate
- Different rendering engine may produce slightly different output formatting
Conclusion
PuppeteerSharp memory issues stem from the inherent complexity of managing Chromium browser lifecycles in long-running .NET applications. The combination of async disposal challenges, Chrome process management on Linux, and callback dictionary leaks creates a category of bugs that requires ongoing engineering attention. For teams where PDF generation is the primary use case, switching to a library with managed Chrome lifecycle eliminates these issues at the architectural level.
Jacob Mellor leads technical development at Iron Software and has 25+ years experience building developer tools.
References
- Managed memory leak in Connection.cs - GitHub Issue #640 - original memory leak identification
- DisposeAsync hanging forever - GitHub Issue #1489 - Docker disposal hanging issue
- PdfDataAsync timeout in Chromium v127+ - GitHub Issue #2718 - recent PDF generation timeout
- Docker container memory always increasing - GitHub Issue #8695 - container memory growth
- Chrome memory leak - GitHub Issue #5893 - Chrome process memory leak
- Browser requests leak memory - GitHub Issue #9283 - request-related memory leak
- The Hidden Cost of Headless Browsers: A Puppeteer Memory Leak Journey - production memory leak case study
- How to simply work around RAM-leaking libraries like Puppeteer - community workaround guide
- Optimizing Puppeteer PDF generation - performance optimization strategies
- Puppeteer Troubleshooting Documentation - official troubleshooting guide
- PuppeteerSharp Memory Management Considerations - memory management best practices
For IronPDF documentation and tutorials, visit ironpdf.com.