IronSoftware

Posted on Apr 3

iTextSharp HTML to PDF in C#: Why HTMLWorker Breaks (Issue Fixed)

#dotnet #csharp

Developers searching for HTML-to-PDF conversion with iTextSharp encounter a fundamental limitation: iTextSharp and iText 7 were not designed to render HTML the way a web browser does. The libraries provide limited HTML parsing capabilities that break on modern HTML5/CSS3 content. The best-known Stack Overflow question on the topic has more than 300,000 views, so the demand is clear, but the solution within iText is not straightforward. This article examines the limitations, shows the usual patterns for iTextSharp, and presents alternatives with browser-accurate rendering.

The Problem

iTextSharp (the .NET port of iText 5) and its successor iText 7 are PDF manipulation libraries, not HTML rendering engines. They provide HTMLWorker (deprecated) and pdfHTML (add-on product) for HTML parsing, but neither implements a browser-compatible rendering engine.

When developers attempt to convert HTML to PDF with iTextSharp, they encounter:

Incomplete CSS support (no flexbox, grid, or modern layout)
Missing JavaScript execution (dynamic content doesn't render)
Broken layouts for responsive designs
No support for web fonts without manual configuration
Image handling issues with relative URLs

The expectation gap is significant: developers want browser-quality PDF output, but iTextSharp provides basic HTML parsing at best.

Error Messages and Symptoms

Common issues when using iTextSharp's HTMLWorker:

iTextSharp.text.html.simpleparser.HTMLWorker is obsolete

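// This approach produces broken output for modern HTML.
// 'document' is an iTextSharp.text.Document that is already open and attached to a PdfWriter.
using (var stringReader = new StringReader(html))
{
    HTMLWorker htmlWorker = new HTMLWorker(document); // Compiler warns that HTMLWorker is obsolete
    htmlWorker.Parse(stringReader); // Most CSS3 rules and HTML5 elements are ignored or mis-rendered
}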
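
For readers who want to reproduce the symptoms, the fragment above can be expanded into a complete method. This is a minimal sketch, assuming the iTextSharp 5.5.x package is referenced; the class and method names (LegacyHtmlWorkerExample, Convert) are illustrative and do not come from iTextSharp.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;

// Illustrative wrapper (hypothetical name) around the deprecated HTMLWorker.
public static class LegacyHtmlWorkerExample
{
    public static byte[] Convert(string html)
    {
        using (var output = new MemoryStream())
        {
            // A Document must be bound to a PdfWriter and opened before parsing.
            var document = new Document(PageSize.A4);
            PdfWriter.GetInstance(document, output);
            document.Open();

            // HTMLWorker is marked obsolete, so the warning is silenced for this demo only.
#pragma warning disable CS0612, CS0618
            using (var reader = new StringReader(html))
            {
                var worker = new HTMLWorker(document);
                worker.Parse(reader);
            }
#pragma warning restore CS0612, CS0618

            // Closing the document flushes the PDF into the stream.
            document.Close();
            return output.ToArray();
        }
    }
}
```

Feeding this method a simple paragraph and then a template that uses CSS Grid is a quick way to see the difference between what HTMLWorker parses and what a browser renders.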

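With iText 7's pdfHTML add-on (the exception names below are shown in their Java package form; the .NET port exposes the same types under the iText.Html2pdf.Exceptions namespace):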

com.itextpdf.html2pdf.exceptions.CssApplierInitializationException:
Cannot find CSS applier for tag 'article'

com.itextpdf.html2pdf.exceptions.TagWorkerInitializationException:
Tag worker for element 'section' was not found

Symptoms include:

Tables rendering without proper column widths
Floated elements collapsing or overlapping
Background images not appearing
Custom fonts displaying as fallback fonts
Responsive layouts breaking completely

Who Is Affected

The 309,000+ views on Stack Overflow indicate massive demand across:

Industries: Reporting systems, invoice generation, document automation, legal document processing, healthcare records, and any system generating documents from web content.

Frameworks: .NET Framework with iTextSharp, .NET Core/5/6/7/8 with iText 7.

Use Cases:

Converting HTML templates to PDF invoices or receipts
Generating reports from HTML dashboards
Creating printable versions of web pages
Automating document generation from CMS content

Evidence from the Developer Community

Scale of the Problem

Stack Overflow tracks these highly-viewed questions:

Question	Views	Score
"How to convert HTML to PDF using iTextSharp"	309,021	77
"Convert HTML to PDF in .NET"	959,034	528
"iText 7 HTML to PDF conversion"	185,727	21

Timeline

Date	Event	Source
2014-08-06	Original question posted	Stack Overflow
2015-2018	Answers reference HTMLWorker (now deprecated)	Stack Overflow
2019-2021	Answers shift to pdfHTML add-on recommendation	Stack Overflow
2025-01-15	Question still receiving new views and answers	Stack Overflow

Community Reports

"I want to convert the below HTML to PDF using iTextSharp but don't know where to start."
— Developer, Stack Overflow, August 2014

"HTMLWorker is deprecated and limited. pdfHTML is better but requires a separate license."
— Stack Overflow Answer, 2020

"For anything with complex CSS, you need a browser engine. iText's HTML support is basic."
— Stack Overflow Comment, 2023

Getting Started: NuGet Package Installation

Before examining the limitations, developers need to understand the different package options.

iTextSharp (.NET Framework)

# For .NET Framework 4.x projects
Install-Package iTextSharp -Version 5.5.13.3

<!-- Package reference in .csproj -->
<PackageReference Include="iTextSharp" Version="5.5.13.3" />

Note: iTextSharp 5.x is no longer actively developed. The last release was in 2022.

iText 7 (.NET Core / .NET 5+)

# Core library
Install-Package iText7 -Version 8.0.2

# HTML conversion add-on (AGPL; closed-source use needs a separate commercial license)
Install-Package itext7.pdfhtml -Version 5.0.2

<!-- Package references -->
<PackageReference Include="iText7" Version="8.0.2" />
<PackageReference Include="itext7.pdfhtml" Version="5.0.2" />

Project Configuration

For .NET Core or .NET 6+ projects:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="iText7" Version="8.0.2" />
    <PackageReference Include="itext7.pdfhtml" Version="5.0.2" />
  </ItemGroup>
</Project>

Licensing Considerations

iText uses dual licensing:

AGPL v3: Free for open-source projects that release source code
Commercial License: Required for proprietary/closed-source applications

The pdfHTML add-on follows the same dual-licensing model. In a closed-source application it requires a commercial license of its own in addition to the base iText 7 license.

Root Cause Analysis

iTextSharp and iText 7 are fundamentally PDF object manipulation libraries. PDF generation from scratch, modification, form filling, and extraction are their primary purposes.

HTML-to-PDF conversion requires:

HTML Parser: Understanding HTML5 elements and structure
CSS Engine: Implementing the CSS box model, selectors, cascading
Layout Engine: Calculating positions, handling floats, flexbox, grid
Font Renderer: Loading and rendering fonts including web fonts
JavaScript Engine: Executing scripts that modify content

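iText provides partial implementations of the first two components (the HTML parser and the CSS engine), with significant gaps. It does not include the remaining three (layout engine, font renderer, JavaScript engine) in any browser-compatible form.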

HTMLWorker vs pdfHTML Comparison

Feature	HTMLWorker (iText 5)	pdfHTML (iText 7)
Status	Deprecated	Active
License	AGPL/Commercial	AGPL/Commercial (add-on licensed separately)
HTML5 elements	Limited	Partial
CSS3 support	Minimal	Partial (no flexbox/grid)
JavaScript	No	No
Web fonts	No	Limited
Active development	No	Yes

The pdfHTML add-on improves on HTMLWorker but still lacks:

Full CSS3 support (flexbox, grid, variables)
JavaScript execution
Browser-compatible rendering

This is a design limitation, not a bug. Implementing a full browser engine would be a separate product entirely.
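
Because these gaps are known in advance, a template can be screened before it is handed to iText. The following is a rough sketch using only the .NET base class library; HtmlPreflight and FindUnsupportedFeatures are hypothetical names, and the regular expressions are heuristics based on the gaps listed above, not a real CSS parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: flags markup that relies on features the article
// lists as unsupported by HTMLWorker and pdfHTML.
public static class HtmlPreflight
{
    private static readonly (string Name, Regex Pattern)[] Checks =
    {
        ("flexbox", new Regex(@"display\s*:\s*(inline-)?flex", RegexOptions.IgnoreCase)),
        ("grid", new Regex(@"display\s*:\s*(inline-)?grid", RegexOptions.IgnoreCase)),
        ("css-variables", new Regex(@"var\(\s*--", RegexOptions.IgnoreCase)),
        ("javascript", new Regex(@"<script\b", RegexOptions.IgnoreCase)),
    };

    // Returns the names of the checks that matched, in declaration order.
    public static IReadOnlyList<string> FindUnsupportedFeatures(string html)
    {
        var found = new List<string>();
        foreach (var (name, pattern) in Checks)
        {
            if (pattern.IsMatch(html))
            {
                found.Add(name);
            }
        }
        return found;
    }
}
```

A non-empty result suggests the template needs either simplification or a browser-based renderer. The check only inspects the HTML string it is given, so rules in external stylesheets are not seen.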

Attempted Workarounds

Workaround 1: pdfHTML Add-on

Approach: Purchase and use iText's pdfHTML commercial add-on.

using System.IO;
using iText.Html2pdf;

public byte[] ConvertWithPdfHtml(string html)
{
    // pdfHTML writes the converted document into the supplied stream
    using var outputStream = new MemoryStream();
    HtmlConverter.ConvertToPdf(html, outputStream);
    return outputStream.ToArray();
}

Limitations:

Separate commercial license required for closed-source use
Still lacks flexbox, grid, and modern CSS3
No JavaScript execution
Output quality varies significantly from browser rendering
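
Two of the symptoms mentioned earlier, relative image URLs and fonts that need manual configuration, are usually handled through ConverterProperties. The sketch below assumes the itext7.pdfhtml package is referenced; the class name PdfHtmlWithResources and the parameters baseUri and fontDirectory are illustrative, and the paths passed in are up to the caller.

```csharp
using System.IO;
using iText.Html2pdf;
using iText.Html2pdf.Resolver.Font;

// Illustrative helper (hypothetical name) showing manual resource configuration.
public static class PdfHtmlWithResources
{
    public static byte[] Convert(string html, string baseUri, string fontDirectory)
    {
        var properties = new ConverterProperties();

        // Relative src/href values in the HTML are resolved against this location.
        properties.SetBaseUri(baseUri);

        // Register standard and shipped fonts, skip system fonts,
        // then add any font files found in the given directory.
        var fontProvider = new DefaultFontProvider(true, true, false);
        fontProvider.AddDirectory(fontDirectory);
        properties.SetFontProvider(fontProvider);

        using var output = new MemoryStream();
        HtmlConverter.ConvertToPdf(html, output, properties);
        return output.ToArray();
    }
}
```

This addresses resource loading only. It does not add flexbox, grid, or JavaScript support, so the layout limitations above still apply.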

Workaround 2: Pre-process HTML

Approach: Simplify HTML to only use features iTextSharp supports.

// Convert modern HTML to an iText-compatible subset.
// Note: plain string replacement only matches tags written without attributes.
string simplifiedHtml = html
    .Replace("<article>", "<div>")
    .Replace("</article>", "</div>")
    .Replace("<section>", "<div>")
    .Replace("</section>", "</div>");
// Remaining steps (not shown): remove CSS that won't work and inline all styles.

Limitations:

Significant development effort
Requires maintaining two versions of each template
Breaks responsive designs
Not scalable for complex documents
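
A slightly more tolerant version of the tag replacement can be sketched with a regular expression so that tags carrying attributes are also rewritten. Html5Downgrader and ReplaceSemanticTags are hypothetical names, and the list of elements is an assumption chosen for illustration. Regular expressions do not parse HTML, so this remains a heuristic that can misfire inside comments or script blocks.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites selected HTML5 sectioning elements to <div>,
// keeping whatever attributes follow the tag name.
public static class Html5Downgrader
{
    // Group 1 captures the optional "/" of a closing tag.
    // The lookahead ensures the whole tag name matched (e.g. not <nav-bar>).
    private static readonly Regex SemanticTag = new Regex(
        @"<(/?)(article|section|header|footer|nav|aside|main)(?=[\s/>])",
        RegexOptions.IgnoreCase);

    public static string ReplaceSemanticTags(string html)
    {
        return SemanticTag.Replace(html, "<${1}div");
    }
}
```

The limitations listed above still hold: the templates diverge from what the browser sees, and any styling that targeted the original element names has to be rewritten as well.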

Workaround 3: HTML to Image to PDF

Approach: Render HTML to an image using a browser automation tool, then embed in PDF.

Limitations:

Text is not selectable in output PDF
File sizes much larger
Quality issues with scaling
Loss of PDF features (links, bookmarks)
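
The embedding half of this workaround can be sketched with the iText 7 layout API. The example assumes the screenshot has already been captured as PNG bytes by some browser automation tool (that step is not shown, and the tool choice is left open); ScreenshotToPdf and Wrap are hypothetical names.

```csharp
using System.IO;
using iText.IO.Image;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;

// Hypothetical helper: places a pre-rendered screenshot on a PDF page.
public static class ScreenshotToPdf
{
    public static byte[] Wrap(byte[] pngBytes)
    {
        var output = new MemoryStream();
        var pdf = new PdfDocument(new PdfWriter(output));
        var document = new Document(pdf);

        // The page content is a single raster image, so the text is not selectable.
        var image = new Image(ImageDataFactory.Create(pngBytes));
        image.SetAutoScale(true); // shrink to fit the page if needed
        document.Add(image);

        // Closing the layout document also closes the underlying PdfDocument.
        document.Close();
        return output.ToArray();
    }
}
```

The result illustrates the limitations listed above: the PDF contains a picture of the page rather than text, links, or bookmarks.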

A Different Approach: IronPDF

IronPDF embeds a Chromium browser engine specifically for HTML-to-PDF conversion. Rather than parsing HTML and approximating browser behavior, it renders HTML exactly as Chrome would.

Why IronPDF Handles HTML Differently

IronPDF's architecture is fundamentally different:

Full Chromium Engine: Same rendering engine used by Google Chrome
Complete CSS3: Flexbox, Grid, Variables, Animations (rendered as final state)
JavaScript Execution: Dynamic content, charting libraries, frameworks all work
Web Fonts: Google Fonts, custom fonts load automatically
Responsive Rendering: Media queries and viewport handling included

When you convert HTML with IronPDF, you're generating a PDF from an actual browser render, not a parsed approximation.

Code Example

using IronPdf;

public class ReportGenerator
{
    public byte[] GenerateReport(string htmlContent)
    {
        var renderer = new ChromePdfRenderer();

        // Configure rendering options
        renderer.RenderingOptions.MarginTop = 25;
        renderer.RenderingOptions.MarginBottom = 25;
        renderer.RenderingOptions.MarginLeft = 20;
        renderer.RenderingOptions.MarginRight = 20;

        // Enable features for complex content
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.RenderDelay = 500; // Wait for JS to complete

        // Render HTML to PDF - uses actual Chrome rendering
        using var pdf = renderer.RenderHtmlAsPdf(htmlContent);

        return pdf.BinaryData;
    }
}

Example with modern HTML/CSS:

public byte[] CreateModernDashboardPdf()
{
    var renderer = new ChromePdfRenderer();

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <link href='https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap' rel='stylesheet'>
    <style>
        body {
            font-family: 'Inter', sans-serif;
            margin: 0;
            padding: 20px;
        }
        .dashboard {
            display: grid;
            grid-template-columns: repeat(3, 1fr);
            gap: 20px;
        }
        .card {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            border-radius: 12px;
            padding: 24px;
            color: white;
        }
        .card h3 {
            margin: 0 0 8px 0;
            font-weight: 600;
        }
        .card .value {
            font-size: 2.5rem;
            font-weight: 600;
        }
        @media print {
            .dashboard { grid-template-columns: repeat(2, 1fr); }
        }
    </style>
</head>
<body>
    <h1>Q4 2025 Dashboard</h1>
    <div class='dashboard'>
        <div class='card'>
            <h3>Revenue</h3>
            <div class='value'>$2.4M</div>
        </div>
        <div class='card'>
            <h3>Users</h3>
            <div class='value'>48.2K</div>
        </div>
        <div class='card'>
            <h3>Conversion</h3>
            <div class='value'>3.2%</div>
        </div>
    </div>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

Key points about this code:

CSS Grid layout works exactly as in a browser
Google Fonts load automatically
Gradients and border-radius render correctly
Print media queries are respected when the renderer's CssMediaType is set to Print (see Print-Specific CSS Handling)

Converting URLs

public byte[] ConvertWebPage(string url)
{
    var renderer = new ChromePdfRenderer();

    // Render a live web page
    using var pdf = renderer.RenderUrlAsPdf(url);

    return pdf.BinaryData;
}
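
When the URL comes from user input or configuration, it may be worth validating it before handing it to the renderer. The guard below is a small sketch that uses only System.Uri; UrlGuard and IsRenderableUrl are hypothetical names, and restricting to http and https is an assumption that suits typical web-page conversion.

```csharp
using System;

// Hypothetical guard: accepts only absolute http/https URLs.
public static class UrlGuard
{
    public static bool IsRenderableUrl(string candidate)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Calling IsRenderableUrl before ConvertWebPage rejects relative paths and file:// locations early, rather than letting them reach the rendering engine.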

JavaScript Execution: Charts and Dynamic Content

One major difference is JavaScript support. iText cannot execute JavaScript, so charts and dynamically generated content do not render.

public byte[] GenerateChartPdf()
{
    var renderer = new ChromePdfRenderer();

    // Enable JavaScript and wait for execution
    renderer.RenderingOptions.EnableJavaScript = true;
    renderer.RenderingOptions.WaitFor.JavaScript(2000); // Wait up to 2 seconds

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <script src='https://cdn.jsdelivr.net/npm/chart.js'></script>
</head>
<body>
    <h1>Sales Report</h1>
    <canvas id='myChart' width='400' height='200'></canvas>
    <script>
        var ctx = document.getElementById('myChart').getContext('2d');
        new Chart(ctx, {
            type: 'bar',
            data: {
                labels: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
                datasets: [{
                    label: 'Sales ($K)',
                    data: [12, 19, 3, 5, 2, 3],
                    backgroundColor: 'rgba(54, 162, 235, 0.8)'
                }]
            }
        });
    </script>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

Print-Specific CSS Handling

IronPDF respects print media queries, allowing print-specific styling:

public byte[] GeneratePrintOptimizedDocument()
{
    var renderer = new ChromePdfRenderer();

    // Use print media type
    renderer.RenderingOptions.CssMediaType = IronPdf.Rendering.PdfCssMediaType.Print;

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <style>
        /* Screen styles */
        @media screen {
            body { background: #f5f5f5; }
            .no-print { display: block; }
        }

        /* Print styles */
        @media print {
            body {
                background: white;
                font-size: 12pt;
            }
            .no-print { display: none; }
            .page-break { page-break-after: always; }

            a[href]::after {
                content: ' (' attr(href) ')';
            }
        }
    </style>
</head>
<body>
    <div class='no-print'>This navigation won't appear in PDF</div>

    <h1>Chapter 1</h1>
    <p>Content for first page...</p>

    <div class='page-break'></div>

    <h1>Chapter 2</h1>
    <p>Content for second page...</p>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

PDF/A Compliance for Archival

For long-term document archival:

public byte[] GenerateArchivalPdf()
{
    var renderer = new ChromePdfRenderer();

    string html = "<html><body><h1>Archived Document</h1></body></html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);

    // Save as PDF/A-3b for archival compliance
    pdf.SaveAsPdfA("archived-document.pdf", IronPdf.PdfAVersions.PdfA3b);

    return pdf.BinaryData;
}

API Reference

For more details on the methods used, see the IronPDF API reference for ChromePdfRenderer and its RenderingOptions.

Migration Considerations

Licensing

IronPDF is commercial software with per-developer licensing
Free trial available for evaluation
Pricing details

API Differences

iTextSharp: PDF object manipulation with limited HTML parsing
IronPDF: HTML/CSS-first approach with full browser rendering

Migration typically means:

Remove HTMLWorker or pdfHTML code
Replace with ChromePdfRenderer
Keep existing HTML templates unchanged (or simplify them)
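
One way to keep such a swap contained is to hide the converter behind a small interface so that calling code does not depend on either library. This is a design sketch under stated assumptions, not something either vendor prescribes: IHtmlToPdfConverter and DelegateHtmlToPdfConverter are hypothetical names.

```csharp
using System;

// Hypothetical seam between application code and the PDF library in use.
public interface IHtmlToPdfConverter
{
    byte[] Convert(string html);
}

// Wraps any conversion function, so the library can be swapped in one place.
public sealed class DelegateHtmlToPdfConverter : IHtmlToPdfConverter
{
    private readonly Func<string, byte[]> _convert;

    public DelegateHtmlToPdfConverter(Func<string, byte[]> convert)
    {
        _convert = convert ?? throw new ArgumentNullException(nameof(convert));
    }

    public byte[] Convert(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
        {
            throw new ArgumentException("HTML content is required.", nameof(html));
        }
        return _convert(html);
    }
}
```

During migration, the delegate that previously called the pdfHTML method shown earlier (ConvertWithPdfHtml) can be replaced with one that calls GenerateReport from the IronPDF example, while the rest of the application keeps using IHtmlToPdfConverter.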

What You Gain

Browser-quality PDF output from any HTML/CSS
JavaScript support for dynamic content
Full CSS3 including modern layout (flexbox, grid)
Web fonts load automatically

What to Consider

Chromium binaries add to deployment size
Different pricing model than iText
Different API paradigm (HTML-first vs PDF-first)

Conclusion

iTextSharp and iText 7 were designed for PDF manipulation, not browser-quality HTML rendering. Developers seeking accurate HTML-to-PDF conversion face fundamental limitations with these libraries. For projects where HTML templates must render exactly as they appear in browsers, a Chromium-based approach provides the rendering accuracy that parsed HTML cannot achieve.

Jacob Mellor has spent 25+ years building developer tools, including IronPDF.

References

Stack Overflow: How to convert HTML to PDF using iTextSharp - 309K+ views
Stack Overflow: Convert HTML to PDF in .NET - 959K+ views

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
