DEV Community

IronSoftware

iTextSharp HTML to PDF in C#: Why HTMLWorker Breaks (Issue Fixed)

Developers searching for HTML-to-PDF conversion with iTextSharp run into a fundamental limitation: neither iTextSharp nor iText 7 was designed to render HTML the way a web browser does. Their HTML parsing is limited and breaks on modern HTML5/CSS3 content. The main Stack Overflow question on the topic has more than 300,000 views, so the demand is clear, but the solution within iText is not straightforward. This article examines the limitations, shows the correct patterns for iTextSharp, and presents alternatives with browser-accurate rendering.

The Problem

iTextSharp (the .NET port of iText 5) and its successor iText 7 are PDF manipulation libraries, not HTML rendering engines. They provide HTMLWorker (deprecated) and pdfHTML (add-on product) for HTML parsing, but neither implements a browser-compatible rendering engine.

When developers attempt to convert HTML to PDF with iTextSharp, they encounter:

  • Incomplete CSS support (no flexbox, grid, or modern layout)
  • Missing JavaScript execution (dynamic content doesn't render)
  • Broken layouts for responsive designs
  • No support for web fonts without manual configuration
  • Image handling issues with relative URLs

The expectation gap is significant: developers want browser-quality PDF output, but iTextSharp provides basic HTML parsing at best.

Error Messages and Symptoms

Common issues when using iTextSharp's HTMLWorker:

iTextSharp.text.html.simpleparser.HTMLWorker is obsolete

// This approach produces broken output for modern HTML.
// Assumes `document` is an open iTextSharp.text.Document bound to a PdfWriter.
using (var stringReader = new StringReader(html))
{
    var htmlWorker = new HTMLWorker(document);
    htmlWorker.Parse(stringReader); // CSS3 and HTML5 elements are ignored or mis-rendered
}

With iText 7's pdfHTML add-on:

iText.Html2pdf.Exceptions.CssApplierInitializationException:
Cannot find CSS applier for tag 'article'

iText.Html2pdf.Exceptions.TagWorkerInitializationException:
Tag worker for element 'section' was not found

Symptoms include:

  • Tables rendering without proper column widths
  • Floated elements collapsing or overlapping
  • Background images not appearing
  • Custom fonts displaying as fallback fonts
  • Responsive layouts breaking completely

Who Is Affected

The 309,000+ views on Stack Overflow indicate massive demand across:

Industries: Reporting systems, invoice generation, document automation, legal document processing, healthcare records, and any system generating documents from web content.

Frameworks: .NET Framework with iTextSharp, .NET Core/5/6/7/8 with iText 7.

Use Cases:

  • Converting HTML templates to PDF invoices or receipts
  • Generating reports from HTML dashboards
  • Creating printable versions of web pages
  • Automating document generation from CMS content

Evidence from the Developer Community

Scale of the Problem

Stack Overflow tracks these highly-viewed questions:

| Question | Views | Score |
|---|---|---|
| "How to convert HTML to PDF using iTextSharp" | 309,021 | 77 |
| "Convert HTML to PDF in .NET" | 959,034 | 528 |
| "iText 7 HTML to PDF conversion" | 185,727 | 21 |

Timeline

| Date | Event | Source |
|---|---|---|
| 2014-08-06 | Original question posted | Stack Overflow |
| 2015-2018 | Answers reference HTMLWorker (now deprecated) | Stack Overflow |
| 2019-2021 | Answers shift to pdfHTML add-on recommendation | Stack Overflow |
| 2025-01-15 | Question still receiving new views and answers | Stack Overflow |

Community Reports

"I want to convert the below HTML to PDF using iTextSharp but don't know where to start."
— Developer, Stack Overflow, August 2014

"HTMLWorker is deprecated and limited. pdfHTML is better but requires a separate license."
— Stack Overflow Answer, 2020

"For anything with complex CSS, you need a browser engine. iText's HTML support is basic."
— Stack Overflow Comment, 2023

Getting Started: NuGet Package Installation

Before examining the limitations, developers need to understand the different package options.

iTextSharp (.NET Framework)

# For .NET Framework 4.x projects
Install-Package iTextSharp -Version 5.5.13.3

<!-- Package reference in .csproj -->
<PackageReference Include="iTextSharp" Version="5.5.13.3" />

Note: iTextSharp 5.x is no longer actively developed. The last release was in 2022.

iText 7 (.NET Core / .NET 5+)

# Core library
Install-Package iText7 -Version 8.0.2

# HTML conversion add-on (separate commercial license required)
Install-Package itext7.pdfhtml -Version 5.0.2

<!-- Package references -->
<PackageReference Include="iText7" Version="8.0.2" />
<PackageReference Include="itext7.pdfhtml" Version="5.0.2" />

Project Configuration

For .NET Core or .NET 6+ projects:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="iText7" Version="8.0.2" />
    <PackageReference Include="itext7.pdfhtml" Version="5.0.2" />
  </ItemGroup>
</Project>

Licensing Considerations

iText uses dual licensing:

  • AGPL v3: Free for open-source projects that release source code
  • Commercial License: Required for proprietary/closed-source applications

The pdfHTML add-on follows the same dual-licensing model: proprietary/closed-source applications need a commercial license that covers the add-on in addition to the base iText 7 license.

Root Cause Analysis

iTextSharp and iText 7 are fundamentally PDF object manipulation libraries. PDF generation from scratch, modification, form filling, and extraction are their primary purposes.

HTML-to-PDF conversion requires:

  1. HTML Parser: Understanding HTML5 elements and structure
  2. CSS Engine: Implementing the CSS box model, selectors, cascading
  3. Layout Engine: Calculating positions, handling floats, flexbox, grid
  4. Font Renderer: Loading and rendering fonts including web fonts
  5. JavaScript Engine: Executing scripts that modify content

iText provides partial implementations of 1 and 2, with significant gaps. It does not include 3, 4, or 5 in any browser-compatible form.

HTMLWorker vs pdfHTML Comparison

| Feature | HTMLWorker (iText 5) | pdfHTML (iText 7) |
|---|---|---|
| Status | Deprecated | Active |
| License | AGPL/Commercial | Additional license required |
| HTML5 elements | Limited | Partial |
| CSS3 support | Minimal | Partial (no flexbox/grid) |
| JavaScript | No | No |
| Web fonts | No | Limited |
| Active development | No | Yes |

The pdfHTML add-on improves on HTMLWorker but still lacks:

  • Full CSS3 support (flexbox, grid, variables)
  • JavaScript execution
  • Browser-compatible rendering

This is a design limitation, not a bug. Implementing a full browser engine would be a separate product entirely.
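Because these gaps are by design, it can help to detect them before a template ever reaches iText. The following is a minimal sketch, not part of either library: `TemplatePreflight`, `FindUnsupportedFeatures`, and the rule names are hypothetical, and the regular expressions are a rough heuristic covering only the features this article lists as unsupported (flexbox, grid, CSS variables, JavaScript).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplatePreflight
{
    // Hypothetical rule set: one pattern per feature this article lists as
    // unsupported by pdfHTML. Heuristic only; it does not parse CSS.
    private static readonly (string Name, Regex Pattern)[] Rules =
    {
        ("flexbox", new Regex(@"display\s*:\s*(inline-)?flex", RegexOptions.IgnoreCase)),
        ("grid", new Regex(@"display\s*:\s*(inline-)?grid", RegexOptions.IgnoreCase)),
        ("css-variables", new Regex(@"var\(\s*--", RegexOptions.IgnoreCase)),
        ("javascript", new Regex(@"<script\b", RegexOptions.IgnoreCase)),
    };

    // Returns the names of all rules that match somewhere in the HTML string.
    public static IReadOnlyList<string> FindUnsupportedFeatures(string html)
    {
        var found = new List<string>();
        foreach (var (name, pattern) in Rules)
        {
            if (pattern.IsMatch(html))
            {
                found.Add(name);
            }
        }
        return found;
    }
}
```

A caller could log the returned names or route flagged templates to a browser-based renderer. Styles pulled in from external stylesheets are not inspected by this sketch.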

Attempted Workarounds

Workaround 1: pdfHTML Add-on

Approach: Purchase and use iText's pdfHTML commercial add-on.

using System.IO;
using iText.Html2pdf;

public byte[] ConvertWithPdfHtml(string html)
{
    using var outputStream = new MemoryStream();
    HtmlConverter.ConvertToPdf(html, outputStream);
    return outputStream.ToArray();
}

Limitations:

  • Separate commercial license required
  • Still lacks flexbox, grid, and modern CSS3
  • No JavaScript execution
  • Output quality varies significantly from browser rendering
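Two of the symptoms listed earlier, relative image URLs and custom fonts falling back to defaults, can often be reduced through pdfHTML's `ConverterProperties`. The sketch below assumes the `itext7.pdfhtml` 5.0.x package shown above; the class name `PdfHtmlWithResources` and the `baseUri` and `fontPath` parameters are hypothetical inputs supplied by the caller. It does nothing for flexbox, grid, or JavaScript.

```csharp
using System.IO;
using iText.Html2pdf;
using iText.Html2pdf.Resolver.Font;
using iText.Layout.Font;

public static class PdfHtmlWithResources
{
    // baseUri: folder or URL that relative src/href values resolve against (assumed input).
    // fontPath: path to a .ttf/.otf file referenced by the template's font-family (assumed input).
    public static byte[] Convert(string html, string baseUri, string fontPath)
    {
        var properties = new ConverterProperties();
        properties.SetBaseUri(baseUri);

        // Register the standard PDF fonts as fallback, skip shipped and system fonts,
        // then add the custom font explicitly.
        FontProvider fontProvider = new DefaultFontProvider(true, false, false);
        fontProvider.AddFont(fontPath);
        properties.SetFontProvider(fontProvider);

        using var output = new MemoryStream();
        HtmlConverter.ConvertToPdf(html, output, properties);
        return output.ToArray();
    }
}
```

The font is matched by the family name embedded in the font file, so the template's `font-family` value has to use that name.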

Workaround 2: Pre-process HTML

Approach: Simplify HTML to only use features iTextSharp supports.

// Convert modern HTML to an iText-compatible subset.
// Requires: using System.Text.RegularExpressions;
// A regex (rather than string.Replace) also catches tags with attributes,
// e.g. <article class="post">.
string simplifiedHtml = Regex.Replace(
    html,
    @"<(/?)(article|section)\b",
    "<${1}div",
    RegexOptions.IgnoreCase);
// Remove CSS that won't work
// Inline all styles
// Etc.

Limitations:

  • Significant development effort
  • Requires maintaining two versions of templates
  • Breaks responsive designs
  • Not scalable for complex documents

Workaround 3: HTML to Image to PDF

Approach: Render HTML to an image using a browser automation tool, then embed in PDF.

Limitations:

  • Text is not selectable in output PDF
  • File sizes much larger
  • Quality issues with scaling
  • Loss of PDF features (links, bookmarks)
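For reference, the second half of this workaround (embedding an already-rendered screenshot in a PDF) might look like the sketch below with iTextSharp 5. It assumes the screenshot bytes were produced elsewhere by a browser automation tool; `ImagePdfWrapper` and `WrapImageInPdf` are hypothetical names, and the 36-point margins are an arbitrary choice.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImagePdfWrapper
{
    // imageBytes: PNG or JPEG data captured by a separate screenshot step (assumed input).
    public static byte[] WrapImageInPdf(byte[] imageBytes)
    {
        const float margin = 36f; // half an inch on every side

        using (var output = new MemoryStream())
        {
            var document = new Document(PageSize.A4, margin, margin, margin, margin);
            PdfWriter.GetInstance(document, output);
            document.Open();

            // Scale the screenshot to fit inside the printable area, keeping its aspect ratio.
            Image image = Image.GetInstance(imageBytes);
            image.ScaleToFit(
                document.PageSize.Width - 2 * margin,
                document.PageSize.Height - 2 * margin);
            document.Add(image);

            document.Close();
            return output.ToArray();
        }
    }
}
```

The result is a single raster image on a page, which is why the text is not selectable and links are lost.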

A Different Approach: IronPDF

IronPDF embeds a Chromium browser engine specifically for HTML-to-PDF conversion. Rather than parsing HTML and approximating browser behavior, it renders HTML exactly as Chrome would.

Why IronPDF Handles HTML Differently

IronPDF's architecture is fundamentally different:

  • Full Chromium Engine: Same rendering engine used by Google Chrome
  • Complete CSS3: Flexbox, Grid, Variables, Animations (rendered as final state)
  • JavaScript Execution: Dynamic content, charting libraries, frameworks all work
  • Web Fonts: Google Fonts, custom fonts load automatically
  • Responsive Rendering: Media queries and viewport handling included

When you convert HTML with IronPDF, you're generating a PDF from an actual browser render, not a parsed approximation.

Code Example

using IronPdf;

public class ReportGenerator
{
    public byte[] GenerateReport(string htmlContent)
    {
        var renderer = new ChromePdfRenderer();

        // Configure rendering options
        renderer.RenderingOptions.MarginTop = 25;
        renderer.RenderingOptions.MarginBottom = 25;
        renderer.RenderingOptions.MarginLeft = 20;
        renderer.RenderingOptions.MarginRight = 20;

        // Enable features for complex content
        renderer.RenderingOptions.EnableJavaScript = true;
        renderer.RenderingOptions.WaitFor.RenderDelay(500); // Wait 500 ms for JS to complete

        // Render HTML to PDF - uses actual Chrome rendering
        using var pdf = renderer.RenderHtmlAsPdf(htmlContent);

        return pdf.BinaryData;
    }
}

Example with modern HTML/CSS:

public byte[] CreateModernDashboardPdf()
{
    var renderer = new ChromePdfRenderer();

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <link href='https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap' rel='stylesheet'>
    <style>
        body {
            font-family: 'Inter', sans-serif;
            margin: 0;
            padding: 20px;
        }
        .dashboard {
            display: grid;
            grid-template-columns: repeat(3, 1fr);
            gap: 20px;
        }
        .card {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            border-radius: 12px;
            padding: 24px;
            color: white;
        }
        .card h3 {
            margin: 0 0 8px 0;
            font-weight: 600;
        }
        .card .value {
            font-size: 2.5rem;
            font-weight: 600;
        }
        @media print {
            .dashboard { grid-template-columns: repeat(2, 1fr); }
        }
    </style>
</head>
<body>
    <h1>Q4 2025 Dashboard</h1>
    <div class='dashboard'>
        <div class='card'>
            <h3>Revenue</h3>
            <div class='value'>$2.4M</div>
        </div>
        <div class='card'>
            <h3>Users</h3>
            <div class='value'>48.2K</div>
        </div>
        <div class='card'>
            <h3>Conversion</h3>
            <div class='value'>3.2%</div>
        </div>
    </div>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

Key points about this code:

  • CSS Grid layout works exactly as in a browser
  • Google Fonts load automatically
  • Gradients and border-radius render correctly
  • Print media queries are respected

Converting URLs

public byte[] ConvertWebPage(string url)
{
    var renderer = new ChromePdfRenderer();

    // Render a live web page
    using var pdf = renderer.RenderUrlAsPdf(url);

    return pdf.BinaryData;
}

JavaScript Execution: Charts and Dynamic Content

One major difference is JavaScript support. iText cannot execute JavaScript, so charts and dynamically generated content do not render.

public byte[] GenerateChartPdf()
{
    var renderer = new ChromePdfRenderer();

    // Enable JavaScript and wait for execution
    renderer.RenderingOptions.EnableJavaScript = true;
    // Waits until the page calls window.ironpdf.notifyRender(), or 2 seconds at most
    renderer.RenderingOptions.WaitFor.JavaScript(2000);

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <script src='https://cdn.jsdelivr.net/npm/chart.js'></script>
</head>
<body>
    <h1>Sales Report</h1>
    <canvas id='myChart' width='400' height='200'></canvas>
    <script>
        var ctx = document.getElementById('myChart').getContext('2d');
        new Chart(ctx, {
            type: 'bar',
            data: {
                labels: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
                datasets: [{
                    label: 'Sales ($K)',
                    data: [12, 19, 3, 5, 2, 3],
                    backgroundColor: 'rgba(54, 162, 235, 0.8)'
                }]
            }
        });
    </script>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

Print-Specific CSS Handling

IronPDF respects print media queries, allowing print-specific styling:

public byte[] GeneratePrintOptimizedDocument()
{
    var renderer = new ChromePdfRenderer();

    // Use print media type
    renderer.RenderingOptions.CssMediaType = IronPdf.Rendering.PdfCssMediaType.Print;

    string html = @"
<!DOCTYPE html>
<html>
<head>
    <style>
        /* Screen styles */
        @media screen {
            body { background: #f5f5f5; }
            .no-print { display: block; }
        }

        /* Print styles */
        @media print {
            body {
                background: white;
                font-size: 12pt;
            }
            .no-print { display: none; }
            .page-break { page-break-after: always; }

            a[href]::after {
                content: ' (' attr(href) ')';
            }
        }
    </style>
</head>
<body>
    <div class='no-print'>This navigation won't appear in PDF</div>

    <h1>Chapter 1</h1>
    <p>Content for first page...</p>

    <div class='page-break'></div>

    <h1>Chapter 2</h1>
    <p>Content for second page...</p>
</body>
</html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);
    return pdf.BinaryData;
}

PDF/A Compliance for Archival

For long-term document archival:

public byte[] GenerateArchivalPdf()
{
    var renderer = new ChromePdfRenderer();

    string html = "<html><body><h1>Archived Document</h1></body></html>";

    using var pdf = renderer.RenderHtmlAsPdf(html);

    // Save as PDF/A-3b for archival compliance
    pdf.SaveAsPdfA("archived-document.pdf", IronPdf.PdfAVersions.PdfA3b);

    return pdf.BinaryData;
}

Migration Considerations

Licensing

  • IronPDF is commercial software with per-developer licensing
  • Free trial available for evaluation
  • Pricing details

API Differences

  • iTextSharp: PDF object manipulation with limited HTML parsing
  • IronPDF: HTML/CSS-first approach with full browser rendering

Migration typically means:

  1. Remove HTMLWorker or pdfHTML code
  2. Replace with ChromePdfRenderer
  3. Keep existing HTML templates unchanged (or simplify them)
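One way to stage these steps is to put both engines behind a small interface so call sites do not change while templates are verified. This is a sketch under assumptions: `IHtmlToPdfConverter`, `PdfHtmlConverter`, and `ChromiumConverter` are hypothetical names, and both NuGet packages are assumed to be referenced during the transition. Types are fully qualified to avoid name clashes between the two libraries.

```csharp
using System.IO;

// Hypothetical abstraction used only during the migration.
public interface IHtmlToPdfConverter
{
    byte[] Convert(string html);
}

// Existing path: iText 7 with the pdfHTML add-on.
public sealed class PdfHtmlConverter : IHtmlToPdfConverter
{
    public byte[] Convert(string html)
    {
        using var output = new MemoryStream();
        iText.Html2pdf.HtmlConverter.ConvertToPdf(html, output);
        return output.ToArray();
    }
}

// New path: Chromium-based rendering through IronPDF.
public sealed class ChromiumConverter : IHtmlToPdfConverter
{
    private readonly IronPdf.ChromePdfRenderer _renderer = new IronPdf.ChromePdfRenderer();

    public byte[] Convert(string html)
    {
        using var pdf = _renderer.RenderHtmlAsPdf(html);
        return pdf.BinaryData;
    }
}
```

Once every template renders correctly through `ChromiumConverter`, the pdfHTML implementation and its package reference can be removed.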

What You Gain

  • Browser-quality PDF output from any HTML/CSS
  • JavaScript support for dynamic content
  • Full CSS3 including modern layout (flexbox, grid)
  • Web fonts load automatically

What to Consider

  • Chromium binaries add to deployment size
  • Different pricing model than iText
  • Different API paradigm (HTML-first vs PDF-first)

Conclusion

iTextSharp and iText 7 were designed for PDF manipulation, not browser-quality HTML rendering. Developers seeking accurate HTML-to-PDF conversion face fundamental limitations with these libraries. For projects where HTML templates must render exactly as they appear in browsers, a Chromium-based approach provides the rendering accuracy that parsed HTML cannot achieve.


Jacob Mellor has spent 25+ years building developer tools, including IronPDF.


References

  1. Stack Overflow: How to convert HTML to PDF using iTextSharp - 309K+ views
  2. Stack Overflow: Convert HTML to PDF in .NET - 959K+ views

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
