DEV Community

IronSoftware


Spire.PDF HTML Conversion Renders Text as Rasterized Image (Fixed)

Developers using Spire.PDF for .NET to convert HTML to PDF frequently discover that the resulting documents contain no actual text layer. Instead of selectable, searchable text, the PDF contains rasterized bitmap images of text. This creates immediate problems: users cannot select or copy text, search functionality fails completely, screen readers cannot interpret the content, and file sizes balloon to 10-50 times larger than vector-based PDFs. The issue stems from Spire.PDF's reliance on Internet Explorer's rendering engine, which screenshots web content rather than constructing a proper PDF text layer.

The Problem

Spire.PDF's LoadFromHTML method does not actually parse HTML and render it as PDF text. Instead, it captures a screenshot of the HTML content as rendered by Internet Explorer and embeds that screenshot as an image within the PDF. The result is a PDF that looks correct visually but is fundamentally broken for any use case requiring actual text.

When developers open a PDF generated by LoadFromHTML, they discover that clicking and dragging to select text does nothing. The cursor does not change to a text selection cursor because there is no text to select. What appears to be text is actually a bitmap image of text rendered at whatever DPI Internet Explorer used during the capture.

This architectural decision has cascading consequences. PDF search functionality, which relies on an embedded text layer, returns no results regardless of what text appears visually in the document. Screen readers used by visually impaired users cannot interpret the document because the accessibility APIs find no text content. Copy and paste operations fail entirely. And because images require significantly more storage than vector text, file sizes increase dramatically.

The root cause is dependency on the Internet Explorer rendering engine. When IE version 9 or later is installed on the system (which is the case for all modern Windows installations), Spire.PDF renders HTML as an image rather than as text.

Error Messages and Symptoms

Developers typically do not receive error messages because the conversion technically succeeds. The symptoms manifest when using the resulting PDF:

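// When attempting to extract text programmatically
// (PdfTextExtractor lives in the Spire.Pdf.Texts namespace):
PdfTextExtractor extractor = new PdfTextExtractor(page);
PdfTextExtractOptions options = new PdfTextExtractOptions { IsExtractAllText = true };
string text = extractor.ExtractText(options);
// Returns an empty or whitespace-only string: no text layer exists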

// Text selection in PDF viewer:
// Cursor remains as pointer, not text selection cursor
// Ctrl+A selects nothing
// Ctrl+F finds nothing

Symptoms include:

  • Text cursor never appears when hovering over "text" in the PDF
  • Ctrl+F search finds zero results for words clearly visible on the page
  • Copy operation copies nothing or copies garbled characters
  • Screen readers report the document as empty or containing only images
  • A 500KB HTML page becomes a 5-50MB PDF
  • PDF appears grainy or blurry when zoomed, revealing pixel artifacts
  • Hyperlinks in the original HTML do not work in the PDF
  • Text appears fuzzy compared to vector-rendered PDFs
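Without a viewer or a commercial text extractor, a rough first check is to scan the PDF bytes for font dictionaries, image XObjects, and link annotations. The sketch below is a hypothetical helper (`PdfStructureHeuristic` is not part of any library) and only a heuristic: dictionaries stored inside compressed object streams (PDF 1.5+) are invisible to it, and a page that pairs a rasterized body with a text footer will still report fonts, so confirm any result with a real text extractor.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical diagnostic helper; a heuristic, not a PDF parser.
public static class PdfStructureHeuristic
{
    // Latin-1 maps every byte to exactly one char, so binary stream data decodes safely.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Counts font dictionaries, image XObjects, and link annotations whose
    // dictionaries appear uncompressed in the file. Objects packed into
    // compressed object streams are not seen by this scan.
    public static (int Fonts, int Images, int Links) CountObjects(byte[] pdfBytes)
    {
        if (pdfBytes == null) throw new ArgumentNullException(nameof(pdfBytes));
        string raw = Latin1.GetString(pdfBytes);

        // \b stops "/Type /Font" from matching look-alikes such as "/Type /FontDescriptor".
        int fonts = Regex.Matches(raw, @"/Type\s*/Font\b").Count;
        int images = Regex.Matches(raw, @"/Subtype\s*/Image\b").Count;
        int links = Regex.Matches(raw, @"/Subtype\s*/Link\b").Count;
        return (fonts, images, links);
    }

    // Image-only heuristic: at least one image and no font dictionaries at all.
    public static bool LooksImageOnly(byte[] pdfBytes)
    {
        var (fonts, images, _) = CountObjects(pdfBytes);
        return fonts == 0 && images > 0;
    }
}
```

For example, `PdfStructureHeuristic.CountObjects(File.ReadAllBytes("output.pdf"))` on a screenshot-style PDF would be expected to report zero fonts and zero links alongside one or more images, which matches the missing-text and dead-hyperlink symptoms listed above.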

Who Is Affected

This issue impacts any deployment using Spire.PDF's HTML-to-PDF functionality on systems with Internet Explorer 9 or later:

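Operating Systems: Every supported Windows release. Windows 7 originally shipped with IE8 but received IE9 through IE11 as updates, and Windows 8 and later include IE10 or newer. Windows 10 and 11 use Edge as the default browser but retain the IE (MSHTML) components that Spire.PDF uses for rendering, so the issue affects effectively every modern Windows deployment.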

Framework Versions: .NET Framework 4.x, .NET Core 3.1, .NET 5, .NET 6, .NET 7, and .NET 8. The IE rendering dependency exists across all supported .NET versions when using the LoadFromHTML method.

Use Cases: Invoice generation systems, report builders, document archival workflows, web page preservation tools, email-to-PDF converters, and any application that needs searchable PDFs from HTML content.

Accessibility Requirements: Any organization subject to accessibility compliance (ADA in the US, AODA in Ontario, Canada, EN 301 549 in the EU) faces legal exposure because the generated PDFs are completely inaccessible to assistive technology.

Evidence from the Developer Community

The text rasterization issue has been reported consistently on the E-iceblue forums for years, with users expressing frustration at the fundamental limitation.

Timeline

| Date | Event | Source |
|------|-------|--------|
| 2017 | Initial reports of text rendered as images | E-iceblue Forums |
| 2018 | Forum thread: "LoadFromHTML as text instead of image?" | E-iceblue Forums |
| 2019 | Multiple threads about links not working, text not selectable | E-iceblue Forums |
| 2020 | Qt plugin introduced as workaround, links still not supported | E-iceblue Forums |
| 2021 | "Grainy HTML to PDF" quality issues reported | E-iceblue Forums |
| 2022 | Linux Qt plugin issues documented | E-iceblue Forums |
| 2023 | Issue persists, workarounds remain incomplete | E-iceblue Forums |
| 2024 | Community still seeking solutions | E-iceblue Forums |

Community Reports

"doc.LoadFromHTML(url, false, true, true); This seems to only take screenshots of the webpage and save as an image in the pdf. I am looking to save the html page as text so it can be selectable in the pdf."
— Developer, E-iceblue Forums

"It seems that spire.pdf only makes a screenshot of the html file because in the final pdf you cannot mark text or follow links."
— Developer, E-iceblue Forums

"The quality of the PDF is, quite frankly, awful. The text is blurred and of very poor quality, unlike the original."
— Developer, E-iceblue Forums, discussing LoadFromHTML output

"Spire generates a PDF file that is just an image. Some of the css is not even correct, such as ignoring bold fonts."
— Developer, E-iceblue Forums

The official E-iceblue response confirms the architectural limitation: "The method 'LoadFromHTML' would render the pdf depending on the IE engine installed on your system, if it is IE9 or above, it would be rendered as an image while version 8 or below as text."

Root Cause Analysis

The text rasterization issue exists because Spire.PDF delegates HTML rendering to Internet Explorer's MSHTML engine rather than implementing proper HTML parsing and PDF text rendering. This design decision made development simpler but created a fundamental limitation in the output.

When LoadFromHTML executes, Spire.PDF:

  1. Loads the HTML content into an embedded IE WebBrowser control
  2. Waits for IE to render the page
  3. Captures a bitmap screenshot of the rendered content
  4. Embeds that bitmap as an image in the PDF
  5. Returns a PDF that visually represents the HTML but contains no text layer

Internet Explorer versions 9 and later changed how content is captured, resulting in image-based output. On systems with IE8 or earlier (essentially non-existent in production today), text was captured differently and remained selectable. This created a situation where the feature worked during initial development on older Windows versions but broke on all modern systems.
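E-iceblue's stated rule (IE9 or above produces an image, IE8 or below produces text) can be turned into a small predictor for deployment checks. This is a hypothetical sketch: the class name is illustrative, and the version string is assumed to come from the registry key `HKLM\SOFTWARE\Microsoft\Internet Explorer`. On IE10 and IE11 the legacy `Version` value typically reads `9.10.x` or `9.11.x` while `svcVersion` holds the real version; both start at 9 or higher, so either value yields the same prediction.

```csharp
using System;
using System.Globalization;

// Hypothetical helper encoding the rule quoted from E-iceblue support.
public static class IeRenderModeCheck
{
    // Extracts the major version from strings such as "11.0.19041.1" or "9.11.19041.0".
    public static int MajorVersion(string ieVersion)
    {
        if (string.IsNullOrWhiteSpace(ieVersion))
            throw new ArgumentException("Version string is empty.", nameof(ieVersion));

        string major = ieVersion.Trim().Split('.')[0];
        return int.Parse(major, NumberStyles.None, CultureInfo.InvariantCulture);
    }

    // IE9 or above => LoadFromHTML output is an image; IE8 or below => text.
    public static bool LoadFromHtmlProducesImage(string ieVersion) =>
        MajorVersion(ieVersion) >= 9;
}
```

On any host with IE9 through IE11 installed, which in practice means every current Windows machine, the predictor returns `true`. That is the point: rasterized output is the expected behavior of LoadFromHTML, not a misconfiguration.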

The consequences of this architecture:

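  • No text layer: accessible conformance levels such as PDF/A-1a and PDF/UA cannot be met, because they require real, tagged text (image-only files can satisfy only the basic PDF/A "b" level)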
  • No hyperlinks: Since the PDF contains an image, anchor tags become static pixels
  • Quality loss: Bitmap text at screen DPI becomes fuzzy when zoomed
  • File bloat: A page of text as vectors might be 50KB; as a 300DPI image, it becomes 2-5MB
  • No accessibility: Screen readers require a text layer to function
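The size gap follows from simple arithmetic: a bitmap stores every pixel of the page, whether or not it contains ink, while vector text stores only font references and short text-drawing operators. The helper below is an illustrative sketch, not part of either library, that computes the uncompressed size of one captured page; real files are smaller after Flate or JPEG compression, and the ratio depends on the content.

```csharp
using System;

// Illustrative estimate of raw bitmap size for one captured page.
public static class RasterPageEstimate
{
    // bytesPerPixel: 3 for 24-bit RGB, 1 for 8-bit grayscale.
    public static long UncompressedBytes(double widthInches, double heightInches, int dpi, int bytesPerPixel)
    {
        if (widthInches <= 0 || heightInches <= 0 || dpi <= 0 || bytesPerPixel <= 0)
            throw new ArgumentException("All inputs must be positive.");

        // Pixel dimensions scale linearly with DPI on each axis.
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);

        // Total size therefore scales with the square of the DPI.
        return widthPx * heightPx * bytesPerPixel;
    }
}
```

For a US Letter page (8.5 by 11 inches) at 300 DPI in 24-bit color, `UncompressedBytes(8.5, 11, 300, 3)` returns 25,245,000 bytes (2550 by 3300 pixels, about 24 MiB) before compression; the same page at 96 DPI is 2,585,088 bytes. Because both dimensions grow with DPI, raising the capture resolution (Workaround 4 below) makes files larger still.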

This is a design-level limitation, not a bug to be fixed. Spire.PDF would need to replace its entire HTML rendering pipeline to generate actual PDF text content from HTML.

Attempted Workarounds

The Spire.PDF community and E-iceblue support have suggested several workarounds, each with significant limitations.

Workaround 1: Qt Plugin Converter

Approach: Use the separate Qt-based HTML converter plugin instead of LoadFromHTML.

using Spire.Pdf.HtmlConverter.Qt;

// Set the plugin path
HtmlConverter.PluginPath = @"C:\path\to\plugins";

// Convert using Qt engine
HtmlConverter.Convert(
    htmlContent,
    @"C:\output\document.pdf",
    true,                           // Enable JavaScript
    60000,                          // Timeout in milliseconds
    new System.Drawing.SizeF(612, 792),  // Page size
    new Spire.Pdf.Graphics.PdfMargins(0),
    LoadHtmlType.SourceCode
);

Limitations:

  • Text becomes selectable but hyperlinks are lost
  • Official response: "The plugin method uses QT, which doesn't support keeping the link during conversion. Sorry there is no good way to deal with it yet."
  • Requires separate plugin download and deployment
  • Platform compatibility issues: "The plugin is compatible with X86 platform well, but the compatibility with the X64 platform is not good at present"
  • Requires Microsoft Visual C++ 2015 Redistributable
  • Linux support is problematic with additional dependencies

Workaround 2: Use Spire.Doc Instead

Approach: Use the Spire.Doc library to convert HTML to DOCX, then DOCX to PDF.

using Spire.Doc;

Document doc = new Document();
doc.LoadFromFile(htmlFile, FileFormat.Html);
doc.SaveToFile("output.pdf", FileFormat.PDF);

Limitations:

  • Requires additional license for Spire.Doc
  • Two-step conversion adds processing time
  • HTML/CSS support limited to what Word can interpret
  • Complex layouts may not convert correctly
  • Cannot add PDF-specific features like attachments

Workaround 3: PdfHTMLTextElement for Simple HTML

Approach: Use PdfHTMLTextElement class for rendering basic HTML tags as actual text.

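using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.Graphics;
using Spire.Pdf.HtmlConverter;

PdfDocument doc = new PdfDocument();
PdfPageBase page = doc.Pages.Add();

// The constructor needs a concrete font and brush instance, not type names
PdfFont font = new PdfFont(PdfFontFamily.Helvetica, 12f);
PdfHTMLTextElement htmlElement = new PdfHTMLTextElement(
    "<b>Bold</b> and <i>italic</i> text",
    font,
    PdfBrushes.Black
);
htmlElement.Draw(page, new PointF(0, 0));
doc.SaveToFile("simple-html.pdf");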

Limitations:

  • Only supports basic tags: Font, B, I, U, Sub, Sup, BR
  • No CSS support
  • No JavaScript support
  • No complex layouts
  • Not available in .NET Core or .NET Standard
  • Unusable for real-world HTML documents

Workaround 4: Increase Screenshot DPI

Approach: Configure higher DPI for the IE rendering to improve image quality.

Limitations:

  • Text is still rasterized, just at higher resolution
  • File sizes increase even further
  • Still no text selection, search, or accessibility
  • Does not solve the fundamental problem

A Different Approach: IronPDF

For applications that require actual PDF text content from HTML, the solution requires a library that renders HTML using a modern browser engine and constructs proper PDF text layers. IronPDF uses an embedded Chromium browser engine that renders HTML with full CSS3 and JavaScript support, then constructs PDFs with vector text, working hyperlinks, and accessibility metadata.

Why IronPDF Produces Selectable Text

IronPDF's architecture differs fundamentally from Spire.PDF's IE-based approach. Rather than capturing screenshots, IronPDF:

  1. Renders HTML using an embedded Chromium engine (the same engine as Chrome and Edge)
  2. Processes the rendered DOM to construct PDF text objects
  3. Preserves hyperlinks as PDF annotations
  4. Maintains the document structure for accessibility
  5. Generates vector-based text that scales without quality loss

This approach produces PDFs where:

  • Text is selectable with click-and-drag
  • Ctrl+F search finds all text content
  • Copy and paste works correctly
  • Screen readers can interpret the document
  • Hyperlinks remain clickable
  • File sizes remain small (text as vectors, not bitmaps)
  • Zooming shows sharp text at any scale

Code Example

The following example demonstrates HTML-to-PDF conversion that produces fully selectable, searchable text:

using IronPdf;
using System;

public class HtmlToPdfWithSelectableText
{
    public void ConvertHtmlWithTextLayer()
    {
        // Initialize the Chromium-based renderer
        var renderer = new ChromePdfRenderer();

        // Configure rendering options
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;
        renderer.RenderingOptions.MarginLeft = 15;
        renderer.RenderingOptions.MarginRight = 15;

        // Enable JavaScript for dynamic content
        renderer.RenderingOptions.EnableJavaScript = true;

        // HTML with various formatting and links
        string htmlContent = @"
            <!DOCTYPE html>
            <html>
            <head>
                <style>
                    body {
                        font-family: 'Segoe UI', Arial, sans-serif;
                        line-height: 1.6;
                        color: #333;
                    }
                    h1 { color: #2c3e50; }
                    .highlight { background-color: #ffffcc; }
                    a { color: #3498db; }
                    table { border-collapse: collapse; width: 100%; }
                    td, th { border: 1px solid #ddd; padding: 8px; }
                </style>
            </head>
            <body>
                <h1>Invoice #12345</h1>
                <p>This text is <span class='highlight'>fully selectable</span>
                   and searchable in the resulting PDF.</p>

                <p>Visit our website: <a href='https://example.com'>example.com</a></p>

                <table>
                    <tr><th>Item</th><th>Price</th></tr>
                    <tr><td>Widget A</td><td>$29.99</td></tr>
                    <tr><td>Widget B</td><td>$49.99</td></tr>
                </table>

                <p>Total: $79.98</p>
            </body>
            </html>";

        // Render HTML to PDF with proper text layer
        using (var pdf = renderer.RenderHtmlAsPdf(htmlContent))
        {
            // Save the PDF - text will be selectable
            pdf.SaveAs("invoice.pdf");

            // Demonstrate that text extraction works
            string extractedText = pdf.ExtractAllText();
            Console.WriteLine("Extracted text (proves text layer exists):");
            Console.WriteLine(extractedText);
        }

        // File size comparison:
        // - Rasterized image-based PDF: typically 2-5 MB for a single page
        // - Vector text-based PDF: typically 50-200 KB for the same content
    }

    public void ConvertUrlWithAccessibility()
    {
        var renderer = new ChromePdfRenderer();

        // Set the document title (PDF metadata that assistive technology can announce)
        renderer.RenderingOptions.Title = "Accessible PDF Document";
        renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;

        // Render from URL - hyperlinks remain clickable
        using (var pdf = renderer.RenderUrlAsPdf("https://example.com/article"))
        {
            pdf.SaveAs("accessible-article.pdf");
        }

        // The resulting PDF:
        // - Contains actual text that screen readers can interpret
        // - Preserves hyperlinks as clickable PDF annotations
        // - Maintains reading order from the HTML structure
        // - Provides the text layer that accessibility requires; formal PDF/UA
        //   conformance needs tagged output and should be validated separately
    }

    public void BatchConvertWithConsistentQuality()
    {
        var renderer = new ChromePdfRenderer();

        string[] htmlFiles = { "report1.html", "report2.html", "report3.html" };

        foreach (var htmlFile in htmlFiles)
        {
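            // RenderHtmlFileAsPdf resolves relative CSS and image paths against the file's folder
            using (var pdf = renderer.RenderHtmlFileAsPdf(htmlFile))
            {
                // ChangeExtension only touches the extension, not ".html" elsewhere in the path
                string outputPath = System.IO.Path.ChangeExtension(htmlFile, ".pdf");
                pdf.SaveAs(outputPath);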

                // Every PDF has:
                // - Selectable text (not screenshots)
                // - Working hyperlinks
                // - Searchable content
                // - Small file size
            }
        }
    }
}

Key points about this code:

  • ChromePdfRenderer uses Chromium for rendering, not Internet Explorer
  • Text is rendered as vector PDF text objects, not rasterized images
  • Hyperlinks in HTML become clickable PDF link annotations
  • ExtractAllText() returns actual content because a text layer exists
  • File sizes remain small because text is stored as vectors
  • Modern CSS and JavaScript are supported because rendering runs in Chromium

API Reference

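For details on the methods used above, see the ChromePdfRenderer, ChromePdfRenderOptions, and PdfDocument entries in the IronPDF API reference.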

Migration Considerations

Licensing

IronPDF is commercial software with per-developer licensing. A free trial is available for evaluation without watermarks for development purposes. Teams should verify that IronPDF meets their specific HTML rendering requirements before committing to migration.

API Differences

The migration from Spire.PDF to IronPDF for HTML conversion is straightforward:

// Spire.PDF (produces rasterized images)
PdfDocument doc = new PdfDocument();
doc.LoadFromHTML(url, false, true, true);
doc.SaveToFile("output.pdf");

// IronPDF (produces selectable text)
var renderer = new ChromePdfRenderer();
var pdf = renderer.RenderUrlAsPdf(url);
pdf.SaveAs("output.pdf");

The conceptual model differs slightly: Spire.PDF loads HTML into an existing document object, while IronPDF's renderer creates new PDF documents. For most use cases, this requires minimal code changes.

What You Gain

  • Text that is actually selectable and searchable
  • Hyperlinks that work in the PDF
  • Accessibility for screen readers and assistive technology
  • File sizes 10-50x smaller (text as vectors, not images)
  • Modern CSS3 and JavaScript support
  • Consistent rendering across platforms
  • No dependency on Internet Explorer or system-installed browsers

What to Consider

  • Commercial licensing cost
  • Slightly different API structure
  • Chromium engine adds approximately 100-200MB to deployment size
  • First render may take 1-2 seconds for Chromium initialization

Conclusion

Spire.PDF's HTML-to-PDF conversion produces rasterized screenshots rather than actual PDF text, making the resulting documents unsearchable, inaccessible, and unnecessarily large. This is an architectural limitation of the IE-based rendering approach that cannot be fixed with configuration changes. For applications requiring proper PDF text layers, migrating to a library with Chromium-based rendering provides PDFs where text is selectable, searchable, and accessible.


Written by Jacob Mellor, CTO at Iron Software and original developer of IronPDF.


References

  1. Html to PDF can't select text in PDF - Spire.PDF Forums - Primary thread documenting the text selection issue
  2. LoadFromHTML as text instead of image? - Spire.PDF Forums - Developer request for actual text rendering
  3. HtmlToPdf not possible to mark text or follow links - Spire.PDF Forums - Links and text selection issues
  4. Grainy HTML to PDF - Spire.PDF Forums - Quality degradation from rasterization
  5. Issue with Spire HTML to PDF conversion - Spire.PDF Forums - General conversion problems
  6. Convert HTML to PDF with New Plugin - E-iceblue Knowledge Base - Qt plugin documentation
  7. Massive Filesize Increase Problem - Spire.PDF Forums - File size bloat from image-based conversion
  8. HTML to PDF - Spire.PDF Forums - Official confirmation of IE dependency
  9. Troubleshoot QT Plugin Issue - Spire.PDF Forums - Qt plugin limitations
  10. html string to pdf one Linux with QT plugin not working - Spire.PDF Forums - Cross-platform plugin issues

For IronPDF documentation and tutorials, visit ironpdf.com.
