IronSoftware

Posted on Apr 3

Spire.PDF HTML Conversion Renders Text as Rasterized Image (Fixed)

#dotnet #csharp

Developers using Spire.PDF for .NET to convert HTML to PDF frequently discover that the resulting documents contain no actual text layer. Instead of selectable, searchable text, the PDF contains rasterized bitmap images of text. This creates immediate problems: users cannot select or copy text, search functionality fails completely, screen readers cannot interpret the content, and file sizes balloon to 10-50 times larger than vector-based PDFs. The issue stems from Spire.PDF's reliance on Internet Explorer's rendering engine, which screenshots web content rather than constructing a proper PDF text layer.

The Problem

Spire.PDF's LoadFromHTML method does not actually parse HTML and render it as PDF text. Instead, it captures a screenshot of the HTML content as rendered by Internet Explorer and embeds that screenshot as an image within the PDF. The result is a PDF that looks correct visually but is fundamentally broken for any use case requiring actual text.

When developers open a PDF generated by LoadFromHTML, they discover that clicking and dragging to select text does nothing. The cursor does not change to a text selection cursor because there is no text to select. What appears to be text is actually a bitmap image of text rendered at whatever DPI Internet Explorer used during the capture.

This architectural decision has cascading consequences. PDF search functionality, which relies on an embedded text layer, returns no results regardless of what text appears visually in the document. Screen readers used by visually impaired users cannot interpret the document because the accessibility APIs find no text content. Copy and paste operations fail entirely. And because images require significantly more storage than vector text, file sizes increase dramatically.

The root cause is dependency on the Internet Explorer rendering engine. When IE version 9 or later is installed on the system (which is the case for all modern Windows installations), Spire.PDF renders HTML as an image rather than as text.

Error Messages and Symptoms

Developers typically do not receive error messages because the conversion technically succeeds. The symptoms manifest when using the resulting PDF:

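// When attempting to extract text programmatically
// (page is a PdfPageBase taken from the generated document):
PdfTextExtractor extractor = new PdfTextExtractor(page);
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsExtractAllText = true;
string text = extractor.ExtractText(options);
// Returns empty string or null - no text layer exists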

// Text selection in PDF viewer:
// Cursor remains as pointer, not text selection cursor
// Ctrl+A selects nothing
// Ctrl+F finds nothing

Symptoms include:

Text cursor never appears when hovering over "text" in the PDF
Ctrl+F search finds zero results for words clearly visible on the page
Copy operation copies nothing or copies garbled characters
Screen readers report the document as empty or containing only images
A 500 KB HTML file becomes a 5-50 MB PDF
PDF appears grainy or blurry when zoomed, revealing pixel artifacts
Hyperlinks in the original HTML do not work in the PDF
Text appears fuzzy compared to vector-rendered PDFs
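
The empty-extraction and file-size symptoms can be turned into an automated check. The following is a minimal sketch, not part of either library: PdfOutputCheck, HasTextLayer, and SizeRatio are hypothetical names, and the extracted text is assumed to come from whatever text-extraction call your PDF library provides.

```csharp
using System;

public static class PdfOutputCheck
{
    // True when the extracted text contains at least one non-whitespace character.
    // 'extractedText' is assumed to be the result of your PDF library's extraction call.
    public static bool HasTextLayer(string extractedText)
    {
        return !string.IsNullOrWhiteSpace(extractedText);
    }

    // How many times larger the PDF is than its HTML source.
    public static double SizeRatio(long htmlBytes, long pdfBytes)
    {
        if (htmlBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(htmlBytes));
        return (double)pdfBytes / htmlBytes;
    }
}
```

A build step could fail when HasTextLayer returns false and log a warning when SizeRatio reaches 10 or more, the low end of the growth described above. A high ratio alone does not prove rasterization, since embedded fonts and images also add size.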

Who Is Affected

This issue impacts any deployment using Spire.PDF's HTML-to-PDF functionality on systems with Internet Explorer 9 or later:

Operating Systems: Effectively every Windows version in use today. Windows 7 shipped with IE8 but received IE9 through IE11 via updates, and Windows 8 and later ship with IE10 or IE11. Windows 10 and 11 include Edge but retain the IE components that Spire.PDF uses for rendering.

Framework Versions: .NET Framework 4.x, .NET Core 3.1, .NET 5, .NET 6, .NET 7, and .NET 8. The IE rendering dependency exists across all supported .NET versions when using the LoadFromHTML method.

Use Cases: Invoice generation systems, report builders, document archival workflows, web page preservation tools, email-to-PDF converters, and any application that needs searchable PDFs from HTML content.

Accessibility Requirements: Any organization subject to accessibility compliance (ADA in the US, AODA in Canada, EN 301 549 in the EU) faces legal exposure because the generated PDFs are completely inaccessible to assistive technology.

Evidence from the Developer Community

The text rasterization issue has been reported consistently on the E-iceblue forums for years, with users expressing frustration at the fundamental limitation.

Timeline

Date	Event	Source
2017	Initial reports of text rendered as images	E-iceblue Forums
2018	Forum thread: "LoadFromHTML as text instead of image?"	E-iceblue Forums
2019	Multiple threads about links not working, text not selectable	E-iceblue Forums
2020	Qt plugin introduced as workaround, links still not supported	E-iceblue Forums
2021	"Grainy HTML to PDF" quality issues reported	E-iceblue Forums
2022	Linux Qt plugin issues documented	E-iceblue Forums
2023	Issue persists, workarounds remain incomplete	E-iceblue Forums
2024	Community still seeking solutions	E-iceblue Forums

Community Reports

"doc.LoadFromHTML(url, false, true, true); This seems to only take screenshots of the webpage and save as an image in the pdf. I am looking to save the html page as text so it can be selectable in the pdf."
— Developer, E-iceblue Forums

"It seems that spire.pdf only makes a screenshot of the html file because in the final pdf you cannot mark text or follow links."
— Developer, E-iceblue Forums

"The quality of the PDF is, quite frankly, awful. The text is blurred and of very poor quality, unlike the original."
— Developer, E-iceblue Forums, discussing LoadFromHTML output

"Spire generates a PDF file that is just an image. Some of the css is not even correct, such as ignoring bold fonts."
— Developer, E-iceblue Forums

The official E-iceblue response confirms the architectural limitation: "The method 'LoadFromHTML' would render the pdf depending on the IE engine installed on your system, if it is IE9 or above, it would be rendered as an image while version 8 or below as text."

Root Cause Analysis

The text rasterization issue exists because Spire.PDF delegates HTML rendering to Internet Explorer's MSHTML engine rather than implementing proper HTML parsing and PDF text rendering. This design decision made development simpler but created a fundamental limitation in the output.

When LoadFromHTML executes, Spire.PDF:

Loads the HTML content into an embedded IE WebBrowser control
Waits for IE to render the page
Captures a bitmap screenshot of the rendered content
Embeds that bitmap as an image in the PDF
Returns a PDF that visually represents the HTML but contains no text layer

Internet Explorer versions 9 and later changed how content is captured, resulting in image-based output. On systems with IE8 or earlier (essentially non-existent in production today), text was captured differently and remained selectable. This created a situation where the feature worked during initial development on older Windows versions but broke on all modern systems.

The consequences of this architecture:

No text layer: Accessible-PDF standards such as PDF/UA cannot be met without real text content
No hyperlinks: Since the PDF contains an image, anchor tags become static pixels
Quality loss: Bitmap text at screen DPI becomes fuzzy when zoomed
File bloat: A page of text as vectors might be 50KB; as a 300DPI image, it becomes 2-5MB
No accessibility: Screen readers require a text layer to function
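
The file-bloat figure follows from simple arithmetic. As a rough sketch (PageBitmapEstimate is a hypothetical helper, and the US Letter page size and 3 bytes per pixel are assumptions), the uncompressed size of a full-page bitmap is width in pixels times height in pixels times bytes per pixel:

```csharp
using System;

public static class PageBitmapEstimate
{
    // Uncompressed size in bytes of a full-page bitmap.
    // Assumes 3 bytes per pixel (24-bit RGB) unless told otherwise.
    public static long RawBytes(double widthInches, double heightInches, int dpi, int bytesPerPixel = 3)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }
}

// US Letter (8.5 x 11 in) at 300 DPI: 2550 x 3300 px x 3 bytes = 25,245,000 bytes
// US Letter at 96 DPI (typical screen): 816 x 1056 px x 3 bytes = 2,585,088 bytes
```

Image compression inside the PDF reduces these raw numbers, which is consistent with a final size of a few megabytes per page, but the result stays far larger than the text and font data a vector page needs.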

This is a design-level limitation, not a bug to be fixed. Spire.PDF would need to replace its entire HTML rendering pipeline to generate actual PDF text content from HTML.

Attempted Workarounds

The Spire.PDF community and E-iceblue support have suggested several workarounds, each with significant limitations.

Workaround 1: Qt Plugin Converter

Approach: Use the separate Qt-based HTML converter plugin instead of LoadFromHTML.

using Spire.Pdf.HtmlConverter.Qt;

// Set the plugin path
HtmlConverter.PluginPath = @"C:\path\to\plugins";

// Convert using Qt engine
HtmlConverter.Convert(
    htmlContent,
    @"C:\output\document.pdf",
    true,                           // Enable JavaScript
    60000,                          // Timeout in milliseconds
    new System.Drawing.SizeF(612, 792),  // Page size
    new Spire.Pdf.Graphics.PdfMargins(0),
    Spire.Pdf.HtmlConverter.LoadHtmlType.SourceCode  // htmlContent is an HTML string, not a URL
);

Limitations:

Text becomes selectable but hyperlinks are lost
Official response: "The plugin method uses QT, which doesn't support keeping the link during conversion. Sorry there is no good way to deal with it yet."
Requires separate plugin download and deployment
Platform compatibility issues: "The plugin is compatible with X86 platform well, but the compatibility with the X64 platform is not good at present"
Requires Microsoft Visual C++ 2015 Redistributable
Linux support is problematic with additional dependencies

Workaround 2: Use Spire.Doc Instead

Approach: Use the Spire.Doc library to load the HTML into its Word document model, then save that document as PDF.

using Spire.Doc;

Document doc = new Document();
doc.LoadFromFile(htmlFile, FileFormat.Html);
doc.SaveToFile("output.pdf", FileFormat.PDF);

Limitations:

Requires additional license for Spire.Doc
Two-step conversion adds processing time
HTML/CSS support limited to what Word can interpret
Complex layouts may not convert correctly
Cannot add PDF-specific features like attachments

Workaround 3: PdfHTMLTextElement for Simple HTML

Approach: Use PdfHTMLTextElement class for rendering basic HTML tags as actual text.

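using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.Graphics;

PdfDocument doc = new PdfDocument();
PdfPageBase page = doc.Pages.Add();

// Default font and brush for text the HTML does not style itself
PdfFont font = new PdfFont(PdfFontFamily.Helvetica, 12f);
PdfBrush brush = PdfBrushes.Black;

PdfHTMLTextElement htmlElement = new PdfHTMLTextElement(
    "<b>Bold</b> and <i>italic</i> text",
    font,
    brush
);
htmlElement.Draw(page, new PointF(0, 0));
doc.SaveToFile("output.pdf");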

Limitations:

Only supports basic tags: Font, B, I, U, Sub, Sup, BR
No CSS support
No JavaScript support
No complex layouts
Not available in .NET Core or .NET Standard
Unusable for real-world HTML documents

Workaround 4: Increase Screenshot DPI

Approach: Configure higher DPI for the IE rendering to improve image quality.

Limitations:

Text is still rasterized, just at higher resolution
File sizes increase even further
Still no text selection, search, or accessibility
Does not solve the fundamental problem

A Different Approach: IronPDF

Applications that need actual PDF text content from HTML need a library that renders HTML with a modern browser engine and constructs proper PDF text layers. IronPDF uses an embedded Chromium browser engine that renders HTML with full CSS3 and JavaScript support, then constructs PDFs with vector text, working hyperlinks, and accessibility metadata.

Why IronPDF Produces Selectable Text

IronPDF's architecture differs fundamentally from Spire.PDF's IE-based approach. Rather than capturing screenshots, IronPDF:

Renders HTML using an embedded Chromium engine (the same engine as Chrome and Edge)
Processes the rendered DOM to construct PDF text objects
Preserves hyperlinks as PDF annotations
Maintains the document structure for accessibility
Generates vector-based text that scales without quality loss

This approach produces PDFs where:

Text is selectable with click-and-drag
Ctrl+F search finds all text content
Copy and paste works correctly
Screen readers can interpret the document
Hyperlinks remain clickable
File sizes remain small (text as vectors, not bitmaps)
Zooming shows sharp text at any scale

Code Example

The following example demonstrates HTML-to-PDF conversion that produces fully selectable, searchable text:

using IronPdf;
using System;

public class HtmlToPdfWithSelectableText
{
    public void ConvertHtmlWithTextLayer()
    {
        // Initialize the Chromium-based renderer
        var renderer = new ChromePdfRenderer();

        // Configure rendering options
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;
        renderer.RenderingOptions.MarginLeft = 15;
        renderer.RenderingOptions.MarginRight = 15;

        // Enable JavaScript for dynamic content
        renderer.RenderingOptions.EnableJavaScript = true;

        // HTML with various formatting and links
        string htmlContent = @"
            <!DOCTYPE html>
            <html>
            <head>
                <style>
                    body {
                        font-family: 'Segoe UI', Arial, sans-serif;
                        line-height: 1.6;
                        color: #333;
                    }
                    h1 { color: #2c3e50; }
                    .highlight { background-color: #ffffcc; }
                    a { color: #3498db; }
                    table { border-collapse: collapse; width: 100%; }
                    td, th { border: 1px solid #ddd; padding: 8px; }
                </style>
            </head>
            <body>
                <h1>Invoice #12345</h1>
                <p>This text is <span class='highlight'>fully selectable</span>
                   and searchable in the resulting PDF.</p>

                <p>Visit our website: <a href='https://example.com'>example.com</a></p>

                <table>
                    <tr><th>Item</th><th>Price</th></tr>
                    <tr><td>Widget A</td><td>$29.99</td></tr>
                    <tr><td>Widget B</td><td>$49.99</td></tr>
                </table>

                <p>Total: $79.98</p>
            </body>
            </html>";

        // Render HTML to PDF with proper text layer
        using (var pdf = renderer.RenderHtmlAsPdf(htmlContent))
        {
            // Save the PDF - text will be selectable
            pdf.SaveAs("invoice.pdf");

            // Demonstrate that text extraction works
            string extractedText = pdf.ExtractAllText();
            Console.WriteLine("Extracted text (proves text layer exists):");
            Console.WriteLine(extractedText);
        }

        // File size comparison:
        // - Rasterized image-based PDF: typically 2-5 MB for a single page
        // - Vector text-based PDF: typically 50-200 KB for the same content
    }

    public void ConvertUrlWithAccessibility()
    {
        var renderer = new ChromePdfRenderer();

        // Configure for accessibility compliance
        renderer.RenderingOptions.Title = "Accessible PDF Document";
        renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;

        // Render from URL - hyperlinks remain clickable
        using (var pdf = renderer.RenderUrlAsPdf("https://example.com/article"))
        {
            pdf.SaveAs("accessible-article.pdf");
        }

        // The resulting PDF:
        // - Contains actual text that screen readers can interpret
        // - Preserves hyperlinks as clickable PDF annotations
        // - Maintains reading order from the HTML structure
        // - Gives assistive technology real text to work with
        //   (verify formal accessibility compliance separately)
    }

    public void BatchConvertWithConsistentQuality()
    {
        var renderer = new ChromePdfRenderer();

        string[] htmlFiles = { "report1.html", "report2.html", "report3.html" };

        foreach (var htmlFile in htmlFiles)
        {
            string html = System.IO.File.ReadAllText(htmlFile);

            using (var pdf = renderer.RenderHtmlAsPdf(html))
            {
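                // ChangeExtension only touches the extension, unlike Replace,
                // which would also rewrite ".html" elsewhere in the path
                string outputPath = System.IO.Path.ChangeExtension(htmlFile, ".pdf");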
                pdf.SaveAs(outputPath);

                // Every PDF has:
                // - Selectable text (not screenshots)
                // - Working hyperlinks
                // - Searchable content
                // - Small file size
            }
        }
    }
}

Key points about this code:

ChromePdfRenderer uses Chromium for rendering, not Internet Explorer
Text is rendered as vector PDF text objects, not rasterized images
Hyperlinks in HTML become clickable PDF link annotations
ExtractAllText() returns actual content because a text layer exists
File sizes remain small because text is stored as vectors
CSS and JavaScript are fully supported

API Reference

For details on the methods used above:

ChromePdfRenderer - Main rendering class using Chromium engine
RenderHtmlAsPdf - Convert HTML string to PDF
RenderUrlAsPdf - Convert web page URL to PDF
ExtractAllText - Text extraction proving text layer exists
HTML to PDF Tutorial - Complete guide to HTML rendering

Migration Considerations

Licensing

IronPDF is commercial software with per-developer licensing. A free trial is available for evaluation without watermarks for development purposes. Teams should verify that IronPDF meets their specific HTML rendering requirements before committing to migration.

API Differences

The migration from Spire.PDF to IronPDF for HTML conversion is straightforward:

// Spire.PDF (produces rasterized images)
PdfDocument doc = new PdfDocument();
doc.LoadFromHTML(url, false, true, true);
doc.SaveToFile("output.pdf");

// IronPDF (produces selectable text)
var renderer = new ChromePdfRenderer();
var pdf = renderer.RenderUrlAsPdf(url);
pdf.SaveAs("output.pdf");

The conceptual model differs slightly: Spire.PDF loads HTML into an existing document object, while IronPDF's renderer creates new PDF documents. For most use cases, this requires minimal code changes.
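
One way to keep those changes small is to hide the conversion call behind a single seam, so application code never references either library directly. This is a minimal sketch under that assumption; IHtmlToPdfConverter and DelegateConverter are hypothetical names, not types from Spire.PDF or IronPDF.

```csharp
using System;

// Hypothetical seam: application code depends on this interface only.
public interface IHtmlToPdfConverter
{
    void ConvertUrl(string url, string outputPath);
}

// Adapter that forwards to whichever library call is plugged in.
public sealed class DelegateConverter : IHtmlToPdfConverter
{
    private readonly Action<string, string> _convert;

    public DelegateConverter(Action<string, string> convert)
    {
        _convert = convert ?? throw new ArgumentNullException(nameof(convert));
    }

    public void ConvertUrl(string url, string outputPath)
    {
        _convert(url, outputPath);
    }
}
```

With this in place, the IronPDF calls shown above could be plugged in as new DelegateConverter((url, path) => new ChromePdfRenderer().RenderUrlAsPdf(url).SaveAs(path)), and the Spire.PDF version can stay available behind the same interface while output is compared.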

What You Gain

Text that is actually selectable and searchable
Hyperlinks that work in the PDF
Accessibility for screen readers and assistive technology
File sizes 10-50x smaller (text as vectors, not images)
Modern CSS3 and JavaScript support
Consistent rendering across platforms
No dependency on Internet Explorer or system-installed browsers

What to Consider

Commercial licensing cost
Slightly different API structure
Chromium engine adds approximately 100-200MB to deployment size
First render may take 1-2 seconds for Chromium initialization

Conclusion

Spire.PDF's HTML-to-PDF conversion produces rasterized screenshots rather than actual PDF text, making the resulting documents unsearchable, inaccessible, and unnecessarily large. This is an architectural limitation of the IE-based rendering approach that cannot be fixed with configuration changes. For applications requiring proper PDF text layers, migrating to a library with Chromium-based rendering provides PDFs where text is selectable, searchable, and accessible.

Written by Jacob Mellor, CTO at Iron Software and original developer of IronPDF.

References

Html to PDF can't select text in PDF - Spire.PDF Forums - Primary thread documenting the text selection issue
LoadFromHTML as text instead of image? - Spire.PDF Forums - Developer request for actual text rendering
HtmlToPdf not possible to mark text or follow links - Spire.PDF Forums - Links and text selection issues
Grainy HTML to PDF - Spire.PDF Forums - Quality degradation from rasterization
Issue with Spire HTML to PDF conversion - Spire.PDF Forums - General conversion problems
Convert HTML to PDF with New Plugin - E-iceblue Knowledge Base - Qt plugin documentation
Massive Filesize Increase Problem - Spire.PDF Forums - File size bloat from image-based conversion
HTML to PDF - Spire.PDF Forums - Official confirmation of IE dependency
Troubleshoot QT Plugin Issue - Spire.PDF Forums - Qt plugin limitations
html string to pdf one Linux with QT plugin not working - Spire.PDF Forums - Cross-platform plugin issues

For IronPDF documentation and tutorials, visit ironpdf.com.
