刘发财

Posted on Mar 3

Generating Vector PDFs from HTML in the Browser: A Deep Dive into dompdf.js

#webdev #javascript #pdf #github

Live Demo: https://dompdfjs.lisky.com.cn

GitHub: https://github.com/lmn1919/dompdf.js

The Pain Points of Frontend PDF Generation

Exporting web content to PDF is a common requirement that sounds simple but quickly becomes complex. Whether it's data reports, invoices, contracts, resumes, or certificates, almost every web application faces this challenge at some point.

Current solutions each have significant drawbacks:

Server-side rendering (Puppeteer, wkhtmltopdf, etc.) is powerful but requires running and maintaining a backend rendering service (headless Chrome under Node.js in Puppeteer's case), consumes substantial resources under high concurrency, and adds backend coordination overhead.

Screenshot-to-PDF (html2canvas + jsPDF) is the most popular frontend approach, but it is fundamentally rasterization: it takes a screenshot of the DOM and stuffs it into a PDF. The result is large files, unselectable text, and blurry output when zoomed. Long documents also run into the browser's maximum canvas size (the exact limit varies by browser; 16,384px of height is a commonly cited figure).

Pure frontend drawing libraries (jsPDF, PDFKit) provide low-level APIs but come with steep learning curves. Drawing a styled table requires manually calculating coordinates, handling pagination, and embedding fonts, which adds up to dozens of lines of code just to get started.

Is there a solution that keeps the convenience of HTML/CSS while outputting true vector PDFs and supporting multi-page documents?

The Design Philosophy of dompdf.js

The core idea of dompdf.js is: let the browser do what it does best—parse CSS and calculate layouts—then "transpile" the drawing instructions into PDF vector operations.

It doesn't take screenshots. It doesn't do server-side rendering. Instead, it builds on two mature libraries with deep modifications:

html2canvas: handles DOM traversal and computed style calculation (including all CSS cascading, inheritance, and box model computations)
jsPDF: handles PDF file structure generation

The key innovation is replacing the rendering layer: html2canvas normally draws to a Canvas (bitmap), while dompdf.js retargets that drawing to jsPDF's vector APIs.

Why This Approach Works

The browser has already calculated every element's position, size, color, and font for you. dompdf.js simply translates these computed results into PDF drawing commands:

Browser CSS computed results → dompdf.js transpilation → PDF vector operations

"This div is at (100,200), 300px wide, 150px tall, background #f5f5f5"
                    ↓
doc.setFillColor(245, 245, 245)
doc.rect(100, 200, 300, 150, 'F')
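The translation step above can be sketched as a small pure function. This is an illustration only, not dompdf.js's actual renderer: hexToRgb and boxToPdfOps are hypothetical helpers, and the returned tuples simply name the jsPDF methods shown above (setFillColor, rect) rather than calling them.

```javascript
// Hypothetical helper: parse a 6-digit hex color such as '#f5f5f5' into [r, g, b].
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Unsupported color: ${hex}`);
  return match.slice(1).map(channel => parseInt(channel, 16));
}

// Hypothetical helper: describe a filled box as a list of [method, ...args]
// tuples that mirror the jsPDF calls in the example above.
function boxToPdfOps(bounds, backgroundColor) {
  const [r, g, b] = hexToRgb(backgroundColor);
  return [
    ['setFillColor', r, g, b],
    ['rect', bounds.left, bounds.top, bounds.width, bounds.height, 'F'],
  ];
}

const ops = boxToPdfOps(
  { left: 100, top: 200, width: 300, height: 150 },
  '#f5f5f5'
);
console.log(ops);
// → [ ['setFillColor', 245, 245, 245], ['rect', 100, 200, 300, 150, 'F'] ]
```

With a real jsPDF instance named doc, each tuple could be replayed as doc[method](...args). Keeping the translation separate from the drawing makes it easy to inspect what would be drawn before anything touches the PDF.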

Text is embedded directly as PDF font objects. Graphics remain as path data. The result is a truly structured PDF document, not an image container.

Technical Implementation of Pagination

Paginating long documents is the hard part of frontend PDF generation. dompdf.js solves this through a DOM tree splitting algorithm.

Parsing Phase

First, html2canvas parses the target DOM into a render tree where each node contains:

bounds: element position and dimensions (top, left, width, height)
styles: computed CSS properties
textNodes: text node data

// Example render tree node
{
  bounds: { top: 800, left: 50, width: 600, height: 120 },
  styles: { backgroundColor: '#fff', fontSize: 14 },
  textNodes: [{ text: 'Paragraph content...', bounds: {...} }],
  elements: [/* child nodes */]
}

Splitting Algorithm

Using A4 page height (1123px) as the baseline, recursively traverse the render tree:

Calculate if node crosses page boundary: if node.bounds.top + node.bounds.height > pageHeight, the node needs splitting
Text node handling: if a text node spans current page and next, calculate the truncation point and split the text across pages
Coordinate reset: when entering a new page, subtract pageHeight from all subsequent nodes' top values to maintain relative positioning
Non-splittable elements: support divisionDisable attribute to mark elements that must stay on one page (like images or table blocks)
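The steps above can be sketched in a simplified form. This is not the library's implementation: it works on a flat, top-sorted list of boxes instead of a recursive render tree, it does not model line-level text truncation, and the unsplittable flag is a hypothetical stand-in for the divisionDisable attribute.

```javascript
// A4 at 96 CSS pixels per inch: 297 mm / 25.4 mm per inch * 96 ≈ 1123 px.
const A4_HEIGHT_PX = Math.round((297 / 25.4) * 96);

function paginate(nodes, pageHeight = A4_HEIGHT_PX) {
  const pages = [];
  let shift = 0; // extra offset accumulated by pushing unsplittable nodes down

  // Store a copy of the node on the given page with page-relative coordinates.
  const place = (pageIndex, node, top, height) => {
    while (pages.length <= pageIndex) pages.push([]);
    pages[pageIndex].push({ ...node, bounds: { ...node.bounds, top, height } });
  };

  for (const node of nodes) {
    const top = node.bounds.top + shift;
    let remaining = node.bounds.height;
    let pageIndex = Math.floor(top / pageHeight);
    // Coordinate reset: position relative to the top of the current page.
    let localTop = top - pageIndex * pageHeight;

    // An unsplittable node that crosses the boundary (and fits on one page)
    // moves to the start of the next page; later nodes shift down with it.
    const crosses = localTop + remaining > pageHeight;
    if (crosses && node.unsplittable && remaining <= pageHeight) {
      shift += pageHeight - localTop;
      pageIndex += 1;
      localTop = 0;
    }

    // Any other node that crosses the boundary is sliced across pages.
    while (localTop + remaining > pageHeight) {
      const slice = pageHeight - localTop;
      place(pageIndex, node, localTop, slice);
      remaining -= slice;
      pageIndex += 1;
      localTop = 0;
    }
    place(pageIndex, node, localTop, remaining);
  }
  return pages;
}

const pages = paginate([
  { id: 'intro', bounds: { top: 0, height: 1000 } },
  { id: 'table', unsplittable: true, bounds: { top: 1000, height: 200 } },
  { id: 'text', bounds: { top: 1200, height: 1200 } },
]);
console.log(pages.map(page => page.map(n => `${n.id}@${n.bounds.top}+${n.bounds.height}`)));
```

In this run the table would straddle the first page boundary, so it moves to the top of page 2, and the text block that follows is sliced between pages 2 and 3.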

Rendering Phase

Draw the split multi-page render trees sequentially:

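// Pseudocode illustration: doc is a jsPDF instance; renderHeaderFooter and
// renderTree stand in for dompdf.js internals
const totalPages = pageTrees.length;

pageTrees.forEach((pageTree, index) => {
  if (index > 0) doc.addPage();  // A jsPDF document starts with one page, so add pages from the second on

  renderHeaderFooter(index + 1, totalPages);  // Draw header/footer (page numbers are 1-based)
  renderTree(pageTree);  // Draw page content
});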

The advantage of this approach is precision: because page breaks are computed from the browser's actual layout data, it helps avoid issues such as a line of text being cut in half or a marked table row being split across pages.

Core Features

Basic Rendering Capabilities

Text: font family, size, color, line height; text is selectable, copyable, and searchable

Box Model: margin, padding, border, border-radius

Backgrounds: background-color, background-image

Tables: full support including merged cells (rowspan/colspan)

Images: JPEG, PNG automatic embedding with CORS support

Charts and graphics: Canvas charts, SVG

Multi-Page Document Support

Automatic pagination: intelligent splitting based on content height
Headers and footers: customizable height, content, font, color, position
Page number variables: ${currentPage}, ${totalPages} template syntax
Anti-split markers: divisionDisable attribute keeps elements intact
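The page number variables use the same ${...} notation as JavaScript template literals, which is easy to trip over. The sketch below shows how such placeholders could be filled in per page; fillTemplate is a hypothetical helper written for illustration, not a dompdf.js function.

```javascript
// Hypothetical helper: replace ${name} placeholders in a plain string.
// Unknown placeholders are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : placeholder
  );
}

// Single quotes matter here: inside backticks, JavaScript would try to
// interpolate ${currentPage} itself before any library saw the string.
const footer = 'Page ${currentPage} of ${totalPages}';

console.log(fillTemplate(footer, { currentPage: 3, totalPages: 12 }));
// → Page 3 of 12
```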

Advanced Features

Font Handling

jsPDF's built-in standard fonts cover only Latin characters, so CJK text does not render out of the box. dompdf.js supports injecting custom fonts (such as Source Han Sans) to fix this:

dompdf(element, {
  fontConfig: {
    fontFamily: 'SourceHanSansSC',
    fontBase64: 'AAEAAA...'  // Base64 of TTF file
  }
})
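One way to obtain the Base64 string for fontBase64 is to fetch the TTF file at runtime and encode its bytes. The sketch below is a minimal version under assumptions: bytesToBase64 and loadFontAsBase64 are hypothetical helper names, and the font URL in the usage note is a placeholder.

```javascript
// Convert raw bytes to a Base64 string. Works in browsers and in Node.js 16+,
// where btoa is available as a global.
function bytesToBase64(bytes) {
  let binary = '';
  const chunkSize = 0x8000; // encode in chunks to stay below argument-count limits
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Hypothetical loader: download a font file and return its Base64 encoding.
async function loadFontAsBase64(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Font request failed: ${response.status}`);
  return bytesToBase64(new Uint8Array(await response.arrayBuffer()));
}

// TrueType files begin with the bytes 00 01 00 00, which is why the encoded
// string in the example above starts with 'AAEAAA'.
console.log(bytesToBase64(new Uint8Array([0x00, 0x01, 0x00, 0x00])));
// → AAEAAA==
```

The result of await loadFontAsBase64('/fonts/SourceHanSansSC-Regular.ttf') (a placeholder path) could then be passed as fontBase64. CJK fonts are large, so encoding the file once at build time and caching the string may be preferable to fetching it on every export.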

Complex Style Fallback

For CSS effects that the vector renderer cannot reproduce directly, such as gradients and shadows, dompdf.js provides a foreignObjectRendering mechanism: the element is rendered as a high-resolution image via SVG foreignObject, trading vector output for visual fidelity.

Document Security

dompdf.js supports PDF password encryption, permission controls (print, modify, copy, annotate), and file compression.

Usage Examples

Basic Usage: Single Page Document

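import dompdf from 'dompdf.js';

dompdf(document.querySelector('#content'), {
  useCORS: true  // Allow cross-origin images
}).then(blob => {
  // Trigger a download through a temporary object URL
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = 'document.pdf';
  link.click();
  URL.revokeObjectURL(url);  // Release the blob reference once the download has started
}).catch(error => {
  console.error('PDF generation failed:', error);
});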

Multi-Page Document: With Headers and Footers

<!-- Important: container width must match target paper, A4 = 794px -->
<div id="report" style="width: 794px;">
  <!-- Long content... -->
</div>

dompdf(document.querySelector('#report'), {
  pagination: true,
  format: 'a4',
  pageConfig: {
    header: {
      content: 'Company Annual Report',
      height: 60,
      contentFontSize: 14,
      contentPosition: 'center'
    },
    footer: {
      content: 'Page ${currentPage} of ${totalPages}',  // Plain quotes, not backticks: dompdf.js fills in the variables
      height: 50,
      contentPosition: 'centerRight'
    }
  }
}).then(blob => {
  // Handle download or upload
});

Prevent Table Rows from Being Split

<table>
  <tr divisionDisable>
    <!-- This row must appear on the same page -->
    <td>Important data block</td>
  </tr>
</table>
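For large generated tables, adding the attribute by hand is impractical. A minimal sketch of marking rows programmatically follows; markUnsplittable is a hypothetical helper, and the selector in the usage comment is only an example.

```javascript
// Hypothetical helper: add the divisionDisable attribute to every element
// in an iterable collection and return how many were marked.
// HTML attribute names are case-insensitive, so this matches the markup above.
function markUnsplittable(elements) {
  let count = 0;
  for (const element of elements) {
    element.setAttribute('divisionDisable', '');
    count += 1;
  }
  return count;
}

// In the browser, before calling dompdf():
// markUnsplittable(document.querySelectorAll('#report table tr'));
```

Marking every row keeps each row whole, but very tall rows still need to fit within a single page, so it is worth applying the attribute selectively.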

Embedding ECharts Charts

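// Wait for the chart to finish rendering: ECharts fires 'finished' once
// animation and progressive rendering are done. Register the listener before
// calling setOption, otherwise the event may already have fired.
await new Promise(resolve => chartInstance.on('finished', resolve));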

dompdf(document.querySelector('#chart-container'), {
  useCORS: true,
  pagination: true
}).then(blob => {
  // Charts embedded as vector or high-res images
});

Technical Boundaries and Selection Guide

Best for:

Documents with relatively fixed layouts (invoices, contracts, reports, resumes)
Archival files requiring searchable and copyable text
Network transfers sensitive to file size
Projects wanting to avoid server-side rendering infrastructure

Current limitations:

Extremely complex CSS layouts (advanced Grid or Flexbox features) may need simplification
foreignObjectRendering fallback produces bitmaps, increasing file size—use only for necessary elements
Very large documents (hundreds of pages) require attention to memory usage; consider batch generation
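The batch generation suggested above can be sketched as follows. This is one possible arrangement, not a dompdf.js feature: chunk and generateInBatches are hypothetical helpers, generate is a caller-supplied async function (for example, one that wraps a dompdf() call for a batch of sections), and each batch yields its own result, so merging the outputs is left to the caller.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches strictly one after another so that only one batch is
// being rendered at any time.
async function generateInBatches(sections, size, generate) {
  const results = [];
  for (const batch of chunk(sections, size)) {
    results.push(await generate(batch));
  }
  return results;
}

console.log(chunk(['s1', 's2', 's3', 's4', 's5'], 2));
// → [ ['s1', 's2'], ['s3', 's4'], ['s5'] ]
```

Awaiting each batch inside the loop, rather than using Promise.all, is deliberate: running all batches concurrently would bring back the memory pressure the batching is meant to avoid.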

Comparison with server-side solutions:

Feature	dompdf.js	Puppeteer
Server resources	Not required	Required
Deployment complexity	Low	High
Vector output	Yes	Yes
CSS support	Medium (browser-dependent)	Full
Large file generation	Limited by browser memory	Limited by server memory
Client-side overhead	WASM/JS download required	No frontend overhead

Conclusion

dompdf.js offers a "lightweight" approach to PDF generation with no server dependency and no screenshots. Instead, it relies on the browser's native CSS parsing and layout, then translates the computed results into jsPDF vector calls.

For frontend developers, this means writing templates in familiar HTML/CSS without learning complex PDF specifications or maintaining additional server-side services. If your project needs to generate fixed-layout, text-heavy, medium-length PDF documents, this solution is worth considering.

Repository: https://github.com/lmn1919/dompdf.js

Live Demo: https://dompdfjs.lisky.com.cn
