Convert Word and Excel to PDF in the browser (no server, no upload)

#javascript #webdev #pdf #opensource

Office-to-PDF converters almost always upload your file to a server. For an invoice, a contract or a salary sheet, that's a lot of trust for a format change. But you can convert Word and Excel to PDF entirely in the browser: read the file with the right parser, rebuild it as clean HTML, and let the browser's own print engine produce a PDF with real, selectable text.

Here's the pattern for both, plus the one trick that makes the PDF output rock-solid instead of blank.

The idea: parse → HTML → native print

There's no need for a heavyweight PDF-drawing library. Browsers already have a great PDF renderer behind window.print() ("Save as PDF"). So the job is just: get the document into clean HTML, then print that HTML off-screen.

The reliable HTML→PDF helper

Render the HTML in a hidden <iframe> and call print() on it. This avoids the blank pages that html2canvas-based approaches sometimes produce, and it keeps real text (selectable, searchable) instead of rasterizing:

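function printHtmlAsPdf(html, { format = "a4", orientation = "portrait", margin = 12 } = {}) {
  // @page sets paper size and margin (in mm); print-color-adjust keeps backgrounds and borders in the output
  const pageCss =
    `<style>@page{size:${format} ${orientation};margin:${margin}mm}
     html,body{margin:0;background:#fff;-webkit-print-color-adjust:exact;print-color-adjust:exact}</style>`;
  const doc = `<!DOCTYPE html><html><head><meta charset="utf-8">${pageCss}</head><body>${html}</body></html>`;

  // Keep the iframe off-screen rather than display:none so it still gets laid out
  const iframe = document.createElement("iframe");
  iframe.style.cssText = "position:fixed;left:-10000px;top:0;width:820px;height:1160px;border:0;";
  document.body.appendChild(iframe);

  // Register the handler before writing, so the load event after d.close() is caught
  iframe.onload = () => {
    // Short delay so layout and inline images can settle before printing
    setTimeout(() => {
      iframe.contentWindow.focus();
      iframe.contentWindow.print();
      // Clean up after two minutes, leaving time for the print dialog
      setTimeout(() => iframe.remove(), 120000);
    }, 300);
  };
  const d = iframe.contentWindow.document;
  d.open(); d.write(doc); d.close();
}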

The user picks "Save as PDF" in the print dialog. That's the main UX trade-off; in exchange, the browser handles pagination and keeps the text crisp.
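One thing the helper takes on trust is its options: format, orientation and margin are interpolated straight into the @page rule. Here is a minimal sketch of a stricter document builder, assuming you only want to allow a handful of paper sizes. The names buildPrintDocument, PAGE_FORMATS and ORIENTATIONS are made up for this example and are not part of any library.

```javascript
// Hypothetical helper: builds the printable document string from validated options.
// The allowed sizes below are an assumption; extend the set to suit your app.
const PAGE_FORMATS = new Set(["a3", "a4", "a5", "letter", "legal"]);
const ORIENTATIONS = new Set(["portrait", "landscape"]);

function buildPrintDocument(html, { format = "a4", orientation = "portrait", margin = 12 } = {}) {
  const fmt = String(format).toLowerCase();
  const orient = String(orientation).toLowerCase();
  const mm = Number(margin);

  // Reject anything that is not a known keyword, so no stray CSS reaches @page
  if (!PAGE_FORMATS.has(fmt)) throw new RangeError(`Unsupported page format: ${format}`);
  if (!ORIENTATIONS.has(orient)) throw new RangeError(`Unsupported orientation: ${orientation}`);
  if (!Number.isFinite(mm) || mm < 0) throw new RangeError("Margin must be a non-negative number of mm");

  const pageCss =
    `<style>@page{size:${fmt} ${orient};margin:${mm}mm}` +
    `html,body{margin:0;background:#fff;-webkit-print-color-adjust:exact;print-color-adjust:exact}</style>`;
  return `<!DOCTYPE html><html><head><meta charset="utf-8">${pageCss}</head><body>${html}</body></html>`;
}
```

The returned string could then be written into the hidden iframe in place of the inline template. The html argument is still trusted as-is, which is reasonable here because it comes from the parser output of a file the user chose themselves.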

Word (.docx) → PDF with mammoth.js

mammoth.js converts .docx into semantic HTML — headings, lists, tables, bold/italic, embedded images. It deliberately ignores fiddly Word styling, which is exactly what you want for a clean PDF.

<script src="https://cdnjs.cloudflare.com/ajax/libs/mammoth/1.6.0/mammoth.browser.min.js"></script>

const buf = await file.arrayBuffer();                // .docx read locally
const { value: html } = await mammoth.convertToHtml({ arrayBuffer: buf });

const styled = `
  <style>
    body{font-family:Georgia,serif;font-size:12pt;line-height:1.5;color:#111}
    h1,h2,h3{font-family:Arial,sans-serif;line-height:1.25}
    table{border-collapse:collapse;width:100%}
    td,th{border:1px solid #999;padding:6px 8px}
    img{max-width:100%;height:auto}
  </style>${html}`;

printHtmlAsPdf(styled, { format: "a4", margin: 16 });
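mammoth also tells you what it could not map. The result of convertToHtml carries a messages array next to value, and each entry has a type ("warning" or "error") and a message string. A rough sketch of sorting those so you can show them to the user before printing; summarizeMammothMessages is a hypothetical helper name, not a mammoth function.

```javascript
// Hypothetical helper: split mammoth's conversion messages by severity.
// Each message is expected to look like { type: "warning" | "error", message: "..." }.
function summarizeMammothMessages(messages = []) {
  const summary = { warnings: [], errors: [] };
  for (const { type, message } of messages) {
    // Anything that is not an explicit error is treated as a warning
    (type === "error" ? summary.errors : summary.warnings).push(message);
  }
  return summary;
}

// Usage in the browser (assumes `buf` holds the .docx bytes as above):
//   const result = await mammoth.convertToHtml({ arrayBuffer: buf });
//   const { warnings, errors } = summarizeMammothMessages(result.messages);
//   if (warnings.length) console.warn("Simplified during conversion:", warnings);
```

Unrecognised paragraph styles are a typical source of warnings, so a document that relies on custom Word styles will usually produce a few.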

Gotcha: .docx only

mammoth handles the modern .docx (Open XML) format, not the legacy binary .doc. Detect it and tell the user to re-save:

if (!/\.docx$/i.test(file.name)) {
  alert("Please use a .docx file (open the old .doc in Word and Save As .docx).");
  throw new Error("Not a .docx file"); // stop here so the file never reaches mammoth
}
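File names can lie (a renamed .doc still ends in .docx), so it can help to look at the first bytes as well. A .docx is a ZIP container and starts with the bytes 50 4B 03 04 ("PK"), while legacy .doc files use the OLE compound format and start with D0 CF 11 E0 A1 B1 1A E1. A small sketch of that check; sniffOfficeContainer is a name invented for this example.

```javascript
// Magic numbers: ZIP local file header, and the OLE2 compound document header.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];
const OLE_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True when `bytes` begins with every byte in `magic`
function startsWithBytes(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: classify the container format from the leading bytes.
function sniffOfficeContainer(bytes) {
  if (startsWithBytes(bytes, ZIP_MAGIC)) return "zip"; // .docx / .xlsx (or any other ZIP)
  if (startsWithBytes(bytes, OLE_MAGIC)) return "ole"; // legacy .doc / .xls
  return "unknown";
}

// Usage in the browser: read only the first 8 bytes of the chosen file
//   const head = new Uint8Array(await file.slice(0, 8).arrayBuffer());
//   if (sniffOfficeContainer(head) === "ole") { /* ask the user to re-save as .docx */ }
```

A "zip" result only says the file is a ZIP archive, not that it is a valid Word document, so mammoth can still reject it. The check is mainly useful for giving a clearer message when someone uploads a renamed .doc.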

Excel (.xlsx/.csv) → PDF with SheetJS

SheetJS reads .xlsx, .xls and .csv, and can emit an HTML table per sheet:

<script src="https://cdnjs.cloudflare.com/ajax/libs/xlsx/0.18.5/xlsx.full.min.js"></script>

const bytes = new Uint8Array(await file.arrayBuffer());   // spreadsheet read locally
const wb = XLSX.read(bytes, { type: "array" });

// Sheet names are free text from the file, so escape them before using them as HTML
const esc = (s) => s.replace(/[&<>"]/g, (c) => `&#${c.charCodeAt(0)};`);

let sections = "";
for (const name of wb.SheetNames) {
  const fullHtml = XLSX.utils.sheet_to_html(wb.Sheets[name]);
  // sheet_to_html returns a whole document, so pull out just the <table>
  const table = (/<table[\s\S]*<\/table>/i.exec(fullHtml) || [fullHtml])[0];
  sections += `<h2>${esc(name)}</h2>${table}`;
}

const styled = `
  <style>
    body{font-family:Arial,sans-serif;font-size:10pt}
    table{border-collapse:collapse;width:100%}
    td,th{border:1px solid #b3b3b3;padding:4px 7px;white-space:nowrap}
    h2 + table{page-break-inside:auto}
  </style>${sections}`;

printHtmlAsPdf(styled, { format: "a4", orientation: "landscape", margin: 12 });

Gotcha: sheet_to_html returns a full document

XLSX.utils.sheet_to_html() gives you a complete <html> page, not a fragment. If you concatenate several of those you get nested documents. Extract just the <table> (regex above) before stitching sheets together. Also: default to landscape — spreadsheets are wide and clip badly in portrait.
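If you would rather choose the orientation per workbook than always use landscape, the worksheet's "!ref" property (an A1-style range such as "A1:H20") tells you how many columns are in use. A rough sketch follows; the helper names and the six-column threshold are assumptions for illustration, so tune them against your own files.

```javascript
// Convert column letters ("A", "Z", "AA") to a 1-based column number
function columnIndex(letters) {
  let n = 0;
  for (const ch of letters.toUpperCase()) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n;
}

// Count the columns covered by an A1-style range; returns 0 if the range is missing or malformed
function columnCount(ref) {
  const m = /^\$?([A-Z]+)\$?\d+(?::\$?([A-Z]+)\$?\d+)?$/i.exec(ref || "");
  if (!m) return 0;
  return columnIndex(m[2] || m[1]) - columnIndex(m[1]) + 1;
}

// Hypothetical heuristic: portrait only for narrow sheets, landscape otherwise
// (including when the range is unknown, so landscape stays the default)
function pickOrientation(ref, maxPortraitColumns = 6) {
  const cols = columnCount(ref);
  return cols > 0 && cols <= maxPortraitColumns ? "portrait" : "landscape";
}

// Usage in the browser (assumes `wb` from XLSX.read as above):
//   const widest = wb.SheetNames.some((n) => pickOrientation(wb.Sheets[n]["!ref"]) === "landscape");
//   printHtmlAsPdf(styled, { orientation: widest ? "landscape" : "portrait" });
```

Column count is only a proxy for width: a sheet with three very long text columns can still overflow in portrait, which is why landscape remains the fallback.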

What carries over

Source	Preserved	Dropped or simplified
Word `.docx`	headings, lists, tables, images, footnotes, bold/italic	headers and footers, page layout, fonts and colors, text-box positioning, complex columns
Excel `.xlsx`	every sheet's values + table structure	charts, conditional formatting, cell colors

The tables and text are rebuilt from content, so the result is clean and readable rather than a pixel copy — and for sharing a finished document, that's the point.

Privacy: files are parsed and rendered locally; nothing is uploaded.
Cost: pure static hosting.
Text stays text: real, selectable PDF text — not a screenshot.

I built both of these as free tools — Word to PDF and Excel to PDF — running fully in the browser with no upload. The whole free PDF toolkit is here. Happy to talk through the mammoth/SheetJS edge cases in the comments.

Top comments (2)

Frank • Jun 23

How does mammoth.js handle complex Word document layouts and formatting, such as tables and footers, during the conversion to PDF?

Muhammad Omer Mirza • Jun 23

Mammoth handles tables, headings, lists, images, and basic formatting pretty well. Things like footers, text boxes, complex page layouts, and some advanced Word formatting can be simplified or skipped, since Mammoth focuses more on the content than on matching Word pixel for pixel. For most reports, contracts, and business documents, though, the output is usually very clean and works great for PDF generation.