Most online Office-to-PDF converters upload your file to a server. For an invoice, a contract or a salary sheet, that's a lot of trust for a format change. But you can convert Word and Excel to PDF entirely in the browser: read the file with the right parser, rebuild it as clean HTML, and let the browser's own print engine produce a PDF with real, selectable text.
Here's the pattern for both, plus the one trick that makes the PDF output rock-solid instead of blank.
The idea: parse → HTML → native print
There's no need for a heavyweight PDF-drawing library. Browsers already have a great PDF renderer behind window.print() ("Save as PDF"). So the job is just: get the document into clean HTML, then print that HTML off-screen.
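As a rough sketch of that pipeline, the glue code can be a small lookup from file extension to a parser that returns HTML. The names `parsers`, `registerParser`, `parserFor` and `fileToHtml` are hypothetical and not part of any library; the real parse functions would wrap whichever library reads that format.

```javascript
// Hypothetical registry: file extension -> async function (ArrayBuffer) => HTML string.
const parsers = new Map();

function registerParser(extensions, parse) {
  for (const ext of extensions) parsers.set(ext.toLowerCase(), parse);
}

// Returns the parser for a file name, or undefined if the format is unsupported.
function parserFor(fileName) {
  const match = /\.([^.]+)$/.exec(fileName);
  return match ? parsers.get(match[1].toLowerCase()) : undefined;
}

// Stages 1 and 2: read the file locally and turn it into HTML.
// Stage 3 (printing) is left to whatever print helper the page uses.
async function fileToHtml(file) {
  const parse = parserFor(file.name);
  if (!parse) throw new Error(`Unsupported file type: ${file.name}`);
  return parse(await file.arrayBuffer());
}
```

Keeping the parsers behind one lookup means the print step never needs to know which format the HTML came from.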
The reliable HTML→PDF helper
Render the HTML in a hidden <iframe> and call print() on it. This sidesteps the blank pages that html2canvas-based approaches sometimes produce, and it keeps real text (selectable, searchable) instead of rasterizing:
function printHtmlAsPdf(html, { format = "a4", orientation = "portrait", margin = 12 } = {}) {
  // Page size, orientation and margins are set through a CSS @page rule.
  const pageCss =
    `<style>@page{size:${format} ${orientation};margin:${margin}mm}
html,body{margin:0;background:#fff;-webkit-print-color-adjust:exact;print-color-adjust:exact}</style>`;
  const doc = `<!DOCTYPE html><html><head><meta charset="utf-8">${pageCss}</head><body>${html}</body></html>`;

  // Keep the iframe off-screen but give it a real size so the content lays out.
  const iframe = document.createElement("iframe");
  iframe.style.cssText = "position:fixed;left:-10000px;top:0;width:820px;height:1160px;border:0;";
  document.body.appendChild(iframe);

  iframe.onload = () => {
    // Short delay so the written content can finish rendering before printing.
    setTimeout(() => {
      iframe.contentWindow.focus();
      iframe.contentWindow.print();
      // Clean up later; print() may return before the dialog is closed.
      setTimeout(() => iframe.remove(), 120000);
    }, 300);
  };

  // Write the document into the iframe; closing it triggers the load handler.
  const d = iframe.contentWindow.document;
  d.open(); d.write(doc); d.close();
}
The user picks "Save as PDF" in the print dialog. That extra click is the only UX trade-off, and in return the browser handles pagination and renders crisp text for free.
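The fixed two-minute timer in the helper is a blunt cleanup. One possible refinement, sketched here on the assumption that the browser fires the standard `afterprint` event on the iframe's window, is to remove the iframe as soon as printing finishes and keep the timer only as a fallback. `removeAfterPrint` and `fallbackMs` are names invented for this example.

```javascript
// Hypothetical helper: remove the iframe once its window reports "afterprint",
// or after fallbackMs if that event never arrives. Call it before print().
function removeAfterPrint(iframe, fallbackMs = 120000) {
  let done = false;
  const cleanup = () => {
    if (done) return; // make sure the iframe is removed only once
    done = true;
    clearTimeout(timer);
    iframe.remove();
  };
  const timer = setTimeout(cleanup, fallbackMs);
  iframe.contentWindow.addEventListener("afterprint", cleanup, { once: true });
}
```

In `printHtmlAsPdf`, a call to `removeAfterPrint(iframe)` just before `iframe.contentWindow.print()` could stand in for the inner `setTimeout(() => iframe.remove(), 120000)`.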
Word (.docx) → PDF with mammoth.js
mammoth.js converts .docx into semantic HTML — headings, lists, tables, bold/italic, embedded images. It deliberately ignores fiddly Word styling, which is exactly what you want for a clean PDF.
<script src="https://cdnjs.cloudflare.com/ajax/libs/mammoth/1.6.0/mammoth.browser.min.js"></script>
const buf = await file.arrayBuffer(); // .docx read locally
const { value: html } = await mammoth.convertToHtml({ arrayBuffer: buf });
const styled = `
<style>
body{font-family:Georgia,serif;font-size:12pt;line-height:1.5;color:#111}
h1,h2,h3{font-family:Arial,sans-serif;line-height:1.25}
table{border-collapse:collapse;width:100%}
td,th{border:1px solid #999;padding:6px 8px}
img{max-width:100%;height:auto}
</style>${html}`;
printHtmlAsPdf(styled, { format: "a4", margin: 16 });
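mammoth's result object also carries a `messages` array alongside `value`, which its documentation describes as warnings and errors raised during conversion. Here is a minimal sketch of summarizing those before printing, so the user can be told when something was skipped. `summarizeMessages` is a hypothetical helper, and the sample messages are made up for illustration.

```javascript
// Hypothetical helper: group mammoth-style messages ({ type, message }) by type.
function summarizeMessages(messages = []) {
  const summary = { warning: [], error: [] };
  for (const { type, message } of messages) {
    (summary[type] || (summary[type] = [])).push(message);
  }
  return summary;
}

// Usage with made-up sample data shaped like mammoth's messages:
const sample = [
  { type: "warning", message: "Unrecognised paragraph style: 'Fancy Quote'" },
  { type: "warning", message: "An unrecognised element was ignored: w:pict" },
];
const { warning, error } = summarizeMessages(sample);
console.log(`${warning.length} warning(s), ${error.length} error(s)`);
```

With the real library, the input would come from `const { value: html, messages } = await mammoth.convertToHtml({ arrayBuffer: buf });`.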
Gotcha: .docx only
mammoth handles the modern .docx (Open XML) format, not the legacy binary .doc. Detect it and tell the user to re-save:
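// Inside your file-input handler, before calling mammoth:
if (!/\.docx$/i.test(file.name)) {
  alert("Please use a .docx file (open old .doc in Word and Save As .docx).");
  return; // stop here so mammoth never sees the unsupported file
}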
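File names can lie, so a complementary check is to look at the first bytes of the file. This sketch relies on two well-known signatures: a .docx is a ZIP archive and starts with `PK\x03\x04`, while a legacy .doc is an OLE compound file starting with `D0 CF 11 E0`. `sniffContainer` is a hypothetical name, and a ZIP signature only shows the file is some ZIP-based format, not that it is definitely a Word document.

```javascript
// Hypothetical helper: classify a file by its leading bytes.
// Returns "zip" (the container .docx/.xlsx use), "ole" (legacy .doc/.xls) or "unknown".
function sniffContainer(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";
  if (startsWith([0xd0, 0xcf, 0x11, 0xe0])) return "ole";
  return "unknown";
}
```

In the browser the bytes could come from `new Uint8Array(await file.slice(0, 4).arrayBuffer())`, which reads only the first four bytes rather than the whole file.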
Excel (.xlsx/.csv) → PDF with SheetJS
SheetJS reads .xlsx, .xls and .csv, and can emit an HTML table per sheet:
<script src="https://cdnjs.cloudflare.com/ajax/libs/xlsx/0.18.5/xlsx.full.min.js"></script>
const bytes = new Uint8Array(await file.arrayBuffer());
const wb = XLSX.read(bytes, { type: "array" });
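// Sheet names may contain characters such as & or <, so escape them before inlining.
const esc = (s) => s.replace(/[&<>]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;" }[c]));
let sections = "";
for (const name of wb.SheetNames) {
  const fullHtml = XLSX.utils.sheet_to_html(wb.Sheets[name]);
  // sheet_to_html returns a whole document; pull out just the <table>
  const table = (/<table[\s\S]*<\/table>/i.exec(fullHtml) || [fullHtml])[0];
  sections += `<h2>${esc(name)}</h2>${table}`;
}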
const styled = `
<style>
body{font-family:Arial,sans-serif;font-size:10pt}
table{border-collapse:collapse;width:100%}
td,th{border:1px solid #b3b3b3;padding:4px 7px;white-space:nowrap}
h2 + table{page-break-inside:auto}
</style>${sections}`;
printHtmlAsPdf(styled, { format: "a4", orientation: "landscape", margin: 12 });
Gotcha: sheet_to_html returns a full document
XLSX.utils.sheet_to_html() gives you a complete <html> page, not a fragment. If you concatenate several of those you get nested documents. Extract just the <table> (regex above) before stitching sheets together. Also: default to landscape — spreadsheets are wide and clip badly in portrait.
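To make the nesting problem concrete, here is a small sketch that runs the same regex over two stand-in documents. The `fakeSheetHtml` strings only imitate the general shape of a full HTML page and are not actual SheetJS output; `extractTable` is a hypothetical wrapper around the regex.

```javascript
// Hypothetical wrapper around the table-extraction regex.
function extractTable(fullHtml) {
  const match = /<table[\s\S]*<\/table>/i.exec(fullHtml);
  return match ? match[0] : fullHtml; // fall back to the input if no table is found
}

// Stand-in for a full-document string (illustrative only, not real library output).
const fakeSheetHtml = (cell) =>
  `<html><head><meta charset="utf-8"/></head><body><table><tr><td>${cell}</td></tr></table></body></html>`;

const docs = [fakeSheetHtml("Q1"), fakeSheetHtml("Q2")];

const naive = docs.join("");                      // two <html> documents glued together
const stitched = docs.map(extractTable).join(""); // just the two tables

const count = (s, re) => (s.match(re) || []).length;
console.log(count(naive, /<html/g), count(stitched, /<html/g)); // 2 0
console.log(count(stitched, /<table/g));                        // 2
```

The naive join leaves two `<html>` roots inside the print document, while the stitched version contains only the tables, which can then sit inside a single wrapper page.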
What carries over
| Source | Preserved | Dropped |
|---|---|---|
| Word .docx | headings, lists, tables, images, basic styling | text boxes, footnotes, complex columns |
| Excel .xlsx | every sheet's values + table structure | charts, conditional formatting, cell colors |
The tables and text are rebuilt from content, so the result is clean and readable rather than a pixel copy — and for sharing a finished document, that's the point.
- Privacy: files are parsed and rendered locally; nothing is uploaded.
- Cost: pure static hosting.
- Text stays text: real, selectable PDF text — not a screenshot.
I built both of these as free tools — Word to PDF and Excel to PDF — running fully in the browser with no upload. The whole free PDF toolkit is here. Happy to talk through the mammoth/SheetJS edge cases in the comments.