Some time ago I was struggling with generating Word documents dynamically in Node.js.
I was using the docx library, which is great, but it was missing one feature I really needed:
I wanted to take HTML from a WYSIWYG editor and insert it into a Word document in multiple places.
Manually building Paragraph, TextRun, Table and other XmlComponent objects just to recreate existing HTML felt painful and repetitive.
From that need, I built html-docxjs-compiler – an HTML → docx XmlComponent compiler that lets you:
- Parse HTML into native docx.js components
- Reuse that content across different sections of a document
- Mix compiled HTML with your hand-crafted docx.js layout
html-docxjs-compiler takes an HTML string and converts it into a list of XmlComponent elements that you can pass directly to the docx API.
You’re not limited to a single insert:
- You can build a whole document from multiple HTML snippets.
- You can combine compiled HTML with regular docx.js components (headers, footers, cover pages, etc.).
That means you can keep your content in HTML (e.g. from a WYSIWYG editor or CMS) and still generate fully native Word documents.
QuickStart
Install the library from npm:
npm install html-docxjs-compiler docx
The latest version is compatible with docx ^9.5.0.
Here’s a minimal example that:
- takes an HTML string
- compiles it to XmlComponents
- creates a .docx file
Basic example
import { transformHtmlToDocx } from 'html-docxjs-compiler';
import { Document, Packer } from 'docx';
import * as fs from 'fs';
async function createDocument() {
  const html = `
    <h1>My Document</h1>
    <p>This is a paragraph with <strong>bold</strong> and <em>italic</em> text.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  `;
  // Transform HTML to DOCX elements
  const elements = await transformHtmlToDocx(html);
  // Create a document with the elements
  const doc = new Document({
    sections: [{
      children: elements
    }]
  });
  // Generate and save the document
  const buffer = await Packer.toBuffer(doc);
  fs.writeFileSync('output.docx', buffer);
}
createDocument();
A more detailed example
This example generates lists, applies inline element styles, and uses a wider range of HTML elements.
// This demo runs inside the library repo; in your own project use require('html-docxjs-compiler')
const { transformHtmlToDocx } = require('../dist/index');
const { Document, Packer } = require('docx');
// Numbering configuration for lists, defined in a local ./config module (not shown here)
const { WORD_NUMBERING_CONFIGURATION } = require('./config');
const fs = require('fs');
const path = require('path');
async function runBasicDemo() {
  console.log('Running Basic Demo...\n');
  // Simple HTML content with various elements
  const html = `
    <h1>Welcome to HTML-DOCX Compiler</h1>
    <p>This is a <strong>basic demo</strong> showing how to convert HTML to DOCX format.</p>
    <h2>Features</h2>
    <p>The package supports:</p>
    <ul>
      <li>Headings (H1 through H6)</li>
      <li><strong>Bold</strong> and <em>italic</em> text</li>
      <li><u>Underlined</u> and <s>strikethrough</s> text</li>
      <li>Superscript<sup>2</sup> and subscript<sub>2</sub></li>
      <li>Links like <a href="https://github.com">GitHub</a></li>
    </ul>
    <h2>Ordered Lists</h2>
    <p>You can also create numbered lists:</p>
    <ol>
      <li>First item</li>
      <li>Second item</li>
      <li>Third item</li>
    </ol>
    <h2>Text Formatting</h2>
    <p style="text-align: center">This text is centered.</p>
    <p style="text-align: right">This text is right-aligned.</p>
    <p style="color: blue">This text is blue.</p>
    <p style="background-color: yellow">This text has a yellow background.</p>
    <div>
      <p>You can also use div elements to group content.</p>
      <p>Multiple paragraphs within a div work perfectly.</p>
    </div>
  `;
  try {
    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html);
    // Create DOCX document
    const doc = new Document({
      numbering: WORD_NUMBERING_CONFIGURATION,
      sections: [{
        properties: {},
        children: docxElements
      }]
    });
    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');
    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }
    const outputPath = path.join(outputDir, 'basic-demo.docx');
    fs.writeFileSync(outputPath, buffer);
    console.log('✅ Success! Document created at:', outputPath);
    console.log('\nYou can now open basic-demo.docx in Microsoft Word or any DOCX viewer.');
  } catch (error) {
    console.error('❌ Error:', error.message);
    console.error(error.stack);
  }
}
// Run the demo
runBasicDemo();
It generates a document that looks like this.
Multiple HTML blocks in one document
const headerHtml = `<h1>Invoice #123</h1>`;
const bodyHtml = `<p>Thank you for your purchase.</p>`;
const footerHtml = `<p><small>This is an automated document.</small></p>`;
// transformHtmlToDocx is async, so run this inside an async function
const header = await transformHtmlToDocx(headerHtml);
const body = await transformHtmlToDocx(bodyHtml);
const footer = await transformHtmlToDocx(footerHtml);
const doc = new Document({
  sections: [
    {
      children: [
        ...header,
        ...body,
        ...footer,
      ],
    },
  ],
});
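When the number of HTML blocks varies, the three calls above generalize to a small helper. The sketch below is one possible shape, not part of the library: compileSections is a hypothetical name, and the transform function is passed in as a parameter (in a real project that would be transformHtmlToDocx). Passing it in also lets the example run here with a stub instead of the real compiler.

```javascript
// Hypothetical helper: compiles every HTML block concurrently and returns
// one flat array of components, preserving the order of the input blocks.
async function compileSections(transform, htmlBlocks) {
  const compiled = await Promise.all(htmlBlocks.map((html) => transform(html)));
  return compiled.flat();
}

// Stub standing in for transformHtmlToDocx so the sketch runs on its own.
// Assumption: the real function resolves to an array of components.
async function fakeTransform(html) {
  return [{ source: html }];
}

compileSections(fakeTransform, [
  '<h1>Invoice #123</h1>',
  '<p>Thank you for your purchase.</p>',
]).then((children) => {
  console.log(children.length); // one stub component per block
});
```

The returned array can be used as the children of a section, exactly like the spread header, body and footer arrays above.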
Adding images is a breeze
Adding images is straightforward too.
// This demo runs inside the library repo; in your own project use require('html-docxjs-compiler')
const { transformHtmlToDocx, HttpImageDownloadStrategy, ImageDownloadStrategyManager } = require('../dist/index');
const { Document, Packer } = require('docx');
const fs = require('fs');
const path = require('path');
async function runAdvancedDemo() {
  console.log('Running Advanced Demo...\n');
  // HTML content with a remote image and styled text
  const html = `
    <h1 style="color: darkblue; text-align: center">Image adding</h1>
    <div>
      <p><strong>Image example:</strong></p>
      <img src="https://fastly.picsum.photos/id/237/200/300.jpg?hmac=TmmQSbShHz9CdQm0NkEjx1Dyh_Y984R9LpNrpvH2D_U" alt="Modern art" />
    </div>
    <h2>Important Notes</h2>
    <p style="background-color: lightyellow; color: darkred">
      ⚠️ <strong>Confidential:</strong> This document contains sensitive information
      and should not be distributed outside the organization.
    </p>
  `;
  try {
    // Set up image download strategy (optional)
    console.log('Setting up configuration...');
    const strategyManager = new ImageDownloadStrategyManager();
    strategyManager.addStrategy(new HttpImageDownloadStrategy());
    const config = {
      strategyManager: strategyManager
    };
    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html, config);
    // Create DOCX document with custom properties
    const doc = new Document({
      sections: [{
        properties: {
          page: {
            margin: {
              top: 1440, // 1 inch = 1440 twips
              right: 1440,
              bottom: 1440,
              left: 1440,
            },
          },
        },
        children: docxElements
      }],
      creator: "HTML-DOCX Compiler Demo",
      title: "Advanced Demo Document",
      description: "Demonstrates advanced features of html-docxjs-compiler"
    });
    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');
    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }
    const outputPath = path.join(outputDir, 'image-demo.docx');
    fs.writeFileSync(outputPath, buffer);
  } catch (error) {
    console.error(error.stack);
  }
}
// Run the demo
runAdvancedDemo();
The resulting document looks like this.
Image parsing requires a strategy that defines how images are fetched. The library ships with a default one, HttpImageDownloadStrategy, which works with URLs and base64 strings. If you need to load images some other way, for example from Firebase Storage or an S3 bucket, you can write your own strategy and add it to ImageDownloadStrategyManager.
A custom strategy must implement the ImageDownloadStrategy interface, which has two methods:
export interface ImageDownloadStrategy {
  /**
   * Check if this strategy can handle the given URL
   * @param url - Image URL to check
   * @returns True if this strategy can handle the URL
   */
  canHandle(url: string): boolean;
  /**
   * Download image and return as base64 data URI
   * @param url - Image URL to download
   * @returns Base64 data URI string (e.g., 'data:image/png;base64,...')
   */
  download(url: string): Promise<string>;
}
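To make the interface concrete, here is a minimal sketch of a custom strategy in plain JavaScript. It reads images from the local disk via file:// URLs, chosen only because that needs no cloud SDK; a Firebase or S3 strategy would follow the same two-method shape. The class name LocalFileImageStrategy and the extension-to-MIME table are my own illustration, and whether the compiler forwards file:// sources to registered strategies is an assumption you would want to verify.

```javascript
const fs = require('fs');
const path = require('path');
const { fileURLToPath } = require('url');

// Hypothetical lookup table; extend it for the formats you need.
const MIME_BY_EXTENSION = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
};

// Hypothetical strategy with the two methods of ImageDownloadStrategy.
class LocalFileImageStrategy {
  // Only claim URLs that point at the local file system.
  canHandle(url) {
    return url.startsWith('file://');
  }

  // Read the file and return it as a base64 data URI.
  async download(url) {
    const filePath = fileURLToPath(url);
    const mimeType = MIME_BY_EXTENSION[path.extname(filePath).toLowerCase()];
    if (!mimeType) {
      throw new Error(`Unsupported image type: ${filePath}`);
    }
    const bytes = await fs.promises.readFile(filePath);
    return `data:${mimeType};base64,${bytes.toString('base64')}`;
  }
}
```

Registering it would mirror the demo above: strategyManager.addStrategy(new LocalFileImageStrategy()).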
How it works under the hood
Normally, when you work with docx, you manually create Paragraph, TextRun, Table and other XmlComponent objects.
This library automates that:
- You pass an HTML string to the compiler.
- It parses the HTML using cheerio.
- It walks the DOM tree and maps tags and inline styles to the appropriate docx.js constructs.
- It returns a list of XmlComponent instances you can drop directly into your document structure.
You still have full control over the rest of the document (sections, headers, footers, page breaks, etc.) – the compiler only handles the HTML part.
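The tree walk described above can be illustrated with a deliberately simplified sketch. This is not the library's implementation: a hand-built node tree stands in for the cheerio DOM, plain flag objects stand in for docx components, and the names TAG_FORMATTING and collectRuns are hypothetical.

```javascript
// Formatting flags contributed by each inline tag (illustrative subset).
const TAG_FORMATTING = new Map([
  ['strong', { bold: true }],
  ['b', { bold: true }],
  ['em', { italics: true }],
  ['i', { italics: true }],
]);

// Walk a node depth-first. Text nodes become runs that carry the
// formatting accumulated from all of their ancestor tags.
function collectRuns(node, inherited = {}) {
  if (typeof node === 'string') {
    return [{ text: node, ...inherited }];
  }
  const formatting = { ...inherited, ...(TAG_FORMATTING.get(node.tag) || {}) };
  return node.children.flatMap((child) => collectRuns(child, formatting));
}

// <p>This is <strong>bold<em> and italic</em></strong> text.</p>
const paragraph = {
  tag: 'p',
  children: [
    'This is ',
    { tag: 'strong', children: ['bold', { tag: 'em', children: [' and italic'] }] },
    ' text.',
  ],
};

console.log(collectRuns(paragraph));
```

Each resulting object describes one run of text with its inherited formatting, which is the kind of information a compiler needs before it can construct a TextRun inside a Paragraph.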
📋 Supported HTML Elements
Block Elements
| Element | Description | Styling Support |
|---|---|---|
| h1 - h6 | Headings (converted to DOCX heading styles) | ✅ |
| p | Paragraphs | ✅ text-align, color, etc. |
| div | Division container | ✅ |
| ul, ol | Unordered/Ordered lists | ✅ Nested lists supported |
| li | List items | ✅ |
| table | Tables | ✅ |
| tr | Table rows | ✅ |
| td, th | Table cells/headers | ✅ colspan, rowspan, background-color, vertical-align |
| thead, tbody | Table sections | ✅ |
Inline Elements
| Element | Description | Styling Support |
|---|---|---|
| strong, b | Bold text | ✅ |
| em, i | Italic text | ✅ |
| u | Underlined text | ✅ |
| s | Strikethrough text | ✅ |
| sub | Subscript | ✅ |
| sup | Superscript | ✅ |
| span | Inline container | ✅ color, background-color, etc. |
| a | Hyperlinks | ✅ Creates clickable links |
| br | Line break | ✅ |
| img | Images | ✅ Auto-resize, multiple sources |
Supported CSS Styles
- Colors: 147+ named colors + hex values (e.g., #FF0000, red, darkblue)
- Text Alignment: left, center, right, justify
- Vertical Alignment: top, middle, bottom (table cells)
- Background Color: For table cells and spans
- Font Styles: bold, italic, underline, strikethrough
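To show what handling these styles involves, the sketch below parses an inline style attribute and normalizes a color value to six-digit hex. It is an illustration under my own assumptions, not the library's code: parseInlineStyle and toHexColor are hypothetical names, and the color table holds only a handful of the named colors.

```javascript
// Small illustrative subset of the CSS named colors.
const NAMED_COLORS = new Map([
  ['red', 'FF0000'],
  ['blue', '0000FF'],
  ['yellow', 'FFFF00'],
  ['darkblue', '00008B'],
  ['darkred', '8B0000'],
  ['lightyellow', 'FFFFE0'],
]);

// Turn "text-align: center; color: blue" into { 'text-align': 'center', color: 'blue' }.
function parseInlineStyle(style) {
  const declarations = {};
  for (const part of style.split(';')) {
    const separator = part.indexOf(':');
    if (separator === -1) continue; // skip empty or malformed declarations
    const property = part.slice(0, separator).trim().toLowerCase();
    const value = part.slice(separator + 1).trim();
    if (property && value) declarations[property] = value;
  }
  return declarations;
}

// Normalize a named color, #rgb or #rrggbb to uppercase RRGGBB.
// Returns undefined for anything this sketch does not recognize.
function toHexColor(value) {
  const normalized = value.trim().toLowerCase();
  if (NAMED_COLORS.has(normalized)) return NAMED_COLORS.get(normalized);
  const shortHex = /^#([0-9a-f])([0-9a-f])([0-9a-f])$/.exec(normalized);
  if (shortHex) return shortHex.slice(1).map((digit) => digit + digit).join('').toUpperCase();
  const longHex = /^#([0-9a-f]{6})$/.exec(normalized);
  return longHex ? longHex[1].toUpperCase() : undefined;
}

const style = parseInlineStyle('color: darkblue; text-align: center');
console.log(style['text-align'], toHexColor(style.color));
```

Returning undefined for unknown values lets the caller fall back to the document's default color instead of emitting an invalid one.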
Try it out and tell me what you think
You can find the package on npm as html-docxjs-compiler.
If you’re already using docx and you’re tired of hand-building XmlComponents for content that already exists as HTML, this library might save you a lot of time (and patience).
Feedback, issues and ideas for improvements are very welcome – especially real-world examples of HTML you’d like it to handle better.

