Janko Stevanovic

Posted on Dec 12, 2025

Embed HTML into Word Documents with docx.js the Easy Way

#node #npm #docxjs #docx

Some time ago I was struggling with generating Word documents dynamically in Node.js.

I was using the docx library, which is great, but it was missing one feature I really needed:

I wanted to take HTML from a WYSIWYG editor and insert it into a Word document in multiple places.

Manually building Paragraph, TextRun, Table and other XmlComponent objects just to recreate existing HTML felt painful and repetitive.

From that need, I built html-docxjs-compiler – an HTML → docx XmlComponent compiler that lets you:

Parse HTML into native docx.js components
Reuse that content across different sections of a document
Mix compiled HTML with your hand-crafted docx.js layout

html-docxjs-compiler takes an HTML string and converts it into a list of XmlComponent elements that you can embed directly into the docx API.

You’re not limited to a single insert:

You can build a whole document from multiple HTML snippets.
You can combine compiled HTML with regular docx.js components (headers, footers, cover pages, etc.).

That means you can keep your content in HTML (e.g. from a WYSIWYG editor or CMS) and still generate fully native Word documents.

Quick start

Install the library from npm:

   npm install html-docxjs-compiler docx

The latest version is compatible with docx ^9.5.0.

Here’s a minimal example that:

takes an HTML string
compiles it to XmlComponents
creates a .docx file

Basic example

import { transformHtmlToDocx } from 'html-docxjs-compiler';
import { Document, Packer } from 'docx';
import * as fs from 'fs';

async function createDocument() {
  const html = `
    <h1>My Document</h1>
    <p>This is a paragraph with <strong>bold</strong> and <em>italic</em> text.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  `;

  // Transform HTML to DOCX elements
  const elements = await transformHtmlToDocx(html);

  // Create a document with the elements
  const doc = new Document({
    sections: [{
      children: elements
    }]
  });

  // Generate and save the document
  const buffer = await Packer.toBuffer(doc);
  fs.writeFileSync('output.docx', buffer);
}

createDocument();

A more detailed example

This example generates lists and uses inline styles and a variety of HTML elements.

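// This demo imports from a local build ('../dist/index'); in a project that
// installed the package from npm, require 'html-docxjs-compiler' instead.
const { transformHtmlToDocx } = require('../dist/index');
const { Document, Packer } = require('docx');
// WORD_NUMBERING_CONFIGURATION comes from the demo's local ./config file (not shown here).
const { WORD_NUMBERING_CONFIGURATION } = require('./config');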
const fs = require('fs');
const path = require('path');

async function runBasicDemo() {
  console.log('Running Basic Demo...\n');

  // Simple HTML content with various elements
  const html = `
    <h1>Welcome to HTML-DOCX Compiler</h1>
    <p>This is a <strong>basic demo</strong> showing how to convert HTML to DOCX format.</p>

    <h2>Features</h2>
    <p>The package supports:</p>
    <ul>
      <li>Headings (H1 through H6)</li>
      <li><strong>Bold</strong> and <em>italic</em> text</li>
      <li><u>Underlined</u> and <s>strikethrough</s> text</li>
      <li>Superscript<sup>2</sup> and subscript<sub>2</sub></li>
      <li>Links like <a href="https://github.com">GitHub</a></li>
    </ul>

    <h2>Ordered Lists</h2>
    <p>You can also create numbered lists:</p>
    <ol>
      <li>First item</li>
      <li>Second item</li>
      <li>Third item</li>
    </ol>

    <h2>Text Formatting</h2>
    <p style="text-align: center">This text is centered.</p>
    <p style="text-align: right">This text is right-aligned.</p>
    <p style="color: blue">This text is blue.</p>
    <p style="background-color: yellow">This text has a yellow background.</p>

    <div>
      <p>You can also use div elements to group content.</p>
      <p>Multiple paragraphs within a div work perfectly.</p>
    </div>
  `;

  try {
    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html);

    // Create DOCX document
    const doc = new Document({
      numbering: WORD_NUMBERING_CONFIGURATION,
      sections: [{
        properties: {},
        children: docxElements
      }]
    });

    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');

    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }

    const outputPath = path.join(outputDir, 'basic-demo.docx');
    fs.writeFileSync(outputPath, buffer);

    console.log('✅ Success! Document created at:', outputPath);
    console.log('\nYou can now open basic-demo.docx in Microsoft Word or any DOCX viewer.');
  } catch (error) {
    console.error('❌ Error:', error.message);
    console.error(error.stack);
  }
}

// Run the demo
runBasicDemo();

Running it writes basic-demo.docx to the output folder, with the headings, lists and formatted paragraphs from the HTML above.

Multiple HTML blocks in one document

const headerHtml = `<h1>Invoice #123</h1>`;
const bodyHtml = `<p>Thank you for your purchase.</p>`;
const footerHtml = `<p><small>This is an automated document.</small></p>`;

// transformHtmlToDocx is async, so run this inside an async function
const header = await transformHtmlToDocx(headerHtml);
const body = await transformHtmlToDocx(bodyHtml);
const footer = await transformHtmlToDocx(footerHtml);

const doc = new Document({
  sections: [
    {
      children: [
        ...header,
        ...body,
        ...footer,
      ],
    },
  ],
});
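
Because transformHtmlToDocx returns a promise (it is awaited in the earlier examples), independent snippets can also be compiled concurrently. The helper below is a minimal sketch of that idea. compileSnippets is a hypothetical name, not part of the library, and the compile function is passed in as a parameter so the helper assumes nothing about the library beyond "takes an HTML string, returns a promise".

```javascript
// Hypothetical helper (not part of html-docxjs-compiler):
// compiles several named HTML snippets concurrently.
// `compile` is any async function that takes an HTML string,
// for example transformHtmlToDocx.
async function compileSnippets(snippets, compile) {
  const names = Object.keys(snippets);

  // Promise.all resolves to results in the same order as the input array,
  // so each result can be matched back to its name by index.
  const compiled = await Promise.all(
    names.map((name) => compile(snippets[name]))
  );

  return Object.fromEntries(names.map((name, i) => [name, compiled[i]]));
}
```

For the invoice above, this could be called as `const { header, body, footer } = await compileSnippets({ header: headerHtml, body: bodyHtml, footer: footerHtml }, transformHtmlToDocx);` inside an async function. If any snippet fails to compile, the whole call rejects, which is usually what you want before building the document.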

Adding images with ease

Adding images is straightforward with the library, too.

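// As in the previous demo, '../dist/index' is a local build; use 'html-docxjs-compiler' when installed from npm.
const { transformHtmlToDocx, HttpImageDownloadStrategy, ImageDownloadStrategyManager } = require('../dist/index');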
const { Document, Packer } = require('docx');
const fs = require('fs');
const path = require('path');

async function runAdvancedDemo() {
  console.log('Running Advanced Demo...\n');

  // HTML content with a remote image and styled text
  const html = `
    <h1 style="color: darkblue; text-align: center">Image adding</h1>

    <div>
      <p><strong>Image example:</strong></p>
      <img src="https://fastly.picsum.photos/id/237/200/300.jpg?hmac=TmmQSbShHz9CdQm0NkEjx1Dyh_Y984R9LpNrpvH2D_U" alt="Modern art" />
    </div>

    <h2>Important Notes</h2>
    <p style="background-color: lightyellow; color: darkred">
      ⚠️ <strong>Confidential:</strong> This document contains sensitive information 
      and should not be distributed outside the organization.
    </p>
  `;

  try {
    // Set up image download strategy (optional)
    console.log('Setting up configuration...');
    const strategyManager = new ImageDownloadStrategyManager();
    strategyManager.addStrategy(new HttpImageDownloadStrategy());

    const config = {
      strategyManager: strategyManager
    };

    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html, config);

    // Create DOCX document with custom properties
    const doc = new Document({
      sections: [{
        properties: {
          page: {
            margin: {
              top: 1440,    // 1 inch = 1440 twips
              right: 1440,
              bottom: 1440,
              left: 1440,
            },
          },
        },
        children: docxElements
      }],
      creator: "HTML-DOCX Compiler Demo",
      title: "Advanced Demo Document",
      description: "Demonstrates advanced features of html-docxjs-compiler"
    });

    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');

    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }

    const outputPath = path.join(outputDir, 'image-demo.docx');
    fs.writeFileSync(outputPath, buffer);
  } catch (error) {
    console.error(error.stack);
  }
}

// Run the demo
runAdvancedDemo();

The resulting document, image-demo.docx, contains the downloaded image alongside the styled text.

Image parsing requires a strategy that defines how images are fetched. The library ships with a default one, HttpImageDownloadStrategy, which works with URLs and base64 strings. If you need to fetch images some other way, for example from Firebase Storage or an S3 bucket, you can write your own strategy and add it to ImageDownloadStrategyManager.

A custom strategy must implement the ImageDownloadStrategy interface, which has two methods:

export interface ImageDownloadStrategy {
  /**
   * Check if this strategy can handle the given URL
   * @param url - Image URL to check
   * @returns True if this strategy can handle the URL
   */
  canHandle(url: string): boolean;

  /**
   * Download image and return as base64 data URI
   * @param url - Image URL to download
   * @returns Base64 data URI string (e.g., 'data:image/png;base64,...')
   */
  download(url: string): Promise<string>;
}
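
As an illustration, here is a minimal sketch of a custom strategy that reads images from a local folder instead of the network. LocalFileImageStrategy and the "local://" URL scheme are made up for this example; only the canHandle/download shape comes from the interface above.

```javascript
const fs = require('fs/promises');
const path = require('path');

// File extensions mapped to the MIME type used in the data URI prefix.
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
};

// Hypothetical strategy: resolves "local://<file>" URLs against a base directory.
// The "local://" scheme is an assumption made for this example.
class LocalFileImageStrategy {
  constructor(baseDir) {
    this.baseDir = path.resolve(baseDir);
  }

  // Only claim URLs that use the made-up "local://" scheme.
  canHandle(url) {
    return typeof url === 'string' && url.startsWith('local://');
  }

  // Read the file and return it as a base64 data URI.
  async download(url) {
    const relative = url.slice('local://'.length);
    const filePath = path.resolve(this.baseDir, relative);

    // Refuse paths that escape the base directory (e.g. "local://../secret.png").
    if (!filePath.startsWith(this.baseDir + path.sep)) {
      throw new Error(`Path escapes base directory: ${url}`);
    }

    const mime = MIME_TYPES[path.extname(filePath).toLowerCase()];
    if (!mime) {
      throw new Error(`Unsupported image type: ${url}`);
    }

    const data = await fs.readFile(filePath);
    return `data:${mime};base64,${data.toString('base64')}`;
  }
}
```

It would be registered the same way as the default strategy in the demo above, for example `strategyManager.addStrategy(new LocalFileImageStrategy('./assets'));`, and an `<img src="local://logo.png">` tag would then be resolved from that folder. An S3 or Firebase strategy would follow the same pattern, with download calling the respective SDK instead of fs.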

How it works under the hood

Normally, when you work with docx, you manually create Paragraph, TextRun, Table and other XmlComponent objects.

This library automates that:

You pass an HTML string to the compiler.
It parses the HTML using cheerio.
It walks the DOM tree and maps tags and inline styles to the appropriate docx.js constructs.
It returns a list of XmlComponent instances you can drop directly into your document structure.
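
To make the tree-walking step more concrete, here is a deliberately simplified sketch. It does not use cheerio or docx and is not the library's actual implementation: the node shape ({ text } for text nodes, { tag, children } for elements), the INLINE_FLAGS table and the collectRuns function are all assumptions made for illustration. It only shows the general idea of carrying formatting down the tree and emitting flat text runs.

```javascript
// Inline tags mapped to the formatting flag they switch on.
// Flag names are chosen for this example only.
const INLINE_FLAGS = {
  strong: { bold: true },
  b: { bold: true },
  em: { italics: true },
  i: { italics: true },
  u: { underline: true },
  s: { strike: true },
};

// Recursively walk a simplified DOM node and collect flat text runs.
// Text nodes look like { text: '...' };
// elements look like { tag: 'em', children: [...] }.
function collectRuns(node, inherited = {}) {
  if (typeof node.text === 'string') {
    // Leaf: emit the text together with every flag inherited from ancestors.
    return [{ text: node.text, ...inherited }];
  }

  // Element: merge this tag's flag (if any) into the inherited formatting.
  const format = { ...inherited, ...(INLINE_FLAGS[node.tag] || {}) };
  return (node.children || []).flatMap((child) => collectRuns(child, format));
}

// Roughly: <p>Plain, <strong>bold, <em>bold italic</em></strong></p>
const paragraph = {
  tag: 'p',
  children: [
    { text: 'Plain, ' },
    {
      tag: 'strong',
      children: [
        { text: 'bold, ' },
        { tag: 'em', children: [{ text: 'bold italic' }] },
      ],
    },
  ],
};

console.log(collectRuns(paragraph));
```

Each resulting object has the text plus the accumulated flags, which is the kind of information a real compiler would need to build a TextRun for every piece of text. Nested tags work naturally because the formatting is passed down through the recursion.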

You still have full control over the rest of the document (sections, headers, footers, page breaks, etc.) – the compiler only handles the HTML part.

📋 Supported HTML Elements

Block Elements

Element	Description	Styling Support
`h1` - `h6`	Headings (converted to DOCX heading styles)	✅
`p`	Paragraphs	✅ text-align, color, etc.
`div`	Division container	✅
`ul`, `ol`	Unordered/Ordered lists	✅ Nested lists supported
`li`	List items	✅
`table`	Tables	✅
`tr`	Table rows	✅
`td`, `th`	Table cells/headers	✅ colspan, rowspan, background-color, vertical-align
`thead`, `tbody`	Table sections	✅

Inline Elements

Element	Description	Styling Support
`strong`, `b`	Bold text	✅
`em`, `i`	Italic text	✅
`u`	Underlined text	✅
`s`	Strikethrough text	✅
`sub`	Subscript	✅
`sup`	Superscript	✅
`span`	Inline container	✅ color, background-color, etc.
`a`	Hyperlinks	✅ Creates clickable links
`br`	Line break	✅
`img`	Images	✅ Auto-resize, multiple sources

Supported CSS Styles

Colors: 147+ named colors + hex values (e.g., #FF0000, red, darkblue)
Text Alignment: left, center, right, justify
Vertical Alignment: top, middle, bottom (table cells)
Background Color: For table cells and spans
Font Styles: bold, italic, underline, strikethrough
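
For a feel of what handling inline styles involves, the sketch below parses a style attribute and normalizes a color value to the six-digit hex form that Word's XML uses. It is an independent illustration, not the library's code: parseInlineStyle, toHexColor and the tiny NAMED_COLORS table are hypothetical, and the table covers only a handful of the CSS named colors.

```javascript
// A small subset of CSS named colors, for illustration only.
const NAMED_COLORS = {
  red: 'FF0000',
  blue: '0000FF',
  yellow: 'FFFF00',
  darkblue: '00008B',
};

// Split a style attribute such as "color: blue; text-align: center"
// into a { property: value } map.
function parseInlineStyle(style) {
  const result = {};
  for (const declaration of style.split(';')) {
    const index = declaration.indexOf(':');
    if (index === -1) continue; // skip empty or malformed declarations
    const property = declaration.slice(0, index).trim().toLowerCase();
    const value = declaration.slice(index + 1).trim();
    if (property && value) result[property] = value;
  }
  return result;
}

// Normalize a CSS color to a six-digit uppercase hex string without "#".
// Returns null when the value is not recognized.
function toHexColor(value) {
  const v = value.trim().toLowerCase();
  if (/^#[0-9a-f]{6}$/.test(v)) return v.slice(1).toUpperCase();
  if (/^#[0-9a-f]{3}$/.test(v)) {
    // Expand shorthand: "#f00" becomes "FF0000".
    return [...v.slice(1)].map((c) => c + c).join('').toUpperCase();
  }
  return NAMED_COLORS[v] || null;
}

const style = parseInlineStyle('color: darkblue; text-align: center');
console.log(style['text-align'], toHexColor(style.color));
```

Returning null for unknown colors lets the caller decide whether to fall back to a default or ignore the declaration, rather than silently emitting an invalid value.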

Try it out and tell me what you think

You can find the package on npm:

👉 html-docxjs-compiler

If you’re already using docx and you’re tired of hand-building XmlComponents for content that already exists as HTML, this library might save you a lot of time (and patience).

Feedback, issues and ideas for improvements are very welcome – especially real-world examples of HTML you’d like it to handle better.

DEV Community