DEV Community

Janko Stevanovic

Embed HTML into Word Documents with docx.js the Easy Way

A while ago I was struggling to generate Word documents dynamically in Node.js.

I was using the docx library, which is great, but it was missing one feature I really needed:

I wanted to take HTML from a WYSIWYG editor and insert it into a Word document in multiple places.

Manually building Paragraph, TextRun, Table and other XmlComponent objects just to recreate existing HTML felt painful and repetitive.

From that need, I built html-docxjs-compiler, an HTML → docx XmlComponent compiler that lets you:

  • Parse HTML into native docx.js components
  • Reuse that content across different sections of a document
  • Mix compiled HTML with your hand-crafted docx.js layout

html-docxjs-compiler takes an HTML string and converts it into a list of XmlComponent elements that you can embed directly into the docx API.

You’re not limited to a single insert:

  • You can build a whole document from multiple HTML snippets.
  • You can combine compiled HTML with regular docx.js components (headers, footers, cover pages, etc.).

That means you can keep your content in HTML (e.g. from a WYSIWYG editor or CMS) and still generate fully native Word documents.

QuickStart

Install the library from npm:

   npm install html-docxjs-compiler docx

The latest version is compatible with docx ^9.5.0.

Here’s a minimal example that:

  • takes an HTML string
  • compiles it to XmlComponents
  • creates a .docx file

Basic example

import { transformHtmlToDocx } from 'html-docxjs-compiler';
import { Document, Packer } from 'docx';
import * as fs from 'fs';

async function createDocument() {
  const html = `
    <h1>My Document</h1>
    <p>This is a paragraph with <strong>bold</strong> and <em>italic</em> text.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
  `;

  // Transform HTML to DOCX elements
  const elements = await transformHtmlToDocx(html);

  // Create a document with the elements
  const doc = new Document({
    sections: [{
      children: elements
    }]
  });

  // Generate and save the document
  const buffer = await Packer.toBuffer(doc);
  fs.writeFileSync('output.docx', buffer);
}

createDocument();

A more detailed example

Here is an example that generates lists, applies inline styles, and uses a range of HTML elements.

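const { transformHtmlToDocx } = require('html-docxjs-compiler');
const { Document, Packer } = require('docx');
// Numbering definitions passed to Document's `numbering` option (used by the lists);
// they live in the demo's own config file, which is not shown here
const { WORD_NUMBERING_CONFIGURATION } = require('./config');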
const fs = require('fs');
const path = require('path');

async function runBasicDemo() {
  console.log('Running Basic Demo...\n');

  // Simple HTML content with various elements
  const html = `
    <h1>Welcome to HTML-DOCX Compiler</h1>
    <p>This is a <strong>basic demo</strong> showing how to convert HTML to DOCX format.</p>

    <h2>Features</h2>
    <p>The package supports:</p>
    <ul>
      <li>Headings (H1 through H6)</li>
      <li><strong>Bold</strong> and <em>italic</em> text</li>
      <li><u>Underlined</u> and <s>strikethrough</s> text</li>
      <li>Superscript<sup>2</sup> and subscript<sub>2</sub></li>
      <li>Links like <a href="https://github.com">GitHub</a></li>
    </ul>

    <h2>Ordered Lists</h2>
    <p>You can also create numbered lists:</p>
    <ol>
      <li>First item</li>
      <li>Second item</li>
      <li>Third item</li>
    </ol>

    <h2>Text Formatting</h2>
    <p style="text-align: center">This text is centered.</p>
    <p style="text-align: right">This text is right-aligned.</p>
    <p style="color: blue">This text is blue.</p>
    <p style="background-color: yellow">This text has a yellow background.</p>

    <div>
      <p>You can also use div elements to group content.</p>
      <p>Multiple paragraphs within a div work perfectly.</p>
    </div>
  `;

  try {
    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html);

    // Create DOCX document
    const doc = new Document({
      numbering: WORD_NUMBERING_CONFIGURATION,
      sections: [{
        properties: {},
        children: docxElements
      }]
    });

    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');

    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }

    const outputPath = path.join(outputDir, 'basic-demo.docx');
    fs.writeFileSync(outputPath, buffer);

    console.log('✅ Success! Document created at:', outputPath);
    console.log('\nYou can now open basic-demo.docx in Microsoft Word or any DOCX viewer.');
  } catch (error) {
    console.error('❌ Error:', error.message);
    console.error(error.stack);
  }
}

// Run the demo
runBasicDemo();
It generates a document that looks like this:

Generated document

Multiple HTML blocks in one document

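const { transformHtmlToDocx } = require('html-docxjs-compiler');
const { Document, Packer } = require('docx');
const fs = require('fs');

async function createInvoice() {
  const headerHtml = `<h1>Invoice #123</h1>`;
  const bodyHtml = `<p>Thank you for your purchase.</p>`;
  const footerHtml = `<p><small>This is an automated document.</small></p>`;

  // Compile each snippet independently (transformHtmlToDocx is async)
  const [header, body, footer] = await Promise.all([
    transformHtmlToDocx(headerHtml),
    transformHtmlToDocx(bodyHtml),
    transformHtmlToDocx(footerHtml),
  ]);

  // Spread the compiled components into one section; these are body content,
  // not Word page headers/footers
  const doc = new Document({
    sections: [
      {
        children: [
          ...header,
          ...body,
          ...footer,
        ],
      },
    ],
  });

  fs.writeFileSync('invoice.docx', await Packer.toBuffer(doc));
}

createInvoice();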


Adding images is a breeze

Adding images is just as easy with the library.

const { transformHtmlToDocx, HttpImageDownloadStrategy, ImageDownloadStrategyManager } = require('html-docxjs-compiler');
const { Document, Packer } = require('docx');
const fs = require('fs');
const path = require('path');

async function runAdvancedDemo() {
  console.log('Running Advanced Demo...\n');

  // HTML content with an image and styled paragraphs
  const html = `
    <h1 style="color: darkblue; text-align: center">Image adding</h1>

    <div>
      <p><strong>Image example:</strong></p>
      <img src="https://fastly.picsum.photos/id/237/200/300.jpg?hmac=TmmQSbShHz9CdQm0NkEjx1Dyh_Y984R9LpNrpvH2D_U" alt="Modern art" />
    </div>

    <h2>Important Notes</h2>
    <p style="background-color: lightyellow; color: darkred">
      ⚠️ <strong>Confidential:</strong> This document contains sensitive information 
      and should not be distributed outside the organization.
    </p>
  `;

  try {
    // Set up image download strategy (optional)
    console.log('Setting up configuration...');
    const strategyManager = new ImageDownloadStrategyManager();
    strategyManager.addStrategy(new HttpImageDownloadStrategy());

    const config = {
      strategyManager: strategyManager
    };

    // Transform HTML to DOCX elements
    console.log('Converting HTML to DOCX elements...');
    const docxElements = await transformHtmlToDocx(html, config);

    // Create DOCX document with custom properties
    const doc = new Document({
      sections: [{
        properties: {
          page: {
            margin: {
              top: 1440,    // 1 inch = 1440 twips
              right: 1440,
              bottom: 1440,
              left: 1440,
            },
          },
        },
        children: docxElements
      }],
      creator: "HTML-DOCX Compiler Demo",
      title: "Advanced Demo Document",
      description: "Demonstrates advanced features of html-docxjs-compiler"
    });

    // Generate buffer and save file
    const buffer = await Packer.toBuffer(doc);
    const outputDir = path.join(__dirname, 'output');

    // Create output directory if it doesn't exist
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }

    const outputPath = path.join(outputDir, 'image-demo.docx');
    fs.writeFileSync(outputPath, buffer);
  } catch (error) {
    console.error(error.stack);
  }
}

// Run the demo
runAdvancedDemo();

The resulting document looks like this:

Image example
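The page margins in the demo above are given in twips (twentieths of a point), hence the `1 inch = 1440 twips` comment: 72 points per inch × 20 twips per point = 1440. If you think in millimetres, a couple of small conversion helpers keep the numbers readable. This is a minimal sketch; `inchesToTwips` and `mmToTwips` are hypothetical helper names, not part of html-docxjs-compiler (docx itself also provides a `convertInchesToTwip` helper you may prefer).

```javascript
// Word page measurements use twips: 1 twip = 1/20 point, and 1 inch = 72 points
const TWIPS_PER_POINT = 20;
const POINTS_PER_INCH = 72;
const MM_PER_INCH = 25.4;

// Convert inches to whole twips (Word expects integer values)
function inchesToTwips(inches) {
  return Math.round(inches * POINTS_PER_INCH * TWIPS_PER_POINT);
}

// Convert millimetres to twips by going through inches
function mmToTwips(mm) {
  return inchesToTwips(mm / MM_PER_INCH);
}

// A 20 mm margin on every side, usable as properties.page.margin in a section
const margin = {
  top: mmToTwips(20),
  right: mmToTwips(20),
  bottom: mmToTwips(20),
  left: mmToTwips(20),
};

console.log(inchesToTwips(1)); // 1440
console.log(margin.top); // 1134
```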

Image parsing requires a strategy that defines how images are fetched. The library ships with one, HttpImageDownloadStrategy, which works with URLs and base64 strings. If you need to load images another way, for example from Firebase Storage or an S3 bucket, you can write your own strategy and add it to ImageDownloadStrategyManager.

A custom strategy must implement the ImageDownloadStrategy interface, which has two methods:

export interface ImageDownloadStrategy {
  /**
   * Check if this strategy can handle the given URL
   * @param url - Image URL to check
   * @returns True if this strategy can handle the URL
   */
  canHandle(url: string): boolean;

  /**
   * Download image and return as base64 data URI
   * @param url - Image URL to download
   * @returns Base64 data URI string (e.g., 'data:image/png;base64,...')
   */
  download(url: string): Promise<string>;
}
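As an illustration, here is a minimal sketch of a custom strategy that loads images from local disk when the `src` is a `file://` URL. `LocalFileImageDownloadStrategy` is a hypothetical name, and the sketch assumes the manager accepts any object exposing the two methods above. An S3 or Firebase variant would follow the same shape, swapping the file read for the relevant SDK call.

```javascript
const fs = require('fs/promises');
const path = require('path');
const { fileURLToPath } = require('url');

// Hypothetical strategy: loads images referenced as file:// URLs from local disk.
// It follows the ImageDownloadStrategy shape: canHandle() + download().
class LocalFileImageDownloadStrategy {
  constructor() {
    // Minimal extension -> MIME type map; extend as needed
    this.mimeTypes = {
      '.png': 'image/png',
      '.jpg': 'image/jpeg',
      '.jpeg': 'image/jpeg',
      '.gif': 'image/gif',
      '.bmp': 'image/bmp',
    };
  }

  // Claim only file:// URLs so http(s) URLs and data: URIs are left to other strategies
  canHandle(url) {
    return typeof url === 'string' && url.startsWith('file://');
  }

  // Read the file and return it as a base64 data URI, e.g. 'data:image/png;base64,...'
  async download(url) {
    const filePath = fileURLToPath(url);
    const ext = path.extname(filePath).toLowerCase();
    const mimeType = this.mimeTypes[ext];
    if (!mimeType) {
      throw new Error(`Unsupported image type: ${ext || '(none)'}`);
    }
    const data = await fs.readFile(filePath);
    return `data:${mimeType};base64,${data.toString('base64')}`;
  }
}
```

You would register it with the same `addStrategy` call the demo uses, for example `strategyManager.addStrategy(new LocalFileImageDownloadStrategy())`, and then reference images as `<img src="file:///absolute/path/logo.png" />`. Keeping `canHandle` narrow matters: presumably the manager uses it to decide which strategy gets each URL, so a strategy that claims everything would shadow the built-in HTTP one.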

How it works under the hood

Normally, when you work with docx, you manually create Paragraph, TextRun, Table and other XmlComponent objects.

This library automates that:

  1. You pass an HTML string to the compiler.
  2. It parses the HTML using cheerio.
  3. It walks the DOM tree and maps tags and inline styles to the appropriate docx.js constructs.
  4. It returns a list of XmlComponent instances you can drop directly into your document structure.
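To make steps 3 and 4 concrete, here is a stripped-down sketch of that kind of tree walk. It is not the library's code: it skips cheerio (the input is a hand-built tree of `{ tag, children }` and `{ text }` nodes) and returns plain objects standing in for docx.js `Paragraph` and `TextRun` instances. `collectRuns` and `compileBlock` are hypothetical names.

```javascript
// Hypothetical sketch: walk a DOM-like tree and map tags to docx-like objects.
// Plain objects stand in for docx.js Paragraph/TextRun instances.

// Inline tags contribute formatting flags to every text run beneath them
const INLINE_FORMATS = {
  strong: { bold: true },
  b: { bold: true },
  em: { italics: true },
  i: { italics: true },
  u: { underline: true },
  s: { strike: true },
};

// Heading tags map to a heading level on the paragraph
const HEADINGS = { h1: 'HEADING_1', h2: 'HEADING_2', h3: 'HEADING_3' };

// Flatten an inline subtree into text runs, accumulating formatting on the way down
function collectRuns(node, format = {}) {
  if (node.text !== undefined) {
    return [{ type: 'TextRun', text: node.text, ...format }];
  }
  const nextFormat = { ...format, ...(INLINE_FORMATS[node.tag] || {}) };
  return (node.children || []).flatMap((child) => collectRuns(child, nextFormat));
}

// Map block-level nodes to paragraph objects; a div is flattened into its children
function compileBlock(node) {
  if (node.tag === 'div') {
    return (node.children || []).flatMap((child) => compileBlock(child));
  }
  if (node.tag === 'p' || HEADINGS[node.tag]) {
    const paragraph = { type: 'Paragraph', children: collectRuns(node) };
    if (HEADINGS[node.tag]) paragraph.heading = HEADINGS[node.tag];
    return [paragraph];
  }
  return []; // stray text between blocks and unknown tags are skipped in this sketch
}

// Equivalent of: <div><h1>My Document</h1><p>Some <strong>bold</strong> text.</p></div>
const tree = {
  tag: 'div',
  children: [
    { tag: 'h1', children: [{ text: 'My Document' }] },
    {
      tag: 'p',
      children: [
        { text: 'Some ' },
        { tag: 'strong', children: [{ text: 'bold' }] },
        { text: ' text.' },
      ],
    },
  ],
};

console.log(JSON.stringify(compileBlock(tree), null, 2));
```

The key design point is that inline formatting is inherited: a `<strong>` nested inside an `<em>` yields runs that are both bold and italic, because each level merges its flags into the ones it passes down.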

You still have full control over the rest of the document (sections, headers, footers, page breaks, etc.) – the compiler only handles the HTML part.

📋 Supported HTML Elements

Block Elements

| Element | Description | Styling support |
|---|---|---|
| h1 - h6 | Headings (converted to DOCX heading styles) | |
| p | Paragraphs | ✅ text-align, color, etc. |
| div | Division container | |
| ul, ol | Unordered/ordered lists | ✅ Nested lists supported |
| li | List items | |
| table | Tables | |
| tr | Table rows | |
| td, th | Table cells/headers | ✅ colspan, rowspan, background-color, vertical-align |
| thead, tbody | Table sections | |
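For table cells, the HTML attributes map fairly directly onto docx.js `TableCell` options (`columnSpan`, `rowSpan`, `verticalAlign`). The sketch below shows one way to do that mapping from a plain attribute object. `cellOptionsFromAttributes` is a hypothetical helper, not part of the library, and it covers only the span attributes and `vertical-align`.

```javascript
// Hypothetical sketch: map <td>/<th> attributes onto an object shaped like
// docx.js TableCell options. Input is a plain attribute map, not a cheerio element.

// CSS vertical-align keywords -> OOXML vAlign values ("middle" is "center" in Word)
const VERTICAL_ALIGN = new Map([
  ['top', 'top'],
  ['middle', 'center'],
  ['bottom', 'bottom'],
]);

// Parse colspan/rowspan; missing or invalid values fall back to the HTML default of 1
function readSpan(value) {
  const n = Number.parseInt(value, 10);
  return Number.isInteger(n) && n > 0 ? n : 1;
}

function cellOptionsFromAttributes(attrs) {
  const options = {};
  const columnSpan = readSpan(attrs.colspan);
  const rowSpan = readSpan(attrs.rowspan);
  // Only set spans that differ from the default
  if (columnSpan > 1) options.columnSpan = columnSpan;
  if (rowSpan > 1) options.rowSpan = rowSpan;

  // Pick vertical-align out of the inline style, if present and recognized
  const match = /vertical-align\s*:\s*([a-z]+)/i.exec(attrs.style || '');
  const keyword = match && match[1].toLowerCase();
  if (keyword && VERTICAL_ALIGN.has(keyword)) {
    options.verticalAlign = VERTICAL_ALIGN.get(keyword);
  }
  return options;
}

// <td colspan="2" style="vertical-align: middle">
console.log(cellOptionsFromAttributes({ colspan: '2', style: 'vertical-align: middle' }));
// { columnSpan: 2, verticalAlign: 'center' }
```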

Inline Elements

| Element | Description | Styling support |
|---|---|---|
| strong, b | Bold text | |
| em, i | Italic text | |
| u | Underlined text | |
| s | Strikethrough text | |
| sub | Subscript | |
| sup | Superscript | |
| span | Inline container | ✅ color, background-color, etc. |
| a | Hyperlinks | ✅ Creates clickable links |
| br | Line break | |
| img | Images | ✅ Auto-resize, multiple sources |

Supported CSS Styles

  • Colors: 147+ named colors + hex values (e.g., #FF0000, red, darkblue)
  • Text Alignment: left, center, right, justify
  • Vertical Alignment: top, middle, bottom (table cells)
  • Background Color: For table cells and spans
  • Font Styles: bold, italic, underline, strikethrough
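The list above doesn't say how style values are converted. One plausible approach, sketched here under assumptions, is to split the `style` attribute into declarations and normalize each value into the form docx.js options use: a 6-digit hex string without `#` for colors (as in docx.js's `color: "FF0000"`), and the OOXML justification keywords behind `AlignmentType` for alignment. `parseInlineStyle`, `toHexColor` and the trimmed `NAMED_COLORS` table are hypothetical; a full version would list every CSS named color.

```javascript
// Hypothetical helpers: turn an inline style attribute into docx.js-style option values.
// Only a few CSS named colors are listed; a real table would cover all of them.
const NAMED_COLORS = new Map([
  ['red', 'FF0000'],
  ['blue', '0000FF'],
  ['yellow', 'FFFF00'],
  ['darkblue', '00008B'],
  ['darkred', '8B0000'],
  ['lightyellow', 'FFFFE0'],
]);

// CSS text-align keywords -> OOXML justification values (justify is "both" in Word)
const ALIGNMENT = { left: 'left', center: 'center', right: 'right', justify: 'both' };

// Split "color: blue; text-align: center" into { color: 'blue', 'text-align': 'center' }
function parseInlineStyle(style = '') {
  const result = {};
  for (const declaration of style.split(';')) {
    const colon = declaration.indexOf(':');
    if (colon === -1) continue; // skip empty or malformed declarations
    const prop = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();
    if (prop && value) result[prop] = value;
  }
  return result;
}

// Convert a named, #RGB or #RRGGBB color to uppercase 6-digit hex without '#'
function toHexColor(value) {
  if (!value) return undefined;
  const v = value.trim().toLowerCase();
  if (NAMED_COLORS.has(v)) return NAMED_COLORS.get(v);
  const short = /^#([0-9a-f]{3})$/.exec(v);
  if (short) return short[1].split('').map((c) => c + c).join('').toUpperCase();
  const long = /^#([0-9a-f]{6})$/.exec(v);
  if (long) return long[1].toUpperCase();
  return undefined; // unrecognized: let the caller fall back to the default color
}

// Example: the kind of style string used in the demos above
const style = parseInlineStyle('color: darkblue; text-align: center; background-color: #ff0');
const options = {
  color: toHexColor(style.color), // '00008B'
  alignment: ALIGNMENT[style['text-align']], // 'center'
  backgroundColor: toHexColor(style['background-color']), // 'FFFF00'
};
console.log(options);
```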

Try it out and tell me what you think

You can find the package on npm:

👉 html-docxjs-compiler

If you’re already using docx and you’re tired of hand-building XmlComponents for content that already exists as HTML, this library might save you a lot of time (and patience).

Feedback, issues and ideas for improvements are very welcome – especially real-world examples of HTML you’d like it to handle better.
