DEV Community

Hardi

XML to PDF Converter: The Complete Guide to Document Generation

Converting XML to PDF is a cornerstone of enterprise document generation—from invoices and reports to contracts and compliance documents. Whether you're building an invoicing system, generating reports, or automating document workflows, mastering XML to PDF conversion is essential. Let's explore the techniques, tools, and best practices for professional document generation.

Why XML to PDF Conversion Matters

The Document Generation Challenge

// The business problem
const documentGenerationNeeds = {
  invoices: 'Generate thousands of invoices monthly',
  reports: 'Automated weekly/monthly reports',
  contracts: 'Dynamic contract generation',
  receipts: 'Transaction receipts on-demand',
  statements: 'Account statements',
  labels: 'Shipping labels and barcodes',

  challenges: [
    'Consistent formatting across documents',
    'Professional PDF output',
    'Dynamic data injection',
    'Template management',
    'Internationalization',
    'Performance at scale',
    'Compliance requirements'
  ]
};

// Why XML?
const xmlAdvantages = {
  separation: 'Data separate from presentation',
  structure: 'Hierarchical, validates easily',
  standard: 'Universal format for data exchange',
  transformation: 'XSLT for powerful transformations',
  integration: 'Works with existing systems',

  realWorld: [
    'ERP systems output XML',
    'APIs return XML data',
    'EDI standards use XML',
    'Government/healthcare mandates XML',
    'Financial systems use XML'
  ]
};

console.log('XML to PDF: Bridge data to professional documents');

Real-World Impact

// Invoice generation at scale
const invoiceSystem = {
  before: {
    method: 'Manual word processor',
    time: '30 minutes per invoice',
    errors: 'Frequent data entry mistakes',
    cost: '$25 per invoice in labor',
    scale: '100 invoices/month = $2,500/month'
  },

  after: {
    method: 'Automated XML to PDF',
    time: '3 seconds per invoice',
    errors: 'Far fewer (data validated before rendering)',
    cost: '$0.01 per invoice in compute',
    scale: '10,000 invoices/month = $100/month',
    // Same volume as "before" (100 invoices): $2,500 manual vs. $1 compute
    savings: '$2,499/month = $29,988/year'
  },

  benefits: [
    'Instant generation',
    'Consistent formatting',
    'No manual data-entry errors',
    'Scales horizontally by adding workers',
    'Easy template updates',
    'Multi-language support'
  ]
};

// Real cost savings example:
// Company: 1,000 invoices/month
// Manual: 500 hours/month @ $50/hr = $25,000
// Automated: $10 in server costs
// Savings: $24,990/month = $299,880/year
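The arithmetic behind these figures is easy to parameterize so you can plug in your own volume, minutes per invoice, and hourly rate. A small sketch (the function name and its inputs are illustrative, not from any library) that keeps all money in integer cents to avoid floating-point rounding:

```javascript
// Illustrative cost model: manual vs. automated invoice generation.
// All money values are integer cents until the final conversion to dollars.
function monthlyCosts({ invoicesPerMonth, minutesPerInvoice, hourlyRateCents, computeCentsPerInvoice }) {
  // Labor cost = total minutes × hourly rate ÷ 60
  const manualCents = (invoicesPerMonth * minutesPerInvoice * hourlyRateCents) / 60;
  // Compute cost = invoices × per-invoice compute cost
  const automatedCents = invoicesPerMonth * computeCentsPerInvoice;
  const savingsCents = manualCents - automatedCents;

  return {
    manual: manualCents / 100,
    automated: automatedCents / 100,
    monthlySavings: savingsCents / 100,
    yearlySavings: (savingsCents * 12) / 100
  };
}

// The 1,000-invoices/month example: 30 min per invoice at $50/hr, $0.01 compute each
const example = monthlyCosts({
  invoicesPerMonth: 1000,
  minutesPerInvoice: 30,
  hourlyRateCents: 5000,
  computeCentsPerInvoice: 1
});
// Values: manual 25000, automated 10, monthlySavings 24990, yearlySavings 299880
console.log(example);
```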

XML to PDF Approaches

1. XSL-FO (XSL Formatting Objects)

<?xml version="1.0" encoding="UTF-8"?>
<!-- invoice.xml -->
<invoice>
  <number>INV-2024-001</number>
  <date>2024-01-31</date>
  <customer>
    <name>Acme Corporation</name>
    <address>123 Business St, New York, NY 10001</address>
  </customer>
  <items>
    <item>
      <description>Web Development Services</description>
      <quantity>40</quantity>
      <rate>150.00</rate>
      <amount>6000.00</amount>
    </item>
    <item>
      <description>Hosting (Annual)</description>
      <quantity>1</quantity>
      <rate>1200.00</rate>
      <amount>1200.00</amount>
    </item>
  </items>
  <total>7200.00</total>
</invoice>
<?xml version="1.0" encoding="UTF-8"?>
<!-- invoice.xsl - XSL-FO transformation -->
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="invoice"
          page-height="11in" page-width="8.5in"
          margin-top="1in" margin-bottom="1in"
          margin-left="1in" margin-right="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="invoice">
        <fo:flow flow-name="xsl-region-body">

          <!-- Header -->
          <fo:block font-size="24pt" font-weight="bold" 
            space-after="20pt" text-align="center">
            INVOICE
          </fo:block>

          <!-- Invoice details -->
          <fo:block space-after="10pt">
            Invoice #: <xsl:value-of select="invoice/number"/>
          </fo:block>
          <fo:block space-after="20pt">
            Date: <xsl:value-of select="invoice/date"/>
          </fo:block>

          <!-- Customer info -->
          <fo:block font-weight="bold" space-after="5pt">
            Bill To:
          </fo:block>
          <fo:block space-after="5pt">
            <xsl:value-of select="invoice/customer/name"/>
          </fo:block>
          <fo:block space-after="20pt">
            <xsl:value-of select="invoice/customer/address"/>
          </fo:block>

          <!-- Items table -->
          <fo:table table-layout="fixed" width="100%" 
            border="1pt solid black" space-after="20pt">
            <fo:table-column column-width="50%"/>
            <fo:table-column column-width="15%"/>
            <fo:table-column column-width="15%"/>
            <fo:table-column column-width="20%"/>

            <fo:table-header background-color="#e0e0e0">
              <fo:table-row>
                <fo:table-cell padding="5pt" border="1pt solid black">
                  <fo:block font-weight="bold">Description</fo:block>
                </fo:table-cell>
                <fo:table-cell padding="5pt" border="1pt solid black">
                  <fo:block font-weight="bold">Qty</fo:block>
                </fo:table-cell>
                <fo:table-cell padding="5pt" border="1pt solid black">
                  <fo:block font-weight="bold">Rate</fo:block>
                </fo:table-cell>
                <fo:table-cell padding="5pt" border="1pt solid black">
                  <fo:block font-weight="bold">Amount</fo:block>
                </fo:table-cell>
              </fo:table-row>
            </fo:table-header>

            <fo:table-body>
              <xsl:for-each select="invoice/items/item">
                <fo:table-row>
                  <fo:table-cell padding="5pt" border="1pt solid black">
                    <fo:block><xsl:value-of select="description"/></fo:block>
                  </fo:table-cell>
                  <fo:table-cell padding="5pt" border="1pt solid black">
                    <fo:block><xsl:value-of select="quantity"/></fo:block>
                  </fo:table-cell>
                  <fo:table-cell padding="5pt" border="1pt solid black">
                    <fo:block>$<xsl:value-of select="rate"/></fo:block>
                  </fo:table-cell>
                  <fo:table-cell padding="5pt" border="1pt solid black">
                    <fo:block>$<xsl:value-of select="amount"/></fo:block>
                  </fo:table-cell>
                </fo:table-row>
              </xsl:for-each>
            </fo:table-body>
          </fo:table>

          <!-- Total -->
          <fo:block font-size="14pt" font-weight="bold" text-align="right">
            Total: $<xsl:value-of select="invoice/total"/>
          </fo:block>

        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
// Node.js: Convert XML to PDF using Apache FOP (the `fop` CLI must be on PATH)
const { execFile } = require('child_process');
const { promisify } = require('util');
const fs = require('fs').promises;

// execFile is callback-based; promisify it so it can be awaited
const execFileAsync = promisify(execFile);

async function xmlToPdfWithFOP(xmlPath, xslPath, outputPath) {
  console.log(`Converting ${xmlPath} to PDF...\n`);

  try {
    // Apache FOP command: apply the XSL-FO stylesheet to the XML and render a PDF
    await execFileAsync('fop', [
      '-xml', xmlPath,
      '-xsl', xslPath,
      '-pdf', outputPath
    ]);

    const stats = await fs.stat(outputPath);
    console.log(`✓ PDF generated: ${outputPath}`);
    console.log(`  File size: ${(stats.size / 1024).toFixed(2)}KB`);

    return outputPath;
  } catch (error) {
    console.error('✗ Conversion failed:', error.message);
    throw error;
  }
}

// Usage (CommonJS has no top-level await, so handle the promise directly)
xmlToPdfWithFOP('invoice.xml', 'invoice.xsl', 'invoice.pdf')
  .catch(() => process.exit(1));

2. HTML/CSS as Intermediate (Modern Approach)

// Convert XML → HTML → PDF using modern tools
const fs = require('fs').promises;
const puppeteer = require('puppeteer');
const { parseString } = require('xml2js');
const Handlebars = require('handlebars');

class XMLToPDFConverter {
  constructor() {
    this.browser = null;
  }

  async initialize() {
    this.browser = await puppeteer.launch({
      headless: 'new',
      args: ['--no-sandbox']
    });
  }

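  async parseXML(xmlString) {
    return new Promise((resolve, reject) => {
      // explicitArray: false keeps single child elements as plain values, so the
      // template can use paths like invoice.customer.name. Caveat: a repeated
      // element that occurs only once (e.g. a single <item>) becomes an object.
      parseString(xmlString, { explicitArray: false }, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });
  }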

  async convertToPDF(xmlPath, templatePath, outputPath) {
    console.log(`\nConverting ${xmlPath} to PDF...\n`);

    try {
      // Read XML
      const xmlContent = await fs.readFile(xmlPath, 'utf8');
      const data = await this.parseXML(xmlContent);

      console.log('✓ XML parsed');

      // Read HTML template
      const templateContent = await fs.readFile(templatePath, 'utf8');
      const template = Handlebars.compile(templateContent);

      // Generate HTML
      const html = template(data);
      console.log('✓ HTML generated');

      // Create PDF; close the page even if rendering fails, so pages don't leak
      const page = await this.browser.newPage();
      try {
        await page.setContent(html, { waitUntil: 'networkidle0' });

        await page.pdf({
          path: outputPath,
          format: 'A4',
          margin: {
            top: '20mm',
            right: '20mm',
            bottom: '20mm',
            left: '20mm'
          },
          printBackground: true
        });
      } finally {
        await page.close();
      }

      console.log(`✓ PDF generated: ${outputPath}`);

      const stats = await fs.stat(outputPath);
      console.log(`  File size: ${(stats.size / 1024).toFixed(2)}KB\n`);

      return outputPath;
    } catch (error) {
      console.error('✗ Conversion failed:', error);
      throw error;
    }
  }

  async close() {
    if (this.browser) {
      await this.browser.close();
    }
  }
}

// HTML template (save as invoice.html; convertToPDF reads it from disk)
const invoiceTemplate = `
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <style>
    body {
      font-family: Arial, sans-serif;
      font-size: 12pt;
      line-height: 1.6;
    }
    .header {
      text-align: center;
      font-size: 24pt;
      font-weight: bold;
      margin-bottom: 30px;
    }
    .invoice-details {
      margin-bottom: 20px;
    }
    .customer-info {
      margin-bottom: 30px;
    }
    table {
      width: 100%;
      border-collapse: collapse;
      margin-bottom: 20px;
    }
    th, td {
      border: 1px solid #000;
      padding: 10px;
      text-align: left;
    }
    th {
      background-color: #e0e0e0;
      font-weight: bold;
    }
    .total {
      text-align: right;
      font-size: 14pt;
      font-weight: bold;
      margin-top: 20px;
    }
  </style>
</head>
<body>
  <div class="header">INVOICE</div>

  <div class="invoice-details">
    <div>Invoice #: {{invoice.number}}</div>
    <div>Date: {{invoice.date}}</div>
  </div>

  <div class="customer-info">
    <strong>Bill To:</strong><br>
    {{invoice.customer.name}}<br>
    {{invoice.customer.address}}
  </div>

  <table>
    <thead>
      <tr>
        <th>Description</th>
        <th>Quantity</th>
        <th>Rate</th>
        <th>Amount</th>
      </tr>
    </thead>
    <tbody>
      {{#each invoice.items.item}}
      <tr>
        <td>{{description}}</td>
        <td>{{quantity}}</td>
        <td>\${{rate}}</td>
        <td>\${{amount}}</td>
      </tr>
      {{/each}}
    </tbody>
  </table>

  <div class="total">
    Total: \${{invoice.total}}
  </div>
</body>
</html>
`;

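// Usage (CommonJS has no top-level await, so wrap the calls in an async function)
(async () => {
  // convertToPDF reads the template from disk, so write it out first
  await fs.writeFile('invoice.html', invoiceTemplate);

  const converter = new XMLToPDFConverter();
  await converter.initialize();
  try {
    await converter.convertToPDF('invoice.xml', 'invoice.html', 'invoice.pdf');
  } finally {
    await converter.close();
  }
})();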
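By default xml2js wraps every child element in an array (`invoice.customer[0].name[0]`), so dotted template paths such as `{{invoice.customer.name}}` resolve to nothing unless the parsed data is reshaped. One way to do that, sketched below as a hypothetical helper (not part of xml2js or Handlebars), is to unwrap single-element arrays while keeping known repeating elements such as `item` as arrays, so `{{#each}}` still receives a list when an invoice has only one line item:

```javascript
// Hypothetical helper: flatten xml2js's default output (explicitArray: true).
// Single-element arrays are unwrapped; keys listed in `repeated` always stay arrays.
function unwrapXml2js(node, repeated = new Set(['item'])) {
  if (Array.isArray(node)) {
    return node.map((child) => unwrapXml2js(child, repeated));
  }
  if (node === null || typeof node !== 'object') {
    return node; // text content (strings) is returned unchanged
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const unwrap = Array.isArray(value) && value.length === 1 && !repeated.has(key);
    out[key] = unwrapXml2js(unwrap ? value[0] : value, repeated);
  }
  return out;
}

// xml2js default shape for a one-item variant of invoice.xml
const parsed = {
  invoice: {
    number: ['INV-2024-001'],
    customer: [{ name: ['Acme Corporation'], address: ['123 Business St, New York, NY 10001'] }],
    items: [{ item: [{ description: ['Hosting (Annual)'], quantity: ['1'], rate: ['1200.00'], amount: ['1200.00'] }] }],
    total: ['1200.00']
  }
};

const data = unwrapXml2js(parsed);
console.log(data.invoice.customer.name); // Acme Corporation
console.log(data.invoice.items.item.length); // 1 (still an array)
```

The flattened object can then be passed to the compiled Handlebars template in place of the raw parse result. Add any other repeating element names (for example a `line` or `entry` element in your own schema) to the `repeated` set.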

3. Direct PDF Generation with PDFKit

// Generate PDF directly from XML (no intermediate HTML)
const fs = require('fs').promises;
const { createWriteStream } = require('fs'); // streams are not part of fs.promises
const PDFDocument = require('pdfkit');
const { parseString } = require('xml2js');

class DirectXMLToPDF {
  async convert(xmlPath, outputPath) {
    console.log(`\nGenerating PDF from ${xmlPath}...\n`);

    // Parse XML (xml2js default shape: every child element is an array)
    const xmlContent = await fs.readFile(xmlPath, 'utf8');
    const data = await this.parseXML(xmlContent);

    const invoice = data.invoice;

    // Create PDF
    const doc = new PDFDocument({
      size: 'A4',
      margin: 50
    });

    // Pipe to file
    const stream = createWriteStream(outputPath);
    doc.pipe(stream);

    // Header
    doc.fontSize(24)
       .text('INVOICE', { align: 'center' })
       .moveDown(2);

    // Invoice details
    doc.fontSize(12)
       .text(`Invoice #: ${invoice.number[0]}`)
       .text(`Date: ${invoice.date[0]}`)
       .moveDown();

    // Customer info
    doc.fontSize(12)
       .font('Helvetica-Bold')
       .text('Bill To:')
       .font('Helvetica')
       .text(invoice.customer[0].name[0])
       .text(invoice.customer[0].address[0])
       .moveDown(2);

    // Table header
    const tableTop = doc.y;
    const col1X = 50;
    const col2X = 300;
    const col3X = 380;
    const col4X = 460;

    doc.font('Helvetica-Bold');
    doc.text('Description', col1X, tableTop);
    doc.text('Qty', col2X, tableTop);
    doc.text('Rate', col3X, tableTop);
    doc.text('Amount', col4X, tableTop);

    // Draw line under header
    doc.moveTo(50, tableTop + 20)
       .lineTo(550, tableTop + 20)
       .stroke();

    // Items
    doc.font('Helvetica');
    let y = tableTop + 30;

    invoice.items[0].item.forEach(item => {
      doc.text(item.description[0], col1X, y, { width: 240 });
      doc.text(item.quantity[0], col2X, y);
      doc.text(`$${item.rate[0]}`, col3X, y);
      doc.text(`$${item.amount[0]}`, col4X, y);
      y += 25; // fixed row height: a description that wraps will overlap the next row
    });

    // Total: draw at the tracked y position, right-aligned across the table width
    y += 20;
    doc.fontSize(14)
       .font('Helvetica-Bold')
       .text(`Total: $${invoice.total[0]}`, col1X, y, { width: 500, align: 'right' });

    // Finalize
    doc.end();

    return new Promise((resolve, reject) => {
      stream.on('finish', async () => {
        const stats = await fs.stat(outputPath);
        console.log(`✓ PDF generated: ${outputPath}`);
        console.log(`  File size: ${(stats.size / 1024).toFixed(2)}KB\n`);
        resolve(outputPath);
      });
      stream.on('error', reject);
    });
  }

  async parseXML(xmlString) {
    return new Promise((resolve, reject) => {
      parseString(xmlString, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });
  }
}

// Usage (CommonJS has no top-level await)
const directConverter = new DirectXMLToPDF();
directConverter.convert('invoice.xml', 'invoice.pdf').catch(console.error);
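The fixed row step above assumes every description fits on one line and that the table never reaches the bottom of the page. A more robust approach, sketched here as a pure function with illustrative names, is to measure each row first (PDFKit's `doc.heightOfString(text, { width })` can supply the heights), then assign every row a page and y position, starting a new page whenever a row would cross the bottom margin. When a row lands on a new page, you would call `doc.addPage()` before drawing it.

```javascript
// Hypothetical helper: place table rows top-to-bottom, breaking to a new page
// whenever the next row would extend past `bottom`. Units are PDF points.
function layoutRows(rowHeights, { startY, top, bottom, gap = 5 }) {
  const positions = [];
  let page = 0;
  let y = startY;

  for (const height of rowHeights) {
    // Break only if the row doesn't fit and we're not already at the top of a page
    // (avoids emitting an empty page for a row taller than the whole page).
    if (y + height > bottom && y > top) {
      page += 1;
      y = top;
    }
    positions.push({ page, y });
    y += height + gap;
  }
  return positions;
}

// A4 is 841.89pt tall; with a 50pt margin the usable bottom edge is 791.89pt
const rows = layoutRows([14, 28, 14], { startY: 760, top: 50, bottom: 791.89 });
console.log(rows);
```

Separating measurement from drawing keeps the page-break logic testable without generating a PDF.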

4. Enterprise Solution with Apache FOP

# Install Apache FOP
# Download from: https://xmlgraphics.apache.org/fop/download.html

# Convert XML to PDF
fop -xml invoice.xml -xsl invoice.xsl -pdf invoice.pdf

# With configuration
fop -c fop.xconf -xml invoice.xml -xsl invoice.xsl -pdf invoice.pdf

# Other output formats: FOP writes one output per run
fop -xml invoice.xml -xsl invoice.xsl -ps invoice.ps
// Node.js wrapper for Apache FOP
const { spawn } = require('child_process');
const fs = require('fs').promises;

class ApacheFOPConverter {
  constructor(fopPath = 'fop') {
    this.fopPath = fopPath;
  }

  async convert(xmlPath, xslPath, outputPath, options = {}) {
    const {
      format = 'pdf',
      config = null
    } = options;

    const args = [];

    if (config) {
      args.push('-c', config);
    }

    args.push(
      '-xml', xmlPath,
      '-xsl', xslPath,
      `-${format}`, outputPath
    );

    console.log(`\nRunning: ${this.fopPath} ${args.join(' ')}\n`);

    return new Promise((resolve, reject) => {
      const fop = spawn(this.fopPath, args);

      let stderr = '';

      // Emitted if the process cannot be started at all (e.g. `fop` is not on PATH)
      fop.on('error', reject);

      fop.stdout.on('data', (data) => {
        console.log(data.toString());
      });

      fop.stderr.on('data', (data) => {
        stderr += data.toString();
      });

      fop.on('close', (code) => {
        if (code === 0) {
          fs.stat(outputPath)
            .then((stats) => {
              console.log(`✓ ${format.toUpperCase()} generated: ${outputPath}`);
              console.log(`  File size: ${(stats.size / 1024).toFixed(2)}KB\n`);
              resolve(outputPath);
            })
            .catch(reject);
        } else {
          console.error('✗ Conversion failed:');
          console.error(stderr);
          reject(new Error(`FOP exited with code ${code}`));
        }
      });
    });
  }

  async batchConvert(files) {
    console.log(`\nBatch converting ${files.length} files...\n`);

    const results = [];

    for (const { xml, xsl, output } of files) {
      try {
        await this.convert(xml, xsl, output);
        results.push({ success: true, output });
      } catch (error) {
        results.push({ success: false, error: error.message });
      }
    }

    const successful = results.filter(r => r.success).length;
    console.log(`\n✓ Batch complete: ${successful}/${files.length} successful\n`);

    return results;
  }
}

5. Express API for XML to PDF Conversion

const express = require('express');
const multer = require('multer');
const { parseString } = require('xml2js');
const puppeteer = require('puppeteer');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

let browser;

// Initialize browser
(async () => {
  browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox']
  });
  console.log('Browser initialized');
})();

// Convert XML to PDF
app.post('/api/xml-to-pdf', upload.single('xml'), async (req, res) => {
  try {
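    if (!browser) {
      // The browser is launched asynchronously at startup and may not be ready yet
      return res.status(503).json({ error: 'PDF renderer is starting, try again shortly' });
    }
    if (!req.file) {
      return res.status(400).json({ error: 'XML file required' });
    }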

    const xmlContent = req.file.buffer.toString('utf8');
    const template = req.body.template || 'default';

    console.log(`Converting XML to PDF (template: ${template})...`);

    // Parse XML
    const data = await new Promise((resolve, reject) => {
      parseString(xmlContent, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });

    // Generate HTML from template
    const html = generateHTML(data, template);

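    // Generate PDF; close the page even if rendering fails.
    // Buffer.from: recent Puppeteer versions return a Uint8Array, which Express 4's
    // res.send() would serialize as JSON instead of sending binary data.
    const page = await browser.newPage();
    let pdfBuffer;
    try {
      await page.setContent(html, { waitUntil: 'networkidle0' });

      pdfBuffer = Buffer.from(await page.pdf({
        format: 'A4',
        margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' },
        printBackground: true
      }));
    } finally {
      await page.close();
    }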

    console.log(`✓ PDF generated (${pdfBuffer.length} bytes)`);

    // Send PDF
    res.set({
      'Content-Type': 'application/pdf',
      'Content-Disposition': 'attachment; filename="document.pdf"',
      'Content-Length': pdfBuffer.length
    });

    res.send(pdfBuffer);

  } catch (error) {
    console.error('Conversion error:', error);
    res.status(500).json({ 
      error: 'Conversion failed',
      details: error.message
    });
  }
});

// Batch conversion
app.post('/api/xml-to-pdf/batch', upload.array('files'), async (req, res) => {
  try {
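    if (!browser) {
      return res.status(503).json({ error: 'PDF renderer is starting, try again shortly' });
    }
    if (!req.files || req.files.length === 0) {
      return res.status(400).json({ error: 'No files uploaded' });
    }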

    console.log(`Batch converting ${req.files.length} files...`);

    const results = [];

    for (const file of req.files) {
      try {
        const xmlContent = file.buffer.toString('utf8');
        const data = await new Promise((resolve, reject) => {
          parseString(xmlContent, (err, result) => {
            if (err) reject(err);
            else resolve(result);
          });
        });

        const html = generateHTML(data, 'default');

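        const page = await browser.newPage();
        let pdfBuffer;
        try {
          await page.setContent(html, { waitUntil: 'networkidle0' });

          // Buffer.from: on a Uint8Array (recent Puppeteer), toString('base64') is not base64
          pdfBuffer = Buffer.from(await page.pdf({
            format: 'A4',
            printBackground: true
          }));
        } finally {
          await page.close();
        }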

        results.push({
          filename: file.originalname,
          success: true,
          size: pdfBuffer.length,
          pdf: pdfBuffer.toString('base64')
        });
      } catch (error) {
        results.push({
          filename: file.originalname,
          success: false,
          error: error.message
        });
      }
    }

    const successful = results.filter(r => r.success).length;
    console.log(`✓ Batch complete: ${successful}/${req.files.length} successful`);

    res.json({
      total: req.files.length,
      successful,
      results
    });

  } catch (error) {
    console.error('Batch conversion error:', error);
    res.status(500).json({ error: 'Batch conversion failed' });
  }
});

// Escape text before inserting it into HTML: the uploaded XML is untrusted input,
// and the resulting page is rendered by a real browser
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function generateHTML(data, template) {
  // Simple invoice template (`template` is accepted, but only one layout exists here)
  const invoice = data.invoice;
  return `
    <!DOCTYPE html>
    <html>
    <head>
      <style>
        body { font-family: Arial; padding: 40px; }
        .header { font-size: 24pt; font-weight: bold; text-align: center; margin-bottom: 30px; }
        table { width: 100%; border-collapse: collapse; margin: 20px 0; }
        th, td { border: 1px solid #000; padding: 10px; text-align: left; }
        th { background-color: #e0e0e0; }
        .total { text-align: right; font-size: 14pt; font-weight: bold; margin-top: 20px; }
      </style>
    </head>
    <body>
      <div class="header">INVOICE</div>
      <div>Invoice #: ${escapeHTML(invoice.number[0])}</div>
      <div>Date: ${escapeHTML(invoice.date[0])}</div>
      <br>
      <strong>Bill To:</strong><br>
      ${escapeHTML(invoice.customer[0].name[0])}<br>
      ${escapeHTML(invoice.customer[0].address[0])}
      <table>
        <tr>
          <th>Description</th>
          <th>Qty</th>
          <th>Rate</th>
          <th>Amount</th>
        </tr>
        ${invoice.items[0].item.map(item => `
          <tr>
            <td>${escapeHTML(item.description[0])}</td>
            <td>${escapeHTML(item.quantity[0])}</td>
            <td>$${escapeHTML(item.rate[0])}</td>
            <td>$${escapeHTML(item.amount[0])}</td>
          </tr>
        `).join('')}
      </table>
      <div class="total">Total: $${escapeHTML(invoice.total[0])}</div>
    </body>
    </html>
  `;
}

app.listen(3000, () => {
  console.log('XML to PDF API running on port 3000');
  console.log('POST /api/xml-to-pdf - Convert single XML');
  console.log('POST /api/xml-to-pdf/batch - Batch conversion');
});
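To call the single-file endpoint from another Node.js process, a client only needs to send multipart form data with the XML in the `xml` field (the name `upload.single('xml')` expects) plus an optional `template` text field. A sketch, assuming Node.js 18+ (where `fetch`, `FormData`, and `Blob` are globals) and the server above on `http://localhost:3000`; both function names are illustrative:

```javascript
// Build the multipart body; field names must match upload.single('xml') and req.body.template
function buildUploadForm(xmlString, filename = 'invoice.xml', template = 'default') {
  const form = new FormData();
  form.append('template', template);
  form.append('xml', new Blob([xmlString], { type: 'application/xml' }), filename);
  return form;
}

// POST the XML and return the PDF bytes as a Buffer
async function convertViaApi(xmlString, baseUrl = 'http://localhost:3000') {
  const res = await fetch(`${baseUrl}/api/xml-to-pdf`, {
    method: 'POST',
    body: buildUploadForm(xmlString) // fetch sets the multipart Content-Type and boundary
  });
  if (!res.ok) {
    throw new Error(`Conversion failed: HTTP ${res.status} ${await res.text()}`);
  }
  return Buffer.from(await res.arrayBuffer());
}

// Usage (with the server running):
// convertViaApi(require('fs').readFileSync('invoice.xml', 'utf8'))
//   .then((pdf) => require('fs').writeFileSync('invoice.pdf', pdf));
```

Do not set the `Content-Type` header by hand here; the multipart boundary has to match the body, and `fetch` generates both together.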

6. Quick Online Conversion

For rapid testing, prototyping, or one-off conversions during development, an online XML to PDF converter is a quick way to check how your data renders. This is particularly useful when:

  • Testing templates: Preview how XML data renders in PDF
  • Client demonstrations: Show invoice/report samples
  • Template development: Iterate on layout and styling
  • Data validation: Verify XML structure before automation

For production systems, integrate automated conversion into your application for scalability and reliability.
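For the data-validation step, it helps to check the numbers before any PDF is rendered. A minimal sketch, assuming the default xml2js output shape used in the PDFKit and Express examples (every child element wrapped in an array); `toCents` and `validateInvoice` are illustrative names, and money is compared in integer cents to avoid floating-point drift:

```javascript
// Parse a non-negative decimal string with up to two decimals ("6000.00", "7.5") into integer cents
function toCents(value) {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(String(value).trim());
  if (!match) {
    throw new Error(`Not a valid amount: ${value}`);
  }
  return Number(match[1]) * 100 + Number((match[2] || '').padEnd(2, '0'));
}

// Check that each amount = quantity × rate and that total = sum of amounts.
// Returns a list of problems; an empty list means the invoice is consistent.
function validateInvoice(data) {
  const errors = [];
  const invoice = data.invoice;
  let sumCents = 0;

  invoice.items[0].item.forEach((item, i) => {
    const quantity = Number(item.quantity[0]);
    const amountCents = toCents(item.amount[0]);
    if (!Number.isInteger(quantity) || quantity * toCents(item.rate[0]) !== amountCents) {
      errors.push(`item ${i + 1}: ${item.quantity[0]} x ${item.rate[0]} does not equal ${item.amount[0]}`);
    }
    sumCents += amountCents;
  });

  if (sumCents !== toCents(invoice.total[0])) {
    errors.push(`total ${invoice.total[0]} does not equal the sum of item amounts`);
  }
  return errors;
}

// The invoice.xml sample (abridged), in the shape xml2js.parseString produces by default
const parsed = {
  invoice: {
    number: ['INV-2024-001'],
    items: [{
      item: [
        { description: ['Web Development Services'], quantity: ['40'], rate: ['150.00'], amount: ['6000.00'] },
        { description: ['Hosting (Annual)'], quantity: ['1'], rate: ['1200.00'], amount: ['1200.00'] }
      ]
    }],
    total: ['7200.00']
  }
};

console.log(validateInvoice(parsed)); // []
```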

Real-World Applications

1. Invoice Generation System

// Invoice generator (sketch). Assumes `db` is a node-postgres Pool, whose query()
// resolves to { rows }, and `converter` is an initialized XMLToPDFConverter.
const fs = require('fs').promises;

class InvoiceGenerator {
  constructor(converter) {
    this.converter = converter;
  }

  async generateInvoice(orderId) {
    console.log(`\nGenerating invoice for order ${orderId}...\n`);

    // Fetch order data from database
    const { rows: [order] } = await db.query(
      'SELECT * FROM orders WHERE id = $1',
      [orderId]
    );
    if (!order) {
      throw new Error(`Order ${orderId} not found`);
    }

    const { rows: items } = await db.query(
      'SELECT * FROM order_items WHERE order_id = $1',
      [orderId]
    );

    // Convert to XML
    const xml = this.createInvoiceXML(order, items);

    // Save XML (the converter reads its input from disk)
    const xmlPath = `temp/invoice-${orderId}.xml`;
    await fs.writeFile(xmlPath, xml);

    const pdfPath = `invoices/invoice-${orderId}.pdf`;
    try {
      // Convert to PDF
      await this.converter.convertToPDF(xmlPath, 'invoice-template.html', pdfPath);
    } finally {
      // Clean up the temp XML even if conversion fails
      await fs.unlink(xmlPath);
    }

    // Update database
    await db.query(
      'UPDATE orders SET invoice_path = $1, invoice_generated_at = NOW() WHERE id = $2',
      [pdfPath, orderId]
    );

    // Send email
    await this.emailInvoice(order.customer_email, pdfPath);

    console.log(`✓ Invoice generated and emailed: ${pdfPath}\n`);

    return pdfPath;
  }

  createInvoiceXML(order, items) {
    // node-postgres returns NUMERIC columns as strings, so convert before toFixed()
    const itemsXML = items.map(item => `
      <item>
        <description>${this.escapeXML(item.description)}</description>
        <quantity>${Number(item.quantity)}</quantity>
        <rate>${Number(item.rate).toFixed(2)}</rate>
        <amount>${(Number(item.quantity) * Number(item.rate)).toFixed(2)}</amount>
      </item>
    `).join('');

    return `<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <number>${this.escapeXML(order.invoice_number)}</number>
  <date>${order.created_at.toISOString().split('T')[0]}</date>
  <customer>
    <name>${this.escapeXML(order.customer_name)}</name>
    <address>${this.escapeXML(order.customer_address)}</address>
  </customer>
  <items>
    ${itemsXML}
  </items>
  <total>${Number(order.total).toFixed(2)}</total>
</invoice>`;
  }

  escapeXML(str) {
    // Treat null/undefined as empty and coerce numbers to strings before escaping
    return String(str ?? '')
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&apos;');
  }

  async emailInvoice(email, pdfPath) {
    // Implementation depends on email service
    console.log(`Emailing invoice to ${email}...`);
  }
}

// Usage (`converter` must already be initialized; see the HTML/CSS section)
const generator = new InvoiceGenerator(converter);
generator.generateInvoice('ORD-2024-001').catch(console.error);
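`createInvoiceXML` multiplies floating-point numbers to get each amount, which is usually fine after `toFixed(2)` but can produce rounding surprises with money. A sketch of the same XML built from integer cents, with every text value escaped; the input shape (`rateCents`, whole-number `quantity`, and so on) is an assumption for illustration, not the article's database schema:

```javascript
// Escape the five XML special characters in text content
function escapeXML(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Format non-negative integer cents as a decimal string, e.g. 600000 -> "6000.00"
function formatCents(cents) {
  return `${Math.floor(cents / 100)}.${String(cents % 100).padStart(2, '0')}`;
}

// Hypothetical input: { number, date, customer: { name, address },
//                       items: [{ description, quantity, rateCents }] }
function buildInvoiceXML(invoice) {
  let totalCents = 0;
  const itemsXML = invoice.items.map((item) => {
    const amountCents = item.quantity * item.rateCents; // exact for whole-number quantities
    totalCents += amountCents;
    return [
      '    <item>',
      `      <description>${escapeXML(item.description)}</description>`,
      `      <quantity>${item.quantity}</quantity>`,
      `      <rate>${formatCents(item.rateCents)}</rate>`,
      `      <amount>${formatCents(amountCents)}</amount>`,
      '    </item>'
    ].join('\n');
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>', // must be the very first line
    '<invoice>',
    `  <number>${escapeXML(invoice.number)}</number>`,
    `  <date>${escapeXML(invoice.date)}</date>`,
    '  <customer>',
    `    <name>${escapeXML(invoice.customer.name)}</name>`,
    `    <address>${escapeXML(invoice.customer.address)}</address>`,
    '  </customer>',
    '  <items>',
    ...itemsXML,
    '  </items>',
    `  <total>${formatCents(totalCents)}</total>`,
    '</invoice>'
  ].join('\n');
}
```

Computing the total from the same integer amounts also guarantees that `<total>` always equals the sum of the `<amount>` elements.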

2. Report Generation at Scale

// Generate thousands of reports (sketch). Assumes `db` is a node-postgres Pool
// (query() resolves to { rows }); createReportXML() and convertToPDF() are yours to implement.
const fs = require('fs').promises;
const path = require('path');

class BulkReportGenerator {
  async generateMonthlyReports(year, month) {
    console.log(`\nGenerating monthly reports for ${year}-${month}...\n`);

    // Get all customers
    const { rows: customers } = await db.query(
      'SELECT id, name, email FROM customers WHERE active = true'
    );

    console.log(`Found ${customers.length} active customers\n`);

    let generated = 0;
    let failed = 0;

    for (const customer of customers) {
      try {
        await this.generateCustomerReport(customer.id, year, month);
        generated++;

        if (generated % 100 === 0) {
          console.log(`Progress: ${generated}/${customers.length} generated`);
        }
      } catch (error) {
        console.error(`✗ Failed for customer ${customer.id}:`, error.message);
        failed++;
      }
    }

    console.log(`\n✓ Report generation complete`);
    console.log(`  Generated: ${generated}`);
    console.log(`  Failed: ${failed}\n`);
  }

  async generateCustomerReport(customerId, year, month) {
    // Fetch customer data
    const { rows: transactions } = await db.query(`
      SELECT * FROM transactions
      WHERE customer_id = $1
        AND EXTRACT(YEAR FROM created_at) = $2
        AND EXTRACT(MONTH FROM created_at) = $3
    `, [customerId, year, month]);

    // Create XML
    const xml = this.createReportXML(customerId, transactions);

    // Convert to PDF (create the year/month directory first)
    const pdfPath = `reports/${year}/${month}/customer-${customerId}.pdf`;
    await fs.mkdir(path.dirname(pdfPath), { recursive: true });
    await this.convertToPDF(xml, pdfPath);

    return pdfPath;
  }

  // Placeholders: these two steps are not defined in this article
  createReportXML(customerId, transactions) {
    throw new Error('createReportXML() is not implemented');
  }

  async convertToPDF(xml, pdfPath) {
    throw new Error('convertToPDF() is not implemented');
  }
}
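Two details matter when this runs for every customer each month: `EXTRACT(...)` on `created_at` prevents PostgreSQL from using a plain B-tree index on that column, and unpadded month directories (`reports/2024/1/`) sort out of order. A sketch of helpers for both, with hypothetical names; the half-open range would be used as `WHERE customer_id = $1 AND created_at >= $2 AND created_at < $3`:

```javascript
// Half-open [start, end) range covering one calendar month, as YYYY-MM-DD strings.
// Date.UTC rolls month index 12 over to January of the next year.
function monthRange(year, month) {
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1));
  return {
    start: start.toISOString().slice(0, 10),
    end: end.toISOString().slice(0, 10)
  };
}

// Zero-pad the month so directories sort chronologically: reports/2024/01/...
function reportPath(year, month, customerId) {
  return `reports/${year}/${String(month).padStart(2, '0')}/customer-${customerId}.pdf`;
}

console.log(monthRange(2024, 12)); // { start: '2024-12-01', end: '2025-01-01' }
console.log(reportPath(2024, 1, 42)); // reports/2024/01/customer-42.pdf
```

If `created_at` is a `timestamptz` column, PostgreSQL interprets these date strings in the session time zone, so set it explicitly when month boundaries must follow a specific zone.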

Performance Optimization

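// Parallel PDF generation (assumes `converter` is an initialized XMLToPDFConverter)
// p-limit v4+ is ESM-only: require() needs p-limit@3; otherwise use `import pLimit from 'p-limit'`
const pLimit = require('p-limit');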

async function generatePDFsInParallel(xmlFiles, concurrency = 4) {
  const limit = pLimit(concurrency);

  console.log(`\nGenerating ${xmlFiles.length} PDFs (${concurrency} at a time)...\n`);

  const startTime = Date.now();

  const promises = xmlFiles.map((xmlFile, index) =>
    limit(async () => {
      const outputPath = xmlFile.replace(/\.xml$/, '.pdf'); // replace only the extension
      await converter.convertToPDF(xmlFile, 'template.html', outputPath);
      console.log(`[${index + 1}/${xmlFiles.length}] ✓ ${outputPath}`);
    })
  );

  await Promise.all(promises);

  const duration = Date.now() - startTime;
  const rate = (xmlFiles.length / (duration / 1000)).toFixed(2);

  console.log(`\n✓ All PDFs generated in ${duration}ms`);
  console.log(`  Rate: ${rate} PDFs/second\n`);
}

// Illustrative throughput (varies with hardware and template complexity; measure your own):
// Single-threaded: 2-3 PDFs/second
// 4 parallel: 8-10 PDFs/second
// 8 parallel: 15-18 PDFs/second
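If you would rather not add a dependency (or run into p-limit's ESM-only releases), the same bounded concurrency fits in a few lines of plain JavaScript. A sketch with an illustrative name: a fixed number of "runners" pull the next index from a shared counter, so at most `limit` tasks are in flight and results keep their input order.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results come back in input order. The first rejection rejects the returned
// promise, but the other runners keep processing the remaining items.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function runner() {
    // No lock needed: the read-and-increment runs synchronously on one thread
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Usage with the converter (assumed to be an initialized XMLToPDFConverter):
// await mapWithConcurrency(xmlFiles, 4, (file) =>
//   converter.convertToPDF(file, 'template.html', file.replace(/\.xml$/, '.pdf')));
```

Because every render shares one Chromium instance, raising `limit` mainly adds open pages; past a point, more concurrency costs memory without adding throughput.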

Testing PDF Generation

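// Jest tests (sketch: assumes the classes above are exported from a hypothetical
// ./converter module, and that test.xml and template.html fixtures exist)
const fs = require('fs').promises;
const { XMLToPDFConverter, InvoiceGenerator } = require('./converter');

// Resolves to true if the path exists
async function fileExists(filePath) {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
}

describe('XML to PDF Conversion', () => {
  const converter = new XMLToPDFConverter();

  // One browser for the whole suite; launching Chromium can exceed Jest's 5s default timeout
  beforeAll(() => converter.initialize(), 30000);
  afterAll(() => converter.close());

  test('generates valid PDF from XML', async () => {
    const pdfPath = 'test-output.pdf';

    await converter.convertToPDF('test.xml', 'template.html', pdfPath);

    expect(await fileExists(pdfPath)).toBe(true);

    const stats = await fs.stat(pdfPath);
    expect(stats.size).toBeGreaterThan(1000); // At least 1KB

    await fs.unlink(pdfPath);
  }, 30000);

  test('handles invalid XML gracefully', async () => {
    const invalidXML = '<invoice><unclosed>';

    await expect(
      converter.parseXML(invalidXML)
    ).rejects.toThrow();
  });

  test('escapes XML special characters', () => {
    const input = '<script>alert("XSS")</script>';
    // escapeXML uses no instance state, so no real converter is needed
    const escaped = new InvoiceGenerator(null).escapeXML(input);

    expect(escaped).not.toContain('<script>');
    expect(escaped).toContain('&lt;script&gt;');
  });
});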
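A file size above 1KB says little about whether the output is actually a PDF. A slightly stronger smoke check, sketched below with illustrative names, looks for the `%PDF-` header at the start of the file and the `%%EOF` marker near the end. It is a heuristic, not a validator; opening the file with a PDF parsing library is the thorough option.

```javascript
const fs = require('fs').promises;

// Heuristic: PDF files start with "%PDF-" and end with an "%%EOF" marker
// (possibly followed by a newline), so only the last 1KB is searched.
function looksLikePdf(buffer) {
  if (!Buffer.isBuffer(buffer) || buffer.length < 8) return false;
  const header = buffer.subarray(0, 5).toString('latin1');
  const tail = buffer.subarray(Math.max(0, buffer.length - 1024)).toString('latin1');
  return header === '%PDF-' && tail.includes('%%EOF');
}

// Convenience wrapper for tests: read a generated file and check it
async function fileLooksLikePdf(filePath) {
  return looksLikePdf(await fs.readFile(filePath));
}

// In the Jest test above: expect(await fileLooksLikePdf(pdfPath)).toBe(true);
```

A truncated write (for example, a process killed mid-render) usually lacks the trailing `%%EOF`, which the size check alone would not catch.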

Conclusion: Automate Document Generation

XML to PDF conversion transforms tedious manual document creation into instant, scalable automation. Whether generating invoices, reports, or contracts, mastering this skill enables professional document workflows at any scale.

Consistent formatting (professional every time)

Dynamic data (real-time information)

Scalable (thousands per hour)

Template-based (easy updates)

Multi-format (PDF, HTML, print)

Cost-effective (eliminate manual labor)

Audit trail (automated records)

Integration-ready (APIs, webhooks)

Implementation Checklist:

[ ] Choose conversion approach (XSL-FO, HTML, or direct)
[ ] Design PDF templates
[ ] Implement XML parsing
[ ] Add error handling
[ ] Test with production data
[ ] Optimize for performance
[ ] Set up monitoring
[ ] Document templates

The Bottom Line:
Manual document generation doesn't scale. XML to PDF automation turns hours of work into seconds, removes manual data-entry errors, and supports professional document workflows at enterprise scale. The payback can be fast: at volumes like those in the cost example above, a month or two of labor savings may cover the implementation effort.


What documents are you generating? Share your automation wins in the comments!

#webdev #pdf #xml #automation #nodejs
