How to Convert HTML to PDF with a Portable Python Tool

#python #automation #tutorial

Converting HTML documents to PDF often becomes a frustrating bottleneck when you're automating report generation. The standard approaches either require complex setup or break when moved between environments, which makes HTML-to-PDF conversion in Python feel more complicated than it should be.

The Manual Way (And Why It Breaks)

Most developers start by printing webpages to PDF by hand, either through a browser or an online converter. You open each page individually, go to the print settings, select PDF as the destination, adjust the margins, and hope the formatting stays consistent. Once you have multiple reports, or need the process to run unattended, the manual approach becomes unsustainable.

The usual automation options have their own problems. Tools like wkhtmltopdf require a system-level installation, PhantomJS brings Node dependencies, and browser automation scripts break when a browser update or page change alters the DOM structure they rely on. None of these fit well when you need a portable Python script that runs on different machines without installation overhead.

The Python Approach

Here's a basic implementation to demonstrate the core concepts:

import requests
from pathlib import Path
from weasyprint import HTML, CSS

def basic_html_to_pdf(input_source, output_path, page_size='A4'):
    """Convert an HTML file or URL to PDF and return the output path."""
    # Determine if input is URL or file path
    if input_source.startswith(('http://', 'https://')):
        # Use a timeout and fail on HTTP error statuses instead of
        # silently converting an error page
        response = requests.get(input_source, timeout=30)
        response.raise_for_status()
        html_content = response.text
        base_url = input_source
    else:
        html_file = Path(input_source)
        # Assumes the file is UTF-8 encoded
        html_content = html_file.read_text(encoding='utf-8')
        base_url = html_file.resolve().as_uri()

    # base_url lets WeasyPrint resolve relative images and stylesheets
    html_doc = HTML(string=html_content, base_url=base_url)
    css = CSS(string=f'@page {{ size: {page_size}; margin: 1cm; }}')

    # Generate output file
    output_file = Path(output_path)
    html_doc.write_pdf(str(output_file), stylesheets=[css])

    return str(output_file)

This code handles basic conversion from HTML files or URLs to PDF, with a configurable page size and a fixed 1cm margin. However, it depends on external packages (requests and WeasyPrint) and doesn't include headers, footers, or thorough error handling. A production solution would also need configurable margins, page numbering, and testing across platforms.
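Headers, footers, and page numbers can be expressed in CSS itself through `@page` margin boxes, which WeasyPrint supports. The sketch below shows one way to build such a stylesheet as a string. The helper name `build_page_css` and its parameters are hypothetical, chosen for illustration, and are not taken from the full tool.

```python
def build_page_css(page_size="A4", margin="2cm", header="", footer_page_numbers=True):
    """Return a CSS string with @page rules (hypothetical helper for illustration)."""

    def css_string(text):
        # Escape backslashes and double quotes so the text is a valid CSS string
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    rules = [f"size: {page_size};", f"margin: {margin};"]
    if header:
        # Margin box at the top of every page
        rules.append(f"@top-center {{ content: {css_string(header)}; }}")
    if footer_page_numbers:
        # counter(page) and counter(pages) are standard paged-media counters
        rules.append(
            '@bottom-center { content: "Page " counter(page) " of " counter(pages); }'
        )
    body = "\n".join("  " + rule for rule in rules)
    return "@page {\n" + body + "\n}"


print(build_page_css("Letter", "1in", header="Monthly Report"))
```

The returned string can be passed to `CSS(string=...)` in place of the f-string used in `basic_html_to_pdf`. Keeping the stylesheet builder separate from the conversion call also makes it testable without rendering a PDF.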

What the Full Tool Handles

• Convert local HTML files to PDF - Process static HTML documents with proper formatting preservation
• Fetch and convert live URLs to PDF - Automatically retrieve web content and convert to portable documents
• Set custom page size and margins - Configure A3, A4, or Letter sizes with precise margin specifications
• Add headers and footers with page numbers - Include professional document elements automatically
• Run as a CLI tool or import as a Python library - Flexible integration options for different workflows

The full Python HTML-to-PDF tool is designed to remove the dependency management described earlier while adding the features listed above for document processing pipelines.

Running It

html_to_pdf convert --input report.html --output report.pdf --format A4

The command also accepts --margin, --header, and --footer flags for customization. The PDF is written to the path given by --output, with the formatting of the source HTML preserved.
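To make the shape of that command line concrete, here is a minimal `argparse` sketch that accepts the same subcommand and flags. It is not the tool's actual implementation: the defaults and the value formats for `--margin`, `--header`, and `--footer` are assumptions, since the post does not specify them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags mentioned above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="html_to_pdf")
    subcommands = parser.add_subparsers(dest="command", required=True)

    convert = subcommands.add_parser("convert", help="Convert an HTML file or URL to PDF")
    convert.add_argument("--input", required=True, help="HTML file path or URL")
    convert.add_argument("--output", required=True, help="Destination PDF path")
    convert.add_argument("--format", default="A4", help="Page size (assumed default)")
    # Defaults and value formats below are assumptions, not documented behavior
    convert.add_argument("--margin", default="1cm", help="Page margin as a CSS length")
    convert.add_argument("--header", default="", help="Header text")
    convert.add_argument("--footer", default="", help="Footer text")
    return parser


# Parse the same arguments as the example command
args = build_parser().parse_args(
    ["convert", "--input", "report.html", "--output", "report.pdf", "--format", "A4"]
)
print(args.input, args.output, args.format, args.margin)
```

Using a subcommand leaves room for other actions alongside `convert`, and the same parsed values could be handed to a library function, which is one way a tool can serve as both a CLI and an importable module.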