lu liu

Posted on Mar 11

Converting Excel Files to PDF Using Python

In data reporting and business document workflows, converting Excel spreadsheets to PDF is a fundamental operation. The PDF format preserves exact layout, prevents accidental modifications, and ensures consistent display across different devices and platforms. This article explores how to efficiently convert Excel files to PDF format using Python while maintaining formatting quality and controlling various conversion parameters.

Why Convert Excel to PDF

Excel spreadsheets are excellent for data analysis and editing, but they have limitations when it comes to distribution:

Layout Preservation: PDFs maintain exact cell formatting, column widths, and page布局 regardless of the viewer's system
Data Protection: PDFs are harder to accidentally modify, protecting sensitive calculations and data
Universal Viewing: Recipients can view PDFs without needing Excel or spreadsheet software
Print Ready: PDFs are optimized for printing with consistent page breaks and margins
Professional Presentation: PDFs provide a polished, final appearance for reports and statements

Automating Excel to PDF conversion with Python enables batch processing of financial reports, automated invoice generation, and integration into larger data pipelines.

Environment Setup

Before starting, you need to install a Python library that supports Excel operations. Spire.XLS for Python provides comprehensive APIs for handling XLSX format workbooks, including PDF conversion capabilities.

pip install Spire.XLS

Once installed, import the relevant modules in your Python script:

from spire.xls import *
from spire.xls.common import *

Basic Conversion Process

The core steps for converting an Excel file to PDF are straightforward: create a workbook object, load the Excel file, and save as PDF. Here's a minimal working example:

from spire.xls import *
from spire.xls.common import *

# Define input and output paths
inputFile = "spreadsheet.xlsx"
outputFile = "output.pdf"

# Create a workbook object
workbook = Workbook()

# Load the Excel file from disk
workbook.LoadFromFile(inputFile)

# Configure sheet fitting for better PDF output
workbook.ConverterSetting.SheetFitToPage = True

# Save as PDF format
workbook.SaveToFile(outputFile, FileFormat.PDF)

# Release resources
workbook.Dispose()

This code demonstrates the basic conversion flow. The Workbook object handles loading and managing the Excel file, while SaveToFile() with FileFormat.PDF specifies PDF as the output format. The SheetFitToPage setting ensures that worksheets are properly scaled to fit within PDF pages.

PDF/A Compliance for Archiving

For long-term document archiving, PDF/A is an ISO-standardized version of PDF designed for digital preservation. Converting Excel to PDF/A ensures documents remain accessible decades into the future:

from spire.xls import *
from spire.xls.common import *

inputFile = "financial_data.xlsx"
outputFile = "archival_copy.pdf"

workbook = Workbook()
workbook.LoadFromFile(inputFile)

# Set PDF conformance level to PDF/A-1B
workbook.ConverterSetting.PdfConformanceLevel = PdfConformanceLevel.Pdf_A1B

# Save as PDF/A compliant PDF
workbook.SaveToFile(outputFile, FileFormat.PDF)
workbook.Dispose()

The PdfConformanceLevel parameter offers several options:

Pdf_A1B: PDF/A-1 Level B compliance - suitable for most archival needs
Pdf_A2B: PDF/A-2 Level B - supports more modern features
Pdf_A3B: PDF/A-3 Level B - allows embedding arbitrary files

PDF/A compliance is essential for legal documents, financial records, and any content requiring long-term preservation.

Converting Specific Cell Ranges

Sometimes you only need to export a specific data range rather than entire worksheets. This is useful for creating summary reports or extracting key metrics:

from spire.xls import *
from spire.xls.common import *

inputFile = "quarterly_report.xlsx"
outputFile = "summary_section.pdf"

workbook = Workbook()
workbook.LoadFromFile(inputFile)

# Add a new worksheet for the selected range
workbook.Worksheets.Add("summary")

# Copy specific cell range to the new sheet
workbook.Worksheets[0].Range["A9:E15"].Copy(
    workbook.Worksheets[1].Range["A9:E15"], 
    False,  # Don't copy formulas
    True    # Copy formatting
)

# Auto-fit column widths for better appearance
workbook.Worksheets[1].Range["A9:E15"].AutoFitColumns()

# Save only the selected range to PDF
workbook.Worksheets[1].SaveToPdf(outputFile)
workbook.Dispose()

This approach is particularly useful when:

Creating executive summaries from detailed reports
Extracting specific data tables for presentations
Generating focused views of large datasets
Sharing selected portions without revealing complete worksheets

Converting Each Worksheet to Separate PDFs

Workbooks often contain multiple worksheets that may need to be distributed individually. You can automate this by converting each sheet to a separate PDF file:

from spire.xls import *
from spire.xls.common import *

inputFile = "multi_sheet_report.xlsx"

workbook = Workbook()
workbook.LoadFromFile(inputFile)

# Iterate through all worksheets
for sheet in workbook.Worksheets:
    # Generate filename from sheet name
    FileName = sheet.Name + ".pdf"

    # Save individual sheet to PDF
    sheet.SaveToPdf(FileName)
    print("Exported: {0}".format(FileName))

workbook.Dispose()

This technique is valuable for:

Distributing department-specific sheets from a consolidated report
Creating individual invoices from a batch workbook
Generating separate product catalogs from a master dataset
Automating report distribution to multiple stakeholders

Fitting Content to Page Width

Excel worksheets often span multiple columns that may not fit well on standard page sizes. You can configure page setup to force content to fit within specified dimensions:

from spire.xls import *
from spire.xls.common import *

inputFile = "wide_dataset.xlsx"
outputFile = "fitted_report.pdf"

workbook = Workbook()
workbook.LoadFromFile(inputFile)

# Configure each worksheet
for sheet in workbook.Worksheets:
    # Fit to one page wide (no horizontal scaling limit)
    sheet.PageSetup.FitToPagesWide = 1

    # No vertical scaling limit (allow multiple pages tall)
    sheet.PageSetup.FitToPagesTall = 0

# Save with applied page setup
workbook.SaveToFile(outputFile, FileFormat.PDF)
workbook.Dispose()

The page setup parameters work as follows:

FitToPagesWide = 1: Forces all columns to fit within one page width
FitToPagesTall = 0: Allows unlimited pages vertically (no vertical scaling)
FitToPagesTall = 1: Would force entire sheet onto single page

This configuration ensures that wide spreadsheets remain readable without excessive horizontal scrolling or tiny font sizes.

Batch Converting Multiple Excel Files

In production environments, you often need to convert many Excel files simultaneously. A batch conversion function streamlines this process:

import os
from spire.xls import *
from spire.xls.common import *

def batch_convert_excel_to_pdf(input_folder, output_folder, pdf_a_compliant=False):
    """Convert all Excel files in a folder to PDF"""

    # Ensure output directory exists
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Supported Excel formats
    excel_extensions = ['.xlsx', '.xls', '.xlsm', '.xltx']

    # Process all Excel files
    for filename in os.listdir(input_folder):
        if any(filename.lower().endswith(ext) for ext in excel_extensions):
            input_path = os.path.join(input_folder, filename)
            base_name = os.path.splitext(filename)[0]
            output_path = os.path.join(output_folder, base_name + '.pdf')

            # Convert current file
            workbook = Workbook()
            workbook.LoadFromFile(input_path)

            # Enable sheet fitting
            workbook.ConverterSetting.SheetFitToPage = True

            # Optionally enable PDF/A compliance
            if pdf_a_compliant:
                workbook.ConverterSetting.PdfConformanceLevel = PdfConformanceLevel.Pdf_A1B

            workbook.SaveToFile(output_path, FileFormat.PDF)
            workbook.Dispose()

            print("Converted: {0} -> {1}".format(filename, base_name + '.pdf'))

# Usage example
batch_convert_excel_to_pdf("input_spreadsheets", "output_pdfs", pdf_a_compliant=True)

This batch conversion function provides:

Automatic output directory creation
Support for multiple Excel formats (XLSX, XLS, XLSM, XLTX)
Optional PDF/A compliance for archival
Progress reporting for monitoring large batches

Handling Different Excel Formats

Spire.XLS supports converting various Excel file formats to PDF:

from spire.xls import *

workbook = Workbook()

# Convert XLSX (Excel 2007+)
workbook.LoadFromFile("modern_file.xlsx")
workbook.SaveToFile("output.pdf", FileFormat.PDF)
workbook.Dispose()

# Convert XLS (Excel 97-2003)
workbook = Workbook()
workbook.LoadFromFile("legacy_file.xls")
workbook.SaveToFile("output.pdf", FileFormat.PDF)
workbook.Dispose()

# Convert XLSM (Macro-enabled)
workbook = Workbook()
workbook.LoadFromFile("macro_file.xlsm")
workbook.SaveToFile("output.pdf", FileFormat.PDF)
workbook.Dispose()

# Convert CSV
workbook = Workbook()
workbook.LoadFromFile("data.csv")
workbook.SaveToFile("output.pdf", FileFormat.PDF)
workbook.Dispose()

Regardless of the input format, the output PDF maintains consistent quality and formatting.

Practical Example: Automated Report Distribution System

Combining these techniques, you can build an automated report distribution system:

import os
from datetime import datetime
from spire.xls import *
from spire.xls.common import *

class ExcelReportDistributor:
    def __init__(self, report_root="reports"):
        self.report_root = report_root
        if not os.path.exists(report_root):
            os.makedirs(report_root)

    def generate_monthly_report(self, excel_file, department=None):
        """Generate PDF report from Excel file"""

        # Create department subdirectory
        if department:
            dept_dir = os.path.join(self.report_root, department)
            if not os.path.exists(dept_dir):
                os.makedirs(dept_dir)
        else:
            dept_dir = self.report_root

        # Generate timestamped filename
        timestamp = datetime.now().strftime("%Y%m_%H%M%S")
        base_name = os.path.splitext(os.path.basename(excel_file))[0]
        pdf_filename = "{0}_{1}.pdf".format(base_name, timestamp)
        pdf_path = os.path.join(dept_dir, pdf_filename)

        # Perform conversion
        workbook = Workbook()
        workbook.LoadFromFile(excel_file)

        # Optimize for PDF output
        workbook.ConverterSetting.SheetFitToPage = True

        # Apply page setup to all sheets
        for sheet in workbook.Worksheets:
            sheet.PageSetup.FitToPagesWide = 1
            sheet.PageSetup.FitToPagesTall = 0

        workbook.SaveToFile(pdf_path, FileFormat.PDF)
        workbook.Dispose()

        return pdf_path

    def distribute_workbook_sheets(self, excel_file, output_dir):
        """Extract and save each worksheet as separate PDF"""

        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        workbook = Workbook()
        workbook.LoadFromFile(excel_file)

        exported_files = []
        for sheet in workbook.Worksheets:
            pdf_filename = sheet.Name + ".pdf"
            pdf_path = os.path.join(output_dir, pdf_filename)

            sheet.SaveToPdf(pdf_path)
            exported_files.append(pdf_path)
            print("Exported sheet: {0}".format(pdf_path))

        workbook.Dispose()
        return exported_files

# Usage example
distributor = ExcelReportDistributor("monthly_reports")
report_pdf = distributor.generate_monthly_report(
    "sales_data.xlsx", 
    department="sales"
)
print("Report generated: {0}".format(report_pdf))

This distribution system provides:

Department-based organization
Timestamp tracking for version control
Flexible single-file or multi-sheet export
Automatic page optimization

Common Issues and Solutions

Issue 1: Content Cut Off in PDF

Adjust page setup to fit content properly:

sheet.PageSetup.FitToPagesWide = 1
sheet.PageSetup.FitToPagesTall = 0

Issue 2: Poor Print Quality

Ensure proper scaling and enable high-quality settings:

workbook.ConverterSetting.SheetFitToPage = True

Issue 3: Formulas Showing Errors

If copying ranges, decide whether to copy formulas or values:

# Copy with values only (no formulas)
range.Copy(target_range, False, True)

Issue 4: Large File Sizes

For archival purposes where size matters, consider standard PDF instead of PDF/A:

workbook.ConverterSetting.PdfConformanceLevel = PdfConformanceLevel.None

Summary

Converting Excel files to PDF is an essential skill for automated reporting and document management. Through this article, we've learned:

How to load and convert Excel files using the Workbook object
Creating PDF/A compliant documents for long-term archiving
Exporting specific cell ranges for focused reports
Converting individual worksheets to separate PDF files
Configuring page setup to fit content appropriately
Building batch conversion systems for production use
Handling multiple Excel file formats consistently

These techniques apply directly to financial reporting, automated invoice generation, business intelligence dashboards, and regulatory compliance workflows. After mastering basic conversion, you can explore advanced features like password protection, digital signatures, and custom watermarks to build comprehensive document automation solutions.

DEV Community