DEV Community

Leon Davis
Leon Davis

Posted on

Python PDF Compression: Reduce File Size Without Losing Quality

In daily office work, electronic document management, and file sharing, PDF files are widely used due to their fixed formatting and strong compatibility. However, as documents become richer with images, charts, and graphics, PDF files can grow quite large, reducing efficiency in uploading, sharing, and storage. How to reduce PDF file size while maintaining readability has become a common challenge in practical applications.

This article demonstrates how to compress PDF documents using Python, helping developers efficiently manage PDF files.

1. Why PDF Compression Is Necessary

  • Save storage space: Large PDFs occupy significant disk space, especially in enterprise document management systems, increasing storage costs.
  • Improve transfer efficiency: Large files may fail during email or online transfers. Compressing PDFs accelerates uploads and downloads.
  • Enhance user experience: Large files may cause lag when opening. Compressed PDFs are faster to read and browse.
  • Meet system limits: Some websites or systems restrict file size; compression ensures compliance.

2. Installation and Environment Setup

Before compressing PDFs with Python, a PDF processing library is needed. This guide uses Spire.PDF for Python, which allows direct manipulation of text, images, and fonts in PDFs without relying on third-party software. You can install it easily using pip:

pip install spire.pdf
Enter fullscreen mode Exit fullscreen mode

Once installed, import spire.pdf in your Python code to compress, merge, split, or encrypt PDFs.

3. Single PDF Compression Example

The following example shows how to load a single PDF and configure image compression settings:

from spire.pdf import *

# Load PDF file
compressor = PdfCompressor("C:/Users/Administrator/Documents/example.pdf")

# Get compression options
options = compressor.OptimizationOptions

# Enable image resizing
options.SetResizeImages(True)

# Enable image compression
options.SetIsCompressImage(True)

# Set image quality (Low, Medium, High)
options.SetImageQuality(ImageQuality.Medium)

# Compress and save
compressor.CompressToFile("compressed.pdf")
print("PDF compression completed: saved as compressed.pdf")
Enter fullscreen mode Exit fullscreen mode

Notes:

  • SetResizeImages(True): Resizes embedded images to reduce dimensions.
  • SetIsCompressImage(True): Enables image compression.
  • SetImageQuality(ImageQuality.Medium): Sets compression quality (Low, Medium, High).
  • CompressToFile: Compresses and saves to the specified path.

4. Font Compression Example

Embedded fonts can occupy considerable space. Compressing or unembedding fonts can significantly reduce PDF size:

from spire.pdf import *

# Load PDF file
compressor = PdfCompressor("C:/Users/Administrator/Documents/example.pdf")

# Get compression options
options = compressor.OptimizationOptions

# Enable font compression
options.SetIsCompressFonts(True)

# Optional: unembed fonts
# options.SetIsUnembedFonts(True)

# Compress and save
compressor.CompressToFile("compressed_fonts.pdf")
print("PDF font compression completed")
Enter fullscreen mode Exit fullscreen mode

Notes:

  • Enabling font compression reduces font storage size.
  • Unembedding fonts further reduces file size but may affect how the PDF displays on other systems.

5. Batch PDF Compression Example

To compress all PDFs in a folder, use Python’s os module in combination with Spire.PDF:

import os
from spire.pdf import *

input_folder = "C:/Users/Administrator/Documents/PDFs"
output_folder = "C:/Users/Administrator/Documents/CompressedPDFs"

os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(input_folder):
    if filename.endswith(".pdf"):
        input_path = os.path.join(input_folder, filename)
        output_path = os.path.join(output_folder, filename)

        compressor = PdfCompressor(input_path)
        options = compressor.OptimizationOptions
        options.SetResizeImages(True)
        options.SetIsCompressImage(True)
        options.SetImageQuality(ImageQuality.Medium)
        options.SetIsCompressFonts(True)

        compressor.CompressToFile(output_path)
        print(f"{filename} compressed successfully")
Enter fullscreen mode Exit fullscreen mode

Notes:

  • Automatically iterates through all PDFs in the folder.
  • Image and font compression settings can be applied consistently.
  • Output path is customizable for better file management.
  • For large volumes, multithreading can improve efficiency.

6. Compression Considerations

  • Balance compression and clarity: Over-compressing images may reduce readability; choose quality according to actual needs.
  • Font embedding: Unembedding fonts reduces file size but may affect display on other systems.
  • Backup originals: Keep original files before batch compression in case results are unsatisfactory.
  • Performance optimization: Use multithreading or batch processing when handling many files to avoid blocking.
  • Compatibility: Ensure PDFs are not encrypted or protected; otherwise, compression may fail.

7. Conclusion

Using Python, developers can efficiently compress PDF files without relying on additional software. Whether handling single files or performing batch processing, this can be achieved with just a few lines of code. Mastering these methods allows for significant reductions in PDF file size, improving transfer, storage, and viewing efficiency—a practical solution for document management and office automation.

Top comments (0)