DEV Community

jelizaveta


Easy Way to Compare PDF Files for Differences Using Python

When handling contracts, legal files, or technical documentation, you often end up with multiple versions of the same PDF. Manually identifying what changed between versions is tedious and error-prone.

Fortunately, Spire.PDF for Python makes it easy to detect and highlight differences between two PDF files automatically, using only a small amount of code.

This tutorial shows you how to compare PDFs step by step, including setup and optional configuration.

Install the Library

To begin, install the required package from PyPI:

pip install spire.pdf

After installation, you can start comparing PDF documents right away.
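
If you work with several virtual environments, it can help to confirm that the package is visible to the interpreter you are actually running. The snippet below is a minimal sketch that uses only the standard library. The helper name `installed_version` is made up for illustration, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="spire.pdf"):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("spire.pdf is not installed in this environment")
    else:
        print(f"spire.pdf {version} is installed")
```

If the check reports that the package is missing, run the pip command again with the same interpreter, for example `python -m pip install spire.pdf`.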

Basic Example: Detect Differences Between Two PDFs

The example below compares an original document with an updated version and outputs a comparison file that visually marks the changes:

from spire.pdf.common import *
from spire.pdf import *

# Load the original PDF
original = PdfDocument("C:\\Users\\Administrator\\Desktop\\original.pdf")    

# Load the updated PDF
revised = PdfDocument("C:\\Users\\Administrator\\Desktop\\revised.pdf")  

# Initialize comparer
comparer = PdfComparer(original, revised)

# Generate comparison result
comparer.Compare("output/CompareResult.pdf") 

# Release resources
original.Dispose()
revised.Dispose()

Open the resulting file in a PDF viewer (such as Adobe Acrobat), and you’ll see a side-by-side comparison. Removed content appears highlighted in red in the original file, while added content is marked in yellow in the revised version.
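
The basic example assumes that both input files exist, that the `output` folder is already there, and that `Compare` finishes without an error (otherwise the `Dispose()` calls are never reached). The two helpers below are one possible way to guard against those cases. They are a sketch, not part of Spire.PDF: `prepare_paths` and `disposing` are hypothetical names, and whether the library creates a missing output folder on its own is not something this article confirms, so creating it up front is only a precaution.

```python
import os
from contextlib import contextmanager


def prepare_paths(original_path, revised_path, output_path):
    """Check that both inputs exist and make sure the output folder is present."""
    for path in (original_path, revised_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"PDF not found: {path}")
    out_dir = os.path.dirname(output_path)
    if out_dir:
        # Harmless if the folder already exists.
        os.makedirs(out_dir, exist_ok=True)


@contextmanager
def disposing(*documents):
    """Yield the documents, then call Dispose() on each one, even if the body raises."""
    try:
        yield documents
    finally:
        for doc in documents:
            doc.Dispose()


# Usage with the objects from the basic example (requires Spire.PDF):
#
#   prepare_paths("original.pdf", "revised.pdf", "output/CompareResult.pdf")
#   with disposing(PdfDocument("original.pdf"),
#                  PdfDocument("revised.pdf")) as (original, revised):
#       PdfComparer(original, revised).Compare("output/CompareResult.pdf")
```

Because `disposing` only relies on a `Dispose()` method, it works with any object that exposes one, which also makes it easy to try out without the library installed.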

Advanced Options for Comparison

You can further control how the comparison works by adjusting settings before calling the Compare method.

Compare Text Only

If you want to ignore layout or graphical differences and focus purely on text changes:

comparer.PdfCompareOptions.OnlyCompareText = True

Limit Comparison to Specific Pages

For large documents, you may only need to analyze certain sections. You can define page ranges like this:

comparer.PdfCompareOptions.SetPageRanges(1, 3, 1, 3)
# Parameters: (oldStartIndex, oldEndIndex, newStartIndex, newEndIndex)

This allows you to compare only selected pages instead of the entire document.
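
Since both options have to be set before `Compare` is called, you may find it convenient to apply them in one place. The function below is a small sketch of that idea. `apply_compare_options` is a hypothetical helper, not a Spire.PDF function; it only touches `PdfCompareOptions.OnlyCompareText` and `PdfCompareOptions.SetPageRanges`, the two members shown above. The check that a start page must not come after its end page is an assumption about sensible input, not documented library behavior.

```python
def apply_compare_options(comparer, text_only=False, page_ranges=None):
    """Set comparison options on a comparer before Compare() is called.

    page_ranges is (old_start, old_end, new_start, new_end), in the same
    order as the SetPageRanges parameters.
    """
    options = comparer.PdfCompareOptions
    if text_only:
        options.OnlyCompareText = True
    if page_ranges is not None:
        old_start, old_end, new_start, new_end = page_ranges
        # Assumption: a start page after its end page is a caller mistake.
        if old_start > old_end or new_start > new_end:
            raise ValueError(f"start page must not exceed end page: {page_ranges}")
        options.SetPageRanges(old_start, old_end, new_start, new_end)
    return comparer


# Usage (requires Spire.PDF):
#
#   comparer = PdfComparer(original, revised)
#   apply_compare_options(comparer, text_only=True, page_ranges=(1, 3, 1, 3))
#   comparer.Compare("output/CompareResult.pdf")
```

Returning the comparer keeps the call order explicit: configure first, then compare.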

Conclusion

Manually reviewing differences between PDF versions can be inefficient and error-prone. By using Spire.PDF for Python, you can quickly produce a clear visual comparison and identify changes with ease. This method is especially useful for contract reviews, document proofreading, and version tracking in professional workflows.
