Allen Yang

Posted on Jun 25

Adding and Managing Attachments in PDF Documents Using Python

#python #programming #pdf #attachment

In everyday document processing, PDF attachments have a wide range of uses. For instance, you can bundle a contract with its supporting materials in a single PDF file, or attach relevant data files and images to a technical document. Doing this by hand works for a few files, but it becomes a bottleneck when processing large volumes of documents.

With Python, you can add, read, and remove PDF attachments programmatically, which makes these operations easy to integrate into automated workflows.

Environment Setup

To work with PDF documents in Python, install the Spire.PDF library:

pip install Spire.PDF

This library provides APIs for PDF document processing, including attachment management.

Adding Attachments to a PDF Document

There are two common ways to add attachments to a PDF: as document-level attachments (similar to email attachments) and as attachment annotations displayed as clickable icons on the page.

Adding Document-Level Attachments

Document-level attachments appear in the PDF reader's attachment panel and are not visible on the page itself:

from spire.pdf.common import *
from spire.pdf import *

# Load the PDF document
doc = PdfDocument()
doc.LoadFromFile("input.pdf")

# Create an attachment object with the file name
attachment = PdfAttachment("data.xlsx")

# Read the file data into the attachment
with open("data.xlsx", "rb") as f:
    attachment.Data = f.read()

# Set the description and MIME type
attachment.Description = "Source data table"
attachment.MimeType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Add the attachment to the document
doc.Attachments.Add(attachment)

# Save the document
doc.SaveToFile("output.pdf")
doc.Close()

Key API notes:

PdfAttachment: Represents an attachment object; the constructor parameter is the display name of the attachment
Data property: Used to set the binary data of the attachment
MimeType property: Specifies the MIME type so the reader can identify the file correctly
Attachments.Add(): Adds the attachment to the document's attachment collection
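Rather than hard-coding the MIME type, you can usually derive it from the file extension. The sketch below uses Python's standard mimetypes module; the helper name guess_mime_type and the application/octet-stream fallback are illustrative choices, not part of Spire.PDF.

```python
import mimetypes


def guess_mime_type(file_name: str, default: str = "application/octet-stream") -> str:
    """Return a MIME type for file_name based on its extension.

    Falls back to `default` when the extension is not recognized.
    """
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type or default
```

The returned string can then be assigned to attachment.MimeType in place of the literal value. Which extensions are recognized depends partly on the system's MIME database, so the fallback matters for less common formats.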

Adding Attachment Annotations

Attachment annotations appear as icons (such as a paperclip or pushpin) on the PDF page, allowing readers to click and open them directly:

from spire.pdf.common import *
from spire.pdf import *

doc = PdfDocument()
doc.LoadFromFile("input.pdf")
page = doc.Pages[0]

# Read the file data to attach
with open("report.pdf", "rb") as f:
    data = Stream(f.read())

# Create an attachment annotation
bounds = RectangleF(50.0, 100.0, 15.0, 15.0)
annotation = PdfAttachmentAnnotation(bounds, "report.pdf", data)
annotation.Color = PdfRGBColor(Color.get_Teal())
annotation.Icon = PdfAttachmentIcon.Paperclip
annotation.Text = "Attachment: Detailed Report"

# Add it to the page
page.AnnotationsWidget.Add(annotation)

doc.SaveToFile("output.pdf")
doc.Close()

Attachment annotations support various icon styles, including Paperclip, PushPin, Graph, and Tag, allowing you to choose the most appropriate icon based on the content type.
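One way to choose an icon by content type is a small lookup keyed on the file extension. The sketch below is plain Python and returns the icon name as a string; the mapping and the helper name icon_name_for are hypothetical, and only the icon names themselves come from the list above.

```python
from pathlib import Path

# Hypothetical mapping from file extension to icon name; adjust to taste.
ICON_BY_EXTENSION = {
    ".xlsx": "Graph",
    ".csv": "Graph",
    ".png": "Tag",
    ".jpg": "Tag",
}


def icon_name_for(file_name: str, default: str = "Paperclip") -> str:
    """Pick an icon name from the file extension (case-insensitive)."""
    return ICON_BY_EXTENSION.get(Path(file_name).suffix.lower(), default)
```

Assuming the PdfAttachmentIcon enum exposes members under these names, the result could be applied with something like annotation.Icon = getattr(PdfAttachmentIcon, icon_name_for("data.xlsx")).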

Reading and Extracting Attachments

When you receive a PDF document containing attachments, you may need to inspect the attachment information or extract the files.

Retrieving Attachment Information

from spire.pdf.common import *
from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("document_with_attachments.pdf")

# Get the attachment collection
collection = pdf.Attachments

if collection.Count > 0:
    for i in range(collection.Count):
        attachment = collection.get_Item(i)
        print(f"File Name: {attachment.FileName}")
        print(f"Description: {attachment.Description}")
        if attachment.CreationDate:
            print(f"Creation Date: {attachment.CreationDate}")
        print("---")

pdf.Close()

Extracting Attachments to Local Files

from spire.pdf.common import *
from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("document_with_attachments.pdf")

collection = pdf.Attachments

# Extract all attachments
for i in range(collection.Count):
    attachment = collection.get_Item(i)
    attachment.Data.Save(attachment.FileName)

# Or extract a single attachment (e.g., the second one), if it exists
if collection.Count > 1:
    attachment = collection.get_Item(1)
    attachment.Data.Save(attachment.FileName)

pdf.Close()

The attachment.Data.Save() method writes the attachment data to the path you pass in. The examples above pass attachment.FileName, so each file is saved under its original name in the current working directory.
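Attachment names come from the PDF itself, so a file from an untrusted source could carry a name with directory components. A minimal sketch of a guard, using only the standard library, is shown here; the helper name safe_output_path, the default "extracted" folder, and the "attachment.bin" fallback are all illustrative assumptions.

```python
import os
from pathlib import Path


def safe_output_path(attachment_name: str, output_dir: str = "extracted") -> str:
    """Build an output path inside output_dir from an attachment's file name.

    Directory components in the name are dropped, so a name such as
    "../../etc/passwd" cannot escape output_dir.
    """
    # Normalize Windows separators, then keep only the final path component.
    base_name = os.path.basename(attachment_name.replace("\\", "/"))
    if base_name in ("", ".", ".."):
        base_name = "attachment.bin"  # hypothetical fallback name
    # Create the target folder if it does not exist yet.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return str(Path(output_dir) / base_name)
```

In the extraction loop, this would be used as attachment.Data.Save(safe_output_path(attachment.FileName)). Two attachments with the same base name would still overwrite each other, so add a counter or suffix if that can happen in your documents.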

Deleting Attachments

When cleaning up or repackaging a document, you can remove attachments that are no longer needed:

from spire.pdf.common import *
from spire.pdf import *

doc = PdfDocument()
doc.LoadFromFile("document_with_attachments.pdf")

# Delete all attachments
doc.Attachments.Clear()

doc.SaveToFile("cleaned.pdf")
doc.Close()

If you need to remove only specific attachments, you can access them by index and handle them individually, or match them by file name.
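A selective removal by file name might look like the sketch below. It is written against any collection that exposes Count, get_Item(i), and RemoveAt(i); the first two appear in the examples above, while RemoveAt is an assumption here and should be checked against the Spire.PDF documentation for your version. The function name remove_attachments_by_name is illustrative.

```python
def remove_attachments_by_name(collection, names_to_remove) -> int:
    """Remove every attachment whose FileName is in names_to_remove.

    Assumes `collection` exposes Count, get_Item(i) and RemoveAt(i).
    Returns the number of attachments removed.
    """
    targets = set(names_to_remove)
    removed = 0
    # Walk backwards so that removing an item does not shift the
    # indexes of the items still to be visited.
    for i in range(collection.Count - 1, -1, -1):
        if collection.get_Item(i).FileName in targets:
            collection.RemoveAt(i)
            removed += 1
    return removed
```

Under that assumption, a call such as remove_attachments_by_name(doc.Attachments, ["data.xlsx"]) followed by doc.SaveToFile("cleaned.pdf") would drop only the named file. Iterating in reverse is the key design choice: a forward loop would skip the element that slides into a freed index.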

Practical Tips

When reading files to attach, open them in binary mode ("rb"); text mode can fail on or alter non-text data
Setting the correct MIME type helps PDF readers properly identify and handle the attachment content
The Flags property of attachment annotations controls interaction behavior, such as ReadOnly for read-only and Locked to prevent movement
Document-level attachments and attachment annotations can coexist without conflict
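To confirm that an attachment survived the round trip unchanged, you can compare a checksum of the original file with one of the extracted copy. The sketch below uses only hashlib from the standard library; the helper names sha256_of_file and files_match are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large attachments do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path: str, extracted_path: str) -> bool:
    """Return True when both files have identical contents."""
    return sha256_of_file(original_path) == sha256_of_file(extracted_path)
```

For example, files_match("data.xlsx", "extracted/data.xlsx") should return True after a clean add-and-extract cycle; a False result points to a read or write that altered the bytes.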

Conclusion

This article covered the basic methods of adding, extracting, and removing attachments in PDF documents using Python. These operations are available through the Spire.PDF API and fit well into batch document processing workflows. Building on this foundation, you can combine these techniques with other PDF operations, such as page merging and text extraction, to create more complex document automation solutions.
