PDF attachments have many uses in everyday document processing. You might bundle a contract with its supporting materials in a single PDF file, or attach data files and images to a technical document. Doing this by hand works for a few files, but it becomes a bottleneck when you have to process large volumes of documents.
With Python, you can add, read, and remove PDF attachments programmatically, which makes these tasks easy to integrate into automated workflows.
Environment Setup
To work with PDF documents in Python, install the Spire.PDF library:
```bash
pip install Spire.PDF
```
The library provides APIs for a wide range of PDF processing tasks, including attachment management.
Adding Attachments to a PDF Document
There are two common ways to add attachments to a PDF: as document-level attachments (similar to email attachments) and as attachment annotations displayed as clickable icons on the page.
Adding Document-Level Attachments
Document-level attachments appear in the PDF reader's attachment panel and are not directly visible on the page content:
```python
from spire.pdf.common import *
from spire.pdf import *

# Load the PDF document
doc = PdfDocument()
doc.LoadFromFile("input.pdf")

# Create an attachment from a file on disk; the constructor reads the file
# and uses its name as the attachment's display name
attachment = PdfAttachment("data.xlsx")

# Set the description and MIME type
attachment.Description = "Source data table"
attachment.MimeType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Add the attachment to the document
doc.Attachments.Add(attachment)

# Save the document
doc.SaveToFile("output.pdf")
doc.Close()
```
Key API notes:
- `PdfAttachment`: Represents an attachment; the constructor takes the path of the file to embed, and the file name becomes the attachment's display name
- `Data` property: Holds the attachment's binary content as a stream (used later for extraction)
- `MimeType` property: Specifies the MIME type so the reader can identify the file correctly
- `Attachments.Add()`: Adds the attachment to the document's attachment collection
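Hard-coding the MIME string works for a single file type, but when you attach many different files it can help to derive the type from the file extension. The sketch below uses Python's standard `mimetypes` module, adds a small override table for Office formats (which some platforms' MIME databases lack), and falls back to the generic `application/octet-stream`. The function name `guess_mime_type` is hypothetical and not part of Spire.PDF.

```python
import mimetypes
from pathlib import Path

# Office Open XML types, listed explicitly because not every platform's
# MIME database knows them (hypothetical helper, not a Spire.PDF API)
_OFFICE_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def guess_mime_type(path: str) -> str:
    """Return a best-guess MIME type for the file at `path`."""
    suffix = Path(path).suffix.lower()
    if suffix in _OFFICE_TYPES:
        return _OFFICE_TYPES[suffix]
    mime, _encoding = mimetypes.guess_type(path)
    # Use the generic binary type when the extension is unknown
    return mime or "application/octet-stream"


if __name__ == "__main__":
    for name in ["data.xlsx", "report.pdf", "logo.png", "notes.unknownext"]:
        print(name, "->", guess_mime_type(name))
```

With this helper, the assignment above becomes `attachment.MimeType = guess_mime_type("data.xlsx")`.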
Adding Attachment Annotations
Attachment annotations appear as icons (such as a paperclip or pushpin) on the PDF page, allowing readers to click and open them directly:
```python
from spire.pdf.common import *
from spire.pdf import *

doc = PdfDocument()
doc.LoadFromFile("input.pdf")
page = doc.Pages[0]

# Load the file to attach into a Spire Stream (the constructor accepts a file path)
data = Stream("report.pdf")

# Create an attachment annotation: a 15 x 15 icon at position (50, 100)
bounds = RectangleF(50.0, 100.0, 15.0, 15.0)
annotation = PdfAttachmentAnnotation(bounds, "report.pdf", data)
annotation.Color = PdfRGBColor(Color.get_Teal())
annotation.Icon = PdfAttachmentIcon.Paperclip
annotation.Text = "Attachment: Detailed Report"

# Add it to the page
page.AnnotationsWidget.Add(annotation)

doc.SaveToFile("output.pdf")
doc.Close()
```
Attachment annotations support various icon styles, including Paperclip, PushPin, Graph, and Tag, allowing you to choose the most appropriate icon based on the content type.
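One way to act on this is to pick the icon from the attached file's extension. The mapping below is an arbitrary, hypothetical example (spreadsheets get `Graph`, images get `Tag`, everything else gets `Paperclip`). It returns the icon's member name as a string so it can be tested without Spire.PDF installed; `icon_name_for` is not a library function.

```python
from pathlib import Path

# Hypothetical extension-to-icon mapping; the values are member names of
# Spire.PDF's PdfAttachmentIcon enumeration (Paperclip, PushPin, Graph, Tag)
_ICON_BY_EXTENSION = {
    ".xlsx": "Graph",
    ".xls": "Graph",
    ".csv": "Graph",
    ".png": "Tag",
    ".jpg": "Tag",
    ".jpeg": "Tag",
}


def icon_name_for(path: str, default: str = "Paperclip") -> str:
    """Pick an attachment icon name based on the file extension."""
    return _ICON_BY_EXTENSION.get(Path(path).suffix.lower(), default)


if __name__ == "__main__":
    print(icon_name_for("data.xlsx"))   # Graph
    print(icon_name_for("photo.JPG"))   # Tag
    print(icon_name_for("report.pdf"))  # Paperclip
```

To apply the result, convert the name back to the enumeration member, for example `annotation.Icon = getattr(PdfAttachmentIcon, icon_name_for("report.pdf"))`.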
Reading and Extracting Attachments
When you receive a PDF document containing attachments, you may need to inspect the attachment information or extract the files.
Retrieving Attachment Information
```python
from spire.pdf.common import *
from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("document_with_attachments.pdf")

# Get the attachment collection
collection = pdf.Attachments

if collection.Count > 0:
    for i in range(collection.Count):
        attachment = collection.get_Item(i)
        print(f"File Name: {attachment.FileName}")
        print(f"Description: {attachment.Description}")
        if attachment.CreationDate:
            print(f"Creation Date: {attachment.CreationDate}")
        print("---")

pdf.Close()
```
Extracting Attachments to Local Files
```python
from spire.pdf.common import *
from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("document_with_attachments.pdf")
collection = pdf.Attachments

# Extract all attachments, naming each output file after the attachment
for i in range(collection.Count):
    attachment = collection.get_Item(i)
    attachment.Data.Save(attachment.FileName)

# Or extract a single attachment (e.g., the second one), if it exists
if collection.Count > 1:
    attachment = collection.get_Item(1)
    attachment.Data.Save(attachment.FileName)

pdf.Close()
```
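`attachment.Data` returns the embedded content as a stream, and its `Save()` method writes that content to the path you pass in. Passing `attachment.FileName` reuses the name stored in the PDF, so each file is written to the current working directory under its original name.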
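Because the loop writes each file under the name stored in the PDF, two attachments with the same name overwrite each other, and a name containing directory components could write outside the intended folder. The defensive sketch below uses only the standard library: it keeps just the final name component and appends a counter when a file already exists. `safe_output_path` is a hypothetical helper, not a Spire.PDF API.

```python
import os
from pathlib import Path


def safe_output_path(out_dir: str, stored_name: str) -> Path:
    """Return a not-yet-existing path inside out_dir for an attachment name."""
    # Keep only the final component, treating both / and \ as separators
    base = os.path.basename(stored_name.replace("\\", "/"))
    if base in ("", ".", ".."):
        base = "attachment"
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    candidate = directory / base
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append _1, _2, ... until the name is free
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

In the extraction loop, you would then call `attachment.Data.Save(str(safe_output_path("extracted", attachment.FileName)))`.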
Deleting Attachments
When cleaning up or repackaging a document, you can remove attachments that are no longer needed:
```python
from spire.pdf.common import *
from spire.pdf import *

doc = PdfDocument()
doc.LoadFromFile("document_with_attachments.pdf")

# Delete all document-level attachments
doc.Attachments.Clear()

doc.SaveToFile("cleaned.pdf")
doc.Close()
```
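If you need to remove only specific attachments, access them by index (for example, `doc.Attachments.RemoveAt(0)` removes the first one) or match them by file name. Note that `Attachments.Clear()` affects only document-level attachments; attachment annotations belong to individual pages and have to be removed from each page's annotation collection (`page.AnnotationsWidget`).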
Practical Tips
- When you read attachment files yourself, open them in binary mode (`"rb"`); otherwise the data may become corrupted
- Setting the correct MIME type helps PDF readers properly identify and handle the attachment content
- The `Flags` property of attachment annotations controls interaction behavior, such as `ReadOnly` to make an annotation read-only and `Locked` to prevent it from being moved
- Document-level attachments and attachment annotations can coexist in the same document without conflict
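To confirm that a file survived the round trip into and out of a PDF without corruption, you can compare a digest of the original file with a digest of the extracted copy. A minimal sketch using the standard `hashlib` module; `sha256_of` and `same_content` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: str, extracted: str) -> bool:
    """True if both files have identical bytes (compared via SHA-256)."""
    return sha256_of(original) == sha256_of(extracted)
```

For example, after extracting `data.xlsx` into an `extracted` folder, `same_content("data.xlsx", "extracted/data.xlsx")` should return `True`.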
Conclusion
This article covered the basic methods of adding, extracting, and removing attachments in PDF documents using Python. The Spire.PDF API makes these operations straightforward, and they fit naturally into batch document processing workflows. Building on this foundation, you can combine these techniques with other PDF operations, such as page merging and text extraction, to create more complex document automation solutions.
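For batch workflows, the per-document steps above can be wrapped in a function and applied to every PDF in a folder. The driver below is a hypothetical sketch: it handles only file discovery and output paths, and takes the per-file work as a callable `operation(src, dst)` so any of the Spire.PDF snippets can be plugged in.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(
    input_dir: str,
    output_dir: str,
    operation: Callable[[str, str], None],
) -> List[Path]:
    """Apply operation(src, dst) to each *.pdf in input_dir; return output paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() gives a deterministic processing order
    for src in sorted(Path(input_dir).glob("*.pdf")):
        dst = out / src.name
        operation(str(src), str(dst))
        written.append(dst)
    return written
```

For example, an `operation` that loads `src` with `PdfDocument.LoadFromFile`, calls `Attachments.Clear()`, and saves to `dst` with `SaveToFile` would strip document-level attachments from an entire folder in one call.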
