In day-to-day office automation, adding metadata to Word documents is an effective way to make document management more efficient. Document properties record key information such as author, title, and keywords, which simplifies later retrieval, classification, and archiving. Document statistics (such as word count and character count) also help when evaluating a document's content. This article shows how to use Python to manage Word document properties and retrieve document statistics.
Why Document Property Management is Needed
In practical work, document property management has the following important application scenarios:
- Document Archiving: Add metadata such as authors, titles, and keywords to documents for easier retrieval and management
- Compliance Requirements: Certain industries require recording metadata such as creators, company information, and managers in documents
- Workflow Management: Mark document status (such as draft, under review, approved) using custom properties
- Batch Processing: Uniformly set standardized properties for enterprise document libraries to improve management standardization
Handling these tasks programmatically saves time and reduces human error.
Environment Setup
First, install the Spire.Doc for Python library:
pip install Spire.Doc
This library provides an API for working with Word documents, including setting document properties and retrieving statistics, without requiring Microsoft Word to be installed.
Managing Built-in Document Properties
Word documents contain rich built-in properties (also known as metadata), such as title, author, subject, and company. This information is crucial for document classification, search, and management. It can be viewed in Windows File Explorer or accessed through Word's "File > Info > Properties" panel.
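To see where these properties live, it can help to look inside the file itself. The sketch below does not use Spire.Doc; it swaps in Python's standard library to read the core-properties part of a .docx package directly. It assumes the part sits at the conventional path docProps/core.xml, and read_core_properties is a hypothetical helper name. Company and Manager are not covered here because they are stored in a separate part (docProps/app.xml).

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the core-properties part of an Office Open XML package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Property label -> element name inside docProps/core.xml.
CORE_FIELDS = {
    "Title": "dc:title",
    "Subject": "dc:subject",
    "Author": "dc:creator",
    "Keywords": "cp:keywords",
    "Category": "cp:category",
    "Comments": "dc:description",
}


def read_core_properties(docx_path):
    """Return the core properties found in a .docx file as a dict.

    Hypothetical helper: assumes the conventional part name
    docProps/core.xml and returns {} when that part is missing.
    """
    with zipfile.ZipFile(docx_path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in CORE_FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None:
            found[label] = element.text or ""
    return found
```

A call such as read_core_properties("Report with Properties.docx") can serve as a library-independent check that the values written in the next section actually reached the file.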
Setting and Reading Built-in Properties
The following code demonstrates how to set and read various built-in properties for a Word document:
from spire.doc import Document, FileFormat
# Load document
document = Document()
document.LoadFromFile("Report Template.docx")
# Set built-in document properties
document.BuiltinDocumentProperties.Title = "2024 Annual Financial Report"
document.BuiltinDocumentProperties.Subject = "Financial Analysis"
document.BuiltinDocumentProperties.Author = "Li Ming"
document.BuiltinDocumentProperties.Company = "ABC Technology Co., Ltd."
document.BuiltinDocumentProperties.Manager = "Manager Wang"
document.BuiltinDocumentProperties.Category = "Financial Report"
document.BuiltinDocumentProperties.Keywords = "finance, report, 2024, analysis"
document.BuiltinDocumentProperties.Comments = "This document contains the company's 2024 annual financial data analysis"
# Save document
document.SaveToFile("Report with Properties.docx", FileFormat.Docx)
# Read the properties back while the document is still open
print(f"Title: {document.BuiltinDocumentProperties.Title}")
print(f"Author: {document.BuiltinDocumentProperties.Author}")
print(f"Company: {document.BuiltinDocumentProperties.Company}")
print(f"Keywords: {document.BuiltinDocumentProperties.Keywords}")
# Release the document only after all reads are done
document.Close()
Common built-in properties include Title (document title), Subject (subject), Author (author), Company (company name), Manager (manager), Category (category), Keywords (keywords, separated by commas), and Comments (remarks). Reading these values is especially useful in batch processing, where documents can be classified or filtered by property value.
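When many properties are set at once, a small helper can keep the names in one place and catch typos early. The following is a minimal sketch, not part of Spire.Doc: apply_builtin_properties is a hypothetical helper, and it assumes the object passed in exposes the property names listed above as writable attributes, as document.BuiltinDocumentProperties does in the example. A stand-in object is used so the sketch runs without the library installed.

```python
from types import SimpleNamespace

# Property names taken from the list of common built-in properties above.
KNOWN_BUILTIN_PROPERTIES = {
    "Title", "Subject", "Author", "Company",
    "Manager", "Category", "Keywords", "Comments",
}


def apply_builtin_properties(properties, values):
    """Set each name/value pair on `properties`, rejecting unknown names.

    Hypothetical helper: `properties` is assumed to behave like
    document.BuiltinDocumentProperties (writable attributes).
    """
    unknown = sorted(set(values) - KNOWN_BUILTIN_PROPERTIES)
    if unknown:
        raise ValueError(f"Unknown built-in properties: {', '.join(unknown)}")
    for name, value in values.items():
        setattr(properties, name, value)
    return properties


# Stand-in object so the sketch runs without Spire.Doc installed.
demo = apply_builtin_properties(
    SimpleNamespace(),
    {"Title": "2024 Annual Financial Report", "Author": "Li Ming"},
)
print(demo.Title)
```

With a real document, the first argument would be document.BuiltinDocumentProperties. A misspelled key such as "Titel" raises ValueError instead of silently creating a stray attribute.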
Managing Custom Document Properties
In addition to built-in properties, Spire.Doc also supports custom properties, which are useful for business-specific information such as workflow status (draft, under review, approved), version numbers, and project numbers. Notably, the _MarkAsFinal property can be used to mark a document as final, indicating to users that it should not be modified further.
Setting and Reading Custom Properties
from spire.doc import Document, FileFormat
from spire.doc.common import Boolean
# Load document
document = Document()
document.LoadFromFile("Project Document.docx")
# Get custom properties collection
customProperties = document.CustomDocumentProperties
# Add custom properties
customProperties.Add("Project Name", "Intelligent Office System")
customProperties.Add("Version Number", "2.0")
customProperties.Add("Review Status", "Approved")
customProperties.Add("_MarkAsFinal", Boolean(True))
# Save document
document.SaveToFile("Document with Custom Properties.docx", FileFormat.Docx2013)
document.Close()
document.Dispose()
# Read custom properties
document2 = Document()
document2.LoadFromFile("Document with Custom Properties.docx")
for i in range(document2.CustomDocumentProperties.Count):
    prop = document2.CustomDocumentProperties.get_Item(i)
    print(f"{prop.Name}: {prop.Value}")
document2.Close()
Note that some custom properties may not be directly displayed in the Word interface but can be accessed through the API.
Retrieving Document Statistics
Sometimes we need basic document statistics such as word count and character count, for example to evaluate document length, estimate translation workload, or generate document summary reports.
from spire.doc import Document
# Load document
document = Document()
document.LoadFromFile("Long Document.docx")
# Get document statistics
char_count = document.BuiltinDocumentProperties.CharCount
char_count_with_space = document.BuiltinDocumentProperties.CharCountWithSpace
word_count = document.BuiltinDocumentProperties.WordCount
print(f"Character count (excluding spaces): {char_count}")
print(f"Character count (including spaces): {char_count_with_space}")
print(f"Word count: {word_count}")
document.Close()
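As an illustration of the translation-workload use case, the word count can be turned into a rough effort estimate. This is a sketch under stated assumptions: estimate_translation_days is a hypothetical helper, and the default of 2000 words per day is a placeholder rate, not a figure from Spire.Doc or from this article.

```python
import math


def estimate_translation_days(word_count, words_per_day=2000):
    """Round a word count up to whole working days.

    The default rate of 2000 words per day is a placeholder assumption;
    replace it with the rate that applies to your team.
    """
    if word_count < 0:
        raise ValueError("word_count must not be negative")
    if words_per_day <= 0:
        raise ValueError("words_per_day must be positive")
    return math.ceil(word_count / words_per_day)


print(estimate_translation_days(4500))  # 4500 / 2000 = 2.25, rounded up to 3
```

In practice the argument would be the word_count value read from BuiltinDocumentProperties.WordCount above.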
Practical Application Example: Batch Processing Document Properties
The following is a comprehensive example showing how to batch set unified properties for multiple documents:
from spire.doc import Document, FileFormat
import os
def set_document_properties(file_path, author, company, keywords):
    """Set unified properties for a document"""
    doc = Document()
    doc.LoadFromFile(file_path)
    # Set properties
    doc.BuiltinDocumentProperties.Author = author
    doc.BuiltinDocumentProperties.Company = company
    doc.BuiltinDocumentProperties.Keywords = keywords
    # Save (overwrite original file)
    doc.SaveToFile(file_path, FileFormat.Docx)
    doc.Close()
    print(f"Processed: {file_path}")
# Batch process all Word documents in a folder
folder_path = "./Documents to Process"
author = "Technical Department"
company = "XYZ Company"
keywords = "technical documentation, internal materials"
for filename in os.listdir(folder_path):
    if filename.endswith(".docx"):
        file_path = os.path.join(folder_path, filename)
        set_document_properties(file_path, author, company, keywords)
print("Batch processing completed")
This example demonstrates how to apply document property management to actual workflows, especially suitable for enterprise document standardization scenarios.
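On a shared folder, a batch run may meet files the simple loop does not expect. The sketch below is one possible hardening, using only the standard library: it matches the extension case-insensitively, skips names starting with "~$" (the temporary lock files Word creates next to an open document), and records failures instead of stopping at the first one. The names find_docx_files and process_all are hypothetical.

```python
import tempfile
from pathlib import Path


def find_docx_files(folder):
    """Return the .docx files in `folder`, sorted by path.

    Matches the extension case-insensitively and skips names starting
    with "~$", which Word uses for the lock file of an open document.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".docx"
        and not path.name.startswith("~$")
    )


def process_all(folder, handler):
    """Call handler(path) for each document and collect failures by file name."""
    failures = {}
    for path in find_docx_files(folder):
        try:
            handler(str(path))
        except Exception as error:  # keep going so one bad file does not end the batch
            failures[path.name] = str(error)
    return failures


if __name__ == "__main__":
    # Small demonstration on empty placeholder files in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("report.docx", "~$report.docx", "notes.txt"):
            (Path(folder) / name).write_bytes(b"")
        print([path.name for path in find_docx_files(folder)])
```

To combine this with the example above, the handler could be lambda path: set_document_properties(path, author, company, keywords); the returned dictionary then lists every file that could not be processed along with its error message.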
Important Considerations
When using document property management features, please note the following points:
- Property Visibility: Some custom properties may not be directly displayed in the Word interface but can be accessed through the API
- Resource Release: Be sure to call the Close() and Dispose() methods after processing documents to release resources
- File Format: It is recommended to use Docx or Docx2013 format to ensure compatibility
- Large File Processing: For large documents, property operations may take a long time; it is advisable to execute them in background threads
Summary
This article introduced a complete solution for managing Word document properties and statistics using Python. Through these techniques, we can:
- Add rich metadata to documents to improve management efficiency
- Batch process document properties to achieve standardization
- Retrieve document statistics to support decision-making
- Mark document status and versions to optimize workflows
These features have wide applications in document management systems, collaborative office platforms, contract management, and other scenarios. Combined with other Word operation functions, complete document automation processing solutions can be built to significantly improve work efficiency.
With the development of office automation, mastering these programming skills will bring significant value to developers and enterprises. It is recommended to combine these basic functions according to specific business needs to create automated tools suitable for your own workflows.
