DEV Community

lu liu

Mastering Word Document Properties in Java: A Spire.Doc Tutorial

Word documents are more than text and images; they also carry metadata known as document properties. These properties, such as author, title, subject, and custom fields, help with organizing files, making them searchable, and driving automated workflows. However, accessing and changing these properties programmatically in Java can be a challenge. This tutorial shows how to use Spire.Doc for Java to read both built-in and custom document properties and then clear or remove them.


Streamlining Document Automation with Spire.Doc for Java

Spire.Doc for Java is an API for creating, editing, converting, and printing Word documents in Java applications without requiring Microsoft Word to be installed. It supports a wide range of Word features, from basic text manipulation to tables and sections, and, most importantly for this tutorial, document properties. This makes it a practical choice for document automation tasks.

To integrate Spire.Doc into your Java project, add it as a dependency. For Maven projects, add the e-iceblue repository and the spire.doc dependency to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.1.3</version>
    </dependency>
</dependencies>

After adding the dependency, you can start utilizing Spire.Doc's functionalities in your Java code.


Accessing Document Metadata: Reading Properties

Word documents contain two main types of properties: built-in and custom. Built-in properties are standard fields like "Author," "Title," "Subject," and "Creation Date." Custom properties are user-defined fields that allow for more specific metadata, such as "Project Code" or "Document Status." Spire.Doc provides straightforward methods to access both.
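To see where these two kinds of properties physically live, it can help to look inside a .docx file, which is a ZIP package. By OOXML convention, Word stores built-in properties such as Title, Subject, Author, and Keywords in docProps/core.xml, extended ones such as Company and Manager in docProps/app.xml, and custom properties in docProps/custom.xml. The sketch below uses only the JDK (no Spire.Doc) to list those parts; the class name DocxPropertyParts and the default path are illustrative assumptions, and a file without custom properties may have no custom.xml at all.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative helper (not part of Spire.Doc): lists the property parts of a .docx package.
class DocxPropertyParts {

    // Returns every entry whose name starts with "docProps/", in the ZIP's directory order.
    static List<String> listPropertyParts(String path) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith("docProps/")) {
                    parts.add(name);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, matching the sample file used in this tutorial
        String path = args.length > 0 ? args[0] : "C:/Sample.docx";
        for (String part : listPropertyParts(path)) {
            // Typically docProps/core.xml, docProps/app.xml and, if present, docProps/custom.xml
            System.out.println(part);
        }
    }
}
```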

The following code loads a Word document and prints both its built-in and custom properties. Before running it, prepare a Sample.docx file with some built-in properties set and at least one custom property (for example, "Project" with the value "Alpha"), and adjust the path C:/Sample.docx passed to loadFromFile() to match its location.

import com.spire.doc.BuiltinDocumentProperties;
import com.spire.doc.CustomDocumentProperties;
import com.spire.doc.Document;

public class GetDocumentProperties {
    public static void main(String[] args) {

        //Create an object of Document
        Document document = new Document();

        //Load a Word document
        document.loadFromFile("C:/Sample.docx");

        //Create an object of StringBuilder to collect the output
        StringBuilder properties = new StringBuilder();

        //Get all the built-in properties and custom properties
        BuiltinDocumentProperties builtinDocumentProperties = document.getBuiltinDocumentProperties();
        CustomDocumentProperties customDocumentProperties = document.getCustomDocumentProperties();

        //Append each built-in property; values are appended directly instead of being
        //concatenated into a String.format pattern, so a '%' in a value cannot cause an error
        properties.append("The built-in properties:")
                .append("\r\nTitle: ").append(builtinDocumentProperties.getTitle())
                .append("\r\nSubject: ").append(builtinDocumentProperties.getSubject())
                .append("\r\nAuthor: ").append(builtinDocumentProperties.getAuthor())
                .append("\r\nManager: ").append(builtinDocumentProperties.getManager())
                .append("\r\nCategory: ").append(builtinDocumentProperties.getCategory())
                .append("\r\nCompany: ").append(builtinDocumentProperties.getCompany())
                .append("\r\nKeywords: ").append(builtinDocumentProperties.getKeywords())
                .append("\r\nComments: ").append(builtinDocumentProperties.getComments());

        //Append each custom property as "name: value", reading both from the same collection
        properties.append("\r\n\r\nThe custom properties:");
        for (int i = 0; i < customDocumentProperties.getCount(); i++) {
            properties.append("\r\n")
                    .append(customDocumentProperties.get(i).getName())
                    .append(": ")
                    .append(customDocumentProperties.get(i).getValue());
        }

        //Output the properties of the document
        System.out.println(properties);

        //Release the resources held by the document
        document.dispose();
    }
}

In this code, we first load the document. Then, document.getBuiltinDocumentProperties() provides access to standard properties, which can be retrieved using dedicated getter methods. For custom properties, document.getCustomDocumentProperties() returns a collection that can be iterated through to access each custom property's name and value.
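When assembling output from property values, avoid concatenating them into a String.format pattern: any '%' in a title or comment is then parsed as a format specifier, and String.format can throw an IllegalFormatException at runtime. The stand-alone sketch below uses only the JDK and a made-up title value (the class and method names are illustrative) to contrast that fragile pattern with two safe alternatives.

```java
import java.util.IllegalFormatException;

// Stand-alone demo (JDK only): why property values should not become part of a format string.
class FormatPitfall {

    // Fragile: the value is concatenated into the format pattern, so '%' is parsed as a specifier.
    static String viaFormat(String title) {
        return String.format("Title: " + title);
    }

    // Safe: the pattern is fixed and the value is passed as an argument.
    static String viaArgument(String title) {
        return String.format("Title: %s", title);
    }

    // Also safe: plain concatenation never parses format specifiers.
    static String viaConcat(String title) {
        return "Title: " + title;
    }

    public static void main(String[] args) {
        // Hypothetical property value containing a percent sign
        String title = "Q3 report, 100% final";
        System.out.println(viaArgument(title));
        System.out.println(viaConcat(title));
        try {
            System.out.println(viaFormat(title));
        } catch (IllegalFormatException e) {
            System.out.println("String.format failed: " + e.getClass().getSimpleName());
        }
    }
}
```

When a value contains no '%', all three methods return the same text, which is why the fragile version can easily pass casual testing.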


Cleaning Up Document Metadata: Deleting Properties

There are various scenarios where deleting document properties becomes necessary, such as ensuring data privacy by removing sensitive author information, standardizing document metadata, or cleaning up outdated custom fields. Spire.Doc simplifies the process of removing both built-in and custom properties.

The following Java code loads a Word document, clears a set of built-in properties, and then removes every custom property by name. Finally, it saves the modified document to a new file.

import com.spire.doc.BuiltinDocumentProperties;
import com.spire.doc.CustomDocumentProperties;
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class RemoveDocumentProperties {
    public static void main(String[] args) {

        //Create an object of Document
        Document document = new Document();

        //Load a Word document
        document.loadFromFile("C:/Sample.docx");

        //Get all built-in properties and custom properties
        BuiltinDocumentProperties builtinDocumentProperties = document.getBuiltinDocumentProperties();
        CustomDocumentProperties customDocumentProperties = document.getCustomDocumentProperties();

        //Remove built-in properties by setting their value to empty
        builtinDocumentProperties.setTitle("");
        builtinDocumentProperties.setSubject("");
        builtinDocumentProperties.setAuthor("");
        builtinDocumentProperties.setManager("");
        builtinDocumentProperties.setCompany("");
        builtinDocumentProperties.setCategory("");
        builtinDocumentProperties.setKeywords("");
        builtinDocumentProperties.setComments("");

        //Get the count of custom properties
        int count = customDocumentProperties.getCount();

        //Loop backwards through the custom properties, so removing one does not
        //shift the index of a property that has not been visited yet
        for (int i = count - 1; i >= 0; i--) {

            //Get the name of a custom property
            String name = customDocumentProperties.get(i).getName();

            //Remove the custom property by its name
            customDocumentProperties.remove(name);
        }

        //Save the document
        document.saveToFile("RemoveDocumentProperties.docx", FileFormat.Auto);
        document.dispose();
    }
}

Built-in properties are not removed the way custom properties are. Instead, you clear them by passing an empty string to each setter, as the code does for Title, Subject, Author, and the other fields. Custom properties can be deleted outright with the remove() method of the CustomDocumentProperties collection, which takes the property's name. Finally, save the document with saveToFile(); until then, the changes exist only in memory.
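Because saving is what persists the changes, it can be worth confirming the result independently of Spire.Doc. The sketch below is a JDK-only check that relies on the OOXML convention that custom properties are stored as property elements in docProps/custom.xml; the class name CheckCustomProperties and the default file name are illustrative assumptions. An empty result means no custom properties remain, whether the part is missing or simply has no entries.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative check (JDK only, not part of Spire.Doc): lists custom property names in a saved .docx.
class CheckCustomProperties {

    static List<String> customPropertyNames(String path) throws Exception {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            // By OOXML convention, custom properties are stored in docProps/custom.xml
            ZipEntry entry = zip.getEntry("docProps/custom.xml");
            if (entry == null) {
                return names; // no custom properties part, so nothing remains
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, a common hardening step when parsing XML from files
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(entry)) {
                org.w3c.dom.Document xml = factory.newDocumentBuilder().parse(in);
                // Each custom property is a <property name="..."> element
                NodeList props = xml.getElementsByTagNameNS("*", "property");
                for (int i = 0; i < props.getLength(); i++) {
                    names.add(((Element) props.item(i)).getAttribute("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the output file name used by the removal example
        String path = args.length > 0 ? args[0] : "RemoveDocumentProperties.docx";
        List<String> names = customPropertyNames(path);
        System.out.println(names.isEmpty() ? "No custom properties remain." : "Remaining custom properties: " + names);
    }
}
```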


Conclusion

In this tutorial, we used Spire.Doc for Java to manage Word document properties: reading the built-in and custom properties of an existing document, clearing built-in properties, and removing custom properties by name. These operations support document control, data privacy, and document automation workflows by giving Java developers direct control over Word metadata. Spire.Doc also offers many other features for broader document manipulation needs that are worth exploring.
