DEV Community

lu liu
lu liu

Posted on

Mastering Excel Document Properties: Extracting and Deleting with Java

Managing Excel document properties, or metadata, is crucial for effective data organization, ensuring compliance, and enhancing security. These hidden details, such as author, creation date, and custom tags, can significantly impact how information is tracked, searched, and protected within an organization. However, many users struggle with efficiently accessing or modifying these properties programmatically, leading to inconsistencies or potential data governance issues. This article will demystify the process, offering practical, Java-based solutions for extracting and deleting Excel document properties using the powerful Spire.XLS for Java library.


The Significance of Excel Document Properties

Excel document properties are pieces of information stored within an Excel file that describe the file itself, rather than its content. They are broadly categorized into two types: built-in properties and custom properties. Built-in properties are standard fields provided by Microsoft Excel, such as Author, Title, Subject, Keywords, Last Modified By, Creation Date, and Last Printed. Custom properties, on the other hand, are user-defined fields that allow for more specific metadata, like project codes, department names, or unique identifiers, to be associated with a workbook.

Managing these properties is vital for several reasons. They enhance searchability within document management systems, aid in version control by tracking modifications, and contribute to data integrity by providing a clear audit trail. Furthermore, they play a critical role in data privacy and compliance, as sensitive information embedded in properties might need to be removed before sharing documents externally.

Getting Started with Spire.XLS for Java

Spire.XLS for Java is a professional Java API designed for creating, reading, writing, and converting Excel files without relying on Microsoft Office. It offers extensive functionalities for Excel manipulation, making it an ideal choice for programmatic management of document properties. Its robust feature set and ease of use empower developers to handle complex Excel tasks efficiently within their Java applications.

To integrate Spire.XLS for Java into your project, you'll need to add its dependency. Below are instructions for Maven:

For Maven projects, add the following to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.xls</artifactId>
        <version>15.12.15</version>
    </dependency>
</dependencies>
Enter fullscreen mode Exit fullscreen mode

Ensure you replace 15.12.15 with the latest version available. Once added, you can begin leveraging Spire.XLS for Java's capabilities to manage Excel properties.


Retrieving Built-in and Custom Excel Document Properties

Extracting Excel properties is a fundamental step in understanding the metadata associated with a workbook. These properties typically fall into two categories: built-in document properties and custom document properties.

Spire.XLS for Java provides straightforward APIs for retrieving both types, making it easy to access standard metadata as well as application-specific information stored in the workbook.

Retrieving Built-in Properties

Built-in properties contain standard metadata automatically maintained by Excel, such as the document title, author, subject, and creation date. These properties are commonly used for document management, auditing, and version tracking.

Using Spire.XLS for Java, you can easily retrieve built-in document properties to gain insight into the basic information of an Excel file.

Retrieving Custom Document Properties

Custom document properties allow developers and users to store additional, user-defined metadata in an Excel workbook. These properties are especially useful for business scenarios where extra information—such as project IDs, workflow status, department names, or internal tags—needs to be embedded directly into the file.

Spire.XLS for Java also supports accessing custom document properties, enabling you to read metadata that is not covered by Excel’s built-in fields. This makes it convenient to integrate Excel files into enterprise systems, automate document classification, or implement custom data tracking logic. Here’s how to extract them using Spire.XLS for Java:

  1. Load the Excel workbook: Create a Workbook object and load your Excel file.
  2. Access the built-in properties: Use workbook.getDocumentProperties() to get the built-in document properties. And use workbook.getCustomDocumentProperties() to get customedocument properties.
  3. Use append() method to add these document properties to a list.
  4. Save the extracted document properties as a text file.
import com.spire.xls.Workbook;
import com.spire.xls.collections.BuiltInDocumentProperties;
import com.spire.xls.core.ICustomDocumentProperties;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ReadStandardDocumentProperties {
    public static void main(String[] args) throws IOException {
        //Initialize an instance of the Workbook class.
        Workbook workbook = new Workbook();
        //Load an Excel file
        workbook.loadFromFile("Sample.xlsx");

        //Initialize an instance of the StringBuilder instance
        StringBuilder sb = new StringBuilder();

        //Get the collection of all standard document properties
        BuiltInDocumentProperties standardProperties = workbook.getDocumentProperties();

        //Get specific standard document properties
        String title = standardProperties.getTitle();
        String subject = standardProperties.getSubject();
        String author = standardProperties.getAuthor();
        String keywords = standardProperties.getKeywords();
        String manager = standardProperties.getManager();
        String company = standardProperties.getCompany();
        String category = standardProperties.getCategory();
        String comments = standardProperties.getComments();

        //Append the standard document properties to the StringBuilder instance
        sb.append("Standard Document properties:"
                +"\r\nTitle: " + title
                + "\r\nSubject: " + subject
                + "\r\nAuthor: " + author
                + "\r\nKeywords: "+ keywords
                + "\r\nManager: " + manager.toString()
                + "\r\nCompany: " + company.toString()
                + "\r\nCategory: " + category.toString()
                + "\r\nComments: " + comments.toString()
        );

        sb.append("\r\n\nCustom Document Properties:");

        //Get the collection of all custom document properties
        ICustomDocumentProperties customProperties = workbook.getCustomDocumentProperties();
        //Iterate through the collection
        for(int i =0; i < customProperties.getCount(); i++)
        {
            //Append the name and value of each custom document property to the StringBuilder instance
            sb.append("\r\n" + customProperties.get(i).getName() + ": " + customProperties.get(i).getValue());
        }

        //Write the content of the StringBuilder instance into a text file
        String output = "ReadDocumentProperties.txt";
        FileWriter fw = new FileWriter(output, true);
        BufferedWriter bw = new BufferedWriter(fw);
        bw.append(sb);
        bw.close();
        fw.close();
        workbook.dispose();
    }
}
Enter fullscreen mode Exit fullscreen mode

Deleting Built-in and Custom Excel Document Properties

In certain scenarios, such as preparing files for public distribution or anonymizing data, it becomes necessary to delete Excel properties. This includes not only standard metadata but also any user-defined information stored in the workbook.

Spire.XLS for Java facilitates this process by allowing you to clear built-in properties and remove custom document properties programmatically.

Removing Built-in Properties

While Spire.XLS for Java doesn't offer a direct "delete" method for built-in properties in the same way it does for custom properties (as built-in fields are intrinsic to the Excel file structure), you can effectively clear their values. This approach removes the information associated with these fields, which is sufficient for anonymization or metadata cleanup purposes.

Removing Custom Document Properties

Custom document properties are user-defined metadata and can be explicitly removed from an Excel workbook. Spire.XLS for Java provides direct methods to delete these custom properties, making it easy to eliminate business-specific or sensitive information that may have been added for internal tracking or workflow purposes.

The following code example demonstrates how to clear built-in document properties and remove custom document properties from an Excel file using Spire.XLS for Java.

  1. Load the Excel workbook.
  2. Access built-in properties and costum properties.
  3. Set property values to empty or null: Assign an empty string ("") to the built-in properties you wish to clear.
  4. Use the customProperties.remove() method to remove custom properties.
  5. Save the workbook: Save the changes to a new file or overwrite the existing one.
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.collections.BuiltInDocumentProperties;
import com.spire.xls.core.ICustomDocumentProperties;

public class DeleteDocumentProperties {
    public static void main(String[] args) {
        //Initialize an instance of the Workbook class.
        Workbook workbook = new Workbook();
        //Load an Excel file
        workbook.loadFromFile("Sample.xlsx");

        //Get the collection of all standard document properties
        BuiltInDocumentProperties standardProperties = workbook.getDocumentProperties();

        //Set the value of each standard document property as empty
        standardProperties.setTitle("");
        standardProperties.setSubject("");
        standardProperties.setAuthor("");
        standardProperties.setManager("");
        standardProperties.setCompany("");
        standardProperties.setCategory("");
        standardProperties.setKeywords("");
        standardProperties.setComments("");

        //Get the collection of all custom document properties
        ICustomDocumentProperties customProperties = workbook.getCustomDocumentProperties();
        //Iterate through the collection
        for(int i = customProperties.getCount() - 1; i >= 0; i--)
        {
            //Delete each custom document property from the collection by its name
            customProperties.remove(customProperties.get(i).getName());
        }

        //Save the result file
        workbook.saveToFile("DeleteDocumentProperties.xlsx", ExcelVersion.Version2016);
        workbook.dispose();
    }
}
Enter fullscreen mode Exit fullscreen mode

Conclusion

Effectively managing Excel metadata is a vital skill for any developer working with Java Excel applications. This tutorial has demonstrated how Spire.XLS for Java provides robust and intuitive methods to both extract Excel properties and delete Excel properties, encompassing both built-in and custom types. By leveraging these capabilities, you can ensure your Excel documents maintain data consistency, adhere to privacy standards, and are properly categorized for efficient retrieval. The library's ease of integration and comprehensive features make it an indispensable tool for programmatic Excel manipulation, reinforcing data integrity and streamlining document workflows for diverse use cases.

Top comments (0)