DEV Community

Shahzad Ashraf

Read and Extract DOCX Metadata in Java Apps

DOCX files contain more than visible text. They also hold metadata such as author names, creation dates, modification records, and custom properties, which can be critical for tracking, auditing, or compliance. For Java developers, extracting this data reliably makes an application's document handling considerably more useful. That's where the GroupDocs.Metadata Cloud Java SDK comes in: it provides a way to read metadata from DOCX files in Java without parsing the format by hand.

With the Java Cloud SDK and its REST API, developers can add metadata extraction to an application with little code. There's no need to parse the file yourself or to worry about format compatibility; the SDK handles both. It returns basic file attributes as well as advanced metadata fields. This is especially useful for document management systems, enterprise search solutions, and other Java-based applications where data accuracy and traceability are crucial.
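For context on the kind of parsing the SDK spares you: a DOCX file is a ZIP archive, and its core properties (author, creation date, and so on) are stored as XML in the `docProps/core.xml` entry. The sketch below reads that one entry with JDK classes only. It is an illustration, not how the SDK works internally; the class name `CorePropertiesSketch`, the helper `sampleDocx()`, and the sample values are made up for the example, and it ignores custom properties and other metadata parts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CorePropertiesSketch {

    /** Reads the child elements of docProps/core.xml into a name -> text map. */
    public static Map<String, String> readCoreProperties(InputStream docx) {
        Map<String, String> props = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"docProps/core.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser never touches the ZIP stream itself.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid XXE on untrusted files.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList children = doc.getDocumentElement().getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node child = children.item(i);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        props.put(child.getLocalName(), child.getTextContent().trim());
                    }
                }
                break;
            }
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not read docProps/core.xml", e);
        }
        return props;
    }

    /** Builds a tiny in-memory archive holding only a core.xml entry (sample data, not a full DOCX). */
    public static byte[] sampleDocx() {
        String coreXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<cp:coreProperties"
                + " xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
                + " xmlns:dcterms=\"http://purl.org/dc/terms/\">"
                + "<dc:creator>Jane Doe</dc:creator>"
                + "<dcterms:created>2024-01-15T09:30:00Z</dcterms:created>"
                + "</cp:coreProperties>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docProps/core.xml"));
            zip.write(coreXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> props = readCoreProperties(new ByteArrayInputStream(sampleDocx()));
        props.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Even this small case needs ZIP handling, namespace-aware XML parsing, and parser hardening, and it covers only one of the places a DOCX file keeps metadata.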

Whether you're building a content archiving system, automating compliance checks, or simply improving your application's document handling, GroupDocs.Metadata Cloud for Java offers a capable yet approachable way to do it. It supports a wide range of document formats and runs in the cloud, so it can scale with your workload. To start adding DOCX metadata reading to your projects, explore our detailed article.

The following code sample shows how to add this capability to your Java projects:

package com.groupdocs;

import com.groupdocs.cloud.metadata.client.*;
import com.groupdocs.cloud.metadata.api.*;
import com.groupdocs.cloud.metadata.model.*;
import com.groupdocs.cloud.metadata.model.requests.*;

public class ReadMetadataFromDOCX {

    public static void main(String[] args) {

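        // Step 1: Configure your API credentials.
        // The values below are placeholders; use the credentials from your GroupDocs Cloud account.
        // Both arguments are Strings, so confirm the order your SDK version's Configuration constructor expects.
        String myAppKey = "your-app-key";
        String myAppSid = "your-app-sid";
        Configuration configuration = new Configuration(myAppKey, myAppSid);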

        // Step 2: Initialize the Metadata API
        MetadataApi metadataApi = new MetadataApi(configuration);

        try {
            // Step 3: Add source file from the cloud storage
            FileInfo fileInfo = new FileInfo();
            fileInfo.setFilePath("SampleFiles/source.docx"); 

            // Step 4: Apply extraction options
            ExtractOptions options = new ExtractOptions();
            options.setFileInfo(fileInfo);

            // Step 5: Perform metadata extraction
            ExtractRequest request = new ExtractRequest(options);
            ExtractResult result = metadataApi.extract(request);

            // Step 6: Print simplified metadata tree
            System.out.println("DOCX Metadata Properties:");
            if (result.getMetadataTree() != null &&
                result.getMetadataTree().getInnerPackages() != null) {

                result.getMetadataTree().getInnerPackages().forEach(pkg -> {
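                    // Skip packages that carry no properties to avoid a NullPointerException
                    if (pkg.getPackageProperties() == null) {
                        return;
                    }
                    pkg.getPackageProperties().forEach(prop -> {
                        System.out.println("- " + prop.getName() + ": " + prop.getValue());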

                        if (prop.getTags() != null && !prop.getTags().isEmpty()) {
                            prop.getTags().forEach(tag -> System.out.println(
                                "  . Tag: " + tag.getName() +
                                " (" + tag.getCategory() + ")"
                            ));
                        }
                    });
                });

            } else {
                System.out.println("No metadata found in the DOCX file.");
            }

        } catch (Exception e) {
            System.err.println("Error extracting metadata: " + e.getMessage());
        }
    }
}
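
The sample above prints only the properties of the first level of inner packages. If those packages contain further inner packages, a recursive walk reaches every level. The sketch below shows the idea on stand-in types: `MetaPackage` and `Property` are hypothetical records that only mirror the shape used above (a package with properties and inner packages), so that the example runs without the SDK. The package and property names in `main` are made up as well.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataTreeWalkSketch {

    // Hypothetical stand-ins for the SDK's model classes, so the sketch runs on its own.
    public record Property(String name, String value) {}

    public record MetaPackage(String name, List<Property> properties, List<MetaPackage> innerPackages) {}

    /** Returns one "path/name = value" line per property, descending into nested packages. */
    public static List<String> flatten(MetaPackage root) {
        List<String> lines = new ArrayList<>();
        collect(root, root.name(), lines);
        return lines;
    }

    private static void collect(MetaPackage pkg, String path, List<String> out) {
        // Either list may be absent, so guard both before iterating.
        if (pkg.properties() != null) {
            for (Property property : pkg.properties()) {
                out.add(path + "/" + property.name() + " = " + property.value());
            }
        }
        if (pkg.innerPackages() != null) {
            for (MetaPackage inner : pkg.innerPackages()) {
                collect(inner, path + "/" + inner.name(), out);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample tree: a root package with one nested package.
        MetaPackage core = new MetaPackage("Core",
                List.of(new Property("Author", "Jane Doe")), null);
        MetaPackage root = new MetaPackage("Root",
                List.of(new Property("FileFormat", "DOCX")), List.of(core));
        flatten(root).forEach(System.out::println);
    }
}
```

With the real SDK types, the same pattern would call the getters used in the sample (`getPackageProperties()` and `getInnerPackages()`) in place of the record accessors.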