DEV Community

lu liu


Set or Read PDF Properties in Java [Automation Guide]

PDF document properties, often called metadata, include fields such as title, author, subject, and keywords. Document management applications rely on them for indexing, searching, and organizing files. Updating these details by hand across many PDFs is slow and error-prone, so it makes sense to set and read them programmatically. This tutorial shows how to manage PDF properties in Java using the Spire.PDF for Java library.

Introduction to Spire.PDF for Java and Installation

Spire.PDF for Java is a comprehensive library designed for creating, reading, editing, and converting PDF documents in Java applications. It offers a wide array of features for PDF manipulation, including handling document properties. To integrate Spire.PDF for Java into your project, you typically add it as a dependency in your build tool. For Maven users, the following snippet illustrates how to include the library:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

Setting PDF Document Properties in Java

PDF properties encapsulate valuable metadata about a document, such as its title, author, subject, keywords, creator, producer, and creation/modification dates. Programmatically setting these properties ensures consistency, accuracy, and improved discoverability for your PDF files.

To set PDF properties using Spire.PDF for Java, follow these steps:

  • Load an existing PDF document or create a new one: You can either open an existing PDF file or instantiate a new PdfDocument object if you're creating a PDF from scratch.
  • Access the PdfDocumentInformation object: This object, accessible via PdfDocument.getDocumentInformation(), provides methods to interact with the document's metadata.
  • Use setter methods to modify properties: The PdfDocumentInformation object offers various setter methods like setAuthor(), setTitle(), setSubject(), setKeywords(), setCreator(), and setProducer(). You can also set custom properties using setCustomProperty().
  • Save the modified PDF: After setting the desired properties, save the document to apply the changes.

Here's a complete Java code example that sets the standard properties of a PDF document:

import com.spire.pdf.*;
import java.util.Date;

public class setPDFProperties {
    public static void main(String[] args) {
        //Create an object of PdfDocument
        PdfDocument pdfDocument = new PdfDocument();

        //Load a PDF document from disk
        pdfDocument.loadFromFile("D:/Samples/Sample.pdf");

        //Set the title
        pdfDocument.getDocumentInformation().setTitle("PDF(Portable Document Format)");

        //Set the author
        pdfDocument.getDocumentInformation().setAuthor("John");

        //Set the subject
        pdfDocument.getDocumentInformation().setSubject("Introduction of PDF");

        //Set the keywords
        pdfDocument.getDocumentInformation().setKeywords("PDF, document format");

        //Set the creation time
        pdfDocument.getDocumentInformation().setCreationDate(new Date());

        //Set the creator name
        pdfDocument.getDocumentInformation().setCreator("John");

        //Set the modification time
        pdfDocument.getDocumentInformation().setModificationDate(new Date());

        //Set the producer name
        pdfDocument.getDocumentInformation().setProducer("Spire.PDF for Java");

        //Save the document
        pdfDocument.saveToFile("output/setPDFProperties.pdf");

        //Release the resources held by the document
        pdfDocument.close();
    }
}
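Since the goal is automation, the same load, set, and save steps usually need to run over a whole folder rather than one file. The following is a minimal sketch of that outer loop using only the Java standard library. The class name BatchPdfProcessor and its two helper methods are hypothetical names chosen for illustration, and the action passed to processAll is a placeholder where the Spire.PDF calls would go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Spire.PDF): finds PDF files in a folder
// and hands each one to a caller-supplied action.
class BatchPdfProcessor {

    // Lists regular files directly inside 'dir' whose names end in ".pdf"
    // (case-insensitive), sorted by path. Subfolders are not searched.
    static List<Path> findPdfs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Applies 'action' to every PDF found and returns how many files were visited.
    static int processAll(Path dir, Consumer<Path> action) throws IOException {
        List<Path> pdfs = findPdfs(dir);
        pdfs.forEach(action);
        return pdfs.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder action: replace the body with the load, set, and save calls.
        int count = processAll(dir, pdf -> System.out.println("Would update: " + pdf));
        System.out.println(count + " PDF file(s) found");
    }
}
```

Keeping the file discovery separate from the per-file action makes it easy to test the loop without touching any real PDFs, and to swap in a read-only action when you only want to report metadata.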
Enter fullscreen mode Exit fullscreen mode

The table below summarizes common standard PDF properties and their data types:

| Property Name | Data Type | Description |
| --- | --- | --- |
| Title | String | The title of the document. |
| Author | String | The name of the person who created the document. |
| Subject | String | The subject of the document. |
| Keywords | String | Keywords associated with the document. |
| Creator | String | The name of the application that created the document. |
| Producer | String | The name of the application that converted the document to PDF. |
| Creation Date | java.util.Date (as passed to setCreationDate() above) | The date and time the document was created. |
| Modification Date | java.util.Date (as passed to setModificationDate() above) | The date and time the document was last modified. |
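Because Keywords is a single string, an application that holds keywords as a list has to flatten them before calling setKeywords(). Below is a small sketch of one way to do that with the standard library only. KeywordJoiner is a hypothetical helper, and the comma-plus-space separator simply mirrors the "PDF, document format" value used in the example above; it is a convention, not a library requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Spire.PDF): turns a keyword list into
// one comma-separated string suitable for a Keywords property.
class KeywordJoiner {

    // Trims each keyword, skips nulls and blanks, drops case-insensitive
    // duplicates (the first spelling wins), and joins the rest with ", ".
    static String join(List<String> keywords) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> kept = new ArrayList<>();
        for (String keyword : keywords) {
            if (keyword == null) {
                continue;
            }
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty() && seen.add(trimmed.toLowerCase(Locale.ROOT))) {
                kept.add(trimmed);
            }
        }
        return String.join(", ", kept);
    }

    public static void main(String[] args) {
        // The result would be passed to setKeywords() in the example above.
        System.out.println(join(Arrays.asList(" PDF", "document format", "pdf", "")));
    }
}
```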

Getting PDF Document Properties in Java

Retrieving PDF properties is equally important. It allows applications to read metadata for various purposes, such as content indexing, search engine optimization, document categorization, or auditing. For instance, a document management system might extract the author and creation date to display alongside the document thumbnail.

To get PDF properties using Spire.PDF for Java, follow these steps:

  • Load an existing PDF document: You must load an existing PDF file to read its properties.
  • Access the PdfDocumentInformation object: This object provides getter methods for retrieving the metadata.
  • Use getter methods to retrieve properties: Methods like getAuthor(), getTitle(), getSubject(), getKeywords(), getCreator(), getProducer(), getCreationDate(), and getModificationDate() are available. You can also retrieve custom properties using getCustomProperties() or getCustomProperty().
  • Handle potential null values: It's good practice to check for null values as not all properties may be set in every PDF.
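The null check in the last step can be kept out of the main flow with a small helper. This is a minimal sketch using only the standard library: PropertyFormatter and the "(not set)" placeholder are illustrative choices, and the map in main holds stand-in values where a real program would put the results of getTitle(), getAuthor(), and the other getters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Spire.PDF): formats property values
// and substitutes a placeholder for anything that is missing.
class PropertyFormatter {

    // Returns the trimmed value, or "(not set)" when it is null or blank.
    static String orPlaceholder(String value) {
        if (value == null || value.trim().isEmpty()) {
            return "(not set)";
        }
        return value.trim();
    }

    // Builds one "Name: value" line per entry, in insertion order.
    static String format(Map<String, String> properties) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            sb.append(entry.getKey())
              .append(": ")
              .append(orPlaceholder(entry.getValue()))
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in values; in practice these come from the getter methods.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Title", "Sample");
        props.put("Author", null);
        System.out.print(format(props));
    }
}
```

Without a check like this, concatenating a null property into a string silently produces the literal text "null" in the output, which is easy to mistake for a real value.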

Here's a complete Java code example demonstrating how to retrieve and print various properties from a PDF document:

import com.spire.pdf.*;
import java.io.*;

public class getPDFProperties {
    public static void main(String[] args) throws IOException {
        //Create an object of PdfDocument class
        PdfDocument pdf = new PdfDocument();

        //Load a PDF document from disk
        pdf.loadFromFile("D:/Samples/Sample.pdf");

        //Create a StringBuilder instance to store the values of document properties
        StringBuilder stringBuilder = new StringBuilder();

        //Retrieve property values and put them in the StringBuilder
        stringBuilder.append("Title: " + pdf.getDocumentInformation().getTitle() + "\r\n");
        stringBuilder.append("Author: " + pdf.getDocumentInformation().getAuthor() + "\r\n");
        stringBuilder.append("Subject: " + pdf.getDocumentInformation().getSubject() + "\r\n");
        stringBuilder.append("Keywords: " + pdf.getDocumentInformation().getKeywords() + "\r\n");
        stringBuilder.append("Creator: " + pdf.getDocumentInformation().getCreator() + "\r\n");
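        stringBuilder.append("Creation Date: " + pdf.getDocumentInformation().getCreationDate() + "\r\n");
        stringBuilder.append("Modification Date: " + pdf.getDocumentInformation().getModificationDate() + "\r\n");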
        stringBuilder.append("Producer: " + pdf.getDocumentInformation().getProducer() + "\r\n");

        //Write the StringBuilder to a TXT file (created if missing, appended to otherwise);
        //try-with-resources flushes and closes the writer even if writing fails
        File file = new File("D:/output/getPDFProperties.txt");
        try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(file, true))) {
            bufferedWriter.write(stringBuilder.toString());
        }
    }
}

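The example above concatenates the date values into a string, so whatever getCreationDate() and getModificationDate() return is rendered through its toString() form. The setters in the earlier example accept java.util.Date, so the getters most likely return Date objects as well; confirm the return type in your version of the library. If you need to manipulate the dates further, convert them to java.time types (for example via Date.toInstant()), or parse the value if you only have a string. Also make sure the input PDF exists and is readable before loading it, since a missing or inaccessible file will cause the load to fail.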
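If a date does reach your code as a raw string, it may be in the format the PDF specification uses for date values: D:YYYYMMDDHHmmSS followed by an optional offset such as +08'00' or Z. The sketch below parses that full form into a java.time.OffsetDateTime with the standard library only. PdfDateParser is a hypothetical helper, not a Spire.PDF class. It assumes all fields through seconds are present, and it treats a missing offset as UTC, which is a simplification.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spire.PDF): parses a PDF-style date string.
class PdfDateParser {

    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 "Z", 8 offset sign, 9 offset hours, 10 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
            "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})"
            + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})'?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF date string: " + raw);
        }
        LocalDateTime local = LocalDateTime.of(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)));

        // Assumption: no offset (or "Z") is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = "-".equals(m.group(8)) ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                    sign * Integer.parseInt(m.group(9)),
                    sign * Integer.parseInt(m.group(10)));
        }
        return OffsetDateTime.of(local, offset);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20240115103000+08'00'"));
    }
}
```

From an OffsetDateTime, toLocalDateTime() gives the LocalDateTime mentioned above, and toInstant() gives a value that Date.from() can convert back to java.util.Date.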

Conclusion

This tutorial has demonstrated the straightforward process of managing PDF document properties programmatically using Java and the Spire.PDF for Java library. By leveraging its PdfDocumentInformation object, developers can efficiently set and retrieve crucial metadata such as titles, authors, subjects, and keywords. This capability is invaluable for automating document workflows, ensuring metadata accuracy, and enhancing the overall discoverability and organization of PDF files. We encourage you to explore the extensive features of Spire.PDF for Java to further streamline your PDF manipulation tasks.
