Leon Davis

Posted on Jun 5

Convert PDF to PDF/A or PDF/A to PDF in Java: Complete Guide

#java #pdftopdfa #pdfatopdf

When working with enterprise documents, government archives, or files that need long-term preservation, you've probably heard of "PDF/A." Many people know about PDF, but aren't familiar with PDF/A—what exactly is it? Why convert to it? And how do you implement it in Java?

Today, let's explore the topic and work through some practical code examples.

Understanding PDF/A

PDF/A (Portable Document Format/Archive) is an ISO-standardized subset of PDF designed for long-term preservation of electronic documents. Compared to regular PDF, it has these characteristics:

PDF/A Restrictions:

❌ No reliance on external resources: all fonts must be embedded in the file
❌ No JavaScript, audio, video, or other dynamic content
❌ No encryption (forbidden at every conformance level)
❌ No ambiguous colors: device-dependent colors require an embedded output intent (ICC profile)

PDF/A Advantages:

✅ Designed so documents render the same way decades later
✅ Self-contained, no external resource dependencies
✅ Complies with ISO standards (ISO 19005)
✅ Widely adopted by governments, courts, and archives

Common PDF/A Standards:

PDF/A-1 (2005): The earliest standard, based on PDF 1.4
PDF/A-2 (2011): Based on PDF 1.7; supports transparency, layers, and JPEG 2000 compression
PDF/A-3 (2012): Same as PDF/A-2, but allows embedding arbitrary file formats (XML, CSV, etc.)

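Each standard defines conformance levels A and B. PDF/A-2 and PDF/A-3 add a third, Level U (Level B plus Unicode-mapped text). This guide focuses on A and B: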

Level A (Accessible): Preserves structural information for accessibility
Level B (Basic): Guarantees consistent visual rendering only

Environment Setup

Maven Dependency

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>12.6.1</version>
    </dependency>
</dependencies>

Gradle Configuration

repositories {
    maven {
        url 'https://repo.e-iceblue.com/nexus/content/groups/public/'
    }
}

dependencies {
    implementation 'e-iceblue:spire.pdf:12.6.1'
}

1. Basic Conversion: PDF to Various PDF/A Formats

The most straightforward approach—using the PdfStandardsConverter class:

import com.spire.pdf.conversion.PdfStandardsConverter;

public class BasicPdfToPdfA {
    public static void main(String[] args) {
        // Create converter instance
        PdfStandardsConverter converter = new PdfStandardsConverter("sample.pdf");

        // Convert to different PDF/A levels
        converter.toPdfA1A("output/PdfA1A.pdf");   // PDF/A-1A
        converter.toPdfA1B("output/PdfA1B.pdf");   // PDF/A-1B
        converter.toPdfA2A("output/PdfA2A.pdf");   // PDF/A-2A
        converter.toPdfA2B("output/PdfA2B.pdf");   // PDF/A-2B
        converter.toPdfA3A("output/PdfA3A.pdf");   // PDF/A-3A
        converter.toPdfA3B("output/PdfA3B.pdf");   // PDF/A-3B

        System.out.println("Conversion complete!");
    }
}

That's it! One line of code completes each format conversion.

How to Choose a PDF/A Level

Format / Level	Use Case	Characteristics
PDF/A-1B	General archiving	Best compatibility, most conservative
PDF/A-2B	Modern documents	Supports transparency and layers
PDF/A-3B	Data embedding	Can embed XML, Excel, and other attachments
Level A	Accessibility needs	Preserves tag structure for assistive technologies
Level B	General purpose	Only guarantees visual consistency

Practical Recommendations:

Government/legal documents → PDF/A-1B (most restrictive feature set, best compatibility)
Enterprise internal archiving → PDF/A-2B (balance of compatibility and features)
Need to embed data → PDF/A-3B (highest flexibility)
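
The recommendations above can be sketched as a small decision helper. This is only an illustration of the decision order: the class `PdfALevelChooser` and its parameters are hypothetical names, not part of Spire.PDF, and the returned labels simply follow the "1B" / "2B" / "3B" naming used in this article.

```java
// Hypothetical helper (not part of Spire.PDF): maps archiving requirements
// to one of the PDF/A flavors discussed above.
final class PdfALevelChooser {

    /**
     * @param needsAttachments    true if arbitrary files (XML, Excel, ...) must be embedded
     * @param needsModernFeatures true if the document relies on transparency or layers
     * @param needsAccessibility  true if the tag structure must be preserved (Level A)
     * @return a flavor label such as "1B" or "3A"
     */
    static String choose(boolean needsAttachments,
                         boolean needsModernFeatures,
                         boolean needsAccessibility) {
        // Pick the part: attachments require PDF/A-3, transparency and layers need at least PDF/A-2
        String part = needsAttachments ? "3" : needsModernFeatures ? "2" : "1";
        // Pick the conformance level: A keeps structure, B guarantees appearance only
        String level = needsAccessibility ? "A" : "B";
        return part + level;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false, false)); // prints 1B
        System.out.println(choose(true, false, false));  // prints 3B
        System.out.println(choose(false, true, true));   // prints 2A
    }
}
```

Keep in mind that Level A is only meaningful when the source PDF is already tagged; for untagged sources, Level B is the realistic target.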

2. Handling Encrypted PDFs

If the source PDF is password-protected, pass the password to the converter so it can open the file:

import com.spire.pdf.conversion.PdfStandardsConverter;

public class EncryptedPdfToPdfA {
    public static void main(String[] args) {
        String inputFile = "data/encrypted.pdf";
        String password = "your_password";

        // Pass password when creating converter
        PdfStandardsConverter converter = new PdfStandardsConverter(inputFile, password);

        // Convert to PDF/A-2A
        converter.toPdfA2A("output/decrypted_pdfa.pdf");

        System.out.println("Encrypted PDF conversion complete!");
    }
}

Important Notes:

The converted PDF/A file is no longer encrypted, because PDF/A forbids encryption
Encrypting the output again makes it non-compliant, so if you need a protected copy, create it separately from the archived file

3. Preserving Metadata

By default, conversion might lose some metadata. To preserve it, configure as follows:

import com.spire.pdf.conversion.PdfStandardsConverter;

public class PdfToPdfAWithMetadata {
    public static void main(String[] args) {
        String input = "data/document_with_metadata.pdf";
        String output = "output/pdfa_with_metadata.pdf";

        PdfStandardsConverter converter = new PdfStandardsConverter(input);

        // Key setting: preserve allowed metadata
        converter.getOptions().setPreserveAllowedMetadata(true);

        // Execute conversion
        converter.toPdfA1A(output);

        System.out.println("Conversion complete, metadata preserved!");
    }
}

Which Metadata Is Preserved?

✅ Title, author, subject, keywords
✅ Creation date, modification date
✅ PDF/A compliance information
❌ Some custom properties (if they don't comply with PDF/A standards)

4. Converting PDF/A Back to Regular PDF

Sometimes you need the reverse operation—converting PDF/A back to regular PDF (e.g., to add interactive features):

import com.spire.pdf.*;
import com.spire.pdf.graphics.PdfMargins;
import java.awt.geom.Dimension2D;

public class PdfAToPdf {
    public static void main(String[] args) {
        String input = "data/sample_pdfa.pdf";
        String output = "output/regular_pdf.pdf";

        // Load PDF/A file
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile(input);

        // Create new document (non-PDF/A)
        PdfNewDocument newDoc = new PdfNewDocument();
        newDoc.setCompressionLevel(PdfCompressionLevel.None);

        // Copy content page by page
        for (PdfPageBase page : (Iterable<PdfPageBase>) doc.getPages()) {
            Dimension2D size = page.getSize();
            PdfPageBase p = newDoc.getPages().add(size, new PdfMargins(0));

            // Draw page content using template
            page.createTemplate().draw(p, 0, 0);
        }

        // Save as regular PDF
        newDoc.save(output);

        // Release resources
        newDoc.close();
        newDoc.dispose();
        doc.close();
        doc.dispose();

        System.out.println("PDF/A to PDF conversion complete!");
    }
}

Core Approach:

Load the PDF/A document
Create a new regular PDF document
Copy content page by page (via templates)
Save as new file

Use Cases:

Need to add JavaScript interactivity
Want to embed multimedia content
Remove PDF/A restrictions for editing

5. Creating PDF/A with Attachments

PDF/A-3 allows embedding arbitrary files as attachments, which is very useful in archiving scenarios:

import com.spire.pdf.*;
import com.spire.pdf.attachments.PdfAttachment;
import com.spire.pdf.graphics.PdfMargins;
import java.awt.geom.Dimension2D;
import java.io.*;

public class PdfAWithAttachments {
    public static void main(String[] args) throws IOException {
        String input = "data/report.pdf";
        String output = "output/report_with_attachments.pdf";

        // Load source PDF
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile(input);

        // Create PDF/A-3B document
        PdfNewDocument newDoc = new PdfNewDocument();
        newDoc.setConformance(PdfConformanceLevel.Pdf_A_3_B);

        // Copy page content
        for (PdfPageBase page : (Iterable<PdfPageBase>) doc.getPages()) {
            Dimension2D size = page.getSize();
            PdfPageBase p = newDoc.getPages().add(size, new PdfMargins(0));
            page.createTemplate().draw(p, 0, 0);
        }

        // Read attachment data
        byte[] excelData = readBytesFromFile("data/raw_data.xlsx");
        byte[] xmlData = readBytesFromFile("data/metadata.xml");

        // Create attachment objects
        PdfAttachment attach1 = new PdfAttachment("raw_data.xlsx", excelData);
        PdfAttachment attach2 = new PdfAttachment("metadata.xml", xmlData);

        // Add attachments
        newDoc.getAttachments().add(attach1);
        newDoc.getAttachments().add(attach2);

        // Save
        newDoc.save(output, FileFormat.PDF);

        // Release resources
        doc.close();
        doc.dispose();
        newDoc.close();
        newDoc.dispose();

        System.out.println("PDF/A-3B created with 2 attachments!");
    }

    private static byte[] readBytesFromFile(String filePath) throws IOException {
        // Files.readAllBytes reads the whole file and closes it, unlike a single
        // read() sized by available(), which may return only part of the data
        return java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(filePath));
    }
}

Typical Use Cases:

Financial reports + raw Excel data
Academic papers + research datasets
Contract documents + signing record XML
Technical documentation + source code packages

6. Practical Example: Batch Conversion Tool

In real projects, you often need to process files in batch. Here's a complete utility class:

import com.spire.pdf.conversion.PdfStandardsConverter;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BatchPdfToPdfAConverter {

    /**
     * Batch convert all PDFs in a folder to PDF/A
     * 
     * @param inputDir Input folder path
     * @param outputDir Output folder path
     * @param pdfALevel PDF/A level (e.g., "1B", "2B", "3B")
     */
    public static void batchConvert(String inputDir, String outputDir, String pdfALevel) {
        File dir = new File(inputDir);

        if (!dir.exists() || !dir.isDirectory()) {
            System.err.println("Error: Input directory does not exist - " + inputDir);
            return;
        }

        // Create output directory
        new File(outputDir).mkdirs();

        // Get all PDF files
        File[] pdfFiles = dir.listFiles((d, name) -> 
            name.toLowerCase().endsWith(".pdf") && !name.toLowerCase().contains("pdfa")
        );

        if (pdfFiles == null || pdfFiles.length == 0) {
            System.out.println("No PDF files found");
            return;
        }

        int successCount = 0;
        int failCount = 0;
        List<String> errors = new ArrayList<>();

        System.out.println("Starting batch conversion, total " + pdfFiles.length + " files...\n");

        for (File pdfFile : pdfFiles) {
            try {
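                // Strip the 4-character extension by length so ".PDF" and names
                // that contain ".pdf" elsewhere are handled correctly
                String name = pdfFile.getName();
                String outputFileName = name.substring(0, name.length() - 4) + "_PDFA-" + pdfALevel + ".pdf";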
                String outputPath = outputDir + File.separator + outputFileName;

                PdfStandardsConverter converter = new PdfStandardsConverter(pdfFile.getAbsolutePath());

                // Convert according to specified level
                switch (pdfALevel.toUpperCase()) {
                    case "1A":
                        converter.toPdfA1A(outputPath);
                        break;
                    case "1B":
                        converter.toPdfA1B(outputPath);
                        break;
                    case "2A":
                        converter.toPdfA2A(outputPath);
                        break;
                    case "2B":
                        converter.toPdfA2B(outputPath);
                        break;
                    case "3A":
                        converter.toPdfA3A(outputPath);
                        break;
                    case "3B":
                        converter.toPdfA3B(outputPath);
                        break;
                    default:
                        throw new IllegalArgumentException("Unsupported PDF/A level: " + pdfALevel);
                }

                successCount++;
                System.out.println("✓ " + pdfFile.getName() + " -> " + outputFileName);

            } catch (Exception e) {
                failCount++;
                String errorMsg = pdfFile.getName() + ": " + e.getMessage();
                errors.add(errorMsg);
                System.err.println("✗ " + errorMsg);
            }
        }

        // Output statistics
        System.out.println("\n========== Conversion Complete ==========");
        System.out.println("Successful: " + successCount);
        System.out.println("Failed: " + failCount);

        if (!errors.isEmpty()) {
            System.out.println("\nError Details:");
            for (String error : errors) {
                System.out.println("  - " + error);
            }
        }
    }

    public static void main(String[] args) {
        // Batch convert to PDF/A-2B
        batchConvert("input/pdfs", "output/pdfa", "2B");
    }
}

Features:

✅ Automatically scans all PDFs in folder
✅ Supports all PDF/A levels
✅ Detailed progress feedback and error reporting
✅ Skips already converted files (any file whose name contains "pdfa")
✅ Automatically creates output directory
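
The tool above only looks at the top level of the input folder. If your PDFs are nested in subfolders, a recursive scan could look like the sketch below. It uses only `java.nio.file`; the class name `PdfFinder` is hypothetical, and the filter mirrors the batch tool's rule of skipping names that contain "pdfa".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: recursively collects PDF files that still need conversion.
final class PdfFinder {

    static List<Path> findPdfs(Path root) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".pdf") && !name.contains("pdfa");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Scans the folder given as the first argument, or the current folder
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(root)) {
            System.out.println(pdf);
        }
    }
}
```

Each returned path can then be handed to the same per-file conversion logic used in `batchConvert`.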

7. Common Issues and Solutions

Issue 1: Conversion Fails Due to Font Problems

Cause: PDF uses fonts that are not embedded.

Solution:

// Spire.PDF automatically handles font embedding
// If still failing, check if source PDF is corrupted
PdfStandardsConverter converter = new PdfStandardsConverter(inputFile);
converter.getOptions().setDisableFontSubstitution(false); // Allow font substitution
converter.toPdfA1B(outputFile);

Issue 2: File Size Increases Dramatically After Conversion

Cause: PDF/A requires embedding all fonts and resources.

Optimization Suggestions:

// 1. Compress source PDF before conversion
// 2. Use more efficient compression algorithms
// 3. Remove unnecessary metadata

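// Large files also need more heap during conversion. Between batches,
// check how much heap the JVM can still allocate before continuing.
Runtime runtime = Runtime.getRuntime();
long usedMemory = runtime.totalMemory() - runtime.freeMemory();
long availableMemory = runtime.maxMemory() - usedMemory;
if (availableMemory < 100L * 1024 * 1024) { // Less than 100 MB
    System.gc(); // Only a hint; the JVM may ignore it
}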
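
Before tuning anything, it helps to measure how much a conversion actually grows the file. The sketch below compares the two sizes using only the standard library; `SizeReport` is a hypothetical name, and the paths passed to `main` are placeholders for a real source/output pair.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports how much larger the PDF/A output is than its source.
final class SizeReport {

    /** Returns output size divided by source size, e.g. 1.5 means 50% larger. */
    static double growthRatio(long sourceBytes, long outputBytes) {
        if (sourceBytes <= 0) {
            throw new IllegalArgumentException("source size must be positive");
        }
        return (double) outputBytes / sourceBytes;
    }

    static double growthRatio(Path source, Path output) throws IOException {
        return growthRatio(Files.size(source), Files.size(output));
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source PDF, args[1] = converted PDF/A
        double ratio = growthRatio(Paths.get(args[0]), Paths.get(args[1]));
        System.out.printf("Output is %.2fx the size of the source%n", ratio);
    }
}
```

Logging this ratio per file in a batch run makes it easy to spot the documents worth compressing before conversion.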

Issue 3: How to Verify Generated PDF/A Compliance?

Method 1: Use Validation Tools

veraPDF - Open-source PDF/A validator
Adobe Acrobat Pro - Built-in validation feature

Method 2: Programmatic Validation (Requires Additional Library)

// Apache PDFBox's Preflight module can validate PDF/A-1b (other levels are not covered)
// Or call a third-party API for validation
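
As a lightweight first check before running a real validator, you can read which flavor a file claims to be. PDF/A files declare this in their XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below looks for those markers with a regular expression. It is not validation, only a sanity check, and it assumes the XMP packet is stored uncompressed and in a single-byte-compatible encoding such as UTF-8, which is common but not guaranteed. `PdfAClaimReader` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the PDF/A flavor a file *claims* in its XMP metadata.
// A claim is not proof of compliance; use a validator such as veraPDF for that.
final class PdfAClaimReader {

    // XMP may store values as attributes (pdfaid:part="2") or elements (<pdfaid:part>2</pdfaid:part>)
    private static final Pattern PART =
            Pattern.compile("pdfaid:part\\s*(?:=\\s*[\"']|>)\\s*(\\d)");
    private static final Pattern CONFORMANCE =
            Pattern.compile("pdfaid:conformance\\s*(?:=\\s*[\"']|>)\\s*([ABUabu])");

    /** Returns a label such as "2B" if both markers are present, otherwise empty. */
    static Optional<String> claimedFlavor(String pdfText) {
        Matcher part = PART.matcher(pdfText);
        Matcher conformance = CONFORMANCE.matcher(pdfText);
        if (part.find() && conformance.find()) {
            return Optional.of(part.group(1) + conformance.group(1).toUpperCase());
        }
        return Optional.empty();
    }

    static Optional<String> claimedFlavor(Path file) throws IOException {
        // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        return claimedFlavor(text);
    }

    public static void main(String[] args) throws IOException {
        Optional<String> flavor = claimedFlavor(Paths.get(args[0]));
        System.out.println(flavor.map(f -> "Claims PDF/A-" + f).orElse("No PDF/A claim found"));
    }
}
```

If the claimed flavor does not match what you asked the converter for, that is a strong signal to inspect the file with veraPDF.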

Issue 4: Slow Conversion Speed

Optimization Strategies:

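// 1. Process multiple files in parallel (files and convertSingleFile are defined elsewhere)
ExecutorService executor = Executors.newFixedThreadPool(4);
for (File file : files) {
    executor.submit(() -> convertSingleFile(file));
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS); // Wait for queued conversions to finish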

// 2. Use SSD storage to improve I/O speed
// 3. Increase JVM heap memory: -Xmx4g
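
A fuller, self-contained version of the parallel approach might look like the sketch below. It waits for every task and collects failures instead of silently dropping them. `ParallelBatch` and `runAll` are hypothetical names, and the conversion itself is passed in as a lambda so the example runs without any PDF library. It assumes each task creates its own converter instance; whether the library is safe to call from several threads at once is something to confirm in its documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: runs a per-file conversion on a fixed thread pool.
final class ParallelBatch {

    /**
     * @param inputs    file paths to convert
     * @param converter the per-file work, supplied by the caller
     * @param threads   size of the thread pool
     * @return one error message per failed input (empty if all succeeded)
     */
    static List<String> runAll(List<String> inputs, Consumer<String> converter, int threads)
            throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<String> errors = new ArrayList<>();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(executor.submit(() -> converter.accept(input)));
            }
            // Future.get() blocks until the task ends and rethrows its exception
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    errors.add(inputs.get(i) + ": " + e.getCause().getMessage());
                }
            }
        } finally {
            executor.shutdown();
        }
        return errors;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in conversion; replace the lambda body with the real per-file call
        List<String> errors = runAll(
                Arrays.asList("a.pdf", "b.pdf"),
                path -> System.out.println("converting " + path),
                4);
        System.out.println("Failed: " + errors.size());
    }
}
```

Start with a small pool size and increase it only if CPU and memory allow, since each conversion holds a whole document in memory.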

8. Best Practices Summary

1. Choose the Right PDF/A Level

Legal/Government Documents → PDF/A-1B (most restrictive feature set, best compatibility)
Enterprise Internal Archiving → PDF/A-2B (balance of features and compatibility)
Research Data Archiving → PDF/A-3B (can embed datasets)
Accessibility Requirements → Level A series (preserves structural information)

2. Resource Management

// Always release resources in finally block
PdfStandardsConverter converter = null;
try {
    converter = new PdfStandardsConverter(inputFile);
    converter.toPdfA1B(outputFile);
} finally {
    if (converter != null) {
        converter.dispose();
    }
}
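
If you prefer try-with-resources over an explicit finally block, one option is to wrap the cleanup call in a tiny `AutoCloseable`. The sketch below shows the pattern with a stand-in class, since it makes no assumption about which library types implement `AutoCloseable`. `DisposeGuard`, `Cleanup`, and `FakeConverter` are all hypothetical names for illustration.

```java
// Hypothetical sketch: wraps any dispose()-style cleanup in try-with-resources.
final class DisposeGuard {

    /** AutoCloseable whose close() throws no checked exception. */
    interface Cleanup extends AutoCloseable {
        @Override
        void close();
    }

    /** Stand-in for a library object with a dispose() method (not a real Spire.PDF class). */
    static final class FakeConverter {
        boolean disposed;

        void convert(boolean fail) {
            if (fail) {
                throw new IllegalStateException("conversion failed");
            }
        }

        void dispose() {
            disposed = true;
        }
    }

    static void convertSafely(FakeConverter converter, boolean fail) {
        // The method reference becomes the resource; close() runs even if convert() throws
        try (Cleanup cleanup = converter::dispose) {
            converter.convert(fail);
        }
    }

    public static void main(String[] args) {
        FakeConverter converter = new FakeConverter();
        convertSafely(converter, false);
        System.out.println("disposed: " + converter.disposed); // prints disposed: true
    }
}
```

The behavior is the same as the finally block above; the benefit is that the cleanup is declared next to the resource and cannot be forgotten on an early return.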

3. Error Handling

try {
    converter.toPdfA1B(outputFile);
} catch (Exception e) {
    // Log detailed error information
    logger.error("PDF conversion failed: " + inputFile, e);

    // Provide user-friendly messages (getMessage() may return null)
    String message = e.getMessage() == null ? "" : e.getMessage().toLowerCase();
    if (message.contains("font")) {
        System.err.println("Font issue, please check if source PDF uses special fonts");
    } else if (message.contains("corrupt")) {
        System.err.println("File corrupted, please regenerate source PDF");
    }
}

4. Performance Monitoring

long startTime = System.currentTimeMillis();

// Execute conversion
converter.toPdfA1B(outputFile);

long endTime = System.currentTimeMillis();
System.out.println("Conversion time: " + (endTime - startTime) + " ms");

// Monitor memory usage
Runtime runtime = Runtime.getRuntime();
long usedMemory = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
System.out.println("Memory usage: " + usedMemory + " MB");

9. Comparison with Alternative Solutions

Spire.PDF vs Apache PDFBox

Feature	Spire.PDF	Apache PDFBox
API Simplicity	✅ One-line conversion	⚠️ Requires multi-step operations
PDF/A Support	✅ Built-in conversion to levels 1A through 3B	⚠️ No one-step converter; Preflight validates PDF/A-1b
Learning Curve	✅ Low	⚠️ Moderate
License	Commercial (free tier available)	Apache 2.0 (free)
Chinese Language Support	✅ Excellent	⚠️ Requires extra configuration
Technical Support	✅ Official support	Community support

Selection Advice:

Sufficient budget, need rapid development → Spire.PDF
Open-source project, limited budget → Apache PDFBox
Need enterprise-level support → Spire.PDF

Conclusion

Converting PDF to PDF/A is common in real-world projects, especially in scenarios requiring long-term archiving. With Spire.PDF for Java, the entire process becomes quite simple:

Key Takeaways:

✅ Use PdfStandardsConverter for conversion
✅ Choose appropriate PDF/A level based on requirements
✅ Pay attention to resource cleanup (call dispose())
✅ Handle encrypted files and metadata preservation
✅ Implement proper error handling and logging for batch processing

Practical Application Recommendations:

Test with small samples first to confirm conversion quality
Establish automated processes for regular document archiving
Keep both original PDF and converted PDF/A
Periodically verify PDF/A file compliance

Hope this article helps you better understand and implement PDF to PDF/A conversion. If you have specific questions, feel free to discuss in the comments!

Happy coding! 🚀

DEV Community