IronSoftware

Posted on Dec 18

PDF Metadata in C# (.NET 10)

#c #dotnet

PDF metadata — Title, Author, Subject, Keywords, creation dates — enables document management systems to index, search, and organize files automatically. Without metadata, your PDF is a black box. With it, systems can route invoices to accounting, filter contracts by client, search thousands of technical documents by keyword, and enforce retention policies based on creation dates. I've built document workflows processing millions of PDFs where metadata made the difference between manual sorting and automated classification.

The problem is that many PDF libraries treat metadata as an afterthought. Stack Overflow has threads from 2012 recommending PDFSharp's PdfDocument.Info.Elements.Add() approach — functional but verbose, requiring manual key-value manipulation for every property. iTextSharp requires instantiating PdfReader, PdfStamper, and managing streams just to set a Title field. These Stack Overflow answers still rank highly despite being a decade old, locked by moderators, and promoting unnecessarily complex approaches for what should be simple property assignments.

IronPDF treats metadata as first-class properties. Access pdf.MetaData.Title, pdf.MetaData.Author, pdf.MetaData.Keywords directly. Set them like any C# property. Custom metadata uses a dictionary interface. Bulk operations use SetMetaDataDictionary(). This eliminates the boilerplate required by older libraries while supporting both standard PDF properties and custom extensions.

Understanding metadata structure prevents common mistakes. Standard metadata fields (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModifiedDate) are defined in the PDF specification and visible in Adobe Reader, Preview, and other PDF viewers. Custom metadata extends beyond these standard fields — useful for internal tracking (Department, ProjectID, ApprovalStatus) but not always visible in standard viewers. XMP metadata provides structured, extensible metadata using XML, required for certain compliance scenarios like PDF/A or archival systems.

using IronPdf;
// Install via NuGet: Install-Package IronPdf

var renderer = new ChromePdfRenderer();
var pdf = renderer.RenderHtmlAsPdf("<h1>Invoice #12345</h1>");

pdf.MetaData.Title = "Invoice 12345 - Acme Corp";
pdf.MetaData.Author = "Accounts Receivable";
pdf.MetaData.Subject = "Monthly Invoice";
pdf.MetaData.Keywords = "invoice, billing, acme, march-2025";
pdf.MetaData.CreationDate = DateTime.UtcNow;

pdf.SaveAs("invoice-with-metadata.pdf");

This sets standard metadata fields that appear in PDF viewer properties dialogs and that document management systems can index for search. The Keywords field is particularly useful: many DMS platforms index keywords for fast retrieval without parsing the entire document content.
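Because Keywords is a single comma-separated string, it helps to build it consistently. The following is a minimal sketch in plain .NET; the BuildKeywords helper is hypothetical and not part of IronPDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds a clean, comma-separated Keywords string.
// Trims whitespace, lowercases, and drops empty entries and duplicates.
static string BuildKeywords(IEnumerable<string> terms) =>
    string.Join(", ",
        terms.Where(t => !string.IsNullOrWhiteSpace(t))
             .Select(t => t.Trim().ToLowerInvariant())
             .Distinct());

var keywords = BuildKeywords(new[] { "Invoice", " billing ", "ACME", "invoice", "" });
Console.WriteLine(keywords); // invoice, billing, acme

// The result could then be assigned as in the example above:
// pdf.MetaData.Keywords = keywords;
```

Lowercasing is a design choice here; skip it if your DMS treats keyword case as significant.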

What NuGet Packages Do I Need?

Install IronPDF via NuGet Package Manager Console:

Install-Package IronPdf

Or via .NET CLI:

dotnet add package IronPdf

IronPDF includes metadata handling in the core library. No additional packages needed for standard or custom metadata. XMP support is built-in.

How Do I Read PDF Metadata in C#?

Reading metadata from existing PDFs retrieves properties without modifying the file. Useful for indexing systems, compliance audits, or migration scripts analyzing thousands of PDFs.

var existingPdf = PdfDocument.FromFile("document.pdf");

Console.WriteLine($"Title: {existingPdf.MetaData.Title}");
Console.WriteLine($"Author: {existingPdf.MetaData.Author}");
Console.WriteLine($"Subject: {existingPdf.MetaData.Subject}");
Console.WriteLine($"Keywords: {existingPdf.MetaData.Keywords}");
Console.WriteLine($"Creator: {existingPdf.MetaData.Creator}");
Console.WriteLine($"Producer: {existingPdf.MetaData.Producer}");
Console.WriteLine($"Created: {existingPdf.MetaData.CreationDate}");
Console.WriteLine($"Modified: {existingPdf.MetaData.ModifiedDate}");

Properties return null or default values if metadata fields aren't set. Always null-check when reading metadata from PDFs created by external systems — not all PDF generators populate metadata consistently.
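One way to make the null-check routine is a small normalizing helper. This is a sketch in plain .NET; OrPlaceholder is a hypothetical name, and the sample values stand in for what you would read from existingPdf.MetaData.

```csharp
using System;

// Hypothetical helper: returns a placeholder when a metadata value is
// null, empty, or whitespace, so downstream code never sees a null.
static string OrPlaceholder(string? value, string placeholder = "(not set)") =>
    string.IsNullOrWhiteSpace(value) ? placeholder : value.Trim();

// Stand-in values; in practice these would come from
// existingPdf.MetaData.Title, existingPdf.MetaData.Author, and so on.
string? title = null;
string? author = "  Finance Department ";

Console.WriteLine($"Title: {OrPlaceholder(title)}");   // Title: (not set)
Console.WriteLine($"Author: {OrPlaceholder(author)}"); // Author: Finance Department
```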

I've audited document repositories where 40% of PDFs had no Title, 60% had no Keywords, and some had Author set to a software name ("Microsoft Word") rather than the actual author. Metadata quality varies dramatically depending on how PDFs were generated.
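An audit like this can be tallied with a few lines of LINQ once the fields have been collected. The sketch below uses made-up sample rows and a hypothetical PercentMissing helper; in a real scan each row would be filled from pdf.MetaData on a loaded file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample rows standing in for fields collected during a repository scan.
var scanned = new List<(string File, string? Title, string? Keywords)>
{
    ("a.pdf", "Q1 Report", "finance, quarterly"),
    ("b.pdf", null, ""),
    ("c.pdf", "Contract", null),
    ("d.pdf", "", null),
};

// Percentage of entries where the selected field is null, empty, or whitespace.
static double PercentMissing<T>(IReadOnlyCollection<T> items, Func<T, string?> field) =>
    items.Count == 0
        ? 0
        : 100.0 * items.Count(i => string.IsNullOrWhiteSpace(field(i))) / items.Count;

Console.WriteLine($"Missing Title: {PercentMissing(scanned, s => s.Title):F0}%");       // Missing Title: 50%
Console.WriteLine($"Missing Keywords: {PercentMissing(scanned, s => s.Keywords):F0}%"); // Missing Keywords: 75%
```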

How Do I Set or Modify PDF Metadata?

Setting metadata works identically for newly generated PDFs or existing files. Load the PDF, modify properties, save.

var pdf = PdfDocument.FromFile("report.pdf");

pdf.MetaData.Title = "Q1 2025 Financial Report";
pdf.MetaData.Author = "Finance Department";
pdf.MetaData.Subject = "Quarterly Financial Analysis";
pdf.MetaData.Keywords = "finance, quarterly, 2025, revenue, expenses";
pdf.MetaData.ModifiedDate = DateTime.UtcNow;

pdf.SaveAs("report-with-metadata.pdf");

The CreationDate typically shouldn't be modified when updating existing PDFs — it represents original creation. Set ModifiedDate instead to track updates.

For batch processing where you're applying consistent metadata to multiple PDFs:

var files = Directory.GetFiles("invoices/", "*.pdf");

foreach (var file in files)
{
    var pdf = PdfDocument.FromFile(file);
    pdf.MetaData.Author = "Billing System";
    pdf.MetaData.Creator = "InvoiceGenerator v2.0";
    pdf.MetaData.Keywords = "invoice, billing, automated";
    pdf.SaveAs(file); // Overwrite with metadata
}

This pattern is common in migration scenarios where legacy PDFs lack metadata and need bulk updating for DMS import.

What's the Difference Between Standard Properties and Custom Properties?

Standard metadata properties are defined in the PDF specification (ISO 32000). These eight fields appear in the document properties dialog of most PDF viewers:

Title: Document title (not filename)
Author: Creator's name or organization
Subject: Document topic or description
Keywords: Comma-separated search terms
Creator: Application that created the original document
Producer: PDF library or converter used
CreationDate: When the document was created
ModifiedDate: When it was last modified

Custom metadata properties extend beyond these standards. Use custom properties for internal tracking, workflow states, business logic — anything beyond standard fields.

pdf.MetaData.CustomProperties.Add("Department", "Engineering");
pdf.MetaData.CustomProperties.Add("ProjectID", "PROJ-2025-042");
pdf.MetaData.CustomProperties.Add("ApprovalStatus", "Pending");
pdf.MetaData.CustomProperties.Add("RetentionYears", "7");

Important limitation: Custom properties may not be visible in Adobe Reader or Preview. They're embedded in the PDF and accessible programmatically but don't appear in standard Properties dialogs. Document management systems can read custom properties, but end users typically can't see them without specialized tools.

I've used custom metadata to track approval workflows in contract management systems. PDFs flow through legal review, executive approval, and archival; each stage updates custom metadata fields that trigger routing logic without modifying document content.
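The routing step in such a workflow can be as simple as a switch over one custom property. This is an illustrative sketch only: the status values, queue names, and the RouteFor helper are hypothetical, and the dictionary stands in for values read from pdf.MetaData.CustomProperties.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing rule keyed on a custom "ApprovalStatus" property.
static string RouteFor(IReadOnlyDictionary<string, string> customProps)
{
    if (!customProps.TryGetValue("ApprovalStatus", out var status))
        return "triage"; // no status recorded yet

    return status switch
    {
        "Pending"  => "legal-review",
        "Reviewed" => "executive-approval",
        "Approved" => "archive",
        _          => "triage", // unknown status: send to a human
    };
}

var props = new Dictionary<string, string> { ["ApprovalStatus"] = "Pending" };
Console.WriteLine(RouteFor(props)); // legal-review
```

Falling back to a manual queue for missing or unknown statuses keeps a typo in the metadata from silently misrouting a contract.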

How Do I Add Custom Metadata Properties?

Custom properties use a dictionary-style interface:

var customProps = pdf.MetaData.CustomProperties;

// Add new custom property
customProps.Add("ReviewedBy", "J. Smith");
customProps.Add("ReviewDate", "2025-01-26");
customProps.Add("ConfidentialityLevel", "Internal");

// Edit existing custom property
customProps["ReviewedBy"] = "J. Smith, A. Jones";

// Check if property exists
if (customProps.ContainsKey("ReviewedBy"))
{
    Console.WriteLine($"Reviewed by: {customProps["ReviewedBy"]}");
}

Remove custom properties using RemoveMetaDataKey() or CustomProperties.Remove():

pdf.MetaData.RemoveMetaDataKey("TemporaryFlag");
pdf.MetaData.CustomProperties.Remove("ObsoleteProperty");

Custom metadata is particularly useful for compliance tracking. Financial documents requiring 7-year retention can store RetentionUntil as custom metadata. Automated archival systems read this property to determine when deletion is permitted.
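Since custom property values are strings, the date needs a fixed, culture-independent format. The sketch below shows one way to compute and later check a RetentionUntil value in plain .NET; the helper names and the yyyy-MM-dd format are assumptions, not something IronPDF prescribes.

```csharp
using System;
using System.Globalization;

const string DateFormat = "yyyy-MM-dd";

// Computes the string to store in a custom "RetentionUntil" property.
string RetentionUntil(DateTime createdUtc, int years) =>
    createdUtc.AddYears(years).ToString(DateFormat, CultureInfo.InvariantCulture);

// Returns true only when the stored date parses and has already passed.
// A missing or unparseable value is treated as "keep the document".
bool DeletionPermitted(string? retentionUntil, DateTime nowUtc) =>
    DateTime.TryParseExact(retentionUntil, DateFormat, CultureInfo.InvariantCulture,
        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out var until)
    && nowUtc.Date > until.Date;

var created = new DateTime(2025, 3, 1, 0, 0, 0, DateTimeKind.Utc);
string stored = RetentionUntil(created, 7);

Console.WriteLine(stored); // 2032-03-01
Console.WriteLine(DeletionPermitted(stored, new DateTime(2030, 1, 1, 0, 0, 0, DateTimeKind.Utc))); // False
Console.WriteLine(DeletionPermitted(stored, new DateTime(2032, 3, 2, 0, 0, 0, DateTimeKind.Utc))); // True

// The value could be stored as shown earlier:
// pdf.MetaData.CustomProperties["RetentionUntil"] = stored;
```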

What is XMP Metadata and When Should I Use It?

XMP (Extensible Metadata Platform) is Adobe's XML-based metadata standard embedded in PDFs. While standard PDF metadata uses simple key-value pairs, XMP stores structured, extensible metadata supporting namespaces, nested properties, and standardized schemas.

XMP is required for:

PDF/A compliance (archival standard)
Digital Asset Management (DAM) systems expecting XMP
Adobe Creative Cloud workflows (Photoshop, Illustrator embed XMP)
Dublin Core metadata (library and archival standards)

For most business documents — invoices, reports, contracts — standard PDF metadata suffices. XMP adds complexity without significant benefit unless you're integrating with systems explicitly requiring it or need PDF/A compliance.

IronPDF handles XMP automatically when generating PDF/A documents:

renderer.RenderingOptions.PdfACompliant = true;
var pdf = renderer.RenderHtmlAsPdf(html);

This embeds required XMP metadata for PDF/A-2 compliance without manual configuration. For standard PDFs, use the simpler MetaData properties.

How Do I Work with Metadata Dictionaries?

For bulk metadata operations, use GetMetaDataDictionary() and SetMetaDataDictionary():

// Retrieve all metadata as dictionary
var allMetadata = pdf.MetaData.GetMetaDataDictionary();

foreach (var kvp in allMetadata)
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value}");
}

// Set metadata in bulk
var newMetadata = new Dictionary<string, string>
{
    { "Title", "Bulk Import Document" },
    { "Author", "Migration Script" },
    { "Keywords", "imported, legacy, 2025" },
    { "CustomField1", "Value1" }
};

pdf.MetaData.SetMetaDataDictionary(newMetadata);

This approach is efficient when copying metadata between PDFs or applying templates. I've used dictionary operations in migration scripts that transferred metadata from legacy document systems to new DMS platforms, reading metadata from JSON config files and applying it to thousands of PDFs in batch jobs.
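Loading a metadata template from JSON takes only System.Text.Json. This sketch assumes a flat JSON object of string values; the inline JSON is a stand-in for a config file you would read from disk.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in config content; in a real job this would be read from a file,
// for example with File.ReadAllText on a path of your choosing.
string json = """
{
  "Title": "Bulk Import Document",
  "Author": "Migration Script",
  "Keywords": "imported, legacy, 2025",
  "CustomField1": "Value1"
}
""";

// Deserialize the flat JSON object into the dictionary shape used above.
var template = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
               ?? new Dictionary<string, string>();

foreach (var (key, value) in template)
    Console.WriteLine($"{key}: {value}");

// The dictionary could then be applied as shown earlier:
// pdf.MetaData.SetMetaDataDictionary(template);
```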

Note that SetMetaDataDictionary() sets all specified fields. Non-standard keys (not in the eight standard properties) become custom metadata. This provides a unified interface for both standard and custom properties.
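Before a bulk write, it can be useful to see which keys fall outside the standard set. The sketch below partitions a dictionary using the eight property names as listed in this article; whether the library uses exactly these strings as its internal dictionary keys is an assumption you should verify against your own GetMetaDataDictionary() output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The eight standard property names as listed in this article (assumed key spelling).
var standardKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModifiedDate",
};

var incoming = new Dictionary<string, string>
{
    ["Title"] = "Bulk Import Document",
    ["Author"] = "Migration Script",
    ["CustomField1"] = "Value1",
};

// Keys outside the standard set are the ones expected to land in custom metadata.
var customKeys = incoming.Keys.Where(k => !standardKeys.Contains(k)).ToList();

Console.WriteLine($"Custom keys: {string.Join(", ", customKeys)}"); // Custom keys: CustomField1
```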

Does IronPDF Support Batch Metadata Operations?

Yes. Process hundreds or thousands of PDFs efficiently:

var pdfFiles = Directory.GetFiles(@"C:\documents\archive", "*.pdf", SearchOption.AllDirectories);

foreach (var filePath in pdfFiles)
{
    try
    {
        var pdf = PdfDocument.FromFile(filePath);

        // Apply consistent metadata
        pdf.MetaData.Creator = "ArchivalSystem v3.0";
        pdf.MetaData.Keywords = "archived, legacy, imported-2025";
        pdf.MetaData.CustomProperties["ArchiveDate"] = DateTime.UtcNow.ToString("yyyy-MM-dd");

        pdf.SaveAs(filePath); // Overwrite with updated metadata
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Failed to process {filePath}: {ex.Message}");
    }
}

For large-scale operations (10,000+ PDFs), consider parallel processing:

Parallel.ForEach(pdfFiles, new ParallelOptions { MaxDegreeOfParallelism = 4 }, filePath =>
{
    var pdf = PdfDocument.FromFile(filePath);
    pdf.MetaData.Author = "Bulk Processor";
    pdf.SaveAs(filePath);
});

This caps the batch at four concurrent operations, which can significantly reduce processing time for large batches on multi-core machines. I've processed 50,000-PDF archives in under 2 hours using parallel metadata updates.
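The parallel loop above has no error handling, so a single unreadable file surfaces as an AggregateException for the whole batch. A minimal sketch of one alternative follows: a hypothetical ProcessAll wrapper that collects failures per file. The processFile delegate is a stand-in for the load, modify, and save step.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Runs an action over each path with bounded parallelism and collects failures
// instead of letting one bad file abort the batch.
static IReadOnlyCollection<(string Path, string Error)> ProcessAll(
    IEnumerable<string> paths, Action<string> processFile, int maxParallel = 4)
{
    var failures = new ConcurrentBag<(string Path, string Error)>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

    Parallel.ForEach(paths, options, path =>
    {
        try { processFile(path); }
        catch (Exception ex) { failures.Add((path, ex.Message)); }
    });

    return failures;
}

// Stand-in for the real per-file work; it throws for one path to show
// how failures are reported.
var failed = ProcessAll(
    new[] { "a.pdf", "b.pdf", "corrupt.pdf" },
    p => { if (p.StartsWith("corrupt")) throw new InvalidOperationException("cannot open"); });

foreach (var (file, error) in failed)
    Console.WriteLine($"Failed to process {file}: {error}");
// prints: Failed to process corrupt.pdf: cannot open
```

In a real job the delegate would wrap the PdfDocument.FromFile and SaveAs calls shown above, and the failure list could be written to a log for a retry pass.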

What Common Issues Should I Watch For?

Write-protected PDFs: If a PDF has security restrictions preventing modification, metadata changes fail. Check pdf.SecuritySettings before attempting updates. Remove restrictions (if you have owner password) or work with a decrypted copy.

Encoding issues: Metadata fields are text. Special characters (accented letters, non-Latin scripts) must be stored as Unicode; in the PDF format, document information strings use PDFDocEncoding or UTF-16, while XMP packets are typically UTF-8. IronPDF handles this automatically, but be aware that very old PDF viewers (pre-2010) may not display Unicode metadata correctly.

Producer field confusion: Producer is typically the PDF library name (e.g., "IronPDF 2025"), while Creator is the application or system that generated the document (e.g., "Invoice System v2"). Don't confuse these — they serve different purposes. Some compliance systems check Producer to verify PDFs were generated by approved tools.

Metadata leaking sensitive information: Metadata can expose file paths, usernames, system information. Review metadata before distributing PDFs externally. For sensitive documents, consider clearing metadata or setting it to generic values:

pdf.MetaData.Creator = "Company Document System";
pdf.MetaData.Producer = "PDF Generator";
// Don't include usernames, internal project codes, file paths

I've seen PDFs with metadata containing Creator: C:\Users\john.smith\Projects\SecretProjectX\document.docx — leaking username, project name, and file structure. Sanitize metadata for external distribution.
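A pre-distribution check can flag values that look like file paths. The sketch below is a heuristic in plain .NET, not a complete sanitizer; the patterns and the FindLeakyKeys helper are illustrative, and the dictionary stands in for one you might obtain from GetMetaDataDictionary().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Heuristic patterns for values that should not leave the organization.
var windowsPath = new Regex(@"^[A-Za-z]:\\|^\\\\"); // C:\... or \\server\...
var unixHome = new Regex(@"^/(home|Users)/");       // /home/... or /Users/...

// Returns the keys whose values look like local or network file paths.
List<string> FindLeakyKeys(IReadOnlyDictionary<string, string> values) =>
    values.Where(kv => kv.Value is not null
                       && (windowsPath.IsMatch(kv.Value) || unixHome.IsMatch(kv.Value)))
          .Select(kv => kv.Key)
          .ToList();

var sample = new Dictionary<string, string>
{
    ["Title"] = "Proposal",
    ["Creator"] = @"C:\Users\john.smith\Projects\SecretProjectX\document.docx",
};

foreach (var key in FindLeakyKeys(sample))
    Console.WriteLine($"Review before distribution: {key}"); // Review before distribution: Creator
```

Path detection catches only one class of leak; usernames and internal project codes embedded in ordinary text still need a manual or organization-specific check.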

Quick Reference

Metadata Property	Purpose	Visibility in PDF Viewers
Title	Document title	✓ Visible
Author	Creator name	✓ Visible
Subject	Document description	✓ Visible
Keywords	Search terms	✓ Visible
Creator	Originating application	✓ Visible
Producer	PDF library used	✓ Visible
CreationDate	When created	✓ Visible
ModifiedDate	Last modified	✓ Visible
CustomProperties	Extended metadata	✗ Not visible in standard viewers

Key Principles:

Use standard metadata for document properties visible to users (Title, Author, Keywords)
Use custom metadata for internal tracking, workflow states, business logic
XMP metadata is for PDF/A compliance and specialized DAM systems — not needed for typical business documents
Batch operations with SetMetaDataDictionary() are efficient for bulk processing
Always sanitize metadata before external distribution to avoid leaking internal information
Stack Overflow recommendations for PDFSharp's Info.Elements approach are outdated — use direct property access instead

The complete PDF metadata guide includes examples for XMP manipulation and advanced custom property scenarios.

Written by Jacob Mellor, CTO at Iron Software. Jacob created IronPDF and leads a team of 50+ engineers building .NET document processing libraries.

DEV Community