DEV Community

Shaishav Patel


Reading and Editing PDF Metadata in the Browser with pdf-lib — Title, Author, Keywords, and Modification Date

Most PDF files carry metadata alongside their visible content: a title, author name, subject, keywords, creation date, and modification date. This data lives in the PDF's document information dictionary and stays invisible to the user unless they know where to look.

The PDF Metadata Viewer at Ultimate Tools reads all of these fields as soon as you drop in a file, lets you edit them, and saves the result back, entirely client-side with pdf-lib.


What PDF Metadata Fields Exist?

The PDF specification defines these standard document information dictionary entries:

| Info dictionary key | pdf-lib getter | pdf-lib setter |
|---|---|---|
| Title | getTitle() | setTitle() |
| Author | getAuthor() | setAuthor() |
| Subject | getSubject() | setSubject() |
| Keywords | getKeywords() | setKeywords() |
| Creator | getCreator() | setCreator() |
| Producer | getProducer() | setProducer() |
| CreationDate | getCreationDate() | setCreationDate() |
| ModDate | getModificationDate() | setModificationDate() |

PageCount is not a metadata field — it's structural — but it's useful to show alongside metadata so users understand what they're working with.


Reading Metadata on Drop

import { PDFDocument } from 'pdf-lib';

interface PdfMetadata {
  title: string;
  author: string;
  subject: string;
  keywords: string;
  creator: string;
  producer: string;
  creationDate: string;
  modificationDate: string;
  pageCount: number;
}

const loadMetadata = async (file: File) => {
  const arrayBuffer = await file.arrayBuffer();
  // updateMetadata: false stops pdf-lib from rewriting Producer/ModDate
  // (and filling in missing Creator/CreationDate) as a side effect of loading
  const pdfDoc = await PDFDocument.load(arrayBuffer, { updateMetadata: false });

  // Getters return undefined for absent keys; fall back to '' for the form
  setMetadata({
    title:            pdfDoc.getTitle() || '',
    author:           pdfDoc.getAuthor() || '',
    subject:          pdfDoc.getSubject() || '',
    keywords:         pdfDoc.getKeywords() || '',
    creator:          pdfDoc.getCreator() || '',
    producer:         pdfDoc.getProducer() || '',
    creationDate:     pdfDoc.getCreationDate()?.toLocaleString() || 'Unknown',
    modificationDate: pdfDoc.getModificationDate()?.toLocaleString() || 'Unknown',
    pageCount:        pdfDoc.getPageCount(),
  });
};

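The || '' fallbacks handle missing or empty metadata fields. These are common, especially in PDFs generated by older tools or exported from Word without metadata.

One pdf-lib default matters for a viewer: PDFDocument.load() runs with updateMetadata: true unless told otherwise. In that mode it sets Producer to pdf-lib's own name and ModDate to the current time, and it fills in Creator and CreationDate when they are missing. All of this happens before your getters run. Passing { updateMetadata: false } to load() makes the getters report what is actually in the file.

getCreationDate() returns a JavaScript Date object (or undefined). Calling .toLocaleString() on it converts it to a human-readable string in the user's local timezone.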
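pdf-lib parses the raw value for you, but it helps to know what is stored on disk. The PDF specification (ISO 32000-1, section 7.9.4) defines dates as strings of the form D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional. The sketch below is not pdf-lib's code. parsePdfDate and formatPdfDate are hypothetical helpers. Treating a missing offset as UTC is an assumption, because the spec says the time zone is unknown in that case.

```typescript
// Hypothetical helper: parse a PDF date string into a Date.
// Format: D:YYYYMMDDHHmmSSOHH'mm', where O is 'Z', '+' or '-'.
// Assumption: a string with no offset is interpreted as UTC.
function parsePdfDate(raw: string): Date | undefined {
  const match =
    /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$/.exec(
      raw.trim(),
    );
  if (!match) return undefined;

  // Unmatched optional groups are undefined, so the defaults fill them in
  const [, year, month = '01', day = '01', hour = '00', minute = '00', second = '00',
    sign, offHours = '00', offMinutes = '00'] = match;

  // Treat the wall-clock fields as if they were UTC...
  const asUtc = Date.UTC(Number(year), Number(month) - 1, Number(day),
    Number(hour), Number(minute), Number(second));

  // ...then subtract the offset: 10:30 at +05'30' is 05:00 UTC
  const direction = sign === '+' ? 1 : sign === '-' ? -1 : 0;
  const offsetTotal = direction * (Number(offHours) * 60 + Number(offMinutes));
  return new Date(asUtc - offsetTotal * 60000);
}

// Hypothetical helper: write a Date as a PDF date string in UTC ('Z' suffix).
function formatPdfDate(date: Date): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, '0');
  return (
    `D:${pad(date.getUTCFullYear(), 4)}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}` +
    `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}Z`
  );
}

console.log(parsePdfDate("D:20240115103000+05'30'")?.toISOString()); // 2024-01-15T05:00:00.000Z
```

Writing in UTC with a Z suffix sidesteps the offset question entirely, which is why the formatter does not try to reproduce the user's local offset.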


Displaying Editable Fields

The component uses a controlled textarea for each editable field:

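// Only these string fields are editable; the rest are display-only
type EditableField = 'title' | 'author' | 'subject' | 'keywords' | 'creator';

const handleInputChange = (field: EditableField, value: string) => {
  // Functional update so rapid keystrokes never work from a stale snapshot
  setMetadata((prev) => (prev ? { ...prev, [field]: value } : prev));
};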

creationDate, modificationDate, producer, and pageCount are read-only in the UI — these are informational fields. The user can only edit title, author, subject, keywords, and creator.


Writing Metadata Back to the PDF

const savePdf = async () => {
  if (!file || !metadata) return;

  const arrayBuffer = await file.arrayBuffer();
  // Skip pdf-lib's automatic Producer/ModDate/CreationDate rewrite on load
  const pdfDoc = await PDFDocument.load(arrayBuffer, { updateMetadata: false });

  pdfDoc.setTitle(metadata.title);
  pdfDoc.setAuthor(metadata.author);
  pdfDoc.setSubject(metadata.subject);
  // Keywords is one text string in the PDF. setKeywords() joins its array
  // with spaces, so pass the comma-separated string as a single element
  pdfDoc.setKeywords([metadata.keywords.trim()]);
  pdfDoc.setCreator(metadata.creator);
  pdfDoc.setProducer(metadata.producer);

  // Always update modification date on save
  pdfDoc.setModificationDate(new Date());

  const pdfBytes = await pdfDoc.save();
  const blob = new Blob([pdfBytes], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);
  setNewPdfUrl(url);
};

A few things worth noting:

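Keywords are a single string, not an array. The PDF specification defines Keywords as one text string. pdf-lib's setKeywords() does take a string[], but it joins the elements with spaces before writing them. Splitting "finance, quarterly, report" on , would therefore store "finance  quarterly  report": the commas disappear and the spaces double up. Passing the whole string as a one-element array stores exactly what the user typed, and getKeywords() returns it unchanged on the next load.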

setModificationDate(new Date()) — always updating the modification date on save is the correct behavior. Any tool that writes to a PDF should update the modification date to reflect when the document was last changed.

The original file is not mutated. PDFDocument.load() parses the bytes read by file.arrayBuffer() into a separate in-memory document, so the user's file on disk is never touched. The edited PDF exists only as a new Blob until the user downloads it.
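The keyword behaviour is easy to check with a small dependency-free sketch. storedBySetKeywords is a stand-in that mirrors what pdf-lib's setKeywords() writes (the array joined with single spaces); it is not imported from the library. tidyKeywords is a hypothetical helper for normalising the user's input before saving.

```typescript
// Stand-in for pdf-lib's setKeywords(): the array is joined with single
// spaces and stored as ONE text string (mirrors the library, not its API).
function storedBySetKeywords(keywords: string[]): string {
  return keywords.join(' ');
}

// Hypothetical helper: trim each comma-separated entry, drop empty ones,
// and rejoin with ", " so the stored string reads cleanly.
function tidyKeywords(input: string): string {
  return input
    .split(',')
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0)
    .join(', ');
}

const typed = 'finance, quarterly, report';

// Splitting on commas: commas vanish, spaces double up
const naive = storedBySetKeywords(typed.split(','));          // 'finance  quarterly  report'

// One-element array: the tidied string is stored as-is
const preserved = storedBySetKeywords([tidyKeywords(typed)]); // 'finance, quarterly, report'
```

Normalising on the way in also means stray input like " a ,, b," is saved as "a, b" rather than carrying empty entries into the file.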


Blob URL Memory Management

const [newPdfUrl, setNewPdfUrl] = useState<string | null>(null);

// On save
const url = URL.createObjectURL(blob);
setNewPdfUrl(url);

// Revoke the previous URL when it changes (new save or reset), and the last one on unmount
useEffect(() => {
  return () => {
    if (newPdfUrl) URL.revokeObjectURL(newPdfUrl);
  };
}, [newPdfUrl]);

URL.createObjectURL() registers a reference that keeps the blob in memory. Until URL.revokeObjectURL() is called, that memory stays allocated until the page is unloaded, and every save adds another one. For a PDF that might be 10–50MB, this matters.
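Outside React, the same revoke-on-replace rule fits in a small class. ObjectUrlSlot is a hypothetical helper, not part of the tool. Its create and revoke parameters default to the browser's URL.createObjectURL and URL.revokeObjectURL, and they are injectable only so the behaviour can be tested without a browser.

```typescript
type CreateUrl = (blob: Blob) => string;
type RevokeUrl = (url: string) => void;

// Holds at most one object URL and revokes the previous one whenever it is
// replaced or cleared, so repeated saves never pile up unreleased blobs.
class ObjectUrlSlot {
  private current: string | null = null;
  private readonly create: CreateUrl;
  private readonly revoke: RevokeUrl;

  constructor(
    create: CreateUrl = (blob) => URL.createObjectURL(blob),
    revoke: RevokeUrl = (url) => URL.revokeObjectURL(url),
  ) {
    this.create = create;
    this.revoke = revoke;
  }

  // Revoke the old URL first, then mint a new one for this blob
  set(blob: Blob): string {
    this.clear();
    this.current = this.create(blob);
    return this.current;
  }

  // Call on reset or teardown (the equivalent of the effect cleanup)
  clear(): void {
    if (this.current !== null) this.revoke(this.current);
    this.current = null;
  }

  get url(): string | null {
    return this.current;
  }
}
```

In a plain script you would call slot.set(blob) after each save and slot.clear() when the user resets or leaves the page.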


What the Metadata Is Actually Used For

Accessibility tools — screen readers and PDF viewers use the Title and Author fields for document identification.

Search indexing — enterprise search tools (SharePoint, Google Drive, Elasticsearch) index PDF metadata separately from body text. A well-tagged PDF is easier to find.

Print workflows — professional print environments use Subject and Keywords for job routing.

Legal/compliance — metadata can contain sensitive information (creator software version, original author name) that should be removed before publishing.


Stripping Metadata Entirely

If the goal is to strip metadata rather than edit it, you can pass empty strings:

pdfDoc.setTitle('');
pdfDoc.setAuthor('');
pdfDoc.setSubject('');
pdfDoc.setKeywords([]);
pdfDoc.setCreator('');
pdfDoc.setProducer('');

Note: pdf-lib writes these as empty strings in the document information dictionary rather than removing the keys, and CreationDate and ModDate are left in place. For complete removal you need to work with the raw PDF object structure, which is beyond what pdf-lib's high-level API provides.
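Here is a sketch of that lower-level route. It assumes a pdf-lib version where PDFDocument exposes its context and catalog. removeInfoKeys, removeXmpMetadata, and stripAllMetadata are hypothetical helpers. They are built on PDFContext.lookup()/delete() and PDFDict.delete(), which pdf-lib exports but documents less thoroughly than the setters, so check them against your version. The XMP part matters because many PDFs keep a second copy of the same fields in an XMP metadata stream referenced from the document catalog, and the information-dictionary setters never touch it.

```typescript
import { PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Standard document information dictionary keys (ISO 32000-1, Table 317)
const INFO_KEYS = [
  'Title', 'Author', 'Subject', 'Keywords',
  'Creator', 'Producer', 'CreationDate', 'ModDate', 'Trapped',
];

// Deletes the keys outright instead of blanking them.
// Returns the names of the keys that were present and removed.
function removeInfoKeys(pdfDoc: PDFDocument): string[] {
  const { context } = pdfDoc;
  const info = context.lookup(context.trailerInfo.Info);
  if (!(info instanceof PDFDict)) return []; // file has no Info dictionary

  return INFO_KEYS.filter((key) => info.delete(PDFName.of(key)));
}

// Drops the XMP stream referenced by the catalog's /Metadata entry.
// Returns true if there was one to remove.
function removeXmpMetadata(pdfDoc: PDFDocument): boolean {
  const key = PDFName.of('Metadata');
  const entry = pdfDoc.catalog.get(key);
  if (entry === undefined) return false;

  pdfDoc.catalog.delete(key);
  // Also delete the stream object itself, otherwise save() still writes it out
  if (entry instanceof PDFRef) pdfDoc.context.delete(entry);
  return true;
}

// Load without pdf-lib's own metadata updates, strip both copies, save
async function stripAllMetadata(bytes: Uint8Array | ArrayBuffer): Promise<Uint8Array> {
  const pdfDoc = await PDFDocument.load(bytes, { updateMetadata: false });
  removeInfoKeys(pdfDoc);
  removeXmpMetadata(pdfDoc);
  return pdfDoc.save();
}
```

Loading with updateMetadata: false is what keeps pdf-lib from re-adding Producer and ModDate before the keys are deleted.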


Summary

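| Operation | pdf-lib method | Note |
|---|---|---|
| Load without side effects | PDFDocument.load(bytes, { updateMetadata: false }) | Otherwise Producer and ModDate are overwritten on load |
| Read title | pdfDoc.getTitle() | Returns string or undefined |
| Write title | pdfDoc.setTitle(str) | Pass an empty string to blank it (the key stays) |
| Read keywords | pdfDoc.getKeywords() | Returns the stored string as-is, or undefined |
| Write keywords | pdfDoc.setKeywords(arr) | Takes string[] and joins it with spaces |
| Read date | pdfDoc.getCreationDate() | Returns Date or undefined |
| Update mod date | pdfDoc.setModificationDate(new Date()) | Do this on every save |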

The PDF Metadata Viewer is live at Ultimate Tools — drop a PDF to inspect its metadata, edit the fields, and download the updated file.
