Vincent A. Cicirello

Add Metadata to a PDF Using pdfLaTeX

In last week's post, I explained how LaTeX is my tool of choice for all forms of writing, and I provided a tip on using it to combine multiple pdf files into one, whether or not any of them were originally produced with LaTeX. In this post, I provide another LaTeX-based tip for manipulating pdf files, which again works no matter how the pdfs were produced. This tip concerns adding or changing the metadata embedded within a pdf. It may also be relevant to web developers who publish some of their site's content as pdf files, since search engines crawl and index the content of pdfs, including any embedded metadata.

So you have a pdf file that you want to add metadata to (e.g., author, title, subject, keywords) and for some reason you don't have an easy way to do so from the original source of the pdf. Here's an easy way to do this with pdfLaTeX. If you don't use LaTeX, don't worry about it. You don't really need to know LaTeX to use this trick, and I have a repository on GitHub with a LaTeX file you can edit with the details of the pdf and the metadata that you want to add to it. It doesn't matter how the original pdf was produced.

How to Add Metadata to a PDF with pdfLaTeX

Here are the steps to adding metadata to an existing pdf using pdfLaTeX.

Step 0: Install a LaTeX Distribution

If you don't have LaTeX installed on your system already, then you'll need to begin by installing a LaTeX distribution. For example, TeX Live is a good choice.

Step 1: Create a LaTeX Source File

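Create a LaTeX source file with a tex extension, but give it a different base name than the pdf you are adding metadata to. pdfLaTeX names its output after the tex file, so a matching name would collide with your original pdf. I'll assume the name metadata.tex in the example.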

In that metadata.tex file (or whatever you named it), add the following with your favorite text editor.

\documentclass[11pt,letterpaper]{article}

\usepackage[final]{pdfpages}

\usepackage[pdftex, 
            pdfauthor={Your name possibly with coauthors goes here},
            pdftitle={Your title goes here},
            pdfsubject={Anything you want in the subject field goes here},
            pdfkeywords={Your keywords go here},
            pdfproducer={pdflatex or whatever you want for producer},
            pdfcreator={pdflatex or whatever you want for creator}]{hyperref}

\pagestyle{empty}

\begin{document}
\includepdf[pages=-]{originalFile.pdf}
\end{document}

The example above uses two LaTeX packages. The pdfpages package provides the \includepdf command, and its option pages=- inserts every page of the original pdf. The hyperref package has options for specifying the metadata of the pdf. The pdftex option selects hyperref's pdfTeX driver, which is the one that applies when running pdfLaTeX. Recent versions of hyperref normally detect the driver on their own, so stating it explicitly is harmless rather than strictly necessary. After that option, you can set any or all of the metadata fields as shown. In the statement \includepdf[pages=-]{originalFile.pdf}, make sure you change originalFile.pdf to the name of your original pdf.
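As a variation on the example above (not part of the original file), hyperref also accepts the same metadata keys through its \hypersetup command after the package is loaded. Some people find this easier to read and edit than a long package option list. The sketch below assumes the same placeholder values and the same placeholder filename originalFile.pdf as before.

```latex
\documentclass[11pt,letterpaper]{article}

\usepackage[final]{pdfpages}
\usepackage{hyperref}

% Same metadata fields as the package-option form,
% set after hyperref has been loaded.
\hypersetup{
  pdfauthor={Your name possibly with coauthors goes here},
  pdftitle={Your title goes here},
  pdfsubject={Anything you want in the subject field goes here},
  pdfkeywords={Your keywords go here},
  pdfproducer={pdflatex or whatever you want for producer},
  pdfcreator={pdflatex or whatever you want for creator}
}

\pagestyle{empty}

\begin{document}
% Placeholder filename: change it to the name of your original pdf.
\includepdf[pages=-]{originalFile.pdf}
\end{document}
```

Either form should produce the same metadata, so choose whichever you find easier to maintain.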

Step 2: Run pdfLaTeX

You can now use pdfLaTeX to create a pdf with the contents of your original pdf but with your additional metadata. At the command line, in the directory containing the LaTeX source file you created above and your existing pdf, run the following (change the metadata.tex file to whatever filename you used above):

pdflatex metadata.tex

This will produce a pdf named metadata.pdf, which you can easily rename as required. Alternatively, you can name the tex file after your desired target file from the start, as long as that name differs from the name of the original pdf.

Adding Metadata to a Combination of Multiple PDFs

If you want to add metadata while combining multiple pdf files into one, you can combine the above trick for the metadata with the trick from my previous post on using pdfLaTeX to combine multiple pdfs.

For example, your tex file might look something like the following:

\documentclass[11pt,letterpaper]{article}

\usepackage[final]{pdfpages}

\usepackage[pdftex, 
            pdfauthor={Your name possibly with coauthors goes here},
            pdftitle={Your title goes here},
            pdfsubject={Anything you want in the subject field goes here},
            pdfkeywords={Your keywords go here},
            pdfproducer={pdflatex or whatever you want for producer},
            pdfcreator={pdflatex or whatever you want for creator}]{hyperref}

\pagestyle{empty}

\begin{document}
\includepdf[pages=-]{file1.pdf}
\includepdf[pages=-]{file2.pdf}
\includepdf[pages=-]{file3.pdf}
\end{document}

The above assumes you are combining the pdfs in their entirety. You can of course also specify page ranges as needed. Finally, run the following command at the command line to generate your combined pdf with your desired metadata (just change the name of the tex file to whatever you named the file):

pdflatex metadata.tex
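To illustrate the page ranges mentioned above, here is a minimal sketch that uses the pages option of \includepdf from the pdfpages package. The filenames are the same placeholders as before, and the page numbers are made up for illustration, so adjust both to match your own files.

```latex
\documentclass[11pt,letterpaper]{article}

\usepackage[final]{pdfpages}

\usepackage[pdfauthor={Your name possibly with coauthors goes here},
            pdftitle={Your title goes here}]{hyperref}

\pagestyle{empty}

\begin{document}
% Pages 1 through 3 of the first file.
\includepdf[pages={1-3}]{file1.pdf}
% Page 2, followed by pages 5 through 7, of the second file.
\includepdf[pages={2,5-7}]{file2.pdf}
% Page 4 through the last page of the third file.
\includepdf[pages={4-}]{file3.pdf}
\end{document}
```

Leaving off the end of a range, as in the last line, runs to the final page of that file. This is the same convention that makes pages=- select every page.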

GitHub Repository

To get you started, I have a GitHub repository with a LaTeX file that you can download and edit with the details of your pdf and the metadata that you want to add to it.

cicirello / add-pdf-metadata

Add metadata to a pdf using pdflatex regardless of how the original pdf was produced. Here are the steps:

  1. Make sure you have an up to date LaTeX system installed such as TeX Live.
  2. Read the comments in the file AddMetadataToPdf.tex.
  3. Edit the line in that file where indicated with the name of the source pdf that you want to add metadata to.
  4. Run pdflatex AddMetadataToPdf.tex at the command line, which will produce a file named AddMetadataToPdf.pdf with the contents of the original pdf file, but with the addition of your specified metadata.
  5. Change the name of the original pdf if you want to keep it as a backup, or delete the original if you don't.
  6. Rename AddMetadataToPdf.pdf to the name of the original pdf file.
  7. Alternatively, you could rename the original pdf before the above procedure, and then rename AddMetadataToPdf.tex based on how you want the…




Where You Can Find Me

Follow me here on DEV:

Follow me on GitHub:

cicirello / cicirello

My GitHub Profile

Vincent A. Cicirello

Sites where you can find me or my work:
Web and social media: Personal Website, LinkedIn, DEV Profile, Stack Overflow profile, StackExchange profile
Software development: GitHub, Maven Central, PyPI, Docker Hub
Publications: Google Scholar, ORCID, DBLP, ACM Digital Library, IEEE Xplore, ResearchGate, arXiv
View bibliometrics for my research publications
View my detailed GitHub activity

If you want to generate the equivalent to the above for your own GitHub profile, check out the cicirello/user-statistician GitHub Action.




Or visit my website:

Vincent A. Cicirello - Professor of Computer Science

Vincent A. Cicirello - Professor of Computer Science at Stockton University - is a researcher in artificial intelligence, evolutionary computation, swarm intelligence, and computational intelligence, with a Ph.D. in Robotics from Carnegie Mellon University. He is an ACM Senior Member, IEEE Senior Member, AAAI Life Member, EAI Distinguished Member, and SIAM Member.

cicirello.org

Top comments (2)

RJ (davidhjr)

I'm new to LaTeX and am running into difficulty getting started with your steps using hyperref with pdfLaTeX. Any advice on what I am missing?

Vincent A. Cicirello (cicirello)

What kind of difficulty? Are you getting an error message? If so, what's the error?