Why Your PDF Is 50MB and How to Fix It

#webdev #beginners #productivity #tutorial

Most oversized PDFs contain embedded images at far higher resolution than needed. A 4000x3000 pixel photo embedded in a PDF that will only be viewed on screen or printed at letter size wastes megabytes on invisible detail.

Where the bloat comes from

PDF file size breakdown for a typical 10-page document:

Images: 85-95% of file size
Fonts: 3-8% of file size
Text and vectors: 1-3% of file size
Metadata and structure: <1%

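A single 12-megapixel photo embedded at original resolution without lossy compression can add 10-30 MB to a PDF (the raw 8-bit RGB data alone is 36 MB before any lossless compression is applied). The same image downsampled to 150 DPI and JPEG-compressed for screen viewing takes roughly 200-500 KB.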
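As a rough check on these numbers, here is a minimal sketch of the raw-size arithmetic. It assumes 8-bit RGB (3 bytes per pixel) and a photo placed 8 inches wide on the page. Both are illustrative assumptions, and `raw_rgb_bytes` is a hypothetical helper name. The figures are for uncompressed pixel data, so real embedded sizes will be smaller once JPEG or Flate compression is applied.

```python
def raw_rgb_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Size of the uncompressed pixel data, assuming 8-bit RGB by default."""
    return width_px * height_px * bytes_per_pixel


# 12-megapixel original (4000 x 3000 pixels).
original = raw_rgb_bytes(4000, 3000)

# Same photo at 150 DPI when placed 8 inches wide: 8 * 150 = 1200 px across,
# and 900 px tall to keep the 4:3 aspect ratio.
reduced = raw_rgb_bytes(1200, 900)

print(original / 1_000_000)            # 36.0 (MB of raw data)
print(reduced / 1_000_000)             # 3.24 (MB of raw data)
print(round(original / reduced, 1))    # 11.1 (times smaller before compression)
```

Pixel count scales with the square of the resolution, which is why a modest drop in DPI removes most of the data.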

Compression strategies

Image downsampling. Reduce image resolution to match the output medium. Screen viewing: 72-150 DPI. Print: 300 DPI. Images are often embedded at 300 DPI or more even in screen-only documents.
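To decide how far an image can be downsampled, compare its effective DPI (pixels divided by the size it occupies on the page) with the target. The sketch below is one way to compute the new dimensions. The function names and the 8-inch placement width are assumptions made for illustration, not part of any PDF tool.

```python
def effective_dpi(width_px: int, placed_width_in: float) -> float:
    """Resolution the image actually has at its size on the page."""
    return width_px / placed_width_in


def downsampled_size(width_px: int, height_px: int,
                     placed_width_in: float, target_dpi: float) -> tuple:
    """Pixel dimensions after downsampling to target_dpi, never upscaling."""
    current = effective_dpi(width_px, placed_width_in)
    if current <= target_dpi:
        # Already at or below the target: leave the image alone.
        return width_px, height_px
    scale = target_dpi / current
    return round(width_px * scale), round(height_px * scale)


# A 4000x3000 photo placed 8 inches wide is 500 DPI on the page.
print(effective_dpi(4000, 8.0))                 # 500.0
print(downsampled_size(4000, 3000, 8.0, 150))   # (1200, 900)
# A small image already below the target is returned unchanged.
print(downsampled_size(800, 600, 8.0, 150))     # (800, 600)
```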

Image recompression. Re-encode images at a lower JPEG quality setting. Quality 85 is virtually indistinguishable from quality 100 but is typically about half the file size.

Font subsetting. Embed only the characters used in the document rather than the full font. If a document uses only 200 of a font's 2,000 glyphs, subsetting can shrink the embedded font data by roughly 90%.
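The subsetting estimate can be sketched as follows. This is only the arithmetic, not a real subsetter. It assumes every glyph takes about the same space and that each character maps to exactly one glyph, neither of which holds exactly for real fonts (ligatures and complex glyphs break both assumptions). `glyphs_needed` and `subset_savings` are hypothetical helper names.

```python
def glyphs_needed(text: str) -> set:
    """Distinct non-whitespace characters that the subset font must cover."""
    return {ch for ch in text if not ch.isspace()}


def subset_savings(total_glyphs: int, used_glyphs: int) -> float:
    """Fraction of glyphs dropped, as a rough proxy for font bytes saved."""
    return 1 - used_glyphs / total_glyphs


used = glyphs_needed("Hello, PDF")
print(len(used))                              # 8 distinct characters
print(round(subset_savings(2000, 200), 2))    # 0.9
```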

Removing metadata. PDFs can contain edit history, comments, form field data, and other metadata. Stripping unused metadata reduces size modestly.

# Using Ghostscript for PDF compression
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH -dQUIET \
   -sOutputFile=output.pdf input.pdf

The -dPDFSETTINGS flag controls quality:

/screen: 72 DPI, smallest size
/ebook: 150 DPI, good for screen
/printer: 300 DPI, good for printing
/prepress: 300 DPI, leaves colors unchanged, highest quality and largest files
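If you need to compress many files, the same command can be driven from a script. The following is a minimal sketch that assumes Ghostscript is installed and available as `gs` on the PATH (on Windows the executable is usually named `gswin64c`). `build_gs_command` and `compress_pdf` are hypothetical helper names; the flags are the ones from the command above.

```python
import subprocess

# The four presets described above.
PRESETS = ("/screen", "/ebook", "/printer", "/prepress")


def build_gs_command(input_pdf: str, output_pdf: str, preset: str = "/ebook") -> list:
    """Return the Ghostscript argument list for the chosen preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}, expected one of {PRESETS}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


def compress_pdf(input_pdf: str, output_pdf: str, preset: str = "/ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_command(input_pdf, output_pdf, preset), check=True)


if __name__ == "__main__":
    # Print the command without running it, so this works without gs installed.
    print(" ".join(build_gs_command("input.pdf", "output.pdf", "/screen")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.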

For compressing PDFs without installing command-line tools, I built a compressor at zovo.one/free-tools/pdf-compressor. It runs in the browser, processes the file locally, and lets you choose the quality-size tradeoff.

I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.