
N C


fastpdf2png: PDF to PNG at 1,500 pages/s with SIMD and PDFium

What My Project Does

I was working on a document extraction pipeline and got frustrated with how slow PDF to PNG conversion was. PyMuPDF, MuPDF, and
ImageMagick were all too slow when you're processing thousands of documents.

So I wrote fastpdf2png. It uses PDFium (the PDF engine from Chrome) under the hood, with a custom PNG encoder that uses SIMD
instructions and a patched compression library. It also detects when a page is grayscale and automatically writes it as an 8-bit grayscale PNG.

  pip install fastpdf2png

  import fastpdf2png
  images = fastpdf2png.to_images("doc.pdf", dpi=150, workers=4)
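To give a rough idea of what the grayscale detection mentioned above involves, here is a minimal pure-Python sketch. It is not the code fastpdf2png runs (that lives in the native encoder), and it assumes a packed 8-bit RGB buffer; the function names `is_grayscale` and `pack_pixels` are made up for illustration.

```python
def is_grayscale(rgb: bytes) -> bool:
    """Return True if every pixel in a packed RGB buffer has R == G == B."""
    if len(rgb) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3")
    # Strided slices pull out each channel as its own byte string.
    r, g, b = rgb[0::3], rgb[1::3], rgb[2::3]
    return r == g == b


def pack_pixels(rgb: bytes) -> tuple:
    """Return (channels, pixels), dropping to 1 channel when the page is gray."""
    if is_grayscale(rgb):
        # All three channels are identical, so keeping one loses nothing.
        return 1, rgb[0::3]
    return 3, rgb


# Two gray pixels collapse to two bytes; a colored pixel keeps all three channels.
gray_page = bytes([10, 10, 10, 200, 200, 200])
color_page = bytes([10, 10, 10, 200, 0, 200])
print(pack_pixels(gray_page)[0], pack_pixels(color_page)[0])
```

The point is that a gray page needs a third of the raw pixel data before compression even starts, which is presumably where the smaller output files come from.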

Target Audience

Anyone dealing with PDFs at scale. Data pipelines, ML preprocessing, document management, that kind of thing.

Comparison

I benchmarked everything I could find at 150 DPI, single process. fastpdf2png does 323 pages/s, MuPDF does 37, PyMuPDF 30, and
ImageMagick 2.9. With 8 workers fastpdf2png gets to about 1,500 pages/s. Its output files end up smaller too because of the grayscale detection.
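If you want those numbers as ratios, this small snippet works them out. It only uses the throughput figures quoted above; the variable names are mine.

```python
# Single-process throughput at 150 DPI, in pages per second (figures from the post).
single = {"fastpdf2png": 323, "MuPDF": 37, "PyMuPDF": 30, "ImageMagick": 2.9}
base = single["fastpdf2png"]

# How many times faster fastpdf2png is than each alternative.
speedup = {name: base / rate for name, rate in single.items() if name != "fastpdf2png"}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")

# Scaling from 1 process to 8 workers (about 1,500 pages/s).
scaling = 1500 / base
efficiency = scaling / 8
print(f"8 workers: {scaling:.1f}x, {efficiency:.0%} of linear")
```

That works out to roughly 8.7x over MuPDF, 10.8x over PyMuPDF, and 111x over ImageMagick, with the 8-worker run about 4.6x faster than a single process.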

https://github.com/nataell95/fastpdf2png
