What My Project Does
I was working on a document extraction pipeline and got frustrated with how slow PDF-to-PNG conversion was. PyMuPDF, MuPDF,
and ImageMagick were all too slow for processing thousands of documents.
So I wrote fastpdf2png. It uses PDFium (the PDF engine from Chrome) under the hood, with a custom PNG encoder that uses SIMD
instructions and a patched compression library. It also detects when a page is grayscale and automatically writes an 8-bit grayscale PNG instead of a full-color one.
pip install fastpdf2png
import fastpdf2png
images = fastpdf2png.to_images("doc.pdf", dpi=150, workers=4)
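To give a feel for the grayscale detection mentioned above, here is a minimal pure-Python sketch. It is not fastpdf2png's actual code (which is native and SIMD-accelerated). It assumes the rendered page is a flat buffer of 4-byte pixels with the three color channels first, and the names is_grayscale and to_gray8 are made up for illustration.

```python
def is_grayscale(buf: bytes, channels: int = 4) -> bool:
    """Return True if all three color channels are equal for every pixel.

    Assumption: `buf` is a tightly packed pixel buffer with `channels`
    bytes per pixel, and the first three bytes of each pixel are the
    color channels (their order does not matter for this check).
    """
    if len(buf) % channels != 0:
        raise ValueError("buffer length is not a multiple of the pixel size")
    # Strided slices pull out one channel each; a page is gray when
    # the three channel planes are byte-for-byte identical.
    return buf[0::channels] == buf[1::channels] == buf[2::channels]


def to_gray8(buf: bytes, channels: int = 4) -> bytes:
    """Collapse a grayscale buffer to one byte per pixel.

    Only valid when is_grayscale(buf) is True, since it simply keeps
    the first channel of every pixel.
    """
    return buf[0::channels]


# Two gray pixels (10 and 200) with opaque alpha.
gray_page = bytes([10, 10, 10, 255, 200, 200, 200, 255])
# One pixel with a different middle channel, so not gray.
color_page = bytes([10, 20, 10, 255])

if is_grayscale(gray_page):
    gray_pixels = to_gray8(gray_page)  # 2 bytes instead of 8
```

Dropping from three or four bytes per pixel to one before compression is presumably where the smaller output files come from.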
Target Audience
Anyone dealing with PDFs at scale. Data pipelines, ML preprocessing, document management, that kind of thing.
Comparison
I benchmarked everything I could find at 150 DPI in a single process: fastpdf2png does 323 pages/s, MuPDF 37, PyMuPDF 30, and
ImageMagick 2.9. With 8 workers, fastpdf2png gets to about 1,500 pages/s. Output files also end up smaller because of the grayscale detection.
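To put those numbers in relative terms, this short snippet works out the ratios. It only uses the throughput figures quoted above. The 8-worker figure is approximate, so the scaling numbers are too.

```python
# Single-process throughput at 150 DPI, in pages per second (from the post).
single_process = {
    "fastpdf2png": 323,
    "MuPDF": 37,
    "PyMuPDF": 30,
    "ImageMagick": 2.9,
}

baseline = single_process["fastpdf2png"]

# How many times faster fastpdf2png is than each alternative.
speedups = {
    name: baseline / rate
    for name, rate in single_process.items()
    if name != "fastpdf2png"
}

# "About 1,500 pages/s" with 8 workers, so treat this as a rough figure.
eight_workers = 1500
workers = 8
scaling = eight_workers / baseline  # speedup over a single process
efficiency = scaling / workers      # fraction of ideal linear scaling

for name, factor in speedups.items():
    print(f"fastpdf2png vs {name}: {factor:.1f}x")
print(f"8 workers vs 1 process: {scaling:.1f}x ({efficiency:.0%} of linear)")
```

That works out to roughly 9x over MuPDF, 11x over PyMuPDF, and over 100x over ImageMagick, with 8 workers giving about 4.6x the single-process rate.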