I built a CLI tool to assemble pages from PDFs, Word docs, and PowerPoint into one file

#cli #python #productivity

The problem

Every time I needed to put together a report or proposal, I went through the same manual process: export this PDF, convert that Word doc, grab slides 3-5 from a PowerPoint, then merge everything in some online tool that wanted me to upload confidential documents to its servers.

It's a solved problem that somehow still takes 15 minutes every time.

So I built PageFuse.

What it does

PageFuse is a CLI tool that assembles pages from multiple document formats
into a single output file.


  pagefuse assemble board_pack.pdf cover.pdf:1 financials.docx:all slides.pptx:2-5

  That pulls page 1 from a PDF, all pages from a Word doc, and slides 2-5
  from a PowerPoint into one PDF. Done in seconds.

  ---
  Page specs

  pagefuse assemble out.pdf file.pdf:1        # single page
  pagefuse assemble out.pdf file.pdf:1-3      # range
  pagefuse assemble out.pdf file.pdf:1,3,5-8  # mixed
  pagefuse assemble out.pdf file.pdf:all      # all pages
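To make the spec grammar concrete, here is a minimal sketch of how specs like these could be expanded into page numbers. It is not PageFuse's actual parser: the function name `parse_page_spec`, the `page_count` parameter, and the choice to reject out-of-range pages are all assumptions made for illustration.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1,3,5-8" or "all" into 1-based page numbers.

    Hypothetical helper: the real tool may validate or order pages differently.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "5-8" is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page spec {part!r} is outside 1-{page_count}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_spec("1,3,5-8", page_count=10))  # [1, 3, 5, 6, 7, 8]
```

Pages come back in the order they are written, so a spec like `3,1` would reorder pages under this sketch.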

  ---
  Config files for repeatable builds

  For documents you rebuild regularly — weekly reports, monthly packs,
  proposals — save a .fuse config:

  output: board_pack.pdf
  output: board_pack.docx

  from: cover.pdf          1
  from: financials.docx    all
  from: slides.pptx        2-5

  Then just:

  pagefuse assemble board_pack.fuse

  Commit the config to your repo. Run it in a Makefile or CI pipeline.
  Same output every time.
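For readers curious about the config format, here is a rough sketch of how a `.fuse` file like the one above might be read. This is an illustration only, not the tool's real loader: `parse_fuse_config` is a hypothetical name, and treating repeated `output:` lines as a list of outputs is an assumption based on the example.

```python
def parse_fuse_config(text: str) -> dict:
    """Parse 'output:' and 'from:' lines into a plain dict.

    Assumptions (not confirmed by PageFuse): every 'from:' line ends with a
    page spec, and repeated 'output:' lines mean several output files.
    """
    outputs: list[str] = []
    sources: list[tuple[str, str]] = []

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'key: value'")
        key, value = key.strip(), value.strip()

        if key == "output":
            outputs.append(value)
        elif key == "from":
            # The last whitespace-separated token is the page spec.
            parts = value.rsplit(None, 1)
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected 'from: FILE SPEC'")
            sources.append((parts[0], parts[1]))
        else:
            raise ValueError(f"line {line_no}: unknown key {key!r}")

    return {"outputs": outputs, "sources": sources}
```

Because the spec is taken from the end of the line, file names containing spaces would still work under this sketch.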

  ---
  Split works too

  pagefuse split report.pdf cover.pdf:1 body.pdf:2-10 appendix.docx:11-20

  Each output can be a different format.
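As an illustration of how a `file:spec` argument might be taken apart, here is a small sketch. The helper name `parse_target` is hypothetical, and inferring the output format from the file extension is an assumption about how the tool decides what to write.

```python
from pathlib import Path


def parse_target(arg: str) -> tuple[str, str, str]:
    """Split 'appendix.docx:11-20' into (path, page spec, format).

    Hypothetical helper: assumes every argument carries a page spec and that
    the format is implied by the file extension.
    """
    # Split on the LAST colon so paths that contain colons keep working.
    path, sep, spec = arg.rpartition(":")
    if not sep or not path or not spec:
        raise ValueError(f"expected FILE:SPEC, got {arg!r}")
    fmt = Path(path).suffix.lstrip(".").lower()
    return path, spec, fmt


for arg in ["cover.pdf:1", "body.pdf:2-10", "appendix.docx:11-20"]:
    print(parse_target(arg))
```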

  ---
  Supported formats

  ┌────────┬─────────────────────────────────────────────────────────────────────────────────────┐
  │        │                                       Formats                                       │
  ├────────┼─────────────────────────────────────────────────────────────────────────────────────┤
  │ Input  │ PDF, DOCX, DOC, PPTX, PPT, ODT, ODP, ODS, XLSX, RTF, HTML, Markdown, PNG, JPG, TIFF │
  ├────────┼─────────────────────────────────────────────────────────────────────────────────────┤
  │ Output │ PDF, DOCX, ODT, PPTX, ODP, HTML, PNG, JPG, TIFF                                     │
  └────────┴─────────────────────────────────────────────────────────────────────────────────────┘

  ---
  How it's built

  - Click — CLI framework
  - pikepdf — PDF read/write/assembly (lossless, no re-rendering)
  - LibreOffice headless — DOCX/PPTX/ODT/HTML conversion
  - img2pdf — lossless image → PDF
  - pypdfium2 — PDF → image rendering
  - Rich — terminal output
  - ThreadPoolExecutor — parallel file loading

  PDF-to-PDF assembly is lossless and fast because no rendering is involved.
  Non-PDF inputs go through LibreOffice headless for conversion. Pages are
  assembled into a temporary PDF first, and that PDF is then converted to
  the target format when the output is not a PDF.
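The conversion and parallel-loading steps described above can be sketched roughly as follows. This is an assumption-heavy illustration, not PageFuse's source: `soffice_command`, `to_pdf`, and `load_all` are hypothetical names, the `soffice` binary is assumed to be on `PATH`, and the worker count is arbitrary. The LibreOffice flags used (`--headless`, `--convert-to`, `--outdir`) are its standard command-line options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def soffice_command(source: Path, out_dir: Path) -> list[str]:
    """Build the LibreOffice headless command that converts one file to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def to_pdf(source: Path, out_dir: Path) -> Path:
    """Return a PDF path for the source, converting only when needed."""
    if source.suffix.lower() == ".pdf":
        return source  # PDFs skip conversion entirely
    subprocess.run(soffice_command(source, out_dir), check=True)
    return out_dir / (source.stem + ".pdf")


def load_all(sources, out_dir, convert=to_pdf, max_workers=4):
    """Convert inputs in parallel; results keep the order of `sources`.

    Caveat (assumption): concurrent LibreOffice processes may need separate
    user profiles to run reliably, which this sketch does not handle.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda src: convert(src, out_dir), sources))
```

`ThreadPoolExecutor.map` yields results in input order regardless of which conversion finishes first, which matters here because page order in the assembled file follows argument order.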

  ---
  Install

  pip install pagefuse
  # or
  pipx install pagefuse

  Requires LibreOffice for DOCX/PPTX/ODT/HTML conversion:

  sudo apt install libreoffice       # Ubuntu/Debian
  brew install --cask libreoffice    # macOS

  ---
  What's next

  - GUI wrapper
  - Homebrew formula
  - Watch mode for auto-rebuilding on file change

  ---
  30-day free trial, no credit card: pagefuse.net

  Would love feedback on the config format, the feature set, or anything
  else. What formats or features would make this useful for your workflow?
