DEV Community

raptorgold

I built a CLI tool to assemble pages from PDFs, Word docs, and PowerPoint into one file

The problem

Every time I needed to put together a report or proposal, I went through the same manual process: export this PDF, convert that Word doc, grab slides 3-5 from a PowerPoint, then merge everything in some online tool that wanted me to upload confidential documents to its servers.

It's a solved problem that somehow still takes 15 minutes every time.

So I built PageFuse.


What it does

PageFuse is a CLI tool that assembles pages from multiple document formats
into a single output file.


  pagefuse assemble board_pack.pdf cover.pdf:1 financials.docx:all slides.pptx:2-5

  The first argument is the output file. The rest pull page 1 from a PDF,
  all pages from a Word doc, and slides 2-5 from a PowerPoint into one PDF.
  Done in seconds.
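Each input in the command above pairs a file name with a page spec after a colon. As a rough illustration of how such an argument could be taken apart, here is a minimal sketch. `split_argument` and `SPEC_RE` are hypothetical names, not PageFuse internals, and the fallback to `all` when no spec is given is an assumption.

```python
import re

# Assumed grammar for the text after the colon: "all", or comma-separated
# page numbers and ranges such as "1", "2-5" or "1,3,5-8".
SPEC_RE = re.compile(r"(all|\d+(-\d+)?(,\d+(-\d+)?)*)")


def split_argument(arg: str) -> tuple[str, str]:
    """Split 'path:spec' into (path, spec). Hypothetical helper."""
    path, sep, spec = arg.rpartition(":")
    # Only treat the suffix as a spec if it looks like one, so a Windows
    # drive letter ("C:\\docs\\a.pdf") is not mistaken for a separator.
    if sep and path and SPEC_RE.fullmatch(spec):
        return path, spec
    # Assumption: an argument without a spec means every page.
    return arg, "all"
```

Splitting on the last colon and validating the suffix keeps paths that contain colons intact.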

  ---
  Page specs

  pagefuse assemble out.pdf file.pdf:1        # single page
  pagefuse assemble out.pdf file.pdf:1-3      # range
  pagefuse assemble out.pdf file.pdf:1,3,5-8  # mixed
  pagefuse assemble out.pdf file.pdf:all      # all pages
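To make the four spec forms concrete, here is a small sketch that expands a spec into 1-based page numbers. `expand_pages` is a hypothetical helper written for this post, and rejecting out-of-range pages with an error is an assumption about sensible behaviour, not a description of what PageFuse does.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as '1,3,5-8' or 'all' into 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        start = int(first)
        end = int(last) if dash else start  # a single page is a one-page range
        # Assumption: pages outside the document are an error, not ignored.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page spec {part!r} is outside 1-{page_count}")
        pages.extend(range(start, end + 1))
    return pages
```

For a 10-page file, `expand_pages("1,3,5-8", 10)` gives `[1, 3, 5, 6, 7, 8]`.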

  ---
  Config files for repeatable builds

  For documents you rebuild regularly — weekly reports, monthly packs,
  proposals — save a .fuse config:

  output: board_pack.pdf
  output: board_pack.docx

  from: cover.pdf          1
  from: financials.docx    all
  from: slides.pptx        2-5

  Then just:

  pagefuse assemble board_pack.fuse

  Commit the config to your repo. Run it in a Makefile or CI pipeline.
  Same output every time.
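For anyone who wants to generate or lint these configs from a script, the format is simple enough to read by hand. The sketch below is only inferred from the sample above: `key: value` lines, with `from:` values holding a path followed by a page spec. `FuseConfig` and `parse_fuse` are hypothetical names, and since the sample lists two `output:` lines, the sketch collects every one it finds.

```python
from dataclasses import dataclass, field


@dataclass
class FuseConfig:
    outputs: list[str] = field(default_factory=list)
    sources: list[tuple[str, str]] = field(default_factory=list)  # (path, spec)


def parse_fuse(text: str) -> FuseConfig:
    """Parse the assumed .fuse layout: 'output: FILE' and 'from: FILE SPEC'."""
    config = FuseConfig()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines separate sections
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            raise ValueError(f"line {lineno}: expected 'key: value'")
        if key == "output":
            config.outputs.append(value)
        elif key == "from":
            # The spec is the last whitespace-separated field; the rest is the path.
            fields = value.rsplit(None, 1)
            if len(fields) != 2:
                raise ValueError(f"line {lineno}: expected 'from: FILE SPEC'")
            config.sources.append((fields[0], fields[1]))
        else:
            raise ValueError(f"line {lineno}: unknown key {key!r}")
    return config
```

Taking the spec from the right lets a path contain spaces, as long as every `from:` line ends with a spec.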

  ---
  Split works too

  pagefuse split report.pdf cover.pdf:1 body.pdf:2-10 appendix.docx:11-20

  Each output can be a different format.

  ---
  Supported formats

  ┌────────┬─────────────────────────────────────────────────────────────────────────────────────┐
  │        │                                       Formats                                       │
  ├────────┼─────────────────────────────────────────────────────────────────────────────────────┤
  │ Input  │ PDF, DOCX, DOC, PPTX, PPT, ODT, ODP, ODS, XLSX, RTF, HTML, Markdown, PNG, JPG, TIFF │
  ├────────┼─────────────────────────────────────────────────────────────────────────────────────┤
  │ Output │ PDF, DOCX, ODT, PPTX, ODP, HTML, PNG, JPG, TIFF                                     │
  └────────┴─────────────────────────────────────────────────────────────────────────────────────┘

  ---
  How it's built

  - Click — CLI framework
  - pikepdf — PDF read/write/assembly (lossless, no re-rendering)
  - LibreOffice headless — DOCX/PPTX/ODT/HTML conversion
  - img2pdf — lossless image → PDF
  - pypdfium2 — PDF → image rendering
  - Rich — terminal output
  - ThreadPoolExecutor — parallel file loading

  PDF-to-PDF assembly is lossless and fast because no rendering is involved.
  Non-PDF inputs are first converted by LibreOffice headless. The selected
  pages are assembled into a temporary PDF, which is then converted to the
  target format.
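The convert-then-assemble flow can be sketched in a few lines. This is not PageFuse's actual code, just an illustration under stated assumptions: LibreOffice is reachable as `soffice` on the PATH (on some systems the binary is `libreoffice` or lives inside the app bundle), pikepdf is installed, and the function names are made up for this post. It stops at the assembled PDF and does not cover the final conversion to other target formats.

```python
import subprocess
from contextlib import ExitStack
from pathlib import Path


def soffice_command(src: Path, outdir: Path) -> list[str]:
    """Build the LibreOffice headless command that converts src to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(src)]


def ensure_pdf(src: Path, workdir: Path) -> Path:
    """Return a PDF for src, converting through LibreOffice when needed."""
    if src.suffix.lower() == ".pdf":
        return src  # PDFs are used as-is, with no re-rendering
    subprocess.run(soffice_command(src, workdir), check=True)
    return workdir / f"{src.stem}.pdf"  # LibreOffice keeps the file stem


def assemble(parts: list[tuple[Path, list[int]]], dest: Path, workdir: Path) -> None:
    """Copy the chosen 1-based pages of each part into one PDF at dest."""
    import pikepdf  # third-party; imported here so the helpers above work without it

    with ExitStack() as stack:
        out = stack.enter_context(pikepdf.Pdf.new())
        for src, pages in parts:
            # Keep every source open until the output has been saved.
            pdf = stack.enter_context(pikepdf.Pdf.open(ensure_pdf(src, workdir)))
            for number in pages:
                out.pages.append(pdf.pages[number - 1])  # pikepdf indexes from 0
        out.save(dest)
```

Conversions run one at a time here because concurrent LibreOffice instances can contend for the same user profile. Running them in parallel would need extra setup that this sketch leaves out.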

  ---
  Install

  pip install pagefuse
  # or
  pipx install pagefuse

  Requires LibreOffice for DOCX/PPTX/ODT/HTML conversion:

  sudo apt install libreoffice       # Ubuntu/Debian
  brew install --cask libreoffice    # macOS

  ---
  What's next

  - GUI wrapper
  - Homebrew formula
  - Watch mode for auto-rebuilding on file change

  ---
  30-day free trial, no credit card: pagefuse.net

  Would love feedback on the config format, the feature set, or anything
  else. What formats or features would make this useful for your workflow?