The Problem
You've been there: someone sends you a beautifully designed PDF deck and asks you to "just edit a few slides." You try an online converter, and what do you get? A PowerPoint full of flat images. Every icon, every text box—just a screenshot you can't touch.
I kept hitting this wall, so I built PDFtoDeck—an open-source tool that converts PDF to truly editable PowerPoint files.
What Makes It Different?
Most converters take the easy route: render each PDF page as an image and embed it into a slide. Done. But useless if you need to edit anything.
PDFtoDeck does three things differently:
- Text → Editable text boxes with correct positioning, font size, and color
- Vector icons → Editable freeform shapes (not bitmaps!)
- Layout preservation—spacing, alignment, and visual hierarchy stay intact
The vector icon extraction is the part I'm most proud of—and the hardest to get right.
The Technical Challenge: PDF Vector Paths → PowerPoint Shapes
How PDF Stores Vector Graphics
A PDF doesn't have a concept of "icons." It has drawing operators—sequences of path commands like:
100 200 m                     % moveto
150 250 l                     % lineto
160 260 170 270 180 280 c     % curveto
h                             % closepath
f                             % fill
A single icon might be composed of dozens of these path fragments, potentially scattered across the page content stream.
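In a content stream the operands come first and the operator follows them. A minimal sketch of reading such a fragment, assuming a heavily simplified stream that contains only numbers, `%` comments, and the handful of operators listed in the table (the `parse_path_ops` name and the `OPERAND_COUNTS` table are made up for this illustration; a real parser has to handle many more operators and operand types):

```python
# Operand counts for a handful of path operators; real streams use many more.
OPERAND_COUNTS = {"m": 2, "l": 2, "c": 6, "re": 4, "h": 0, "f": 0, "S": 0}


def parse_path_ops(stream):
    """Split a simplified content stream into (operator, operands) pairs."""
    ops = []
    operands = []  # numbers seen since the last operator
    for line in stream.splitlines():
        line = line.split("%", 1)[0]  # strip a trailing % comment
        for token in line.split():
            if token in OPERAND_COUNTS:
                expected = OPERAND_COUNTS[token]
                if len(operands) != expected:
                    raise ValueError(
                        f"{token!r} expects {expected} operands, got {len(operands)}"
                    )
                ops.append((token, tuple(operands)))
                operands = []
            else:
                operands.append(float(token))
    return ops


sample = """\
100 200 m                  % moveto
150 250 l                  % lineto
160 260 170 270 180 280 c  % curveto
h
f
"""

for op, args in parse_path_ops(sample):
    print(op, args)
```

In practice PyMuPDF does this parsing for you, which is why the rest of the pipeline works on its extracted drawings rather than on raw operators.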
Step 1: Extracting Paths with PyMuPDF
I use PyMuPDF (fitz) to extract all vector drawings from a page:
import fitz

doc = fitz.open("slides.pdf")
page = doc[0]
drawings = page.get_drawings()

for path in drawings:
    # Each path has: items (line/curve/rect commands),
    # fill color, stroke color, width, rect (bounding box)
    print(f"Path with {len(path['items'])} segments, "
          f"fill={path['fill']}, rect={path['rect']}")
A typical slide might yield 200+ individual paths. Most of them are just background rectangles or decorative lines. The challenge is figuring out which paths form an icon.
Step 2: Grouping Paths into Icons
This is where it gets interesting. My approach:
- Filter out large shapes — anything spanning more than 5% of the page area is likely a background element, not an icon
- Spatial clustering — paths whose bounding boxes overlap or are very close together probably belong to the same icon
- Size thresholds — icons typically fall within a certain size range (16×16 to 128×128 points in PDF coordinates)
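The area and size filters above can be sketched as two small predicates. This is an illustration under stated assumptions, not the project's actual code: rectangles are plain `(x0, y0, x1, y1)` tuples, `is_icon_sized` is a hypothetical name, and this `is_background` takes the page rectangle as an explicit second argument. The 5% and 16 to 128 point values come from the list above.

```python
def rect_area(rect):
    """Area of an (x0, y0, x1, y1) rectangle; degenerate rects count as 0."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def is_background(path, page_rect, max_fraction=0.05):
    """True if the path's bounding box covers more than max_fraction of the page."""
    return rect_area(path["rect"]) > max_fraction * rect_area(page_rect)


def is_icon_sized(rect, min_side=16.0, max_side=128.0):
    """True if both sides of the rect fall inside the expected icon size range."""
    x0, y0, x1, y1 = rect
    width, height = x1 - x0, y1 - y0
    return min_side <= width <= max_side and min_side <= height <= max_side


page_rect = (0, 0, 960, 540)  # a 16:9 page, in points
print(is_background({"rect": (0, 0, 960, 540)}, page_rect))
print(is_background({"rect": (100, 100, 132, 132)}, page_rect))
print(is_icon_sized((100, 100, 132, 132)))
```

The size check is most useful on the bounding box of a whole cluster rather than on individual paths, since a single stroke inside an icon can be much smaller than 16 points.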
# is_background() and distance() are helper functions not shown here
def cluster_icon_paths(paths, proximity=5.0):
    """Group nearby small paths into icon candidates."""
    clusters = []
    used = set()
    for i, p in enumerate(paths):
        if i in used or is_background(p):
            continue
        cluster = [p]
        used.add(i)
        bbox = fitz.Rect(p["rect"])
        # Find nearby paths that belong to this icon
        for j, q in enumerate(paths):
            if j in used or is_background(q):
                continue
            q_rect = fitz.Rect(q["rect"])
            if bbox.intersects(q_rect) or distance(bbox, q_rect) < proximity:
                cluster.append(q)
                used.add(j)
                bbox |= q_rect  # expand bounding box
        if len(cluster) >= 2:  # single paths are usually not icons
            clusters.append(cluster)
    return clusters
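The `distance` helper called in the clustering code is not shown in the post. One plausible definition, offered as an assumption rather than the project's implementation, is the shortest gap between two axis-aligned rectangles, which is zero when they touch or overlap:

```python
import math


def distance(a, b):
    """Shortest gap between two rects given as (x0, y0, x1, y1) sequences."""
    # Horizontal gap: positive only when one rect lies fully left of the other
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    # Vertical gap: positive only when one rect lies fully above the other
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


print(distance((0, 0, 10, 10), (12, 0, 20, 10)))   # side by side
print(distance((0, 0, 10, 10), (13, 14, 20, 20)))  # diagonal neighbours
print(distance((0, 0, 10, 10), (5, 5, 15, 15)))    # overlapping
```

With a `proximity` of 5.0 points, the first pair would be merged into one cluster and the second would sit exactly on the threshold and be left apart, because the comparison is strict.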
Step 3: Converting to PowerPoint Freeform Shapes
This is the final piece. python-pptx provides a FreeformBuilder API (reached through slide.shapes.build_freeform) that lets you construct arbitrary shapes from straight line segments:
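from pptx.dml.color import RGBColor
from pptx.util import Emu

EMU_PER_PT = 12700  # 1 PDF point = 1/72 inch = 12,700 EMU

def add_icon_to_slide(slide, paths, offset_x, offset_y):
    """Convert PDF paths to a PowerPoint freeform shape."""
    def to_emu(pt):
        # Scale to EMU before truncating so sub-point precision is kept
        return (Emu(int((pt.x - offset_x) * EMU_PER_PT)),
                Emu(int((pt.y - offset_y) * EMU_PER_PT)))

    builder = None
    pen = None  # end point of the previous segment
    for path in paths:
        for item in path["items"]:
            cmd = item[0]
            if cmd == "l":  # line
                _, start, end = item
                points = [end]
            elif cmd == "c":  # cubic bezier curve
                _, start, c1, c2, end = item
                # FreeformBuilder only draws straight segments,
                # so we approximate the curve with line segments
                # (approximate_bezier is a helper not shown here)
                points = approximate_bezier(start, c1, c2, end, segments=8)
            else:
                continue  # "re" (rect) and "qu" (quad) items are skipped here
            if builder is None:
                builder = slide.shapes.build_freeform(*to_emu(start))
            elif start != pen:
                builder.move_to(*to_emu(start))  # begin a new contour
            builder.add_line_segments([to_emu(pt) for pt in points], close=False)
            pen = end
    if builder is None:
        return None  # nothing drawable in these paths
    shape = builder.convert_to_shape()
    fill = paths[0]["fill"]  # PyMuPDF reports RGB as floats in 0..1
    if fill:
        shape.fill.solid()
        shape.fill.fore_color.rgb = RGBColor(*(round(c * 255) for c in fill))
    return shape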
The result: icons in your PowerPoint that you can resize, recolor, and edit—just like native shapes.
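The `approximate_bezier` helper used in Step 3 is not shown in the post. A minimal sketch of what it could look like, assuming it samples the curve at evenly spaced parameter values, skips the start point (the pen is already there), and ends exactly on the curve's end point. The `Point` type here is a stand-in for any object with `x` and `y` attributes:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def approximate_bezier(p1, p2, p3, p4, segments=8):
    """Sample a cubic bezier into `segments` points, from just after p1 to p4."""
    points = []
    for i in range(1, segments + 1):
        t = i / segments
        u = 1.0 - t
        # Bernstein form of the cubic bezier
        x = u**3 * p1.x + 3 * u**2 * t * p2.x + 3 * u * t**2 * p3.x + t**3 * p4.x
        y = u**3 * p1.y + 3 * u**2 * t * p2.y + 3 * u * t**2 * p3.y + t**3 * p4.y
        points.append(Point(x, y))
    return points


arc = approximate_bezier(Point(0, 0), Point(0, 1), Point(1, 1), Point(1, 0), segments=4)
for pt in arc:
    print(pt)
```

Uniform sampling is the simplest choice. Eight segments is usually enough for small icons, but tight curves on large shapes may show visible corners, in which case a higher segment count or an adaptive subdivision would be worth trying.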
Architecture
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│   Browser   │────▶│  FastAPI Backend │────▶│  .pptx file │
│  (Next.js)  │◀────│  pymupdf + pptx  │◀────│  download   │
└─────────────┘     └──────────────────┘     └─────────────┘
- Frontend: Next.js 15 (App Router) + Tailwind CSS—drag-and-drop upload with real-time progress
- Backend: Python FastAPI—PDF parsing, icon extraction, PPTX generation
- Auth: Google OAuth via next-auth
- Payments: PayPal (pay-per-use credits)
- Infra: VPS + Nginx + Cloudflare CDN + Let's Encrypt SSL
Try It / Contribute
🔗 Live demo: pdf2deck.xyz — free tier, no sign-up needed (5 pages max)
📦 GitHub: github.com/LotusWang0723/PDFtoDeck
The free tier lets you convert PDFs up to 5 pages long with no registration required. Signed-in users get 5 conversions per day. For heavier usage, credit packs start at $1.99.
What's Next
- [ ] Batch conversion (upload multiple PDFs)
- [ ] Better font matching (map PDF fonts to system fonts)
- [ ] SVG icon detection using ML-based classification
- [ ] API endpoint for programmatic access
I'd love to hear feedback—especially from anyone who's worked with PDF internals or python-pptx. The vector path grouping algorithm is still pretty naive, and I'm sure there are smarter approaches.
*If this is useful, a ⭐ on GitHub would mean a lot!*