I built a Windows tool that turns screenshots into one searchable PDF — here's what I learned

CapDrop — Thu, 04 Jun 2026 06:55:04 +0000

For months I had the same annoying problem: folders full of screenshots I couldn't actually use. Lecture slides, PDFs I own, scanned pages — all just images. I couldn't Ctrl-F them, couldn't copy a line out, couldn't get my OS to index them. A picture of text is useless the moment you need to find something in it.

So I built CapDrop to automate the whole chain on Windows. This is a write-up of how it works under the hood and the bugs that nearly broke me.

The core idea

You draw a capture box over a page, pick a page key (Page Down, arrow keys), set an interval, and walk away. CapDrop then:

Captures each page on the interval
Presses the page key for you to advance
Auto-crops margins and toolbars out of every shot
Runs OCR locally
Binds everything into a single PDF with a real text layer

The result is one document you can search, not a pile of images.
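
The capture-and-advance part of that chain can be sketched in a few lines. This is a minimal illustration, not CapDrop's actual code: the function name, the injected `capture_page` and `press_page_key` callables, and the fixed `page_count` stop condition are all assumptions made for the example.

```python
import time
from typing import Callable, List


def run_capture_session(
    capture_page: Callable[[], bytes],
    press_page_key: Callable[[], None],
    page_count: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Capture `page_count` pages, pressing the page key between shots.

    All names here are hypothetical; the real app's stop condition
    and capture API are not described in this post.
    """
    shots: List[bytes] = []
    for index in range(page_count):
        shots.append(capture_page())
        if index < page_count - 1:
            press_page_key()   # advance the viewer to the next page
            sleep(interval_s)  # wait out the interval so the page can render
    return shots
```

Passing the capture and key-press functions in as arguments keeps the loop testable without a real screen or keyboard.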

The stack

Electron for the app shell and capture/UI (I already had window management, hotkeys, and floating-bubble export working — no reason to rewrite).
A Python OCR sidecar (RapidOCR) spawned as a child process. OCR runs 100% locally; nothing is ever uploaded.
jimp for auto-crop, with a 12px safety pad so edge text never gets clipped.
pdf-lib to bind the pages and inject the OCR text layer.
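
To make the auto-crop idea concrete, here is a rough sketch of the padding logic in plain Python. The app itself does this with jimp; this version works on a grid of grayscale values, only handles uniform light margins (not toolbars), and its `background` and `tolerance` defaults are assumptions chosen for illustration. Only the 12px pad comes from the app.

```python
from typing import List, Optional, Tuple

# left, top, right, bottom in pixels; right and bottom are exclusive
Box = Tuple[int, int, int, int]


def content_box(
    pixels: List[List[int]],
    background: int = 255,  # assumed: margins are near-white
    tolerance: int = 8,     # assumed: how far from background still counts as margin
    pad: int = 12,          # the safety pad so edge text is not clipped
) -> Optional[Box]:
    """Return the padded bounding box of non-background pixels, or None if blank."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    def is_content(value: int) -> bool:
        return abs(value - background) > tolerance

    rows = [y for y, row in enumerate(pixels) if any(is_content(v) for v in row)]
    if not rows:
        return None
    cols = [x for x in range(width) if any(is_content(pixels[y][x]) for y in rows)]

    # Grow the tight box by `pad` on every side, clamped to the image bounds.
    left = max(0, cols[0] - pad)
    top = max(0, rows[0] - pad)
    right = min(width, cols[-1] + 1 + pad)
    bottom = min(height, rows[-1] + 1 + pad)
    return (left, top, right, bottom)
```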

The Electron + Python-sidecar split was a deliberate choice. People kept telling me to rewrite the whole thing in Python "for the OCR," but the Electron app already had everything except OCR. Adding a sidecar was a few hundred lines; a rewrite would've been months.
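
For a sense of why the sidecar is small, here is a sketch of what the Python side of such a split can look like. The post does not say how the two processes talk, so the line-delimited JSON protocol, the `id` and `image_path` fields, and the `run_ocr` callable are all assumptions; the real OCR call into RapidOCR is deliberately left out.

```python
import json
from typing import Callable, Dict, IO, List


def serve(
    requests: IO[str],
    responses: IO[str],
    run_ocr: Callable[[str], List[Dict]],
) -> None:
    """Answer one JSON request per line with one JSON reply per line.

    Hypothetical protocol: {"id": ..., "image_path": ...} in,
    {"id": ..., "ok": ..., "words" or "error": ...} out.
    """
    for line in requests:
        if not line.strip():
            continue
        request_id = None
        try:
            request = json.loads(line)
            request_id = request.get("id")
            reply = {"id": request_id, "ok": True, "words": run_ocr(request["image_path"])}
        except Exception as exc:
            # Report the failure instead of crashing, so the parent app stays responsive.
            reply = {"id": request_id, "ok": False, "error": str(exc)}
        responses.write(json.dumps(reply) + "\n")
        responses.flush()  # the parent reads replies as they arrive
```

In a real sidecar this would be called as `serve(sys.stdin, sys.stdout, run_ocr)`, with the parent process writing requests to the child's stdin.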

The bug that cost me two days

After adding the OCR pipeline, my global capture hotkey developed a delay of up to 4 seconds on the first press. Cold, every time.

I guessed wrong twice — thumbnail size, then a race condition. Both were dead ends. The only thing that actually found it was instrumenting the hot path with timing logs.

The culprit: an fs.readFile of a tiny 749-byte settings.json on every hotkey press. On a cold start that read took 2–4 seconds, between Windows Defender's real-time scanning and libuv's threadpool warming up. For a 749-byte file.

The fix was to load settings once and keep them cached in memory, so the hot path reads them synchronously and never touches disk. Instant after that.
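
The pattern is simple enough to show in a few lines. The actual fix lives in the Electron/Node side of the app; this Python sketch only illustrates the idea, and the `SettingsCache` class and its method names are made up for the example.

```python
import json
from pathlib import Path
from typing import Any, Dict


class SettingsCache:
    """Read settings from disk once, then serve every lookup from memory."""

    def __init__(self, path: Path) -> None:
        self._path = path
        # The only read from disk: done at startup, off the hot path.
        self._values: Dict[str, Any] = json.loads(path.read_text(encoding="utf-8"))

    def get(self, key: str, default: Any = None) -> Any:
        # Hot path: a dictionary lookup, no file I/O.
        return self._values.get(key, default)

    def update(self, key: str, value: Any) -> None:
        # Writes update memory first, then persist to disk.
        self._values[key] = value
        self._path.write_text(json.dumps(self._values), encoding="utf-8")
```

The hotkey handler then calls `get()` and nothing else, so a slow first disk read can no longer land on the key press.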

Lesson I keep relearning: measure, don't guess. My two guesses cost me a day each. The timing log found it in ten minutes.
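
A timing log does not need to be fancy. Here is a minimal sketch of the kind of helper that works for this, written in Python for illustration (the app's hot path is in Node); the `timed` name, the `sink` dictionary, and the injectable `clock` are assumptions for the example.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def timed(
    label: str,
    sink: Dict[str, float],
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[None]:
    """Record how long the wrapped block took, in milliseconds, under `label`."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures still show their cost.
        sink[label] = (clock() - start) * 1000.0
```

Wrap each step of the suspect path in its own `with timed("step_name", timings):` block, print `timings` after one cold run, and the outlier is usually obvious.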

The OCR alignment detail nobody sees

Getting Ctrl-F to highlight the right spot is harder than getting the text layer to exist. The search works as long as the words are there — but the highlight box only lands correctly if each word's bounding box is positioned right, including font size and baseline descent. I ended up tuning the per-word font size to the box height plus a ~0.18em descent so highlights sit on the word instead of floating above it.
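
Here is one way to read that tuning as arithmetic. It is a sketch under assumptions, not the exact code in the app: it takes the font size equal to the box height, lifts the baseline 0.18em above the bottom of the box, and flips the OCR box from a top-left image origin to the bottom-left origin PDFs use. The `WordBox`, `TextPlacement`, and `place_word` names and the `scale` parameter are hypothetical.

```python
from typing import NamedTuple


class WordBox(NamedTuple):
    x: float       # left edge in image pixels, origin at top-left
    y: float       # top edge in image pixels
    width: float
    height: float


class TextPlacement(NamedTuple):
    x: float           # PDF units, origin at bottom-left
    baseline_y: float  # where the text baseline sits
    font_size: float


DESCENT_EM = 0.18  # approximate share of the em that hangs below the baseline


def place_word(box: WordBox, page_height_px: float, scale: float = 1.0) -> TextPlacement:
    """Map an OCR word box to a baseline position and font size on the page."""
    font_size = box.height * scale
    # Flip the y axis: the bottom of the box measured up from the page bottom.
    box_bottom = (page_height_px - (box.y + box.height)) * scale
    # Raise the baseline so descenders end at the bottom of the box.
    baseline_y = box_bottom + DESCENT_EM * font_size
    return TextPlacement(box.x * scale, baseline_y, font_size)
```

For a 20px-tall word whose top edge is 200px down a 1000px page, this puts the baseline 3.6 units above the bottom of the box instead of on it, which is the difference between a highlight that sits on the word and one that drifts.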

What I deliberately didn't do

No DRM bypass. It's for content you own — your slides, DRM-free PDFs and e-books, your scans. Protected players capture as black frames by design, and I'm not interested in fighting that.
No cloud. The whole pitch is local. The only network call the app makes is license validation.

The boring final boss: code signing

The installer is ~260MB (bundled Python + OCR models), and without code signing Windows SmartScreen throws a scary warning. Signing certs are expensive for a solo dev; registering on the Microsoft Store turns out to be the cheapest path to a trusted signature. Still working through that.

Try it / break it

There's a real searchable sample PDF on the site you can download and Ctrl-F yourself before installing anything: capdrop.app

It's Windows-only, $19 one-time with a 7-day trial — but I mostly wrote this up because the debugging stories felt worth sharing. Happy to answer anything about the Electron/Python sidecar split or the OCR pipeline in the comments.

DEV Community: CapDrop