DEV Community

CapDrop

I built a Windows tool that turns screenshots into one searchable PDF — here's what I learned

For months I had the same annoying problem: folders full of screenshots I couldn't actually use. Lecture slides, PDFs I own, scanned pages — all just images. I couldn't Ctrl-F them, couldn't copy a line out, couldn't get my OS to index them. A picture of text is useless the moment you need to find something in it.

So I built CapDrop to automate the whole chain on Windows. This is a write-up of how it works under the hood and the bugs that nearly broke me.

The core idea

You draw a capture box over a page, pick a page key (Page Down or an arrow key), set an interval, and walk away. CapDrop then:

  1. Captures each page on the interval
  2. Presses the page key for you to advance
  3. Auto-crops margins and toolbars out of every shot
  4. Runs OCR locally
  5. Binds everything into a single PDF with a real text layer
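
The first two steps form a simple loop. Here is a minimal sketch of it, not CapDrop's actual code. It assumes a fixed page count, and `captureRegion`, `pressKey`, and `sleep` are hypothetical hooks standing in for the app's real screen-capture and key-injection code. They are passed in so the loop can be tested without touching the screen.

```typescript
// Hypothetical hooks: stand-ins for the app's real capture and key-press code.
type CaptureHooks = {
  captureRegion: () => Promise<Uint8Array>; // grab the user-drawn capture box
  pressKey: (key: string) => Promise<void>; // send the chosen page key
  sleep: (ms: number) => Promise<void>;     // wait between pages
};

// A real-time default for the sleep hook.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: the number of pages is known up front.
async function capturePages(
  pageCount: number,
  pageKey: string,
  intervalMs: number,
  hooks: CaptureHooks,
): Promise<Uint8Array[]> {
  const shots: Uint8Array[] = [];
  for (let i = 0; i < pageCount; i++) {
    shots.push(await hooks.captureRegion()); // step 1: capture the current page
    if (i < pageCount - 1) {
      await hooks.pressKey(pageKey);         // step 2: advance to the next page
      await hooks.sleep(intervalMs);         // give the viewer time to render it
    }
  }
  // Steps 3-5 (crop, OCR, PDF binding) would run over `shots` afterwards.
  return shots;
}
```

Skipping the key press after the last page avoids scrolling past the end of the document.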

The result is one document you can search, not a pile of images.

The stack

  • Electron for the app shell and capture/UI (I already had window management, hotkeys, and floating-bubble export working — no reason to rewrite).
  • A Python OCR sidecar (RapidOCR) spawned as a child process. OCR runs 100% locally; nothing is ever uploaded.
  • jimp for auto-crop, with a 12px safety pad so edge text never gets clipped.
  • pdf-lib to bind the pages and inject the OCR text layer.
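
The auto-crop with a safety pad can be sketched as a bounding-box search over raw RGBA pixels, in the same `{ data, width, height }` shape jimp exposes as `image.bitmap`. This is an illustration under stated assumptions, not the app's code: it assumes margins are a near-uniform color sampled from the top-left pixel, and the `tolerance` of 16 is a hypothetical value. It only trims uniform margins; removing toolbars would need more than this.

```typescript
// Raw RGBA pixels, same shape as jimp's `image.bitmap`.
type Bitmap = { data: Uint8Array; width: number; height: number };
type Rect = { x: number; y: number; w: number; h: number };

const SAFETY_PAD = 12; // px kept around detected content so edge text is never clipped

// Assumption: the margin color is the top-left pixel; `tolerance` is a hypothetical threshold.
function contentBounds(bmp: Bitmap, tolerance = 16, pad = SAFETY_PAD): Rect {
  const { data, width, height } = bmp;
  const r0 = data[0], g0 = data[1], b0 = data[2];
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4;
      const differs =
        Math.abs(data[i] - r0) > tolerance ||
        Math.abs(data[i + 1] - g0) > tolerance ||
        Math.abs(data[i + 2] - b0) > tolerance;
      if (differs) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  if (maxX < 0) return { x: 0, y: 0, w: width, h: height }; // blank page: keep everything
  // Grow the box by the safety pad, clamped to the image edges.
  const x0 = Math.max(0, minX - pad);
  const y0 = Math.max(0, minY - pad);
  const x1 = Math.min(width - 1, maxX + pad);
  const y1 = Math.min(height - 1, maxY + pad);
  return { x: x0, y: y0, w: x1 - x0 + 1, h: y1 - y0 + 1 };
}
```

The resulting rectangle would then be handed to jimp's crop. The exact crop signature differs between jimp major versions, so it is left out here.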

The Electron + Python-sidecar split was a deliberate choice. People kept telling me to rewrite the whole thing in Python "for the OCR," but the Electron app already had everything except OCR. Adding a sidecar was a few hundred lines; a rewrite would've been months.
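
The post doesn't describe the wire protocol between Electron and the sidecar, so the following is a hypothetical sketch. It assumes the Python process writes one JSON object per line to stdout, and the `python` executable and `ocr_sidecar.py` script name are placeholders. The important detail is that stdout `data` events don't line up with message boundaries, so output has to be buffered until each newline.

```typescript
import { spawn, type ChildProcessWithoutNullStreams } from "node:child_process";

// Assumed protocol: one JSON object per line on the sidecar's stdout.
// "data" events can split a line or merge several, so buffer until each newline.
function createLineParser(onMessage: (msg: unknown) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let nl: number;
    while ((nl = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onMessage(JSON.parse(line)); // a real app would also catch malformed lines
    }
  };
}

// Hypothetical defaults; a packaged app would point at its bundled Python instead.
function startOcrSidecar(
  onMessage: (msg: unknown) => void,
  pythonPath = "python",
  script = "ocr_sidecar.py",
): ChildProcessWithoutNullStreams {
  const child = spawn(pythonPath, [script]); // stdin/stdout/stderr are all pipes by default
  child.stdout.setEncoding("utf8"); // decodes without splitting multi-byte characters across chunks
  child.stdout.on("data", createLineParser(onMessage));
  return child;
}
```

Keeping the framing in a separate pure function means it can be tested without starting Python at all.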

The bug that cost me two days

After adding the OCR pipeline, my global capture hotkey developed a delay of up to 4 seconds on the first press. Cold, every time.

I guessed wrong twice — thumbnail size, then a race condition. Both were dead ends. The only thing that actually found it was instrumenting the hot path with timing logs.
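
A timing wrapper like the one below is one minimal way to instrument a hot path. It's a sketch, not the logging the author actually used. Each step of the hotkey handler gets wrapped with a label (the labels are up to you), and the log shows which step eats the time.

```typescript
import { performance } from "node:perf_hooks";

// Run one async step and log how long it took, even if it throws.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  log: (line: string) => void = console.log,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    log(`${label}: ${(performance.now() - start).toFixed(1)} ms`);
  }
}
```

For example, a handler could call `await timed("read settings", () => fs.promises.readFile(settingsPath, "utf8"))`, where `settingsPath` is a hypothetical variable. An outlier on a single line of the log is exactly the kind of signal that guessing misses.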

The culprit: an fs.readFile of a tiny 749-byte settings.json on every hotkey press. On a cold start that read took 2–4 seconds, thanks to Windows Defender's real-time scanning plus libuv's threadpool warming up. A 749-byte file.

The fix was to keep settings cached in memory, loaded synchronously, so the hot path never touches disk. The hotkey was instant after that.
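
A minimal sketch of that kind of cache, assuming settings live in a flat JSON object (the class name and keys are illustrative, not CapDrop's). The file is read once with `fs.readFileSync`, every later lookup is a plain in-memory access, and writes update memory before persisting.

```typescript
import * as fs from "node:fs";

// Load settings once, synchronously; serve every later read from memory
// so a hotkey handler never waits on disk I/O (or an antivirus scan of it).
class SettingsCache {
  private readonly filePath: string;
  private data: Record<string, unknown>;

  constructor(filePath: string) {
    this.filePath = filePath;
    this.data = JSON.parse(fs.readFileSync(filePath, "utf8"));
  }

  // Hot path: an in-memory lookup, ignoring inherited keys like "toString".
  get<T>(key: string, fallback: T): T {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? (this.data[key] as T) : fallback;
  }

  // Update memory first, then persist. A real app might debounce or write asynchronously.
  set(key: string, value: unknown): void {
    this.data[key] = value;
    fs.writeFileSync(this.filePath, JSON.stringify(this.data, null, 2));
  }
}
```

The synchronous read blocks only once, at load time, instead of on every key press.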

Lesson I keep relearning: measure, don't guess. My two guesses cost me a day each. The timing log found it in ten minutes.

The OCR alignment detail nobody sees

Getting Ctrl-F to highlight the right spot is harder than getting the text layer to exist. Search works as long as the words are in the layer, but the highlight box only lands correctly if each word is positioned to match its bounding box, including font size and baseline descent. I ended up sizing each word's font to its box height and accounting for a ~0.18em descent, so highlights sit on the word instead of floating above it.
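
Here is one plausible reading of that placement rule, written as pure math. The exact formula CapDrop uses isn't in the post, so treat this as an assumption. The font size equals the OCR box height in points, and the baseline sits 0.18em above the box bottom. It also flips from image coordinates (origin top-left) to PDF coordinates (origin bottom-left). `ptPerPx` is a hypothetical scale factor: 1 if each PDF page is sized to its image, 0.75 for 96-DPI pixels on a 72-points-per-inch page.

```typescript
// One OCR word box in capture pixels (top-left origin, y grows downward).
type WordBox = { text: string; x: number; y: number; w: number; h: number };
// Placement in PDF points (bottom-left origin, y grows upward).
type Placement = { text: string; x: number; y: number; size: number };

const DESCENT_EM = 0.18; // approximate descent below the baseline, as a fraction of font size

function placeWord(box: WordBox, imageHeightPx: number, ptPerPx: number): Placement {
  const size = box.h * ptPerPx; // assumption: font size = box height in points
  const boxBottom = (imageHeightPx - (box.y + box.h)) * ptPerPx; // flip the y axis
  return {
    text: box.text,
    x: box.x * ptPerPx,
    y: boxBottom + DESCENT_EM * size, // raise the baseline so descenders fill the box bottom
    size,
  };
}
```

The resulting `x`, `y`, and `size` correspond to the options of the same names on pdf-lib's `page.drawText`. Matching the box width as well would need horizontal scaling, which this sketch leaves out.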

What I deliberately didn't do

  • No DRM bypass. It's for content you own — your slides, DRM-free PDFs and e-books, your scans. Protected players capture as black frames by design, and I'm not interested in fighting that.
  • No cloud. The whole pitch is local. The only network call the app makes is license validation.

The boring final boss: code signing

The installer is ~260MB (bundled Python + OCR models), and without code signing Windows SmartScreen throws a scary warning. Signing certs are expensive for a solo dev; registering on the Microsoft Store turns out to be the cheapest path to a trusted signature. Still working through that.

Try it / break it

The site has a real searchable sample PDF that you can download and Ctrl-F yourself before installing anything: capdrop.app

It's Windows-only, $19 one-time with a 7-day trial — but I mostly wrote this up because the debugging stories felt worth sharing. Happy to answer anything about the Electron/Python sidecar split or the OCR pipeline in the comments.