For months I had the same annoying problem: folders full of screenshots I couldn't actually use. Lecture slides, PDFs I own, scanned pages — all just images. I couldn't Ctrl-F them, couldn't copy a line out, couldn't get my OS to index them. A picture of text is useless the moment you need to find something in it.
So I built CapDrop to automate the whole chain on Windows. This is a write-up of how it works under the hood and the bugs that nearly broke me.
The core idea
You draw a capture box over a page, pick a page key (Page Down or an arrow key), set an interval, and walk away. CapDrop then:
- Captures each page on the interval
- Presses the page key for you to advance
- Auto-crops margins and toolbars out of every shot
- Runs OCR locally
- Binds everything into a single PDF with a real text layer
The result is one document you can search, not a pile of images.
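The capture-and-advance part of that chain can be sketched roughly as below. This is an illustration rather than CapDrop's actual code: the `CaptureHooks` interface, the `capturePages` function, and the fixed `pageCount` stop condition are all assumptions made for the example, and the real capture and key-press calls are left as injected hooks.

```typescript
// Hypothetical hooks: a real app would wire these to its own screen-capture
// and key-injection code, which this post does not show.
interface CaptureHooks<Frame> {
  capture: () => Promise<Frame>;         // grab the capture box once
  pressPageKey: () => Promise<void>;     // e.g. send Page Down
  sleep: (ms: number) => Promise<void>;  // wait out the interval
}

// Assumption: the run stops after a fixed number of pages.
async function capturePages<Frame>(
  hooks: CaptureHooks<Frame>,
  pageCount: number,
  intervalMs: number,
): Promise<Frame[]> {
  const frames: Frame[] = [];
  for (let page = 0; page < pageCount; page++) {
    frames.push(await hooks.capture());
    // No need to advance or wait after the final page.
    if (page < pageCount - 1) {
      await hooks.pressPageKey();
      await hooks.sleep(intervalMs);
    }
  }
  return frames;
}
```

Keeping the hooks injectable means the loop can be exercised with fakes, without touching the screen or the keyboard. The cropping, OCR, and PDF steps would then run over the returned frames.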
The stack
- Electron for the app shell and capture/UI (I already had window management, hotkeys, and floating-bubble export working — no reason to rewrite).
- A Python OCR sidecar (RapidOCR) spawned as a child process. OCR runs 100% locally; nothing is ever uploaded.
- jimp for auto-crop, with a 12px safety pad so edge text never gets clipped.
- pdf-lib to bind the pages and inject the OCR text layer.
The Electron + Python-sidecar split was a deliberate choice. People kept telling me to rewrite the whole thing in Python "for the OCR," but the Electron app already had everything except OCR. Adding a sidecar was a few hundred lines; a rewrite would've been months.
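As for the 12px safety pad in the auto-crop step, here is a minimal sketch of the idea: find the tight bounding box of the content, grow it by the pad, and clamp to the image edges. The `paddedContentBounds` function and the `isContent` predicate are hypothetical. How CapDrop actually decides which pixels are margin or toolbar isn't covered here, so that decision is left to the caller.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

const SAFETY_PAD_PX = 12; // pad size from the stack list above

// `isContent(x, y)` is a hypothetical predicate supplied by the caller,
// e.g. "this pixel differs from the background color".
function paddedContentBounds(
  width: number,
  height: number,
  isContent: (x: number, y: number) => boolean,
  pad: number = SAFETY_PAD_PX,
): Rect | null {
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (!isContent(x, y)) continue;
      if (x < minX) minX = x;
      if (x > maxX) maxX = x;
      if (y < minY) minY = y;
      if (y > maxY) maxY = y;
    }
  }
  if (maxX < 0) return null; // blank frame: nothing to crop to

  // Grow the tight box by the pad, then clamp to the image edges.
  const left = Math.max(0, minX - pad);
  const top = Math.max(0, minY - pad);
  const right = Math.min(width - 1, maxX + pad);
  const bottom = Math.min(height - 1, maxY + pad);
  return { x: left, y: top, width: right - left + 1, height: bottom - top + 1 };
}
```

The resulting rectangle is the kind of value an image library's crop call would take. The pad is what keeps a descender or a thin rule at the edge of the page from being shaved off.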
The bug that cost me two days
After adding the OCR pipeline, my global capture hotkey developed a delay of up to 4 seconds on the first press. Cold, every time.
I guessed wrong twice — thumbnail size, then a race condition. Both were dead ends. The only thing that actually found it was instrumenting the hot path with timing logs.
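For anyone who wants to do the same, a timing log can be as small as the wrapper below. It's a sketch, not the exact instrumentation I used: the `timed` helper and its log format are made up for the example, and it uses `Date.now()` for portability even though a higher-resolution clock would be the better choice in practice.

```typescript
// Wraps one step of the hot path and reports how long it took.
// Works for both synchronous and promise-returning steps.
// `log` is injected so the sketch has no dependencies; console.log is the
// obvious thing to pass in.
async function timed<T>(
  label: string,
  step: () => Promise<T> | T,
  log: (line: string) => void,
): Promise<T> {
  const start = Date.now(); // millisecond resolution is enough for multi-second stalls
  try {
    return await step();
  } finally {
    // Logged even if the step throws, so a failing step still shows its cost.
    log("[timing] " + label + ": " + (Date.now() - start) + " ms");
  }
}
```

Wrap each stage of the handler separately (reading settings, grabbing the frame, and so on) and the slow one stands out in the log immediately.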
The culprit: an fs.readFile of a tiny 749-byte settings.json on every hotkey press. On a cold start that read was taking 2–4 seconds, thanks to Windows Defender's real-time scanning plus libuv's threadpool warming up. A 749-byte file.
The fix was to load the settings once and keep them cached in memory, so the hot path reads them synchronously and never touches disk. Instant after that.
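The shape of that cache is roughly the following. Treat it as a sketch under assumptions: the `SettingsCache` class and the `Settings` fields are invented for the example, and the loader is passed in as a function so the snippet stays self-contained. In an Electron main process the loader would typically be a single blocking read of settings.json at startup.

```typescript
// Hypothetical settings shape; the real file's fields are not shown in this post.
interface Settings { intervalMs: number; pageKey: string }

class SettingsCache<T> {
  private current: T;
  private readonly load: () => T;

  // `load` runs once here, up front, so the disk cost is paid before any
  // hotkey can fire.
  constructor(load: () => T) {
    this.load = load;
    this.current = load();
  }

  // Hot path: a plain memory read, no disk I/O.
  get(): T {
    return this.current;
  }

  // Call after the settings file is rewritten so the cache does not go stale.
  reload(): void {
    this.current = this.load();
  }
}
```

The one thing to watch with this pattern is staleness: whatever code writes the settings file also needs to refresh the cache, which is what `reload` is for.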
Lesson I keep relearning: measure, don't guess. My two guesses cost me a day each. The timing log found it in ten minutes.
The OCR alignment detail nobody sees
Getting Ctrl-F to highlight the right spot is harder than getting the text layer to exist. The search works as long as the words are there — but the highlight box only lands correctly if each word's bounding box is positioned right, including font size and baseline descent. I ended up tuning the per-word font size to the box height plus a ~0.18em descent so highlights sit on the word instead of floating above it.
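One reading of that tuning, sketched as code: take the font size from the OCR box height, flip the y axis (OCR boxes measure down from the top of the image, PDF measures up from the bottom of the page), and lift the baseline off the bottom of the box by the descent. The `placeWord` function, the `WordBox` and `TextPlacement` types, and the `pointsPerPixel` scale are assumptions for illustration, and the exact formula in CapDrop may differ.

```typescript
// OCR word box in image pixels, origin at the top-left corner.
interface WordBox { left: number; top: number; width: number; height: number }
// Where to draw the invisible text, in PDF points, origin at the bottom-left.
interface TextPlacement { x: number; y: number; size: number }

const DESCENT_EM = 0.18; // approximate descent quoted above

function placeWord(
  box: WordBox,
  imageHeightPx: number,
  pointsPerPixel: number, // assumed uniform scale from image pixels to PDF points
): TextPlacement {
  const size = box.height * pointsPerPixel;
  // Flip the y axis to get the bottom edge of the box in PDF coordinates.
  const boxBottom = (imageHeightPx - (box.top + box.height)) * pointsPerPixel;
  return {
    x: box.left * pointsPerPixel,
    // Text is drawn from its baseline, which sits one descent above the box bottom.
    y: boxBottom + DESCENT_EM * size,
    size,
  };
}
```

The `x`, `y`, and `size` values are what a text-drawing call such as pdf-lib's `page.drawText` expects, with `y` as the baseline. Without the descent term the text box and the visible word end up vertically offset, which is exactly the misplaced-highlight symptom.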
What I deliberately didn't do
- No DRM bypass. It's for content you own — your slides, DRM-free PDFs and e-books, your scans. Protected players capture as black frames by design, and I'm not interested in fighting that.
- No cloud. The whole pitch is local. The only network call the app makes is license validation.
The boring final boss: code signing
The installer is ~260MB (bundled Python + OCR models), and without code signing Windows SmartScreen throws a scary warning. Signing certs are expensive for a solo dev; registering on the Microsoft Store turns out to be the cheapest path to a trusted signature. Still working through that.
Try it / break it
There's a real searchable sample PDF on the site you can download and Ctrl-F yourself before installing anything: capdrop.app
It's Windows-only, $19 one-time with a 7-day trial — but I mostly wrote this up because the debugging stories felt worth sharing. Happy to answer anything about the Electron/Python sidecar split or the OCR pipeline in the comments.