How I built on-device AI background removal in a native Windows app

#windows #ai #machinelearning #buildinpublic

Every "remove the background" tool I tried uploads your image to a server first. For a screenshot tool that's backwards — your screenshots are the most private thing on your screen. So I built mine to run 100% on your machine. Here's how — and the honest part: the AI isn't the hard bit.

The thing nobody says about "AI background removal"

Most tools that cut out a background send your image to their server, run the model there, and send the result back. Fine for a stock photo. Not fine for a screenshot, which is usually a half-finished feature, a customer's data, an API key you forgot to blur, or a DM you meant only one person to see. I didn't want any of that leaving the machine. So in ShotsGlow the cutout runs locally, on your own hardware. Nothing is uploaded. There's no server to send it to in the first place.

(Demo: the one-click cutout — and the same thing running with the Wi-Fi off — at https://shotsglow.com/blog/on-device-background-removal-windows )

The honest part: the model isn't mine

Let me kill the "proprietary AI" thing right away. The model is BiRefNet — an open background-removal model anyone can download. A handful of other local tools use the exact same one. If I told you the AI was my secret sauce, you'd be right to close the tab.

The actual work was the unglamorous part: getting that model to run natively inside a Windows app, in the capture flow, with nothing leaving the machine — and to keep working on a normal person's PC, not just on my dev box.

How it actually runs

ONNX Runtime, on the CPU. No GPU acceleration — and that's a deliberate, hard-won choice (the next section is the whole story). It runs on the processor every machine already has.
The model downloads once on first run (~890 MB), in a variant matched to your PC. Instead of shipping one giant file for everyone, the app checks your RAM and pulls the variant that fits: full resolution on a roomy machine, a lighter one on a 4–8 GB laptop, so it runs without choking a modest PC. One chunky download, once; after that it's local forever.
No NPU claims. I scaffolded support for the Copilot+ neural chips and then left it switched off, because it isn't shipping yet. What ships is plain CPU. I'd rather under-claim than put a spec on the box that isn't real.
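
To make the "checks your RAM and pulls the variant that fits" step concrete, here is a minimal Python sketch. It is an illustration, not ShotsGlow's code (the app is native, and its real thresholds aren't given here). The variant names and the 12 GiB cutoff are assumptions made up for the example; the only real API used is the Win32 GlobalMemoryStatusEx call, reached through ctypes.

```python
import ctypes
import sys
from typing import Optional

GIB = 1024 ** 3

# Hypothetical names and cutoff. The cutoff sits above 8 GiB on purpose:
# Windows reports slightly less RAM than the nominal figure on the box.
FULL_VARIANT = "birefnet-full"
LITE_VARIANT = "birefnet-lite"
FULL_MIN_RAM = 12 * GIB


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX struct (DWORD = 32-bit, DWORDLONG = 64-bit)."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def total_ram_bytes() -> Optional[int]:
    """Return physical RAM in bytes on Windows, or None if it can't be read."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        return None
    return status.ullTotalPhys


def pick_variant(ram_bytes: Optional[int]) -> str:
    """Choose the full model only when RAM is clearly sufficient; otherwise the lite one."""
    if ram_bytes is not None and ram_bytes >= FULL_MIN_RAM:
        return FULL_VARIANT
    return LITE_VARIANT  # unknown or modest RAM: take the safe option


if __name__ == "__main__":
    print(pick_variant(total_ram_bytes()))
```

Defaulting to the lighter variant when the RAM reading fails follows the same logic as the rest of the post: a slower or coarser cutout is better than one that runs out of memory.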

And because there's no server in the loop, it keeps working with the Wi-Fi off (after that one-time download) — same one-click cutout, nothing uploaded, because there's nowhere to upload it to.
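
The "download once, then local forever" behaviour boils down to a cache check before any network call. A rough Python sketch under stated assumptions: the cache directory, file name and URL are placeholders, and `ensure_model` and `download` are hypothetical helper names, not functions from the app.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Callable


def download(url: str, dest: Path) -> None:
    """Stream url to dest via a .part file, so an interrupted download
    is never mistaken for a complete model."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # rename into place only after the copy finished


def ensure_model(
    cache_dir: Path,
    filename: str,
    url: str,
    fetch: Callable[[str, Path], None] = download,
) -> Path:
    """Return the local model path, touching the network only if the file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    if not path.exists():
        fetch(url, path)
    return path
```

Once the file exists, `ensure_model` never opens a connection, which is why the cutout keeps working offline. The `fetch` parameter is only there so the download step can be swapped out, for example to show a progress bar with the real size or to test without a network.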

The part I got wrong first: I shipped it on the GPU

Here's the honest version. I shipped the cutout on the GPU first (via DirectML), with a CPU fallback. It was faster on my machine. Then it broke on other people's: integrated GPUs that hung mid-inference, drivers that handed back a garbled mask, low-memory laptops that simply ran out of RAM. "Fast on mine" is worthless when it's broken on a reviewer's Surface or a customer's five-year-old laptop.

So I tore the GPU path out and went CPU-only. Slower (a few seconds, more on a weak chip), but it produces the same result on every Windows machine I could find, down to a 4 GB Atom tablet. Then I made the app pick a model variant that fits the machine's memory, so the weakest PCs don't choke. "Works everywhere" beat "fast on mine," and it wasn't close.
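
In ONNX Runtime terms, the difference between the two versions is the execution provider list passed when the session is created. A small sketch using the Python bindings for illustration (the app itself is native, so this is not its code). `providers_for` and `open_session` are hypothetical helper names; the provider strings and `InferenceSession(..., providers=...)` are the real ONNX Runtime API.

```python
from typing import List

CPU = "CPUExecutionProvider"
DIRECTML = "DmlExecutionProvider"


def providers_for(allow_gpu: bool) -> List[str]:
    """Providers are tried in list order, so the CPU goes last as the fallback."""
    return [DIRECTML, CPU] if allow_gpu else [CPU]


def open_session(model_path: str, allow_gpu: bool = False):
    """Create an inference session; CPU-only unless the caller opts in to DirectML."""
    # Third-party dependency: pip install onnxruntime (or onnxruntime-directml for GPU).
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=providers_for(allow_gpu))
```

Note what the fallback does and doesn't cover. Listing the CPU after DirectML helps when the GPU provider can't be used at all. As far as I know it does nothing for the failures described above, where the GPU path loads fine and then hangs mid-inference or returns a garbled mask. That gap is the argument for passing `[CPU]` alone.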

Why "local" is also why it's $9 once

Here's the part I didn't see coming. Going on-device didn't just solve privacy — it's the whole reason I can sell this once instead of renting it to you forever.

Cloud AI means GPU servers. GPU servers mean a bill every month, for as long as anyone uses the feature. The only way to cover that bill is a subscription — which is why every cloud tool eventually starts renting itself to you.

On-device, your hardware does the work. No server, so no monthly bill, so I don't have to charge you monthly. The privacy decision and the pricing decision turned out to be the same decision: keep it on your machine, and "$9 once, no subscription" stops being a marketing line and just becomes the math.

If you're building the same thing

Download the model on first run and be honest about the size, rather than hiding a ~900 MB surprise — or, if you bundle it, own the installer size. Either way, don't pretend it's small.
DirectML looks like the pragmatic Windows path until you ship it to strangers. On real hardware it's a lottery: integrated GPUs hang, drivers return garbage, low-RAM machines run out of memory. If the feature has to work for everyone, reliable on the CPU beats fast on the GPU. (And don't chase the NPU until it's real for enough users.)
The model is a commodity. The product is everything around it — the native capture, the editor, redaction that survives a crop, the export sizes. That's where the work, and the actual moat, lives.

I built this into ShotsGlow — a native Windows screenshot tool that captures, polishes, annotates, redacts and cuts out backgrounds on-device, then exports to 30 social and App-Store sizes. $9 once, no subscription, nothing uploaded. I'm building it in public as a solo Windows dev → shotsglow.com