DEV Community

Jackson Ly

Why I built my Mac assistant to run 100% on-device (and what local-first actually cost me)

I'm building a proactive personal assistant for the Mac called recal. It watches how I work, learns my patterns, and starts doing the repetitive busywork for me. The one constraint I refused to bend on: it runs entirely on-device. No cloud, no account, no server. Zero bytes of my activity leave the machine.

This is the honest engineering version of that decision: why I made it, what it bought me, and what it genuinely cost. I'm the founder, building in public, and the product is pre-launch, so this is about the architecture, not a sales pitch.

Privacy by architecture, not by policy

Every AI productivity tool I tried wanted the same thing: my screen, my files, my activity history, living on someone else's server. For a tool whose entire job is to watch how I work all day, "trust our privacy policy" was never going to be enough. A policy is a promise. Architecture is a guarantee.

If the data never leaves the device, there is no server to breach, no account to leak, no policy to quietly change next quarter. You can pull the ethernet cable and it still works. That is a fundamentally different security model from "we encrypt it in transit."

The catch is that you have to actually build for it, and "local-first" stops being a marketing word the moment you hit the real engineering.

Capture is the scariest part, so it is default-deny

A tool that observes your work is one bad decision away from being spyware. The Microsoft Recall backlash made that vivid for everyone. So the capture layer is built default-deny: it does not record unless a specific, allowed signal says it is safe to.

The model I kept coming back to is Helen Nissenbaum's contextual integrity: information is not simply private or public; it is appropriate or inappropriate to a context. A password field, a banking tab, and an incognito window are contexts where capture is never appropriate, no matter how useful the data would be. So sensitivity is a first-class tag on every observed event, decided at capture time, and anything uncertain gets dropped rather than stored. The expensive, paranoid default is the correct one here.
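A minimal sketch of what default-deny capture with a capture-time sensitivity tag could look like, written in Rust since the post says the engine ships in Swift and Rust. The `Context` variants, the allowlist, and the `classify`/`capture` functions are hypothetical illustrations, not recal's actual code. The point is the shape: only an explicitly allowed signal produces a stored event, and both known-sensitive and uncertain events are dropped.

```rust
/// Hypothetical context signals the capture layer might observe for an event.
#[derive(Debug, Clone, PartialEq)]
pub enum Context {
    PasswordField,
    BankingSite,
    PrivateWindow,
    AllowedApp(String),
    Unknown,
}

/// Sensitivity tag, decided once at capture time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sensitivity {
    Sensitive,
    Uncertain,
    Safe,
}

#[derive(Debug, Clone)]
pub struct ObservedEvent {
    pub context: Context,
    pub text: String,
}

/// A stored event always carries its tag so downstream code can re-check it.
#[derive(Debug, Clone)]
pub struct TaggedEvent {
    pub event: ObservedEvent,
    pub sensitivity: Sensitivity,
}

/// Classify an event's context. Anything not explicitly recognized is Uncertain.
pub fn classify(ctx: &Context, allowlist: &[&str]) -> Sensitivity {
    match ctx {
        Context::PasswordField | Context::BankingSite | Context::PrivateWindow => {
            Sensitivity::Sensitive
        }
        Context::AllowedApp(name) if allowlist.contains(&name.as_str()) => Sensitivity::Safe,
        // Unlisted apps and unknown contexts fall through here: default-deny.
        _ => Sensitivity::Uncertain,
    }
}

/// Keep only events classified Safe; Sensitive and Uncertain are never stored.
pub fn capture(event: ObservedEvent, allowlist: &[&str]) -> Option<TaggedEvent> {
    let sensitivity = classify(&event.context, allowlist);
    match sensitivity {
        Sensitivity::Safe => Some(TaggedEvent { event, sensitivity }),
        Sensitivity::Sensitive | Sensitivity::Uncertain => None,
    }
}
```

Note the asymmetry in `classify`: a context the rules have never seen falls through to `Uncertain` and is dropped, so a gap in the rules costs you data rather than leaking it.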

Local retrieval without a cloud index

The fun objection to local-first AI is "but the good models live in data centers." True, and I am not pretending a Mac runs a frontier model. But most of what a personal assistant needs is not a giant model; it is your own context, retrievable fast.

That part runs entirely on-device: content gets embedded with a small ONNX model running locally, and retrieval is a mix of embeddings, recency, and plain filters. For personal re-finding ("what was I looking at last Tuesday before the meeting"), that combination beats a cloud round-trip, and it never ships your life to an index you do not control. The thing that surprised me: you need far less model than the industry implies, as long as you are retrieving over your own data instead of trying to reason about the whole world.
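To make "a mix of embeddings, recency, and plain filters" concrete, here is a sketch of one way to rank locally stored items. It assumes each item already has an embedding vector (produced elsewhere, for example by the local ONNX model) and uses hypothetical weights (0.8 similarity, 0.2 recency) and a one-week recency half-life; none of these values come from recal. Hard filters (time window, app) run first, so they prune candidates before any scoring happens.

```rust
/// Hypothetical indexed item: an on-device embedding plus metadata for filters.
#[derive(Debug, Clone)]
pub struct Item {
    pub id: u32,
    pub app: String,
    pub timestamp_secs: u64,
    pub embedding: Vec<f32>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exponential recency decay: 1.0 at age zero, 0.5 after one half-life.
pub fn recency(now: u64, ts: u64, half_life_secs: f32) -> f32 {
    let age = now.saturating_sub(ts) as f32;
    0.5f32.powf(age / half_life_secs)
}

/// A query: a fuzzy "what" (embedding) plus plain filters.
pub struct Query<'a> {
    pub embedding: &'a [f32],
    pub app: Option<&'a str>,
    pub from: u64,
    pub to: u64,
}

/// Filter first, then rank by a weighted mix of similarity and recency.
pub fn search<'a>(items: &'a [Item], q: &Query<'_>, now: u64, k: usize) -> Vec<(&'a Item, f32)> {
    const W_SIM: f32 = 0.8; // hypothetical weight
    const W_REC: f32 = 0.2; // hypothetical weight
    const HALF_LIFE: f32 = 7.0 * 24.0 * 3600.0; // one week, in seconds

    let mut scored: Vec<(&Item, f32)> = items
        .iter()
        .filter(|it| it.timestamp_secs >= q.from && it.timestamp_secs <= q.to)
        .filter(|it| q.app.map_or(true, |a| it.app == a))
        .map(|it| {
            let s = W_SIM * cosine(q.embedding, &it.embedding)
                + W_REC * recency(now, it.timestamp_secs, HALF_LIFE);
            (it, s)
        })
        .collect();

    // Highest score first; total_cmp gives a total order even for NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A query like "what was I looking at last Tuesday before the meeting" maps onto this shape: the time window becomes `from`/`to`, and the embedding carries the fuzzy "what".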

The part I care about most: it proposes, I approve

"It did something on its own" is terrifying the first time it is wrong. So recal never acts without sign-off. It does the work in the background, then surfaces a finished result for one tap: approve, or do not.

That approve-gate is not a UX afterthought; it is the trust contract. It keeps a human as the one deciding while the machine does the doing. It also makes a wrong proposal cheap (you reject it and move on) instead of catastrophic (it already emailed the wrong person). Building the gate first, before the autonomy, was the single best product decision I made.
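One way to make the approve-gate a hard guarantee rather than a UI convention is to enforce it in the type's API. This is a sketch under the assumption that every background action is wrapped in a proposal object; `Proposal`, `Status`, `GateError`, and `execute` are hypothetical names, not recal's code. The side effect can only run through `execute`, and `execute` refuses unless the proposal was explicitly approved.

```rust
/// Lifecycle of a hypothetical proposal produced by background work.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Approved,
    Rejected,
    Executed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GateError {
    NotApproved(Status),
}

#[derive(Debug)]
pub struct Proposal {
    pub summary: String,
    status: Status, // private: only the methods below can change it
}

impl Proposal {
    pub fn new(summary: impl Into<String>) -> Self {
        Proposal { summary: summary.into(), status: Status::Pending }
    }

    pub fn status(&self) -> Status {
        self.status
    }

    /// The human's one tap. Only a pending proposal can be approved.
    pub fn approve(&mut self) {
        if self.status == Status::Pending {
            self.status = Status::Approved;
        }
    }

    /// Rejecting is a pure state change with no side effects.
    pub fn reject(&mut self) {
        if self.status == Status::Pending {
            self.status = Status::Rejected;
        }
    }

    /// The only path to a side effect: runs `act` once, and only after approval.
    pub fn execute<F: FnOnce(&str)>(&mut self, act: F) -> Result<(), GateError> {
        if self.status != Status::Approved {
            return Err(GateError::NotApproved(self.status));
        }
        act(self.summary.as_str());
        self.status = Status::Executed;
        Ok(())
    }
}
```

Because rejecting never touches the outside world, a wrong proposal costs one tap; and because `Executed` is terminal, an approved action cannot be replayed by accident.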

What it cost

Being honest, because that is the point of building in public.

  • No giant cloud model to lean on. Anything that genuinely needs frontier-scale reasoning has to be designed around, deferred, or made opt-in, and you feel that constraint daily.
  • Capture you can trust is slow to build. Default-deny means you are constantly saying no to data you would love to have, and writing the rules that decide "appropriate" is most of the work.
  • You carry the whole stack. There is no server to hotfix. The observation engine, the index, and the agent all ship to the user's machine in Swift and Rust, and a bug is a bug on their hardware, not a deploy you roll back.

Would I do it again? Yes, without hesitation. The cost is real, but what I get back is a tool I would actually trust to watch my whole working life, which is the only kind worth building.

If the on-device, local-first direction resonates, I am building recal in public and there is a waitlist at recal.so. But mostly I wanted to write the architecture down honestly, because I think more personal software should be built this way, and I would genuinely like to hear how others are drawing the same lines.

Top comments (1)

UnitBuilds

If you want your agent to use a browser, take a look at MCP-Lite on my git (there is also an old post on it); it's pretty good. Switching from DOM scraping to AOM traversal makes a huge performance difference and improves your signal-to-noise ratio.