The laptop you already own can run a language model, generate an image, transcribe your voice, and read a PDF without ever touching the internet. That hardware sits idle while you pay three different subscriptions and hand your data to three different servers. Off Grid AI Desktop is a free, open-source app that does the whole multimodal stack directly on your Mac or PC.
Free, open-source (AGPL-3.0), runs offline. No account, no telemetry.
Off Grid AI Desktop. Private AI that runs on your machine, no cloud, no account.
The choice you have been forced to make
Every other tool makes you pick one of two compromises.
You can go local and single-purpose. Ollama and LM Studio run a chat model on your machine and stop there. ComfyUI and Automatic1111 generate images on your machine and do nothing else. Each one is private. Each one does exactly one thing.
Or you can go multimodal and cloud. ChatGPT, Claude, and Gemini chat, see images, make images, and talk. They also send every prompt, every document, and every screenshot to a server you do not own.
Off Grid AI Desktop refuses the trade. It runs chat, vision, image generation, speech-to-text, text-to-speech, document RAG, screen-capture memory, meeting recording, and MCP connectors. All on device. All in one window.
What "multimodal on device" actually means here
Text in, text out is the easy part. Here is the rest of the stack, and it all runs locally.
-
Chat with a local LLM. A bundled
llama.cppserver runs models like Gemma and Qwen. You get temperature and context-window controls. It behaves like ChatGPT, but the model lives on your disk. - Vision. Hand the model an image and ask about it. The pixels never leave your machine.
-
Image generation.
stable-diffusion.cppruns SDXL, SDXL-Lightning, SD 1.5 and 2.1, and the 2026 Z-Image-Turbo model. You get txt2img, img2img, live per-step preview, a progress bar with ETA, cancel, a lightbox, and an artifacts gallery. -
Speech to text. A bundled
whisper.cppturns your mic into text in the composer. - Text to speech. Kokoro-82M, an open-weight multilingual voice model, reads any message aloud. There is an auto-speak voice mode too.
- Documents. Upload PDFs, DOCX, text, images, audio, or video. On-device embeddings and a local vector store let you chat with cited sources.
- Screen-capture memory. With your explicit opt-in and a visible recording indicator, a screen frame becomes OCR text, and the local model distills it into observations and entities. This is the part nothing else has.
You are not stitching seven apps together. You ask one assistant, and it reaches for whichever modality the task needs.
Off Grid vs the alternatives
| Capability | Off Grid AI Desktop | Ollama / LM Studio | ComfyUI / A1111 | Cloud apps (ChatGPT, etc.) |
|---|---|---|---|---|
| Local LLM chat | Yes | Yes | No | No (cloud) |
| Image vision | Yes | Limited | No | Yes (cloud) |
| Image generation | Yes | No | Yes | Yes (cloud) |
| Speech to text | Yes | No | No | Yes (cloud) |
| Text to speech | Yes | No | No | Yes (cloud) |
| Document RAG | Yes | No | No | Yes (cloud) |
| Screen-capture memory | Yes | No | No | No |
| Runs offline | Yes | Yes | Yes | No |
| No account needed | Yes | Yes | Yes | No |
| Open source | Yes (AGPL) | Mixed | Yes | No |
| Your data leaves the machine | Never | Never | Never | Always |
The cloud apps match the feature checklist. They lose on the only row that matters for private work: your data leaves the machine, every single time.
Why this is possible now
Two things changed.
Quantized GGUF models shrank. A model that needed a data center now fits in consumer RAM at q8_0 or Q4_K. The same goes for few-step image models like SDXL-Lightning and Z-Image-Turbo, which produce a usable image in a handful of steps instead of dozens.
Consumer GPUs got the acceleration. On a Mac, Off Grid uses Metal and the unified memory pool, so the model and your apps share one fast address space. On Windows, it uses CUDA on NVIDIA cards or Vulkan elsewhere, with a CPU fallback that still works. The bundled engines (llama.cpp, stable-diffusion.cpp, whisper.cpp) are built to use that hardware.
Models load on demand. You are not holding seven models in memory at once. The app loads what the current task needs and releases it.
One local gateway for everything
Off Grid ships a local OpenAI-compatible API at http://127.0.0.1:7878/v1. Point your own scripts, editor plugins, or side projects at it the way you would point them at OpenAI, except the requests stay on 127.0.0.1. Chat and vision through one local endpoint, no key, no quota.
curl http://127.0.0.1:7878/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hello"}]}'
Privacy is the architecture, not a setting
Cloud apps treat privacy as a policy you have to trust. Here it is structural.
There is no account, so there is no profile to leak. There is no telemetry, so nothing phones home. There are no API keys, because there is no third party to authenticate with. Screen capture is opt-in per device with a visible indicator, and you stop it whenever you want. The source is AGPL, so anyone can read exactly what the app does with your data, which is nothing.
Getting started
- Open the repo: github.com/off-grid-ai/desktop.
- Download or clone, then install for your OS (macOS or Windows).
- Launch the app. Open the built-in Hugging Face models browser.
- Download a GGUF chat model that fits your RAM, and start a conversation.
- Try image generation, voice, and document chat from the same window.
What is coming
- Cross-device sync over a private mesh, so your paired devices share memory without a server.
- Using the local gateway from other paired devices over that mesh.
- More bundled models as quantized formats improve.
The Off Grid AI Desktop chat, running a local model fully on-device.
FAQ
Q: Is it really free?
Yes. The app is free and open-source under AGPL-3.0. There is no account and no paywall on the local stack described here.
Q: Does it work offline?
Yes. Once a model is downloaded, chat, image generation, voice, and document RAG all run with the network off.
Q: What about my data?
It stays on your machine. No telemetry, no account, no API keys. Screen capture is opt-in with a visible indicator.
Q: Mac or Windows?
Both. Macs use Metal and unified memory. Windows uses CUDA or Vulkan, with a CPU fallback.
Q: How much RAM do I need?
Enough for the model you pick. Quantized GGUF models are sized so consumer RAM and VRAM can hold them. Smaller models run on modest machines; larger ones want more memory.
Q: Can other tools use the local models?
Yes. The OpenAI-compatible gateway at http://127.0.0.1:7878/v1 accepts the same requests you would send to a cloud API.
The whole multimodal stack, on your own machine, with nothing leaving it. Get it on GitHub →


Top comments (0)