Jovan Chan

Originally published at runaihome.com

Local AI Privacy Audit: What Data Actually Stays on Your Machine (2026)

Running AI locally is supposed to mean your data stays yours. That's true — but only if you audit the tools you're running. Most people who set up a home AI stack assume privacy is binary: local AI = private, cloud AI = not private. The real picture is messier.

There are two separate privacy questions worth asking about any local AI tool:

  1. Does the tool's own software phone home? (What does the vendor collect?)
  2. Is your inference server accidentally accessible to others? (What have you exposed on your network?)

These are distinct problems with distinct fixes. Running through both for the eight most common local AI tools turns up a few surprises.


Quick Reference: Privacy Scorecard

| Tool | Prompts / audio leave? | Vendor telemetry default | Telemetry opt-out | Auth out of the box |
|---|---|---|---|---|
| llama.cpp | No | None | - | No HTTP server by default |
| whisper.cpp / faster-whisper | No | None | - | No HTTP server by default |
| LM Studio | No | None | - | N/A (desktop app, local only) |
| Open WebUI | No | None (disabled in defaults) | - | Yes (login required) |
| Ollama | No | App version + request counts | Limited | No |
| Continue.dev | No | Model name, token count, OS/IDE | Yes (easy) | N/A |
| AnythingLLM | No | Event types, vector DB type, model tag | Yes (one env var) | Yes (password required) |
| ComfyUI (local) | No | None | - | No |

No tool in this list sends your prompts, responses, images, or audio to any third party. That much is clean across the board. The differences are in what metadata gets collected and how exposed the HTTP servers are.


Tool-by-Tool: What Actually Leaves Your Machine

llama.cpp

The benchmark everything else should be measured against. llama.cpp is a C++ inference binary with no telemetry, no analytics, and no update pings. The only network activity is what you explicitly invoke — pulling a model file. When you run llama-server locally, it defaults to 127.0.0.1:8080, reachable only from your own machine.

There is no vendor relationship to speak of here. No account, no analytics backend, no opt-out needed. If you're processing documents you wouldn't want to leave your machine under any circumstances — legal filings, medical records, unreleased source code — llama.cpp with a locally-stored GGUF file is the cleanest option in the ecosystem.
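The loopback default described above can be checked from the machine itself. The following is a minimal sketch, not part of llama.cpp: `port_is_open` and `lan_address` are hypothetical helper names, and port 8080 is the llama-server default mentioned in the text. It only tests which addresses accept a TCP connection locally; it says nothing about firewall or router rules.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def lan_address() -> str:
    """Best-effort guess at this machine's LAN IP address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        # 192.0.2.1 is a reserved documentation (TEST-NET-1) address.
        s.connect(("192.0.2.1", 9))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route found
    finally:
        s.close()


if __name__ == "__main__":
    port = 8080  # llama-server default, per the text above
    print("127.0.0.1 reachable:", port_is_open("127.0.0.1", port))
    lan = lan_address()
    if lan != "127.0.0.1":
        print(f"{lan} reachable:", port_is_open(lan, port))
```

If the server is bound to loopback only, the first check should succeed while the LAN-address check fails. If both succeed, the server is probably listening on all interfaces.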

whisper.cpp and faster-whisper

Same situation as llama.cpp. Audio is loaded into RAM and VRAM, transcribed, and the result written to disk or stdout. Nothing touches a network. Both are pure inference libraries with no embedded telemetry in either the C++ (whisper.cpp) or Python (faster-whisper) implementations.

The trap worth flagging: the OpenAI Whisper API does the opposite. It sends your audio to OpenAI's servers, logs it per their terms, and is subject to OpenAI's data retention policies. If your code sends audio through the openai client library with an API key, you're using cloud transcription. If you installed faster-whisper or compiled whisper.cpp and you're loading a model from disk with no API key, your audio never leaves your hardware. The same holds for openai-whisper, the original open-source Python package: despite the name, it runs the model locally and takes no API key.

LM Studio

LM Studio's privacy policy, last revised in early 2026, is one of the clearest in this space. The key sentence: "None of your messages, chat histories, and documents are ever transmitted from your system." The app collects no user-level telemetry and does no per-session tracking.

Three categories of data do leave the machine:

  • Update checks: app version + OS type hit their CDN during the check
  • Model searches: anonymous search queries when you browse the Discover tab (these go to Hugging Face)
  • Support contact: if you email support, they see your email and message content

None of that involves your AI conversations. Once models are downloaded, LM Studio runs fully offline — you can disable network access at the OS level and it still works.

One note: LM Studio Hub (the model-publishing feature) operates under a different policy and collects email, username, IP, and session data. That's separate from local inference and only applies if you publish content there. Most users will never touch it.

Open WebUI

Open WebUI's defaults already lean toward privacy. The Docker Compose configuration ships with ANONYMIZED_TELEMETRY=false, DO_NOT_TRACK=true, and SCARF_NO_ANALYTICS=true already set. The OpenTelemetry integration added in late 2025 is opt-in and self-hosted only: if you enable it, traces go to your own Grafana or Jaeger instance, not to Open WebUI's servers.

Conversation data lives in a local SQLite (or PostgreSQL, if you configure it) database on your own machine or server. Open WebUI requires login by default when deployed via Docker, which is the correct default behavior — it prevents anyone who can reach the HTTP port from browsing your chat history without credentials.

The older concern about upstream Chroma telemetry (from the vector database dependency) has been resolved through the ANONYMIZED_TELEMETRY=false default.
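One way to confirm those three defaults are actually in effect is to compare the environment against the values quoted above. This is a minimal sketch under stated assumptions: the variable names and expected values come from this article, `audit_env` is a hypothetical helper, and the script is assumed to run in the same environment as the Open WebUI process (for example, inside the container).

```python
import os
from typing import Mapping

# Expected values, taken from the defaults described in the text above.
EXPECTED = {
    "ANONYMIZED_TELEMETRY": "false",
    "DO_NOT_TRACK": "true",
    "SCARF_NO_ANALYTICS": "true",
}


def audit_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return {name: problem} for each variable that is missing or differs."""
    problems = {}
    for name, want in EXPECTED.items():
        got = env.get(name)
        if got is None:
            problems[name] = "not set"
        elif got.strip().lower() != want:
            problems[name] = f"is {got!r}, expected {want!r}"
    return problems


if __name__ == "__main__":
    findings = audit_env(os.environ)
    for name, problem in findings.items():
        print(f"{name}: {problem}")
    if not findings:
        print("all three variables match the expected values")
```

A "not set" result is not proof that telemetry is on, since the application may apply its own default. It only means the setting is not pinned explicitly.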

Ollama

Ollama's privacy stance on the vendor side is good. Their updated privacy policy (March 2026) is explicit: "We do not collect, store, transmit, or have access to your prompts, responses, model interactions, or other content you process locally." They collect "limited usage metadata" — app version and request counts — and state they don't use any of it for AI training.

The privacy threat with Ollama is not vendor behavior. It's deployment misconfiguration.

By default, Ollama binds to 127.0.0.1:11434 — local only, safe. If you set OLLAMA_HOST=0.0.0.0 to reach it from another device on your LAN, you expose an unauthenticated HTTP API. No API key. No password. Anyone who can reach port 11434 can run inference on your hardware and read any response.

In January 2026, researchers from SentinelLABS and Censys ran an internet-wide scan and found 175,108 publicly accessible Ollama instances across 130 countries — roughly half of them with tool-calling enabled, meaning they could execute code, query APIs, or interact with external systems on behalf of whoever called them. That's not a software vulnerability. It's a pattern: someone sets OLLAMA_HOST=0.0.0.0, opens port 11434 on their router, and walks away thinking they've just enabled LAN access.
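The difference between a safe and an exposed bind address can be checked mechanically. The sketch below classifies an OLLAMA_HOST value using only the Python standard library. The parsing is an approximation written for illustration, not Ollama's own logic, and `bind_host` and `classify` are hypothetical names.

```python
import ipaddress
from urllib.parse import urlsplit


def bind_host(ollama_host: str) -> str:
    """Extract the host from values like '0.0.0.0', 'host:port' or 'http://host:port'."""
    value = ollama_host.strip()
    if "://" not in value:
        value = "//" + value  # make urlsplit treat the value as a network location
    return urlsplit(value).hostname or ""


def classify(ollama_host: str) -> str:
    """Classify a bind address as loopback, all-interfaces, specific-interface or unknown."""
    host = bind_host(ollama_host)
    if host == "localhost":
        return "loopback"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # a hostname or unparseable value: check it by hand
    if ip.is_loopback:
        return "loopback"
    if ip.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or ::, every network the machine is on
    return "specific-interface"


if __name__ == "__main__":
    for value in ("127.0.0.1:11434", "0.0.0.0", "http://0.0.0.0:11434"):
        print(f"{value!r}: {classify(value)}")
```

Anything other than "loopback" means the unauthenticated API is reachable from at least one network interface, so the firewall and router configuration decide who can actually connect.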

One more thing: Ollama stores chat history in plain text at ~/.ollama/history on macOS and Linux, or %LOCALAPPDATA%\Ollama\history on Windows. If you're running inference on sensitive documents, set OLLAMA_NOHISTORY=1 before starting an ollama run session so prompts are not written to that file. The data isn't transmitted anywhere, but unencrypted plaintext on disk is its own exposure vector if the machine is shared.
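On a shared macOS or Linux machine, a quick check is whether other accounts can read that history file. The sketch below inspects POSIX permission bits only. The path is the one given in the text above, `readable_by_others` is a hypothetical helper, and the check does not apply to Windows ACLs.

```python
import stat
from pathlib import Path


def readable_by_others(path: Path) -> bool:
    """True if group or other users have read permission (POSIX mode bits)."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


if __name__ == "__main__":
    history = Path.home() / ".ollama" / "history"  # path as given in the text above
    if not history.exists():
        print("no history file found")
    elif readable_by_others(history):
        print(f"{history} is readable by other accounts; consider chmod 600")
    else:
        print(f"{history} is readable only by its owner")
```

Tightening permissions limits casual access by other users, but it is not encryption: anyone with root access or physical access to the disk can still read the file.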

For a deeper look at how Ollama compares against vLLM for multi-user deployments, see vLLM vs Ollama in 2026: When Each One Wins.

Continue.dev

Continue.dev is the most transparent tool in this audit about what it actually collects. From their telemetry documentation (updated February 2026):

By default, the VS Code and JetBrains extensions send to PostHog:

  • Whether you accepted or rejected a suggestion (not the code itself)
  • Model name and command name used
  • Number of tokens generated
  • Your OS and IDE type

Your actual code, prompts, and completions are never transmitted. The telemetry exists to help them understand which model/command combos are popular, nothing else.

To disable: VS Code → File → Preferences → Settings → search "Continue Telemetry" → uncheck. Or set CONTINUE_TELEMETRY_ENABLED=0 in your environment before launching the IDE.

If you're running a fully local stack — Continue.dev paired with Ollama and a local model — and you disable the PostHog telemetry, zero data leaves your machine from the AI workflow. See [Setting Up a Local AI Coding Stack
