Lars Winstand

Posted on May 19 • Originally published at standardcompute.com

Self-hosted agents aren’t private if OpenClaw still sends your notes to GPT-5

#ai #privacy #automation #opensource

A self-hosted agent is only private if inference stays local.

That sounds obvious, but a lot of "local-first" agent marketing quietly blurs the line.

If your OpenClaw setup runs in Docker on your laptop but forwards raw notes, emails, tasks, or calendar history to GPT-5, Claude Opus 4.6, or Gemini, you do not have private inference. You have self-hosted orchestration with cloud reasoning.

That can still be a valid setup. It just is not the same thing.

I ended up thinking about this after reading some unusually honest OpenClaw discussions. The core tension was simple:

people want agents that run on their own hardware
their hardware often cannot handle serious local reasoning
they still do not want OpenAI, Anthropic, or anyone else seeing raw personal context

That tension is the real architecture problem.

“Runs locally” is doing way too much work

OpenClaw is a good example because it is a real tool developers are using, not a straw man.

Yes, you can self-host OpenClaw.

Yes, you can run it with Ollama.

Yes, you can keep it fully offline if you are willing to accept the tradeoffs.

But OpenClaw also supports the normal cloud model path: OpenAI, Anthropic, Google Gemini, OpenRouter, and local models side by side.

That means there are really two separate decisions:

Where the agent runtime lives
Where model inference happens

Those are not the same choice.
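To make the split concrete, here is a minimal sketch that looks only at where the model endpoint points and ignores where the runtime lives. The function name and the `deployment` dictionary are hypothetical, not part of OpenClaw. It is deliberately conservative: anything that is not loopback counts as leaving the machine, including a model server elsewhere on your LAN.

```python
import ipaddress
from urllib.parse import urlparse


def inference_stays_local(base_url: str) -> bool:
    """Return True only if the model endpoint is localhost or a loopback IP."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any DNS name other than localhost is treated as remote.
        return False


# Hypothetical deployment: the runtime is local, the inference endpoint is not.
deployment = {
    "runtime": "docker on laptop",
    "model_base_url": "https://api.openai.com/v1",
}

print(inference_stays_local(deployment["model_base_url"]))  # False
print(inference_stays_local("http://localhost:11434"))      # True (Ollama's default port)
```

The runtime field never enters the check. That is the point: only the inference endpoint decides whether prompts leave the box.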

A container on your MacBook that sends a 64k-token prompt to Claude is not private inference. It is a local wrapper around a cloud API.

That distinction matters because a lot of teams hear "self-hosted" and assume "data never leaves the box." That assumption is how leaks happen.

Context pruning helps cost and performance, not privacy

I keep seeing context pruning presented like it solves privacy.

It does not.

Context pruning is useful. Summarization, compression, memory trimming, relevance filtering — all good. They reduce prompt size, lower noise, and usually make agents behave better.

But if the cloud model saw the raw content before the summary was created, the privacy event already happened.

If your pipeline looks like this:

1. Pull Gmail threads, Notion docs, Slack messages, and calendar events
2. Send raw content to GPT-5 for summarization
3. Save only the condensed summary

...then step 2 is the problem.

You reduced future exposure. You did not prevent initial exposure.

That is why "we prune context" feels like a slippery answer when the real question is whether sensitive data ever left the machine.
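A small sketch of why ordering matters. `FakeCloudSummarizer` is a hypothetical stand-in that records whatever it receives, and the email regex is a rough illustration rather than a complete detector. The only thing being compared is what the "cloud" side sees in each ordering.

```python
import re

# Rough email pattern for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class FakeCloudSummarizer:
    """Stand-in for a cloud model: records every string it is sent."""

    def __init__(self):
        self.seen = []

    def summarize(self, text: str) -> str:
        self.seen.append(text)
        return text[:40]  # crude "summary": the first 40 characters


def redact_locally(text: str) -> str:
    return EMAIL.sub("<EMAIL>", text)


raw = "Reply to dana@example.com about the contract renewal before Friday."

# Ordering A: summarize in the cloud, then store only the summary.
cloud_a = FakeCloudSummarizer()
stored_a = cloud_a.summarize(raw)

# Ordering B: redact locally first, then summarize.
cloud_b = FakeCloudSummarizer()
stored_b = cloud_b.summarize(redact_locally(raw))

print("dana@example.com" in cloud_a.seen[0])  # True: the provider saw the raw address
print("dana@example.com" in cloud_b.seen[0])  # False
```

In ordering A the stored summary may look clean, but the exposure already happened at the summarization call.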

The practical fix: preprocess locally before any cloud call

The boring answer is the right one.

If you cannot do full local inference, do local preprocessing first.

That means:

detect sensitive entities locally
redact, mask, or pseudonymize them locally
send only sanitized text to the cloud model
keep the mapping on-device if you need to restore the original values later

That is a real privacy improvement because it changes what the cloud provider ever sees.

Not perfect. But real.
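The four steps above can be sketched in a few lines. This version handles only email addresses with a rough regex, and the function names are my own. A real setup would use a proper detector. What it shows is the shape: numbered placeholders go upstream, the mapping stays in local memory, and the original values come back only on your side.

```python
import re

# Rough email pattern for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered placeholder.

    Returns the sanitized text and a placeholder -> original mapping
    that is meant to stay on the local machine.
    """
    mapping: dict[str, str] = {}  # placeholder -> original value
    reverse: dict[str, str] = {}  # original value -> placeholder

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in reverse:
            placeholder = f"<EMAIL_{len(reverse) + 1}>"
            reverse[value] = placeholder
            mapping[placeholder] = value
        return reverse[value]

    return EMAIL.sub(swap, text), mapping


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in a model response, locally."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Forward a@x.org's note to b@y.org, cc a@x.org."
sanitized, mapping = pseudonymize(original)
print(sanitized)  # Forward <EMAIL_1>'s note to <EMAIL_2>, cc <EMAIL_1>.
print(rehydrate(sanitized, mapping) == original)  # True
```

Using the same placeholder for repeated values keeps the text coherent for the model, so it can still tell that two mentions refer to the same address.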

Three patterns, three very different privacy stories

Inference pattern	Where reasoning runs	What leaves the machine	Privacy tradeoff
Full local with Ollama + Qwen	Local only	Nothing except optional sync metadata	Best privacy, lower capability on modest hardware
Self-hosted OpenClaw + cloud model	Cloud API like OpenAI or Anthropic	Raw notes, tasks, emails, attachments, summaries	Convenient, but provider processes sensitive content
Hybrid local preprocessing + cloud reasoning	Local first, then cloud	Sanitized text with placeholders or masked fields	Best practical balance when local hardware is limited

That hybrid pattern is the one more teams should be talking about.

A minimal local redaction step with Presidio

Microsoft Presidio is a solid starting point if you want local PII detection and anonymization.

Here is the simplest version:

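# Requires presidio-analyzer, presidio-anonymizer, and a spaCy English model.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

text = "Email John Smith at john@example.com about invoice 48291."

# Detect only the entity types we plan to replace, so the output is predictable.
analyzer = AnalyzerEngine()
results = analyzer.analyze(
    text=text,
    language="en",
    entities=["PERSON", "EMAIL_ADDRESS"],
)

# Swap each detected span for a typed placeholder.
anonymizer = AnonymizerEngine()
result = anonymizer.anonymize(
    text=text,
    analyzer_results=results,
    operators={
        "PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"}),
        "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
    },
)

print(result.text)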

Output:

Email <PERSON> at <EMAIL> about invoice 48291.

That is not enough for every workflow, but it is already much better than shipping raw text upstream.

If you need stronger masking, you can replace exact amounts, account IDs, addresses, or provider names with typed placeholders.

Example:

Original:
"Dr. Patel said my follow-up is Tuesday at 3pm. My balance is $1,284.11."

Sanitized:
"<HEALTHCARE_PROVIDER> said my follow-up is <DAY> at <TIME>. My balance is <AMOUNT_RANGE>."

Now GPT-5 or Claude can still reason over the task without seeing the exact identifiers.
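Here is a rough sketch of the typed-placeholder idea for amounts, weekdays, and clock times. The regexes, bucket boundaries, and placeholder names are my own assumptions and are far from exhaustive. Names like "Dr. Patel" are intentionally left for a separate entity-detection step.

```python
import re

AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")
DAY = re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")
TIME = re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b", re.IGNORECASE)


def amount_bucket(match: re.Match) -> str:
    """Map an exact dollar amount to a coarse range placeholder."""
    value = float(match.group(0).lstrip("$").replace(",", ""))
    if value < 100:
        return "<AMOUNT_UNDER_100>"
    if value < 1_000:
        return "<AMOUNT_100_TO_1K>"
    if value < 10_000:
        return "<AMOUNT_1K_TO_10K>"
    return "<AMOUNT_OVER_10K>"


def mask(text: str) -> str:
    text = AMOUNT.sub(amount_bucket, text)
    text = DAY.sub("<DAY>", text)
    text = TIME.sub("<TIME>", text)
    return text


print(mask("Dr. Patel said my follow-up is Tuesday at 3pm. My balance is $1,284.11."))
# Dr. Patel said my follow-up is <DAY> at <TIME>. My balance is <AMOUNT_1K_TO_10K>.
```

A range keeps enough signal for the model to reason about ("this is a four-figure balance") without handing over the exact figure.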

A local-first pipeline with OpenClaw or n8n

Here is the architecture I would actually recommend for most teams:

Raw source -> local classifier/redactor -> sanitized prompt -> cloud model -> local rehydration if needed

That can be implemented in a few different ways.

Option 1: local model with Ollama for preprocessing

Run a small local model for entity detection or structured extraction:

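# Start the Ollama server (skip this if Ollama already runs as a background service)
ollama serve

# In a second terminal: downloads the model on first use, then opens a session
ollama run qwen2.5:7b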

Use that local model to tag names, emails, account numbers, or secrets before the main reasoning call.
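A minimal sketch of that tagging call against Ollama's local `/api/chat` endpoint, using only the standard library. The prompt wording and the `{"entities": [...]}` output shape are my own assumptions, and a 7B model will not always follow them. For that reason the parser raises instead of returning an empty list, so the caller can fail closed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address

# Hypothetical prompt; tune it for your own data.
PROMPT = (
    "List the sensitive spans in the text as JSON of the form "
    '{"entities": [{"text": "...", "type": "..."}]}. Text: '
)


def build_request(text: str, model: str = "qwen2.5:7b") -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT + text}],
        "stream": False,   # one JSON response instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def parse_entities(response_body: dict) -> list[dict]:
    """Extract the entity list, raising if the model output has the wrong shape."""
    try:
        content = json.loads(response_body["message"]["content"])
        entities = content["entities"]
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        raise ValueError("unexpected model output; do not send text upstream") from exc
    if not isinstance(entities, list):
        raise ValueError("unexpected model output; do not send text upstream")
    return [e for e in entities if isinstance(e, dict) and {"text", "type"} <= e.keys()]


def tag_entities(text: str) -> list[dict]:
    """Send the text to the local model and return the tagged spans."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_entities(json.load(resp))
```

Calling `tag_entities("Email John Smith about the invoice")` requires a running Ollama server with the model pulled. The request never leaves localhost.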

Option 2: Presidio for deterministic redaction

Use Presidio locally inside your worker or sidecar service.

This is usually better when you want predictable masking rules instead of LLM-style guessing.

Option 3: combine both

Presidio for obvious PII
Ollama for workflow-specific patterns
cloud model for the hard reasoning step

That split works well in agent systems because not all sensitive data looks like classic PII.

A weird combination of client names, vendors, meeting cadence, and internal notes can still be highly revealing even if no single token looks dangerous.

What should never leave the machine by default

Teams get vague here, and vague systems leak.

I would start with four local-only categories:

Secrets: API keys, passwords, OAuth tokens, private certificates
Direct identifiers: names, phone numbers, personal emails, addresses
Regulated content: health, payroll, tax, legal, HR data
Workflow fingerprints: client names, internal process notes, vendor combinations, schedule patterns

That last one matters more than people think.

A life-tracking or ops agent does not need to leak your exact therapist, your kid’s school, your daily revenue pattern, and your recurring reminders to become invasive. The pattern itself can be sensitive.
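One way to enforce categories like these is a last-line egress check that refuses to send a prompt if anything local-only is still in it. The sketch below covers only two of the four categories. The secret patterns are illustrative, not exhaustive, and the denylist entries are made-up names standing in for your own clients and vendors.

```python
import re

# Illustrative patterns only; real secret scanning needs a much larger rule set.
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
    "api_key_like": re.compile(r"\b(?:sk|pk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}"),
}

# Hypothetical workflow fingerprints: client and vendor names you never send.
WORKFLOW_DENYLIST = {"Acme Corp", "Northwind"}


class EgressBlocked(Exception):
    """Raised when an outbound prompt still contains local-only content."""


def check_egress(text: str) -> str:
    """Return the text unchanged if it is clean, otherwise raise EgressBlocked."""
    findings = [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
    lowered = text.lower()
    findings += [
        f"denylist:{term}" for term in sorted(WORKFLOW_DENYLIST) if term.lower() in lowered
    ]
    if findings:
        raise EgressBlocked(", ".join(findings))
    return text


print(check_egress("Summarize <PERSON>'s note about the renewal."))
```

Wrap every cloud call in `check_egress(prompt)` so a missed redaction fails loudly instead of leaking quietly.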

A simple rule that actually holds up

If GPT-5, Claude, or Gemini does not need the exact value to reason correctly, replace it locally before the request.

Not after summarization.

Not after retrieval.

Before the request.

If the exact content is the task — for example, drafting a reply to a real customer email — then be honest about the tradeoff:

keep that workflow fully local, or
accept that a provider is processing the raw content under business/API terms

There is no third option where raw data goes to the cloud but somehow remains local.

Are OpenAI and Anthropic safe enough?

For many business workflows, probably yes.

Equivalent to local inference? No.

That distinction is the whole point.

Sending data through the OpenAI or Anthropic API under business terms is materially better than pasting it into random consumer chat apps. Business/API products generally have clearer retention, security, and training policies.

That is good.

But if the provider receives the content, the provider is still processing the content.

You can reduce risk with:

API usage instead of consumer chat
enterprise or business terms
zero-retention or restricted-retention options where available
routing controls when using services like OpenRouter

Still not the same as local inference.

OpenRouter controls are useful, but they are not magic

If you route across providers, OpenRouter has request-level controls worth using.

Example:

{
  "model": "openai/gpt-4.1",
  "messages": [
    {"role": "user", "content": "sanitized prompt here"}
  ],
  "provider": {
    "data_collection": "deny",
    "zdr": true,
    "allow_fallbacks": true
  }
}

That is better than sending raw prompts with default settings.

It is still cloud processing.
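For reference, here is the same request sketched from Python with only the standard library. The provider fields are the ones from the JSON above. The `OPENROUTER_API_KEY` environment variable name is my assumption, so use whatever your deployment calls it.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_payload(sanitized_prompt: str, model: str = "openai/gpt-4.1") -> dict:
    """Build a chat request that carries the provider privacy preferences."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": sanitized_prompt}],
        "provider": {
            "data_collection": "deny",
            "zdr": True,
            "allow_fallbacks": True,
        },
    }


def send(sanitized_prompt: str) -> dict:
    """POST the sanitized prompt; raises on HTTP errors."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(sanitized_prompt)).encode("utf-8"),
        headers={
            # Assumed variable name for the API key.
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Keeping the payload builder separate from the network call makes it easy to log or assert on exactly what is about to leave the machine.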

The right mental model is:

local inference = strongest privacy
sanitized cloud inference = practical middle ground
raw cloud inference = convenience first

Example: n8n workflow with local sanitization before cloud reasoning

This is the pattern I wish more automation teams used.

Pull a Gmail message in n8n
Send the body to a local redaction service
Forward the sanitized text to an OpenAI-compatible endpoint
Store only sanitized context in memory
Rehydrate locally if the final action needs original values

Pseudo-flow:

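// Pseudo-flow for an n8n Code node; assumes fetch is available in the runtime.
const rawEmail = $json.body;

// Step 1: redact locally. The raw body only travels to localhost.
const redactRes = await fetch("http://localhost:8080/redact", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: rawEmail })
});
if (!redactRes.ok) {
  // Fail closed: never fall back to sending the raw email upstream.
  throw new Error(`Local redaction failed: ${redactRes.status}`);
}
const sanitized = await redactRes.json();

// Step 2: send only the sanitized text to the cloud endpoint.
const llmRes = await fetch("https://api.standardcompute.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.STANDARD_COMPUTE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-4.1",
    messages: [
      { role: "system", content: "Classify and draft a response." },
      { role: "user", content: sanitized.text }
    ]
  })
});
if (!llmRes.ok) {
  throw new Error(`LLM request failed: ${llmRes.status}`);
}
const llmResponse = await llmRes.json();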

That gets you a much saner setup:

raw customer data stays local, as long as the redaction step catches it
the cloud model still handles the reasoning
your automation can run at scale without turning into a privacy mess
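The flow above assumes something is listening on `http://localhost:8080/redact`. Here is a minimal sketch of such a service using Python's standard library. The regexes are rough stand-ins for a real detector such as Presidio, and the request and response shape (`{"text": ...}`) simply mirrors what the pseudo-flow sends and reads.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Rough patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_text(text: str) -> str:
    """Replace emails and phone-like numbers with typed placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


class RedactHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/redact":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            text = json.loads(self.rfile.read(length))["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a string 'text' field")
            return
        body = json.dumps({"text": redact_text(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    # Bind to loopback only, so raw text is never accepted from the network.
    HTTPServer(("127.0.0.1", port), RedactHandler).serve_forever()
```

Call `serve()` to start it. Binding to `127.0.0.1` rather than `0.0.0.0` matters here: the whole point of the service is that raw text never crosses a network interface.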

Privacy and cost become the same ops problem at scale

This is the part a lot of teams discover the hard way.

Once you choose cloud reasoning for capability, you now have two operational concerns:

what data is leaving the machine
what your automation costs under load

If your n8n, Make, Zapier, or OpenClaw workflow is making hundreds or thousands of calls a day, per-token pricing starts changing behavior. People trim prompts too aggressively, avoid useful automations, or babysit usage dashboards because one long chain can spike cost.

That is why the economics matter here too.

If you are already doing the right thing architecturally — local sanitization first, cloud reasoning second — it also helps to run that cloud step through an OpenAI-compatible API with predictable monthly pricing.

That is the appeal of Standard Compute.

It is a drop-in OpenAI API replacement built for agent and automation workloads, with flat monthly pricing instead of per-token billing. So if your workflow is doing high-volume sanitized model calls from n8n, Make, Zapier, OpenClaw, or a custom client, you can keep the architecture practical without turning every run into a cost-anxiety exercise.

The privacy fix is local preprocessing.

The ops fix is predictable pricing.

Both matter if you want agents running all day without constant supervision.

What I ask now before trusting any “local-first” claim

Not "Can I self-host this?"

This:

At what exact step does raw sensitive content leave the machine, and in what form?

If the answer is fuzzy, the privacy story is fuzzy.

If the answer is "OpenClaw runs locally, but we send the full prompt to OpenAI," then you do not have private inference.

If the answer is "We classify and pseudonymize locally with Ollama or Presidio, keep the mapping local, and only send sanitized text to GPT-5 or Claude," that is a serious design.

If the answer is "Everything runs locally with Qwen, no internet, workspace-only permissions," that is the gold standard.

The checklist I would actually use

Decide what data can never leave the device
Detect and transform that data locally
Send only the minimum useful representation upstream
Use API/business terms, not consumer chat apps
Add provider privacy controls when cloud is unavoidable
If the workflow runs at scale, use an OpenAI-compatible endpoint with predictable monthly pricing so nobody has to fear every extra prompt

That is the middle ground.

Not flashy. Just correct.

And honestly, that is the line more developers need to hear:

Local control is not the same thing as private inference.

For most teams, the practical architecture is local preprocessing plus cloud reasoning. The goal is not to pretend the cloud disappeared. The goal is to make sure the cloud only sees sanitized prompts — and to do it in a way that is sustainable for both privacy and cost.

DEV Community