Dmitry Bondarchuk
I ran a privacy proxy on my AI traffic. Here's what it found.

When I built Velar — a local proxy that masks sensitive data before it reaches AI providers — I mostly thought of it as a tool for other people's problems.

I was wrong.

After running it on my own machine during normal browser-based interactions with ChatGPT, here's what it intercepted:

Masked Items
----------------------------------------
API_KEY:        30 ███████████████░░░░░
ORG:             9 ████░░░░░░░░░░░░░░░░
JWT:             1 ░░░░░░░░░░░░░░░░░░░░
Total:          40

40 items. Without doing anything unusual. But the story behind that API_KEY number is what really got me.


30 API keys — before I even hit Send

All 30 API_KEY detections came from a single session where I was editing a script directly inside the ChatGPT input field.

Here's the thing most people don't realize: ChatGPT sends the contents of the input field to its servers in the background as you type. Not when you hit Send — continuously, while you're still editing.

So I pasted a script that contained an API key, spent a few minutes tweaking it before sending, and ChatGPT quietly transmitted that script — and the key inside it — 30 times to OpenAI's servers before I was done.

I wasn't trying to send the key. I was just editing. That's the part that's hard to reason about intuitively.

This is a real gotcha with browser-based AI chat: the moment sensitive data touches the input field, it's potentially already in transit — regardless of whether you decide to actually send the message.
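To make the "30 detections from one key" arithmetic concrete, here's a minimal sketch in Go (the language Velar is written in — though the function and pattern here are my own illustration, not Velar's actual code) of how a proxy tallies secrets across successive intercepted request bodies. One pasted key, re-sent on every background sync, counts once per sync:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative pattern for OpenAI-style project keys; a real detector
// would carry many more rules.
var apiKeyPattern = regexp.MustCompile(`sk-[A-Za-z0-9_-]{8,}`)

// tallyDetections counts every API-key occurrence across a sequence of
// intercepted request bodies. The same key appearing in N background
// syncs contributes N to the total.
func tallyDetections(bodies []string) int {
	total := 0
	for _, body := range bodies {
		total += len(apiKeyPattern.FindAllString(body, -1))
	}
	return total
}

func main() {
	// Simulate the same draft (with one embedded key) being synced
	// three times while the user is still editing.
	draft := `client = OpenAI(api_key="sk-proj-abc123xyz789")`
	bodies := []string{draft, draft + " # tweak 1", draft + " # tweak 2"}
	fmt.Println(tallyDetections(bodies)) // one key, three syncs -> 3
}
```

That's the whole mechanism behind the inflated-looking number: the count tracks transmissions, not distinct secrets.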


What about the other numbers?

The 9 ORG detections are a good example of current limitations. These were false positives from the ONNX NER model — it flagged the Russian word "Расскажи" ("tell me") as an organization name. The model is trained on English only, so it occasionally misreads non-English text as named entities. That's something I'm actively working on.
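One possible mitigation — my sketch, not Velar's actual fix — is a cheap script pre-filter: tokens containing non-Latin letters can't have been seen by an English-only NER model, so you can skip entity types like ORG for them entirely:

```go
package main

import (
	"fmt"
	"unicode"
)

// containsNonLatin reports whether a token has letters outside the Latin
// script, e.g. the Cyrillic "Расскажи" that the English-only NER model
// misread as an organization. Digits and punctuation are ignored.
func containsNonLatin(token string) bool {
	for _, r := range token {
		if unicode.IsLetter(r) && !unicode.Is(unicode.Latin, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsNonLatin("Расскажи")) // true: Cyrillic, skip NER
	fmt.Println(containsNonLatin("OpenAI"))   // false: Latin, run NER
}
```

This doesn't fix the model, but it would have zeroed out all 9 of those ORG false positives at essentially no cost.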

The 1 JWT is probably real — likely from a session token that ended up in a request payload somewhere.


A note on scope

This data covers browser-based interactions only — ChatGPT in the browser, routed through Velar's MITM proxy.

Intercepting IDE tools like Cursor or GitHub Copilot is a different and harder problem. They communicate over gRPC with protobuf, which requires a different interception approach than standard HTTPS traffic. That's on the roadmap, but not there yet — and honestly, that's probably where the more interesting (and scarier) data would come from, given that those tools have access to your full codebase.


What Velar does with detected values

Each value gets replaced with a deterministic placeholder before the request is forwarded:

sk-proj-abc123...  →  [API_KEY_1]
eyJhbGci...        →  [JWT_1]

The AI still gets enough context to be useful. When the response comes back, Velar restores the originals — so your tools keep working normally.
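The mask-and-restore roundtrip can be sketched in a few lines of Go. To be clear, this is a simplified illustration under my own naming — Velar's real implementation covers many more value types and persists mappings across requests — but the core idea is just a deterministic substitution table:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Illustrative pattern; one of many a real detector would use.
var keyPattern = regexp.MustCompile(`sk-[A-Za-z0-9_-]{8,}`)

// mask replaces each distinct key with a deterministic placeholder and
// returns the mapping needed to restore the originals. Repeated
// occurrences of the same value get the same placeholder.
func mask(text string) (string, map[string]string) {
	restore := map[string]string{}
	seen := map[string]string{}
	n := 0
	masked := keyPattern.ReplaceAllStringFunc(text, func(k string) string {
		if tag, ok := seen[k]; ok {
			return tag
		}
		n++
		tag := fmt.Sprintf("[API_KEY_%d]", n)
		seen[k] = tag
		restore[tag] = k
		return tag
	})
	return masked, restore
}

// unmask puts the original values back into the provider's response.
func unmask(text string, restore map[string]string) string {
	for tag, orig := range restore {
		text = strings.ReplaceAll(text, tag, orig)
	}
	return text
}

func main() {
	in := `openai.api_key = "sk-proj-abc123xyz789"`
	masked, restore := mask(in)
	fmt.Println(masked)                        // key replaced with a placeholder
	fmt.Println(unmask(masked, restore) == in) // true: roundtrip restores it
}
```

Because the placeholder is stable per value, the model can still refer back to "[API_KEY_1]" coherently across a conversation, and restoration on the way back is a plain string substitution.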

Everything runs locally. No cloud processing, no external logging, no callbacks home. MIT-licensed Go — you can read the source and verify.


The broader pattern

AI coding tools are getting more context access, not less. Cursor reads your whole codebase. Agents are being given filesystem and terminal access. The more capable these tools get, the more opportunities there are for sensitive data to end up in that context without anyone actively deciding to send it.

The input field thing is a small example of a bigger pattern: the boundary between "data I'm sharing" and "data that's being transmitted" is increasingly blurry. Most developers I've talked to haven't thought carefully about where that boundary sits.


Caveats

Velar is experimental — I'm still figuring out what it should become, and I'd be the first to say the detection isn't perfect. Regex-based detection for structured values like API keys is reasonably reliable. NER-based detection for things like names and organizations is still rough, as the false positives above show.

Also, yes — Velar is itself a MITM proxy, which is a fair thing to be skeptical about. It only intercepts domains you explicitly configure. The source is open and auditable.


Try it yourself

git clone https://github.com/ubcent/velar.git
cd velar
make build
./bin/velar ca init
./bin/velar start
./bin/velar proxy on

Run it for a few days and check velar stats. I'm curious whether other people hit the same input-field behavior — or find something I haven't seen yet.

If you try it, share your breakdown in the comments.
