Your Coding Agent Doesn't Need Your Secrets

#ai #agents #opensource #programming

Every coding agent I use can read my .env file. Every one of them is a single prompt away from streaming its contents to a server I don't control. The fix has been obvious since the day Claude Code launched — redact on the way out, rehydrate on the way in — and a year later, no major vendor has built it into the client where it belongs.

Third-party proxies that do a version of this exist. The agents themselves don't ship it. That gap is the whole story: the one place this protection makes sense is the one place it's missing.

Start from an uncomfortable assumption — anything you send to an inference endpoint has zero security. Every major provider would object, and on paper they'd have a case. But two years of provider-side leaks have convinced me it's the safer bet to assume the worst. If you wouldn't paste a value into a public Slack channel, you shouldn't hand it to a remote model either. That covers two overlapping buckets: secrets like API tokens, private keys, and passwords; and personal data like names, phone numbers, and the occasional social security number that's somehow both at once. A proxy sitting in the middle can scan for all of it.

The Shape of the Fix

A local proxy sits between the agent and the inference endpoint. On the outbound side it scans the payload for anything that looks sensitive and replaces each match with a deterministic placeholder — [[REDACTED_PHONE:8f3a]], [[REDACTED_TOKEN:3f2a]], one per distinct value. Alongside the redacted payload it appends a short instruction telling the model what those placeholders are: opaque strings the client holds privately, to be treated as inscrutable identifiers and reproduced verbatim if the response needs them.
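A minimal sketch of that outbound pass, under loud assumptions: the two regexes, the `sk-` token shape, and the names `PATTERNS`, `SESSION_KEY`, `placeholder`, and `redact` are all hypothetical, not taken from any real tool. The only point is that a keyed hash gives each distinct value the same placeholder every time it appears in a session.

```python
import hashlib
import hmac
import re
import secrets

# Illustrative patterns only; a real detector needs far more than two regexes.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "TOKEN": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

# Assumption: one random key per session, never written to disk.
SESSION_KEY = secrets.token_bytes(32)


def placeholder(kind: str, value: str) -> str:
    """Derive a stable tag so the same value always maps to the same placeholder."""
    digest = hmac.new(SESSION_KEY, value.encode(), hashlib.sha256).hexdigest()
    # Four hex characters mirror the article's examples; a real tool would
    # likely keep more of the digest to make tag collisions less likely.
    return f"[[REDACTED_{kind}:{digest[:4]}]]"


def redact(text: str, mapping: dict[str, str]) -> str:
    """Replace every match with its placeholder and record tag -> original."""
    for kind, pattern in PATTERNS.items():
        def swap(match: re.Match, kind: str = kind) -> str:
            tag = placeholder(kind, match.group(0))
            mapping[tag] = match.group(0)
            return tag
        text = pattern.sub(swap, text)
    return text
```

The `mapping` dict is the only thing that has to survive until the response comes back, and it never leaves the machine.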

The redacted prompt goes up. The model does its work having never seen the real value. On the way back, the proxy runs the response through a string replace — every placeholder swapped for its original. The user sees a normal answer. The model saw nonsense tokens. The secret never left the machine.
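The return pass can be sketched in a few lines. This assumes the double-bracket placeholder format shown earlier and a mapping from placeholder to original value; `PLACEHOLDER` and `rehydrate` are illustrative names of my own.

```python
import re

# Matches the placeholder shape used in this article's examples.
PLACEHOLDER = re.compile(r"\[\[REDACTED_[A-Z_]+:[0-9a-f]+\]\]")


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Swap each known placeholder back to its original value in one pass."""
    # Placeholders the proxy never issued are left untouched rather than guessed at.
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Doing the swap as a single regex pass, rather than one `str.replace` per entry, means a restored value is never rescanned, so a secret that happens to contain placeholder-like text cannot trigger a second substitution.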

The prompt-injection step is the part people skip, and it's the part that makes the whole thing work. As long as the model treats [[REDACTED_PHONE:8f3a]] the way it would treat a phone number — and returns the literal string unchanged, same hash on the end — rehydration is a trivial lookup. A single placeholder format can stand in for an unbounded number of distinct values within one session.
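For illustration, here is roughly what that instruction could look like when a proxy attaches it to an outbound request. The wording of `PLACEHOLDER_NOTICE` is my own guess at something workable, and the payload shape (a plain dict with a string `system` field) is an assumption made for the sketch, not a statement about any vendor's request format.

```python
PLACEHOLDER_NOTICE = (
    "Some values in this conversation have been replaced with placeholders of "
    "the form [[REDACTED_<KIND>:<id>]]. The client holds the real values "
    "privately. Treat each placeholder as an opaque identifier: do not guess "
    "what it stands for, do not alter it, and reproduce it verbatim wherever "
    "your response needs that value."
)


def with_notice(payload: dict) -> dict:
    """Return a copy of the request with the placeholder notice appended."""
    updated = dict(payload)  # shallow copy; the caller's dict is not modified
    existing = payload.get("system", "")  # assumed to be a plain string
    updated["system"] = f"{existing}\n\n{PLACEHOLDER_NOTICE}".strip()
    return updated
```

Appending rather than replacing keeps whatever system prompt the agent already sends, which matters because the proxy does not own that prompt.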

If you controlled the stack, you wouldn't need the injection at all. A vendor could carry that mapping in a structured side-channel (a JSON attachment, a field the model is trained to respect) and bake pass-through behavior in far below the prompt layer. As an outsider with no access to the stack, I have prompt injection as my one lever, and it's good enough to prove the idea works. It is not the version that should ship.

The placeholder-to-value map lives encrypted in memory. Pick a delimiter unlikely to appear in real code and collisions on the rehydration pass round to zero. I'm overstating the simplicity, slightly: there are edge cases. Secrets the model legitimately needs to modify (rare). Placeholders that get split across chunks of streaming output (annoying). False positives on the redaction pass (manageable). None of these are research problems. They're engineering work.
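The streaming case is the one most worth sketching, because a naive per-chunk replace misses any placeholder that arrives in two pieces. One way to handle it, assuming the double-bracket format and hypothetical names throughout (`StreamRehydrator`, `max_len`), is to hold back any tail that might be the start of a placeholder until the rest shows up.

```python
import re

PLACEHOLDER = re.compile(r"\[\[REDACTED_[A-Z_]+:[0-9a-f]+\]\]")


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


class StreamRehydrator:
    """Rehydrates streamed text, buffering a possibly incomplete placeholder."""

    def __init__(self, mapping: dict[str, str], max_len: int = 64):
        self.mapping = mapping
        self.max_len = max_len  # assumed upper bound on placeholder length
        self.buffer = ""

    def _safe_length(self, text: str) -> int:
        """How many leading characters can be emitted without splitting a placeholder."""
        start = text.rfind("[[")
        unclosed = start != -1 and "]]" not in text[start:]
        if unclosed and len(text) - start <= self.max_len:
            return start  # hold everything from the unclosed "[[" onward
        if text.endswith("["):
            return len(text) - 1  # a lone "[" may become "[[" in the next chunk
        return len(text)

    def feed(self, chunk: str) -> str:
        self.buffer += chunk
        cut = self._safe_length(self.buffer)
        ready, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return rehydrate(ready, self.mapping)

    def flush(self) -> str:
        """Emit whatever is still held once the stream ends."""
        ready, self.buffer = self.buffer, ""
        return rehydrate(ready, self.mapping)
```

The cost is a little latency whenever ordinary output contains an unclosed `[[`, such as a nested list literal; the `max_len` cap keeps that delay bounded. A tighter version could release the tail as soon as it stops looking like a `[[REDACTED_` prefix.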

Why "Just Use Presidio" Isn't Enough

Microsoft's Presidio identifies and redacts PII well, including a fair number of international and borderline-uncommon formats. Yelp's detect-secrets is the other obvious building block, but it's a detector, not a redactor. It finds credentials so a pre-commit hook can block new ones against a baseline; it doesn't rewrite anything on the wire. Wire either into a proxy like LiteLLM or Bifrost and you get detection plus outbound redaction.

Two things you still don't get. The first is a clean rehydration path. Presidio technically has a reversible mode, but it emits an AES-encrypted blob rather than a readable placeholder, and reversing it is a separate decrypt pass you have to orchestrate yourself. The mapping that lets you put the original value back was never designed to survive a round trip through a language model — and detect-secrets, being detection-only, offers nothing here at all.

The second, and the one nobody mentions, is the prompt-injection layer that tells the model what the placeholder is. Without it, the model treats [REDACTED_TOKEN] as junk or as a fill-in-the-blank exercise. You get "I'm not sure what REDACTED_TOKEN refers to" instead of clean pass-through. The model has to be told, explicitly, that this is a placeholder and it should leave it alone. Presidio won't do that for you. Neither will detect-secrets.

LiteLLM and Bifrost will both let you script all of this by hand, if you're willing to write the integration. Most developers won't — and more to the point, most developers don't run a local inference proxy at all. Standing one up is a pain, and keeping it working is worse: every few weeks the upstream APIs shift and I'm back tweaking shims to hold the seam together. I don't mind that. The person dipping into Claude Code through something like Cowork has neither the time nor the inclination. A protection only advanced users can stand up isn't a protection. That's the honest limit of my own project, too: a proxy bolted onto someone else's stack will always have seams. The durable version lives inside the tool.

Why This Belongs Inside the Coding Agent

Coding agents are where the problem is structural, not incidental. The entire purpose of a .env file is to hold values the application needs and the developer should never paste anywhere. An agent that reads project files reads .env. An agent that writes new code references what's in it. The agent's job and the secret's purpose are in tension by design.

The sandboxing has genuinely improved. Auto mode and tighter default permissions mean Claude Code goes off-script far less than it did nine months ago. But those are shims on the symptom. No amount of prompt engineering — however clever — can guarantee a secret is never read and sent, and at the volume of inference requests a working developer generates in a day, a one-percent slip stops being a tail risk and becomes a near-certainty that eventually visits everyone. Redact-rehydrate is the first thing in this space that's a fix rather than a fence.
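The arithmetic behind that claim is short enough to write down. This sketch reads "one percent" as an independent per-request chance of a slip, which is my simplifying assumption rather than a measured rate, and the function name is hypothetical.

```python
def chance_of_at_least_one_leak(per_request_rate: float, requests: int) -> float:
    """Probability that at least one of `requests` independent requests slips."""
    # The chance that every request is clean is (1 - rate) ** requests;
    # the complement is the chance that at least one is not.
    return 1.0 - (1.0 - per_request_rate) ** requests


for n in (10, 100, 500):
    print(n, round(chance_of_at_least_one_leak(0.01, n), 3))
```

Under that assumption the odds of at least one slip are roughly 63% after 100 requests and above 99% after 500, which is why a small per-request rate cannot be treated as a tail risk.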

Every major vendor knows the gap is there. The reasons it's still open are the usual ones: it's fiddly, it's lower priority than the next demo, users haven't complained loudly enough yet. Those reasons are real, but none of them are good — and this is the rare case where the hard part is deciding to do it, not doing it.

The Differentiation Nobody Has Claimed

A vendor who shipped this could say something none of its competitors can: we are actively engineering so that we never see your secrets or your personal data — securing what does reach us, and building the tooling so most of it never arrives in the first place. Back that with a code path anyone can audit and it's a real claim, not the "we take your privacy seriously" wallpaper that every settings page already carries.

This sits squarely in Anthropic's lane. Their differentiation from OpenAI, Google, and the rest has always been trust. Whatever you make of any single company, my read is that people right now hand their data to Anthropic with less hesitation than they would to Google or Meta, and compounding that reputation with a feature you can actually verify is about as close to a no-brainer as product strategy gets. Trust you can read in the source beats trust you have to take on faith.

What I'm Building

On evenings and weekends I've been reimplementing the parts of Presidio I care about — entity detection, structured placeholder generation, deterministic replacement, ergonomic encryption — as a Rust library called octarine, wrapped in a local proxy that runs the redact-inject-rehydrate round trip transparently in front of Anthropic, OpenAI, or anything else speaking the same API. It's open source. You can read every line.

It's sizable: the Rust alone runs to roughly 200,000 lines, close to half of it test code, with some nine thousand tests running on every change. It's also the first substantial thing I've built in Rust, roughly 95% written by the agent with me reviewing as time allowed. I don't offer that as a humblebrag or a confession. I offer it because it's the thesis in miniature. I know where the bodies are buried in a redaction pipeline (the architecture, the validation, the failure modes), and the agent handles the syntax I'd otherwise be looking up. This is attempt three: the first died in Python, the second on a bad Rust architecture, and the third survived because I pulled the core logic into a clean library and let well-worn patterns carry the structure. Along the way I built a handful of my own agents and skills to catch tech debt before it set.

In my own testing it works. The model handles placeholder tokens as opaque strings without further prodding, the round trip is fast enough not to notice, and the false-positive rate drops to tolerable once the patterns are tuned. I'm under no illusion that it's the answer. It's an experiment — useful to advanced users, a good way to learn what Rust can do, and a standing existence proof that the hard parts are tractable. The real answer, when it comes, gets built by someone who controls the whole stack and can do it better than any proxy ever will.

What Anthropic Should Do

Of the handful of companies positioned to ship this, Anthropic is the one I'd bet on — not because it's a small team (it isn't), but because it has the culture and the wherewithal to treat this as a priority. They've had a year. That I can install Claude Code this afternoon and, with one reasonable-sounding request, stream my .env to a remote endpoint is a gap I'd like to see closed — and one I'd happily help close.

It is not a huge lift. A team with their resources could do everything I've done, and more, and better, in a couple of weeks. I'm working around a token budget, a Rust learning curve, and whatever evenings happen to be free; they have none of those constraints. Until someone ships it in the client, anyone building tooling around coding agents should treat secrets-on-the-wire as a first-class concern, not an afterthought. The fix is small. The cost of leaving it unbuilt isn't.