DEV Community: Genevieve Breton

PromptCape vs PromptBase: similar names, different products

Genevieve Breton — Tue, 26 May 2026 14:40:32 +0000

I keep getting the same question: "Is PromptCape the same thing as PromptBase?"

No. They're different products solving different problems for different audiences. The names look alike, the spaces overlap (both are AI-adjacent), and Google sometimes autocorrects one to the other. This short article exists to make the distinction explicit — for humans, and for search engines.

If you're here because you typed "promptcape" into Google and landed on PromptBase, this article is the bridge.

PromptBase, in one paragraph

PromptBase is a marketplace for AI prompts. Designers, copywriters, and AI hobbyists list prompts they've engineered for image models (Midjourney, Stable Diffusion, DALL·E) and for text models (ChatGPT, Claude). Buyers download the prompts and use them to generate their own content. Think of it as Etsy for prompt engineering. It launched in 2022 and has been growing steadily as the "selling prompts as digital products" niche matured.

Target user: anyone who uses AI tools to generate content (visual, marketing, copy) and wants higher-quality prompts than they could write themselves.

Problem solved: distribution and monetization of prompt engineering as craft.

PromptCape, in one paragraph

PromptCape is a Java code obfuscation proxy for AI coding assistants. It sits between your AI coding client (Claude Code, Cursor, Mistral) and the AI API, renames every identifier in your source code (InvoiceService becomes Cls_a1b2c3d4, customerName becomes fld_e5f6a7b8), sends the obfuscated version to the AI, then reverses the rename on the way back. The AI works on the obfuscated code without ever seeing your real class names, package structure, or business domain language. It targets developers and teams who have IP protection clauses, NDAs, or compliance requirements that constrain what can be sent to cloud-based AI tools.

Target user: Java developers and engineering teams who want to use AI coding assistants without exposing proprietary source code to AI training corpora or third-party logs.

Problem solved: keeping source code IP private while still benefiting from AI coding assistance.

Side by side

	PromptBase	PromptCape
What it is	Marketplace for AI prompts	Code obfuscation proxy for AI assistants
Audience	Content creators, designers, marketers	Software developers and engineering teams
Primary value	Buy/sell pre-engineered prompts	Protect source code from AI cloud APIs
Used for	Image generation, copywriting	Java coding with Claude Code / Cursor / Mistral
Touches your code?	No	Yes — that's the whole point
Pricing	Per-prompt purchases	Free for 3 months, then paid license
Founded	2022	2026

The overlap is zero. PromptBase is about the inputs (prompts as digital products). PromptCape is about the inputs and the outputs of an AI coding loop, with a strong focus on what leaves your machine.

Why the names look so close

Both names start with "Prompt" because both are in the AI space. The follow-on word makes the difference:

PromptBase — "Base" as in database, foundation, the collection where prompts live and are exchanged.
PromptCape — "Cape" as in the garment that shields; a cape over your code before it travels to the AI.

I picked "Cape" for the protection metaphor, knowing the proximity to existing names was a risk. The metaphor is the whole product positioning: your code is protected when it goes out and comes back.

Which one do you actually need?

"I want to find a great Midjourney prompt for a vaporwave cityscape" → PromptBase.
"I want to find a great ChatGPT prompt for cold sales emails" → PromptBase.
"I want to use Claude Code on a private Java repo without sending real class names to Anthropic" → PromptCape.
"My client added a no-AI-assistants clause and I want to comply without giving up AI productivity" → PromptCape.
"I work in a regulated industry (banking, health, defense) and need to obfuscate source identifiers before AI calls" → PromptCape.

If you're a developer and any of the last three scenarios sounds familiar, the landing page is at promptcape.com. If you're looking for prompts to buy, you want promptbase.com.

No rivalry, no overlap. Just two products with names that happen to start with the same six letters.

Building a transparent terminal-based proxy for Claude Code in Cursor (or any IDE)

Genevieve Breton — Thu, 21 May 2026 16:18:14 +0000

The previous two articles in this series (part 1: obfuscation, part 2: the 3-way merge) were about what happens to your code. This one is about what happens to your developer.

I had a CLI that could obfuscate a Java project, send it to Claude, and merge the changes back. The pipeline worked. But the actual day-to-day flow was: run a CLI command to obfuscate, copy the obfuscated workspace path, paste it into Claude Code, work in Claude, copy the AI's output back, run another CLI command to merge. Five context switches per AI interaction. Nobody — including me — was going to use it twice.

The friction was the integration. Every IDE has its own way of talking to Claude or to OpenAI. Cursor has its own Claude pane, JetBrains has its own AI assistant, VS Code has Copilot. I was not going to build a plugin for each one, maintain it, watch them break every release.

The shortcut that solved it: a transparent localhost HTTP proxy. About 200 lines of code, no IDE plugin, no Cursor extension, no fork of anything. The developer types claude in Cursor's built-in terminal and PromptCape is silently between them and the API.

This article is the how of that proxy: the architectural choice, the five traps that made it harder than I expected, and why this approach generalizes to almost anything that talks to an LLM.

The decision: don't wrap the IDE, wrap the network

When you set out to integrate a tool into an IDE, the obvious-looking path is to write a plugin. JetBrains has its plugin API, VS Code has its extension model, Cursor has its own integrations. You quickly realize:

Each IDE has its own API, packaging, and review process.
AI features inside each IDE evolve fast — every release threatens to move where the conversation hooks live.
For a tool that has to see every prompt and every response, you end up reimplementing the wire protocol per IDE anyway.

The shortcut nobody mentions: most AI coding assistants and SDKs respect a base URL environment variable. Claude Code uses ANTHROPIC_BASE_URL. The OpenAI ecosystem (which Cursor and many others speak) uses OPENAI_BASE_URL. Set it, and the client points at your server instead of api.anthropic.com or api.openai.com.

That collapses the integration problem from "write N IDE plugins" to "run a reverse proxy on localhost." One code path. Every IDE that respects the env var works for free.
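As a rough illustration of why this works, here is how a client might resolve its endpoint. This is a sketch, not Claude Code's actual source: the class and method names are hypothetical, and the only details taken from the text are the variable name ANTHROPIC_BASE_URL, the default host api.anthropic.com, and the proxy port 8077.

```java
import java.net.URI;
import java.util.Map;

final class EndpointResolver {
    static final String DEFAULT_BASE = "https://api.anthropic.com";

    // Resolve the URI for an API path, preferring the override from the environment.
    static URI resolve(Map<String, String> env, String path) {
        String base = env.getOrDefault("ANTHROPIC_BASE_URL", DEFAULT_BASE);
        // Tolerate a trailing slash so "http://localhost:8077/" and
        // "http://localhost:8077" behave the same.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + path);
    }

    public static void main(String[] args) {
        // With ANTHROPIC_BASE_URL=http://localhost:8077 exported, this points at the proxy.
        System.out.println(resolve(System.getenv(), "/v1/messages"));
    }
}
```

The client never needs to know that a proxy exists. It only sees a different base URL.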

The mental model:

 Cursor terminal               PromptCape proxy        Anthropic API
 ┌─────────────┐  obfuscation  ┌──────────────┐  HTTPS  ┌──────────┐
 │   claude    │ ────────────► │  localhost   │ ───────►│  real    │
 │   (CLI)     │ ◄──────────── │   :8077      │ ◄───────│  API     │
 └─────────────┘   de-obf'd    └──────────────┘   obf'd └──────────┘

From Cursor's point of view, the user opened a terminal and ran claude. There is no extension. There is no patched binary. The proxy is invisible to the IDE because the IDE was never the integration point — the network was.

The bare minimum

Stripped of the obfuscation logic, the proxy is uncomfortably simple. A Javalin-based catch-all that takes any POST, rewrites the body, forwards it to the real API, and pipes the response back:

app.post("/*", ctx -> {
    String body = ctx.body();
    String rewritten = interceptRequest(body);
    HttpRequest req = HttpRequest.newBuilder()
        .uri(URI.create(targetBaseUrl + ctx.path()))
        .POST(HttpRequest.BodyPublishers.ofString(rewritten))
        // ... forward headers, minus hop-by-hop
        .build();
    HttpResponse<String> resp = httpClient.send(req,
                             BodyHandlers.ofString());
    ctx.status(resp.statusCode());
    ctx.result(interceptResponse(resp.body()));
});

If your "interception" is a no-op, that's a transparent proxy. The two interception methods are where obfuscation happens — translating real names → obfuscated names on the way out, and obfuscated → real on the way back.

The thing that surprised me is how little IDE knowledge is needed. The IDE never sees the proxy, never knows the URL was rewritten, never knows the conversation passed through anything. The contract is HTTP and a base URL.

Trap 1: streaming responses

The first version handled responses with BodyHandlers.ofString() — buffer the whole response, transform, return. Claude Code uses streaming responses (SSE — server-sent events). The first time I tested under real load, the user-visible behavior was: silence for 8 seconds, then the entire answer dumped at once.

Streaming isn't a nice-to-have. Developers expect tokens to flow as they're generated; that's a big chunk of what "feels like AI" is. You have to forward chunks as they arrive and de-obfuscate them on the fly.

The Java HTTP client supports BodyHandlers.ofInputStream(), which hands you the response body as a live InputStream instead of a buffered string. You read SSE events line by line, run each one through the de-obfuscation pass, write it back to the client's output stream, and flush after each event boundary:

HttpResponse<InputStream> resp = httpClient.send(req,
                         BodyHandlers.ofInputStream());
try (BufferedReader r = new BufferedReader(new 
                InputStreamReader(resp.body(), UTF_8));
     OutputStream out = ctx.outputStream()) {
    String line;
    while ((line = r.readLine()) != null) {
        String processed = processor.processLine(line);
        out.write(processed.getBytes(UTF_8));
        out.write('\n');
        if (line.isEmpty()) out.flush(); // blank line = SSE event boundary
    }
}

The subtlety is in processor.processLine. SSE events look like this (the JSON is wrapped here for readability; on the wire each data: payload is a single line):

event: content_block_delta
data: {"type":"content_block_delta",
       "index":0,
       "delta":{
          "type":"text_delta",
          "text":"InvoiceService"}}

You can't just regex-replace on the raw line — InvoiceService might be split across two chunks (Invoice in one, Service in the next) by the server's tokenizer. The processor maintains a small carry-over buffer that holds the trailing bit of the previous chunk, joins it with the new chunk, runs the replacement, then writes everything except a tail of length max-mapping-length back out.

This is the kind of thing that doesn't show up in unit tests with full strings but breaks the moment a real API tokenizes mid-identifier. The fix is mechanical once you see it — but you'll only see it if you test against the real API, not a mocked one.
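The carry-over idea can be sketched as follows. This is a minimal illustration, not the PromptCape processor: the class name CarryOverReplacer is hypothetical, it works on already-decoded text deltas rather than raw SSE lines, and it assumes obfuscated names are distinctive tokens that never occur inside a real name.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CarryOverReplacer {
    private final List<Map.Entry<String, String>> mappings; // longest obfuscated name first
    private final int holdBack;                             // longest name length minus one
    private final StringBuilder carry = new StringBuilder();

    CarryOverReplacer(Map<String, String> obfuscatedToReal) {
        this.mappings = obfuscatedToReal.entrySet().stream()
            .sorted(Comparator.comparingInt(
                (Map.Entry<String, String> e) -> e.getKey().length()).reversed())
            .collect(Collectors.toList());
        this.holdBack = mappings.isEmpty() ? 0 : mappings.get(0).getKey().length() - 1;
    }

    // Accept the next chunk and return the text that is safe to emit now.
    String feed(String chunk) {
        carry.append(chunk);
        String replaced = replaceAll(carry.toString());
        // A name split across chunks can have at most holdBack characters
        // sitting at the end of the buffer, so keep that many back.
        int cut = Math.max(0, replaced.length() - holdBack);
        carry.setLength(0);
        carry.append(replaced, cut, replaced.length());
        return replaced.substring(0, cut);
    }

    // Call once when the stream ends to flush whatever is still held back.
    String finish() {
        String rest = replaceAll(carry.toString());
        carry.setLength(0);
        return rest;
    }

    private String replaceAll(String s) {
        for (Map.Entry<String, String> m : mappings) {
            s = s.replace(m.getKey(), m.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        CarryOverReplacer r = new CarryOverReplacer(Map.of("Cls_a1b2c3d4", "InvoiceService"));
        // The identifier is split mid-token across two chunks.
        String out = r.feed("Refactor Cls_a1") + r.feed("b2c3d4 now") + r.finish();
        System.out.println(out); // prints "Refactor InvoiceService now"
    }
}
```

The cost of holding text back is a few characters of latency per chunk, which is invisible to the user as long as finish() is called at the end of the stream.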

Trap 2: accept-encoding

This one cost me a day. The proxy was buffering responses fine, but the de-obfuscation logic was matching zero identifiers. The response body looked like binary garbage in the logs.

The cause: I was faithfully forwarding the IDE's request headers — including accept-encoding: gzip, br. The real API obliged and returned a gzipped response. My text-based interceptor parsed the gzipped bytes as if they were JSON, found no identifiers to replace, and forwarded the still-gzipped bytes to the client. The client decompressed them on its end, so the user saw a plausible response — but with no obfuscation reversal.

The fix is one line: strip accept-encoding from the forwarded request. Now the API returns uncompressed JSON, the interceptor sees text, the round trip works.

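// Request headers the proxy drops instead of forwarding. Despite the name, not all
// of these are hop-by-hop in the HTTP sense: host, content-length and accept-encoding
// are end-to-end headers that the proxy has to regenerate or suppress itself.
private static final Set<String> HOP_BY_HOP_HEADERS = Set.of(
    "host", "connection", "keep-alive", "transfer-encoding",
    "te", "trailer", "upgrade", "content-length",
    "accept-encoding" // ← critical: keep responses uncompressed
);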

Worth a half-line comment in the code. It's the kind of one-line omission that produces a silently wrong system, not a noisy crash.
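Applying that set to an incoming request might look like the sketch below. The class and method names are hypothetical, and headers are modelled as a plain map so the example runs without a server; the real proxy presumably reads them from the Javalin context.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class RequestHeaderFilter {
    // Same names as the set above, stored in lower case.
    static final Set<String> DROPPED = Set.of(
        "host", "connection", "keep-alive", "transfer-encoding",
        "te", "trailer", "upgrade", "content-length",
        "accept-encoding");

    // Return only the headers that should be copied onto the upstream request.
    static Map<String, List<String>> forwardable(Map<String, List<String>> incoming) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> h : incoming.entrySet()) {
            // Header names are case-insensitive, so compare in lower case.
            String lower = h.getKey().toLowerCase(Locale.ROOT);
            if (!DROPPED.contains(lower)) {
                kept.put(h.getKey(), h.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> incoming = new LinkedHashMap<>();
        incoming.put("Host", List.of("localhost:8077"));
        incoming.put("Accept-Encoding", List.of("gzip, br"));
        incoming.put("content-type", List.of("application/json"));
        System.out.println(forwardable(incoming).keySet()); // prints "[content-type]"
    }
}
```

Comparing in lower case matters because clients are free to send Accept-Encoding, accept-encoding, or any other casing.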

Trap 3: don't translate tool blocks

Claude's API content isn't a flat string. It's a list of typed blocks:

{
  "messages": [
    {"role": "user",
     "content": [
       {"type": "text",
        "text": "Refactor InvoiceService to use Optional"}]},
    {"role": "assistant",
     "content": [
       {"type": "tool_use",
        "id": "...",
        "name": "read_file",
        "input": {"path": "..."}}]},
    {"role": "user",
     "content": [
       {"type": "tool_result",
        "tool_use_id": "...",
        "content": "package com.acme; ..."}]}
  ]
}

The user-typed text needs translation: InvoiceService → Cls_a1b2c3d4. But the tool_result block contains the contents of a file the AI just read — from the obfuscated workspace. It's already obfuscated. If I run it through the translator, nothing visibly happens (the obfuscated names don't match the real-name patterns), but the moment a real name accidentally appears in a comment that survived stripping, you've now obfuscated something inside a string that came from an already-obfuscated context. It rapidly gets harder to round-trip.

The fix: walk the content array, look at the type field, only translate "text" blocks. Leave "tool_result" and "tool_use" blocks untouched.

for (JsonNode block : contentArray) {
    String type = block.path("type").asText("");
    if ("text".equals(type) && block.has("text")) {
        ((ObjectNode) block).put("text", 
          translateText(block.get("text").asText()));
    }
    // tool_use, tool_result → leave alone, already in obfuscated space
}

This is the corollary of the bigger architectural choice: the AI works in an obfuscated workspace, not just on obfuscated prompts. The file system the AI sees through read_file is the obfuscated cache directory. Everything it reads is already obfuscated. The proxy only needs to translate the human-readable channel: what the user types, and what the AI replies in text.

Trap 4: HTTP/2 pseudo-headers

This was the obscure one. The Java HTTP client speaks HTTP/2 to modern APIs. HTTP/2 has pseudo-headers — :status, :method, :path — that are legal at the protocol layer but illegal in HTTP/1.1 responses. My proxy was happily copying every response header from the API back to the Cursor terminal, including :status. Some clients tolerate this; some (Claude Code) reject the response.

apiResponse.headers().map().forEach((name, values) -> {
    String lower = name.toLowerCase();
    if (lower.startsWith(":")) return; // skip HTTP/2 pseudo-headers
    // ... forward the rest
});

One of those bugs that exists at the protocol seam between two HTTP versions. The Java HTTP client gives you the HTTP/2 headers in their HTTP/2 form, and you're shipping them to a client that may or may not be reading HTTP/2 framing. Filter aggressively.

Trap 5: making it forget-about-it-able

A foreground proxy in a terminal works for a demo. For daily use, developers want the proxy running quietly in the background so they can open a new terminal and run claude immediately. So the CLI grew a --detach mode:

Spawn a child JVM running the proxy in the foreground.
Inherit env (so the license key propagates).
Redirect stdout/stderr to ~/.promptcape/proxy.log.
Write the child PID to ~/.promptcape/proxy.pid.
Wait up to 5 seconds for the port to come up, then exit.

ProcessBuilder pb = new ProcessBuilder(cmd);
pb.redirectErrorStream(true);
pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile.toFile()));
pb.redirectInput(new File(isWin ? "NUL" : "/dev/null"));

Process child = pb.start();
Files.writeString(pidFile, String.valueOf(child.pid()));
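The last step in the list, waiting for the port to come up, is not shown above. One way to do it is a simple connect-and-retry loop. This is a sketch under assumptions: PortWaiter and waitForPort are hypothetical names, and the 200 ms connect timeout and 100 ms retry delay are arbitrary choices. Only the 5-second budget and port 8077 come from the text.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

final class PortWaiter {
    // Poll until something accepts TCP connections on localhost:port, or the timeout expires.
    static boolean waitForPort(int port, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (deadline - System.nanoTime() > 0) {
            try (Socket probe = new Socket()) {
                probe.connect(new InetSocketAddress("127.0.0.1", port), 200);
                return true; // the child is listening
            } catch (IOException notUpYet) {
                Thread.sleep(100); // child JVM still starting; retry shortly
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean up = waitForPort(8077, Duration.ofSeconds(5));
        System.out.println(up ? "proxy is up" : "proxy did not start in time");
    }
}
```

If the wait times out, the parent can point the user at the log file rather than leaving them with a proxy that silently failed to start.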

Plus a --stop that's idempotent (returns 0 if the proxy is already gone — a stale PID file isn't an error), and a --logs that tails the log file with the rotation handling you'd expect.

These are the kinds of features users discover they need three days in. "How do I see what the proxy is doing without restarting it in the foreground?" — --logs. "I don't remember if the proxy is running, can I just run --stop to be safe?" — yes, it's idempotent. None of this is technically deep, but skipping it makes the tool feel rough.
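An idempotent --stop can be sketched with the standard ProcessHandle API. This is an illustration, not the PromptCape implementation: ProxyStopper is a hypothetical name, and the sketch deliberately ignores PID reuse. A production version would likely check that the process behind the PID really is the proxy before terminating it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ProxyStopper {
    // Returns 0 whether or not a proxy was running: missing, stale,
    // or unreadable PID files are not treated as errors.
    static int stop(Path pidFile) throws IOException {
        if (!Files.exists(pidFile)) {
            return 0; // nothing recorded, nothing to stop
        }
        String raw = Files.readString(pidFile).trim();
        try {
            long pid = Long.parseLong(raw);
            // ProcessHandle.of returns an empty Optional when no such process exists.
            ProcessHandle.of(pid)
                .filter(ProcessHandle::isAlive)
                .ifPresent(ProcessHandle::destroy);
        } catch (NumberFormatException corruptPidFile) {
            // Fall through and clean the file up.
        }
        Files.deleteIfExists(pidFile);
        return 0;
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Path.of(System.getProperty("user.home"), ".promptcape", "proxy.pid");
        System.out.println("exit code: " + stop(pidFile));
    }
}
```

Because every path returns 0, running it "just to be safe" from a script never breaks the script.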

The Cursor angle: there is no Cursor angle

Here's the punchline. Once you have a localhost reverse proxy and a client that respects ANTHROPIC_BASE_URL, integrating with Cursor isn't a feature. It's the absence of one.

The workflow inside Cursor:

Open Cursor.
Open the built-in terminal (Ctrl+`).
Run promptcape proxy --detach (or have it running already from a startup script).
Run ANTHROPIC_BASE_URL=http://localhost:8077 claude — or just claude if you exported the env var.
Use Claude Code normally.

There is no Cursor plugin to install. There is no JSON config to edit. There is no .cursorrules file to set up. The terminal is just a shell, the shell respects environment variables, the env var changes the API endpoint, the proxy does the rest.

That's the win. The integration cost — for me, for the user, for every future IDE — collapsed to nothing.

You can wrap it up as a small launcher script. I called mine pcc (PromptCape Claude). It does export ANTHROPIC_BASE_URL=...; exec claude "$@". Three lines. The user types pcc instead of claude and everything is obfuscated end to end.

Why this generalizes

I think the broader takeaway is worth more than the specific implementation:

If a tool you want to integrate with reads an HTTP endpoint, write a reverse proxy before you write a plugin. The endpoint is the integration point. The plugin is at best a config helper around the same indirection.

This applies far beyond AI tooling. Anything that talks to a SaaS API and respects a base URL — analytics, observability, payments — can be sandboxed, intercepted, transformed, or replayed with the same pattern. Plugins are per-IDE; proxies are per-protocol. Per-protocol wins.

The specific lesson for AI tooling: the prompt and the workspace are different channels. Translating the workspace (the file system the AI reads through tools) and translating the prompt (the human-typed text) are two different problems. Conflate them and you double-obfuscate. Keep them separate (type-tagged content blocks make this trivial) and the proxy stays small.

If you want to see the proxy code in full, the streaming SSE processor, and the conversation samples (real-name in, obfuscated-name on the wire, real-name back), the worked examples are in gitlab.com/gbreton7/promptcape-docs. This is the third and last article of the PromptCape series — obfuscation pipeline, 3-way merge, transparent proxy. MRs welcome on the docs repo if you've integrated this pattern with an IDE I haven't tried.

Reverse-applying AI changes to obfuscated code: a 3-way merge that actually works

Genevieve Breton — Tue, 19 May 2026 20:02:23 +0000

In the last article I went through what breaks when you obfuscate Java code before sending it to an AI assistant — Spring Data, JPA, Lombok, the whole framework iceberg. That was about getting the obfuscated source out in a state the AI can work on.

This one is about the much subtler half: getting the AI's changes back in.

It looks trivial. You sent Cls_a1b2c3d4 to the AI, the AI returned a modified Cls_a1b2c3d4, you have a mapping table, just walk the file and replace each obfuscated identifier with its original. Done in twenty lines of code.

Except your real file — the one a human will read tomorrow morning — now has no comments, no Javadoc, no blank lines between methods, no formatting choices you made over six months. The obfuscation pipeline stripped all of that on the way out. Reversing the rename doesn't bring it back.

This is the story of why naive reverse-translation is wrong, why it's a 3-way merge problem, not a translation problem, and what the merge actually has to handle in practice.

The naive reverse

Here's what most people try first:

String aiOutput = readAiResponse();
String realSource = aiOutput;
for (Mapping m : mappings) {
    realSource = realSource.replace(m.obfuscated(), m.real());
}
writeRealFile(realSource);

Set aside that you also need word-boundary regex and longest-match-first ordering to avoid prefix collisions — assume you handled all of that. The output is still wrong.
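For reference, the word-boundary and longest-match-first handling set aside here might look like the sketch below. It is one possible approach, not the article's code: IdentifierTranslator is a hypothetical name, and it does the whole translation in a single regex pass so that one replacement can never be re-matched by a later one.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class IdentifierTranslator {
    // Replace every whole-identifier occurrence of a mapping key with its value.
    static String translate(String source, Map<String, String> mapping) {
        if (mapping.isEmpty()) {
            return source;
        }
        // Longest keys first, so Cls_a1b2 is tried before its prefix Cls_a1.
        String alternation = mapping.keySet().stream()
            .sorted(Comparator.<String>comparingInt(String::length).reversed())
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        // The lookarounds reject matches embedded in a longer Java identifier.
        Pattern p = Pattern.compile(
            "(?<![A-Za-z0-9_$])(?:" + alternation + ")(?![A-Za-z0-9_$])");
        Matcher m = p.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(mapping.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of(
            "Cls_a1", "Invoice",
            "Cls_a1b2", "InvoiceService");
        System.out.println(translate("new Cls_a1b2(); Cls_a1 x; xCls_a1", mapping));
        // prints "new InvoiceService(); Invoice x; xCls_a1"
    }
}
```

Even with all of this in place, the result is only a correct rename. It does nothing for the information the pipeline removed before sending.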

Why? Because the file you sent to the AI was not just renamed. It was also:

Comment-stripped. Sending Javadoc and inline comments to the AI is gratuitous leakage — they contain plain-English domain language. So they get replaced with blank-equivalent lines before transmission.
Reformatted in subtle ways. Multi-line string literals get sanitized. Annotations on separate lines get coalesced. Blank lines are preserved but only by accident.

When you reverse-translate the AI's output and write it back, you're overwriting your real source with the obfuscation-pipeline-shaped version of itself, plus the AI's changes. Every comment you wrote is gone. Every formatting choice. Every blank line at the right place.

The first time I ran this end-to-end on a real project, I tested it on a service class. The AI added one method. I diffed the result against my source: 312 lines changed. One of them was the AI's new method. The other 311 were comments and formatting I had just nuked.

The mental shift: it's a merge, not a translation

Here's the model that finally clicked. The obfuscated file is not the canonical version of your source. It is a projection of your source — one that lost information on purpose. You can't reconstruct your source from the projection alone. You need both.

In git terms: this is a 3-way merge.

Three inputs:

Snapshot: the obfuscated version of your code before the AI touched it. (Your "common ancestor.")
Cache: the obfuscated version after the AI's changes. (The "theirs" side.)
Real: your actual source file, with all comments and formatting intact. (The "ours" side.)

The output is your real file, with only the AI's changes applied.

The merge logic, in one sentence: for each line, if the AI didn't change it (snapshot line == cache line), keep your real line; if the AI changed it, de-obfuscate the cache line and use that.

Stated like that, it's almost obvious. The implementation has interesting corners.

The easy case: same line count

When the AI modifies lines without adding or removing any, the line indices line up across all three files. The merge is one pass:

String[] snapshotLines = snapshot.split("\n", -1);
String[] cacheLines = cache.split("\n", -1);
String[] realLines = real.split("\n", -1);

StringBuilder out = new StringBuilder();
for (int i = 0; i < cacheLines.length; i++) {
    if (i > 0) out.append('\n');
    if (snapshotLines[i].equals(cacheLines[i])) {
        // AI didn't touch this line — keep the real version (with comments, formatting)
        out.append(realLines[i]);
    } else {
        // AI changed this line — de-obfuscate it
        out.append(deobfuscate(cacheLines[i]));
    }
}

That's it. The whole trick is the snapshotLines[i].equals(cacheLines[i]) check. Equal obfuscated lines mean the AI didn't write here, so your real line (comments, blank lines, formatting) survives untouched. Only the lines the AI actually changed get the de-obfuscated translation.

This single trick made the merge usable. On a typical edit (the AI adds a parameter, changes a return type, inserts a guard clause), it touches 5–20 lines and the rest of the file stays bit-for-bit identical to my source. No phantom formatting changes, no destroyed Javadoc.

The hard case: AI added or removed lines

When the AI adds an if block or removes a redundant method, line counts diverge between snapshot and cache. Now indices don't line up — line N of the cache might correspond to line N+3 of the snapshot, or to nothing at all.

You can pull in java-diff-utils and run a real LCS-based diff here. I tried that first. It works, but it adds a dependency, the diff format needs translation, and for the size of edits the AI typically makes (5–50 lines), a homegrown linear walker is faster and easier to reason about.

The walker keeps three indices — one per file — and decides at each step whether the current cache line is an unchanged line (advance all three), a modification (advance all three, but de-obfuscate the cache line), or an insertion (advance only the cache index):

int si = 0, ci = 0, ri = 0;
while (ci < cacheLines.length) {
    // every iteration emits exactly one output line, so add the separator here
    if (ci > 0) out.append('\n');
    if (si < snapshotLines.length && snapshotLines[si].equals(cacheLines[ci])) {
        // unchanged → keep real
        out.append(realLines[ri]);
        si++; ci++; ri++;
    } else {
        // changed: modification or insertion?
        boolean isInsertion = false;
        if (si < snapshotLines.length) {
            // does the current snapshot line reappear within the next 50 cache lines?
            for (int look = ci + 1; look < cacheLines.length && look < ci + 50; look++) {
                if (snapshotLines[si].equals(cacheLines[look])) {
                    isInsertion = true;
                    break;
                }
            }
        }
        if (isInsertion) {
            // AI inserted a new line before the next snapshot line
            out.append(deobfuscate(cacheLines[ci]));
            ci++;
        } else {
            // AI modified or replaced this line
            // (as written, a deleted snapshot line also lands in this branch)
            out.append(deobfuscate(cacheLines[ci]));
            si++; ci++; ri++;
        }
    }
}

The 50-line look-ahead window deserves a comment. It's the heuristic that decides "did the AI insert new code before this snapshot line, or did it modify this snapshot line?" If the next snapshot line shows up within the next 50 cache lines, treat the current cache line as an insertion. Otherwise treat it as a modification.

Why 50? Two reasons:

Cost. A full O(N²) LCS on a 2000-line file does ~4M comparisons. Bounded look-ahead caps each step at 50 comparisons → 100k total. On a real file this is microseconds.
Realism about edit shapes. AI assistants rarely insert 50+ contiguous lines without also modifying surrounding code. When they do, you're outside "merge a small edit" territory and you should be re-running the obfuscation pipeline on the result anyway.

It is a heuristic. On pathological inputs (the AI rewrites the entire file), it degrades to "treat everything as modification" which produces a usable but heavily de-obfuscated file. That's the right failure mode — you'll lose formatting on the affected stretch, but you won't lose data.
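To experiment with the walker in isolation, here is a minimal self-contained sketch of the same loop. The class name LookAheadMergeSketch, the window parameter, and the pluggable deobfuscate function are assumptions made for illustration; they are not PromptCape's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class LookAheadMergeSketch {

    /**
     * Walks the cache (obfuscated, post-AI) against the snapshot (obfuscated, pre-AI)
     * and emits real lines for unchanged content, de-obfuscated lines otherwise.
     */
    static List<String> merge(String[] snapshot, String[] cache, String[] real,
                              UnaryOperator<String> deobfuscate, int window) {
        List<String> out = new ArrayList<>();
        int si = 0, ci = 0, ri = 0;
        while (ci < cache.length) {
            boolean inSnapshot = si < snapshot.length;
            if (inSnapshot && ri < real.length && snapshot[si].equals(cache[ci])) {
                out.add(real[ri]);               // unchanged: keep the real line
                si++; ci++; ri++;
                continue;
            }
            // Changed line: does the pending snapshot line reappear inside the window?
            boolean isInsertion = false;
            if (inSnapshot) {
                int limit = Math.min(cache.length, ci + window);
                for (int look = ci + 1; look < limit; look++) {
                    if (snapshot[si].equals(cache[look])) {
                        isInsertion = true;
                        break;
                    }
                }
            }
            out.add(deobfuscate.apply(cache[ci]));
            ci++;                                // an insertion consumes only a cache line
            if (!isInsertion) { si++; ri++; }    // a modification consumes all three
        }
        return out;
    }
}
```

Calling merge with a window of 50 mirrors the loop above. Returning a list of lines instead of appending to a buffer leaves the choice of line separator to the caller.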

Three traps that the test suite found

The merge in the previous sections is the version that works. Getting there involved walking face-first into a few traps that aren't obvious from the algorithm.

Trap 1: snapshot and cache disagree on line count even when the AI didn't add lines

Early on I was confused by failures that looked like the AI had inserted lines, when the diff in the assistant's output clearly hadn't.

What was happening: the snapshot had been written months earlier, when the obfuscation pipeline's comment-stripping pass replaced multi-line /* ... */ comments with a single empty line. The current version replaces each line of the comment with its own empty line — preserving line count. So a snapshot from version v1 and a cache from version v2 could disagree by dozens of lines for the same source file, just because of comment-stripping format drift.

The fix: when snapshot and cache disagree on line count, re-obfuscate the real file on the fly to get a fresh snapshot in the current format, and use that as the merge ancestor. This applies only to .java files: the sanitizers for .properties, .yml, and pom.xml preserve line count by construction, so any line-count drift on those files is a genuine AI edit, not a format mismatch.

if (snapshotLines.length != cacheLines.length && obfRelativePath.endsWith(".java")) {
    String freshObfuscated = engine.obfuscateContent(realContent);
    if (freshObfuscated != null) {
        snapshotContent = freshObfuscated;
    }
}

I did not gate this on the obfuscation pipeline version — the cost of re-obfuscating one file on demand is negligible, and the alternative (storing version metadata per snapshot and migrating on read) was complexity I didn't want.

Trap 2: don't run the Java obfuscation pipeline on a .properties file

There's a sharp corner in the fix above. The re-obfuscation call is engine.obfuscateContent(realContent). That method runs the Java pipeline — JavaParser AST walk, identifier replacement, comment stripping, reflection-string post-processing.

If I run it on a .properties file, it produces a near-identity transformation (no Java identifiers to rename, no comments to strip the same way). The output is almost-but-not-quite the same as the real file. Now I have a "snapshot" that diverges from the cache on every single line, because the properties sanitizer (a different pipeline) produced the cache, while the Java pipeline produced this fresh "snapshot."

The merge then concludes that the AI rewrote every line of the properties file, and helpfully writes the sanitized (REDACTED placeholder) values back into the real application.properties. That's not a corrupted file — that's a data exfiltration risk inverted: the redaction now overwrites the real secret.

The .endsWith(".java") guard above isn't a perf optimization. It's a correctness boundary.

Trap 3: AI-created files don't have a snapshot at all

When the AI creates a new file — Cls_a1b2c3d4Test.java, say — there's no snapshot to merge against. There's no real file either. You just have the cache.

This case is simpler in some ways (full de-obfuscation of the content, no merge) but it has its own corner: the filename itself contains obfuscated identifiers. Cls_a1b2c3d4Test.java needs to become InvoiceServiceTest.java — the AI used the obfuscated class name as a prefix to a new identifier, and the path resolver has to recognize the embedded mapping.

The strategy: try a full-filename match against known class mappings first (Cls_a1b2c3d4.java → InvoiceService.java). If that fails, run the standard line de-obfuscation on the filename without extension and treat whatever comes out as the real name (Cls_a1b2c3d4Test → InvoiceServiceTest). Same for package path segments.

if (!matched && fileName.endsWith(".java")) {
    String stem = fileName.substring(0, fileName.length() - 5);
    fileName = engine.deobfuscateLine(stem) + ".java";
}

The same machinery you wrote to de-obfuscate file contents solves the file path problem if you feed it the path as a string. Once I noticed this, several other corner cases I'd been hand-rolling (paths in stack traces, file references in error messages) collapsed into the same call.
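A minimal sketch of that two-step filename strategy follows. It assumes the mapping is available as a plain Map from obfuscated token to real name, and that tokens have the shape "prefix + 8 hex characters" seen in the examples here. FileNameResolverSketch and resolve are hypothetical names, and the real engine's deobfuscateLine may work differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameResolverSketch {

    // Assumed token shape, inferred from examples such as Cls_a1b2c3d4.
    private static final Pattern TOKEN = Pattern.compile("(?:Cls|mtd|fld|pkg)_[0-9a-f]{8}");

    static String resolve(String fileName, Map<String, String> mapping) {
        if (!fileName.endsWith(".java")) {
            return fileName;
        }
        String stem = fileName.substring(0, fileName.length() - ".java".length());

        // Step 1: the whole stem is a known obfuscated class name.
        String exact = mapping.get(stem);
        if (exact != null) {
            return exact + ".java";
        }

        // Step 2: the stem embeds a known token (Cls_a1b2c3d4Test), so replace
        // each token in place and keep whatever the AI added around it.
        Matcher m = TOKEN.matcher(stem);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String realName = mapping.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(realName));
        }
        m.appendTail(out);
        return out + ".java";
    }
}
```

The token pattern deliberately avoids \b word boundaries: in Cls_a1b2c3d4Test the token is followed by a letter, so a boundary-anchored match would never fire.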

What about deletions?

The AI sometimes "cleans up" by deleting a file. I do not auto-apply deletions. The merge reports them — applied: 3, created: 1, deletedByAi: 1 — and the developer decides whether to follow through.

This is not a technical limitation. It's a deliberate asymmetry. An accidentally created file is one rm (or git clean) away from gone. An accidentally deleted file that the developer hadn't committed yet is unrecoverable. The merge plays defense on the irreversible side.

Why this generalizes

I started building this for obfuscation because I had to. But the pattern — projection, transformation in the projected space, merge back into the original — shows up in a lot of places once you look for it:

Source maps in JavaScript bundlers. The bundled file is a projection; the original sources are the real version; you map errors back via the source map.
AST-based refactoring tools. The AST is a projection; the textual source has comments and formatting the AST doesn't; round-tripping requires a 3-way merge.
Notebook → script extraction and back. Strip cells to a script for review; merge edits back into the notebook.
Anything that asks an LLM to edit code with stripped context. Hide secrets, hide proprietary names, hide internal comments — and now you own a merge problem.

The takeaway from six months of breaking my own merge: don't think of the projection as a translation. Think of it as a branch. Once you call it a branch, you stop trying to invent a clever inverse and you start writing a merge — which is a problem the industry has spent decades solving.

If you want to see the merge running on real Java edits (including the tricky cases), the example diffs and test fixtures live in gitlab.com/gbreton7/promptcape-docs. It's the docs and worked-examples companion to PromptCape, the obfuscation proxy I'm building for Claude Code and Cursor. MRs welcome if you've run into a merge case I haven't.

Java Code Obfuscation for AI Assistants: Ensuring the Full Cycle Works

Genevieve Breton — Mon, 04 May 2026 18:14:55 +0000

How to obfuscate Java code for AI coding tools while guaranteeing that compilation, tests, and reverse-application all succeed.

The problem

AI coding assistants (Claude Code, Cursor, GitHub Copilot) need access to your source code to help you. But sending proprietary code to an LLM means exposing your business domain, architecture, intellectual property, configuration data, and sometimes even personal data.

Code obfuscation can solve this: rename identifiers before the AI sees the code, let the AI work on the obfuscated version, then reverse the changes back. Simple in theory. In practice, Java's rich ecosystem of frameworks, annotations, and conventions makes this a minefield.

This article describes what a Java obfuscation tool must handle to guarantee the full cycle:

Source compiles & tests pass
    -> Obfuscation
        -> AI modifies code
            -> Obfuscated code compiles & tests pass
                -> De-obfuscation (apply)
                    -> Source compiles & tests pass

Each transition can break. Here is what you need to address at each step, and how PromptCape solves it.

Step 1: Source -> Obfuscation

1.1 What to rename

A Java obfuscator for AI must rename:

Element	Example	Why
Package names	`com.acme.billing` -> `pkg_a1b2c3d4`	Reveals company and domain
Class names	`InvoiceService` -> `Cls_e5f6a7b8`	Reveals business concepts
Method names	`calculateDiscount` -> `mtd_1a2b3c4d`	Reveals business logic
Field names	`customerName` -> `fld_9e8d7c6b`	Reveals data model
Comments	`// Apply VAT to invoice` -> `// Processed.`	Reveals business context
Javadoc	`/** Calculates the total with tax */` -> `/** Processed. */`	Same
Config values	`jdbc:postgresql://prod.acme.com` -> `REDACTED`	Reveals infrastructure

1.2 What NOT to rename

This is where most naive approaches fail. The following must be preserved:

JDK types and methods: String, List, Map, Optional, toString, equals, hashCode, main, stream, forEach...

Framework annotations: @Autowired, @Entity, @RestController, @GetMapping, @JsonProperty, @Data, @Builder...

Framework-specific identifiers that carry semantic meaning for the framework at runtime:

Framework	What breaks if renamed	Example
Spring Data JPA	Derived query methods	`findByActiveTrue()` -> the method name IS the query. Renaming it to `mtd_xxx` makes Spring fail with "No property mtd found"
JPA/Hibernate	Entity names in JPQL	`@Query("SELECT e FROM Invoice e")` — the string `Invoice` must match the entity class name
Lombok	Generated accessor names	`@Data` generates `getName()` from field `name`. If `name` is renamed to `fld_xxx`, Lombok generates `getFld_xxx()` — but code calling `getName()` is also renamed to `getMtd_xxx()`
Jackson	JSON field mapping	`@JsonProperty` fields, or fields in DTOs in `model`/`dto` packages — renaming breaks serialization/deserialization
Spring Config	Property binding	`@ConfigurationProperties` binds YAML keys to field names
Bean Validation	Field references	`@NotBlank` on a field — the constraint message references the field name

The solution: framework detection (Pass 0). Before collecting identifiers, scan the entire project for framework annotations and produce exclusion rules. Each framework has a dedicated detector:

Project scan -> LombokDetector       -> exclude fields + get/set/is accessors
             -> SpringDataDetector   -> exclude findByXxx, countByXxx, existsByXxx methods
             -> JacksonDetector      -> exclude @Entity/@JsonProperty fields
             -> JpaHibernateDetector -> exclude @MappedSuperclass/@Embeddable fields
             -> SpringConfigDetector -> exclude @ConfigurationProperties fields
             -> ValidationDetector   -> exclude @NotBlank/@Min/@Size fields
             -> OpenApiDetector      -> exclude @Schema/@Operation fields and methods
             -> SpringBootDetector   -> track @SpringBootApplication for test fixing
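To make the detector idea concrete, here is a hedged sketch of the simplest rule above: recognizing Spring Data derived query methods by name. It works on plain method-name strings rather than an AST, and it covers only the three prefixes listed in the diagram. Spring Data accepts more forms (for example find...By with a qualifier in between), so treat the pattern as a starting point. DerivedQuerySketch is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class DerivedQuerySketch {

    // findByXxx, countByXxx, existsByXxx: the method name itself is the query.
    private static final Pattern DERIVED =
            Pattern.compile("^(find|count|exists)By[A-Z]\\w*$");

    static boolean isDerivedQuery(String methodName) {
        return DERIVED.matcher(methodName).matches();
    }

    /** Returns the method names that must be excluded from renaming. */
    static Set<String> exclusions(List<String> repositoryMethodNames) {
        Set<String> excluded = new LinkedHashSet<>();
        for (String name : repositoryMethodNames) {
            if (isDerivedQuery(name)) {
                excluded.add(name);
            }
        }
        return excluded;
    }
}
```

In a real pass this check would only run on methods declared in repository interfaces, so that an unrelated findByIndex helper elsewhere in the code base still gets renamed.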

1.3 String literals: a hidden trap

Code replacement must skip string literals to avoid breaking values like "Hello World" or "/api/v1/users". But some strings DO reference identifiers:

Context	String content	Must be updated?
`@Query("SELECT e FROM Invoice e")`	JPQL entity name	Yes
`Class.forName("com.acme.InvoiceService")`	Fully qualified class name	Yes
`getMethod("calculateTotal")`	Reflection method name	Yes
`@ComponentScan("com.acme.service")`	Package name	Yes
`"Hello World"`	User-facing string	No
`"/api/v1/invoices"`	REST endpoint	No

The obfuscator must apply identifier replacement INSIDE specific string contexts while leaving general strings untouched. This requires post-processing passes for @Query, reflection calls, and package annotations.
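As an illustration of one such post-processing pass, the sketch below rewrites entity names only inside @Query("...") strings and leaves every other string literal alone. It is a regex-based approximation under simplifying assumptions (a single-line annotation, no escaped quotes inside the JPQL), and QueryStringSketch is a hypothetical name, not PromptCape's implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryStringSketch {

    // Group 1: opening '@Query("', group 2: the JPQL text, group 3: closing '")'.
    private static final Pattern QUERY =
            Pattern.compile("(@Query\\(\\s*\")([^\"]*)(\"\\s*\\))");

    static String rewriteQueryStrings(String source, Map<String, String> entityNames) {
        Matcher m = QUERY.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String jpql = m.group(2);
            for (Map.Entry<String, String> e : entityNames.entrySet()) {
                // Whole-word replacement so "Invoice" does not touch "InvoiceLine".
                jpql = jpql.replaceAll("\\b" + Pattern.quote(e.getKey()) + "\\b",
                        Matcher.quoteReplacement(e.getValue()));
            }
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + jpql + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Given the mapping Invoice -> Cls_e5f6a7b8, the annotation @Query("SELECT e FROM Invoice e") is rewritten while a literal such as "Invoice sent" on another line stays untouched.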

1.4 Comment stripping and special characters

Comments contain business context that reveals your domain. But stripping them introduces two problems:

Line count changes: A multi-line Javadoc becomes a single-line /** Processed. */, breaking line-number correspondence between obfuscated and original files.
Special characters in comments: Comments written in French (and other languages) contain apostrophes (// Service d'injection), accented characters, and other non-ASCII text. A character-by-character scanner that treats ' as a Java char literal delimiter will be confused by d'injection, potentially skipping code after the comment.

Solution: Process comments before string/char literal scanning. Replace line comments (//) in-place (one line in, one line out). For multi-line Javadoc and block comments, accept the line count change and handle it during the reverse-apply step with a 3-way merge.
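The line-comment half of that solution can be sketched in a few lines. The scanner below tracks string and char literals only until it meets a // outside of them, then replaces the rest of the line, so an apostrophe inside the comment is never interpreted. It handles one line at a time and ignores block comments and text blocks. LineCommentSketch is a hypothetical name.

```java
final class LineCommentSketch {

    /** Replaces a trailing // comment with a neutral marker, one line in, one line out. */
    static String stripLineComment(String line) {
        boolean inString = false;
        boolean inChar = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString || inChar) {
                if (c == '\\') {
                    i++;                       // skip the escaped character
                } else if (inString && c == '"') {
                    inString = false;
                } else if (inChar && c == '\'') {
                    inChar = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '\'') {
                inChar = true;
            } else if (c == '/' && i + 1 < line.length() && line.charAt(i + 1) == '/') {
                // Everything from here on is comment text: apostrophes and
                // accented characters in it are never scanned.
                return line.substring(0, i) + "// Processed.";
            }
        }
        return line;
    }
}
```

Because the function returns exactly one line for each input line, applying it to every line of a file keeps line numbers aligned between the real and obfuscated versions.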

Step 2: Obfuscated code -> AI modification -> Compilation & tests

2.1 The obfuscated code must compile

This seems obvious but is surprisingly hard. Even with framework detection, some identifiers cause compilation failures that can only be detected by actually compiling. Examples:

A method name that collides with a JDK method after obfuscation
A field name that matches a Java keyword
An annotation processor that generates code based on identifier names

Solution: auto-fix loop. Compile the obfuscated code. If it fails, parse the compiler errors, reverse-map the broken identifiers, add them to an exclusion list, and re-obfuscate. Repeat until green or max iterations reached. Persist exclusions for future runs.

Obfuscate -> Compile -> Parse errors -> Exclude broken identifiers -> Re-obfuscate -> Compile -> ...
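The control flow of that loop is small enough to sketch. In the version below, the whole "obfuscate, compile, parse errors, reverse-map" step is abstracted behind a single function that takes the current exclusions and returns the original names of the identifiers that broke the build. That function, and the name AutoFixLoopSketch, are assumptions for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class AutoFixLoopSketch {

    /**
     * @param build runs one obfuscate + compile cycle with the given exclusions and
     *              returns the original names implicated in compiler errors
     *              (an empty list means the build is green)
     * @return the exclusions that produced a green build, ready to be persisted
     */
    static Set<String> run(Function<Set<String>, List<String>> build, int maxIterations) {
        Set<String> exclusions = new LinkedHashSet<>();
        for (int i = 0; i < maxIterations; i++) {
            List<String> broken = build.apply(Set.copyOf(exclusions));
            if (broken.isEmpty()) {
                return exclusions;
            }
            // If every reported identifier is already excluded, another round
            // cannot help: the failure is not caused by renaming.
            if (!exclusions.addAll(broken)) {
                throw new IllegalStateException("No progress: errors not fixable by exclusion");
            }
        }
        throw new IllegalStateException("Still failing after " + maxIterations + " iterations");
    }
}
```

The "no progress" check is a design choice of this sketch: it stops the loop early when the build fails for a reason that exclusions cannot address, instead of burning through the remaining iterations.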

2.2 Tests must pass on obfuscated code

Compilation is necessary but not sufficient. Tests exercise the runtime behavior where framework conventions matter most:

Spring context loading: @SpringBootTest boots the full application context. A broken repository method or missing bean crashes the entire test suite.
Spring Data query derivation: happens at context startup, not at compile time.
JPA schema generation: Hibernate creates tables from @Entity classes. If JPQL @Query strings reference the original entity name but the class is renamed, the context fails.
H2 compatibility: Test profiles often use H2 instead of PostgreSQL. Database-specific types (JSONB, ARRAY) in column definitions fail on H2 regardless of obfuscation.

Key insight: If the source tests pass and the obfuscated tests don't, the obfuscation broke something. To catch these failures, the auto-fix loop should use mvn test as the build command; mvn test-compile only compiles the test sources, so it catches compile errors in tests but not context-startup failures.

2.3 The AI must be able to work effectively

The AI needs to:

Read and understand the code structure (even with obfuscated names)
Create new files, classes, and methods
Modify existing code
Run builds and tests to verify its work

The obfuscated names should be deterministic (same input always produces the same hash) so the AI can learn patterns across files. Prefixes (Cls_, mtd_, fld_, pkg_) help the AI understand the identifier type.
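One way to get deterministic names is a keyed hash. The sketch below uses HMAC-SHA256 from the JDK and keeps the first 4 bytes as 8 hex characters, which matches the shape of the names shown in this article. The truncation length, the key handling, and the name DeterministicNameSketch are assumptions, not a description of PromptCape's internals.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class DeterministicNameSketch {

    /** Same prefix, identifier, and key always yield the same obfuscated name. */
    static String name(String prefix, String identifier, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(prefix);
            for (int i = 0; i < 4; i++) {          // 4 bytes -> 8 hex characters
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }
}
```

Using a keyed hash rather than a plain SHA-256 means someone who sees only the obfuscated names cannot confirm a guess such as InvoiceService by hashing it themselves. With only 32 bits kept, a real tool also needs a collision check across the project.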

Step 3: De-obfuscation (apply) -> Source compiles & tests pass

This is where most obfuscation tools stop — they handle the forward direction but not the reverse. For AI coding, the reverse is just as critical.

3.1 Only apply what the AI changed

The naive approach: read the obfuscated file, de-obfuscate all identifiers, overwrite the real file. This breaks because:

Comments were stripped during obfuscation. The de-obfuscated file has /** Processed. */ where the original had full Javadoc. If the AI didn't touch that line, the original comment should be preserved.
Formatting may differ. The obfuscated file may have different whitespace or line endings.

Solution: 3-way merge. Compare the snapshot (obfuscated, pre-AI) with the cache (obfuscated, post-AI) line by line:

Lines unchanged by the AI -> keep the original source line
Lines modified by the AI -> de-obfuscate the new version

Snapshot line == Cache line?
    Yes -> keep original source line (preserves comments, formatting)
    No  -> de-obfuscate cache line (AI changed it)

For added/removed lines, use chunk-based alignment to find sync points and apply the changes surgically.

3.2 Handle AI-generated variable names

When the AI creates a new variable for an obfuscated class, it invents a name based on what it sees:

// AI writes:
private Cls_f45371c4 fld_f45371c4;

// Standard de-obfuscation produces:
private ZipBuilderService fld_f45371c4;  // class de-obfuscated, but variable name is unreadable

The variable name fld_f45371c4 is not in the mapping registry — the AI invented it. But the hash f45371c4 matches the known class ZipBuilderService.

Solution: After standard de-obfuscation, scan for remaining fld_XXXXXXXX/Cls_XXXXXXXX/mtd_XXXXXXXX patterns. If the hash matches a known entry, generate a camelCase variable name:

private ZipBuilderService zipBuilderService;  // readable

Track each unique token across the file to ensure consistent renaming (declaration and all usages get the same name).
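A hedged sketch of that resolution step: after the standard pass, look for leftover fld_/mtd_ tokens, look their hash up in a hash-to-class-name table, and substitute a camelCase name. The lookup table and the name InventedNameSketch are hypothetical. Leftover Cls_ tokens are left out here because they would need a type name rather than a variable name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InventedNameSketch {

    // Tokens the mapping registry did not know: prefix + 8 hex characters.
    private static final Pattern LEFTOVER =
            Pattern.compile("\\b(?:fld|mtd)_([0-9a-f]{8})\\b");

    static String resolve(String line, Map<String, String> classNameByHash) {
        Matcher m = LEFTOVER.matcher(line);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String className = classNameByHash.get(m.group(1));
            // Unknown hash: keep the token rather than guess a name.
            String replacement = className == null ? m.group() : camelCase(className);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String camelCase(String className) {
        return Character.toLowerCase(className.charAt(0)) + className.substring(1);
    }
}
```

Because the new name is a pure function of the hash, the declaration and every usage of the same token get the same name without extra bookkeeping. A fuller version would still need to check that the generated name does not clash with an existing variable in scope.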

3.3 Don't apply build artifacts

The AI may run mvn package in the obfuscated workspace, creating target/ with compiled .class files, .jar archives, and test reports. These must be excluded from the diff detection:

Skip directories: target/, build/, node_modules/, .idea/
Skip binary files: .class, .jar, .war, images, fonts
These patterns match what the obfuscation engine already skips
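The filter itself can be sketched with java.nio.file.Path. The directory list comes from the bullets above; the concrete image and font extensions are illustrative assumptions, as is the name ArtifactFilterSketch.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

final class ArtifactFilterSketch {

    private static final Set<String> SKIPPED_DIRS =
            Set.of("target", "build", "node_modules", ".idea");

    // .class/.jar/.war are from the list above; image and font extensions are examples.
    private static final Set<String> BINARY_EXTENSIONS =
            Set.of("class", "jar", "war", "png", "jpg", "gif", "woff", "ttf");

    /** Expects a non-empty path relative to the workspace root. */
    static boolean shouldSkip(Path relativePath) {
        Path parent = relativePath.getParent();
        if (parent != null) {
            for (Path segment : parent) {        // directory segments only
                if (SKIPPED_DIRS.contains(segment.toString())) {
                    return true;
                }
            }
        }
        String name = relativePath.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && BINARY_EXTENSIONS.contains(name.substring(dot + 1));
    }
}
```

Checking only the parent segments against the directory list avoids skipping a regular source file that happens to be named build.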

3.4 Snapshot management

The apply command needs a "before" snapshot to detect what the AI changed. After a successful apply, the snapshot is updated. But if the apply fails or the user reverts with git restore, the snapshot is out of sync.

Solution:

Don't update the snapshot when the apply has errors
Provide a --reset-snapshot option that re-obfuscates the source into the snapshot directory without touching the cache

The complete cycle

Here is what must work end-to-end:

1. mvn test                       -> GREEN (source is healthy)
2. promptcape obfuscate --verify  -> Obfuscated workspace created
3. mvn test (in workspace)        -> GREEN (obfuscation didn't break anything)
4. AI modifies obfuscated code
5. mvn test (in workspace)        -> GREEN (AI changes work)
6. promptcape apply               -> Changes applied to source
7. mvn test                       -> GREEN (de-obfuscated changes work)

Each transition requires specific handling:

Transition	Challenge	Solution
1 -> 2	Framework identifiers break	Framework detection (8 detectors)
1 -> 2	Some identifiers cause compile errors	Auto-fix loop with exclusion persistence
2 -> 3	JPQL strings reference original names	Post-processing: replace entity names in `@Query`
2 -> 3	Reflection strings reference original names	Post-processing: replace in `getMethod()`, `forName()`
2 -> 3	Spring Data query derivation fails	Repository method name protection
4 -> 5	AI must understand the code	Deterministic naming, type prefixes
5 -> 6	Comments stripped during obfuscation	3-way merge (only apply AI-changed lines)
5 -> 6	AI invents unreadable variable names	Hash-based name resolution
5 -> 6	Build artifacts in workspace	Directory and binary file filtering
6 -> 7	Applied changes don't compile	User review + re-apply capability

What PromptCape implements

PromptCape is a Java-first obfuscation tool designed for this exact cycle. Here is what it covers today:

Obfuscation engine:

AST-based identifier collection via JavaParser (packages, classes, methods, fields, enums, records)
Deterministic HMAC-SHA256 naming with type prefixes
Package hierarchy flattening
Word-boundary replacement (\b) with longest-match-first ordering
String literal preservation with post-processing for @Query, reflection, @ComponentScan
Full comment stripping (Javadoc, block, and line comments)
POM, properties, YAML, and XML file sanitization

Framework detection (8 detectors):

Lombok: field + accessor protection
Spring Boot: application class tracking, test annotation fixing
Spring Data: repository derived query method protection
JPA/Hibernate: entity field protection, JPQL entity name replacement
Jackson: DTO/entity field protection
Spring Config: property-bound field protection
Validation: constraint field protection
OpenAPI: schema field and method protection

Auto-fix:

Compile-and-fix loop with configurable build command
Compiler error parsing and reverse mapping
Persistent exclusion lists across runs
Source verification option

Reverse application:

3-way merge (preserve original lines for unchanged content)
AI-generated variable name resolution (hash-based)
Build artifact and binary file exclusion
Snapshot management with reset capability

Two modes:

CLI workspace (obfuscate -> AI works -> apply)
HTTP proxy (transparent interception for IDE-based tools — see below)

Metrics:

Final identifier and duration counters at the end of every run, for instance:

+-------------------------------+----------+
| Final Summary                 |          |
+-------------------------------+----------+
| Iterations                    |       4  |
| Identifiers obfuscated        |    3287  |
| Packages (flattened)          |      74  |
| Exclusions loaded (previous)  |       0  |
| Exclusions added (this run)   |     152  |
| Exclusions total              |     152  |
| Verification time             |  106,1s  |
| Total time                    |  224,5s  |
+-------------------------------+----------+
| Compilation                   |    OK    |
+-------------------------------+----------+

Seamless IDE integration

The obfuscation cycle described above can run as a one-shot CLI workflow, but friction kills adoption. Developers don't want to leave their IDE, run promptcape obfuscate, switch to a workspace folder, ask the AI to do something, then run promptcape apply and switch back. They want the assistant they already use, in the IDE they already use, with the obfuscation invisible.

PromptCape provides this via an HTTP proxy mode that intercepts traffic to the AI provider and applies the same forward/reverse cycle on the fly:

IDE -> Claude Code -> [PromptCape proxy] -> Anthropic API
                          obfuscates the prompt going out
                          de-obfuscates the response coming back

The "PromptCape Claude" terminal in Cursor

The simplest integration is a dedicated terminal profile. In Cursor (and equally in VS Code or any IDE that supports terminal profiles), you create a profile named PromptCape Claude that:

Starts the proxy in the background if it is not already running
Sets ANTHROPIC_BASE_URL (and equivalent variables) to point Claude Code at the local proxy
Launches claude (the Claude Code CLI) inside that environment

From the developer's perspective, this is just another terminal in the IDE sidebar. They open the PromptCape Claude terminal instead of the default one, type their request to Claude as usual, and watch the AI work on their codebase. Behind the scenes:

Outbound prompt: identifiers, comments, and config values are obfuscated before leaving the machine
Inbound response: file edits, suggestions, and explanations are de-obfuscated before reaching the IDE
Build artifacts and binaries are filtered out of the cycle

No workflow change. No obfuscate or apply command to remember. The same Claude Code experience, with the obfuscation guaranteeing that what reaches the provider is not your real source code.

Why a terminal profile is the right shape for this

The CLI workspace is the right primitive — it gives full control and fits CI/CD or one-shot review use cases. But for daily AI-assisted coding, friction wins or loses the security battle. A proxy that hooks into the existing tool's trust chain (env vars, ANTHROPIC_BASE_URL) gives:

Zero training cost: developers keep using Claude Code exactly as before — same commands, same outputs
Zero forgotten steps: there is no apply to forget — the response is reverse-mapped on the wire
Per-project configuration: terminal profiles ship in .vscode/settings.json, .cursor/, or JetBrains run configurations, so opening a project pre-configures the secure terminal automatically
Auditability by default: every prompt and response transits the proxy, which can log, redact, or block on policy

The same pattern extends to any AI tool that respects a base-URL override (Cursor's built-in chat, Aider, Continue.dev, OpenAI-compatible clients, etc.). The IDE doesn't need a plugin and the AI tool doesn't need to know the proxy exists — the integration is just a terminal away.

Conclusion

Java obfuscation for AI coding assistants is not just about renaming identifiers. It requires deep understanding of how Java frameworks use naming conventions, how annotation processors derive behavior from names, and how to surgically apply AI changes without losing information that was stripped during obfuscation.

The key insight: framework detection before obfuscation is more effective than reactive error fixing after. Proactively protecting Spring Data repository methods, JPA entity fields, and Lombok-generated accessors eliminates most compilation failures before they happen.

The second insight: the reverse direction is just as hard as the forward. A 3-way merge that only applies AI-changed lines, combined with hash-based resolution of AI-invented names, makes the de-obfuscated code readable and correct.

The third insight: friction kills adoption, so the obfuscation has to disappear into the IDE. A dedicated terminal profile (the PromptCape Claude terminal in Cursor) that boots Claude Code through the proxy turns the entire cycle into a transparent operation — same tool, same commands, no extra steps. Security that requires discipline gets bypassed; security that ships as a terminal in the sidebar gets used.

PromptCape is open for trial at promptcape.com

Why Your Source Code Is at Risk When Using AI Coding Assistants, but no dev future without AI coding!

Genevieve Breton — Fri, 01 May 2026 16:41:29 +0000

Every line you send to an AI coding tool leaves your control. Here's what that means for your business, your clients, and your legal obligations.

You are sending your source code to a foreign server

When you use Claude Code, Cursor, GitHub Copilot, ChatGPT, Mistral Vibe, or any LLM-based coding assistant, your source code is sent over HTTPS to a remote API. That API runs on servers you don't control, in a jurisdiction you didn't choose, operated by a company whose data practices you've accepted by clicking "I agree."

Let's be specific about where your code goes:

Tool	API provider	Server locations
Claude Code / Cursor (Claude)	Anthropic	US (AWS us-east, us-west)
GitHub Copilot	Microsoft / OpenAI	US (Azure data centers)
ChatGPT	OpenAI	US (Azure data centers)
Cursor (OpenAI mode)	OpenAI	US
Mistral Vibe / Le Chat	Mistral AI	EU (France, via cloud providers)
DeepSeek	DeepSeek	China
Gemini Code Assist	Google	US (GCP data centers)

Most developers don't think twice about this. They open their IDE, the AI suggests code, they accept. Behind the scenes, the IDE sent the contents of the current file — and often surrounding files, imports, and project context — to a server thousands of kilometers away.

What exactly is being sent?

It's not just "a few lines of code." Modern AI coding tools send rich context to produce better suggestions:

The current file — full content, not just the cursor position
Open tabs and imported files — the AI reads your project structure
File paths — revealing your package hierarchy (com.acme.billing.service.InvoiceService)
Configuration files — application.yml, pom.xml, .env with database URLs, API keys, internal hostnames
Comments and Javadoc — containing business logic descriptions, TODO items, bug references
Test files — revealing edge cases, business rules, validation logic
Git context — commit messages, branch names, sometimes diffs

A single prompt to an AI coding assistant can contain more context about your business than a 10-page architecture document.

The risks are real and specific

1. Source code leakage

Your code is transmitted to and processed on third-party infrastructure. Even if the provider promises not to train on your data (and many do), the code still:

Transits through networks you don't control — intermediate proxies, load balancers, logging systems
Is stored temporarily for processing — cache layers, request logs, debugging infrastructure
May be retained for abuse detection — most providers log requests for safety monitoring
Could be subpoenaed — US providers are subject to US law enforcement requests, including the CLOUD Act which allows cross-border data access

The question is not "will the provider deliberately steal my code?" It's "how many systems touch my code between my IDE and the model, and who has access to those systems?"

2. Intellectual property exposure

Source code is a trade secret. Once exposed, trade secret protection can be lost permanently — unlike patents or copyrights, trade secrets only have value as long as they remain secret.

What your code reveals:

Element	What it exposes
Class and method names	Your business domain and capabilities (`FraudDetector`, `TaxCalculator`, `PatentAnalyzer`)
Package structure	Your architecture and module boundaries
Algorithm implementations	Your competitive advantage (pricing logic, recommendation engines, risk models)
Database schema	Your data model and relationships
API endpoints	Your service surface and capabilities
Configuration	Your infrastructure topology
Comments	Your business rules in plain language

A competitor with access to your AI provider's logs could reconstruct your product's architecture, business rules, and technical approach without ever seeing your actual repository.

3. Client code exposure (integrators and freelancers)

If you're a consulting firm, systems integrator, or freelance developer, the risk multiplies. You're not just exposing your own code — you're exposing your client's code.

Consider the scenarios:

You customize an ERP for a bank. You send controller code to Claude that contains transaction processing logic, compliance rules, and internal API endpoints. That code belongs to the bank, not to you.
You build a SaaS platform for a healthcare company. You use Copilot while working on patient data models. HIPAA-regulated data structures are now on Microsoft's servers.
You maintain a defense contractor's codebase. You use an AI to debug a networking module. The code may be subject to ITAR export controls — sending it to a US cloud provider may technically comply, but sending it to a Chinese provider (DeepSeek) would be a violation.

Most client contracts include clauses about code confidentiality and data handling. Using AI coding tools on client code may violate these contracts, and the client may never know until a breach occurs. If one does occur and you are the one responsible for the code, it can become a very painful stone in your shoe.

4. Regulatory and compliance risks

Depending on your industry and jurisdiction, sending source code to external AI services can create compliance issues:

Regulation	Risk
GDPR (EU)	If your code processes personal data and the code itself contains PII patterns, field names, or test data, sending it to a US server may violate data transfer rules
SOC 2	Requires documented controls over data access. Using AI tools without DLP controls may fail audit
ISO 27001	Requires risk assessment for third-party data processing. AI coding tools are a new attack vector
HIPAA (US healthcare)	Code containing PHI field names, validation rules, or test fixtures with patient data patterns
PCI DSS	Code handling payment card data, encryption keys, or tokenization logic
ITAR (US defense)	Export-controlled technical data cannot be shared with foreign persons or servers
NIS2 (EU)	Critical infrastructure operators must control their software supply chain

Even if you're not in a regulated industry, your clients might be. And their auditors will ask how their code is protected.

5. The training data question

Most AI providers now offer policies like "we don't train on your data." But:

Policies change. OpenAI initially trained on API data, then reversed course after backlash. Today's policy may not be tomorrow's.
Policies have exceptions. Abuse detection, safety monitoring, and model evaluation may still use your data.
Free tiers have different rules. ChatGPT's free tier uses your conversations for training by default unless you opt out. Many developers prototype with the free tier before switching to paid.
Subprocessors matter. The AI provider may not train on your data, but what about their cloud provider? Their logging vendor? Their CDN?
Data breaches happen. In 2023, engineers in Samsung's semiconductor division pasted proprietary source code and internal meeting notes into ChatGPT. In March 2023, an OpenAI bug let some users see other users' chat titles. Even Claude Code has recently leaked!

The safest assumption: anything you send to an AI service should be treated as if it could become public.

The false sense of security

"But we use the enterprise plan"

Enterprise plans typically offer:

No training on your data
Data processing agreements (DPAs)
SOC 2 compliance of the provider

What they don't offer:

Control over where the data is processed
Guarantees about intermediate systems
Protection against subpoenas or government data requests
Deletion verification (you can't audit what you can't see)

"But we use a self-hosted model"

Self-hosted models (Llama, Mistral, CodeLlama) solve the data residency problem but introduce others:

Dramatically lower code quality compared to frontier models
Significant infrastructure costs
No access to the latest model capabilities (Claude Opus, GPT-4o)
Still requires GPU infrastructure that someone must maintain

"But we only send small snippets"

AI coding tools send more context than you think. And even small snippets reveal information:

// "Just a small function"
public BigDecimal calculateRoyalty(Contract contract, SalesReport report) {
    BigDecimal baseRate = contract.getRoyaltyRate();
    BigDecimal sales = report.getNetSales().subtract(report.getReturns());
    if (contract.hasMinimumGuarantee()) {
        return sales.multiply(baseRate).max(contract.getMinimumGuarantee());
    }
    return sales.multiply(baseRate);
}

This "small snippet" reveals: you have a royalty calculation business, contracts have minimum guarantees, you track returns separately from net sales, and your financial model uses BigDecimal precision. A competitor now knows your pricing model structure.
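To make the leak concrete, here is a minimal sketch of how little effort it takes to pull domain vocabulary out of identifiers alone. The class name `IdentifierVocabulary` and the method `wordsOf` are hypothetical names chosen for this illustration; the identifiers fed to it are the ones from the snippet above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class IdentifierVocabulary {

    // Splits camelCase / PascalCase identifiers into lowercase words.
    static Set<String> wordsOf(List<String> identifiers) {
        Set<String> words = new LinkedHashSet<>();
        for (String identifier : identifiers) {
            // Break wherever a lowercase letter or digit is followed by an uppercase letter.
            for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> identifiers = List.of("calculateRoyalty", "getRoyaltyRate",
                "hasMinimumGuarantee", "getNetSales", "getReturns");
        // Prints the set of words a reader (or a log scraper) can recover from names alone.
        System.out.println(wordsOf(identifiers));
    }
}
```

No parsing of the method body is needed: the names by themselves already surface words such as "royalty", "guarantee", and "returns".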

The solution: pseudonymize and obfuscate before sending

The principle is simple: rename everything that reveals business meaning before the AI sees it, then reverse the renaming when applying the AI's changes.

Your code:                          What the AI sees:
calculateRoyalty()          ->      mtd_a1b2c3d4()
Contract contract           ->      Cls_e5f6a7b8 fld_9c8d7e6f
getRoyaltyRate()            ->      mtd_1a2b3c4d()
hasMinimumGuarantee()       ->      mtd_5e6f7a8b()

The AI can still:

Understand the code structure (types, control flow, patterns)
Suggest refactorings and bug fixes
Add new functionality
Write tests

What it cannot do:

Infer your business domain
Reconstruct your architecture from meaningful names
Extract business rules from comments (stripped)
Identify your company from package names (flattened)
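The rename-and-reverse principle can be sketched as a two-way lookup table. This is an illustration under assumptions, not a description of how any particular tool works: `RenameTable`, `pseudonymFor`, and `realNameOf` are hypothetical names, and deriving the alias from a SHA-256 hash is just one possible choice that happens to produce the `mtd_a1b2c3d4` shape shown above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class RenameTable {
    private final Map<String, String> forward = new HashMap<>();
    private final Map<String, String> reverse = new HashMap<>();

    // prefix is "Cls", "mtd" or "fld", following the naming shown above.
    String pseudonymFor(String prefix, String realName) {
        return forward.computeIfAbsent(prefix + ":" + realName, key -> {
            // Assumption: 8 hex digits are enough here; a real tool would need to
            // detect collisions, and would likely add a secret salt so that
            // aliases cannot be guessed by hashing a dictionary of common names.
            String alias = prefix + "_" + shortHash(key);
            reverse.put(alias, realName);
            return alias;
        });
    }

    // Returns the original name, or null if the alias was never issued.
    String realNameOf(String alias) {
        return reverse.get(alias);
    }

    private static String shortHash(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RenameTable table = new RenameTable();
        String alias = table.pseudonymFor("mtd", "calculateRoyalty");
        System.out.println(alias + " <- " + table.realNameOf(alias));
    }
}
```

The same name always maps to the same alias within one table, which is what lets the AI's edits be mapped back afterwards.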

What a proper obfuscation tool must handle

It's not as simple as find-and-replace. Java's framework ecosystem means certain identifiers carry semantic meaning for the runtime:

Spring Data repository methods (findByName) derive SQL queries from the method name
Lombok generates accessor methods from field names
JPA uses entity class names in JPQL query strings
Jackson derives JSON field names from Java field names
Spring Config binds YAML keys to field names

A good obfuscation tool detects these frameworks and protects the identifiers that would break. Everything else gets renamed.
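As one example of such a protection rule, the sketch below flags method names that look like Spring Data derived queries, so a renamer could skip them. The class `ProtectedNames`, the method `mustKeepName`, and the `declaredInRepository` flag are hypothetical; the regular expression is a simplified approximation of the prefixes Spring Data recognizes, not its actual parser.

```java
import java.util.regex.Pattern;

final class ProtectedNames {

    // Simplified approximation: a known query prefix, an optional subject
    // (e.g. "First", "Distinct"), then "By" followed by a property name.
    private static final Pattern DERIVED_QUERY = Pattern.compile(
            "^(find|read|get|query|search|stream|count|exists|delete|remove)"
            + "([A-Z]\\w*?)?By[A-Z]\\w*$");

    // Only methods declared on a repository interface are candidates;
    // the caller is assumed to know where the method was declared.
    static boolean mustKeepName(String methodName, boolean declaredInRepository) {
        return declaredInRepository && DERIVED_QUERY.matcher(methodName).matches();
    }

    public static void main(String[] args) {
        System.out.println(mustKeepName("findByName", true));       // protected
        System.out.println(mustKeepName("calculateRoyalty", true)); // free to rename
    }
}
```

Similar rules would be needed for the other frameworks listed above (Lombok, JPA, Jackson, Spring configuration binding); each has its own naming conventions.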

The full cycle must work

Obfuscation is only useful if the cycle is complete:

Source compiles     -> Obfuscate -> Obfuscated compiles
                                 -> AI modifies -> Still compiles
                                                -> Apply back -> Source still compiles

Every transition can break. Framework detection, JPQL string updating, comment stripping, 3-way merge for reverse-application — all are necessary for a production-ready workflow.
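Each "compiles" checkpoint in the cycle above can be automated. The following is a minimal sketch using the JDK's own `javax.tools` compiler API on a single in-memory source file; `CompileCheck` and `compiles` are hypothetical names, and a real project would also need its classpath and all of its source files passed in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class CompileCheck {

    /** Source code held in memory rather than on disk. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns true if the given source compiles without errors.
    static boolean compiles(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (JDK required)");
        }
        try {
            // Class files go to a throwaway directory; this sketch does not clean it up.
            Path outputDir = Files.createTempDirectory("compile-check");
            List<String> options = List.of("-d", outputDir.toString(), "-proc:none");
            // The empty listener swallows diagnostics; a real tool would report them.
            return compiler.getTask(null, null, diagnostic -> { }, options, null,
                    List.of(new StringSource(className, source))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("Ok", "class Ok { int one() { return 1; } }"));
        System.out.println(compiles("Broken", "class Broken { int one() { return missing; } }"));
    }
}
```

Running the same check after obfuscation, after the AI's modification, and after applying the changes back gives a pass/fail gate at every arrow in the diagram.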

What you should do today

Immediate steps

Audit what your AI tools send. Enable request logging or use a proxy to see what context is transmitted. You'll likely be surprised.
Check your client contracts. Look for clauses about code confidentiality, data processing, and third-party tools. Many contracts written before 2023 don't explicitly address AI coding tools — which doesn't mean they allow them.
Establish an AI coding policy. Define which projects can use AI tools, which cannot (client code, regulated code), and what safeguards are required.
Consider obfuscation. For projects where AI assistance is valuable but code exposure is unacceptable, obfuscation provides the best of both worlds: AI productivity without IP exposure.

For regulated industries

Document your AI tool usage in your risk register. Auditors will ask.
Include AI tools in your data processing agreements with clients.
Evaluate data residency requirements. If your data must stay in the EU, most US-based AI providers don't qualify without additional safeguards.

For integrators and freelancers

Get explicit written consent from clients before using AI tools on their code.
Use obfuscation by default on client projects. It's a competitive advantage: "We use AI to deliver faster, and we protect your code while doing it."
Include AI tool policies in your contracts. Define what tools you use, how code is protected, and what the client's options are.

Conclusion

AI coding assistants are transformative tools. They make developers faster, reduce boilerplate, and help navigate unfamiliar codebases. But they come with a fundamental trade-off: to help you, the AI needs to see your code. And "seeing your code" means transmitting it to infrastructure you don't control, in jurisdictions you didn't choose, with data handling practices you can't verify.

The answer is not to stop using AI tools. The answer is to stop sending your code in clear text.

Obfuscate your identifiers. Strip your comments. Sanitize your configuration. Let the AI work on the structure of your code without knowing what your code does. You get the productivity benefits. Your intellectual property stays yours.

PromptCape is a Java code obfuscation tool designed for AI coding workflows. It handles framework detection, compilation verification, and smart reverse-application. Free trial at promptcape.com.

Why Your Source Code Is at Risk When Using AI Coding Assistants

Genevieve Breton — Fri, 10 Apr 2026 06:35:02 +0000

Every line you send to an AI coding tool leaves your control. Here's what that means for your business, your clients, and your legal obligations.

You are sending your source code to a foreign server

Let's be specific about where your code goes:

Tool	API provider	Server locations
Claude Code / Cursor (Claude)	Anthropic	US (AWS us-east, us-west)
GitHub Copilot	Microsoft / OpenAI	US (Azure data centers)
ChatGPT	OpenAI	US (Azure data centers)
Cursor (OpenAI mode)	OpenAI	US
Mistral Vibe / Le Chat	Mistral AI	EU (France, via cloud providers)
DeepSeek	DeepSeek	China
Gemini Code Assist	Google	US (GCP data centers)

What exactly is being sent?

It's not just "a few lines of code." Modern AI coding tools send rich context to produce better suggestions:

The current file — full content, not just the cursor position
Open tabs and imported files — the AI reads your project structure
File paths — revealing your package hierarchy (com.acme.billing.service.InvoiceService)
Configuration files — application.yml, pom.xml, .env with database URLs, API keys, internal hostnames
Comments and Javadoc — containing business logic descriptions, TODO items, bug references
Test files — revealing edge cases, business rules, validation logic
Git context — commit messages, branch names, sometimes diffs

A single prompt to an AI coding assistant can contain more context about your business than a 10-page architecture document.
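A rough way to check an outgoing payload for the configuration items listed above is a pattern scan before anything leaves the machine. The sketch below is illustrative only: `OutboundContextAudit` and `findings` are hypothetical names, and the two patterns (JDBC URLs and secret-looking assignments) are assumptions that will produce both false positives and misses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OutboundContextAudit {

    // Illustrative patterns only; tune them to your own configuration style.
    private static final Pattern JDBC_URL =
            Pattern.compile("jdbc:[a-z0-9]+:[^\\s\"']+");
    private static final Pattern SECRET_ASSIGNMENT = Pattern.compile(
            "(?i)\\b[\\w.-]*(password|secret|token|api[_-]?key)[\\w.-]*\\s*[:=]\\s*\\S+");

    // Returns one entry per suspicious match, as "label@offset".
    // The matched text itself is not echoed, to avoid copying secrets into logs.
    static List<String> findings(String payload) {
        List<String> hits = new ArrayList<>();
        collect(JDBC_URL, "jdbc-url", payload, hits);
        collect(SECRET_ASSIGNMENT, "secret-like-assignment", payload, hits);
        return hits;
    }

    private static void collect(Pattern pattern, String label, String text, List<String> hits) {
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(label + "@" + matcher.start());
        }
    }

    public static void main(String[] args) {
        String payload = "spring.datasource.url=jdbc:postgresql://db.internal:5432/billing\n"
                + "spring.datasource.password=hunter2\n";
        System.out.println(findings(payload));
    }
}
```

A scan like this does nothing about business meaning in identifiers or comments, but it catches the most obviously damaging items before they are transmitted.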

The risks are real and specific

1. Source code leakage

Your code is transmitted to and processed on third-party infrastructure. Even if the provider promises not to train on your data (and many do), the code still:

Transits through networks you don't control — intermediate proxies, load balancers, logging systems
Is stored temporarily for processing — cache layers, request logs, debugging infrastructure
May be retained for abuse detection — most providers log requests for safety monitoring
Could be subpoenaed: US providers are subject to US law enforcement requests, including under the CLOUD Act, which allows cross-border data access

The question is not "will the provider deliberately steal my code?" It's "how many systems touch my code between my IDE and the model, and who has access to those systems?"

2. Intellectual property exposure

Source code is a trade secret. Once exposed, trade secret protection can be lost permanently — unlike patents or copyrights, trade secrets only have value as long as they remain secret.

What your code reveals:

Element	What it exposes
Class and method names	Your business domain and capabilities (`FraudDetector`, `TaxCalculator`, `PatentAnalyzer`)
Package structure	Your architecture and module boundaries
Algorithm implementations	Your competitive advantage (pricing logic, recommendation engines, risk models)
Database schema	Your data model and relationships
API endpoints	Your service surface and capabilities
Configuration	Your infrastructure topology
Comments	Your business rules in plain language

A competitor with access to your AI provider's logs could reconstruct your product's architecture, business rules, and technical approach without ever seeing your actual repository.

3. Client code exposure (integrators and freelancers)

If you're a consulting firm, systems integrator, or freelance developer, the risk multiplies. You're not just exposing your own code — you're exposing your client's code.

Consider the scenarios:

You customize an ERP for a bank. You send controller code to Claude that contains transaction processing logic, compliance rules, and internal API endpoints. That code belongs to the bank, not to you.
You build a SaaS platform for a healthcare company. You use Copilot while working on patient data models. HIPAA-regulated data structures are now on Microsoft's servers.
You maintain a defense contractor's codebase. You use an AI to debug a networking module. The code may be subject to ITAR export controls — sending it to a US cloud provider may technically comply, but sending it to a Chinese provider (DeepSeek) would be a violation.

4. Regulatory and compliance risks

Depending on your industry and jurisdiction, sending source code to external AI services can create compliance issues:

Regulation	Risk
GDPR (EU)	If your code processes personal data and the code itself contains PII patterns, field names, or test data, sending it to a US server may violate data transfer rules
SOC 2	Requires documented controls over data access. Using AI tools without DLP controls may fail audit
ISO 27001	Requires risk assessment for third-party data processing. AI coding tools are a new attack vector
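HIPAA (US healthcare)	Code containing PHI field names, validation rules, or test fixtures with patient data patterns may disclose protected health information to a third party
PCI DSS	Code handling payment card data, encryption keys, or tokenization logic may expose how cardholder data is protected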
ITAR (US defense)	Export-controlled technical data cannot be shared with foreign persons or servers
NIS2 (EU)	Critical infrastructure operators must control their software supply chain

Even if you're not in a regulated industry, your clients might be. And their auditors will ask how their code is protected.

5. The training data question

Most AI providers now offer policies like "we don't train on your data." But:

Policies change. OpenAI initially trained on API data, then reversed course after backlash. Today's policy may not be tomorrow's.
Policies have exceptions. Abuse detection, safety monitoring, and model evaluation may still use your data.
Free tiers have different rules. ChatGPT Free explicitly trains on your conversations. Many developers prototype with the free tier before switching to paid.
Subprocessors matter. The AI provider may not train on your data, but what about their cloud provider? Their logging vendor? Their CDN?
Data breaches happen. Samsung semiconductor employees leaked proprietary source code and internal meeting notes through ChatGPT in 2023. OpenAI suffered a data breach in March 2023 where users could see other users' chat titles. Even Claude Code has recently leaked!

The safest assumption: anything you send to an AI service should be treated as if it could become public.

The false sense of security

"But we use the enterprise plan"

Enterprise plans typically offer:

No training on your data
Data processing agreements (DPAs)
SOC 2 compliance of the provider

What they don't offer:

Control over where the data is processed
Guarantees about intermediate systems
Protection against subpoenas or government data requests
Deletion verification (you can't audit what you can't see)

"But we use a self-hosted model"

Self-hosted models (Llama, Mistral, CodeLlama) solve the data residency problem but introduce others:

Dramatically lower code quality compared to frontier models
Significant infrastructure costs
No access to the latest model capabilities (Claude Opus, GPT-4o)
Still requires GPU infrastructure that someone must maintain

"But we only send small snippets"

AI coding tools send more context than you think. And even small snippets reveal information:

// "Just a small function"
public BigDecimal calculateRoyalty(Contract contract, SalesReport report) {
    BigDecimal baseRate = contract.getRoyaltyRate();
    BigDecimal sales = report.getNetSales().subtract(report.getReturns());
    if (contract.hasMinimumGuarantee()) {
        return sales.multiply(baseRate).max(contract.getMinimumGuarantee());
    }
    return sales.multiply(baseRate);
}

The solution: obfuscate before sending

The principle is simple: rename everything that reveals business meaning before the AI sees it, then reverse the renaming when applying the AI's changes.

Your code:                          What the AI sees:
calculateRoyalty()          ->      mtd_a1b2c3d4()
Contract contract           ->      Cls_e5f6a7b8 fld_9c8d7e6f
getRoyaltyRate()            ->      mtd_1a2b3c4d()
hasMinimumGuarantee()       ->      mtd_5e6f7a8b()

The AI can still:

Understand the code structure (types, control flow, patterns)
Suggest refactorings and bug fixes
Add new functionality
Write tests

What it cannot do:

Infer your business domain
Reconstruct your architecture from meaningful names
Extract business rules from comments (stripped)
Identify your company from package names (flattened)
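Comment stripping, mentioned in the list above, is harder than deleting everything after `//`, because string literals can contain comment-like text such as URLs. The following is a minimal sketch of a character scanner that respects string and char literals; `CommentStripper` is a hypothetical name, and the sketch deliberately ignores Java text blocks (`"""`) and Unicode escapes.

```java
final class CommentStripper {

    // Removes // and /* */ comments while copying string and char literals verbatim.
    static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Find the closing quote, skipping over backslash escapes.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    j += (src.charAt(j) == '\\') ? 2 : 1;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to end of line, keeping the newline itself.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block or Javadoc comment: skip through the closing "*/".
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "String url = \"http://example.test\"; // minimum guarantee applies\n";
        System.out.print(strip(code));
    }
}
```

Line breaks are preserved, so line numbers in the stripped file still line up with the original, which helps when changes are applied back.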

What a proper obfuscation tool must handle

It's not as simple as find-and-replace. Java's framework ecosystem means certain identifiers carry semantic meaning for the runtime:

Spring Data repository methods (findByName) derive SQL queries from the method name
Lombok generates accessor methods from field names
JPA uses entity class names in JPQL query strings
Jackson derives JSON field names from Java field names
Spring Config binds YAML keys to field names

A good obfuscation tool detects these frameworks and protects the identifiers that would break. Everything else gets renamed.

The full cycle must work

Obfuscation is only useful if the cycle is complete:

Source compiles     -> Obfuscate -> Obfuscated compiles
                                 -> AI modifies -> Still compiles
                                                -> Apply back -> Source still compiles

Every transition can break. Framework detection, JPQL string updating, comment stripping, 3-way merge for reverse-application — all are necessary for a production-ready workflow.
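The simplest part of the apply-back step, restoring real names in the AI's output, can be sketched as a lookup over alias-shaped tokens. This is plain text substitution, not the 3-way merge mentioned above; `ReverseRename` and `restore` are hypothetical names, the alias pattern is taken from the examples earlier in this article, and `Matcher.appendReplacement` with a `StringBuilder` assumes Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReverseRename {

    // Alias shape assumed from the examples above: Cls_/mtd_/fld_ plus 8 hex digits.
    private static final Pattern ALIAS =
            Pattern.compile("\\b(?:Cls|mtd|fld)_[0-9a-f]{8}\\b");

    static String restore(String obfuscated, Map<String, String> aliasToReal) {
        Matcher matcher = ALIAS.matcher(obfuscated);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Aliases missing from the table (for example, names the AI invented)
            // are left untouched so they can be reviewed by hand.
            String real = aliasToReal.getOrDefault(matcher.group(), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(real));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of(
                "mtd_a1b2c3d4", "calculateRoyalty",
                "Cls_e5f6a7b8", "Contract",
                "fld_9c8d7e6f", "contract");
        System.out.println(restore("mtd_a1b2c3d4(Cls_e5f6a7b8 fld_9c8d7e6f)", table));
    }
}
```

Because the substitution works on raw text, aliases inside string literals (such as JPQL queries) are restored too, which is convenient here but is also why a real tool needs to track where each alias came from.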

What you should do today

Immediate steps

Audit what your AI tools send. Enable request logging or use a proxy to see what context is transmitted. You'll likely be surprised.
Check your client contracts. Look for clauses about code confidentiality, data processing, and third-party tools. Many contracts written before 2023 don't explicitly address AI coding tools — which doesn't mean they allow them.
Establish an AI coding policy. Define which projects can use AI tools, which cannot (client code, regulated code), and what safeguards are required.
Consider obfuscation. For projects where AI assistance is valuable but code exposure is unacceptable, obfuscation provides the best of both worlds: AI productivity without IP exposure.

For regulated industries

Document your AI tool usage in your risk register. Auditors will ask.
Include AI tools in your data processing agreements with clients.
Evaluate data residency requirements. If your data must stay in the EU, most US-based AI providers don't qualify without additional safeguards.

For integrators and freelancers

Get explicit written consent from clients before using AI tools on their code.
Use obfuscation by default on client projects. It's a competitive advantage: "We use AI to deliver faster, and we protect your code while doing it."
Include AI tool policies in your contracts. Define what tools you use, how code is protected, and what the client's options are.

Conclusion

The answer is not to stop using AI tools. The answer is to stop sending your code in clear text.

PromptCape is a Java code obfuscation tool designed for AI coding workflows. It handles framework detection, compilation verification, and smart reverse-application. Free trial at promptcape.com.