Genevieve Breton

Posted on May 4 • Edited on May 21

Java Code Obfuscation for AI Assistants: Ensuring the Full Cycle Works

#ai #java #privacy #security

How to obfuscate Java code for AI coding tools while guaranteeing that compilation, tests, and reverse-application all succeed.

The problem

AI coding assistants (Claude Code, Cursor, GitHub Copilot) need access to your source code to help you. But sending proprietary code to an LLM means exposing your business domain, architecture, intellectual property, configuration data, and even personal data.

Code obfuscation can solve this: rename identifiers before the AI sees the code, let the AI work on the obfuscated version, then reverse the changes back. Simple in theory. In practice, Java's rich ecosystem of frameworks, annotations, and conventions makes this a minefield.

This article describes what a Java obfuscation tool must handle to guarantee the full cycle:

Source compiles & tests pass
    -> Obfuscation
        -> AI modifies code
            -> Obfuscated code compiles & tests pass
                -> De-obfuscation (apply)
                    -> Source compiles & tests pass

Each transition can break. Here is what you need to address at each step, and how PromptCape solves it.

Step 1: Source -> Obfuscation

1.1 What to rename

A Java obfuscator for AI must rename:

Element	Example	Why
Package names	`com.acme.billing` -> `pkg_a1b2c3d4`	Reveals company and domain
Class names	`InvoiceService` -> `Cls_e5f6a7b8`	Reveals business concepts
Method names	`calculateDiscount` -> `mtd_1a2b3c4d`	Reveals business logic
Field names	`customerName` -> `fld_9e8d7c6b`	Reveals data model
Comments	`// Apply VAT to invoice` -> `// Processed.`	Reveals business context
Javadoc	`/** Calculates the total with tax */` -> `/** Processed. */`	Same
Config values	`jdbc:postgresql://prod.acme.com` -> `REDACTED`	Reveals infrastructure
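The last row can be illustrated with a minimal sketch of line-based redaction for `.properties` and flat YAML files. The sensitive-key list and the `ConfigRedactor` name are assumptions made for this example, not PromptCape's actual rules. A real sanitizer would also need to handle nested YAML paths, XML, and POM files.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: redacts values of sensitive-looking keys in
// .properties ("key=value") and flat YAML ("key: value") lines.
final class ConfigRedactor {
    // Assumption: a key is sensitive if it ends with one of these words.
    private static final Pattern SENSITIVE_KEY =
        Pattern.compile("(?i).*(url|password|secret|token|host|username)");
    // indentation, key, separator (= or :), value
    private static final Pattern KEY_VALUE =
        Pattern.compile("(\\s*)([\\w.\\-]+)(\\s*[=:]\\s*)(.+)");

    static String redactLine(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        // Comments, blank lines, and YAML section headers ("spring:") do not match.
        if (!m.matches() || !SENSITIVE_KEY.matcher(m.group(2)).matches()) {
            return line;
        }
        return m.group(1) + m.group(2) + m.group(3) + "REDACTED";
    }

    static List<String> redact(List<String> lines) {
        return lines.stream().map(ConfigRedactor::redactLine).collect(Collectors.toList());
    }
}
```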

1.2 What NOT to rename

This is where most naive approaches fail. The following must be preserved:

JDK types and methods: String, List, Map, Optional, toString, equals, hashCode, main, stream, forEach...

Framework annotations: @Autowired, @Entity, @RestController, @GetMapping, @JsonProperty, @Data, @Builder...

Framework-specific identifiers that carry semantic meaning for the framework at runtime:

Framework	What breaks if renamed	Example
Spring Data JPA	Derived query methods	`findByActiveTrue()` -> the method name IS the query. Renaming it to `mtd_xxx` makes Spring fail with "No property mtd found"
JPA/Hibernate	Entity names in JPQL	`@Query("SELECT e FROM Invoice e")` — the string `Invoice` must match the entity class name
Lombok	Generated accessor names	`@Data` generates `getName()` from field `name`. If `name` is renamed to `fld_xxx`, Lombok generates `getFld_xxx()`, but code calling `getName()` is rewritten to a different name (e.g. `getMtd_xxx()`), so the call no longer resolves
Jackson	JSON field mapping	`@JsonProperty` fields, or fields in DTOs in `model`/`dto` packages — renaming breaks serialization/deserialization
Spring Config	Property binding	`@ConfigurationProperties` binds YAML keys to field names
Bean Validation	Field references	`@NotBlank` on a field — the constraint message references the field name

The solution: framework detection (Pass 0). Before collecting identifiers, scan the entire project for framework annotations and produce exclusion rules. Each framework has a dedicated detector:

Project scan -> LombokDetector       -> exclude fields + get/set/is accessors
             -> SpringDataDetector   -> exclude findByXxx, countByXxx, existsByXxx methods
             -> JacksonDetector      -> exclude @Entity/@JsonProperty fields
             -> JpaHibernateDetector -> exclude @MappedSuperclass/@Embeddable fields
             -> SpringConfigDetector -> exclude @ConfigurationProperties fields
             -> ValidationDetector   -> exclude @NotBlank/@Min/@Size fields
             -> OpenApiDetector      -> exclude @Schema/@Operation fields and methods
             -> SpringBootDetector   -> track @SpringBootApplication for test fixing
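To make the idea concrete, here is a minimal sketch of what two of these detectors might look like. `FrameworkDetector`, `ProjectModel`, and the other types are hypothetical, the project model is assumed to be pre-built from an AST, and the derived-query regex covers only a common subset of Spring Data's prefixes. Records require Java 16+.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical Pass 0 contract: each detector returns identifiers that must keep their names.
interface FrameworkDetector {
    Set<String> exclusions(ProjectModel project);
}

// Assumption: a minimal, pre-parsed view of the project (built from an AST in a real tool).
record FieldInfo(String owner, String name, String type, Set<String> classAnnotations) {}
record MethodInfo(String owner, String name, boolean ownerIsRepository) {}
record ProjectModel(List<FieldInfo> fields, List<MethodInfo> methods) {}

final class LombokDetector implements FrameworkDetector {
    @Override
    public Set<String> exclusions(ProjectModel project) {
        Set<String> out = new HashSet<>();
        for (FieldInfo f : project.fields()) {
            Set<String> ann = f.classAnnotations();
            if (!ann.contains("Data") && !ann.contains("Getter") && !ann.contains("Setter")) {
                continue; // no Lombok accessors generated for this class
            }
            String cap = Character.toUpperCase(f.name().charAt(0)) + f.name().substring(1);
            out.add(f.name());
            out.add("get" + cap);
            out.add("set" + cap);
            // Lombok names primitive boolean getters isX(); excluding both is a safe over-approximation.
            if (f.type().equals("boolean")) out.add("is" + cap);
        }
        return out;
    }
}

final class SpringDataDetector implements FrameworkDetector {
    // Simplified subset of Spring Data's derived-query subject prefixes.
    private static final Pattern DERIVED = Pattern.compile(
        "(find|read|get|query|search|stream|count|exists|delete|remove)\\w*By\\w*");

    @Override
    public Set<String> exclusions(ProjectModel project) {
        Set<String> out = new HashSet<>();
        for (MethodInfo m : project.methods()) {
            if (m.ownerIsRepository() && DERIVED.matcher(m.name()).matches()) {
                out.add(m.name()); // the method name is the query: never rename it
            }
        }
        return out;
    }
}

final class Pass0 {
    static Set<String> run(ProjectModel project, List<FrameworkDetector> detectors) {
        Set<String> all = new HashSet<>();
        for (FrameworkDetector d : detectors) all.addAll(d.exclusions(project));
        return all;
    }
}
```

Over-excluding (for example both `getActive` and `isActive` for a `boolean` field) costs a few readable names. Under-excluding costs a failed build, so a detector should lean towards the former.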

1.3 String literals: a hidden trap

Code replacement must skip string literals to avoid breaking values like "Hello World" or "/api/v1/users". But some strings DO reference identifiers:

Context	String content	Must be updated?
`@Query("SELECT e FROM Invoice e")`	JPQL entity name	Yes
`Class.forName("com.acme.InvoiceService")`	Fully qualified class name	Yes
`getMethod("calculateTotal")`	Reflection method name	Yes
`@ComponentScan("com.acme.service")`	Package name	Yes
`"Hello World"`	User-facing string	No
`"/api/v1/invoices"`	REST endpoint	No

The obfuscator must apply identifier replacement INSIDE specific string contexts while leaving general strings untouched. This requires post-processing passes for @Query, reflection calls, and package annotations.
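A minimal sketch of the `@Query` pass follows. It assumes simple literals without escaped quotes, text blocks, or string concatenation, and `QueryStringRewriter` is a hypothetical name. The same shape would apply to `Class.forName(...)` or `getMethod(...)` arguments with a different pattern.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: rewrite identifiers only inside @Query("...") literals,
// leaving every other string literal untouched.
final class QueryStringRewriter {
    // Matches @Query("...") or @Query(value = "...") with an escape-free literal.
    private static final Pattern QUERY =
        Pattern.compile("@Query\\(\\s*(?:value\\s*=\\s*)?\"([^\"]*)\"");

    static String rewrite(String source, Map<String, String> entityNames) {
        Matcher m = QUERY.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String jpql = m.group(1);
            for (Map.Entry<String, String> e : entityNames.entrySet()) {
                // Word boundaries so "Invoice" does not touch "InvoiceLine".
                jpql = jpql.replaceAll("\\b" + Pattern.quote(e.getKey()) + "\\b",
                                       Matcher.quoteReplacement(e.getValue()));
            }
            // Keep everything up to the opening quote, then the rewritten literal.
            String prefix = m.group().substring(0, m.start(1) - m.start());
            m.appendReplacement(out, Matcher.quoteReplacement(prefix + jpql + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```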

1.4 Comment stripping and special characters

Comments contain business context that reveals your domain. But stripping them introduces two problems:

Line count changes: A multi-line Javadoc becomes a single-line /** Processed. */, breaking line-number correspondence between obfuscated and original files.
Special characters in comments: Comments in French (and other languages) contain apostrophes (// Service d'injection), accented characters, and other non-ASCII text. A character-by-character scanner that treats ' as a Java char literal delimiter will be confused by the apostrophe in d'injection and may skip code after the comment.

Solution: Process comments before string/char literal scanning. Replace line comments (//) in-place (one line in, one line out). For multi-line Javadoc and block comments, accept the line count change and handle it during the reverse-apply step with a 3-way merge.
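Ordering is what matters here. The following is a minimal single-pass scanner sketch that checks for `//` and `/*` before it treats `'` or `"` as a literal delimiter, while still copying string and char literals verbatim (so `"// not a comment"` survives). `CommentStripper` is hypothetical, and the sketch assumes no text blocks (`"""`) or Unicode escapes in the input.

```java
// Sketch: comments are recognised before ' or " are interpreted, so
// "// Service d'injection" cannot open a bogus char literal.
// Assumption: no text blocks (""") and no \\uXXXX escapes in the input.
final class CommentStripper {
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            char next = i + 1 < n ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Line comment: replace in place, keep the newline (line count unchanged).
                int end = src.indexOf('\n', i);
                out.append("// Processed.");
                i = end < 0 ? n : end;
            } else if (c == '/' && next == '*') {
                // Block or Javadoc comment: collapse to one line (line count may change).
                boolean javadoc = i + 2 < n && src.charAt(i + 2) == '*';
                int end = src.indexOf("*/", i + 2);
                out.append(javadoc ? "/** Processed. */" : "/* Processed. */");
                i = end < 0 ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                // String or char literal: copy verbatim, honouring backslash escapes.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    j += src.charAt(j) == '\\' ? 2 : 1;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```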

Step 2: Obfuscated code -> AI modification -> Compilation & tests

2.1 The obfuscated code must compile

This seems obvious but is surprisingly hard. Even with framework detection, some identifiers cause compilation failures that can only be detected by actually compiling. Examples:

A method name that collides with a JDK method after obfuscation
A field name that matches a Java keyword
An annotation processor that generates code based on identifier names

Solution: auto-fix loop. Compile the obfuscated code. If it fails, parse the compiler errors, reverse-map the broken identifiers, add them to an exclusion list, and re-obfuscate. Repeat until green or max iterations reached. Persist exclusions for future runs.

Obfuscate -> Compile -> Parse errors -> Exclude broken identifiers -> Re-obfuscate -> Compile -> ...
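A minimal sketch of that loop follows. It assumes obfuscated tokens keep the prefix_8hex shape from section 1.1 and that the caller plugs in the real obfuscation and build steps. `AutoFix`, `newExclusions`, and `loop` are hypothetical names, and scanning the whole compiler output for tokens is a deliberately crude stand-in for proper per-diagnostic parsing.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the auto-fix loop. Assumption: obfuscated names look like pkg_/Cls_/mtd_/fld_ + 8 hex.
final class AutoFix {
    private static final Pattern TOKEN =
        Pattern.compile("\\b(?:pkg|Cls|mtd|fld)_[0-9a-f]{8}\\b");

    // Pull obfuscated tokens out of compiler output and map them back to originals.
    static Set<String> newExclusions(String compilerOutput, Map<String, String> reverseMap) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = TOKEN.matcher(compilerOutput);
        while (m.find()) {
            String original = reverseMap.get(m.group());
            if (original != null) out.add(original); // tokens we never issued are ignored
        }
        return out;
    }

    // obfuscate: exclusions -> reverse map (obfuscated -> original)
    // build: compiler output, or "" when the build is green
    static Set<String> loop(Function<Set<String>, Map<String, String>> obfuscate,
                            Supplier<String> build,
                            Set<String> persisted, int maxIterations) {
        Set<String> exclusions = new LinkedHashSet<>(persisted);
        for (int i = 0; i < maxIterations; i++) {
            Map<String, String> reverse = obfuscate.apply(exclusions);
            String errors = build.get();
            if (errors.isEmpty()) break;                                   // green
            if (!exclusions.addAll(newExclusions(errors, reverse))) break; // no progress
        }
        return exclusions; // the caller persists these for the next run
    }
}
```

The "no progress" check matters: if an error mentions no obfuscated token (an H2 dialect problem, say), excluding more identifiers cannot fix it, and the loop should stop rather than burn its remaining iterations.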

2.2 Tests must pass on obfuscated code

Compilation is necessary but not sufficient. Tests exercise the runtime behavior where framework conventions matter most:

Spring context loading: @SpringBootTest boots the full application context. A broken repository method or missing bean crashes the entire test suite.
Spring Data query derivation: happens at context startup, not at compile time.
JPA schema generation: Hibernate creates tables from @Entity classes. If JPQL @Query strings reference the original entity name but the class is renamed, the context fails.
H2 compatibility: Test profiles often use H2 instead of PostgreSQL. Database-specific types (JSONB, ARRAY) in column definitions fail on H2 regardless of obfuscation.

Key insight: If the source tests pass and the obfuscated tests don't, the obfuscation broke something. The auto-fix loop should use mvn test-compile (or even mvn test) as the build command to catch these failures.

2.3 The AI must be able to work effectively

The AI needs to:

Read and understand the code structure (even with obfuscated names)
Create new files, classes, and methods
Modify existing code
Run builds and tests to verify its work

The obfuscated names should be deterministic (same input always produces the same hash) so the AI can learn patterns across files. Prefixes (Cls_, mtd_, fld_, pkg_) help the AI understand the identifier type.
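One way to get deterministic, keyed names is an HMAC over the identifier, truncated to 8 hex characters. This is a sketch under stated assumptions: the key is a per-project secret kept out of the obfuscated workspace, only the identifier text is hashed, and `DeterministicNamer` is a hypothetical class, not PromptCape's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: deterministic, keyed names of the form prefix_8hex (e.g. Cls_1a2b3c4d).
// Assumptions: a per-project secret key that never leaves the machine, and only the
// identifier text is hashed. Mac instances are not thread-safe; use one per thread.
final class DeterministicNamer {
    private final Mac mac;

    DeterministicNamer(byte[] projectKey) {
        try {
            Mac m = Mac.getInstance("HmacSHA256");
            m.init(new SecretKeySpec(projectKey, "HmacSHA256"));
            this.mac = m;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    String name(String prefix, String identifier) {
        byte[] digest = mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(prefix).append('_');
        for (int i = 0; i < 4; i++) {                 // first 4 bytes -> 8 hex characters
            sb.append(String.format("%02x", digest[i] & 0xff));
        }
        return sb.toString();
    }
}
```

Because the key stays local, the provider cannot recover names by hashing guessed identifiers. With only 32 bits per name, collisions are unlikely but possible on projects with thousands of identifiers, so the mapping registry should check for duplicates.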

Step 3: De-obfuscation (apply) -> Source compiles & tests pass

This is where most obfuscation tools stop — they handle the forward direction but not the reverse. For AI coding, the reverse is just as critical.

3.1 Only apply what the AI changed

The naive approach: read the obfuscated file, de-obfuscate all identifiers, overwrite the real file. This breaks because:

Comments were stripped during obfuscation. The de-obfuscated file has /** Processed. */ where the original had full Javadoc. If the AI didn't touch that line, the original comment should be preserved.
Formatting may differ. The obfuscated file may have different whitespace or line endings.

Solution: 3-way merge. Compare the snapshot (obfuscated, pre-AI) with the cache (obfuscated, post-AI) line by line:

Lines unchanged by the AI -> keep the original source line
Lines modified by the AI -> de-obfuscate the new version

Snapshot line == Cache line?
    Yes -> keep original source line (preserves comments, formatting)
    No  -> de-obfuscate cache line (AI changed it)

For added/removed lines, use chunk-based alignment to find sync points and apply the changes surgically.
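Here is a minimal sketch of the merge. It swaps in a plain longest-common-subsequence alignment between snapshot and cache for the chunk-based alignment mentioned above, and assumes snapshot line `i` corresponds to original line `i` (true when only line comments were rewritten; collapsed Javadoc would need an extra line map). `ThreeWayApply` and the `deobfuscate` callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of the apply-side merge using an LCS alignment of snapshot vs. cache.
// Assumption: snapshot.get(i) was produced from original.get(i).
final class ThreeWayApply {
    static List<String> merge(List<String> original, List<String> snapshot,
                              List<String> cache, UnaryOperator<String> deobfuscate) {
        int s = snapshot.size(), c = cache.size();
        int[][] lcs = new int[s + 1][c + 1];            // lcs[i][j] = LCS length of the suffixes
        for (int i = s - 1; i >= 0; i--) {
            for (int j = c - 1; j >= 0; j--) {
                lcs[i][j] = snapshot.get(i).equals(cache.get(j))
                    ? lcs[i + 1][j + 1] + 1
                    : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < s || j < c) {
            if (i < s && j < c && snapshot.get(i).equals(cache.get(j))) {
                out.add(original.get(i));               // untouched: keep the real source line
                i++;
                j++;
            } else if (j < c && (i == s || lcs[i][j + 1] >= lcs[i + 1][j])) {
                out.add(deobfuscate.apply(cache.get(j))); // added or modified by the AI
                j++;
            } else {
                i++;                                    // removed by the AI: drop it
            }
        }
        return out;
    }
}
```

A modified line shows up as one removal plus one insertion, so it is de-obfuscated from the cache version. Untouched lines, including the `// Processed.` placeholder, fall back to the original source.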

3.2 Handle AI-generated variable names

When the AI creates a new variable for an obfuscated class, it invents a name based on what it sees:

// AI writes:
private Cls_f45371c4 fld_f45371c4;

// Standard de-obfuscation produces:
private ZipBuilderService fld_f45371c4;  // class de-obfuscated, but variable name is unreadable

The variable name fld_f45371c4 is not in the mapping registry — the AI invented it. But the hash f45371c4 matches the known class ZipBuilderService.

Solution: After standard de-obfuscation, scan for remaining fld_XXXXXXXX/Cls_XXXXXXXX/mtd_XXXXXXXX patterns. If the hash matches a known entry, generate a camelCase variable name:

private ZipBuilderService zipBuilderService;  // readable

Track each unique token across the file to ensure consistent renaming (declaration and all usages get the same name).
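A minimal sketch of this resolution step follows, assuming `knownByHash` maps the 8-hex hash back to an original class name (a hypothetical structure; the real registry may be keyed differently). Tokens whose hash is unknown are left untouched for human review.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: resolve leftover prefix_hash tokens the AI invented by reusing a known hash.
// Assumption: knownByHash maps "f45371c4" -> "ZipBuilderService".
final class InventedNameResolver {
    private static final Pattern LEFTOVER =
        Pattern.compile("\\b(fld|Cls|mtd)_([0-9a-f]{8})\\b");

    static String resolve(String file, Map<String, String> knownByHash) {
        Map<String, String> chosen = new HashMap<>();  // token -> readable name, per file
        Matcher m = LEFTOVER.matcher(file);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String token = m.group();
            String className = knownByHash.get(m.group(2));
            String replacement = token;                  // unknown hash: leave as-is for review
            if (className != null) {
                // Same token -> same name everywhere (declaration and all usages).
                replacement = chosen.computeIfAbsent(token, t ->
                    Character.toLowerCase(className.charAt(0)) + className.substring(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The sketch does not check whether the generated name already exists in the file. A real implementation would need to, or two fields could end up with the same name.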

3.3 Don't apply build artifacts

The AI may run mvn package in the obfuscated workspace, creating target/ with compiled .class files, .jar archives, and test reports. These must be excluded from the diff detection:

Skip directories: target/, build/, node_modules/, .idea/
Skip binary files: .class, .jar, .war, images, fonts
These patterns match what the obfuscation engine already skips
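A minimal sketch of such a filter, applied to workspace-relative paths. The exact image and font extensions are an illustrative assumption, and `ArtifactFilter` is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Sketch: decide whether a changed workspace file should be considered by apply.
final class ArtifactFilter {
    private static final Set<String> SKIP_DIRS = Set.of("target", "build", "node_modules", ".idea");
    // Assumption: an illustrative list of binary extensions (classes, archives, images, fonts).
    private static final Set<String> SKIP_EXT = Set.of(
        "class", "jar", "war", "png", "jpg", "jpeg", "gif", "ico", "woff", "woff2", "ttf", "eot");

    static boolean shouldApply(Path relative) {
        for (Path part : relative) {                    // iterate path segments
            if (SKIP_DIRS.contains(part.toString())) return false;
        }
        String name = relative.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot < 0 || !SKIP_EXT.contains(name.substring(dot + 1));
    }
}
```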

3.4 Snapshot management

The apply command needs a "before" snapshot to detect what the AI changed. After a successful apply, the snapshot is updated. But if the apply fails or the user reverts with git restore, the snapshot is out of sync.

Solution:

Don't update the snapshot when the apply has errors
Provide a --reset-snapshot option that re-obfuscates the source into the snapshot directory without touching the cache

The complete cycle

Here is what must work end-to-end:

1. mvn test                       -> GREEN (source is healthy)
2. promptcape obfuscate --verify  -> Obfuscated workspace created
3. mvn test (in workspace)        -> GREEN (obfuscation didn't break anything)
4. AI modifies obfuscated code
5. mvn test (in workspace)        -> GREEN (AI changes work)
6. promptcape apply               -> Changes applied to source
7. mvn test                       -> GREEN (de-obfuscated changes work)

Each transition requires specific handling:

Transition	Challenge	Solution
1 -> 2	Framework identifiers break	Framework detection (8 detectors)
1 -> 2	Some identifiers cause compile errors	Auto-fix loop with exclusion persistence
2 -> 3	JPQL strings reference original names	Post-processing: replace entity names in `@Query`
2 -> 3	Reflection strings reference original names	Post-processing: replace in `getMethod()`, `forName()`
2 -> 3	Spring Data query derivation fails	Repository method name protection
4 -> 5	AI must understand the code	Deterministic naming, type prefixes
5 -> 6	Comments stripped during obfuscation	3-way merge (only apply AI-changed lines)
5 -> 6	AI invents unreadable variable names	Hash-based name resolution
5 -> 6	Build artifacts in workspace	Directory and binary file filtering
6 -> 7	Applied changes don't compile	User review + re-apply capability

What PromptCape implements

PromptCape is a Java-first obfuscation tool designed for this exact cycle. Here is what it covers today:

Obfuscation engine:

AST-based identifier collection via JavaParser (packages, classes, methods, fields, enums, records)
Deterministic HMAC-SHA256 naming with type prefixes
Package hierarchy flattening
Word-boundary replacement (\b) with longest-match-first ordering
String literal preservation with post-processing for @Query, reflection, @ComponentScan
Full comment stripping (Javadoc, block, and line comments)
POM, properties, YAML, and XML file sanitization
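The word-boundary, longest-match-first replacement can be sketched as a single alternation regex. This is a hypothetical `IdentifierReplacer`, not PromptCape's implementation, and it ignores the string-literal and comment handling described earlier. Longest-first ordering matters for dotted package names: `\b` sits between `acme` and `.`, so a shorter `com.acme` alternative tried first would match inside `com.acme.billing`.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: one combined regex, alternatives sorted longest first, wrapped in \b.
// Assumption: the mapping is non-empty.
final class IdentifierReplacer {
    private final Pattern pattern;
    private final Map<String, String> mapping;

    IdentifierReplacer(Map<String, String> mapping) {
        this.mapping = mapping;
        List<String> keys = mapping.keySet().stream()
            .sorted((a, b) -> Integer.compare(b.length(), a.length())) // longest first
            .map(Pattern::quote)
            .collect(Collectors.toList());
        this.pattern = Pattern.compile("\\b(?:" + String.join("|", keys) + ")\\b");
    }

    String apply(String code) {
        Matcher m = pattern.matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(mapping.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```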

Framework detection (8 detectors):

Lombok: field + accessor protection
Spring Boot: application class tracking, test annotation fixing
Spring Data: repository derived query method protection
JPA/Hibernate: entity field protection, JPQL entity name replacement
Jackson: DTO/entity field protection
Spring Config: property-bound field protection
Validation: constraint field protection
OpenAPI: schema field and method protection

Auto-fix:

Compile-and-fix loop with configurable build command
Compiler error parsing and reverse mapping
Persistent exclusion lists across runs
Source verification option

Reverse application:

3-way merge (preserve original lines for unchanged content)
AI-generated variable name resolution (hash-based)
Build artifact and binary file exclusion
Snapshot management with reset capability

Two modes:

CLI workspace (obfuscate -> AI works -> apply)
HTTP proxy (transparent interception for IDE-based tools — see below)

Metrics:

A final summary of identifier counts and timings is printed at the end of every run, for instance:

+-------------------------------+----------+
| Final Summary                 |          |
+-------------------------------+----------+
| Iterations                    |       4  |
| Identifiers obfuscated        |    3287  |
| Packages (flattened)          |      74  |
| Exclusions loaded (previous)  |       0  |
| Exclusions added (this run)   |     152  |
| Exclusions total              |     152  |
| Verification time             |  106,1s  |
| Total time                    |  224,5s  |
+-------------------------------+----------+
| Compilation                   |    OK    |
+-------------------------------+----------+

Seamless IDE integration

The obfuscation cycle described above can run as a one-shot CLI workflow, but friction kills adoption. Developers don't want to leave their IDE, run promptcape obfuscate, switch to a workspace folder, ask the AI to do something, then run promptcape apply and switch back. They want the assistant they already use, in the IDE they already use, with the obfuscation invisible.

PromptCape provides this via an HTTP proxy mode that intercepts traffic to the AI provider and applies the same forward/reverse cycle on the fly:

IDE -> Claude Code -> [PromptCape proxy] -> Anthropic API
                          obfuscates the prompt going out
                          de-obfuscates the response coming back

The "PromptCape Claude" terminal in Cursor

The simplest integration is a dedicated terminal profile. In Cursor (and equally in VS Code or any IDE that supports terminal profiles), you create a profile named PromptCape Claude that:

Starts the proxy in the background if it is not already running
Sets ANTHROPIC_BASE_URL (and equivalent variables) to point Claude Code at the local proxy
Launches claude (the Claude Code CLI) inside that environment

From the developer's perspective, this is just another terminal in the IDE sidebar. They open the PromptCape Claude terminal instead of the default one, type their request to Claude as usual, and watch the AI work on their codebase. Behind the scenes:

Outbound prompt: identifiers, comments, and config values are obfuscated before leaving the machine
Inbound response: file edits, suggestions, and explanations are de-obfuscated before reaching the IDE
Build artifacts and binaries are filtered out of the cycle

No workflow change. No obfuscate or apply command to remember. The same Claude Code experience, with the obfuscation guaranteeing that what reaches the provider is not your real source code.

Why a terminal profile is the right shape for this

The CLI workspace is the right primitive — it gives full control and fits CI/CD or one-shot review use cases. But for daily AI-assisted coding, friction wins or loses the security battle. A proxy that hooks into the existing tool's trust chain (env vars, ANTHROPIC_BASE_URL) gives:

Zero training cost: developers keep using Claude Code exactly as before — same commands, same outputs
Zero forgotten steps: there is no apply to forget — the response is reverse-mapped on the wire
Per-project configuration: terminal profiles ship in .vscode/settings.json, .cursor/, or JetBrains run configurations, so opening a project pre-configures the secure terminal automatically
Auditability by default: every prompt and response transits the proxy, which can log, redact, or block on policy

The same pattern extends to any AI tool that respects a base-URL override (Cursor's built-in chat, Aider, Continue.dev, OpenAI-compatible clients, etc.). The IDE doesn't need a plugin and the AI tool doesn't need to know the proxy exists — the integration is just a terminal away.

Conclusion

Java obfuscation for AI coding assistants is not just about renaming identifiers. It requires deep understanding of how Java frameworks use naming conventions, how annotation processors derive behavior from names, and how to surgically apply AI changes without losing information that was stripped during obfuscation.

The key insight: framework detection before obfuscation is more effective than reactive error fixing after. Proactively protecting Spring Data repository methods, JPA entity fields, and Lombok-generated accessors eliminates most compilation failures before they happen.

The second insight: the reverse direction is just as hard as the forward. A 3-way merge that only applies AI-changed lines, combined with hash-based resolution of AI-invented names, makes the de-obfuscated code readable and correct.

The third insight: friction kills adoption, so the obfuscation has to disappear into the IDE. A dedicated terminal profile (the PromptCape Claude terminal in Cursor) that boots Claude Code through the proxy turns the entire cycle into a transparent operation — same tool, same commands, no extra steps. Security that requires discipline gets bypassed; security that ships as a terminal in the sidebar gets used.

PromptCape is open for trial at promptcape.com

Top comments (3)

Saleha Mubeen • May 5

This is a really interesting angle—especially as AI assistants become part of the development workflow.

Obfuscation has traditionally been about protecting intellectual property, but with AI in the loop, it adds a new layer of complexity. If code is heavily obfuscated, AI tools may struggle with:

Understanding intent and structure
Generating meaningful suggestions or refactors
Debugging or tracing issues effectively

So the challenge isn’t just “can we obfuscate?” but “can we still maintain a usable development cycle?”

A few thoughts that come to mind:

Keeping a clear separation between development (readable code) and distribution (obfuscated code) is crucial
Ensuring mapping files / symbol tables are preserved for debugging and AI-assisted analysis
Possibly integrating AI earlier in the pipeline, before obfuscation is applied
Exploring selective obfuscation, where critical logic is protected but overall structure remains interpretable

It feels like the real goal is to strike a balance between security, maintainability, and AI usability—not an easy trade-off.

Curious to see how teams standardize this as AI-assisted coding becomes more common 🚀

Genevieve Breton • May 9

Yes, it is a difficult compromise. We try to address it with PromptCape, and it is also why we specifically target frameworks rather than just obfuscating identifiers.
We are now taking the same approach with Python.

Genevieve Breton • May 19

New article explaining the reverse direction: dev.to/genevieve_breton_cb795f52/r...