Every enterprise running RAG today is doing what Samsung engineers did in 2023 — sending sensitive data to LLM providers. Except it's automated, at scale, thousands of times per day.
Samsung's problem wasn't careless employees. It was architectural. And your RAG pipeline has the same architecture.
The 4 Leak Points
Your Documents (contracts, financials, HR, strategy)
        |
        v
1. Chunking                  ✅ Local, safe
        |
        v
2. Embedding API call        ❌ LEAK #1: raw text to provider
        |
        v
3. Vector DB (cloud)         ❌ LEAK #2: invertible embeddings
        |
        v
4. User query embedding      ❌ LEAK #3: query to embedding API
        |
        v
5. Retrieved context (your most sensitive chunks)
        |
        v
6. LLM generation call       ❌ LEAK #4: query + context in plaintext
        |
        v
Response to user
Six steps. Four leak points. Every single query.
Your compliance team saw a box labeled "LLM" in the architecture diagram and assumed it was local. It isn't.
"But Embeddings Are Just Numbers"
That was the conventional wisdom until Zero2Text (Feb 2026), a zero-training inversion attack that reconstructs text from embedding vectors with API access alone, achieving 1.8x higher ROUGE-L scores than all prior baselines.
Patient records, legal docs, proprietary code — all recoverable from vectors alone.
A Pinecone/Weaviate breach = full plaintext breach. OWASP now classifies this as a Top 10 LLM vulnerability.
Why Existing Solutions Don't Work
Redaction kills utility:
Before: "Tata Motors reported Rs 3.4L Cr revenue in Q3 2025"
After: "[REDACTED] reported [REDACTED] revenue in [REDACTED]"
Good luck getting useful embeddings from that. Your vector search returns garbage.
PII detectors (Presidio, LLM Guard):
- 50-200ms overhead per call (Python NER in the hot path)
- Only catch names and emails; miss revenue figures, deal sizes, project codenames
- Stateless: a different replacement on each call breaks vector search
Cloud-locked tools: Bedrock guardrails = Bedrock only. Private AI = another SaaS middleman.
| | Consistent mapping | Beyond PII | <10ms latency | Self-hosted | Pipeline-aware |
|---|---|---|---|---|---|
| Presidio | ❌ | ❌ | ❌ | ✅ | ❌ |
| LLM Guard | ❌ | ❌ | ❌ | ✅ | ❌ |
| Bedrock Guardrails | ❌ | ⚠️ | ✅ | ❌ | ❌ |
| CloakPipe | ✅ | ✅ | ✅ | ✅ | ✅ |
The Fix: Consistent Pseudonymization
Don't redact. Replace consistently.
Map "Tata Motors" → "ORG_7". Same token, every time, across every document and query.
Before: "Tata Motors reported Rs 3.4L Cr revenue in Q3 2025, up 12%"
After: "ORG_7 reported AMOUNT_12 revenue in DATE_3, up PCT_3"
Semantic structure preserved → embeddings still meaningful → vector search works → LLM responds with pseudonyms → rehydrate back to real values.
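A minimal sketch of the idea (a toy helper, not CloakPipe's actual code): a persistent entity-to-token map guarantees the same replacement every time, across documents and queries, so embeddings stay consistent.

```python
class Pseudonymizer:
    """Toy consistent-pseudonymization map (illustrative only)."""

    def __init__(self):
        self.mapping = {}   # real value -> pseudonym
        self.counters = {}  # entity type -> next id

    def token_for(self, value, entity_type):
        # Same value always yields the same token -- the key property
        # that keeps document and query embeddings aligned.
        if value not in self.mapping:
            n = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = n
            self.mapping[value] = f"{entity_type}_{n}"
        return self.mapping[value]

    def pseudonymize(self, text, entities):
        # `entities` = [(surface_form, type)] from a detector (regex/NER).
        for value, etype in entities:
            text = text.replace(value, self.token_for(value, etype))
        return text

p = Pseudonymizer()
doc = p.pseudonymize(
    "Tata Motors reported Rs 3.4L Cr revenue in Q3 2025",
    [("Tata Motors", "ORG"), ("Rs 3.4L Cr", "AMOUNT"), ("Q3 2025", "DATE")],
)
query = p.pseudonymize("What was Tata Motors revenue?", [("Tata Motors", "ORG")])
print(doc)    # ORG_1 reported AMOUNT_1 revenue in DATE_1
print(query)  # What was ORG_1 revenue?
```

Because the query reuses the same map as the documents, "Tata Motors" becomes `ORG_1` in both, and vector search matches exactly as it would on plaintext.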
"What was Tata Motors' revenue last quarter?"
↓
Pseudonymize → "What was ORG_7's revenue last quarter?"
↓
Embed + Search → retrieve pseudonymized chunks
↓
LLM → "ORG_7 reported AMOUNT_12 in DATE_3..."
↓
Rehydrate → "Tata Motors reported Rs 3.4L Cr in Q3 2025..."
↓
✅ User sees real answer. Provider never saw "Tata Motors."
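The rehydration step is just a reverse lookup: replace each pseudonym in the LLM's output with the original value from the local vault. A sketch with a plain in-memory dict (CloakPipe keeps this mapping encrypted; the values below are illustrative):

```python
import re

# Local vault: pseudonym -> real value. Never leaves your infrastructure.
vault = {"ORG_7": "Tata Motors", "AMOUNT_12": "Rs 3.4L Cr", "DATE_3": "Q3 2025"}

def rehydrate(text: str) -> str:
    # Word-boundary match so ORG_7 doesn't clobber a hypothetical ORG_71.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, vault)) + r")\b")
    return pattern.sub(lambda m: vault[m.group(1)], text)

llm_output = "ORG_7 reported AMOUNT_12 in DATE_3, up 12% year over year."
print(rehydrate(llm_output))
# Tata Motors reported Rs 3.4L Cr in Q3 2025, up 12% year over year.
```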
Going Further: Kill 3 of 4 Leak Points
Vectorless tree search builds a local JSON index and lets the LLM reason about relevance. No embedding API. No vector DB. No inversion risk.
VECTOR RAG (4 leaks):          TREE-BASED RAG (1 leak):
Text → Embedding API     ❌    Tree index built locally   ✅
Vectors → Cloud DB       ❌    Tree stored locally        ✅
Query → Embedding API    ❌    LLM navigates tree         ✅
Context → LLM            ❌    Pseudonymized → LLM        ⚠️ (protected)
PageIndex (VectifyAI) reported 98.7% accuracy on FinanceBench for structured docs, versus ~31% for GPT-4o.
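A sketch of the vectorless idea, under stated assumptions: the index is a local tree of section summaries, and at each level a relevance scorer picks which branch to descend. Here a keyword-overlap score stands in for the LLM relevance call that PageIndex-style systems make; the tree contents and function names are invented.

```python
import re

# Local index: each node carries a summary the LLM would reason over.
# Leaf text is already pseudonymized before the one remaining LLM call.
tree = {
    "summary": "Annual report",
    "children": [
        {"summary": "Revenue and quarterly financials", "children": [],
         "text": "ORG_7 reported AMOUNT_12 revenue in DATE_3."},
        {"summary": "HR policies and headcount", "children": [],
         "text": "Headcount grew in DATE_4."},
    ],
}

def score(query: str, summary: str) -> int:
    # Stand-in for an LLM relevance judgment: keyword overlap.
    q = set(re.findall(r"\w+", query.lower()))
    s = set(re.findall(r"\w+", summary.lower()))
    return len(q & s)

def navigate(node: dict, query: str) -> str:
    # Descend to the most relevant child until we reach a leaf.
    while node.get("children"):
        node = max(node["children"], key=lambda c: score(query, c["summary"]))
    return node["text"]

chunk = navigate(tree, "What was the quarterly revenue?")
print(chunk)  # ORG_7 reported AMOUNT_12 revenue in DATE_3.
```

Nothing here touches an embedding API or a cloud vector DB: the index is built and stored locally, and only the final pseudonymized chunk reaches the LLM.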
CloakPipe — Drop-In Privacy Proxy
I built CloakPipe — a Rust-native proxy that sits between your app and any OpenAI-compatible API.
Your App ──"Tata Motors"──▶ CloakPipe ──"ORG_1"──▶ LLM API
Your App ◀──"Tata Motors"── CloakPipe ◀──"ORG_1"── LLM API
Setup: change OPENAI_BASE_URL. That's it. Your LangChain/LlamaIndex/OpenAI SDK code works unchanged.
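Concretely, the setup looks like this, assuming CloakPipe listens on localhost port 8080 (the port is illustrative; check the repo's README for the actual default):

```shell
# Point any OpenAI-compatible SDK at the local proxy instead of the provider.
export OPENAI_BASE_URL="http://localhost:8080/v1"
# Your real key still goes along; CloakPipe forwards it upstream.
export OPENAI_API_KEY="sk-..."
# Existing LangChain / LlamaIndex / OpenAI SDK code now runs unchanged.
```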
v0.1 features:
- Multi-layer detection (API keys, JWTs, emails, IPs, financial amounts, fiscal dates, custom TOML rules)
- AES-256-GCM encrypted vault + `zeroize` memory safety
- OpenAI-compatible proxy (`/v1/chat/completions`, `/v1/embeddings`)
- SSE streaming rehydration
- Single binary, <5ms overhead
Coming soon:
- 🌳 CloakTree — vectorless retrieval, eliminates 3/4 leak points
- 🔐 CloakVector — distance-preserving vector encryption
- 🧠 ONNX-based NER
- 🏗️ TEE support (AWS Nitro, Intel TDX)
The privacy-preserving AI market is $4.25B today, projected $40B by 2035. 75% of enterprise leaders cite security as #1 barrier to AI adoption.
The era of sending raw enterprise data to LLM APIs in plaintext is ending.
github.com/rohansx/cloakpipe — star it, try it, break it.