<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: x1m4x</title>
    <description>The latest articles on DEV Community by x1m4x (@x1m4x).</description>
    <link>https://dev.to/x1m4x</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837222%2F4ac76bfa-cdf1-4f0f-b583-671530c16427.jpeg</url>
      <title>DEV Community: x1m4x</title>
      <link>https://dev.to/x1m4x</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/x1m4x"/>
    <language>en</language>
    <item>
      <title>I Built an E2EE Proxy So LLM Providers Can't Read My Prompts</title>
      <dc:creator>x1m4x</dc:creator>
      <pubDate>Sat, 21 Mar 2026 15:13:10 +0000</pubDate>
      <link>https://dev.to/x1m4x/i-built-an-e2ee-proxy-so-llm-providers-cant-read-my-prompts-3onj</link>
      <guid>https://dev.to/x1m4x/i-built-an-e2ee-proxy-so-llm-providers-cant-read-my-prompts-3onj</guid>
      <description>&lt;p&gt;Every time you call an LLM API, your prompt travels in plaintext. The provider sees it. Their logs see it. Anyone with access to the infrastructure sees it.&lt;/p&gt;

&lt;p&gt;For most use cases this is fine. But when you're generating legal documents, discussing medical cases, or brainstorming business strategy — "trust us" isn't good enough.&lt;/p&gt;

&lt;p&gt;I wanted true end-to-end encryption for LLM inference. So I built a proxy that makes it happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if the model couldn't see your plaintext either?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://venice.ai" rel="noopener noreferrer"&gt;Venice AI&lt;/a&gt; runs E2EE models inside &lt;strong&gt;Trusted Execution Environments&lt;/strong&gt; (TEEs) — hardware-isolated enclaves where even Venice's own engineers can't access the data being processed. The model's memory is encrypted at the hardware level.&lt;/p&gt;

&lt;p&gt;The catch: the E2EE protocol requires client-side cryptography — ECDH key exchange, AES-256-GCM encryption, streaming decryption — that standard OpenAI SDKs don't support.&lt;/p&gt;

&lt;p&gt;So if you're using Python's &lt;code&gt;openai&lt;/code&gt; library, LangChain, or any other OpenAI-compatible client, you can't use E2EE models out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  The proxy approach
&lt;/h2&gt;

&lt;p&gt;Instead of forking every SDK, I built a local proxy that sits between your app and Venice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app (plaintext) --&amp;gt; localhost:5111 --&amp;gt; [E2EE Proxy] --&amp;gt; Venice API (encrypted)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your app sends normal plaintext requests to &lt;code&gt;localhost:5111&lt;/code&gt;. The proxy handles the entire E2EE handshake transparently. From your app's perspective, it's just talking to another OpenAI-compatible endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the crypto works
&lt;/h2&gt;

&lt;p&gt;Here's the full flow for each request:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Key generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The proxy generates an ephemeral secp256k1 key pair — the same curve Bitcoin uses. This happens once per session (sessions auto-rotate every hour).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. TEE attestation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before trusting any public key, the proxy fetches hardware attestation from Venice. This proves the model is actually running inside a TEE, not on a regular server. The attestation includes a nonce to prevent replay attacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. ECDH key exchange&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the model's attested public key and our ephemeral private key, we compute a shared secret via Elliptic Curve Diffie-Hellman. Neither party ever transmits the shared secret — it's derived independently on both sides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Key derivation + encryption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The shared secret is fed through HKDF-SHA256 to derive a 256-bit AES key. Each message is encrypted with AES-256-GCM using a random 12-byte nonce. The encrypted payload looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ephemeral public key (65B)] [nonce (12B)] [ciphertext + auth tag]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Streaming decryption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Venice streams SSE responses with encrypted chunks. The proxy decrypts each chunk in real time and forwards it as standard SSE to your app. You see plaintext streaming in — but it was encrypted on the wire.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security decisions
&lt;/h2&gt;

&lt;p&gt;A few things I was deliberate about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constant-time crypto&lt;/strong&gt; — all ECDH operations use &lt;code&gt;@noble/curves&lt;/code&gt;, an audited library with constant-time implementations. No timing side-channels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral key zeroing&lt;/strong&gt; — the private key is overwritten with zeros immediately after deriving the shared secret.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No plaintext logging&lt;/strong&gt; — prompts and responses never appear in logs. Only metadata (model name, message count) is logged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Localhost only&lt;/strong&gt; — the server binds to &lt;code&gt;127.0.0.1&lt;/code&gt; with CORS restricted to localhost origins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request size limits&lt;/strong&gt; — 10 MB max body, 60s upstream timeout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Race condition prevention&lt;/strong&gt; — per-model session mutex prevents duplicate key exchanges under concurrent requests.&lt;/li&gt;
&lt;/ul&gt;
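&lt;p&gt;The last point can be sketched as a promise-chain mutex. This is a hypothetical shape, not the proxy's actual implementation:&lt;/p&gt;

```javascript
// Hypothetical per-model session mutex: chain callers on a promise so
// concurrent requests for one model never run two key exchanges at once.
const locks = new Map();

async function withSessionLock(model, fn) {
  const prev = locks.get(model) ?? Promise.resolve();
  const next = prev.then(fn, fn);            // run after the previous caller settles
  locks.set(model, next.catch(() => {}));    // keep the chain alive on errors
  return next;
}

// Demo: the second caller waits for the first to finish.
const order = [];
const p1 = withSessionLock("e2ee-model", async () => {
  await new Promise((resolve) => setTimeout(resolve, 20)); // simulate handshake
  order.push("first handshake");
});
const p2 = withSessionLock("e2ee-model", async () => {
  order.push("reused session");
});
Promise.all([p1, p2]).then(() => console.log(order.join(" -> ")));
// first handshake -> reused session
```

&lt;p&gt;Chaining on the stored promise means later callers simply wait for the in-flight handshake instead of starting their own.&lt;/p&gt;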

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/x1m4x/e2ee-llm-proxy.git
&lt;span class="nb"&gt;cd &lt;/span&gt;e2ee-llm-proxy
npm &lt;span class="nb"&gt;install
&lt;/span&gt;&lt;span class="nv"&gt;VENICE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt; node server.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:5111/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;e2ee-gpt-oss-120b-p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://127.0.0.1:5111/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unused&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;e2ee-gpt-oss-120b-p&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hello!&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any framework that speaks OpenAI format (LangChain, LlamaIndex, etc.) works the same way — just change &lt;code&gt;base_url&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;These are inherent to Venice's E2EE protocol, not the proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming only&lt;/strong&gt; — non-streaming is not supported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No function calling&lt;/strong&gt; — tools, functions, and structured outputs don't work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No file uploads or vision&lt;/strong&gt; — text in, text out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TEE attestation caveat&lt;/strong&gt; — attestation is verified server-side by Venice. For maximum trust, you'd want client-side verification against hardware root certificates (Intel TDX / AMD SEV-SNP). This is documented in the README.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;The whole thing is ~300 lines of JavaScript, 82 tests, zero dependencies beyond the &lt;code&gt;@noble&lt;/code&gt; crypto suite. MIT licensed.&lt;/p&gt;

&lt;p&gt;Available E2EE models include GPT-OSS 120B ($0.13/$0.65 per 1M tokens), Qwen3.5 122B, and GLM 4.7 Flash with 198K context.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/x1m4x/e2ee-llm-proxy" rel="noopener noreferrer"&gt;https://github.com/x1m4x/e2ee-llm-proxy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>security</category>
      <category>privacy</category>
    </item>
  </channel>
</rss>
