How MemoraEU Cannot Read Your Memories — Even If We Wanted To
Zero-knowledge architecture of a sovereign AI memory layer
The question nobody asks enough
When Claude, ChatGPT, or Gemini "remembers" something, where does it go? To Anthropic's, OpenAI's, or Google's servers. In plaintext. Potentially used to fine-tune future models. Subject to the CLOUD Act if the company is American.
That's the trade-off we implicitly accept in exchange for convenience.
MemoraEU makes a different bet: the server must never be able to read your data. Not as a policy. As an irreversible technical constraint. This post explains how we get there — and why it's harder than it sounds when you still want semantic search to work.
The architecture in one sentence
Content is encrypted on your machine before leaving your machine. The key never leaves your machine. The server stores opaque blobs and floating-point vectors.
That's it. Everything else is implementation.
Key derivation: PBKDF2-HMAC-SHA256
You configure two environment variables in your MCP server:
MEMORAEU_SECRET=your-long-unique-passphrase
MEMORAEU_SALT=one-salt-per-installation
At startup, a single derivation operation:
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives import hashes
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,           # AES-256 → 32 bytes
    salt=salt_bytes,
    iterations=100_000,  # NIST SP 800-132 recommends ≥ 1,000; we go well beyond
)
key = kdf.derive(password.encode())
100,000 iterations of HMAC-SHA256: enough to make brute-forcing a long passphrase prohibitively expensive, without noticeably slowing down startup (< 200 ms on a modern laptop).
The derived key is kept in RAM for the duration of the session. It is never written to disk, never transmitted, never logged.
Encryption: AES-256-GCM
import os, base64
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
def encrypt(plaintext: str, key: bytes) -> str:
    nonce = os.urandom(12)  # 96-bit, random per message
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return base64.b64encode(nonce + ciphertext).decode("ascii")
The format of the blob stored server-side:
base64( nonce[12 bytes] | ciphertext | auth_tag[16 bytes] )
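To make that byte layout concrete, here is a hypothetical helper that splits a blob back into its parts (purely illustrative; the server never does this, and `split_blob` is not part of the actual codebase):

```python
import base64

def split_blob(blob_b64: str) -> tuple[bytes, bytes, bytes]:
    """Split a stored blob into (nonce, ciphertext, auth_tag)."""
    raw = base64.b64decode(blob_b64)
    # first 12 bytes: nonce; last 16 bytes: GCM authentication tag
    return raw[:12], raw[12:-16], raw[-16:]
```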
Three key properties of GCM:
- Confidentiality — without the key, the ciphertext is indistinguishable from random noise
- Integrity — the 16-byte authentication tag detects any modification of the ciphertext (authenticated encryption)
- Unique nonce per message — identical content produces different blobs on every encryption
What the server sees: a base64 string. What it can infer: the approximate size of the original content (± a few bytes). Nothing else.
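On the client, the inverse operation is a minimal companion to `encrypt` above (a sketch: the `cryptography` library raises `InvalidTag` from `AESGCM.decrypt` if the blob was modified, which is how integrity failures surface):

```python
import base64
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def decrypt(blob_b64: str, key: bytes) -> str:
    raw = base64.b64decode(blob_b64)
    # first 12 bytes are the nonce; the rest is ciphertext + 16-byte GCM tag
    nonce, ciphertext = raw[:12], raw[12:]
    # raises cryptography.exceptions.InvalidTag on any tampering
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")
```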
The real challenge: searching encrypted data
Encrypting and storing is easy. But an AI memory without search is useless. And semantic search requires understanding the meaning of content — something a server cannot do on ciphertext.
The naive solution would be to decrypt server-side to compute the embedding. Obviously we don't do that.
Our approach: embeddings are computed before encryption, on your machine.
Plaintext
│
├─► Mistral Embed (local) ──► float[1024] vector ──► Qdrant (server)
│
└─► AES-256-GCM ──► opaque blob ──► PostgreSQL (server)
When you store "I use ESP32-S3 with UART on GPIO21":
- The plaintext goes to the Mistral Embed API (from your machine, via your Mistral API key)
- Mistral returns a 1024-dimensional vector representing the semantics
- The text is encrypted locally
- Only the vector and the encrypted blob travel to our servers
When you search "UART wiring on my board":
- The query is turned into a vector (same process, local)
- Qdrant performs a cosine similarity search in the vector space
- The matching blobs are returned
- They are decrypted locally before being shown to Claude
What the server can do: find the N nearest vectors to a query. It knows that two memories are "semantically close" without knowing what they say.
What the server cannot do: read the content, understand the topic, infer anything beyond the vector structure.
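The similarity the server computes is pure geometry. A toy cosine similarity makes the point: this is everything Qdrant learns about two memories, an angle between two vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Angle-based closeness of two vectors; 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```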
Zero-knowledge deduplication
Before storing a new memory, we check whether it already exists — without ever comparing plaintext:
DEDUP_SKIP_THRESHOLD = 0.94 # exact duplicate → reject storage
DEDUP_WARN_THRESHOLD = 0.85 # very similar → warn but store
response = await api_post("/memories/search-by-vector", {
    "vector": embedding,  # vector computed locally
    "limit": 1,
    "threshold": DEDUP_WARN_THRESHOLD,
})
The comparison happens entirely in vector space. If the cosine similarity score exceeds 0.94, it's an exact duplicate: we reject the storage and return the existing ID. Between 0.85 and 0.94: we inform the user but store anyway.
Result: zero plaintext transmitted for deduplication.
Smart compression (optional)
If MISTRAL_API_KEY is configured, the MCP server compresses long memories before encrypting them:
Raw text (> 300 chars)
  │
  └─► Mistral (local): "summarize in 1-3 sentences"
        │
        └─► Compressed text ──► Embed ──► Encrypt ──► Store
Compression happens before encryption, on plaintext, on your machine. What goes to the Mistral API is your raw text — but it's your Mistral key, on your infrastructure, and Mistral does not store prompts by default. What goes to our servers is always encrypted.
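The pipeline can be sketched with the three stages injected as callables (a sketch under assumptions: `COMPRESS_MIN_CHARS` and the function names are invented here, based only on the "> 300 chars" figure in the diagram above):

```python
COMPRESS_MIN_CHARS = 300  # hypothetical threshold, from the diagram above

def prepare_content(text, summarize, embed, encrypt):
    """Compress (if long), then embed and encrypt, all client-side.

    summarize/embed/encrypt stand in for the Mistral chat call,
    Mistral Embed, and AES-256-GCM respectively.
    """
    if len(text) > COMPRESS_MIN_CHARS:
        text = summarize(text)  # plaintext summarized locally, pre-encryption
    # only these two values ever travel to the server
    return embed(text), encrypt(text)
```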
What the server actually sees
In the database, a memory looks like this:
{
  "id": "mem_01HVKX9...",
  "content": "dGhpcyBpcyBub3QgcmVhZGFibGUgYXQgYWxs...",
  "category": "hardware",
  "embedding": [0.0234, -0.1823, 0.0091, ...],
  "created_at": "2026-04-25T09:14:00Z"
}
content is a base64 blob that cannot be decrypted without the key. embedding is a vector that captures semantics but not literal content. category is assigned locally by the LLM before encryption — it's the only readable metadata, and it's intentionally generic ("hardware", "personal", "project"…).
The threat model
This scenario is covered: our servers are compromised. An attacker retrieves the entire database and the Qdrant vectors. They see base64 blobs and coordinates in a 1024-dimensional space. Without your passphrase, there's nothing they can do. Even we can't.
This scenario is not covered: your machine is compromised. If an attacker has access to your local environment, they can read MEMORAEU_SECRET from your .env or intercept content before encryption. No zero-knowledge architecture can protect against client-side compromise — this is a fundamental limitation, not specific to MemoraEU.
This scenario is partially covered: passphrase reuse. If you use the same passphrase across multiple installations, compromising one machine affects all others. A different MEMORAEU_SALT per installation mitigates this risk.
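The effect of a per-installation salt is easy to demonstrate with the standard library's PBKDF2 (shown here with `hashlib` for self-containment, using the same parameters as the derivation above): the same passphrase yields unrelated keys under different salts.

```python
import hashlib

def derive_key(passphrase: str, salt: str) -> bytes:
    # same parameters as the PBKDF2HMAC setup above: SHA-256, 100k iterations, 32 bytes
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt.encode(), 100_000, dklen=32)
```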
Cryptographic roadmap
The current v1 uses a fixed salt per installation (stored in .env). This is pragmatic but imperfect:
- Phase 2: unique salt per user, stored server-side (the server provides the salt, not the key — this doesn't break zero-knowledge)
- Phase 3: per-memory-pair encryption to prevent temporal correlation
- Phase 4: HSM support for enterprise deployments (key in hardware, never in RAM)
Why this matters now
LLMs are becoming permanent assistants. They will know more and more about you — your projects, your decisions, your family, your health, your finances. Where that memory is stored and who can access it is a question of personal sovereignty, not just product preference.
Zero-knowledge is not a marketing argument. It's an architectural constraint we impose on ourselves so we are never in the position of having to choose between our commercial interests and your privacy.
MemoraEU is open source. The encryption code is available on GitHub.
Technical questions: contact@memoraeu.com