Every session, the LLM starts fresh. The user re-explains their role, their constraints, their preferences, what they were doing last time. Then the session ends, and next time: same thing.
The industry has diagnosed this correctly — statelessness is a real limitation. But the solutions being built mostly share the same premise: that memory is a service you connect to. I think that premise is wrong, and it shapes everything downstream.
The actual cost of statelessness
This isn't just a UX annoyance. A 2026 study by Pichay measuring 857 production AI sessions found that 21.8% of input tokens are "structural waste": context that has to be re-established on every session because nothing persists. That is more than a fifth of your input-token budget, on every session, going toward re-explaining what should already be known.
For casual chat, that's tolerable. For workflows where context is dense and high-stakes — a lawyer switching between matters, a developer moving between codebases, a clinician picking up a patient thread — the cost compounds. And it's paid on every session, indefinitely.
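To make the compounding concrete, here is a back-of-the-envelope sketch. The 21.8% figure is the one from the study above; the session size and session count are made-up illustrative inputs, not measurements, and the function name `wasted_tokens` is mine.

```python
# Back-of-the-envelope estimate of tokens spent re-establishing context.
# WASTE_FRACTION is the figure from the study cited above; every other
# number in this sketch is a hypothetical input chosen for illustration.

WASTE_FRACTION = 0.218


def wasted_tokens(input_tokens_per_session: int, sessions: int,
                  waste_fraction: float = WASTE_FRACTION) -> int:
    """Return the estimated input tokens spent on re-explained context."""
    if input_tokens_per_session < 0 or sessions < 0:
        raise ValueError("token and session counts must be non-negative")
    return round(input_tokens_per_session * sessions * waste_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens per session, 250 sessions.
    print(wasted_tokens(10_000, 250))
```

The estimate is linear in the number of sessions, which is the point: the overhead never amortizes, because nothing carries over.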
What everyone else built
The market's answer has been centralized memory stores. Mem0 just closed $24M in funding (October 2025) to build "the memory layer for AI." Letta/MemGPT persists agent state in a server-side database. Zep builds a temporal knowledge graph of user interactions. SAMEP and MemTrust add encryption layers on top of server-side storage.
These are all genuinely useful tools. They solve the statelessness problem for most use cases. But they share an architecture: your context lives on their infrastructure, retrieval is query-scoped, and access is controlled by the service provider.
Even the solutions that advertise encryption — SAMEP, MemTrust — encrypt server-side. The data leaves the client before any cryptographic protection is applied. You've traded "AI forgets you" for "your memory is a managed cloud service." For many applications that's fine. For sensitive workflows, it's a different risk surface, not a smaller one.
The question that didn't get asked
What if memory is a file, not a service?
Not metaphorically. Literally: a single encrypted file, owned by the user, that travels with them across sessions and across models. The LLM reads it at session start, updates it at session end, and the file lives wherever the user puts it.
{
  "format": "klickd/v1",
  "encrypted_payload": "<AES-256-GCM ciphertext>",
  "kdf": "argon2id",
  "salt": "<per-file salt>",
  "nonce": "<GCM nonce>"
}
The key insight: persistent context doesn't require a server. It requires a standard. A shared format that any model can read and any client can write.
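To illustrate what "any model can read and any client can write" asks of a client, here is a minimal sketch of a reader that checks the envelope before attempting decryption. The field names are the ones in the example above; the base64 encoding of the binary fields, the 12-byte nonce check, and the function name `parse_envelope` are my assumptions for illustration, not something taken from the published spec.

```python
import base64
import json

REQUIRED_FIELDS = ("format", "encrypted_payload", "kdf", "salt", "nonce")


def parse_envelope(text: str) -> dict:
    """Parse an envelope and return its fields with binary values decoded.

    Assumes (not confirmed by the spec) that salt, nonce, and ciphertext
    are stored as base64 strings.
    """
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in doc]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if doc["format"] != "klickd/v1":
        raise ValueError(f"unsupported format: {doc['format']!r}")
    if doc["kdf"] != "argon2id":
        raise ValueError(f"unsupported kdf: {doc['kdf']!r}")

    decoded = {"format": doc["format"], "kdf": doc["kdf"]}
    for name in ("encrypted_payload", "salt", "nonce"):
        # validate=True rejects characters outside the base64 alphabet.
        decoded[name] = base64.b64decode(doc[name], validate=True)
    if len(decoded["nonce"]) != 12:
        # 12 bytes is the conventional GCM nonce size; assumed here.
        raise ValueError("expected a 12-byte GCM nonce")
    return decoded
```

Rejecting unknown `format` and `kdf` values up front is a design choice: a reader that fails loudly on an unfamiliar version is easier to reason about than one that guesses.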
What we built
We built .klickd around this premise. The architecture is deliberately minimal:
- AES-256-GCM encryption, Argon2id key derivation. Client-side only. The key is derived from a passphrase that never leaves the device. There is no server that could be subpoenaed, breached, or decommissioned.
- Provider-agnostic. The same .klickd file works with GPT-4o, Claude, Gemini, Llama. It's not bound to any model provider's infrastructure or format.
- Zero-server. There is no backend storing context. The file is the memory. If the file doesn't exist on your machine, the context doesn't exist anywhere.
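For readers who want to see the shape of the client-side crypto, here is a minimal sketch of sealing and opening such a file. It is not the .klickd implementation. It assumes the third-party `cryptography` and `argon2-cffi` packages, a JSON plaintext, base64-encoded binary fields, and Argon2id cost parameters that I picked for illustration; the function names `seal` and `open_envelope` are hypothetical.

```python
import base64
import json
import os

from argon2.low_level import Type, hash_secret_raw
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def _derive_key(passphrase: str, salt: bytes) -> bytes:
    # Argon2id cost parameters below are illustrative assumptions,
    # not values taken from the .klickd spec.
    return hash_secret_raw(
        secret=passphrase.encode("utf-8"),
        salt=salt,
        time_cost=2,
        memory_cost=19456,  # in KiB
        parallelism=1,
        hash_len=32,  # 32 bytes gives an AES-256 key
        type=Type.ID,
    )


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def seal(context: dict, passphrase: str) -> str:
    """Encrypt a context dict into an envelope shaped like the example."""
    salt = os.urandom(16)   # fresh per-file salt
    nonce = os.urandom(12)  # fresh GCM nonce on every write
    key = _derive_key(passphrase, salt)
    plaintext = json.dumps(context).encode("utf-8")
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return json.dumps({
        "format": "klickd/v1",
        "encrypted_payload": _b64(ciphertext),
        "kdf": "argon2id",
        "salt": _b64(salt),
        "nonce": _b64(nonce),
    })


def open_envelope(envelope: str, passphrase: str) -> dict:
    """Decrypt an envelope; raises InvalidTag on a wrong passphrase."""
    doc = json.loads(envelope)
    key = _derive_key(passphrase, base64.b64decode(doc["salt"]))
    plaintext = AESGCM(key).decrypt(
        base64.b64decode(doc["nonce"]),
        base64.b64decode(doc["encrypted_payload"]),
        None,
    )
    return json.loads(plaintext)
```

Because GCM is authenticated, a wrong passphrase or a tampered file fails at decryption instead of yielding garbage. Generating a new nonce on every write matters: reusing a nonce under the same key breaks GCM's guarantees.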
On personalization quality: our LLM-judge benchmark (Zenodo, DOI: 10.5281/zenodo.20320480) — run across 23 test lots and 115 profiles, using qwen3-32b as judge — showed an average improvement of +13.9 points over baseline, with a range of +12.8 to +19.2. This is with llama-3.3-70b-versatile as the model under test. Results are published as-is; methodology and raw data are in the report.
For legal and regulated workflows specifically: the file-per-context model makes cross-matter contamination structurally impossible — not enforced by query scoping or ACLs, but by physical separation. Discovery compliance changes shape: you produce the file, or you don't. There's no "server logs" ambiguity.
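The physical-separation point can be sketched as a store that maps each matter to its own file and only ever opens the one it is asked for. The class name `MatterStore`, the directory layout, and the `<matter_id>.klickd` naming are hypothetical; file contents are treated as opaque bytes here, since encryption is orthogonal to the separation argument.

```python
from pathlib import Path
import re


class MatterStore:
    """One opaque context file per matter; no shared index or database."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, matter_id: str) -> Path:
        # Restrict ids to a safe alphabet so an id cannot contain path
        # separators and escape the root directory.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", matter_id):
            raise ValueError(f"invalid matter id: {matter_id!r}")
        return self.root / f"{matter_id}.klickd"

    def save(self, matter_id: str, blob: bytes) -> None:
        self._path(matter_id).write_bytes(blob)

    def load(self, matter_id: str) -> bytes:
        # Only the named file is read; other matters are never opened.
        return self._path(matter_id).read_bytes()

    def destroy(self, matter_id: str) -> None:
        # Removes this store's copy of the matter's context.
        self._path(matter_id).unlink(missing_ok=True)
```

There is deliberately no `search` or `list_all` method in this sketch: the only way to get context into a session is to name the matter it belongs to.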
The honest tradeoffs
This architecture gives up things that matter in other contexts.
You lose:
- Centralized governance and server-side revocation
- Query analytics and usage telemetry
- Multi-tenant management at scale
- Cross-device sync without a separate sync layer
You gain:
- Zero trust surface: there is nothing to breach on the provider side
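- GDPR-native by architecture: the stored context doesn't leave the client at rest, so data residency and right-to-erasure are straightforward for the memory itself (what you send to a model during a session is still subject to that provider's handling)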
- Portability: the file works with any model, now and in the future
This is not a universal solution. It is the right solution for a specific class of use cases: privacy-sensitive, cross-model, user-owned context. If you're building a consumer product where the vendor needs to manage memory at scale, use Mem0 or Zep — they're well-engineered for that. If you're building for a context where the user owns the data and the service provider should have zero access, the server-side model is architecturally incompatible with that requirement, regardless of how good the encryption story is.
Is this a new standard?
The field probably needs a portable, encrypted, open context format the way it needed JWT for auth tokens or RSS for feed syndication — a shared abstraction that any tool can read and write, owned by no single vendor.
We're not claiming .klickd is that standard. It's a proof of concept that the abstraction is viable. The memory-file spec is open: https://github.com/Davincc77/klickdskill
The question I keep coming back to: if the AI ecosystem converged on server-side memory because that's what was easy to build first, not because it's the right primitive — what does the right primitive actually look like? And is the file abstraction the right level, or is there something better?
Curious what others think, especially those who've hit the limits of query-scoped retrieval in production.