DiaryVault

Posted on Feb 7

I built a cryptographic memory layer for humans in Python

#python #opensource #security #blockchain

Planes have black boxes. Cars have dash cams. Companies have audit logs.
Humans have... memory. And memory is terrible.
By a commonly cited estimate, you forget around 70% of new information within 24 hours. Meanwhile, AI can now generate fake photos, voices, and text that are very hard to tell apart from reality. So I asked myself: what if there was a way to create a tamper-evident, encrypted, verifiable record of your life?
I built it over a weekend and open sourced it. Here's how.
What it does
DiaryVault Memory Layer is a Python SDK that turns any text — journal entries, notes, decisions, thoughts — into cryptographically verified, encrypted, permanent memory records.
You write → SHA-256 hashed → AES-256 encrypted → HMAC signed → optionally anchored on-chain
Five lines to get started:
```python
from diaryvault_memory import MemoryVault

vault = MemoryVault(encryption_key="your-secret-key")
memory = vault.create(
    content="Today I decided to start a company.",
    tags=["career", "milestone"]
)
print(memory.hash)      # a7f3b2c1d4e5...
print(memory.verified)  # True
```
That's it. Your memory is now hashed, encrypted, signed, and stored.
The architecture
The system has four layers:

Capture Layer: Gets data in. Manual entries now, AI agents later.
Synthesis Layer: AI enrichment. Summarization, pattern detection, emotional analysis. (Coming in v0.2)
Verification Layer: The cryptographic core. This is where it gets interesting.
Permanence Layer: Where verified hashes get anchored. Local storage, Arweave, Ethereum L2, or IPFS.

The crypto decisions I made (and why)
This was the part I spent the most time thinking about. Every choice here matters because if the crypto is wrong, the whole project is meaningless.
Key derivation: HKDF, not raw SHA-256
My first implementation derived encryption and signing keys by doing SHA-256(master_key + purpose). It worked, but it's not how serious cryptographic systems do it. I switched to HKDF (RFC 5869), which is the industry standard used by TLS 1.3 and the Signal Protocol:
```python
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

def _derive_key(self, purpose: bytes) -> bytes:
    hkdf = HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=purpose,
    )
    return hkdf.derive(self._master_key)
```
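Why does this matter? HKDF splits key derivation into an "extract" phase and an "expand" phase, and its info parameter binds each derived key to a purpose, so the encryption key and the signing key are independent of each other. Raw SHA-256 concatenation is an ad hoc construction with no such guarantee. Because SHA-256 is open to length extension, an attacker who learns one derived key can compute the key for certain longer purpose strings without ever knowing the master key.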
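To make the two phases concrete, here is a minimal standard-library sketch of HKDF-SHA256 following RFC 5869. It is for illustration only: the SDK uses the cryptography library's HKDF class shown above, and the function names, the placeholder master key, and the purpose labels (b"encryption", b"signing") are assumptions rather than the SDK's actual identifiers.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract: condense the input key material into a fixed-size pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: a missing salt defaults to HASH_LEN zero bytes
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: stretch the pseudorandom key into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # Each block chains the previous block, the info label, and a counter byte
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(master_key: bytes, purpose: bytes, length: int = 32) -> bytes:
    """Hypothetical helper: one master key in, one purpose-specific key out."""
    return hkdf_expand(hkdf_extract(b"", master_key), purpose, length)


master = b"example master key material"  # placeholder, not a real key
enc_key = derive_key(master, b"encryption")
sig_key = derive_key(master, b"signing")
```

The same master key yields two unrelated 32-byte keys, and calling derive_key again with the same purpose reproduces the same key.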
Encryption: AES-256-GCM
I chose AES-256-GCM over alternatives like ChaCha20-Poly1305 for one reason: ubiquity. AES-GCM is hardware-accelerated on virtually every modern CPU, it's NIST-approved, and every security auditor on earth knows how to review it.
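GCM mode is critical: it provides both confidentiality (nobody can read it) AND authenticity (nobody can tamper with it without detection). A fresh 96-bit nonce per encryption means identical plaintexts never produce identical ciphertexts. It is also a hard requirement of GCM, since reusing a nonce under the same key undermines both guarantees: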
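```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt(self, plaintext: str) -> tuple[bytes, bytes]:
    # Fresh random 96-bit nonce for every call; never reuse one under the same key
    nonce = os.urandom(12)
    aesgcm = AESGCM(self._enc_key)
    # The returned ciphertext has the 16-byte GCM authentication tag appended
    ciphertext = aesgcm.encrypt(nonce, plaintext.encode("utf-8"), None)
    return ciphertext, nonce
```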
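
The authenticity half of GCM is easiest to see on the decrypt side. The sketch below is a standalone round trip using AESGCM from the cryptography package. The decrypt helper and the randomly generated key are assumptions for illustration and are not taken from the SDK.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-in for the HKDF-derived encryption key (assumption for this sketch)
key = AESGCM.generate_key(bit_length=256)


def encrypt(plaintext: str) -> tuple[bytes, bytes]:
    nonce = os.urandom(12)  # fresh 96-bit nonce for every call
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return ciphertext, nonce


def decrypt(ciphertext: bytes, nonce: bytes) -> str:
    # Raises InvalidTag if the ciphertext, nonce, or key does not match
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


message = "Today I decided to start a company."
ciphertext, nonce = encrypt(message)
roundtrip = decrypt(ciphertext, nonce)

# Flip one bit of the ciphertext: decryption should refuse it
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    decrypt(tampered, nonce)
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

Decryption either returns the exact original text or fails outright, so a modified ciphertext never yields silently corrupted plaintext.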

Signing: HMAC-SHA256
Every memory hash gets signed with HMAC-SHA256. This shows that someone holding the signing key produced the signature, not just that the hash exists. HMAC is symmetric, so checking a signature requires the same key that created it.
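As a rough illustration of this step, the standard library's hmac module can sign and check a hash. The key value and the sign and verify_signature helper names are hypothetical, and the SDK's own signing code may differ.

```python
import hashlib
import hmac

signing_key = b"\x01" * 32  # placeholder; in practice a key derived via HKDF


def sign(content_hash: str) -> str:
    """Return the hex HMAC-SHA256 of a content hash under the signing key."""
    return hmac.new(signing_key, content_hash.encode("ascii"), hashlib.sha256).hexdigest()


def verify_signature(content_hash: str, signature: str) -> bool:
    # compare_digest runs in constant time, which avoids timing side channels
    return hmac.compare_digest(sign(content_hash), signature)


content_hash = hashlib.sha256(b"Today I decided to start a company.").hexdigest()
signature = sign(content_hash)

ok = verify_signature(content_hash, signature)      # genuine signature
forged = verify_signature(content_hash, "0" * 64)   # made-up signature
```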
Batch verification: Merkle trees
If you have 1,000 memories, you don't want to anchor 1,000 hashes on-chain. Merkle trees let you compute a single root hash that verifies the entire batch:
                [Root Hash]            ← Anchor this ONE hash
                /         \
         [Hash AB]       [Hash CD]
          /     \         /     \
        [A]     [B]     [C]     [D]    ← Individual memories
One hash on-chain. Thousands of memories verified. Cost: one transaction.
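A minimal sketch of how such a root could be computed, assuming SHA-256 for internal nodes and duplicating the last node when a level has an odd count. That is one common convention, and the post does not say which one the SDK uses. The merkle_root name is hypothetical.

```python
import hashlib


def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Fold a list of leaf hashes pairwise until a single root remains."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last node with itself
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


memories = [b"A", b"B", b"C", b"D"]
leaves = [hashlib.sha256(m).digest() for m in memories]
root = merkle_root(leaves)

# Changing any single memory changes the root
changed_leaves = [hashlib.sha256(m).digest() for m in [b"A", b"B", b"X", b"D"]]
changed_root = merkle_root(changed_leaves)
```

Production designs often also prefix leaf and internal-node inputs with different bytes (as RFC 6962 does) so a leaf can never be confused with an internal node. This sketch leaves that out for brevity.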
Tamper detection in action
This is my favorite part. If someone changes even one character, the verification fails:
```python
memory = vault.create(content="I said this.")
vault.verify(memory)  # True

memory.content = "I NEVER said this."
vault.verify(memory)  # False: hash mismatch detected
```
The SHA-256 hash acts as a fingerprint. Any modification, no matter how small, produces a completely different hash. Combined with the HMAC signature, you get evidence of both content integrity and authorship by whoever holds the signing key.
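The fingerprint behaviour can be reproduced with hashlib alone. This sketch uses a hypothetical Record dataclass and verify function rather than the SDK's classes, and it checks only the content hash, not the HMAC signature.

```python
import hashlib
import hmac
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Hex SHA-256 of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Record:
    content: str
    hash: str


def verify(record: Record) -> bool:
    # Recompute the fingerprint and compare it with the stored one
    return hmac.compare_digest(fingerprint(record.content), record.hash)


record = Record(content="I said this.", hash=fingerprint("I said this."))
before = verify(record)   # content still matches the stored hash

record.content = "I NEVER said this."
after = verify(record)    # stored hash no longer matches

# Count how many of the 256 output bits differ between the two fingerprints
a = int(fingerprint("I said this."), 16)
b = int(fingerprint("I NEVER said this."), 16)
bits_changed = bin(a ^ b).count("1")
```

For a well-behaved hash function, bits_changed should land somewhere near 128, about half of the output.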
The .dvmem open format
I didn't want to lock anyone into a proprietary format. The .dvmem format is a documented JSON structure that any tool can read:
```json
{
  "dvmem_version": "1.0",
  "encoding": "utf-8",
  "payload": {
    "id": "550e8400-...",
    "content": "...",
    "hash": "a1b2c3d4...",
    "encrypted_content": "...",
    "signature": "...",
    "created_at": "2025-02-07T14:32:01+00:00",
    "metadata": {
      "tags": ["daily", "career"],
      "mood": "optimistic",
      "source": "manual"
    }
  }
}
```
Export your data anytime. No lock-in. If this project disappears tomorrow, your memories survive.
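Because the format is plain JSON, reading it needs nothing beyond the standard library. The sketch below parses a record shaped like the example above. The field values are the same placeholders, and the load_dvmem helper and its version check are hypothetical, not part of the SDK.

```python
import json
from datetime import datetime

# A record shaped like the documented example; elided values stay elided
raw = """
{
  "dvmem_version": "1.0",
  "encoding": "utf-8",
  "payload": {
    "id": "550e8400-...",
    "content": "...",
    "hash": "a1b2c3d4...",
    "encrypted_content": "...",
    "signature": "...",
    "created_at": "2025-02-07T14:32:01+00:00",
    "metadata": {"tags": ["daily", "career"], "mood": "optimistic", "source": "manual"}
  }
}
"""


def load_dvmem(text: str) -> dict:
    """Parse a .dvmem document and return its payload."""
    doc = json.loads(text)
    version = doc.get("dvmem_version")
    if version != "1.0":
        raise ValueError(f"unsupported dvmem_version: {version!r}")
    return doc["payload"]


payload = load_dvmem(raw)
created_at = datetime.fromisoformat(payload["created_at"])  # timezone-aware
tags = payload["metadata"]["tags"]
```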
What I learned shipping my first open source project
A few things surprised me:
The README matters more than the code. I spent as much time on the README as on the SDK itself. If someone can't understand your project in 30 seconds, they leave.
pip install has to work. Sounds obvious, but I almost launched without publishing to PyPI. A developer who can't install your package in 5 seconds will never try it.
CI signals legitimacy. Adding GitHub Actions with a green badge took 10 minutes but immediately made the project look more real.
Start small. The SDK does one thing: hash, encrypt, verify, store. I had grand visions of AI agents, blockchain anchoring, and a mobile SDK. All of that is on the roadmap, but none of it is in v0.1. Ship the core, see if anyone cares, then build what people ask for.
What's next
The roadmap is public, but honestly it depends on what the community wants:

AI capture agents (v0.2)
Arweave and Ethereum L2 anchoring (v0.3)
Photo and voice capture (v0.4)
Dead man's switch for digital legacy (v0.5)
Personal AI training export (v0.6)

Try it
```bash
pip install diaryvault-memory
```

GitHub: github.com/DiaryVault/diaryvault-memory-layer
Landing page: memory.diaryvault.com
PyPI: pypi.org/project/diaryvault-memory

MIT licensed. 28 tests passing. No VC. No tokens. Just a thing I think should exist.
Stars and feedback welcome. Especially on the crypto — I'd love eyes from security folks on the implementation.

I'm Stephen — I build AI products including DiaryVault and Crene. This is my first open source project. You can find me on Twitter and GitHub.

Top comments (18)

Mykola Kondratiuk • Feb 16

The black box analogy is perfect. I built TellMeMo (voice memo app) and hit the same trust problem, like how do users know their past recordings weren't modified? Your crypto stack is solid, HKDF is the right call over naive key derivation. Curious about the UX challenge though. How do you make verification intuitive for non-technical users? Most people won't care about SHA-256 hashes unless there's a dispute, and by then they need verification to be dead simple. What's your plan for that?

DiaryVault • Feb 21

Hey Mykola, thanks for the thoughtful comment and glad the analogy resonated. TellMeMo sounds like it runs into the exact same core problem so you get it.
You're spot on about the UX piece. Nobody is going to look at a SHA-256 hash and feel reassured. Right now the open source SDK handles all the crypto under the hood. You write, it hashes, encrypts with AES-256-GCM, signs with HMAC, and stores. The verification is a simple vault.verify(memory) call that returns true or false based on whether anything was tampered with. So the building blocks are there but it's very much a developer tool at this stage.
The plan for DiaryVault the app (which sits on top of this library) is to make all of that invisible during normal use. You journal, it does the crypto in the background, you never think about it. Verification only surfaces when you need it and when it does we want it to feel like opening a sealed envelope rather than reading a hex string. Think a simple "this entry is verified and unmodified since March 5th" with a visual indicator, and then if someone wants to go deeper they can expand into the full proof chain. Kind of like how Signal shows you the safety number but doesn't force you to verify it unless you want to.
The real unlock honestly is that most people don't need to understand the crypto, they just need to trust that it's there and that someone technical could verify it if it ever mattered. That's why we open sourced the layer. The trust isn't in us, it's in the code being auditable.
Would love to hear how you're thinking about the integrity side for voice recordings at TellMeMo. Audio has its own set of challenges there especially with AI generated voice being so good now. Always down to compare notes.

Mykola Kondratiuk • Mar 19

the open source SDK approach is smart - at least power users can verify. what we found with TellMeMo is that 90% of users never look at that, but knowing it exists changes how they talk about the app to others. "it's open source, you can check" is weirdly effective word-of-mouth even when nobody actually checks

DiaryVault • Mar 19

That's a great point actually. We see the exact same thing. Nobody has audited our crypto module yet but just having it on GitHub changes how people talk about DiaryVault to their friends. "It's open source" carries way more weight than "they say it's encrypted" even when nobody actually reads the code.

It's like a trust shortcut. People don't need to verify it themselves, they just need to know someone could.

How are you handling the integrity side for voice at TellMeMo? Text is straightforward to hash but audio feels trickier, especially now that AI generated voice is getting so good. Would love to hear your approach.

Mykola Kondratiuk • Mar 19

yeah the "open source" signal is doing so much work even when nobody actually reads the code. it is a trust shortcut basically. though i keep thinking - one day someone will read it and you need to be ready for that. having it auditable from day 1 is way less stressful than scrambling to clean it up after the fact.

DiaryVault • Mar 19

Exactly. "Open source" is a trust shortcut until someone actually opens the source. That's why the cryptographic layer in diaryvault-memory is designed to be auditable from the start. Every vault operation and encryption step is right there. Building in the open keeps you honest in a way that nothing else really does.

Mykola Kondratiuk • Mar 19

the "building in the open keeps you honest" part hits - I had a similar experience where I kept telling myself I would clean up the auth logic before publishing, then actually opened the code publicly and fixed three things within a day that had been sitting there for weeks. external eyes, even potential ones, do something to your internal standards

DiaryVault • Mar 19

Ha, that's so real. There's something about knowing the code is public that rewires your brain. You stop thinking "I'll fix it later" and start thinking "someone might read this today." It's like the best accountability partner you never asked for.

Mykola Kondratiuk • Mar 19

Exactly this. Public code just hits different. The "someone might read this" feeling is genuinely better than any code review process I've been through.

Ruqiya Arshad • Feb 15

Hi there, I worked on your code. The problem that occurred was that I could not access blockchain platforms like Arweave. But the code executed perfectly locally.
However, I have a question about the term "encryption" as we use it today. Is our data actually encrypted regardless of whether it is sensitive or not?

DiaryVault • Feb 15

Hey Ruqiya, glad the code ran well locally!

On the Arweave side, that's totally expected. The blockchain anchoring is optional and you need an Arweave wallet with some AR tokens to use it. The core SDK works completely on its own without any blockchain connection. Think of the anchoring as an extra layer of proof you can add later if you want, but the encryption and hashing all work locally without it.

To your encryption question, yes everything is actually encrypted. The SDK doesn't pick and choose what to encrypt based on how sensitive it is. Every single memory record goes through AES-256-GCM encryption no matter what. Even if you write something as simple as "I had coffee today" it gets the same military grade encryption as anything else. The philosophy is that you shouldn't have to decide what deserves protection. All of it does.

Your encryption key is derived using HKDF which is the same key derivation method used in TLS 1.3 and Signal messenger. So the encryption isn't just a checkbox feature, it's real and it's the same standard that secures billions of messages every day.

Really appreciate you actually running the code and testing it out. That means a lot. Let me know if you run into anything else!

Ruqiya Arshad • Feb 16

Sure, I will try to understand your work, because I need to understand how encryption actually works.

DiaryVault • Feb 16

That's awesome, happy to help you understand it. Here's the best way to go through the code so it makes sense step by step.

Start with the MemoryVault class. That's the main entry point and it's where everything begins. When you create a vault with an encryption key, it sets up all the crypto tools internally.

Then look at the create method. When you call vault.create(content="something"), three things happen in order. First it hashes your content with SHA-256 which gives you a unique fingerprint. Then it encrypts the content using AES-256-GCM so nobody can read it without your key. Then it signs everything with HMAC so you can prove it hasn't been tampered with.

The key thing to understand is that your encryption key never gets used directly. It goes through HKDF first which derives separate keys for encryption and signing. That way even if one key is somehow compromised the other operations stay safe.

If you want to see encryption in action, try running the tests. They show exactly what goes in and what comes out at each step. The test file is a great way to learn how each piece works.

Let me know which part you want to dig into first and I can point you to the exact lines of code!

Ruqiya Arshad • Feb 16

Sure, I will let you know this week.

Ruqiya Arshad • Feb 8

As far as I understood from your nice article, DiaryVault is not a built-in Python object, but rather something you created. Right? Cryptographically, how can I rely on it for my life record or data in your program? Are there going to be any security issues?

DiaryVault • Feb 8

Thanks Ruqiya! Great questions.
Yep, DiaryVault Memory Layer is something I built and open sourced. You install it with pip install diaryvault-memory.
On the trust side, this is exactly why I made it open source. You don't have to trust me at all. The entire codebase is on GitHub for anyone to read and audit. And I'm not rolling my own crypto. Everything uses battle tested standards from Python's cryptography library. AES-256-GCM for encryption, HKDF for key derivation (the same thing TLS 1.3 and Signal use), and SHA-256 for hashing.
The most important design decision is that encryption happens entirely on your device. Your key never leaves your machine. Even if someone got hold of the encrypted files, they'd be useless without your key.
That said, it's v0.1 and I would genuinely love for security minded developers to tear apart the implementation. That's honestly one of the main reasons I open sourced it. The crypto module is about 250 lines if you want to take a look.
Really appreciate you asking this. It's the exact right question to ask about any security tool.

Ruqiya Arshad • Feb 9

This week I will try to implement this model of yours.

DiaryVault • Feb 9

That's awesome, let me know how it goes! If you run into anything confusing or hit any bugs, open an issue on GitHub and I'll get back to you fast. Would love to hear what you think.
