# I Built a Database That Works Like Human Memory — No SQLite, No ORM, Zero External Dependencies

By AATEL

## TL;DR

- Append-only binary database engine written from scratch in Python
- No SQLite, no ORM, no external dependencies
- Each memory has a concept + one of 15 emotions + optional media
- LLM cognitive layer: perceive (extract structure from raw text), ask (RAG over your memories), reflect (emotional arc), dream (free association), introspect (psychological portrait)
- Provider-agnostic: any LLM via .env file, including LM Studio/Ollama locally
- Filesystem-aware media storage: hard links on ext4/NFS/NTFS, reflinks on btrfs/APFS, atomic copy on FAT32
- AATEL license, Python 3.12+

There's a question I couldn't stop thinking about: what if a database refused to let you update or delete anything?

Not as a technical limitation. Not even as a reliability feature, like event sourcing. As a semantic choice — because some data should be immutable by nature.

Human memory is the obvious model. You can't UPDATE what you experienced. You can't DELETE a memory. Every experience accumulates, layered over time, each one carrying its own emotional weight. The same concept — "Debt", "Family", "Work" — means something completely different at 25 versus at 45. That difference, that arc through time, is the data. It's not something to normalize away.

So I built MNHEME: a database engine that enforces this constraint at the lowest level, with an LLM layer on top that understands memory the way we actually think about it.

Here's how it works.


## The Constraint: True Immutability

Most databases that claim "immutability" actually mean "audit log plus current state." You can still mutate the current state — you just record the history.

MNHEME is different. There is no current state to mutate. There's only the log.

The Memory dataclass is frozen=True:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Memory:
    memory_id  : str
    concept    : str      # "Debt", "Family", "Travel"
    feeling    : str      # one of 15 defined emotions
    media_type : str      # TEXT, IMAGE, VIDEO, AUDIO, DOC
    content    : str
    note       : str
    tags       : tuple[str, ...]
    timestamp  : str
    checksum   : str      # SHA-256 of content
```

frozen=True means Python will raise FrozenInstanceError if you try to modify any field after creation. And the MemoryDB class has no update() method, no delete() method — not just "not implemented," literally absent from the codebase.
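
The effect is easy to demonstrate. A minimal sketch with trimmed fields (not MNHEME's full class):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Memory:
    memory_id: str
    concept: str
    feeling: str

m = Memory("m-001", "Debt", "fear")
try:
    m.feeling = "relief"  # frozen=True: the generated __setattr__ raises
except FrozenInstanceError as e:
    print(type(e).__name__)  # → FrozenInstanceError
```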


## The Storage Engine: Binary Log From Scratch

I didn't want SQLite because SQLite is fundamentally a mutable store. I wanted to build the append-only constraint into the file format itself.

The .mnheme file is a binary log of records:

```
┌──────────────┬──────────┬───────────────────┐
│  MAGIC (4B)  │ SIZE (4B)│  PAYLOAD (N bytes)│
└──────────────┴──────────┴───────────────────┘

MAGIC   = [0x4D, 0x4E, 0x45, 0xE0]  — record signature
SIZE    = uint32 big-endian          — payload length
PAYLOAD = JSON UTF-8                 — the memory data
```

Every write:

  1. Serializes the record to JSON
  2. Prepends the MAGIC + SIZE header
  3. Appends the entire frame in a single write() call
  4. Calls os.fsync() before returning

This means every write is crash-safe. If the process dies mid-write, the final frame is incomplete — detected on the next startup scan when the payload comes up shorter than SIZE or the MAGIC bytes don't match. Incomplete trailing records are silently skipped.
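
The write path above can be sketched like this (function names are illustrative, not MNHEME's actual API):

```python
import json
import os
import struct
import tempfile

MAGIC = bytes([0x4D, 0x4E, 0x45, 0xE0])  # record signature

def append_record(fd: int, payload: dict) -> None:
    """Frame a record as MAGIC + SIZE (uint32 big-endian) + JSON and append it."""
    body = json.dumps(payload).encode("utf-8")
    frame = MAGIC + struct.pack(">I", len(body)) + body
    os.write(fd, frame)  # the entire frame in a single write() call
    os.fsync(fd)         # durable on disk before we return

# Usage: append one memory to a fresh log file.
log_path = os.path.join(tempfile.mkdtemp(), "demo.mnheme")
fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
append_record(fd, {"concept": "Debt", "feeling": "fear"})
os.close(fd)
```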

### Indexes in RAM

On startup, the file is scanned once. For each record, we store its byte offset in several dictionaries:

```
concept_index : { "Debt": [offset1, offset2, ...] }
feeling_index : { "fear": [offset1, offset3, ...] }
tag_index     : { "bank": [offset1, ...] }
```

When you call recall("Debt"), the index returns the offsets, and read_at(offset) seeks directly to each record — reading only the bytes you need. count() never touches the file at all.

The result: count() runs at 2.7 million ops/second from RAM. recall(concept, limit=10) reads exactly 10 records, taking about 1.5ms.
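
A sketch of the startup scan and the offset-based read, assuming the frame layout above (names are illustrative):

```python
import json
import struct

MAGIC = bytes([0x4D, 0x4E, 0x45, 0xE0])
HEADER = 8  # 4 magic bytes + 4 size bytes

def build_concept_index(path: str) -> dict:
    """One pass over the log, mapping concept -> list of byte offsets."""
    index: dict = {}
    with open(path, "rb") as f:
        offset = 0
        while True:
            header = f.read(HEADER)
            if len(header) < HEADER or header[:4] != MAGIC:
                break  # EOF or truncated tail: stop scanning
            (size,) = struct.unpack(">I", header[4:])
            payload = f.read(size)
            if len(payload) < size:
                break  # incomplete final record: skip it
            record = json.loads(payload)
            index.setdefault(record["concept"], []).append(offset)
            offset += HEADER + size
    return index

def read_at(path: str, offset: int) -> dict:
    """Seek straight to one record, reading only the bytes it occupies."""
    with open(path, "rb") as f:
        f.seek(offset)
        header = f.read(HEADER)
        (size,) = struct.unpack(">I", header[4:])
        return json.loads(f.read(size))
```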


## The Filesystem Layer: Inode-Aware Media Storage

For attachments (images, audio, video, documents), I wanted deduplication without copying files.

The FsProbe class identifies the filesystem and probes its actual capabilities at boot — not by trusting the filesystem name, but by actually trying os.link(), ioctl(FICLONE), and os.symlink() in the target directory:

```python
probe = FsProbe("/data/mnheme_files")
caps  = probe.detect()
# caps.can_hardlink → True (verified by actually creating a hard link)
# caps.can_reflink  → False (ioctl FICLONE returned EOPNOTSUPP)
# caps.strategy     → LinkStrategy.HARDLINK
```
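
The hard-link half of that probe fits in a few lines. A sketch (the FICLONE ioctl probe is Linux-specific and omitted; names are illustrative):

```python
import os
import tempfile

def can_hardlink(directory: str) -> bool:
    """Probe by doing: actually create and remove a hard link in `directory`."""
    probe = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    probe.close()
    link = probe.name + ".lnk"
    try:
        os.link(probe.name, link)  # the real test, no trusting filesystem names
        os.unlink(link)
        return True
    except OSError:
        return False  # e.g. FAT32, or a FUSE mount without link support
    finally:
        os.unlink(probe.name)
```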

The strategy chosen:

| Filesystem | Strategy | Bytes written |
|---|---|---|
| ext4, ZFS, NFS | Hard link | 0 (same inode) |
| btrfs, xfs+reflink | Reflink (CoW) | 0 (shared blocks) |
| NTFS | Hard link | 0 (same inode) |
| FAT32, HDFS | Atomic copy | full file size |

For deduplication, the pool is content-addressed by SHA-256. The same image attached to 100 different memories is one physical file, 100 hard links, one inode — and the inode's nlink counter shows exactly how many memories reference it.
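
A sketch of that pool, assuming a filesystem where the probe chose hard links (paths and function names are illustrative, not MNHEME's API):

```python
import hashlib
import os

def attach(pool_dir: str, memory_dir: str, src: str) -> str:
    """Content-address src by SHA-256, then hard-link it into the memory's dir.

    Assumes src, pool, and memory dirs live on the same hardlink-capable fs.
    """
    with open(src, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    pooled = os.path.join(pool_dir, digest)
    if not os.path.exists(pooled):
        os.link(src, pooled)   # first sighting: this inode joins the pool
    dest = os.path.join(memory_dir, os.path.basename(src))
    os.link(pooled, dest)      # zero bytes copied; nlink += 1
    return pooled
```

After attaching the same file to N memories, `os.stat(pooled).st_nlink` reports every reference.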


## The LLM Layer: A Brain for the Database

This is where it gets interesting. The LLM isn't the primary interface — it's a semantic processing layer that understands memory as humans experience it.

### perceive() — raw input to structured memory

```python
r = brain.perceive("I opened the letter from the bank. My hands were shaking.")

# The LLM extracted:
r.extracted_concept  # "Debt"
r.extracted_feeling  # "fear"
r.extracted_tags     # ["bank", "body", "anxiety"]
r.enriched_content   # psychologically enriched version of the text

# The Memory is already saved in MemoryDB — immutable.
```

### ask() — RAG over personal memory

The LLM first extracts keywords and concepts from the question, retrieves relevant memories from the database, then answers using only those memories as context. If the memories don't contain the answer, it says so.

```python
ans = brain.ask("How do I feel about money?")
# Searches memories tagged Debt, Finance, etc.
# Answers from what's actually stored, not from training data
print(ans.confidence_note)  # "Certainty: high — direct evidence from memories"
```
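
The retrieve-then-answer shape can be sketched without an LLM in the loop. Here a naive keyword overlap stands in for the model's keyword extraction (purely illustrative, not MNHEME's actual scoring):

```python
def retrieve(memories: list, keywords: set, limit: int = 5) -> list:
    """Rank memories by keyword overlap with their concept and tags."""
    def score(m):
        terms = {m["concept"].lower(), *(t.lower() for t in m["tags"])}
        return len(terms & keywords)
    hits = [m for m in memories if score(m) > 0]
    return sorted(hits, key=score, reverse=True)[:limit]

def build_prompt(question: str, context: list) -> str:
    """Constrain the model to stored memories and tell it to admit gaps."""
    lines = "\n".join(f"- [{m['feeling']}] {m['content']}" for m in context)
    return (f"Answer ONLY from these memories:\n{lines}\n\n"
            f"Question: {question}\n"
            "If the memories do not contain the answer, say so.")
```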

### reflect() — emotional arc analysis

```python
ref = brain.reflect("Debt")
# Feeds all "Debt" memories in chronological order to the LLM
# Gets back an analysis of the emotional journey
print(ref.arc)  # "from visceral dread to earned serenity"
```

### dream() — free association across distant memories

Samples memories from different emotional states, asks the LLM to find unexpected connections. Loosely inspired by memory consolidation during sleep.
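
One way to sketch that sampling: pick one memory per emotional bucket, so the prompt juxtaposes maximally distant states (illustrative, not the actual algorithm):

```python
import random

def sample_for_dream(memories: list, seed=None) -> list:
    """One memory per feeling: distant emotional states land in one prompt."""
    buckets: dict = {}
    for m in memories:
        buckets.setdefault(m["feeling"], []).append(m)
    rng = random.Random(seed)
    return [rng.choice(bucket) for bucket in buckets.values()]
```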

### introspect() — psychological portrait

Feeds the full distribution of concepts and feelings, plus recent memories, and asks for a psychological portrait: dominant patterns, unresolved tensions, emotional resources.
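
The distribution half of that input is a one-liner over the stored records. A sketch of the raw material that goes into the portrait prompt (assumed field names):

```python
from collections import Counter

def feeling_distribution(memories: list) -> Counter:
    """Frequency of each emotion across all stored memories."""
    return Counter(m["feeling"] for m in memories)

def concept_distribution(memories: list) -> Counter:
    """Frequency of each concept across all stored memories."""
    return Counter(m["concept"] for m in memories)
```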


## The Provider System: Truly Vendor-Agnostic

I wanted the LLM layer to work with any provider without changing code. The solution: a .env file and pure urllib.

```
# Local — no API key
LM_STUDIO_URL=http://localhost:1234/v1/chat/completions
LM_STUDIO_MODEL=local-model
LM_STUDIO_RPM=60

# Cloud
GROQ_API_KEY=gsk_...
ANTHROPIC_API_KEY=sk-ant-...

USE_MULTI_PROVIDER=true   # cascade fallback if primary fails
```

Any pair of variables ending in _URL and _MODEL activates a provider. Anthropic is the only special case — detected by URL pattern, it uses the native Anthropic message format. Everything else speaks the OpenAI-compatible chat completions API.

Rate limiting is per-provider (token bucket). Retry uses exponential backoff on 429 and 5xx. With USE_MULTI_PROVIDER=true, if one provider fails, the next in priority order is tried automatically.
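
A per-provider token bucket is a few lines of state. A minimal sketch (the backoff wrapper is omitted; not MNHEME's exact implementation):

```python
import time

class TokenBucket:
    """Allow `rpm` requests per minute; acquire() blocks once the bucket drains."""

    def __init__(self, rpm: int):
        self.capacity = float(rpm)
        self.tokens = float(rpm)
        self.rate = rpm / 60.0  # tokens refilled per second
        self.last = time.monotonic()

    def acquire(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1.0:
            time.sleep((1.0 - self.tokens) / self.rate)  # wait for one token
            self.tokens = 1.0
        self.tokens -= 1.0
```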

No SDK. No pip install anthropic. Just HTTP.


## Benchmark Results

2,000 records, Python 3.12, 9p filesystem:

```
remember() with fsync:      1.8ms    552 ops/s
remember() without fsync:   0.2ms  4,632 ops/s   (8.4× faster)
count() — pure RAM:         ~0ms   2,774,322 ops/s
feeling_distribution():     0.003ms  277,865 ops/s
recall(concept, limit=10):  1.5ms    636 ops/s
search() full-text:         40ms      24 ops/s    (~49k records/s)
search(limit=5):            0.1ms   8,348 ops/s   (stops at 5th match)
Cold start (2k records):    40ms     —            (49k rec/s indexed)
```

File size: ~374 bytes/record → ~36MB for 100k records, ~357MB for 1M.
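
Those projections follow directly from the per-record average (reading MB as MiB):

```python
BYTES_PER_RECORD = 374  # average from the 2,000-record benchmark

def log_size_mib(records: int) -> float:
    """Projected .mnheme file size in MiB."""
    return records * BYTES_PER_RECORD / (1024 ** 2)

print(round(log_size_mib(100_000)))    # → 36
print(round(log_size_mib(1_000_000)))  # → 357
```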


## The Deeper Question

Is "append-only as a semantic constraint" useful beyond memory systems?

Most phenomena we model are actually immutable events that we artificially collapse into mutable state. A bank transaction doesn't change — we just keep running totals. A sensor reading doesn't update — we just display the latest one. User behavior doesn't mutate — we summarize it.

I wonder how many data models would be simpler if they started append-only and added mutability only where genuinely needed, rather than starting with full mutability and then trying to add audit trails, history, and immutability as afterthoughts.

MNHEME is one data point in that experiment.

GitHub: https://github.com/aatel-license/mnheme
Python 3.12+. AATEL License. Zero external dependencies.

