DEV Community

Radoslav Tsvetkov

I built a memory layer for AI assistants that refuses to fake citations

A few months ago I was using my AI assistant to dig through my Obsidian vault. I asked it about a decision I had made on a side project, and it answered with confidence. The answer cited two notes. One of them existed but did not say what the model claimed. The other did not exist at all.

I kept staring at the response for a minute, because it sounded exactly right. It matched what I half-remembered. If I had not gone to check the source, I would have used that answer in a real conversation with someone.

That is the problem I want to tell you about, and the one I have spent the last few months trying to fix.

Why "chat with your notes" tools quietly lie to you

The standard recipe for personal RAG looks like this:

  1. Take your markdown vault.
  2. Split it into chunks.
  3. Index those chunks with keyword search and embeddings.
  4. At query time, retrieve the top few chunks.
  5. Hand them to a language model with an instruction like "answer using only the context, and cite your sources."
  6. Hope.
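Stripped to its bones, the recipe above is just ranked retrieval plus a prompt template. A minimal sketch, with naive keyword-overlap scoring standing in for the real keyword + embedding search (all names here are illustrative, not any tool's actual API):

```rust
use std::collections::HashSet;

/// Toy relevance score: how many query words appear in the chunk.
fn overlap_score(query: &str, chunk: &str) -> usize {
    let q: HashSet<&str> = query.split_whitespace().collect();
    chunk.split_whitespace().filter(|w| q.contains(w)).count()
}

/// Retrieve the top-k chunks and wrap them in the usual
/// "answer using only the context" prompt. Nothing here forces
/// the model's eventual citations to be real.
fn build_prompt(query: &str, chunks: &[&str], k: usize) -> String {
    let mut scored: Vec<(usize, &str)> = chunks
        .iter()
        .map(|c| (overlap_score(query, c), *c))
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0)); // best chunks first
    let context: Vec<&str> = scored.iter().take(k).map(|(_, c)| *c).collect();
    format!(
        "Answer using only the context, and cite your sources.\nContext:\n{}\nQuestion: {}",
        context.join("\n"),
        query
    )
}

fn main() {
    let chunks = ["rust is fast", "cats sleep a lot", "rust has a borrow checker"];
    println!("{}", build_prompt("why rust", &chunks, 2));
}
```

The important part is what is missing: the prompt asks for citations, but nothing downstream ever checks them.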

Step six is where things go sideways. The model is not actually obligated to cite anything correctly. It is just being asked nicely. When it gets confused, it picks a note title that sounds related and uses it as a citation. When it knows there is a related concept in the vault but cannot find a clean quote, it paraphrases, then attributes that paraphrase to a source that exists nowhere in your notes.

For chatting with Wikipedia, this is fine. For your own decisions, meeting notes, and life events, it is not. The whole point of writing things down was to have a single source of truth. If your AI layer can invent quotes from notes that do not exist, you have lost the property that made the notes useful in the first place.

The shift: claims, not chunks

Memora's central idea is small but, once you accept it, kind of irreversible.

Stop treating the chunk of text as the unit of memory. Treat the claim as the unit of memory.

A claim is a small structured fact. It has a subject, a predicate, and an object. It is extracted from a specific note, and it carries the exact byte range of the source text it came from, plus a blake3 hash of that source. It also carries a validity window (when the claim started being true and, optionally, when it stopped) and a privacy band.
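As a sketch, a claim like that could be modeled as a plain struct. The field names and the privacy band variants below are mine, not Memora's actual types:

```rust
/// Illustrative sketch of a claim record (names are hypothetical).
#[derive(Debug, Clone)]
struct Claim {
    id: u64,
    subject: String,
    predicate: String,
    object: String,
    source_note: String,         // path of the note the claim came from
    span: (usize, usize),        // exact byte range of the source text
    span_hash: String,           // blake3 hash of that span, stored at extraction
    valid_from: Option<String>,  // when the claim started being true
    valid_until: Option<String>, // when it stopped, if it has
    privacy_band: PrivacyBand,
}

/// Hypothetical privacy bands; the post only says a band exists.
#[derive(Debug, Clone, PartialEq)]
enum PrivacyBand {
    Public,
    Personal,
    Sensitive,
}
```

The span and hash are what make the claim checkable later: they pin the fact to one exact region of one file on disk.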

When you index your vault, Memora calls a model once per note to extract these claims. They go into a SQLite database with edges between them, like "supersedes", "contradicts", and "derives_from".
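Those edge labels map naturally onto a small enum. Here is a toy in-memory stand-in for the SQLite edge table (again, names are illustrative):

```rust
use std::collections::HashMap;

/// The edge labels the post mentions, as a Rust enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Edge {
    Supersedes,
    Contradicts,
    DerivesFrom,
}

/// Minimal in-memory stand-in for an edge table:
/// (from_claim_id, edge kind) -> list of target claim ids.
type EdgeIndex = HashMap<(u64, Edge), Vec<u64>>;

/// Record that claim `from` relates to claim `to` via `edge`.
fn add_edge(idx: &mut EdgeIndex, from: u64, edge: Edge, to: u64) {
    idx.entry((from, edge)).or_default().push(to);
}
```

In the real system this is a SQLite table rather than a HashMap, but the shape of the graph is the same: claims are nodes, and "supersedes" edges are what let newer facts shadow older ones.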

When you query, you do not get raw chunks back. You get claims. Each claim has an ID. The answering model is asked to cite specific claim IDs in its response.

Then comes the part that, in my opinion, makes the whole thing actually useful.

Validation before you ever see the answer

Before any answer reaches the user, Memora does the following:

For each cited claim ID, it looks up the byte range and the source note, re-reads the exact span from your markdown on disk, and recomputes the blake3 hash. If the hash matches the one stored at extraction time, the citation stands. If it does not, the citation is stripped. If the model invented an ID that does not exist in the database, that one is stripped too.
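That check is mechanical enough to sketch. In the sketch below, std's `DefaultHasher` stands in for blake3 (which lives in an external crate), and the function names are mine:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the blake3 hash used in the real system.
fn span_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// What the database stored at extraction time for one citation.
struct StoredClaim {
    byte_range: (usize, usize),
    stored_hash: u64,
}

/// Re-read the exact span from the note (here: a byte slice rather than a
/// file on disk), recompute the hash, and compare. The citation stands only
/// if the hashes match; a missing or shifted span fails the check.
fn citation_is_valid(note_bytes: &[u8], claim: &StoredClaim) -> bool {
    let (start, end) = claim.byte_range;
    match note_bytes.get(start..end) {
        Some(span) => span_hash(span) == claim.stored_hash,
        None => false, // span no longer exists in the note
    }
}
```

The point of hashing the span rather than trusting the byte range alone is that an edited note fails the check even when the offsets still happen to be in bounds.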

If the answer ends up with no valid citations left, the model is re-prompted, this time with only the claims that survived as context. The system enforces the citation contract through Rust types and span hashes, not through prompt obedience.
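The surviving-citations step might look roughly like this, where the `Err` branch stands in for the re-prompt path (illustrative names, not Memora's):

```rust
use std::collections::HashSet;

/// Keep only the cited IDs that exist in the database and passed span
/// verification. If nothing survives, signal that the answer must be
/// regenerated from the verified claims alone.
fn enforce_citations(
    cited: &[u64],
    verified: &HashSet<u64>,
) -> Result<Vec<u64>, &'static str> {
    let surviving: Vec<u64> = cited
        .iter()
        .copied()
        .filter(|id| verified.contains(id))
        .collect();
    if surviving.is_empty() {
        Err("no valid citations: re-prompt with verified claims only")
    } else {
        Ok(surviving)
    }
}
```

Invented IDs and stale spans fall out in the filter; the type forces the caller to handle the empty case instead of shipping an uncited answer.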

This is a different trust model from "ask the model to be careful". It does not assume good behavior. It checks.

The boring engineering decisions

I want to talk about a few choices that look small but pay off every single day.

Rust as the language. I wanted a single binary you could drop on a machine and forget. No Python environment, no node_modules, no Docker required. Cargo install or download a release. The type system also makes the citation contract enforceable at compile time, which matters when you are trying to make a guarantee like "no unverified citation will ever leave this function".

SQLite + HNSW for storage. Personal vaults are not at internet scale. A few thousand notes is the realistic upper bound for a long time. SQLite handles claims and edges fine. HNSW handles vector search fine. There is no Kafka, no vector DB service, no infrastructure for you to run.

Obsidian as the substrate. I did not want to invent another note-taking app. The vault stays in plain markdown. You can edit in Obsidian, in vim, in TextEdit. Memora watches for changes and re-extracts claims from changed notes. The notes are still yours, in a format that will outlive the project.

MCP for integration. Instead of building a chat UI, Memora exposes its tools over the Model Context Protocol. That means it works inside Claude Desktop, Cursor, and anything else that speaks MCP. You bring the chat interface you already use.

Things that took longer than I expected

Two surprises worth mentioning.

The first one is that "atomic claims" is harder than it sounds. Early on I had the extractor pulling things like "the project was good", which is technically a claim but completely useless. The current extraction prompt has been through many revisions and is paired with deduplication, normalization, and a gate that filters single-claim noise out of the active challenger output. There is still room to improve. If you have ideas, I would love to hear them.

The second one is that local LLMs are not yet good enough at extraction. Qwen 14B hallucinates relationships. Qwen 32B is acceptable but misses cross-region patterns. Llama 70B can match Claude Haiku quality but at significant memory cost. So the recommended setup right now is Claude Haiku for extraction (about $0.30 to index a 100-note vault) with a local model for embeddings. The fully local path works, but I want to be honest that it is not at production quality yet.

Where the project is now

It is at v0.1.27. It is open source under Apache 2.0. It indexes 100-note vaults in 5 to 10 minutes with Claude Haiku. The active challenger surfaces decisions, contradictions, stale dependencies, and open questions in a generated atlas page in your vault, so you can keep an eye on the state of your own knowledge over time.

If you live in your notes, if you have ever asked an AI tool a question about your own writing and gotten back something that was not actually there, please try it.

Repo: https://github.com/radotsvetkov/memora
Architecture demo: https://radotsvetkov.github.io/memora

I am especially interested in feedback from people with messy vaults, weird folder layouts, and strong opinions about retrieval. Issues, edge cases, and design discussions are welcome on GitHub.

If you read this far, thank you. I would much rather hear that I am wrong about something specific than that this looks neat.
