The .txt File as the Soul of a Personal AI — FileRAG Memory Architecture

By Dharanidharan J (JD)

Full Stack & AI Engineer | Building Jarvix


The Problem Nobody Talks About

Every chatbot tutorial teaches you the same thing:

history = []
history.append({"role": "user", "content": message})

And that works — until it doesn't.

After 500 turns, the facts that say who the user is are buried under hundreds of later messages. After 1000 turns, you're hitting token limits. After a restart, everything is gone. Redis adds persistence but still buries early facts under noise. Vector DBs improve retrieval but bloat storage and need extra infrastructure.

What if the memory itself was just a file?


The Idea

Every conversation a user has gets distilled into a plain .txt file. That file is the brain. On every new query, a hybrid BM25 + semantic RAG retrieves the most relevant chunks from it and injects them as context.

users/
└── jd.txt        ← the soul file

The soul file looks like this:

[Turns 1-5]
- User's name is JD, software engineer
- Building FileRAG, a novel memory architecture
- Uses Pop!_OS with Fish shell and NVIDIA GPU

[Turns 6-10]
- Has a cat named Pixel who distracts during coding
- Paused TaskNest due to burnout
- Now focused on AgenticMesh

Human readable. Editable. Yours.
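
Because the format is plain text, reading it back takes only a few lines. Here is a minimal sketch of a parser, assuming the file always follows the `[Turns a-b]` header plus `- fact` bullet layout shown above. The names `Chunk` and `parse_soul_file` are hypothetical and not taken from the FileRAG repository.

```python
import re
from dataclasses import dataclass, field

# Header format copied from the example soul file: "[Turns 1-5]"
HEADER = re.compile(r"^\[Turns (\d+)-(\d+)\]$")


@dataclass
class Chunk:
    """One distilled block of the soul file (hypothetical structure)."""
    first_turn: int
    last_turn: int
    facts: list = field(default_factory=list)


def parse_soul_file(text: str) -> list:
    """Split soul-file text into one Chunk per [Turns a-b] block."""
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A header line opens a new chunk.
            chunks.append(Chunk(int(match.group(1)), int(match.group(2))))
        elif line.startswith("- ") and chunks:
            # A bullet line belongs to the most recently opened chunk.
            chunks[-1].facts.append(line[2:])
    return chunks


SAMPLE = """[Turns 1-5]
- User's name is JD, software engineer
- Building FileRAG, a novel memory architecture

[Turns 6-10]
- Has a cat named Pixel who distracts during coding
"""

chunks = parse_soul_file(SAMPLE)
```

Each `Chunk` is then a natural unit to index for retrieval, and a hand-edited file parses the same way as a generated one.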


Why This Is Different

Most memory systems store messages. FileRAG stores a relationship.

| System | What it stores |
|---|---|
| Dict / Redis | Raw message objects |
| Vector DB | Embeddings of messages |
| FileRAG | Distilled understanding of the user |

The longer you use it, the more the AI understands you — not because it has more messages, but because it has a better summary of who you are.


The Architecture

User message
     ↓
Topic drift check (cosine similarity)
     ├── Drift detected → distill current buffer immediately
     └── No drift → continue
     ↓
Hybrid retrieval (BM25 + ChromaDB) from soul file
     ↓
Inject context → LLM responds
     ↓
Append to turn buffer
     ↓
Every 5 turns → distill → append to soul file → update ChromaDB
     ↓
Emergency distillation on exit (SIGINT/SIGTERM)
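
The buffer-and-distill part of that flow can be sketched roughly as follows. This is an illustration under assumptions, not the code from the repository: `TurnBuffer` is a hypothetical name, and `distill` stands in for the LLM call that turns raw turns into a list of facts.

```python
from pathlib import Path
from typing import Callable, List

DISTILL_EVERY = 5  # from the diagram: distill every 5 turns


class TurnBuffer:
    """Collects turns and appends a distilled block to the soul file."""

    def __init__(self, soul_path: Path, distill: Callable[[List[str]], List[str]]):
        self.soul_path = soul_path
        self.distill = distill   # hypothetical stand-in for the LLM summariser
        self.turns: List[str] = []
        self.first_turn = 1      # turn number of the oldest buffered turn

    def add(self, message: str) -> None:
        self.turns.append(message)
        if len(self.turns) >= DISTILL_EVERY:
            self.flush()

    def flush(self) -> None:
        """Distill whatever is buffered; also usable on drift or on exit."""
        if not self.turns:
            return
        last_turn = self.first_turn + len(self.turns) - 1
        facts = self.distill(self.turns)
        lines = [f"[Turns {self.first_turn}-{last_turn}]"]
        lines += [f"- {fact}" for fact in facts]
        # Append-only write: earlier blocks are never rewritten.
        with self.soul_path.open("a", encoding="utf-8") as fh:
            fh.write("\n".join(lines) + "\n\n")
        self.first_turn = last_turn + 1
        self.turns.clear()
```

Keeping `flush()` separate from `add()` lets the same method serve all three triggers in the diagram: the 5-turn counter, topic drift, and the exit handler. Updating the vector index after each write is left out of this sketch.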

Key innovations:

1. Topic-Drift Distillation

Rather than relying only on a fixed turn count, the system measures the cosine similarity between the current buffer and the new message. If the similarity drops below 0.25, it distills the buffer immediately and starts a fresh block. This keeps each chunk focused on a single topic.
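
A rough sketch of the drift check is below. To stay self-contained it uses a toy bag-of-words `embed` function, which is an assumption for illustration only; the real system would swap in sentence-transformers embeddings. The 0.25 threshold comes from the description above, and `has_drifted` is a hypothetical name.

```python
import math
from collections import Counter

DRIFT_THRESHOLD = 0.25  # similarity below this counts as topic drift


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def has_drifted(buffer: list, message: str, threshold: float = DRIFT_THRESHOLD) -> bool:
    """True when the new message is semantically far from the buffered turns."""
    if not buffer:
        return False  # nothing to drift away from yet
    return cosine(embed(" ".join(buffer)), embed(message)) < threshold


buffer = [
    "my gpu driver crashed on pop os",
    "the nvidia driver fails after update",
]
same_topic = has_drifted(buffer, "which nvidia driver version works on pop os")
new_topic = has_drifted(buffer, "cat pixel keeps walking over keyboard")
```

With real sentence embeddings the scores will differ from this word-overlap toy, so the 0.25 cut-off is worth re-tuning per model.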

2. Deduplication

Before any new chunk is written, its cosine similarity is checked against every existing chunk. If the similarity to any of them exceeds 0.92, the chunk is skipped. This prevents filler conversations from polluting the soul file.
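
In code, the check could look something like this. It is a sketch that assumes chunk embeddings are already available as plain lists of floats. `is_duplicate` and `add_if_new` are hypothetical names, and the linear scan over all chunks is the simplest possible approach rather than necessarily what the repository does.

```python
import math
from typing import List, Sequence

DEDUP_THRESHOLD = 0.92  # chunks more similar than this are skipped


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(
    candidate: Sequence[float],
    existing: List[Sequence[float]],
    threshold: float = DEDUP_THRESHOLD,
) -> bool:
    """True if the candidate is too close to any stored chunk embedding."""
    return any(cosine(candidate, vec) > threshold for vec in existing)


def add_if_new(candidate: Sequence[float], existing: List[Sequence[float]]) -> bool:
    """Store the candidate only when it is not a near-duplicate."""
    if is_duplicate(candidate, existing):
        return False
    existing.append(list(candidate))
    return True


stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
near_copy = add_if_new([0.99, 0.05, 0.0], stored)   # almost identical to the first vector
fresh_fact = add_if_new([1.0, 1.0, 0.0], stored)    # similarity ~0.71 to both
```

The scan is linear in the number of chunks, which is fine for a single user's soul file; a vector index can answer the same nearest-neighbour question once the file grows.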

3. Emergency Exit Handler

SIGINT and SIGTERM are intercepted. On Ctrl+C or a termination request, the current buffer is distilled before the process exits, so a graceful shutdown loses nothing.
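
A minimal sketch of such a handler using Python's standard `signal` module follows. `install_exit_handler` is a hypothetical name, and `flush` is assumed to be whatever callable distills the current buffer.

```python
import signal
import sys
from typing import Callable


def install_exit_handler(flush: Callable[[], None]) -> None:
    """Run flush() when SIGINT or SIGTERM arrives, then exit cleanly."""

    def handler(signum, frame):
        flush()       # distill the buffer before shutting down
        sys.exit(0)   # raises SystemExit so normal cleanup still runs

    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

This covers Ctrl+C and a plain `kill`. SIGKILL cannot be caught by any process, so a forced kill or a power loss can still drop the turns that have not been distilled yet.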

4. Hybrid Retrieval

BM25 catches exact keywords (project names, usernames). Semantic search catches meaning (preferences, personality). Together they outperform either alone.
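
One way to combine the two signals is sketched below: a small pure-Python BM25 scorer whose scores are min-max normalised and blended with semantic scores. The fusion rule and the `alpha` weight are assumptions made for illustration, since the article does not specify how the two rankings are merged, and the semantic scores are passed in as plain numbers rather than computed by ChromaDB.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def normalize(scores: List[float]) -> List[float]:
    """Min-max scale to [0, 1] so both score types are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query: str, docs: List[str], semantic: List[float], alpha: float = 0.5) -> List[int]:
    """Return document indices, best first, by blended BM25 + semantic score."""
    if not docs:
        return []
    lexical = normalize(bm25_scores(query, docs))
    meaning = normalize(semantic)
    fused = [alpha * l + (1 - alpha) * m for l, m in zip(lexical, meaning)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

For a query like "TaskNest status", BM25 pins down the chunk that literally mentions TaskNest, while the semantic score can still lift chunks that talk about burnout or paused work without naming the project.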


Benchmark Results

Tested on Pop!_OS, NVIDIA GPU, sentence-transformers all-MiniLM-L6-v2 embeddings

Small (20 turns)

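| Metric | Dict | Redis | Vector DB | FileRAG |
|---|---|---|---|---|
| Write Speed (ms) | 0.0004 | 0.30 | 33.38 | 20.33 |
| Read Speed (ms) | 0.002 | 0.26 | 6.64 | 9.26 |
| Storage (KB) | 1.42 | 1.38 | 396.16 | 356.66 |
| Accuracy | 100% | 100% | 67% | 100% |
| Persistent | No | Yes | Yes | Yes |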

At small scale, Dict and Redis win on speed. FileRAG matches on accuracy. Fair.

Medium (500 turns)

| Metric | Dict | Redis | Vector DB | FileRAG |
|---|---|---|---|---|
| Write Speed (ms) | 0.0002 | 0.08 | 22.23 | 18.93 |
| Read Speed (ms) | 0.002 | 0.24 | 8.51 | 7.64 |
| Storage (KB) | 34.75 | 33.77 | 1604.16 | 653.47 |
| Accuracy | 0% | 0% | 67% | 100% |

This is where it gets interesting. Dict and Redis completely fail — core facts buried under 490 turns of noise. FileRAG still retrieves perfectly.

Large (1000 turns)

| Metric | Dict | Redis | Vector DB | FileRAG |
|---|---|---|---|---|
| Storage (KB) | 69.47 | 67.51 | 4338.36 | 938.74 |
| Soul file only (KB) | n/a | n/a | n/a | 18.58 |
| Accuracy | 0% | 0% | 67% | 100% |

FileRAG's total storage includes ChromaDB index overhead. The soul file itself, the actual human-readable memory, is just 18.58 KB for 1000 turns.

Projected (100k turns)

| Metric | Dict | Redis | Vector DB | FileRAG |
|---|---|---|---|---|
| Storage (KB) | 3,478 | 3,381 | 76,159 | 29,812 |
| Soul file (KB) | n/a | n/a | n/a | ~1,865 |
| Accuracy | 0% | 0% | ~67% | ~100% |

At 100k turns, Vector DB would consume ~74 MB just for index storage. FileRAG's soul file stays under 2 MB — human-readable, portable, private.
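
The megabyte figures follow directly from the kilobyte values in the 100k-turn rows above. A quick check, assuming 1 MB = 1024 KB:

```python
KB_PER_MB = 1024  # assumption: binary megabytes

# Storage values (KB) from the 100k-turn rows above
projected_kb = {
    "Vector DB": 76_159,
    "FileRAG total": 29_812,
    "Soul file": 1_865,
}

projected_mb = {name: round(kb / KB_PER_MB, 2) for name, kb in projected_kb.items()}

# How many times larger the Vector DB footprint is than the soul file
ratio = projected_kb["Vector DB"] / projected_kb["Soul file"]
```

That gives roughly 74.4 MB for the Vector DB, 29.1 MB for FileRAG including its index, and 1.8 MB for the soul file alone, so the readable memory is about one fortieth the size of the Vector DB footprint.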


The Verdict

Category Dict Redis Vector DB FileRAG
Fastest write Medium
Best accuracy at scale Medium
Smallest storage at scale
Persistent
No infrastructure
Local / offline ⚠️
Privacy (on device) ⚠️
Grows naturally with user Medium

FileRAG is not the fastest. It is not the simplest. But in these benchmarks it is the only one of the four approaches that stays accurate as the conversation grows, without growing infrastructure requirements.


The Human Analogy

Your brain doesn't record every conversation verbatim. It compresses experiences into memory — feelings, facts, patterns. The hippocampus distills, the cortex stores.

FileRAG does the same thing:

Conversation → Distillation → Soul file → Retrieval → Natural response
Experience  → Hippocampus  → Cortex    → Recall    → Natural behaviour

The soul file is not a database. It is a diary the AI reads before speaking to you.


Limitations (Honest)

  • Write speed is slower than Dict/Redis — embedding costs time
  • Not suited for 10k+ concurrent users — file I/O doesn't scale horizontally
  • Distillation quality depends on the LLM used — a weak summariser produces weak memories
  • 10k+ turns require deduplication to prevent soul file pollution (implemented, but adds complexity)

Stack

LLM          → Groq (llama3-70b-8192)
Distillation → Groq (llama3-70b-8192) every 5 turns or on topic drift
Embeddings   → sentence-transformers/all-MiniLM-L6-v2
Vector store → ChromaDB (persistent)
Retrieval    → Hybrid BM25 + Cosine Semantic
Memory       → {user_id}.txt — the soul file

Get the Code

The full implementation — main.py, benchmark.py, and the architecture — is available on GitHub:

github.com/dharanidh75/filerag-memory

This is also the memory layer being built into Jarvix — a local-first voice AI assistant for Pop!_OS.


What's Next

  • v2: Structured memory schema (facts, preferences, timeline)
  • v3: Multi-user support with isolated soul files per session
  • v4: Fine-tuned distillation model for higher extraction quality
  • v5: Production-ready API layer

If you're building a local AI, a personal assistant, or just tired of your chatbot forgetting who you are after every restart — give FileRAG a try.

The soul file is 18 KB. Your AI deserves better than a dict.


Dharanidharan J (JD)

LinkedIn · GitHub
