Most local LLM setups have a memory problem. They either save everything — which means your assistant is cluttered with useless trivia and small talk — or they save nothing at all. I wanted something smarter, so I built MemoryGate.
The Problem
Imagine you tell your local LLM assistant:
- "My mom was diagnosed with diabetes last week"
- "My AWS API key is AKIA..."
- "The project deadline is Friday"
And then later:
- "What's the weather like?"
- "Tell me a joke"
- "What year was the Eiffel Tower built?"
Most memory systems treat all six of these turns the same: they save everything or they save nothing. But the first three clearly matter and the last three don't. That gap is what MemoryGate fills.
What MemoryGate Does
MemoryGate is a three-stage pipeline that runs entirely locally:
Stage 1 — Generate training data
A local LLM running via LM Studio generates labelled conversation examples. Each example is tagged as either important (label = 1) or not important (label = 0).
High importance examples cover things like:
- Medical diagnoses, prescriptions, allergies
- Passwords, API keys, access tokens
- Legal deadlines and contract details
- Personal grief and family emergencies
- Financial decisions and bank details
Low importance examples cover things like:
- Casual greetings and small talk
- General trivia and history facts
- Jokes and creative requests
- Simple definitions
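The generator's output can be stored as simple labelled records. Here is a minimal sketch of what such a dataset might look like (the field names and file name are illustrative, not MemoryGate's actual schema):

```python
import json

# Hypothetical JSONL records in the spirit of Stage 1's output:
# label 1 = important (worth saving), label 0 = not important.
examples = [
    {"text": "My doctor said I need to start metformin 500mg twice daily", "label": 1},
    {"text": "The project deadline is Friday", "label": 1},
    {"text": "What's the capital of France?", "label": 0},
    {"text": "Tell me a joke", "label": 0},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Reload and check the class balance before training.
with open("train.jsonl") as f:
    loaded = [json.loads(line) for line in f]

n_important = sum(ex["label"] for ex in loaded)
print(n_important, len(loaded))
```

JSONL is a convenient choice here because HuggingFace's `datasets` library can load it directly.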
Stage 2 — Train the classifier
A DistilBERT model is fine-tuned on that synthetic data using PyTorch. The training pipeline includes mixed precision (AMP), cosine learning rate scheduling, gradient clipping, and stratified train/val splits. The best checkpoint is saved based on validation loss.
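The cosine schedule mentioned above decays the learning rate smoothly from its peak toward zero over training. A stdlib-only sketch of that shape (the real pipeline would use a PyTorch or HuggingFace scheduler; `lr_max` and `total_steps` here are made-up values):

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: starts at lr_max, ends at lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

lr_max, total_steps = 5e-5, 1000
print(cosine_lr(0, total_steps, lr_max))     # peak lr at the start
print(cosine_lr(500, total_steps, lr_max))   # half the peak at the midpoint
print(cosine_lr(1000, total_steps, lr_max))  # fully decayed at the end
```

The smooth decay is why it pairs well with picking the best checkpoint by validation loss: late-training updates are small, so the model settles rather than oscillating.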
Stage 3 — Run the memory filter
At runtime, every conversation turn is scored by the classifier. If the score exceeds the importance threshold (default 0.60), the turn is encrypted and saved to a ChromaDB RAG store. If it doesn't, it's discarded.
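The gate itself is a simple threshold check. A sketch of the logic, with the classifier stubbed out (in MemoryGate the score comes from the fine-tuned DistilBERT, and saving involves encryption and a ChromaDB write; the keyword stub below is purely illustrative):

```python
IMPORTANCE_THRESHOLD = 0.60  # the default threshold described above

def importance_score(turn: str) -> float:
    """Stand-in for the classifier's P(important). NOT the real model:
    a keyword stub so this sketch runs without any ML dependencies."""
    keywords = ("doctor", "api key", "deadline", "password")
    return 0.9 if any(k in turn.lower() for k in keywords) else 0.1

def memory_gate(turn: str, store: list) -> bool:
    """Keep the turn only if it clears the importance threshold."""
    if importance_score(turn) >= IMPORTANCE_THRESHOLD:
        store.append(turn)  # real pipeline: encrypt, embed, write to ChromaDB
        return True
    return False  # below threshold: discard

memories = []
memory_gate("My doctor said to start metformin 500mg twice daily", memories)
memory_gate("What's the capital of France?", memories)
print(len(memories))  # only the medical turn is kept
```

Because the threshold is just a number, tuning the save rate is a one-line change: raise it to save less, lower it to save more.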
The Tech Stack
- PyTorch + HuggingFace Transformers — DistilBERT fine-tuning
- ChromaDB — vector store for RAG memory retrieval
- Sentence Transformers — embedding conversations for similarity search
- Fernet + PBKDF2HMAC — encryption at rest for all stored memories
- LM Studio — local LLM for synthetic data generation and chat
- Whisper STT + Kokoro TTS — optional voice I/O for hands-free use
- DuckDuckGo search — automatic web search with no API key required
Everything runs locally. No cloud, no API keys, no data leaving your machine.
Why Train a Classifier Instead of Prompting?
A common question I get is: why not just ask the LLM itself whether something is important?
A few reasons:
- Speed — Running a full LLM inference pass after every conversation turn adds significant latency. DistilBERT is tiny and fast.
- Cost — If you are using a large local model, every inference pass is expensive on your hardware. The classifier adds almost nothing.
- Consistency — LLMs can be inconsistent judges of their own outputs. A fine-tuned classifier gives you a reliable, calibrated score every time.
- Control — You own the training data and the model. You can retrain it on your own categories, adjust the threshold, and fully audit what gets saved.
Encryption
One thing I cared about deeply was privacy. Every memory stored by MemoryGate is encrypted using Fernet symmetric encryption with a key derived via PBKDF2HMAC (600,000 iterations, SHA-256). Your master password never leaves your machine and the encryption salt is stored locally.
This means even if someone gets access to your ChromaDB files, they cannot read your memories without your master password.
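Key derivation with those exact parameters can be sketched using only the standard library; the 32 derived bytes are then base64-encoded into the urlsafe form that Fernet expects. (The salt handling below is illustrative; MemoryGate persists its own salt locally.)

```python
import base64
import hashlib
import os

def derive_fernet_key(master_password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 with 600,000 iterations, as described above.
    Returns a urlsafe-base64 key suitable for cryptography's Fernet."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        600_000,   # iteration count from the post
        dklen=32,  # Fernet requires exactly 32 key bytes
    )
    return base64.urlsafe_b64encode(raw)

salt = os.urandom(16)  # in MemoryGate the salt is stored locally, not regenerated
key = derive_fernet_key("correct horse battery staple", salt)
print(len(key))  # 44 base64 characters encoding the 32-byte key
# With the cryptography package installed:
#   from cryptography.fernet import Fernet
#   token = Fernet(key).encrypt(b"memory text")
```

The high iteration count is the point: it makes brute-forcing the master password from stolen ChromaDB files expensive, which is exactly the threat model described above.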
Example Output
Here is what a typical session looks like:
You: My doctor said I need to start metformin 500mg twice daily
Importance: 91% ✅ SAVED | Total memories: 12
You: What's the capital of France?
Importance: 4% not saved | Total memories: 12
The medical instruction gets saved. The trivia question doesn't. Exactly as intended.
What's Next
A few things I want to improve:
- A config file so you can add your own importance categories without touching the code
- A web UI instead of the terminal interface
- Support for other embedding models beyond all-MiniLM-L6-v2
- Multi-user support with separate encrypted memory stores
Try It
MemoryGate is open source under AGPL-3.0.
GitHub: https://github.com/ErenalpCet/MemoryGate
You will need Python 3.10, Anaconda, and LM Studio running locally with a model loaded. Full setup instructions are in the README.
I would love feedback — especially on the importance categories and whether the threshold approach makes sense for your use case. Drop a comment or open a Discussion on the repo.
Built by ErenalpCet — a student passionate about AI, PyTorch, and privacy-first software.