Most local LLM setups have a memory problem. They either save everything — which means your assistant is cluttered with useless trivia and small talk — or they save nothing at all. I wanted something smarter, so I built MemoryGate.
The Problem
Imagine you tell your local LLM assistant:
- "My mom was diagnosed with diabetes last week"
- "My AWS API key is AKIA..."
- "The project deadline is Friday"
And then later:
- "What's the weather like?"
- "Tell me a joke"
- "What year was the Eiffel Tower built?"
Most memory systems treat all six of these turns the same: they save everything or they save nothing. But the first three clearly matter and the last three don't. That gap is what MemoryGate fills.
What MemoryGate Does
MemoryGate is a three-stage pipeline that runs entirely locally:
Stage 1 — Generate training data
A local LLM running via LM Studio generates labelled conversation examples. Each example is tagged as either important (label = 1) or not important (label = 0).
High importance examples cover things like:
- Medical diagnoses, prescriptions, allergies
- Passwords, API keys, access tokens
- Legal deadlines and contract details
- Personal grief and family emergencies
- Financial decisions and bank details
Low importance examples cover things like:
- Casual greetings and small talk
- General trivia and history facts
- Jokes and creative requests
- Simple definitions
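The generator's output can be stored as simple labelled records. Here is a minimal sketch of what such a dataset might look like (the field names and file name are illustrative, not MemoryGate's actual schema):

```python
import json

# Hypothetical JSONL records in the spirit of Stage 1's output:
# label 1 = important (worth saving), label 0 = not important.
examples = [
    {"text": "My doctor said I need to start metformin 500mg twice daily", "label": 1},
    {"text": "The project deadline is Friday", "label": 1},
    {"text": "What's the capital of France?", "label": 0},
    {"text": "Tell me a joke", "label": 0},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Reload and check the class balance before training.
with open("train.jsonl") as f:
    loaded = [json.loads(line) for line in f]

n_important = sum(ex["label"] for ex in loaded)
print(n_important, len(loaded))
```

JSONL is a convenient choice here because HuggingFace's `datasets` library can load it directly.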
Stage 2 — Train the classifier
A DistilBERT model is fine-tuned on that synthetic data using PyTorch. The training pipeline includes mixed precision (AMP), cosine learning rate scheduling, gradient clipping, and stratified train/val splits. The best checkpoint is saved based on validation loss.
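The cosine schedule mentioned above decays the learning rate smoothly from its peak toward zero over training. A stdlib-only sketch of that shape (the real pipeline would use a PyTorch or HuggingFace scheduler; `lr_max` and `total_steps` here are made-up values):

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: starts at lr_max, ends at lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

lr_max, total_steps = 5e-5, 1000
print(cosine_lr(0, total_steps, lr_max))     # peak lr at the start
print(cosine_lr(500, total_steps, lr_max))   # half the peak at the midpoint
print(cosine_lr(1000, total_steps, lr_max))  # fully decayed at the end
```

The smooth decay is why it pairs well with picking the best checkpoint by validation loss: late-training updates are small, so the model settles rather than oscillating.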
Stage 3 — Run the memory filter
At runtime, every conversation turn is scored by the classifier. If the score exceeds the importance threshold (default 0.60), the turn is encrypted and saved to a ChromaDB RAG store. If it doesn't, it's discarded.
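The gate itself is a simple threshold check. A sketch of the logic, with the classifier stubbed out (in MemoryGate the score comes from the fine-tuned DistilBERT, and saving involves encryption and a ChromaDB write; the keyword stub below is purely illustrative):

```python
IMPORTANCE_THRESHOLD = 0.60  # the default threshold described above

def importance_score(turn: str) -> float:
    """Stand-in for the classifier's P(important). NOT the real model:
    a keyword stub so this sketch runs without any ML dependencies."""
    keywords = ("doctor", "api key", "deadline", "password")
    return 0.9 if any(k in turn.lower() for k in keywords) else 0.1

def memory_gate(turn: str, store: list) -> bool:
    """Keep the turn only if it clears the importance threshold."""
    if importance_score(turn) >= IMPORTANCE_THRESHOLD:
        store.append(turn)  # real pipeline: encrypt, embed, write to ChromaDB
        return True
    return False  # below threshold: discard

memories = []
memory_gate("My doctor said to start metformin 500mg twice daily", memories)
memory_gate("What's the capital of France?", memories)
print(len(memories))  # only the medical turn is kept
```

Because the threshold is just a number, tuning the save rate is a one-line change: raise it to save less, lower it to save more.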
The Tech Stack
- PyTorch + HuggingFace Transformers — DistilBERT fine-tuning
- ChromaDB — vector store for RAG memory retrieval
- Sentence Transformers — embedding conversations for similarity search
- Fernet + PBKDF2HMAC — encryption at rest for all stored memories
- LM Studio — local LLM for synthetic data generation and chat
- Whisper STT + Kokoro TTS — optional voice I/O for hands-free use
- DuckDuckGo search — automatic web search with no API key required
Everything runs locally. No cloud, no API keys, no data leaving your machine.
Why Train a Classifier Instead of Prompting?
A common question I get is: why not just ask the LLM itself whether something is important?
A few reasons:
- Speed — Running a full LLM inference pass after every conversation turn adds significant latency. DistilBERT is tiny and fast.
- Cost — If you are using a large local model, every inference pass is expensive on your hardware. The classifier adds almost nothing.
- Consistency — LLMs can be inconsistent judges of their own outputs. A fine-tuned classifier gives you a reliable, calibrated score every time.
- Control — You own the training data and the model. You can retrain it on your own categories, adjust the threshold, and fully audit what gets saved.
Encryption
One thing I cared about deeply was privacy. Every memory stored by MemoryGate is encrypted using Fernet symmetric encryption with a key derived via PBKDF2HMAC (600,000 iterations, SHA-256). Your master password never leaves your machine and the encryption salt is stored locally.
This means even if someone gets access to your ChromaDB files, they cannot read your memories without your master password.
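Key derivation with those exact parameters can be sketched using only the standard library; the 32 derived bytes are then base64-encoded into the urlsafe form that Fernet expects. (The salt handling below is illustrative; MemoryGate persists its own salt locally.)

```python
import base64
import hashlib
import os

def derive_fernet_key(master_password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 with 600,000 iterations, as described above.
    Returns a urlsafe-base64 key suitable for cryptography's Fernet."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        600_000,   # iteration count from the post
        dklen=32,  # Fernet requires exactly 32 key bytes
    )
    return base64.urlsafe_b64encode(raw)

salt = os.urandom(16)  # in MemoryGate the salt is stored locally, not regenerated
key = derive_fernet_key("correct horse battery staple", salt)
print(len(key))  # 44 base64 characters encoding the 32-byte key
# With the cryptography package installed:
#   from cryptography.fernet import Fernet
#   token = Fernet(key).encrypt(b"memory text")
```

The high iteration count is the point: it makes brute-forcing the master password from stolen ChromaDB files expensive, which is exactly the threat model described above.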
Example Output
Here is what a typical session looks like:
You: My doctor said I need to start metformin 500mg twice daily
Importance: 91% ✅ SAVED | Total memories: 12
You: What's the capital of France?
Importance: 4% not saved | Total memories: 12
The medical instruction gets saved. The trivia question doesn't. Exactly as intended.
What's Next
A few things I want to improve:
- A config file so you can add your own importance categories without touching the code
- A web UI instead of the terminal interface
- Support for other embedding models beyond all-MiniLM-L6-v2
- Multi-user support with separate encrypted memory stores
Try It
MemoryGate is open source under AGPL-3.0.
GitHub: https://github.com/ErenalpCet/MemoryGate
You will need Python 3.10, Anaconda, and LM Studio running locally with a model loaded. Full setup instructions are in the README.
I would love feedback — especially on the importance categories and whether the threshold approach makes sense for your use case. Drop a comment or open a Discussion on the repo.
Built by ErenalpCet — a student passionate about AI, PyTorch, and privacy-first software.