Radu C.
Give Your Local LLM a Memory That Actually Works

LLMs Remember Your Name. They Forget Everything Else.

LLMs have conversation memory — for about 20-30 messages. Then the context window fills up and everything before that is gone. The typical fix is vector search: embed conversations, retrieve relevant chunks later.

That works until facts contradict each other, critical information decays at the same rate as small talk, or the system quietly overwrites a drug allergy because the user mentioned a different medication.

widemem is an open-source memory layer that handles the parts vector search alone can't.

What It Does

Batch conflict resolution — When a message has 5 new facts, most systems check each one individually. 5 facts, 5 LLM calls. widemem batches all new facts + related existing memories into one call. The model returns ADD/UPDATE/DELETE per fact.
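A minimal sketch of the batching idea. The prompt format and JSON response shape here are illustrative, not widemem's actual protocol: the point is that all new facts and their related memories go into a single LLM call instead of one call per fact.

```python
import json

def build_batch_prompt(new_facts, existing_memories):
    """One prompt covering every new fact, instead of one LLM call per fact."""
    return (
        "Existing memories:\n"
        + "\n".join(f"- [{m['id']}] {m['text']}" for m in existing_memories)
        + "\n\nNew facts:\n"
        + "\n".join(f"- [{i}] {f}" for i, f in enumerate(new_facts))
        + "\n\nFor each new fact return JSON: "
          '{"fact": <index>, "action": "ADD"|"UPDATE"|"DELETE", "target": <memory id or null>}'
    )

def parse_decisions(llm_response: str):
    """The model answers once, with one ADD/UPDATE/DELETE decision per fact."""
    return json.loads(llm_response)

# Two facts, one related memory, one call:
facts = ["Allergic to penicillin", "Prefers morning appointments"]
memories = [{"id": "m1", "text": "Takes amoxicillin for infections"}]
prompt = build_batch_prompt(facts, memories)

# A plausible single response resolving both facts at once:
decisions = parse_decisions(
    '[{"fact": 0, "action": "UPDATE", "target": "m1"},'
    ' {"fact": 1, "action": "ADD", "target": null}]'
)
```

With 5 facts, that's 1 call instead of 5, and the model sees all the facts together, so cross-fact conflicts are visible in a single decision.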

Importance-weighted scoring — Every fact gets rated 1-10. Retrieval combines similarity, importance, and recency. A 9/10 fact stays relevant for months. A 2/10 fades in weeks.
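The retrieval blend can be sketched like this. The weights and the half-life formula are hypothetical, chosen only to show the shape of the idea: importance both boosts the score directly and slows the recency decay.

```python
import math

def memory_score(similarity, importance, age_days,
                 w_sim=0.6, w_imp=0.2, w_rec=0.2):
    """Blend similarity, importance (1-10), and recency.
    Higher-importance facts decay more slowly (illustrative weights)."""
    half_life = importance * 14  # days; a 9/10 fact keeps weight for months
    recency = math.exp(-math.log(2) * age_days / half_life)
    return w_sim * similarity + w_imp * (importance / 10) + w_rec * recency

# Same similarity, same age -- the important fact still outranks small talk:
critical = memory_score(similarity=0.7, importance=9, age_days=60)
smalltalk = memory_score(similarity=0.7, importance=2, age_days=60)
```

After 60 days the 2/10 fact has lost most of its recency weight while the 9/10 fact has barely decayed, which is exactly the "fades in weeks vs. stays relevant for months" behavior.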

YMYL safety — Health, legal, financial facts get an importance floor of 8.0, immunity from decay, and forced contradiction detection. "Bank account" triggers it. "River bank" doesn't.
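The policy reduces to a floor plus a decay exemption. The term list and plain substring matching below are purely illustrative; widemem's detector is context-aware, which is why "bank account" triggers it and "river bank" does not.

```python
# Illustrative trigger phrases; the real detector is context-aware.
YMYL_TERMS = {"allergy", "medication", "diagnosis", "bank account", "loan", "lawsuit"}

def apply_ymyl_policy(fact_text, importance):
    """Health/legal/financial facts get an importance floor of 8.0
    and are exempted from decay. Returns (importance, decays)."""
    text = fact_text.lower()
    if any(term in text for term in YMYL_TERMS):
        return max(importance, 8.0), False  # floored, never decays
    return importance, True                 # normal fact, decays as usual

imp, decays = apply_ymyl_policy("Has a peanut allergy", importance=5)
```

A 5/10 allergy fact gets promoted to 8.0 and pinned, while "likes hiking near the river bank" passes through untouched.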

Contradiction detection — Optional callback before overwriting facts that conflict with existing memories. Useful when silent overwrites are dangerous.
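The callback pattern looks roughly like this. The function names and signature are hypothetical, not widemem's actual API; what matters is that the update path gives you a veto before a conflicting fact is overwritten.

```python
def on_contradiction(existing, incoming):
    """Hypothetical callback: invoked before an overwrite.
    Returning False blocks the update."""
    print(f"Conflict: {existing!r} vs {incoming!r}")
    return False  # keep the existing fact, flag for review

def update_fact(store, key, new_value, contradiction_cb=None):
    """Write-through with an optional veto on conflicting updates."""
    old = store.get(key)
    if old is not None and old != new_value and contradiction_cb:
        if not contradiction_cb(old, new_value):
            return False  # silent overwrite prevented
    store[key] = new_value
    return True

store = {"allergy": "penicillin"}
changed = update_fact(store, "allergy", "no known allergies", on_contradiction)
```

Here the drug allergy survives the conflicting update, which is the whole point when silent overwrites are dangerous.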

Hierarchical memory — Facts → Summaries → Themes. Broad queries get themes, specific queries get facts.
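A toy version of the three-layer routing. Using query length as a specificity proxy is purely illustrative; a real router would classify the query (e.g. by its embedding), but the layer structure is the idea.

```python
# Three layers: raw facts roll up into summaries, summaries into themes.
hierarchy = {
    "themes": ["User's health profile"],
    "summaries": ["Manages several medication allergies"],
    "facts": ["Allergic to penicillin", "Allergic to sulfa drugs"],
}

def pick_layer(query: str) -> str:
    """Route broad questions to themes and specific ones to facts.
    Word count as a specificity proxy is illustrative only."""
    words = len(query.split())
    if words <= 4:
        return "themes"
    if words <= 7:
        return "summaries"
    return "facts"

layer = pick_layer("Which specific antibiotic is the user allergic to?")
results = hierarchy[layer]
```

A two-word query like "health overview" lands on the theme layer; the pointed question above drills down to individual facts.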

Runs Fully Local

```shell
pip install widemem-ai[ollama]
ollama pull llama3.2
```

```python
from widemem import WideMemory, MemoryConfig
from widemem.core.types import LLMConfig, EmbeddingConfig, VectorStoreConfig

memory = WideMemory(MemoryConfig(
    llm=LLMConfig(provider="ollama", model="llama3.2"),
    embedding=EmbeddingConfig(provider="sentence-transformers", model="all-MiniLM-L6-v2", dimensions=384),
    vector_store=VectorStoreConfig(provider="faiss"),
))
```

Ollama + sentence-transformers + FAISS + SQLite. Nothing leaves the machine.

Also ships as a FastAPI sidecar (python -m widemem.server) and there's Gin middleware for Ollama that adds memory transparently with one env var.


140 tests. Apache 2.0. Python 3.10+. github.com/remete618/widemem-ai

pip install widemem-ai
