Most RAG implementations are lazy. Developers just throw embeddings into a vector database and hope for the best. That is not how you build a production-grade AI agent with long-term memory.
The standard approach of ranking on semantic similarity alone is fundamentally flawed: it has no sense of situational awareness or chronological priority.
If a user mentioned their favorite coffee once three years ago, but just told the AI about a critical life event ten minutes ago, a basic vector search will happily surface the coffee preference first whenever its embedding happens to be a closer match to the current query.
What I Built Instead
I solved this in the https://wtmf.ai codebase by implementing a Hybrid Scoring Algorithm in our memory-context utility.
We moved away from a singular focus on cosine similarity and engineered a multi-dimensional ranking system that weighs context like a biological brain would.
The Scoring Model Breakdown
Semantic Similarity (70%)
We use OpenAI text-embedding-3-small and pgvector to handle the heavy lifting of conceptual matching.
- 1536 dimensions of context provide the foundation
- But they are no longer the final word
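The core of this leg is plain cosine similarity. In production it runs inside Postgres via pgvector's `<=>` operator, but the math is easy to show in pure Python; the tiny 3-d vectors below are stand-ins for the real 1536-d embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Identical vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```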
Recency Decay (15%)
We implemented a time-decay function directly in the SQL query.
- Calculates the epoch difference between now and the memory's creation date
- Exponentially boosts anything from the last 30 days
This keeps the AI "in the moment" instead of dragging in stale context from months-old conversations.
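A minimal sketch of that decay, shown here as a pure function rather than the actual SQL; the 30-day half-life is an illustrative parameter choice, meaning a month-old memory scores half what a fresh one does.

```python
import math

HALF_LIFE_SECONDS = 30 * 86400  # illustrative: 30-day half-life

def recency_score(age_seconds: float) -> float:
    """Exponential time decay: 1.0 for a brand-new memory, 0.5 at 30 days."""
    return math.exp(-math.log(2) * age_seconds / HALF_LIFE_SECONDS)

print(recency_score(0))            # fresh memory
print(recency_score(30 * 86400))   # exactly one half-life old
```

The same expression translates directly to SQL with `EXP`, `LN`, and `EXTRACT(EPOCH FROM NOW() - created_at)`, which is how it can live inside the retrieval query itself.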
Explicit Importance (15%)
Not all sentences are created equal.
During memory extraction, we assign an importance score based on the critical nature of the information.
- User’s name or medical allergy → High importance
- Casual comment about the weather → Low importance
This ensures high-impact data resurfaces, regardless of semantic proximity or age.
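To make the idea concrete, here is a hedged sketch of importance assignment at extraction time. The keyword rules below are purely illustrative stand-ins; a real extractor would score each memory with an LLM or classifier rather than string matching.

```python
def importance_score(memory: str) -> float:
    """Assign an importance weight in [0, 1] to an extracted memory.
    Keyword heuristics here are toy examples, not the production logic."""
    text = memory.lower()
    if any(k in text for k in ("allergic", "allergy", "my name is")):
        return 1.0   # identity and medical facts: must always resurface
    if any(k in text for k in ("weather", "traffic")):
        return 0.1   # small talk: rarely worth recalling
    return 0.5       # everything else: neutral default

print(importance_score("I'm allergic to penicillin"))
print(importance_score("The weather was nice today"))
```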
The Outcome
The result is a memory recall system where the AI doesn't just retrieve data — it prioritizes it.
It understands that human context is a mix of:
- What you said
- When you said it
- How much it actually matters
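Those three signals combine into a single rank via the 70/15/15 weights from the breakdown above. A minimal sketch, assuming each component has already been normalized to [0, 1]:

```python
# Weights mirror the breakdown: semantic 70%, recency 15%, importance 15%
WEIGHTS = {"semantic": 0.70, "recency": 0.15, "importance": 0.15}

def hybrid_score(semantic: float, recency: float, importance: float) -> float:
    """Weighted blend of the three memory signals; each input is in [0, 1]."""
    return (WEIGHTS["semantic"] * semantic
            + WEIGHTS["recency"] * recency
            + WEIGHTS["importance"] * importance)

# An old but critical memory can outrank a fresher, vaguer match
print(hybrid_score(semantic=0.9, recency=0.1, importance=1.0))
print(hybrid_score(semantic=0.95, recency=0.9, importance=0.1))
```

In practice this blend can live directly in the SQL `ORDER BY`, so ranking happens in one round trip instead of re-sorting in application code.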
Final Thought
This is the difference between:
- A chatbot that reads a database
- And an agent that actually remembers you
Stop building basic RAG. Start engineering memory.