
VAC Memory System: Persistent Memory for LLMs with State-of-the-Art LoCoMo Accuracy

Introduction

Long-term memory remains one of the most challenging problems for large language models (LLMs). Existing systems either rely entirely on the input context window, which is limited in length, or suffer from hallucinations and loss of relevance.

The VAC Memory System is a Retrieval-Augmented Generation (RAG) architecture that provides persistent memory for LLMs. It leads the LoCoMo benchmark with an accuracy of 80.1%; the closest competitors, Zep and Mem0, score 75.14% and 66.9%, respectively.


The Problem and the Solution

LLMs inherently carry only a static, statistical "memory" embedded in their parameters. VAC's objective is to add dynamic memory retrieval, surfacing accurate information without modifying the model.

Key VAC advantages:

  1. MCA (Candidate Filtering): The Multi-Candidate Assessment addresses the false-positive problem of plain vector similarity search (like FAISS) by filtering candidates at the entity level before the more expensive stages.
def calculate_query_coverage(query_keywords: set, memory_keywords: set) -> float:
    # Fraction of the query's entities that also appear in the memory (0.0-1.0).
    if not query_keywords:
        return 0.0  # guard against an empty query keyword set
    return len(query_keywords & memory_keywords) / len(query_keywords)
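For instance, with made-up keyword sets, a memory that mentions both query entities gets full coverage while an unrelated one scores zero; only memories above a coverage cutoff move on to the more expensive vector and reranking stages:

query_kw = {"alice", "berlin"}
print(calculate_query_coverage(query_kw, {"alice", "berlin", "job"}))  # 1.0
print(calculate_query_coverage(query_kw, {"bob", "ramen"}))            # 0.0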
  2. Physics-Inspired Ranking: By treating text documents as "planets" with "mass" and "gravity," VAC introduces a novel retrieval scoring mechanism:
G, DELTA = 1.0, 1e-6  # assumed defaults: gravitational scale and a small zero-distance smoothing term
def calculate_force(query_mass, memory_mass, distance):
    # Newton-style attraction: closer, "heavier" memories exert a stronger pull on the query.
    force = G * (query_mass * memory_mass) / (distance ** 2 + DELTA)
    return force
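With the assumed constants above and made-up masses and distances, the inverse-square term makes nearby memories dominate the ranking:

print(calculate_force(query_mass=2.0, memory_mass=3.0, distance=0.4))  # ~37.5
print(calculate_force(query_mass=2.0, memory_mass=3.0, distance=1.2))  # ~4.17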
  3. Orchestration: VAC operates modularly, minimizing reliance on LLMs beyond the answer-generation phase.
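A minimal sketch of that flow, reusing the helpers shown in this post and assuming each memory carries a precomputed keyword set (the 0.5 cutoff and top-k value are illustrative, not VAC's actual parameters), keeps everything before answer generation LLM-free:

def retrieve_context(query, memories, top_k=5):
    # Pure Python filtering and ranking: no LLM calls on this path.
    query_kw = extract_keywords_simple(query)
    candidates = [m for m in memories
                  if calculate_query_coverage(query_kw, m["keywords"]) >= 0.5]
    return rank_memories(query, candidates)[:top_k]

# An LLM enters only afterwards, generating the answer from the retrieved
# memories (the generation call itself is omitted from this sketch).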

System Architecture

The VAC Memory System pipeline consists of the following steps:

  1. MCA-PreFilter: Filtering candidates by entity coverage to reduce computational costs.
  2. Vector Processing with FAISS: Embedding and semantic search over 1024-dimensional BGE-Large vectors.
  3. BM25 Search: Traditional lexical (keyword-based) matching to complement the semantic search.
  4. Cross-Encoder Reranking: Precision re-scoring of the top-N candidates (a sketch of steps 2-4 follows this list).
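Steps 2-4 can be illustrated with off-the-shelf components. The sketch below is an assumption-laden approximation, not VAC's confirmed configuration: the BAAI/bge-large-en-v1.5 checkpoint (1024-dimensional, in the BGE-Large family named above), a flat FAISS inner-product index, rank_bm25 for lexical scores, and an MS MARCO cross-encoder for reranking, all over a toy memory set.

import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

memories = [
    "Alice moved to Berlin in March and started a new job.",
    "Bob's favorite dish is ramen with extra chashu.",
    "Alice adopted a cat named Miso last summer.",
]
query = "Where did Alice move?"

# Step 2: dense retrieval with BGE-Large embeddings and a FAISS inner-product index.
encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
mem_vecs = encoder.encode(memories, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(mem_vecs.shape[1])  # 1024 dimensions for BGE-Large
index.add(mem_vecs)
query_vec = encoder.encode([query], normalize_embeddings=True).astype("float32")
_, dense_ids = index.search(query_vec, 3)

# Step 3: lexical retrieval with BM25 over simple whitespace tokens.
bm25 = BM25Okapi([m.lower().split() for m in memories])
bm25_scores = bm25.get_scores(query.lower().split())

# Step 4: merge both candidate pools, then rerank them with a cross-encoder.
candidates = sorted(set(dense_ids[0].tolist()) | set(np.argsort(bm25_scores)[-3:].tolist()))
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
rerank_scores = reranker.predict([(query, memories[i]) for i in candidates])
best_id = max(zip(candidates, rerank_scores), key=lambda pair: pair[1])[0]
print(memories[best_id])  # expected: the Berlin memory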

Here is the physics-inspired ranking code that ties the mass and force calculations together:

def rank_memories(query, memories):
    # Extract the query's entity keywords once, then score every memory.
    query_keywords = extract_keywords_simple(query)
    # calculate_mass returns a dict per memory that includes its 'force' score.
    scored_mem = [
        calculate_mass(mem, query_keywords)
        for mem in memories
    ]
    # Strongest "gravitational pull" first.
    return sorted(scored_mem, key=lambda x: x['force'], reverse=True)

Results

Benchmarks were conducted on the LoCoMo dataset across 100 validated runs. Key metrics:

  • VAC Memory: 80.1%
  • Zep: 75.14%
  • Mem0: 66.9%

Open Source

Experience it yourself:


Final Thoughts

Retrieval-Augmented Generation (RAG) is rapidly evolving. The VAC Memory System demonstrates how combining classical retrieval algorithms with novel approaches can significantly improve results.

