Multi-Signal Memory Architecture for AI Agents

#architecture #rag #ai #agents

Multi-Signal Memory Architecture for AI Agents

Most AI agent memory systems use a single retrieval signal: embedding similarity. You embed the query, embed the memories, and return the top-k by cosine similarity. This works for simple cases but fails spectacularly for agents that need to remember context across weeks of conversation.

The Problem with Single-Signal Retrieval

Imagine your agent has 10,000 memories. The user asks "What did we decide about the wallet?" Embedding search returns memories about "wallet" — but it might miss the memory that says "Colby decided to use Base chain for all payments" because the embedding for "decide" and "wallet" doesn't strongly align with "what did we decide about the wallet."

The Four Signals

Norax uses four retrieval signals, combined with learned weights:

1. Keyword Matching (BM25)

Fast, exact term matching. If the user says "TypeORM," you want memories containing "TypeORM" — not memories that are semantically similar but use different words.

2. Embedding Similarity

Dense vector similarity captures semantic relationships. "Crypto wallet" should match "EVM address on Base" even though no words overlap.

3. Temporal Decay

Recent memories are more likely to be relevant. A memory from 2 hours ago gets a boost; a memory from 3 months ago gets penalized. The decay rate is configurable per memory kind — procedural memories decay slower than scratchpad entries.

4. Entity Graph Reranking

After the first three signals produce candidates, we rerank based on entity overlap. If the query mentions "Colby" and "wallet," memories that contain both entities get a boost. This is implemented as a community-detected graph where entities are nodes and co-occurrence creates edges.

Combining the Signals

def combine_signals(query, candidates, entities):
    for item in candidates:
        # Normalize each signal to [0, 1]
        kw_score = bm25_score(query, item) / max_bm25
        emb_score = cosine_sim(embed(query), item.embedding)
        temp_score = math.exp(-age_days(item) / 30)
        ent_score = entity_overlap(item, entities)

        # Weighted combination
        item.final_score = (
            0.25 * kw_score +
            0.35 * emb_score +
            0.20 * temp_score +
            0.20 * ent_score
        )
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)

Results

In practice, multi-signal retrieval outperforms single-signal by 40-60% on recall@10 for agent memory workloads. The biggest wins come from entity graph reranking — it catches relationships that embedding similarity alone misses.

Conclusion

If you're building an AI agent, don't rely on embedding similarity alone. Add keyword matching for precision, temporal decay for recency, and entity graphs for relationship awareness. The combination is dramatically more effective than any single signal.

DEV Community

Multi-Signal Memory Architecture for AI Agents

Multi-Signal Memory Architecture for AI Agents

The Problem with Single-Signal Retrieval

The Four Signals

1. Keyword Matching (BM25)

2. Embedding Similarity

3. Temporal Decay

4. Entity Graph Reranking

Combining the Signals

Results

Conclusion

Top comments (0)