Kim Namhyun

Posted on Feb 25

Solving Key Collisions in LLM Memory — Embedding Model Swap and Hybrid Normalization

#llm #machinelearning #rag #systemdesign

LLM gpt-oss:20b · Embedding bge-m3 (1024-dim) · SQLite · 3-Tier Memory (STM/MTM/LTM)

Origin

@signalstack commented on the previous post about key collisions, and reviewing that feedback helped us catch critical bugs. Three issues surfaced in the memory system:

Deadlock — Server hangs permanently during memory save operations
Key mismatch — "favorite food" and "most favorite food" stored as separate entries, leaving stale values
Key collision — Saving "dog name: Choco" overwrites "cat name: Nabi" because the embedding model can't distinguish them

This post documents how we diagnosed these issues, what experiments we ran, and how we fixed them.

In a 3-tier memory system, when a user says "My hobby is hiking" then later "I switched to pottery" — what happens?

Facts first enter MTM (mid-term memory). After repeated access, they promote to LTM (long-term memory). The problem: during promotion, old values can conflict with new ones. ON CONFLICT(key) DO UPDATE should handle this, but when the LLM extracts the same concept under different keys, the upsert misses entirely.

1st: "favorite_food|pizza|MID"         → saved to MTM
2nd: "most_favorite_food|sushi|MID"    → different key → saved as separate entry!
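The miss can be reproduced with a minimal sketch. The table name `mtm`, its columns, and the `remember()` signature are assumptions for illustration; the post does not show the real schema.

```python
import sqlite3

# Hypothetical schema: the real table layout is not shown in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mtm (key TEXT PRIMARY KEY, value TEXT, "
    "access_count INTEGER DEFAULT 0)"
)

def remember(key: str, value: str) -> None:
    """Upsert a fact. The conflict clause fires only on an exact key match."""
    conn.execute(
        "INSERT INTO mtm (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value, "
        "access_count = access_count + 1",
        (key, value),
    )

def snapshot():
    return conn.execute(
        "SELECT key, value, access_count FROM mtm ORDER BY key"
    ).fetchall()

remember("favorite_food", "pizza")
remember("most_favorite_food", "sushi")  # different key: no conflict, new row
rows_after_mismatch = snapshot()         # stale pizza row is still there

remember("favorite_food", "sushi")       # exact key: the upsert works
rows_after_exact = snapshot()
```

After the second call the table holds two rows, and the stale `pizza` value survives until a write arrives under exactly the same key.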

We designed 10 scenarios to verify this, capturing LTM/MTM database state after every test.

What Broke First

1. Deadlock: Server hangs forever

The remember_mtm() → remember() call chain acquired the same threading.Lock() twice on one thread. A plain Lock is not reentrant, so the second acquire blocked forever. Fix: replace Lock() with RLock().
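A minimal sketch of the failure mode and the fix. `MemoryStore` is a hypothetical stand-in for the real class; only the method names `remember_mtm()` and `remember()` come from the post.

```python
import threading

class MemoryStore:
    """Hypothetical stand-in for the post's memory class."""

    def __init__(self, lock):
        self._lock = lock
        self.saved = []

    def remember(self, key, value):
        with self._lock:               # inner acquire
            self.saved.append((key, value))

    def remember_mtm(self, key, value):
        with self._lock:               # outer acquire
            self.remember(key, value)  # same thread acquires again

# RLock is reentrant: the owning thread may acquire it repeatedly.
store = MemoryStore(threading.RLock())
store.remember_mtm("hobby", "hiking")

# A plain Lock has no owner tracking, so a second acquire never succeeds.
# Probe with a timeout here instead of hanging the process.
plain = threading.Lock()
plain.acquire()
second_acquire_ok = plain.acquire(timeout=0.1)  # False
plain.release()
```

RLock is the smallest change that removes the hang. Splitting `remember()` into a public method and an unlocked internal helper would be another option, at the cost of a larger refactor.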

2. Key Inconsistency: Stale food kept appearing

The LLM extracted 좋아하는 음식 (favorite food) and 가장 좋아하는 음식 (most favorite food) as different keys → ON CONFLICT(key) missed → pizza never became doenjang-jjigae.

Embedding Model Shootout

To normalize keys, we need to answer: "Are these two keys the same concept?" We tested 4 models:

Key Pair	all-minilm	nomic-embed	BGE-M3	Qwen3-Embed
favorite food ↔ most favorite food	0.95	0.96	0.92	0.92
name ↔ cat's name	0.89	0.96	0.69	0.63
hobby ↔ blood type	1.00 ❌	1.00 ❌	0.40 ✅	0.49
cat's name ↔ dog's name	1.00 ❌	1.00 ❌	0.87 ✅	0.91 ❌

all-minilm and nomic-embed couldn't distinguish short Korean words at all (hobby vs blood type = 1.0). Only BGE-M3 separated them correctly.

We also switched from embedding key: value to embedding the key only. Since keyword matching already covers value search in recall, key-only embedding is more efficient, and the stored key embeddings can be reused for key normalization without re-embedding existing keys.
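One way to keep key-only embeddings in SQLite is to pack each vector into a BLOB. This is a sketch under assumptions: the table `key_embeddings`, the float32 packing, and both helper functions are hypothetical, and the 3-element vector stands in for a 1024-dim bge-m3 embedding.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_embeddings (key TEXT PRIMARY KEY, vec BLOB NOT NULL)"
)

def save_key_embedding(key, vec):
    """Store the key's embedding as packed float32 bytes."""
    blob = array("f", vec).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO key_embeddings (key, vec) VALUES (?, ?)",
        (key, blob),
    )

def load_key_embeddings():
    """Return {key: vector} for every stored key, with no model call."""
    out = {}
    for key, blob in conn.execute("SELECT key, vec FROM key_embeddings"):
        vec = array("f")
        vec.frombytes(blob)
        out[key] = list(vec)
    return out

# Toy vector; a real bge-m3 embedding would have 1024 floats.
save_key_embedding("hobby", [0.5, 0.25, -1.0])
loaded = load_key_embeddings()
```

With this layout, comparing a new key against all existing keys needs an embedding call only for the new key.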

Final Approach

BGE-M3 (threshold 0.9) + substring fallback hybrid

Step 1: cosine_similarity(new_key, existing_key) ≥ 0.9 → match
Step 2: substring containment + length ratio ≥ 60% → match

Reuses key-only embeddings stored in DB — only 1 extra Ollama call per new key.
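The two steps can be sketched as follows. The function names are hypothetical, the embedding call is left out (vectors are passed in), and the length ratio is assumed to mean `len(shorter) / len(longer)`, which the post does not spell out.

```python
import math
from typing import Mapping, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def substring_match(new_key: str, old_key: str, min_ratio: float = 0.6) -> bool:
    """Step 2: one key contains the other and their lengths are close enough."""
    shorter, longer = sorted((new_key, old_key), key=len)
    return bool(shorter) and shorter in longer and len(shorter) / len(longer) >= min_ratio

def normalize_key(new_key: str, new_vec: Sequence[float],
                  existing: Mapping[str, Sequence[float]],
                  threshold: float = 0.9) -> str:
    """Return the existing key that new_key should merge into, else new_key."""
    # Step 1: best cosine match at or above the threshold.
    best_key, best_sim = None, threshold
    for key, vec in existing.items():
        sim = cosine_similarity(new_vec, vec)
        if sim >= best_sim:
            best_key, best_sim = key, sim
    if best_key is not None:
        return best_key
    # Step 2: substring containment with a length-ratio guard.
    for key in existing:
        if substring_match(new_key, key):
            return key
    return new_key

# Toy 2-d vectors for illustration only, not real bge-m3 output.
existing = {"favorite food": [1.0, 0.0], "cat name": [0.0, 1.0]}
merged_by_cosine = normalize_key("fav food", [1.0, 0.1], existing)
merged_by_substring = normalize_key("most favorite food", [0.6, 0.8], existing)
kept_separate = normalize_key("dog name", [0.5, 0.85], existing)
```

Under the assumed ratio, "favorite food" inside "most favorite food" scores 13/18 ≈ 0.72 and merges, while a bare "name" inside "cat name" scores 4/8 = 0.5 and stays separate.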

10 Test Cases in Detail

MC01: Hobby Change — Hiking → Pottery

Goal: Basic value replacement works correctly

Step	Input	LTM	MTM
MC01a	"My hobby is hiking"	—	`hobby: hiking` (access=0)
MC01b	"I switched to pottery"	—	`hobby: pottery` (access=1)
MC01 check	"What's my hobby?"	`hobby: pottery` ← promoted	—

Result: ✅ "Pottery" — upsert in MTM, then normal promotion to LTM

MC02: Job Change — Teacher → Programmer

Goal: Clear factual replacement

Step	Input	LTM	MTM
MC02a	"I'm a teacher"	hobby: pottery	`job: teacher` (access=0)
MC02b	"I quit, now a programmer"	hobby: pottery	`job: programmer` (access=1)
MC02 check	"What's my job?"	hobby, job: programmer ← promoted	—

Result: ✅ "Programmer" — previous value completely replaced

MC03: Food Preference 3x Change — Pizza → Sushi → Doenjang-jjigae

Goal: Only latest value survives after 3 changes

Step	Input	LTM	MTM
MC03a	"I love pizza"	hobby, job	`favorite food: pizza` (access=0)
MC03b	"Changed to sushi"	hobby, job	`favorite food: sushi` (access=1)
MC03c	"Actually doenjang-jjigae"	hobby, job	`favorite food: doenjang-jjigae` (access=2)
MC03 check	"Favorite food?"	hobby, job, food: doenjang-jjigae ← promoted	—

Result: ✅ "Doenjang-jjigae" — access_count ≥ 2 triggers auto-promotion, latest value only
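A rough sketch of threshold-based promotion. Only the `access_count ≥ 2` rule comes from the post; the table layouts, the `promote_ready_facts()` helper, and deleting the MTM row after promotion are assumptions.

```python
import sqlite3

PROMOTE_AT = 2  # from the post: access_count >= 2 triggers promotion

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mtm (key TEXT PRIMARY KEY, value TEXT, access_count INTEGER DEFAULT 0);
CREATE TABLE ltm (key TEXT PRIMARY KEY, value TEXT);
""")

def promote_ready_facts(conn):
    """Move MTM rows at or above the threshold into LTM, latest value wins."""
    with conn:  # commit on success, roll back on error
        ready = conn.execute(
            "SELECT key, value FROM mtm WHERE access_count >= ?", (PROMOTE_AT,)
        ).fetchall()
        conn.executemany(
            "INSERT INTO ltm (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            ready,
        )
        conn.execute("DELETE FROM mtm WHERE access_count >= ?", (PROMOTE_AT,))
    return [key for key, _ in ready]

# An older LTM value plus two MTM rows, one of them ready for promotion.
conn.execute("INSERT INTO ltm VALUES ('favorite food', 'pizza')")
conn.execute("INSERT INTO mtm VALUES ('favorite food', 'doenjang-jjigae', 2)")
conn.execute("INSERT INTO mtm VALUES ('address', 'Seoul Gangnam', 0)")

promoted = promote_ready_facts(conn)
ltm_rows = conn.execute("SELECT key, value FROM ltm").fetchall()
mtm_rows = conn.execute("SELECT key, value, access_count FROM mtm").fetchall()
```

Because the LTM write is also an upsert, an older LTM value under the same key is replaced rather than duplicated, which is why key normalization has to happen before this step.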

MC04: Natural Correction — "Yoga was just a phase, now it's boxing"

Goal: Correction without explicit "update" command

Step	Input	LTM	MTM
MC04a	"I do yoga"	job, hobby: yoga, food	—
MC04b	"Yoga was brief, now boxing"	job, hobby: boxing, food	—
MC04 check	"What exercise?"	job, hobby: boxing, food	—

Result: ✅ "Boxing" — direct LTM update (existing hobby key: yoga → boxing)

MC05: Address Change — Seoul → Busan

Goal: Specific location data replacement

Step	Input	LTM	MTM
MC05a	"Live in Seoul Gangnam"	job, hobby, food	`address: Seoul Gangnam` (access=0)
MC05b	"Moved to Busan Haeundae"	job, hobby, food	`address: Busan Haeundae` (access=1)
MC05 check	"Where do I live?"	+address: Busan Haeundae ← promoted	—

Result: ✅ "Busan Haeundae" — previous value completely removed

MC06: Pet Addition — Cat Nabi + Dog Choco (additive, not replacement)

Goal: When adding rather than replacing, both entries must survive. This was the key test: all-minilm failed here because cat's name and dog's name had a cosine similarity of 1.0.

Step	Input	LTM	MTM
MC06a	"I have a cat named Nabi"	job, hobby, food, address	`cat name: Nabi` (access=0)
MC06b	"Also adopted dog Choco"	job, hobby, food, address	`cat name: Nabi` (access=1), `dog name: Choco` (access=0)
MC06 check	"Pet names?"	+cat: Nabi, dog: Choco ← both promoted	—

Result: ✅ "Cat Nabi, Dog Choco" — BGE-M3 correctly separated cat name ≠ dog name (cosine 0.87 < threshold 0.9)

Previous failure: all-minilm scored the two keys at cosine 1.0, so dog name: Choco overwrote cat name: Nabi.

MC07: Blood Type Correction — A → AB

Goal: Explicit correction of wrong information

Step	Input	LTM	MTM
MC07a	"Blood type A"	+`blood type: A`	—
MC07b	"No wait, it's AB"	`blood type: A` → `AB`	—
MC07 check	"Blood type?"	`blood type: AB`	—

Result: ✅ "AB" — forget + remember combo for direct LTM correction

MC08: Preference Reversal — Loves coffee → Quit coffee

Goal: Not just value change but 180° direction reversal

Step	Input	LTM	MTM
MC08a	"Love coffee, drink daily"	…7 items	`coffee: daily` (access=0)
MC08b	"Quit coffee, only tea now"	…7 items	`coffee: quit`, `tea: drinks`, `caffeine: none`
MC08 check	"Do I like coffee?"	…+`coffee: likes`,`tea: drinks`	—

Result: ✅ "Coffee quit, no caffeine. Drinks tea."

MC09: Cross-Session Correction — Summer → Autumn

Goal: Can correct memories from a previous session

Step	Input	LTM	MTM
MC09a	"Love summer"	…existing	`favorite season: summer` (access=0)
MC09b	"Actually prefer autumn now"	…existing	`favorite season: autumn` (access=2)
MC09 check	"Favorite season?"	…+`season: autumn`	—

Result: ✅ "Autumn" — recall → forget(summer) → remember(autumn) pattern

MC10: Multi-Attribute Change — Color red→blue, Number 7→13

Goal: Multiple attributes changed at once without missing any

Step	Input	LTM	MTM
MC10a	"Color red, number 7"	…existing	`color: red`, `number: 7`
MC10b	"Color blue, number 13"	…existing	`color: blue`, `number: 13`
MC10 check	"Color and number?"	…existing	`color: blue`, `number: 13`

Result: ✅ "Blue + 13" — both attributes correctly replaced

Final Scoreboard

#	Scenario	Before (all-minilm)	After (Hybrid)
MC01	Hobby: hiking → pottery	❌ (stale data)	✅
MC02	Job: teacher → programmer	✅	✅
MC03	Food: 3x change	✅	✅
MC04	Exercise: natural correction	✅	✅
MC05	Address: moved	✅	✅
MC06	Pets: cat + dog (additive)	❌ (key collision)	✅
MC07	Blood type: A → AB	✅	✅
MC08	Coffee: preference reversal	✅	✅
MC09	Season: cross-session	✅	✅
MC10	Multi-attribute	❌ (season conflict)	✅
Total		7/10	10/10
