LLM gpt-oss:20b · Embedding bge-m3 (1024-dim) · SQLite · 3-Tier Memory (STM/MTM/LTM)
Origin
@signalstack commented on the previous post about key collisions, and reviewing that feedback led us to several critical bugs (thanks to them). Three issues surfaced in the memory system:
- Deadlock — Server hangs permanently during memory save operations
- Key mismatch — "favorite food" and "most favorite food" stored as separate entries, leaving stale values
- Key collision — Saving "dog name: Choco" overwrites "cat name: Nabi" because the embedding model can't distinguish them
This post documents how we diagnosed these issues, what experiments we ran, and how we fixed them.
In a 3-tier memory system, when a user says "My hobby is hiking" then later "I switched to pottery" — what happens?
Facts first enter MTM (mid-term memory). After repeated access, they promote to LTM (long-term memory). The problem: during promotion, old values can conflict with new ones. ON CONFLICT(key) DO UPDATE should handle this, but when the LLM extracts the same concept under different keys, the upsert misses entirely.
1st: "favorite_food|pizza|MID" → saved to MTM
2nd: "most_favorite_food|sushi|MID" → different key → saved as separate entry!
We designed 10 scenarios to verify this, capturing LTM/MTM database state after every test.
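The upsert miss described above can be reproduced with a few lines of `sqlite3`. This is a minimal sketch: the table name `mtm`, its columns, and the `upsert` helper are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical MTM table keyed by `key`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mtm (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def upsert(key, value):
    """Insert a fact, or overwrite the value if the exact key already exists."""
    conn.execute(
        "INSERT INTO mtm (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


upsert("favorite_food", "pizza")
upsert("most_favorite_food", "sushi")  # different key, so no conflict fires

rows = conn.execute("SELECT key, value FROM mtm ORDER BY key").fetchall()
for row in rows:
    print(row)  # both rows are present; the pizza entry is now stale
```

`ON CONFLICT(key)` only fires on an exact key match, so any normalization has to happen before the statement runs.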
What Broke First
1. Deadlock: Server hangs forever
The remember_mtm() → remember() call chain acquired the same threading.Lock() twice on one thread. Lock is not reentrant, so the second acquire blocked forever → permanent deadlock. Fixed: Lock() → RLock().
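The difference between the two lock types can be shown without hanging the process. In this sketch the `MemoryStore` class and the `can_reacquire` helper are hypothetical stand-ins; only the method names `remember_mtm` and `remember` come from the post.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread holding `lock` can acquire it a second time."""
    lock.acquire()
    try:
        got = lock.acquire(blocking=False)  # non-blocking, so Lock() cannot hang here
        if got:
            lock.release()
        return got
    finally:
        lock.release()


class MemoryStore:
    """Hypothetical store whose outer method calls an inner one under the same lock."""

    def __init__(self):
        self._lock = threading.RLock()  # reentrant: the owning thread may re-enter
        self.saved = []

    def remember(self, key, value):
        with self._lock:
            self.saved.append((key, value))

    def remember_mtm(self, key, value):
        with self._lock:               # first acquisition
            self.remember(key, value)  # second acquisition on the same thread


print(can_reacquire(threading.Lock()))   # False: a plain Lock would block here
print(can_reacquire(threading.RLock()))  # True

store = MemoryStore()
store.remember_mtm("hobby", "hiking")
print(store.saved)
```

With `threading.Lock()` in `__init__`, the call to `remember_mtm` would never return.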
2. Key Inconsistency: Stale food kept appearing
The LLM extracted 좋아하는 음식 (favorite food) and 가장 좋아하는 음식 (most favorite food) as different keys → ON CONFLICT(key) missed → pizza never became doenjang-jjigae.
Embedding Model Shootout
To normalize keys, we need to answer: "Are these two keys the same concept?" We tested 4 models:
| Key Pair | all-minilm | nomic-embed | BGE-M3 | Qwen3-Embed |
|---|---|---|---|---|
| favorite food ↔ most favorite food | 0.95 | 0.96 | 0.92 | 0.92 |
| name ↔ cat's name | 0.89 | 0.96 | 0.69 | 0.63 |
| hobby ↔ blood type | 1.00 ❌ | 1.00 ❌ | 0.40 ✅ | 0.49 |
| cat's name ↔ dog's name | 1.00 ❌ | 1.00 ❌ | 0.87 ✅ | 0.91 ❌ |
all-minilm and nomic-embed couldn't distinguish short Korean words at all (hobby vs blood type = 1.0). Only BGE-M3 separated them correctly.
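To make the comparison concrete, the scores from the table can be checked against a 0.9 merge threshold (the value the final approach uses). The sketch assumes that only the first pair is truly the same concept; that labeling is our reading of the table, and `wrong_decisions` is a hypothetical helper.

```python
THRESHOLD = 0.9

# Key pairs in the same order as the table rows.
PAIRS = [
    "favorite food / most favorite food",
    "name / cat's name",
    "hobby / blood type",
    "cat's name / dog's name",
]

# Cosine similarities copied from the table.
SCORES = {
    "all-minilm": [0.95, 0.89, 1.00, 1.00],
    "nomic-embed": [0.96, 0.96, 1.00, 1.00],
    "BGE-M3": [0.92, 0.69, 0.40, 0.87],
    "Qwen3-Embed": [0.92, 0.63, 0.49, 0.91],
}

# Assumption: only the first pair should be merged into one key.
SHOULD_MERGE = [True, False, False, False]


def wrong_decisions(scores, threshold=THRESHOLD):
    """List the pairs where 'score >= threshold' disagrees with the expected label."""
    return [
        pair
        for pair, score, want in zip(PAIRS, scores, SHOULD_MERGE)
        if (score >= threshold) != want
    ]


for model, scores in SCORES.items():
    print(model, wrong_decisions(scores) or "all pairs decided correctly")
```

Under that labeling, BGE-M3 is the only model with no wrong merge decisions at 0.9; Qwen3-Embed misses only the cat/dog pair.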
We also switched from embedding key: value to key-only embedding. Since keyword matching already covers value search in recall, key-only embedding is more efficient — and enables zero-cost key normalization from DB.
Final Approach
BGE-M3 (threshold 0.9) + substring fallback hybrid
Step 1: cosine_similarity(new_key, existing_key) ≥ 0.9 → match
Step 2: substring containment + length ratio ≥ 60% → match
Reuses key-only embeddings stored in DB — only 1 extra Ollama call per new key.
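The two steps above could be sketched as follows. This is not the project's code: `normalize_key` and `substring_match` are hypothetical names, the vectors are passed in rather than fetched from Ollama, and "length ratio" is assumed to mean shorter key length divided by longer key length.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def substring_match(new_key, existing_key, min_ratio=0.6):
    """Step 2: one key contains the other and their lengths are close enough."""
    shorter, longer = sorted((new_key, existing_key), key=len)
    if not shorter or shorter not in longer:
        return False
    return len(shorter) / len(longer) >= min_ratio


def normalize_key(new_key, new_vec, existing, cos_threshold=0.9, min_ratio=0.6):
    """Return the stored key that `new_key` should map to, or `new_key` itself.

    `existing` maps stored keys to their key-only embedding vectors.
    """
    # Step 1: best cosine match at or above the threshold.
    best_key, best_sim = None, -1.0
    for key, vec in existing.items():
        sim = cosine_similarity(new_vec, vec)
        if sim > best_sim:
            best_key, best_sim = key, sim
    if best_key is not None and best_sim >= cos_threshold:
        return best_key
    # Step 2: substring containment with a length-ratio guard.
    for key in existing:
        if substring_match(new_key, key, min_ratio):
            return key
    return new_key


# Toy 2-d vectors stand in for real 1024-dim embeddings.
stored = {"favorite food": [1.0, 0.0], "cat name": [0.0, 1.0]}
print(normalize_key("most favorite food", [0.7, 0.7], stored))  # substring fallback
print(normalize_key("dog name", [0.6, 0.8], stored))            # stays a new key
```

The length-ratio guard is what keeps a short key such as "name" (4 characters) from being folded into "cat's name" (10 characters), since 4/10 is below 60%.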
10 Test Cases in Detail
MC01: Hobby Change — Hiking → Pottery
Goal: Basic value replacement works correctly
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC01a | "My hobby is hiking" | — | hobby: hiking (access=0) |
| MC01b | "I switched to pottery" | — | hobby: pottery (access=1) |
| MC01 check | "What's my hobby?" | hobby: pottery ← promoted | — |
Result: ✅ "Pottery" — upsert in MTM, then normal promotion to LTM
MC02: Job Change — Teacher → Programmer
Goal: Clear factual replacement
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC02a | "I'm a teacher" | hobby: pottery | job: teacher (access=0) |
| MC02b | "I quit, now a programmer" | hobby: pottery | job: programmer (access=1) |
| MC02 check | "What's my job?" | hobby, job: programmer ← promoted | — |
Result: ✅ "Programmer" — previous value completely replaced
MC03: Food Preference 3x Change — Pizza → Sushi → Doenjang-jjigae
Goal: Only latest value survives after 3 changes
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC03a | "I love pizza" | hobby, job | favorite food: pizza (access=0) |
| MC03b | "Changed to sushi" | hobby, job | favorite food: sushi (access=1) |
| MC03c | "Actually doenjang-jjigae" | hobby, job | favorite food: doenjang-jjigae (access=2) |
| MC03 check | "Favorite food?" | hobby, job, food: doenjang-jjigae ← promoted | — |
Result: ✅ "Doenjang-jjigae" — access_count ≥ 2 triggers auto-promotion, latest value only
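A rough sketch of the MTM → LTM promotion rule is shown below. The post only states that `access_count ≥ 2` triggers promotion; when exactly the counter is incremented is an assumption here (every overwrite of an existing MTM key and every recall counts as one access), and the `TieredMemory` class with in-memory dicts is a hypothetical stand-in for the SQLite tables.

```python
PROMOTE_THRESHOLD = 2  # from the post: access_count >= 2 triggers promotion


class TieredMemory:
    """Toy two-tier store: dicts instead of the real SQLite tables."""

    def __init__(self):
        self.mtm = {}  # key -> {"value": str, "access": int}
        self.ltm = {}  # key -> value

    def remember(self, key, value):
        if key in self.ltm:
            # Key already promoted: update LTM directly.
            self.ltm[key] = value
            return
        entry = self.mtm.get(key)
        if entry is None:
            self.mtm[key] = {"value": value, "access": 0}
        else:
            entry["value"] = value
            entry["access"] += 1  # assumption: an overwrite counts as an access

    def recall(self, key):
        if key in self.ltm:
            return self.ltm[key]
        entry = self.mtm.get(key)
        if entry is None:
            return None
        entry["access"] += 1  # assumption: a recall counts as an access
        if entry["access"] >= PROMOTE_THRESHOLD:
            # Move the latest value to LTM and drop the MTM row.
            self.ltm[key] = self.mtm.pop(key)["value"]
            return self.ltm[key]
        return entry["value"]


memory = TieredMemory()
for food in ("pizza", "sushi", "doenjang-jjigae"):
    memory.remember("favorite food", food)
print(memory.recall("favorite food"))
print(memory.ltm, memory.mtm)
```

Because promotion moves a single row whose value has already been overwritten in place, only the latest value reaches LTM.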
MC04: Natural Correction — "Yoga was just a phase, now it's boxing"
Goal: Correction without explicit "update" command
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC04a | "I do yoga" | job, hobby: yoga, food | — |
| MC04b | "Yoga was brief, now boxing" | job, hobby: boxing, food | — |
| MC04 check | "What exercise?" | job, hobby: boxing, food | — |
Result: ✅ "Boxing" — direct LTM update (existing hobby key: yoga → boxing)
MC05: Address Change — Seoul → Busan
Goal: Specific location data replacement
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC05a | "Live in Seoul Gangnam" | job, hobby, food | address: Seoul Gangnam (access=0) |
| MC05b | "Moved to Busan Haeundae" | job, hobby, food | address: Busan Haeundae (access=1) |
| MC05 check | "Where do I live?" | +address: Busan Haeundae ← promoted | — |
Result: ✅ "Busan Haeundae" — previous value completely removed
MC06: Pet Addition — Cat Nabi + Dog Choco (additive, not replacement)
Goal: When adding, not replacing, both entries must survive. This was the key test: all-minilm failed here because cat's name and dog's name had a cosine similarity of 1.0.
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC06a | "I have a cat named Nabi" | job, hobby, food, address | cat name: Nabi (access=0) |
| MC06b | "Also adopted dog Choco" | job, hobby, food, address | cat name: Nabi (access=1), dog name: Choco (access=0) |
| MC06 check | "Pet names?" | +cat: Nabi, dog: Choco ← both promoted | — |
Result: ✅ "Cat Nabi, Dog Choco" — BGE-M3 correctly separated cat name ≠ dog name (cosine 0.87 < threshold 0.9)
Previous failure: all-minilm gave cosine 1.0 for both keys, causing name: Choco to overwrite name: Nabi.
MC07: Blood Type Correction — A → AB
Goal: Explicit correction of wrong information
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC07a | "Blood type A" | +blood type: A | — |
| MC07b | "No wait, it's AB" | blood type: A → AB | — |
| MC07 check | "Blood type?" | blood type: AB | — |
Result: ✅ "AB" — forget + remember combo for direct LTM correction
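The forget + remember combination might look like the sketch below. The function names `forget`, `remember`, and `recall` mirror the wording in the post, but their signatures, the `ltm` table layout, and the `correct` wrapper are assumptions. Wrapping both steps in one transaction is a design choice of this sketch, so a failed insert cannot leave the key deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ltm (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def remember(key, value):
    conn.execute("INSERT INTO ltm (key, value) VALUES (?, ?)", (key, value))


def forget(key):
    conn.execute("DELETE FROM ltm WHERE key = ?", (key,))


def recall(key):
    row = conn.execute("SELECT value FROM ltm WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def correct(key, new_value):
    """Replace a stored fact: delete the old row, then insert the new one."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        forget(key)
        remember(key, new_value)


with conn:
    remember("blood type", "A")

correct("blood type", "AB")
print(recall("blood type"))
```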
MC08: Preference Reversal — Loves coffee → Quit coffee
Goal: Not just value change but 180° direction reversal
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC08a | "Love coffee, drink daily" | …7 items | coffee: daily (access=0) |
| MC08b | "Quit coffee, only tea now" | …7 items | coffee: quit, tea: drinks, caffeine: none |
| MC08 check | "Do I like coffee?" | …+coffee: likes, tea: drinks | — |
Result: ✅ "Coffee quit, no caffeine. Drinks tea."
MC09: Cross-Session Correction — Summer → Autumn
Goal: Can correct memories from a previous session
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC09a | "Love summer" | …existing | favorite season: summer (access=0) |
| MC09b | "Actually prefer autumn now" | …existing | favorite season: autumn (access=2) |
| MC09 check | "Favorite season?" | …+season: autumn | — |
Result: ✅ "Autumn" — recall → forget(summer) → remember(autumn) pattern
MC10: Multi-Attribute Change — Color red→blue, Number 7→13
Goal: Multiple attributes changed at once without missing any
| Step | Input | LTM | MTM |
|---|---|---|---|
| MC10a | "Color red, number 7" | …existing | color: red, number: 7 |
| MC10b | "Color blue, number 13" | …existing | color: blue, number: 13 |
| MC10 check | "Color and number?" | …existing | color: blue, number: 13 |
Result: ✅ "Blue + 13" — both attributes correctly replaced
Final Scoreboard
| # | Scenario | Before (all-minilm) | After (Hybrid) |
|---|---|---|---|
| MC01 | Hobby: hiking → pottery | ❌ (stale data) | ✅ |
| MC02 | Job: teacher → programmer | ✅ | ✅ |
| MC03 | Food: 3x change | ✅ | ✅ |
| MC04 | Exercise: natural correction | ✅ | ✅ |
| MC05 | Address: moved | ✅ | ✅ |
| MC06 | Pets: cat + dog (additive) | ❌ (key collision) | ✅ |
| MC07 | Blood type: A → AB | ✅ | ✅ |
| MC08 | Coffee: preference reversal | ✅ | ✅ |
| MC09 | Season: cross-session | ✅ | ✅ |
| MC10 | Multi-attribute | ❌ (season conflict) | ✅ |
| Total | | 7/10 | 10/10 |