Why Your AI Agent's Memory Is Broken (And How to Fix It With SQLite)
Every developer who has built a "memory-enabled" chatbot knows the drill: chunk the conversation, generate embeddings, shove everything into Qdrant or Pinecone, fetch Top-K by cosine similarity. Done, right?
Wrong. And by the time your agent serves its 500th conversation, you'll understand exactly why.
The Problem: Classic RAG Destroys Long-Lived Agents
Here's a concrete failure case I ran into while building a persistent local AI agent:
A user said: "I prefer Python." A week later: "I'm writing in Rust now." Another week: "What language should I use for a CLI tool?"
The agent fetched both facts from the vector store with nearly identical cosine scores and delivered an answer that mixed `click` with `clap` and `argparse` with `structopt`: Python and Rust argument parsers mashed into one incoherent recommendation.
Vector databases don't know about time. They don't know that one fact supersedes another. They don't forget — ever. And that's a fundamental architectural mismatch for agents that need to maintain a coherent model of a person over weeks and months.
The real problems compound fast:
- Context pollution — after a month, thousands of fragments compete for attention with equal weight
- No conflict resolution — user worked at Company A, now works at Company B, agent sees both with equal confidence
- No provenance — where did this fact come from? When? Can we trust it?
- Zero forgetting — irrelevant info from 6 months ago competes with critical facts from yesterday
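To make the failure concrete, here is a toy sketch of time-blind Top-K retrieval. The hand-made 3-dimensional vectors stand in for real embeddings; the point is only that two contradictory facts score almost identically, and nothing marks the older one as stale:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Two stored facts; the second supersedes the first, but a flat vector
# store keeps no notion of that -- only vectors and payloads.
memory = [
    {"text": "User prefers Python", "vec": [0.9, 0.1, 0.3]},
    {"text": "User is writing in Rust now", "vec": [0.85, 0.2, 0.35]},
]

query_vec = [0.88, 0.15, 0.32]  # "what language for a CLI tool?"
ranked = sorted(memory, key=lambda m: cosine(query_vec, m["vec"]),
                reverse=True)
for m in ranked:
    print(round(cosine(query_vec, m["vec"]), 3), m["text"])
# Both facts come back with near-identical scores; the agent has no
# signal that the Python preference is three weeks out of date.
```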
In 2026, with agents handling increasingly long-horizon tasks, this isn't a minor UX issue. It's a fundamental reliability problem.
Why It Matters More Than Ever
AI agents in 2026 aren't just answering questions — they're managing codebases, scheduling tasks, maintaining context across multi-day projects. The gap between "a chatbot that remembers" and "an agent with genuine working memory" is enormous.
If your agent can't resolve contradictions, can't weight recent information higher, can't gracefully forget stale data — it will gaslight users at scale. That's not a product; it's a liability.
The good news: you don't need a distributed vector database or a specialized graph DB service. You need SQLite and a solid architecture.
The Solution: Graph Cognitive Memory on a Single SQLite File
Instead of a flat vector store, implement a typed graph with four distinct node types, five edge types, full-text search, vector search, and the Ebbinghaus forgetting curve — all in one .db file.
The Node Schema
```sql
CREATE TABLE nodes (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL CHECK(type IN (
        'episodic','semantic','procedural','opinion')),
    content TEXT NOT NULL,
    embedding BLOB,
    event_time INTEGER NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_until INTEGER,           -- NULL = still valid
    confidence REAL NOT NULL DEFAULT 1.0,
    access_count INTEGER NOT NULL DEFAULT 0,
    last_accessed INTEGER,
    decay_rate REAL NOT NULL DEFAULT 0.1,
    source_type TEXT,              -- conversation | consolidation | tool
    session_id TEXT
);
```
Four node types with different lifecycles:
| Type | Created by | Purpose | Forgetting |
|---|---|---|---|
| `episodic` | Every message (hot path) | Raw dialogue, immutable audit trail | Never |
| `semantic` | Background LLM consolidator | Facts: "User prefers dark mode" | Ebbinghaus decay |
| `procedural` | Consolidator | Action patterns | Ebbinghaus decay |
| `opinion` | Consolidator | Subjective ratings | Ebbinghaus decay |
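As a sketch of how these types interact on the write side, a consolidator might persist one semantic fact and link it back to its source episodes with `derived_from` edges. Table definitions here are trimmed to the columns used; the distilling LLM call itself is omitted:

```python
import sqlite3
import time
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT NOT NULL,
                    content TEXT NOT NULL, event_time INTEGER NOT NULL,
                    valid_from INTEGER NOT NULL, valid_until INTEGER,
                    source_type TEXT);
CREATE TABLE edges (id TEXT PRIMARY KEY, source_id TEXT, target_id TEXT,
                    relation_type TEXT NOT NULL, valid_from INTEGER NOT NULL);
""")

now = int(time.time())
episode_ids = []
for text in ["switch my editor theme to dark", "yes, keep it dark"]:
    eid = str(uuid.uuid4())
    episode_ids.append(eid)
    # Hot path: raw dialogue stored as immutable episodic nodes
    db.execute("INSERT INTO nodes VALUES (?,?,?,?,?,NULL,?)",
               (eid, "episodic", text, now, now, "conversation"))

# Background consolidator distills one semantic fact from the episodes
fact_id = str(uuid.uuid4())
db.execute("INSERT INTO nodes VALUES (?,?,?,?,?,NULL,?)",
           (fact_id, "semantic", "User prefers dark mode",
            now, now, "consolidation"))
for eid in episode_ids:
    # Provenance: the fact points back at the dialogue it came from
    db.execute("INSERT INTO edges VALUES (?,?,?,?,?)",
               (str(uuid.uuid4()), fact_id, eid, "derived_from", now))
db.commit()
```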
The Edge Schema
```sql
CREATE TABLE edges (
    id TEXT PRIMARY KEY,
    source_id TEXT REFERENCES nodes(id) ON DELETE CASCADE,
    target_id TEXT REFERENCES nodes(id) ON DELETE CASCADE,
    relation_type TEXT NOT NULL CHECK(relation_type IN (
        'temporal','causal','entity','derived_from','supersedes')),
    confidence REAL NOT NULL DEFAULT 1.0,
    valid_from INTEGER NOT NULL,
    valid_until INTEGER
);
```
The supersedes edge is the key to conflict resolution. When user moves from Company A to Company B, the new fact creates a supersedes edge pointing to the old one — the old fact gets soft-deleted, history is preserved.
Hybrid Search: FTS5 + Vector + Graph
```sql
-- Full-text search index (external content, zero duplication)
CREATE VIRTUAL TABLE nodes_fts USING fts5(
    content, content=nodes, content_rowid=rowid,
    tokenize='unicode61'
);

-- Vector search (sqlite-vec extension)
CREATE VIRTUAL TABLE vec_nodes USING vec0(
    node_id TEXT PRIMARY KEY, embedding float[256]
);
```
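One caveat with external-content FTS5: the index does not track changes to `nodes` by itself, so inserts, updates, and deletes must be mirrored into `nodes_fts`, typically with triggers. A minimal sketch, with the schema trimmed to two columns:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, content TEXT NOT NULL);
CREATE VIRTUAL TABLE nodes_fts USING fts5(
    content, content=nodes, content_rowid=rowid, tokenize='unicode61');

-- Keep the FTS index in sync with the content table
CREATE TRIGGER nodes_ai AFTER INSERT ON nodes BEGIN
    INSERT INTO nodes_fts(rowid, content) VALUES (new.rowid, new.content);
END;
CREATE TRIGGER nodes_ad AFTER DELETE ON nodes BEGIN
    INSERT INTO nodes_fts(nodes_fts, rowid, content)
        VALUES ('delete', old.rowid, old.content);
END;
CREATE TRIGGER nodes_au AFTER UPDATE ON nodes BEGIN
    INSERT INTO nodes_fts(nodes_fts, rowid, content)
        VALUES ('delete', old.rowid, old.content);
    INSERT INTO nodes_fts(rowid, content) VALUES (new.rowid, new.content);
END;
""")

db.execute("INSERT INTO nodes VALUES ('n1', 'User prefers dark mode')")
hits = db.execute(
    "SELECT rowid FROM nodes_fts WHERE nodes_fts MATCH 'dark'").fetchall()
```

The special `'delete'` command in the first column is how external-content FTS5 tables remove a row from the index without storing the text twice.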
At query time, combine all three with Reciprocal Rank Fusion:
```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, node_id in enumerate(results):
            scores[node_id] = scores.get(node_id, 0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: merge FTS5, vector KNN, and graph traversal results
combined = reciprocal_rank_fusion([fts_results, vector_results, graph_results])
```
FTS5 catches keyword matches. Vectors catch semantic similarity. Graph traversal follows entity relationships. RRF merges them without tuning weights.
The Write Path: 50ms Hot Path, No LLM
The critical principle: LLM on write, algorithms on read. The hot path (user sends message → response) must never block on an LLM call.
```python
async def _on_user_message(self, data: dict) -> None:
    text = data.get("text", "").strip()
    session_id = data.get("session_id")

    # Detect session change → kick off consolidation of old session
    if session_id != self._current_session_id:
        if self._current_session_id:
            asyncio.create_task(
                self._consolidate_session(self._current_session_id)
            )
        self._current_session_id = session_id

    # Create episodic node — fire and forget
    node_id = str(uuid.uuid4())
    self._write_queue.put_nowait(WriteOp(
        sql=INSERT_NODE_SQL,
        params={"id": node_id, "type": "episodic", "content": text, ...},
        future=None  # No await needed
    ))

    # Link to previous episode with temporal edge
    if prev_id := await self._storage.get_last_episode_id(session_id):
        self._write_queue.put_nowait(WriteOp(
            sql=INSERT_EDGE_SQL,
            params={"source_id": prev_id, "target_id": node_id,
                    "relation_type": "temporal", ...},
            future=None
        ))

    # Generate embedding in background — don't block UI
    if self._embed_fn:
        asyncio.create_task(self._slow_path(node_id, text))
```
All writes go through a single-writer queue — the only way to avoid SQLite's "database is locked" under concurrent async workloads. A dedicated writer coroutine drains the queue sequentially with WAL mode enabled. Reads use a separate connection with PRAGMA query_only=ON.
Forgetting: The Ebbinghaus Curve
Without forgetting, context bloats. Old irrelevant facts compete with fresh ones. The solution: implement the Ebbinghaus forgetting curve.
```python
import math
import time

def now() -> float:
    return time.time()  # epoch seconds, same unit as last_accessed

class DecayService:
    THRESHOLD = 0.05  # prune below this confidence

    async def apply(self, storage) -> dict:
        nodes = await storage.get_decayable_nodes()
        updates, to_prune = [], []
        for node in nodes:
            days_since = (now() - node["last_accessed"]) / 86400.0
            # Sub-exponential decay — mirrors human forgetting rate
            new_confidence = node["confidence"] * math.exp(
                -node["decay_rate"] * (max(0, days_since) ** 0.8)
            )
            if new_confidence < self.THRESHOLD:
                to_prune.append(node["id"])
            else:
                updates.append((node["id"], new_confidence))
        await storage.batch_update_confidence(updates)
        await storage.soft_delete_nodes(to_prune)  # soft delete, never physical
        return {"decayed": len(updates), "pruned": len(to_prune)}
```
The 0.8 exponent makes decay sub-exponential — slower than pure exponential, matching how human memory actually fades. Episodic nodes (raw dialogue) never decay — they're the immutable audit trail. Access reinforcement is built in: every retrieval increments access_count and resets last_accessed, refreshing the confidence.
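That reinforcement step amounts to a simple "touch" on every retrieved node. A minimal sketch, with the schema trimmed to the relevant columns and `touch_nodes` as an illustrative helper rather than the article's actual API:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE nodes (
    id TEXT PRIMARY KEY, content TEXT,
    access_count INTEGER NOT NULL DEFAULT 0,
    last_accessed INTEGER)""")
db.execute("INSERT INTO nodes (id, content) "
           "VALUES ('n1', 'User prefers dark mode')")

def touch_nodes(db: sqlite3.Connection, node_ids: list[str]) -> None:
    # Called after every retrieval: bump access_count and reset
    # last_accessed so the decay clock restarts from now
    ts = int(time.time())
    db.executemany(
        "UPDATE nodes SET access_count = access_count + 1, "
        "last_accessed = ? WHERE id = ?",
        [(ts, nid) for nid in node_ids])
    db.commit()

touch_nodes(db, ["n1"])
row = db.execute("SELECT access_count, last_accessed FROM nodes "
                 "WHERE id = 'n1'").fetchone()
```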
Conflict Resolution in Practice
Background consolidation runs on session end or nightly cron. A cheap LLM (gpt-5-mini or local Ollama) extracts facts from raw episodes, detects conflicts via hybrid search, and resolves them:
```python
async def resolve_conflict(old_node_id: str, new_node_id: str):
    # Lower confidence and accelerate decay on old fact
    await storage.update_node_fields(old_node_id, {
        "confidence": 0.3,
        "decay_rate": 0.5  # will be pruned within days
    })
    # Soft-delete — sets valid_until = now()
    await storage.soft_delete_node(old_node_id)
    # Evolutionary edge — history preserved
    await storage.insert_edge({
        "source_id": new_node_id,
        "target_id": old_node_id,
        "relation_type": "supersedes",
        "confidence": 0.95
    })
```
Result: old fact is invisible to queries (WHERE valid_until IS NULL), new fact is authoritative, full evolution history is queryable if needed.
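Both read-side guarantees reduce to plain SQL. A sketch with the schema trimmed to the relevant columns, assuming the Company A fact was already soft-deleted and linked by a `supersedes` edge:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, content TEXT, valid_until INTEGER);
CREATE TABLE edges (source_id TEXT, target_id TEXT, relation_type TEXT);
""")
now = int(time.time())
# Old fact: soft-deleted (valid_until set); new fact: still open
db.execute("INSERT INTO nodes VALUES ('old', 'Works at Company A', ?)", (now,))
db.execute("INSERT INTO nodes VALUES ('new', 'Works at Company B', NULL)")
db.execute("INSERT INTO edges VALUES ('new', 'old', 'supersedes')")

# Normal queries see only the current fact
current = db.execute(
    "SELECT content FROM nodes WHERE valid_until IS NULL").fetchall()

# History stays one join away via the supersedes chain
history = db.execute("""
    SELECT n.content FROM edges e
    JOIN nodes n ON n.id = e.target_id
    WHERE e.source_id = 'new' AND e.relation_type = 'supersedes'
""").fetchall()
```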
Key Takeaways
Vector databases alone are not memory — they're search indexes. For agents, you need temporal ordering, conflict resolution, and forgetting.
SQLite is enough — FTS5 + sqlite-vec + WAL mode gives you full-text, vector, and graph search in a single file. No infrastructure overhead.
Keep LLM off the hot path — 50ms episodic write with async queue, background consolidation with cheap models (costs less than a cent per session).
Implement forgetting — the Ebbinghaus curve is a 10-line function that prevents context bloat and keeps the agent grounded in current reality.
Type your edges — `temporal`, `causal`, `supersedes`, `derived_from`, and `entity` edges carry structural information no embedding can express.
The full implementation (code + SQL schema) is on GitHub: github.com/VitalyOborin/yodoca
Built something similar? Check out gerus-lab.com