Gerus Lab

Posted on • Originally published at gerus-lab.com

Why Your AI Agent's Memory Is Broken (And How to Fix It With SQLite)

Every developer who has built a "memory-enabled" chatbot knows the drill: chunk the conversation, generate embeddings, shove everything into Qdrant or Pinecone, fetch Top-K by cosine similarity. Done, right?

Wrong. And by the time your agent serves its 500th conversation, you'll understand exactly why.


The Problem: Classic RAG Destroys Long-Lived Agents

Here's a concrete failure case I ran into while building a persistent local AI agent:

A user said: "I prefer Python." A week later: "I'm writing in Rust now." Another week: "What language should I use for a CLI tool?"

The agent fetched both facts from the vector store with nearly identical cosine scores and delivered an answer that mixed click with clap and argparse with structopt: an incoherent Python/Rust hybrid nobody asked for.

Vector databases don't know about time. They don't know that one fact supersedes another. They don't forget — ever. And that's a fundamental architectural mismatch for agents that need to maintain a coherent model of a person over weeks and months.

The real problems compound fast:

  • Context pollution — after a month, thousands of fragments compete for attention with equal weight
  • No conflict resolution — user worked at Company A, now works at Company B, agent sees both with equal confidence
  • No provenance — where did this fact come from? When? Can we trust it?
  • Zero forgetting — irrelevant info from 6 months ago competes with critical facts from yesterday

In 2026, with agents handling increasingly long-horizon tasks, this isn't a minor UX issue. It's a fundamental reliability problem.


Why It Matters More Than Ever

AI agents in 2026 aren't just answering questions — they're managing codebases, scheduling tasks, maintaining context across multi-day projects. The gap between "a chatbot that remembers" and "an agent with genuine working memory" is enormous.

If your agent can't resolve contradictions, can't weight recent information higher, can't gracefully forget stale data — it will gaslight users at scale. That's not a product; it's a liability.

The good news: you don't need a distributed vector database or a specialized graph DB service. You need SQLite and a solid architecture.


The Solution: Graph Cognitive Memory on a Single SQLite File

Instead of a flat vector store, implement a typed graph with four distinct node types, five edge types, full-text search, vector search, and the Ebbinghaus forgetting curve — all in one .db file.
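Before the schemas, the setup. A minimal sketch of opening the store (the function name is mine; the `sqlite_vec.load()` call follows that package's documented Python loader, and it's guarded because the extension is optional):

```python
import sqlite3

def open_memory_db(path: str = "memory.db") -> sqlite3.Connection:
    """Open the single-file memory store with sane defaults."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer
    conn.execute("PRAGMA foreign_keys=ON")    # enforce edges -> nodes references
    try:
        # sqlite-vec ships a loader in its Python package; optional here
        import sqlite_vec
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
    except Exception:
        pass  # no vector extension: FTS5 and graph queries still work
    return conn
```

FTS5 is compiled into the standard SQLite build, so only the vector leg needs an extension.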

The Node Schema

CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL CHECK(type IN (
                    'episodic','semantic','procedural','opinion')),
    content     TEXT NOT NULL,
    embedding   BLOB,

    event_time  INTEGER NOT NULL,
    valid_from  INTEGER NOT NULL,
    valid_until INTEGER,          -- NULL = still valid

    confidence  REAL NOT NULL DEFAULT 1.0,
    access_count INTEGER NOT NULL DEFAULT 0,
    last_accessed INTEGER,
    decay_rate  REAL NOT NULL DEFAULT 0.1,

    source_type TEXT,             -- conversation | consolidation | tool
    session_id  TEXT
);

Four node types with different lifecycles:

| Type | Created by | Purpose | Forgetting |
|---|---|---|---|
| episodic | Every message (hot path) | Raw dialogue: immutable audit trail | Never |
| semantic | Background LLM consolidator | Facts: "User prefers dark mode" | Ebbinghaus decay |
| procedural | Consolidator | Action patterns | Ebbinghaus decay |
| opinion | Consolidator | Subjective ratings | Ebbinghaus decay |

The Edge Schema

CREATE TABLE edges (
    id           TEXT PRIMARY KEY,
    source_id    TEXT REFERENCES nodes(id) ON DELETE CASCADE,
    target_id    TEXT REFERENCES nodes(id) ON DELETE CASCADE,
    relation_type TEXT NOT NULL CHECK(relation_type IN (
                    'temporal','causal','entity','derived_from','supersedes')),
    confidence   REAL NOT NULL DEFAULT 1.0,
    valid_from   INTEGER NOT NULL,
    valid_until  INTEGER
);

The supersedes edge is the key to conflict resolution. When the user moves from Company A to Company B, the new fact gets a supersedes edge pointing at the old one; the old fact is soft-deleted and its history preserved.

Hybrid Search: FTS5 + Vector + Graph

-- Full-text search index (external content, zero duplication)
CREATE VIRTUAL TABLE nodes_fts USING fts5(
    content, content=nodes, content_rowid=rowid,
    tokenize='unicode61'
);

-- Vector search (sqlite-vec extension)
CREATE VIRTUAL TABLE vec_nodes USING vec0(
    node_id TEXT PRIMARY KEY, embedding float[256]
);

At query time, combine all three with Reciprocal Rank Fusion:

def reciprocal_rank_fusion(result_lists: list[list[str]], k=60) -> list[str]:
    scores = {}
    for results in result_lists:
        for rank, node_id in enumerate(results):
            scores[node_id] = scores.get(node_id, 0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: merge FTS5, vector KNN, and graph traversal results
combined = reciprocal_rank_fusion([fts_results, vector_results, graph_results])

FTS5 catches keyword matches. Vectors catch semantic similarity. Graph traversal follows entity relationships. RRF merges them without tuning weights.


The Write Path: 50ms Hot Path, No LLM

The critical principle: LLM on write, algorithms on read. The hot path (user sends message → response) must never block on an LLM call.

async def _on_user_message(self, data: dict) -> None:
    text = data.get("text", "").strip()
    session_id = data.get("session_id")

    # Detect session change → kick off consolidation of old session
    if session_id != self._current_session_id:
        if self._current_session_id:
            asyncio.create_task(
                self._consolidate_session(self._current_session_id)
            )
        self._current_session_id = session_id

    # Create episodic node — fire and forget
    node_id = str(uuid.uuid4())
    self._write_queue.put_nowait(WriteOp(
        sql=INSERT_NODE_SQL,
        params={"id": node_id, "type": "episodic", "content": text, ...},
        future=None  # No await needed
    ))

    # Link to previous episode with temporal edge
    if prev_id := await self._storage.get_last_episode_id(session_id):
        self._write_queue.put_nowait(WriteOp(
            sql=INSERT_EDGE_SQL,
            params={"source_id": prev_id, "target_id": node_id,
                    "relation_type": "temporal", ...},
            future=None
        ))

    # Generate embedding in background — don't block UI
    if self._embed_fn:
        asyncio.create_task(self._slow_path(node_id, text))

All writes go through a single-writer queue — the only way to avoid SQLite's "database is locked" under concurrent async workloads. A dedicated writer coroutine drains the queue sequentially with WAL mode enabled. Reads use a separate connection with PRAGMA query_only=ON.


Forgetting: The Ebbinghaus Curve

Without forgetting, context bloats. Old irrelevant facts compete with fresh ones. The solution: implement the Ebbinghaus forgetting curve.

import math
import time

def now() -> float:
    # Epoch seconds; DecayService assumes last_accessed uses the same unit
    return time.time()

class DecayService:
    THRESHOLD = 0.05  # prune below this confidence

    async def apply(self, storage) -> dict:
        nodes = await storage.get_decayable_nodes()
        updates, to_prune = [], []

        for node in nodes:
            days_since = (now() - node["last_accessed"]) / 86400.0
            # Sub-exponential decay — mirrors human forgetting rate
            new_confidence = node["confidence"] * math.exp(
                -node["decay_rate"] * (max(0, days_since) ** 0.8)
            )

            if new_confidence < self.THRESHOLD:
                to_prune.append(node["id"])
            else:
                updates.append((node["id"], new_confidence))

        await storage.batch_update_confidence(updates)
        await storage.soft_delete_nodes(to_prune)  # soft delete, never physical

        return {"decayed": len(updates), "pruned": len(to_prune)}

The 0.8 exponent makes decay sub-exponential — slower than pure exponential, matching how human memory actually fades. Episodic nodes (raw dialogue) never decay — they're the immutable audit trail. Access reinforcement is built in: every retrieval increments access_count and resets last_accessed, refreshing the confidence.


Conflict Resolution in Practice

Background consolidation runs at session end or via a nightly cron. A cheap LLM (gpt-5-mini or a local Ollama model) extracts facts from raw episodes, detects conflicts via hybrid search, and resolves them:

async def resolve_conflict(old_node_id: str, new_node_id: str):
    # Lower confidence and accelerate decay on old fact
    await storage.update_node_fields(old_node_id, {
        "confidence": 0.3,
        "decay_rate": 0.5   # will be pruned within days
    })

    # Soft-delete — sets valid_until = now()
    await storage.soft_delete_node(old_node_id)

    # Evolutionary edge — history preserved
    await storage.insert_edge({
        "source_id": new_node_id,
        "target_id": old_node_id,
        "relation_type": "supersedes",
        "confidence": 0.95
    })

Result: old fact is invisible to queries (WHERE valid_until IS NULL), new fact is authoritative, full evolution history is queryable if needed.


Key Takeaways

  1. Vector databases alone are not memory — they're search indexes. For agents, you need temporal ordering, conflict resolution, and forgetting.

  2. SQLite is enough — FTS5 + sqlite-vec + WAL mode gives you full-text, vector, and graph search in a single file. No infrastructure overhead.

  3. Keep LLM off the hot path — 50ms episodic write with async queue, background consolidation with cheap models (costs less than a cent per session).

  4. Implement forgetting — the Ebbinghaus curve is a 10-line function that prevents context bloat and keeps the agent grounded in current reality.

  5. Type your edges: temporal, causal, supersedes, derived_from, and entity edges carry structural information no embedding can express.

The full implementation (code + SQL schema) is on GitHub: github.com/VitalyOborin/yodoca


Built something similar? Check out gerus-lab.com
