TL;DR — I upgraded PocketPaw’s memory from a line-based file store to a hybrid semantic memory: (1) vector-backed retrieval for relevance-aware recall, and (2) a small knowledge graph for entity/relationship tracking. The result: the agent remembers semantically, connects concepts across sessions, and surfaces context proactively.
Link: https://github.com/pocketpaw/pocketpaw/issues/455
Why change the memory system?
PocketPaw's original memory (memory/file_store.py) stores facts as plain text lines. That model is simple and reliable, but it fails at several important agent behaviors:
- Lexical-only recall: a fact is only found if the user repeats its exact keywords.
- No relationships: facts stay isolated (e.g., "Project X uses React" doesn't connect to "React 19 has breaking changes").
- Poor cross-session continuity: facts are hard to prioritize/retrieve semantically across long histories.
- No decay, timestamps, or metadata to shape relevance over time.
These limitations make the agent brittle for longer-term, proactive assistance. To fix that, we implemented two complementary features: vector-backed retrieval and a lightweight knowledge graph.
Design goals
- Semantically meaningful retrieval (no more exact keyword matching).
- Cross-session continuity and contextual prioritization.
- Local-first: keep data on-device unless the user opts in.
- Modular/backfill-friendly: fall back to file store and allow gradual migration.
- Low operational overhead: support ChromaDB, Qdrant, or SQLite-vec as vector backends.
High-level architecture
Ingest pipeline (on writes)
- Normalize fact text, add metadata (timestamp, source, session_id).
- Produce an embedding using a configured embedding model.
- Store vector + metadata in a vector store (Chroma, Qdrant, or SQLite-vec).
- Run an entity/relationship extractor LLM to identify entities and relations; upsert into a lightweight graph (NetworkX or SQLite adjacency tables).
Retrieval pipeline (on agent turn)
- Query vector store for top-K relevant documents to the current prompt (RAG).
- Use graph queries to expand context (e.g., related entities, recent facts).
- Compose memory context to include top results + related graph edges; feed to the LLM.
Management
- Memory pruning, TTL/decay heuristics, and manual UI for inspection & edits.
Diagram (simplified):
User ↔ Agent → Writes → [Embeddings → Vector Store] + [LLM extraction → Graph DB]
Agent ← Retrieval ← Vector Store + Graph DB ← Context
Phase 1 — Vector-backed memory (what I implemented first)
Goals: make memories retrievable by semantic similarity while preserving the file-store fallback.
Key components:
- Embedding model: nomic-embed-text (local model via Ollama or similar).
- Vector stores: ChromaDB, Qdrant, SQLite-vec (selectable via config).
- Minimal metadata: id, text, created_at, session_id, source, tags, vector_id.
Configuration (env):
```
POCKETPAW_MEMORY_BACKEND=vector   # file | vector | mem0
POCKETPAW_EMBEDDING_MODEL=nomic-embed-text
POCKETPAW_VECTOR_STORE=chromadb   # chromadb | qdrant | sqlite-vec
```
Write flow (pseudo):
```python
def save_memory(text, source, session_id, tags=None):
    metadata = {
        "text": text,
        "source": source,
        "session_id": session_id,
        "created_at": now_iso(),
        "tags": tags or [],
    }
    embedding = embed_text(text)  # calls the configured embedding model
    vector_store.upsert(id=uuid4(), vector=embedding, metadata=metadata)
    # Graph extraction step runs here (Phase 2)
```
Retrieve flow (pseudo):
```python
def retrieve_memories(query, top_k=8):
    q_emb = embed_text(query)
    hits = vector_store.query(vector=q_emb, top_k=top_k, filter=None)
    return [hit.metadata for hit in hits]
```
Fallback: If POCKETPAW_MEMORY_BACKEND != "vector" or vector store unreachable, continue using file_store.py.
- Embeddings generated via Ollama (nomic-embed-text) and stored as JSON blobs in SQLite. Retrieval is pure cosine similarity in Python — load candidate vectors, compute dot products, rank by score. Candidate fetch is bounded at min(max(limit * 10, 200), 2000) to prevent unbounded memory allocation as the store grows.
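The bounded, pure-Python ranking described above can be sketched roughly as follows. Only the bound formula comes from the implementation; function and variable names here are illustrative:

```python
import math

def candidate_bound(limit: int) -> int:
    # min(max(limit * 10, 200), 2000) keeps the candidate fetch bounded
    # as the store grows.
    return min(max(limit * 10, 200), 2000)

def cosine(a, b):
    # Plain cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, candidates, limit=8):
    # candidates: (doc_id, vector) pairs already fetched from SQLite,
    # at most candidate_bound(limit) of them.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:limit]
```

Exact cosine over a bounded candidate set is a deliberate trade: no ANN index to maintain, at the cost of capping recall quality on very large stores.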
Implementation notes:
- I wrapped vector-store clients behind a simple abstraction (VectorStore) with upsert/query/delete methods so new stores can be added easily.
- Embeddings loaded lazily and cached so we don't re-init heavy processes unless needed.
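The VectorStore abstraction could look roughly like this. The method names (upsert/query/delete) come from the post; the signatures and the in-memory reference backend are assumptions for illustration:

```python
from typing import Any, Protocol

class VectorStore(Protocol):
    """Minimal interface each backend (Chroma, Qdrant, SQLite-vec) adapts to."""
    def upsert(self, id: str, vector, metadata) -> None: ...
    def query(self, vector, top_k: int = 8, filter=None) -> list: ...
    def delete(self, id: str) -> None: ...

class InMemoryVectorStore:
    """Tiny reference backend, handy for tests; returns metadata dicts directly."""
    def __init__(self):
        self._rows = {}

    def upsert(self, id, vector, metadata):
        self._rows[id] = (vector, metadata)

    def query(self, vector, top_k=8, filter=None):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        hits = sorted(self._rows.values(), key=lambda r: dot(vector, r[0]), reverse=True)
        return [meta for _, meta in hits[:top_k]]

    def delete(self, id):
        self._rows.pop(id, None)
```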
Phase 2 — Knowledge graph (what I added next)
Why a graph? Vectors give relatedness but lack explicit relationships and typed entities. The graph stores facts like "Project X — uses → React" and allows queries such as "What projects use Python?" or "What is related to deployment issue #42?"
Graph choices:
- In-memory/ephemeral: NetworkX for development and small-scale setups.
- Persistent: SQLite with adjacency tables for production/local persistence without needing a separate DB server.
Graph schema (SQLite, simplified):
- nodes: id, label, type, metadata (json)
- edges: id, source_node_id, target_node_id, relation, metadata (json)
- node_refs: vector_id, node_id (to link vector docs to graph nodes)
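Rendered as SQLite DDL, the simplified schema might look like this; column types, constraints, and defaults are assumptions, not the project's exact schema:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    type TEXT NOT NULL,
    metadata TEXT DEFAULT '{}'          -- JSON blob
);
CREATE TABLE IF NOT EXISTS edges (
    id INTEGER PRIMARY KEY,
    source_node_id INTEGER NOT NULL REFERENCES nodes(id),
    target_node_id INTEGER NOT NULL REFERENCES nodes(id),
    relation TEXT NOT NULL,
    metadata TEXT DEFAULT '{}'
);
CREATE TABLE IF NOT EXISTS node_refs (
    vector_id TEXT NOT NULL,            -- links vector docs to graph nodes
    node_id INTEGER NOT NULL REFERENCES nodes(id)
);
"""

def open_graph(path=":memory:"):
    # executescript runs the whole multi-statement DDL in one call.
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```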
Entity/relationship extraction:
- Use a light LLM extraction prompt that asks for JSON: entities (name, type, span), relations (source, relation, target), confidence.
- We run this asynchronously (non-blocking) to avoid delaying writes; low-latency systems can await it.
Example extraction prompt:
- Input: conversation snippet or saved fact
- Output (JSON):
```json
{
  "entities": [
    {"name": "Project X", "type": "project"},
    {"name": "React", "type": "library"}
  ],
  "relations": [
    {"source": "Project X", "relation": "uses", "target": "React"}
  ]
}
```
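The non-blocking extraction step can be sketched with asyncio. Here `extract_fn` stands in for the real LLM call and `upsert_fn` for the graph upsert; both names are assumptions for illustration:

```python
import asyncio
import json

async def extract_and_upsert(fact_text, extract_fn, upsert_fn):
    # extract_fn returns the JSON payload shown above.
    raw = await extract_fn(fact_text)
    data = json.loads(raw)
    for rel in data.get("relations", []):
        upsert_fn(rel["source"], rel["relation"], rel["target"])

def schedule_extraction(fact_text, extract_fn, upsert_fn):
    # Fire-and-forget: the write path returns immediately while the
    # extraction runs in the background on the running event loop.
    return asyncio.create_task(extract_and_upsert(fact_text, extract_fn, upsert_fn))
```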
Graph upsert:
- Node deduplication by name + type + fuzzy matching.
- Edge upsert with relation type and provenance metadata (source fact id).
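Node deduplication by name + type with fuzzy matching could look like this minimal sketch; the similarity threshold of 0.85 is an assumption:

```python
from difflib import SequenceMatcher

def find_existing(name, type_, nodes, threshold=0.85):
    """nodes: (node_id, name, type) tuples already in the graph.
    Returns the id of a fuzzy match with the same type, or None."""
    for node_id, existing_name, existing_type in nodes:
        if existing_type != type_:
            continue  # never merge across entity types
        ratio = SequenceMatcher(None, name.lower(), existing_name.lower()).ratio()
        if ratio >= threshold:
            return node_id
    return None
```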
Using the graph at retrieval:
- After vector hits are found, expand the result set by traversing 1–2 hops in the graph for high-confidence relations.
- Optionally prioritize recent nodes (timestamp decay) or nodes connected to high-importance tags.
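A sketch of that bounded 1–2 hop expansion, using a plain adjacency map standing in for the NetworkX/SQLite graph; the 0.7 confidence cutoff is an assumption:

```python
from collections import deque

def expand(graph, seeds, max_hops=2, min_confidence=0.7):
    """graph: {node: [(neighbor, relation, confidence), ...]}.
    Breadth-first expansion of the vector hits, keeping only
    high-confidence relations within max_hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    facts = []
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for neighbor, relation, confidence in graph.get(node, []):
            if confidence < min_confidence:
                continue  # drop low-confidence extractions
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return facts
```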
Entities and relationships are persisted in knowledge_graph.sqlite3 across 6 tables, with PRAGMA user_version used for schema versioning and migrations. A key engineering constraint was SQLite's 999-variable limit — the initial DELETE ... WHERE doc_id NOT IN (...) cleanup query would crash on stores with 1000+ memories. This was replaced with a temp-table approach: valid IDs are batch-inserted (500 per batch) into a temporary table, and the delete joins against it instead of using an unbounded IN clause.
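The temp-table cleanup can be sketched as follows; the `memories` table and `doc_id` column here are placeholders for the real schema, and the batch size of 500 comes from the post:

```python
import sqlite3

def prune_orphans(conn, valid_ids, batch_size=500):
    # Batch-insert valid IDs into a temp table; each batch stays far
    # below SQLite's 999 bound-variable limit.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS valid_ids (doc_id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM valid_ids")
    rows = [(d,) for d in valid_ids]
    for i in range(0, len(rows), batch_size):
        conn.executemany(
            "INSERT OR IGNORE INTO valid_ids (doc_id) VALUES (?)",
            rows[i : i + batch_size],
        )
    # Join against the temp table instead of an unbounded NOT IN (?, ?, ...).
    conn.execute("DELETE FROM memories WHERE doc_id NOT IN (SELECT doc_id FROM valid_ids)")
    conn.execute("DROP TABLE valid_ids")
    conn.commit()
```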
Putting it together — RAG + Graph hybrid
When the agent is about to produce an answer:
- Generate a query from the current user message + short system prompt.
- Retrieve top-K vector memories.
- Expand context with graph neighbors of the top nodes (configurable depth).
- Compose a memory context block containing:
- Short summaries of top vector memories
- Notable graph facts (entity relations)
- Timestamps + provenance
- Feed the context to the LLM with the user prompt for generation.
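The composition step above might be sketched like this; the field names mirror the Phase 1 metadata schema, while the exact block format is an assumption:

```python
def compose_memory_context(vector_hits, graph_facts, max_items=8):
    """vector_hits: metadata dicts from the vector store;
    graph_facts: (source, relation, target) triples from graph expansion."""
    lines = ["[Memory context]"]
    for hit in vector_hits[:max_items]:
        # Include timestamps and provenance so the LLM can cite them.
        lines.append(f"- {hit['text']} (saved {hit['created_at']}, source: {hit['source']})")
    for source, relation, target in graph_facts:
        lines.append(f"- {source} -[{relation}]-> {target}")
    return "\n".join(lines)
```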
This hybrid approach gives semantic relevance and structured connections.
Example: A user conversation
User: "Remind me why Project X is blocked."
Retrieval:
- Vector search returns: "Project X blocked due to API refactor" (saved 2026-02-10)
- Graph shows: Project X — depends_on → internal-api-v2
- Graph shows: internal-api-v2 — breaking_change_in → Auth module
Agent response (composed using memory context):
"Project X is blocked because it depends on internal-api-v2, which recently introduced breaking changes in the Auth module — see note saved on 2026-02-10. Suggested action: pin internal-api-v2 to last working commit or open a patch for Auth compatibility."
Memory management: pruning, TTL, and UX
- TTL/decay: each memory carries a relevance score that decays with age unless the memory is referenced frequently.
- Manual pruning: an admin UI lets users edit, delete, or merge memories and graph nodes.
- Automatic consolidation: similar memories within a time window can be merged into a single canonical fact (configurable).
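One way to realize the TTL/decay rule above is exponential age decay with a boost for frequently referenced memories; the 30-day half-life and log boost are assumptions, not the shipped heuristic:

```python
import math
import time

def memory_score(created_at_ts, reference_count, now=None, half_life_days=30.0):
    """Score in (0, inf): halves every half_life_days of age,
    but frequent references slow the fade."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - created_at_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)
    boost = 1.0 + math.log1p(reference_count)
    return decay * boost
```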
Scaling note: large vector indexes are best served by approximate nearest neighbor (ANN) backends (Chroma uses HNSW; Qdrant ships its own optimized indexing).
Testing
- I added unit tests for:
- Embedding pipeline (mocked embeddings)
- Vector upsert/query semantics
- Graph upserts and traversal
- The RAG composition pipeline (integration tests, small in-memory stores)
Privacy & local-first constraints
A core design constraint: nothing leaves the user's machine by default.
- Embedding model chosen can run locally (Ollama or other local providers).
- Vector store choices include local SQLite-vec.
- Graph stored locally in SQLite or as JSON files.
- Documented that any cloud-backed vector store requires explicit opt-in.
Lessons learned
- Don’t conflate similarity with semantics — vectors are phenomenal at relatedness but we still need typed edges for deterministic queries.
- Keep the ingestion pipeline idempotent — upserts with deterministic IDs make retry and backfill simple.
- Async extraction helps UX: write latency shouldn't be dominated by a second LLM call for entity extraction.
- Config-driven backends reduce friction for contributors and users — they can pick Chroma, Qdrant, or SQLite-vec.
Closing
Building semantic memory for PocketPaw transformed its behavior: it became better at recall, context stitching, and proactive assistance. The hybrid approach — vectors for fuzzy recall and a graph for explicit relations — balances flexibility with structured reasoning.
If you want the migration script, config examples, or the exact prompts I used for entity extraction, I can paste them here or open a docs PR for the repo.
Acknowledgements: thanks to the PocketPaw contributors and to the issue that seeded this work — https://github.com/pocketpaw/pocketpaw/issues/455
PR - https://github.com/pocketpaw/pocketpaw/pull/707
Author: AMRITESH240304 — Contributor to pocketpaw/pocketpaw