TL;DR — I upgraded PocketPaw’s memory from a line-based file store to a hybrid semantic memory: (1) vector-backed retrieval for relevance-aware recall, and (2) a small knowledge graph for entity/relationship tracking. The result: the agent remembers semantically, connects concepts across sessions, and surfaces context proactively.
Link: https://github.com/pocketpaw/pocketpaw/issues/455
Why change the memory system?
PocketPaw's original memory (memory/file_store.py) stores facts as plain text lines. That model is simple and reliable, but it fails at several important agent behaviors:
- Lexical-only recall: a fact is only found if the user repeats its exact keywords.
- No relationships: facts stay isolated (e.g., "Project X uses React" doesn't connect to "React 19 has breaking changes").
- Poor cross-session continuity: facts are hard to prioritize/retrieve semantically across long histories.
- No decay, timestamps, or metadata to shape relevance over time.
These limitations make the agent brittle for longer-term, proactive assistance. To fix that, we implemented two complementary features: vector-backed retrieval and a lightweight knowledge graph.
Design goals
- Semantically meaningful retrieval (no more exact keyword matching).
- Cross-session continuity and contextual prioritization.
- Local-first: keep data on-device unless the user opts in.
- Modular/backfill-friendly: fall back to file store and allow gradual migration.
- Low operational overhead: support ChromaDB, Qdrant, or SQLite-vec as vector backends.
High-level architecture
Ingest pipeline (on writes)
- Normalize fact text, add metadata (timestamp, source, session_id).
- Produce an embedding using a configured embedding model.
- Store vector + metadata in a vector store (Chroma, Qdrant, or SQLite-vec).
- Run an entity/relationship extractor LLM to identify entities and relations; upsert into a lightweight graph (NetworkX or SQLite adjacency tables).
Retrieval pipeline (on agent turn)
- Query vector store for top-K relevant documents to the current prompt (RAG).
- Use graph queries to expand context (e.g., related entities, recent facts).
- Compose memory context to include top results + related graph edges; feed to the LLM.
Management
- Memory pruning, TTL/decay heuristics, and manual UI for inspection & edits.
Diagram (simplified):
User ↔ Agent → Writes → [Embeddings → Vector Store] + [LLM extraction → Graph DB]
Agent ← Retrieval ← Vector Store + Graph DB ← Context
Phase 1 — Vector-backed memory (what I implemented first)
Goals: make memories retrievable by semantic similarity while preserving the file-store fallback.
Key components:
- Embedding model: nomic-embed-text (local model via Ollama or similar).
- Vector stores: ChromaDB, Qdrant, SQLite-vec (selectable via config).
- Minimal metadata: id, text, created_at, session_id, source, tags, vector_id.
Configuration (env):
```
POCKETPAW_MEMORY_BACKEND=vector   # file | vector | mem0
POCKETPAW_EMBEDDING_MODEL=nomic-embed-text
POCKETPAW_VECTOR_STORE=chromadb   # chromadb | qdrant | sqlite-vec
```
Write flow (pseudo):
```python
def save_memory(text, source, session_id, tags=None):
    metadata = {
        "text": text,
        "source": source,
        "session_id": session_id,
        "created_at": now_iso(),
        "tags": tags or [],
    }
    embedding = embed_text(text)  # calls the configured embedding model
    vector_store.upsert(id=uuid4(), vector=embedding, metadata=metadata)
    # Graph extraction step runs here (Phase 2)
```
Retrieve flow (pseudo):
```python
def retrieve_memories(query, top_k=8):
    q_emb = embed_text(query)
    hits = vector_store.query(vector=q_emb, top_k=top_k, filter=None)
    return [hit.metadata for hit in hits]
```
Fallback: If POCKETPAW_MEMORY_BACKEND != "vector" or vector store unreachable, continue using file_store.py.
- Embeddings generated via Ollama (nomic-embed-text) and stored as JSON blobs in SQLite. Retrieval is pure cosine similarity in Python — load candidate vectors, compute dot products, rank by score. Candidate fetch is bounded at min(max(limit * 10, 200), 2000) to prevent unbounded memory allocation as the store grows.
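The bounded, pure-Python ranking described above can be sketched roughly as follows. Only the bound formula comes from the implementation; function and variable names here are illustrative:

```python
import math

def candidate_bound(limit: int) -> int:
    # min(max(limit * 10, 200), 2000) keeps the candidate fetch bounded
    # as the store grows.
    return min(max(limit * 10, 200), 2000)

def cosine(a, b):
    # Plain cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, candidates, limit=8):
    # candidates: (doc_id, vector) pairs already fetched from SQLite,
    # at most candidate_bound(limit) of them.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:limit]
```

Exact cosine over a bounded candidate set is a deliberate trade: no ANN index to maintain, at the cost of capping recall quality on very large stores.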
Implementation notes:
- I wrapped vector-store clients behind a simple abstraction (VectorStore) with upsert/query/delete methods so new stores can be added easily.
- Embeddings loaded lazily and cached so we don't re-init heavy processes unless needed.
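The VectorStore abstraction could look roughly like this. The method names (upsert/query/delete) come from the post; the signatures and the in-memory reference backend are assumptions for illustration:

```python
from typing import Any, Protocol

class VectorStore(Protocol):
    """Minimal interface each backend (Chroma, Qdrant, SQLite-vec) adapts to."""
    def upsert(self, id: str, vector, metadata) -> None: ...
    def query(self, vector, top_k: int = 8, filter=None) -> list: ...
    def delete(self, id: str) -> None: ...

class InMemoryVectorStore:
    """Tiny reference backend, handy for tests; returns metadata dicts directly."""
    def __init__(self):
        self._rows = {}

    def upsert(self, id, vector, metadata):
        self._rows[id] = (vector, metadata)

    def query(self, vector, top_k=8, filter=None):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        hits = sorted(self._rows.values(), key=lambda r: dot(vector, r[0]), reverse=True)
        return [meta for _, meta in hits[:top_k]]

    def delete(self, id):
        self._rows.pop(id, None)
```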
Phase 2 — Knowledge graph (what I added next)
Why a graph? Vectors give relatedness but lack explicit relationships and typed entities. The graph stores facts like "Project X — uses → React" and allows queries such as "What projects use Python?" or "What is related to deployment issue #42?"
Graph choices:
- In-memory/ephemeral: NetworkX for development and small-scale setups.
- Persistent: SQLite with adjacency tables for production/local persistence without needing a separate DB server.
Graph schema (SQLite, simplified):
- nodes: id, label, type, metadata (json)
- edges: id, source_node_id, target_node_id, relation, metadata (json)
- node_refs: vector_id, node_id (to link vector docs to graph nodes)
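Rendered as SQLite DDL, the simplified schema might look like this; column types, constraints, and defaults are assumptions, not the project's exact schema:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    type TEXT NOT NULL,
    metadata TEXT DEFAULT '{}'          -- JSON blob
);
CREATE TABLE IF NOT EXISTS edges (
    id INTEGER PRIMARY KEY,
    source_node_id INTEGER NOT NULL REFERENCES nodes(id),
    target_node_id INTEGER NOT NULL REFERENCES nodes(id),
    relation TEXT NOT NULL,
    metadata TEXT DEFAULT '{}'
);
CREATE TABLE IF NOT EXISTS node_refs (
    vector_id TEXT NOT NULL,            -- links vector docs to graph nodes
    node_id INTEGER NOT NULL REFERENCES nodes(id)
);
"""

def open_graph(path=":memory:"):
    # executescript runs the whole multi-statement DDL in one call.
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```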
Entity/relationship extraction:
- Use a light LLM extraction prompt that asks for JSON: entities (name, type, span), relations (source, relation, target), confidence.
- We run this asynchronously (non-blocking) to avoid delaying writes; low-latency systems can await it.
Example extraction prompt:
- Input: conversation snippet or saved fact
- Output (JSON):
```json
{
  "entities": [
    {"name": "Project X", "type": "project"},
    {"name": "React", "type": "library"}
  ],
  "relations": [
    {"source": "Project X", "relation": "uses", "target": "React"}
  ]
}
```
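The non-blocking extraction step can be sketched with asyncio. Here `extract_fn` stands in for the real LLM call and `upsert_fn` for the graph upsert; both names are assumptions for illustration:

```python
import asyncio
import json

async def extract_and_upsert(fact_text, extract_fn, upsert_fn):
    # extract_fn returns the JSON payload shown above.
    raw = await extract_fn(fact_text)
    data = json.loads(raw)
    for rel in data.get("relations", []):
        upsert_fn(rel["source"], rel["relation"], rel["target"])

def schedule_extraction(fact_text, extract_fn, upsert_fn):
    # Fire-and-forget: the write path returns immediately while the
    # extraction runs in the background on the running event loop.
    return asyncio.create_task(extract_and_upsert(fact_text, extract_fn, upsert_fn))
```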
Graph upsert:
- Node deduplication by name + type + fuzzy matching.
- Edge upsert with relation type and provenance metadata (source fact id).
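Node deduplication by name + type with fuzzy matching could look like this minimal sketch; the similarity threshold of 0.85 is an assumption:

```python
from difflib import SequenceMatcher

def find_existing(name, type_, nodes, threshold=0.85):
    """nodes: (node_id, name, type) tuples already in the graph.
    Returns the id of a fuzzy match with the same type, or None."""
    for node_id, existing_name, existing_type in nodes:
        if existing_type != type_:
            continue  # never merge across entity types
        ratio = SequenceMatcher(None, name.lower(), existing_name.lower()).ratio()
        if ratio >= threshold:
            return node_id
    return None
```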
Using the graph at retrieval:
- After vector hits are found, expand the result set by traversing 1–2 hops in the graph for high-confidence relations.
- Optionally prioritize recent nodes (timestamp decay) or nodes connected to high-importance tags.
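A sketch of that bounded 1–2 hop expansion, using a plain adjacency map standing in for the NetworkX/SQLite graph; the 0.7 confidence cutoff is an assumption:

```python
from collections import deque

def expand(graph, seeds, max_hops=2, min_confidence=0.7):
    """graph: {node: [(neighbor, relation, confidence), ...]}.
    Breadth-first expansion of the vector hits, keeping only
    high-confidence relations within max_hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    facts = []
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for neighbor, relation, confidence in graph.get(node, []):
            if confidence < min_confidence:
                continue  # drop low-confidence extractions
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return facts
```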
Entities and relationships are persisted in knowledge_graph.sqlite3 across 6 tables, with PRAGMA user_version used for schema versioning and migrations. A key engineering constraint was SQLite's 999-variable limit — the initial DELETE ... WHERE doc_id NOT IN (...) cleanup query would crash on stores with 1000+ memories. This was replaced with a temp-table approach: valid IDs are batch-inserted (500 per batch) into a temporary table, and the delete joins against it instead of using an unbounded IN clause.
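The temp-table cleanup can be sketched as follows; the `memories` table and `doc_id` column here are placeholders for the real schema, and the batch size of 500 comes from the post:

```python
import sqlite3

def prune_orphans(conn, valid_ids, batch_size=500):
    # Batch-insert valid IDs into a temp table; each batch stays far
    # below SQLite's 999 bound-variable limit.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS valid_ids (doc_id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM valid_ids")
    rows = [(d,) for d in valid_ids]
    for i in range(0, len(rows), batch_size):
        conn.executemany(
            "INSERT OR IGNORE INTO valid_ids (doc_id) VALUES (?)",
            rows[i : i + batch_size],
        )
    # Join against the temp table instead of an unbounded NOT IN (?, ?, ...).
    conn.execute("DELETE FROM memories WHERE doc_id NOT IN (SELECT doc_id FROM valid_ids)")
    conn.execute("DROP TABLE valid_ids")
    conn.commit()
```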
Putting it together — RAG + Graph hybrid
When the agent is about to produce an answer:
- Generate a query from the current user message + short system prompt.
- Retrieve top-K vector memories.
- Expand context with graph neighbors of the top nodes (configurable depth).
- Compose a memory context block containing:
- Short summaries of top vector memories
- Notable graph facts (entity relations)
- Timestamps + provenance
- Feed the context to the LLM with the user prompt for generation.
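The composition step above might be sketched like this; the field names mirror the Phase 1 metadata schema, while the exact block format is an assumption:

```python
def compose_memory_context(vector_hits, graph_facts, max_items=8):
    """vector_hits: metadata dicts from the vector store;
    graph_facts: (source, relation, target) triples from graph expansion."""
    lines = ["[Memory context]"]
    for hit in vector_hits[:max_items]:
        # Include timestamps and provenance so the LLM can cite them.
        lines.append(f"- {hit['text']} (saved {hit['created_at']}, source: {hit['source']})")
    for source, relation, target in graph_facts:
        lines.append(f"- {source} -[{relation}]-> {target}")
    return "\n".join(lines)
```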
This hybrid approach gives semantic relevance and structured connections.
Example: A user conversation
User: "Remind me why Project X is blocked."
Retrieval:
- Vector search returns: "Project X blocked due to API refactor" (saved 2026-02-10)
- Graph shows: Project X — depends_on → internal-api-v2
- Graph shows: internal-api-v2 — breaking_change_in → Auth module
Agent response (composed using memory context):
"Project X is blocked because it depends on internal-api-v2, which recently introduced breaking changes in the Auth module — see note saved on 2026-02-10. Suggested action: pin internal-api-v2 to last working commit or open a patch for Auth compatibility."
Memory management: pruning, TTL, and UX
- TTL/decay: each memory carries a relevance score that decays with age unless the memory is referenced frequently.
- Manual pruning: an admin UI lets users edit, delete, or merge memories and graph nodes.
- Automatic consolidation: similar memories within a time window can be merged into a single canonical fact (configurable).
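One way to realize the TTL/decay rule above is exponential age decay with a boost for frequently referenced memories; the 30-day half-life and log boost are assumptions, not the shipped heuristic:

```python
import math
import time

def memory_score(created_at_ts, reference_count, now=None, half_life_days=30.0):
    """Score in (0, inf): halves every half_life_days of age,
    but frequent references slow the fade."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - created_at_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)
    boost = 1.0 + math.log1p(reference_count)
    return decay * boost
```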
Scaling note: large vector indexes are best served by approximate nearest neighbor (ANN) backends (Chroma uses HNSW; Qdrant ships its own optimized indexing).
Testing
- I added unit tests for:
- Embedding pipeline (mocked embeddings)
- Vector upsert/query semantics
- Graph upserts and traversal
- The RAG composition pipeline (integration tests, small in-memory stores)
Privacy & local-first constraints
A core design constraint: nothing leaves the user's machine by default.
- Embedding model chosen can run locally (Ollama or other local providers).
- Vector store choices include local SQLite-vec.
- Graph stored locally in SQLite or as JSON files.
- Documented that any cloud-backed vector store requires explicit opt-in.
Lessons learned
- Don’t conflate similarity with semantics — vectors are phenomenal at relatedness but we still need typed edges for deterministic queries.
- Keep the ingestion pipeline idempotent — upserts with deterministic IDs make retry and backfill simple.
- Async extraction helps UX: write latency shouldn't be dominated by a second LLM call for entity extraction.
- Config-driven backends reduce friction for contributors and users — they can pick Chroma, Qdrant, or SQLite-vec.
Closing
Building semantic memory for PocketPaw transformed its behavior: it became better at recall, context stitching, and proactive assistance. The hybrid approach — vectors for fuzzy recall and a graph for explicit relations — balances flexibility with structured reasoning.
If you want the migration script, config examples, or the exact prompts I used for entity extraction, I can paste them here or open a docs PR for the repo.
Acknowledgements: thanks to the PocketPaw contributors and to the issue that seeded this work — https://github.com/pocketpaw/pocketpaw/issues/455
PR - https://github.com/pocketpaw/pocketpaw/pull/707
Author: AMRITESH240304 — Contributor to pocketpaw/pocketpaw