Amritesh Kumar

From Flat Files to a Living Memory: Building Graph-Based Semantic Memory for PocketPaw

TL;DR — I upgraded PocketPaw’s memory from a line-based file store to a hybrid semantic memory: (1) vector-backed retrieval for relevance-aware recall, and (2) a small knowledge graph for entity/relationship tracking. The result: the agent remembers semantically, connects concepts across sessions, and surfaces context proactively.

Link: https://github.com/pocketpaw/pocketpaw/issues/455


Why change the memory system?

PocketPaw's original memory (memory/file_store.py) stores facts as plain text lines. That model is simple and reliable, but it fails at several important agent behaviors:

  • Lexical-only recall: users must repeat keywords to be found.
  • No relationships: facts stay isolated (e.g., "Project X uses React" doesn't connect to "React 19 has breaking changes").
  • Poor cross-session continuity: facts are hard to prioritize/retrieve semantically across long histories.
  • No decay, timestamps, or metadata to shape relevance over time.

These limitations make the agent brittle for longer-term, proactive assistance. To fix that, we implemented two complementary features: vector-backed retrieval and a lightweight knowledge graph.


Design goals

  • Semantically meaningful retrieval (no more exact keyword matching).
  • Cross-session continuity and contextual prioritization.
  • Local-first: keep data on-device unless the user opts in.
  • Modular/backfill-friendly: fall back to file store and allow gradual migration.
  • Low operational overhead: support ChromaDB, Qdrant, or SQLite-vec as vector backends.

High-level architecture

  1. Ingest pipeline (on writes)

    • Normalize fact text, add metadata (timestamp, source, session_id).
    • Produce an embedding using a configured embedding model.
    • Store vector + metadata in a vector store (Chroma, Qdrant, or SQLite-vec).
    • Run an entity/relationship extractor LLM to identify entities and relations; upsert into a lightweight graph (NetworkX or SQLite adjacency tables).
  2. Retrieval pipeline (on agent turn)

    • Query vector store for top-K relevant documents to the current prompt (RAG).
    • Use graph queries to expand context (e.g., related entities, recent facts).
    • Compose memory context to include top results + related graph edges; feed to the LLM.
  3. Management

    • Memory pruning, TTL/decay heuristics, and manual UI for inspection & edits.

Diagram (simplified):
User ↔ Agent → Writes → [Embeddings → Vector Store] + [LLM extraction → Graph DB]
Agent ← Retrieval ← Vector Store + Graph DB ← Context


Phase 1 — Vector-backed memory (what I implemented first)

Goals: make memories retrievable by semantic similarity while preserving the file-store fallback.

Key components:

  • Embedding model: nomic-embed-text (local model via Ollama or similar).
  • Vector stores: ChromaDB, Qdrant, SQLite-vec (selectable via config).
  • Minimal metadata: id, text, created_at, session_id, source, tags, vector_id.

Configuration (env):

POCKETPAW_MEMORY_BACKEND=vector     # file | vector | mem0
POCKETPAW_EMBEDDING_MODEL=nomic-embed-text
POCKETPAW_VECTOR_STORE=chromadb     # chromadb | qdrant | sqlite-vec

Write flow (pseudo):

import uuid
from datetime import datetime, timezone

def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def save_memory(text, source, session_id, tags=None):
    metadata = {
        "text": text,
        "source": source,
        "session_id": session_id,
        "created_at": now_iso(),
        "tags": tags or [],
    }
    embedding = embed_text(text)  # calls the configured embedding model
    vector_store.upsert(id=str(uuid.uuid4()), vector=embedding, metadata=metadata)
    # Graph extraction step runs here (Phase 2)

Retrieve flow (pseudo):

def retrieve_memories(query, top_k=8):
    q_emb = embed_text(query)  # same model as at write time, so the spaces match
    hits = vector_store.query(vector=q_emb, top_k=top_k, filter=None)
    return [hit.metadata for hit in hits]

Fallback: if POCKETPAW_MEMORY_BACKEND != "vector", or the vector store is unreachable, PocketPaw keeps using file_store.py.
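A minimal sketch of that fallback logic, with hypothetical class names (`VectorMemoryStore`, `FileMemoryStore`) standing in for PocketPaw's actual store wrappers:

```python
import logging
import os

log = logging.getLogger("pocketpaw.memory")

class FileMemoryStore:
    """Stand-in for the line-based file store (memory/file_store.py)."""

class VectorMemoryStore:
    """Stand-in for the vector-backed store; from_env() raises on misconfiguration."""
    @classmethod
    def from_env(cls):
        if os.environ.get("POCKETPAW_VECTOR_STORE") not in {"chromadb", "qdrant", "sqlite-vec"}:
            raise RuntimeError("no vector store configured")
        return cls()

def select_memory_backend():
    """Honor POCKETPAW_MEMORY_BACKEND, falling back to the file store on any failure."""
    if os.environ.get("POCKETPAW_MEMORY_BACKEND") == "vector":
        try:
            return VectorMemoryStore.from_env()
        except Exception as exc:
            log.warning("vector store unavailable (%s); using file store", exc)
    return FileMemoryStore()
```

The try/except keeps a broken vector backend from ever blocking writes; the agent degrades to the old behavior instead of failing.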

  • Embeddings generated via Ollama (nomic-embed-text) and stored as JSON blobs in SQLite. Retrieval is pure cosine similarity in Python — load candidate vectors, compute dot products, rank by score. Candidate fetch is bounded at min(max(limit * 10, 200), 2000) to prevent unbounded memory allocation as the store grows.
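The brute-force ranking above can be sketched as follows. The `memories` table layout here is illustrative (the real schema may differ), but the candidate bound is exactly the one described:

```python
import json
import math
import sqlite3

def candidate_limit(limit: int) -> int:
    # Bound the candidate fetch: min(max(limit * 10, 200), 2000)
    return min(max(limit * 10, 200), 2000)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(conn: sqlite3.Connection, query_vec, limit: int = 8):
    """Load a bounded set of JSON-blob embeddings and rank by cosine similarity."""
    rows = conn.execute(
        "SELECT id, text, embedding FROM memories ORDER BY created_at DESC LIMIT ?",
        (candidate_limit(limit),),
    ).fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), mid, text) for mid, text, emb in rows]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:limit]
```

Pure-Python scoring is fine at this scale; the bound matters because without it, the scan grows linearly with the whole store.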

Implementation notes:

  • I wrapped vector-store clients behind a simple abstraction (VectorStore) with upsert/query/delete methods so new stores can be added easily.
  • Embedding clients are loaded lazily and cached so we don't re-initialize heavy model processes unless needed.
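The `VectorStore` abstraction might look roughly like this (a sketch, not PocketPaw's exact interface), with a toy in-memory adapter of the kind that's handy in tests:

```python
from typing import Any, Optional, Protocol

class VectorStore(Protocol):
    """Minimal surface described above; real adapters wrap Chroma, Qdrant, or SQLite-vec."""
    def upsert(self, id: str, vector: "list[float]", metadata: "dict[str, Any]") -> None: ...
    def query(self, vector: "list[float]", top_k: int, filter: Optional[dict] = None) -> "list[dict]": ...
    def delete(self, id: str) -> None: ...

class InMemoryStore:
    """Toy adapter satisfying VectorStore; ranks by dot product only."""
    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, id, vector, metadata):
        self._rows[id] = (vector, metadata)

    def query(self, vector, top_k, filter=None):
        def score(v):
            return sum(a * b for a, b in zip(vector, v))
        ranked = sorted(self._rows.items(), key=lambda kv: score(kv[1][0]), reverse=True)
        return [meta for _, (_, meta) in ranked[:top_k]]

    def delete(self, id):
        self._rows.pop(id, None)
```

A `Protocol` keeps adapters duck-typed: adding a new backend means implementing three methods, not subclassing anything.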

Phase 2 — Knowledge graph (what I added next)

Why a graph? Vectors give relatedness but lack explicit relationships and typed entities. The graph stores facts like "Project X — uses → React" and allows queries such as "What projects use Python?" or "What is related to deployment issue #42?"

Graph choices:

  • In-memory/ephemeral: NetworkX for development and small-scale setups.
  • Persistent: SQLite with adjacency tables for production/local persistence without needing a separate DB server.

Graph schema (SQLite, simplified):

  • nodes: id, label, type, metadata (json)
  • edges: id, source_node_id, target_node_id, relation, metadata (json)
  • node_refs: vector_id, node_id (to link vector docs to graph nodes)
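The schema above translates to roughly this DDL (column types are my assumption; the real migration may differ):

```python
import sqlite3

GRAPH_SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    type TEXT NOT NULL,
    metadata TEXT                 -- JSON blob
);
CREATE TABLE IF NOT EXISTS edges (
    id INTEGER PRIMARY KEY,
    source_node_id INTEGER NOT NULL REFERENCES nodes(id),
    target_node_id INTEGER NOT NULL REFERENCES nodes(id),
    relation TEXT NOT NULL,
    metadata TEXT                 -- JSON blob (provenance, confidence)
);
CREATE TABLE IF NOT EXISTS node_refs (
    vector_id TEXT NOT NULL,      -- links a vector doc to a graph node
    node_id INTEGER NOT NULL REFERENCES nodes(id)
);
"""

def init_graph(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(GRAPH_SCHEMA)
    return conn
```

Storing metadata as a JSON blob keeps the schema stable while extraction output evolves.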

Entity/relationship extraction:

  • Use a light LLM extraction prompt that asks for JSON: entities (name, type, span), relations (source, relation, target), confidence.
  • We run extraction asynchronously (non-blocking) so writes aren't delayed by a second LLM call; callers that need the graph updated immediately can await it instead.

Example extraction prompt:

  • Input: conversation snippet or saved fact
  • Output (JSON):
{
  "entities": [
    {"name": "Project X", "type": "project"},
    {"name": "React", "type": "library"}
  ],
  "relations": [
    {"source": "Project X", "relation": "uses", "target": "React"}
  ]
}
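LLM output is untrusted, so the extractor's reply should be validated before touching the graph. A defensive parser along these lines (my sketch, not the repo's code) drops malformed entries and relations that reference unknown entities:

```python
import json

def parse_extraction(raw: str):
    """Validate the extractor LLM's JSON; malformed replies yield empty results, not crashes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [], []
    entities = [e for e in data.get("entities", [])
                if isinstance(e, dict) and e.get("name") and e.get("type")]
    names = {e["name"] for e in entities}
    relations = [r for r in data.get("relations", [])
                 if isinstance(r, dict) and r.get("relation")
                 and r.get("source") in names and r.get("target") in names]
    return entities, relations
```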

Graph upsert:

  • Node deduplication by name + type + fuzzy matching.
  • Edge upsert with relation type and provenance metadata (source fact id).
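The dedup-and-upsert logic can be sketched like this, using stdlib `difflib` for the fuzzy match (the real implementation may use a different matcher or threshold); the `nodes`/`edges` tables follow the schema listed earlier:

```python
import difflib
import json

def upsert_node(conn, name, type_, threshold=0.9):
    """Dedup by (name, type): exact label match first, then fuzzy ratio."""
    for node_id, label in conn.execute("SELECT id, label FROM nodes WHERE type = ?", (type_,)):
        if label == name or difflib.SequenceMatcher(None, label.lower(), name.lower()).ratio() >= threshold:
            return node_id
    cur = conn.execute("INSERT INTO nodes (label, type, metadata) VALUES (?, ?, '{}')", (name, type_))
    return cur.lastrowid

def upsert_edge(conn, src_id, dst_id, relation, fact_id):
    """Insert an edge once; provenance (source fact id) rides in metadata."""
    row = conn.execute(
        "SELECT id FROM edges WHERE source_node_id=? AND target_node_id=? AND relation=?",
        (src_id, dst_id, relation),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO edges (source_node_id, target_node_id, relation, metadata) VALUES (?, ?, ?, ?)",
        (src_id, dst_id, relation, json.dumps({"source_fact": fact_id})),
    )
    return cur.lastrowid
```

Idempotent upserts are what make retries and backfills safe, a point that comes back in the lessons below.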

Using the graph at retrieval:

  • After vector hits are found, expand the result set by traversing 1–2 hops in the graph for high-confidence relations.
  • Optionally prioritize recent nodes (timestamp decay) or nodes connected to high-importance tags.
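The hop expansion is a plain breadth-first search. Here is a self-contained sketch over `(source, relation, target, confidence)` tuples; on a real NetworkX graph, `ego_graph` would do the same job:

```python
from collections import deque

def expand_context(edges, seeds, hops=2, min_conf=0.7):
    """BFS `hops` steps out from seed entities, then keep high-confidence facts
    whose endpoints are both reachable."""
    adj = {}
    for src, _rel, dst, _conf in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, []).append(src)  # traverse edges in both directions
    reachable = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in adj.get(node, []):
            if nbr not in reachable:
                reachable.add(nbr)
                frontier.append((nbr, depth + 1))
    return [f"{s} — {r} → {t}" for s, r, t, c in edges
            if c >= min_conf and s in reachable and t in reachable]
```

Keeping `hops` at 1–2 is deliberate: each extra hop widens the context block quickly and dilutes relevance.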

  • Entities and relationships are persisted in knowledge_graph.sqlite3 across 6 tables, with PRAGMA user_version used for schema versioning and migrations. A key engineering constraint was SQLite's 999-variable limit — the initial DELETE ... WHERE doc_id NOT IN (...) cleanup query would crash on stores with 1000+ memories. This was replaced with a temp-table approach: valid IDs are batch-inserted (500 per batch) into a temporary table, and the delete joins against it instead of using an unbounded IN clause.
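The temp-table fix can be sketched as follows (illustrative table and column names; the idea is batching valid IDs in under the variable limit, then deleting via a subquery instead of thousands of bound placeholders):

```python
import sqlite3

BATCH = 500  # stays well under SQLite's default 999 bound-variable limit

def prune_orphan_refs(conn: sqlite3.Connection, valid_doc_ids) -> int:
    """Delete refs whose doc_id is no longer valid, without an unbounded IN (...)."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS valid_ids (doc_id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM valid_ids")
    for i in range(0, len(valid_doc_ids), BATCH):
        chunk = valid_doc_ids[i:i + BATCH]
        conn.executemany("INSERT OR IGNORE INTO valid_ids (doc_id) VALUES (?)",
                         ((d,) for d in chunk))
    cur = conn.execute(
        "DELETE FROM node_refs WHERE doc_id NOT IN (SELECT doc_id FROM valid_ids)"
    )
    conn.execute("DROP TABLE valid_ids")
    return cur.rowcount
```

The subquery against the temp table uses zero bound variables regardless of store size, so the 999-variable ceiling never applies.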


Putting it together — RAG + Graph hybrid

When the agent is about to produce an answer:

  1. Generate a query from the current user message + short system prompt.
  2. Retrieve top-K vector memories.
  3. Expand context with graph neighbors of the top nodes (configurable depth).
  4. Compose a memory context block containing:
    • Short summaries of top vector memories
    • Notable graph facts (entity relations)
    • Timestamps + provenance
  5. Feed the context to the LLM with the user prompt for generation.

This hybrid approach gives semantic relevance and structured connections.
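Step 4 above (composing the memory context block) might look like this; the dict keys match the metadata fields from Phase 1, and the character budget is a placeholder for whatever token accounting the agent actually uses:

```python
def compose_memory_context(vector_hits, graph_facts, max_chars=2000):
    """Render the context block fed to the LLM: vector summaries first,
    then graph facts, each with timestamp and provenance."""
    lines = ["## Memory context"]
    for hit in vector_hits:
        lines.append(
            f"- [{hit.get('created_at', '?')}] {hit['text']}"
            f" (source: {hit.get('source', 'unknown')})"
        )
    if graph_facts:
        lines.append("### Related facts")
        lines.extend(f"- {fact}" for fact in graph_facts)
    block = "\n".join(lines)
    return block[:max_chars]  # crude budget; token-aware truncation would be better
```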


Example: A user conversation

User: "Remind me why Project X is blocked."

Retrieval:

  • Vector search returns: "Project X blocked due to API refactor" (saved 2026-02-10)
  • Graph shows: Project X — depends_on → internal-api-v2
  • Graph shows: internal-api-v2 — breaking_change_in → Auth module

Agent response (composed using memory context):
"Project X is blocked because it depends on internal-api-v2, which recently introduced breaking changes in the Auth module — see note saved on 2026-02-10. Suggested action: pin internal-api-v2 to last working commit or open a patch for Auth compatibility."


Memory management: pruning, TTL, and UX

  • TTL/decay: each memory gets a scoring function that decreases with age unless referenced frequently.
  • Manual pruning: an admin UI lets users edit, delete, or merge memories and graph nodes.
  • Automatic consolidation: similar memories within a time window can be merged into a single canonical fact (configurable).
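One simple form such a scoring function could take (my sketch, assuming exponential age decay and a log-damped frequency boost; the actual half-life and boost shape are tuning choices):

```python
import math
import time

HALF_LIFE_DAYS = 30.0  # assumed half-life; a tunable, not a documented default

def memory_score(created_at_ts: float, access_count: int, now: float = None) -> float:
    """Score decays with age but is boosted by how often the memory is referenced."""
    now = time.time() if now is None else now
    age_days = max(now - created_at_ts, 0.0) / 86400.0
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)      # halves every HALF_LIFE_DAYS
    return decay * (1.0 + math.log1p(access_count))  # frequent access offsets age
```

A brand-new, never-accessed memory scores 1.0; a 60-day-old one scores 0.25 unless references pull it back up.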

Testing

  • Large vector indexes scale via approximate nearest-neighbor (ANN) backends (Chroma uses HNSW; Qdrant ships its own optimized indexing).
  • I added unit tests for:
    • Embedding pipeline (mocked embeddings)
    • Vector upsert/query semantics
    • Graph upserts and traversal
    • The RAG composition pipeline (integration tests, small in-memory stores)

Privacy & local-first constraints

A core design constraint: nothing leaves the user's machine by default.

  • Embedding model chosen can run locally (Ollama or other local providers).
  • Vector store choices include local SQLite-vec.
  • Graph stored locally in SQLite or as JSON files.
  • Documented that any cloud-backed vector store requires explicit opt-in.

Lessons learned

  • Don’t conflate similarity with semantics — vectors are phenomenal at relatedness but we still need typed edges for deterministic queries.
  • Keep the ingestion pipeline idempotent — upserts with deterministic IDs make retry and backfill simple.
  • Async extraction helps UX: write latency shouldn't be dominated by a second LLM call for entity extraction.

  • Config-driven backends reduce friction for contributors and users — they can pick Chroma, Qdrant, or SQLite-vec.

Closing

Building semantic memory for PocketPaw transformed its behavior: it became better at recall, context stitching, and proactive assistance. The hybrid approach — vectors for fuzzy recall and a graph for explicit relations — balances flexibility with structured reasoning.

If you want the migration script, config examples, or the exact prompts I used for entity extraction, I can paste them here or open a docs PR for the repo.

Acknowledgements: thanks to the PocketPaw contributors and to the issue that seeded this work — https://github.com/pocketpaw/pocketpaw/issues/455
PR - https://github.com/pocketpaw/pocketpaw/pull/707


Author: AMRITESH240304 — Contributor to pocketpaw/pocketpaw
