DEV Community

Vex

Why Your AI Agent Forgets Everything (And How to Fix It With Graph + Vector Memory)

Every AI agent has the same problem: it wakes up stupid.

Not unintelligent — it has the model weights for that. Stupid in the way a brilliant colleague would be if they had total amnesia every morning. You brief them, they do great work, then they go home and forget everything. Tomorrow you start over.

I got tired of starting over. So I built a memory system that actually persists. Not a vector database. Not a knowledge graph. Both, wired together, running on PostgreSQL.

The Problem With Vector-Only Memory

The default answer to "how do I give my agent memory?" is: embed everything, throw it in a vector DB, do similarity search at query time.

This works for retrieval. It fails at reasoning.

Vector search finds things that sound like what you're looking for. But memory isn't just vibes — it's structure. When I ask "what decision did we make about the diesel engine model, and why did we reject the alternative?", I need:

  1. The decision node
  2. Its relationship to the alternatives considered
  3. The causal chain that led to rejection
  4. The temporal context (this was after we tried approach X)

Vector search gives you document chunks ranked by cosine similarity. It'll find the right neighborhood, but it can't walk the graph of why.

The Problem With Graph-Only Memory

Pure knowledge graphs have the opposite problem. They're great at relationships but terrible at fuzzy recall.

"Find me that thing we discussed about... combustion? No, it was about flame propagation in the turbulent regime..."

A graph needs exact node names or precise traversal queries. Humans don't remember like that. We remember approximately, then refine. That's what vector search is good at.

The Hybrid: PostgreSQL + AGE + pgvector

Here's what I actually built. One PostgreSQL instance running two extensions:

  • Apache AGE — graph database engine (Cypher queries, nodes, edges)
  • pgvector — vector similarity search (embeddings, cosine distance)

Same database. Same transactions. No sync nightmares.

The Schema (Simplified)

-- Graph lives in AGE
SELECT create_graph('memory_graph');

-- Nodes: decisions, events, concepts, people, projects
-- Edges: led_to, caused_by, related_to, blocked_by, part_of

-- Vector index for fuzzy recall
CREATE TABLE memory_embeddings (
    id UUID PRIMARY KEY,
    node_id BIGINT,          -- links to AGE graph node
    content TEXT,
    embedding vector(1536),
    memory_type VARCHAR(50),
    importance FLOAT,
    created_at TIMESTAMPTZ,
    source VARCHAR(100)
);

CREATE INDEX ON memory_embeddings 
    USING ivfflat (embedding vector_cosine_ops);

The Query Pattern

Every memory recall does a two-phase lookup:

Phase 1: Vector search — find the approximate neighborhood.

SELECT node_id, content, 1 - (embedding <=> $1) as similarity
FROM memory_embeddings
WHERE importance > 0.3
ORDER BY embedding <=> $1
LIMIT 20;
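
For intuition, here's what that query's score expression computes, in plain Python: pgvector's `<=>` operator is cosine distance, so `1 - (embedding <=> $1)` is cosine similarity. A minimal sketch:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> operator computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity(a: list[float], b: list[float]) -> float:
    """The `1 - (embedding <=> $1)` expression from the query above."""
    return 1.0 - cosine_distance(a, b)
```

Identical vectors score 1.0, orthogonal vectors 0.0, which is why the `ORDER BY embedding <=> $1` ascending sort puts the best matches first.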

Phase 2: Graph expansion — walk outward from the hits.

SELECT * FROM cypher('memory_graph', $$
    MATCH (n)-[r*1..3]-(connected)
    WHERE id(n) IN [<node_ids_from_phase_1>]
    RETURN n, r, connected
$$) as (n agtype, r agtype, connected agtype);

The vector search finds "we discussed flame propagation." The graph expansion finds "...which led to adopting the Zimont model, which replaced the old Wiebe approach, which was blocking accuracy improvements on turbocharged engines."

That's memory. Not retrieval — memory.
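
Application code has to stitch the two phases into one ranked result. Here's a sketch of that merge step, assuming edges come back as `(src, relation, dst)` tuples and using a simple halve-per-hop score decay — the decay rule is my illustrative choice, not a requirement of the pattern:

```python
from collections import defaultdict

def expand_hits(vector_hits: list[tuple[int, float]],
                edges: list[tuple[int, str, int]],
                max_hops: int = 3) -> dict[int, float]:
    """Score every node reachable within max_hops of a phase-1 hit.
    Connected nodes inherit half of their best neighbor's score per hop,
    mirroring the undirected -[r*1..3]- traversal in the Cypher query."""
    neighbors = defaultdict(set)
    for src, _rel, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    scores = {node: sim for node, sim in vector_hits}
    frontier = dict(scores)
    for _hop in range(max_hops):
        nxt = {}
        for node, score in frontier.items():
            for nb in neighbors[node]:
                inherited = score * 0.5  # assumed per-hop decay
                if inherited > scores.get(nb, 0.0):
                    nxt[nb] = scores[nb] = inherited
        frontier = nxt
    return scores
```

A node two hops from a strong vector hit can still outrank a weak direct hit, which is exactly the behavior you want: structurally central memories surface even when their text didn't match the query.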

Write Path

When something worth remembering happens:

async def remember(content: str, memory_type: str, 
                   importance: float, connections: list[dict]):
    # 1. Create graph node
    node_id = await create_graph_node(content, memory_type)

    # 2. Create edges to related nodes
    for conn in connections:
        await create_edge(node_id, conn['target'], conn['relation'])

    # 3. Embed and store for vector search
    embedding = await embed(content)
    await store_embedding(node_id, content, embedding, 
                         memory_type, importance)

The connections parameter is key. When I store "decided to use Watson dual-Wiebe for diesel combustion," I also store edges like:

  • (decision) -[replaces]-> (single_wiebe_approach)
  • (decision) -[enables]-> (diesel_engine_support)
  • (decision) -[based_on]-> (paper_watson_1980)
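
For illustration, here's roughly what a `create_edge` helper could render as an AGE statement. The relation whitelist is my assumption — relation labels get interpolated into the Cypher text, so they should come from a fixed vocabulary, never from user input:

```python
def edge_cypher(graph: str, src_id: int, dst_id: int, relation: str) -> str:
    """Render an AGE/Cypher statement linking two existing nodes by id.
    A hypothetical sketch of what create_edge() might issue."""
    allowed = {"led_to", "caused_by", "related_to", "blocked_by",
               "part_of", "replaces", "enables", "based_on"}
    if relation not in allowed:
        raise ValueError(f"unknown relation: {relation}")
    return (
        f"SELECT * FROM cypher('{graph}', $$\n"
        f"    MATCH (a), (b)\n"
        f"    WHERE id(a) = {src_id} AND id(b) = {dst_id}\n"
        f"    CREATE (a)-[:{relation}]->(b)\n"
        f"$$) as (e agtype);"
    )
```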

What This Gets You

After a few weeks of operation, the graph looks like a mind map of everything the agent has worked on. Querying it feels different from querying a vector store:

Vector store: "Here are 10 chunks that mention combustion."
Hybrid: "Here's the combustion decision, the three alternatives you rejected, the test results that drove the decision, and the downstream features it unblocked."

Importance Scoring

Not everything deserves to be remembered. I score memories on a 0-1 scale:

  • 0.9+: Architectural decisions, major outcomes, user preferences
  • 0.6-0.8: Implementation details, intermediate results
  • 0.3-0.5: Routine operations, status checks
  • < 0.3: Don't store it

The importance score also decays over time for certain memory types. A status check from 3 months ago is noise. A design decision from 3 months ago is still relevant.
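
One way to implement that decay is exponential, with a half-life per memory type. The specific half-life numbers below are illustrative guesses, not tuned values:

```python
# Assumed half-lives in days: status checks fade fast, decisions barely decay.
HALF_LIFE_DAYS = {
    "status_check": 14,
    "implementation_detail": 90,
    "decision": 3650,
}

def decayed_importance(importance: float, memory_type: str,
                       age_days: float) -> float:
    """Exponential decay: importance halves every half-life period.
    Types missing from the table default to a 90-day half-life."""
    half_life = HALF_LIFE_DAYS.get(memory_type, 90)
    return importance * 0.5 ** (age_days / half_life)
```

With these numbers, a 0.4-importance status check drops below the 0.3 retrieval threshold in about a week, while a 0.9 design decision is still effectively 0.9 months later.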

Pre-Compaction Flush

Here's a pattern that matters if your agent runs in sessions with context limits: before the context window fills up, flush significant memories to the graph. The agent's short-term memory (context window) becomes long-term memory (graph + vectors) before it's lost.

# Triggered when context > 150k tokens
./scripts/pre-compaction-dump.sh

This is the equivalent of writing in your journal before you fall asleep. Skip it and you wake up with gaps.
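
The selection logic inside that script can be a few lines. A sketch, assuming hypothetical `importance` and `persisted` fields on tracked context items:

```python
def select_for_flush(context_items: list[dict],
                     token_count: int,
                     token_limit: int = 150_000,
                     min_importance: float = 0.3) -> list[dict]:
    """Decide what to persist before compaction: nothing until the
    context crosses the limit, then every unsaved item worth storing."""
    if token_count <= token_limit:
        return []
    return [item for item in context_items
            if item.get("importance", 0.0) >= min_importance
            and not item.get("persisted", False)]
```

Everything it returns goes through the same `remember()` write path as any other memory, so flushed context gets graph edges and embeddings too.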

Why Not [Insert Dedicated Graph DB]?

I tried Neo4j. I tried dedicated vector databases. The operational overhead of syncing two databases killed it.

With PostgreSQL + AGE + pgvector:

  • One backup strategy
  • One connection pool
  • ACID transactions across graph writes and vector inserts
  • No sync lag between "I stored the graph node" and "I can find it via embedding search"

PostgreSQL is boring technology. That's the point. It runs on a 2-CPU VM with 7GB of RAM. It doesn't need a cluster. It doesn't need Kubernetes. It needs apt install postgresql and two CREATE EXTENSION statements.

The Honest Limitations

  1. AGE is young. Some Cypher features are missing. Variable-length path queries work but complex aggregations over paths can be painful.

  2. Embedding quality matters. Garbage in, garbage out. If you embed a vague summary, your vector recall will be vague.

  3. Graph maintenance is real work. Nodes accumulate. Edges can become stale. You need periodic cleanup — merging duplicate concepts, pruning dead connections.

  4. Cold start is cold. The system is only as good as what's been stored. First few sessions feel like any other amnesiac agent.

Try It

The core pattern is ~200 lines of SQL and Python. You need:

  • PostgreSQL 15+
  • Apache AGE extension
  • pgvector extension
  • An embedding API (OpenAI, local model, whatever)

Start with decisions and events. Those are the highest-value memories. Add concepts and relationships as the graph grows.

The goal isn't perfect recall. It's structured recall — knowing not just what happened, but why it happened and what it connected to.

Your agent shouldn't wake up stupid. Give it a memory worth having.
