<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex P</title>
    <description>The latest articles on DEV Community by Alex P (@alextp).</description>
    <link>https://dev.to/alextp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3778673%2F669ce069-7762-4464-a238-a89d692a2ceb.jpg</url>
      <title>DEV Community: Alex P</title>
      <link>https://dev.to/alextp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alextp"/>
    <language>en</language>
    <item>
      <title>I Built a Graph-Based AI Memory. Then Its Brain Turned to Mush.</title>
      <dc:creator>Alex P</dc:creator>
      <pubDate>Tue, 17 Mar 2026 00:29:15 +0000</pubDate>
      <link>https://dev.to/alextp/i-built-a-graph-based-ai-memory-then-its-brain-turned-to-mush-3813</link>
      <guid>https://dev.to/alextp/i-built-a-graph-based-ai-memory-then-its-brain-turned-to-mush-3813</guid>
      <description>&lt;p&gt;A few weeks ago I wrote about &lt;a href="https://dev.to/infinimot/your-ai-doesnt-have-memory-it-has-search-4a5g"&gt;using a graph database for AI memory instead of vector search&lt;/a&gt;. The mug cakes story, the &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; feedback loop, the overnight consolidation cycles. That post was about the architecture that convinced me graph-backed memory works.&lt;/p&gt;

&lt;p&gt;So, in the interest of fairness, this one is about everything I got wrong.&lt;/p&gt;

&lt;p&gt;v1 proved the concept could work! v2 was about finding out that a graph with loose types is just a fancier document store.&lt;/p&gt;

&lt;h2&gt;
  
  
  The RELATES_TO Problem
&lt;/h2&gt;

&lt;p&gt;In v1, everything connected to everything via &lt;code&gt;RELATES_TO&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Cooking &lt;code&gt;RELATES_TO&lt;/code&gt; wife. Sleep &lt;code&gt;RELATES_TO&lt;/code&gt; productivity. Diet &lt;code&gt;RELATES_TO&lt;/code&gt; goals. All technically true. None of it useful.&lt;/p&gt;

&lt;p&gt;The problem didn't show up immediately. With 50 nodes, traversal works fine even with generic edges. You ask "what's related to this goal?" and you get back a handful of things that are, in fact, related. It &lt;em&gt;feels&lt;/em&gt; like it's working.&lt;/p&gt;

&lt;p&gt;Turns out, though, at 300 nodes your graph is a disgusting, chewed-up hairball. Everything is 1-2 hops from everything else. You ask "what supports this goal?" and the answer is: we don't know, because &lt;code&gt;RELATES_TO&lt;/code&gt; doesn't distinguish support from correlation from coincidence.&lt;/p&gt;

&lt;p&gt;So I killed &lt;code&gt;RELATES_TO&lt;/code&gt; entirely and replaced it with 18 typed edges across 7 categories:&lt;/p&gt;

&lt;p&gt;Motivation &amp;amp; Causation: &lt;code&gt;MOTIVATED_BY&lt;/code&gt;, &lt;code&gt;CAUSED_BY&lt;/code&gt;&lt;br&gt;
Structural: &lt;code&gt;ENABLES&lt;/code&gt;, &lt;code&gt;BLOCKS&lt;/code&gt;, &lt;code&gt;REQUIRES&lt;/code&gt;, &lt;code&gt;PART_OF&lt;/code&gt;&lt;br&gt;
Knowledge: &lt;code&gt;SUPPORTS&lt;/code&gt;, &lt;code&gt;CONTRADICTS&lt;/code&gt;, &lt;code&gt;INFLUENCES&lt;/code&gt;&lt;br&gt;
Goal &amp;amp; Commitment: &lt;code&gt;COMMITTED_TO&lt;/code&gt;, &lt;code&gt;FOLLOWED_THROUGH&lt;/code&gt;, &lt;code&gt;ADVANCES&lt;/code&gt;&lt;br&gt;
Organization: &lt;code&gt;TAGGED_WITH&lt;/code&gt;, &lt;code&gt;ABOUT&lt;/code&gt;&lt;br&gt;
Lifecycle: &lt;code&gt;EVOLVED_INTO&lt;/code&gt;, &lt;code&gt;SUPERSEDES&lt;/code&gt;, &lt;code&gt;DUPLICATES&lt;/code&gt;&lt;br&gt;
Provenance: &lt;code&gt;EXTRACTED_FROM&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Forcing the system to pick a typed edge forced it to actually reason about the relationship. A typed edge is a claim. &lt;code&gt;SUPPORTS&lt;/code&gt; means something different from &lt;code&gt;CONTRADICTS&lt;/code&gt;. When the LLM has to choose between them, it can't be lazy about it. BECAUSE IT WILL BE LAZY.&lt;/p&gt;

&lt;p&gt;Of course, the LLM will drift back to inventing generic types within days if you let it. So there's a server-side guardrail: validation against the 18 allowed types. Non-schema edges get rejected with an error listing the alternatives.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;VALID_EDGE_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;MOTIVATED_BY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CAUSED_BY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ENABLES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BLOCKS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;REQUIRES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PART_OF&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SUPPORTS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CONTRADICTS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INFLUENCES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;COMMITTED_TO&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;FOLLOWED_THROUGH&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ADVANCES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TAGGED_WITH&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ABOUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EVOLVED_INTO&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SUPERSEDES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DUPLICATES&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EXTRACTED_FROM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;VALID_EDGE_TYPES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;edgeType&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`Invalid edge type "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;edgeType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;". `&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`Allowed: &lt;/span&gt;&lt;span class="p"&gt;${[...&lt;/span&gt;&lt;span class="nx"&gt;VALID_EDGE_TYPES&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM sees that error and corrects itself. Hard constraints where it counts.&lt;/p&gt;

&lt;h2&gt;
  
  
  15 Labels, 4 Layers
&lt;/h2&gt;

&lt;p&gt;v1 had loosely typed nodes. "Facts, decisions, preferences, patterns, commitments, observations." No formal taxonomy, no lifecycle rules. It was pretty much just vibes: a classic "yeah, that sounds right" decision. A "fact" and an "observation" had the same properties and the same lifespan.&lt;/p&gt;

&lt;p&gt;I didn't realize why that was a problem until I tried to build consolidation. Which nodes should get archived after 30 days? Which ones should live forever? Turns out you can't write lifecycle rules for node types that don't have distinct lifecycles.&lt;/p&gt;

&lt;p&gt;v2 has 15 knowledge labels across 4 semantic layers. The test for whether a label deserves to exist: what happens to this node in 6 months? If two labels have the same answer, they should be the same label.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Labels&lt;/th&gt;
&lt;th&gt;Lifecycle&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Identity (who you are)&lt;/td&gt;
&lt;td&gt;Attribute, Preference, Belief, Value, Skill&lt;/td&gt;
&lt;td&gt;Evergreen. Only supersession or contradiction removes them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direction (where you're headed)&lt;/td&gt;
&lt;td&gt;Goal, Project, Commitment, Decision&lt;/td&gt;
&lt;td&gt;Active until completed or abandoned. Commitments never auto-archive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intelligence (what the system observes)&lt;/td&gt;
&lt;td&gt;Behavior, Insight, Opportunity, Event&lt;/td&gt;
&lt;td&gt;Behaviors evolve into Insights. Opportunities expire. Events are immutable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;World (external knowledge)&lt;/td&gt;
&lt;td&gt;Reference, Resource&lt;/td&gt;
&lt;td&gt;Stale after disuse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
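&lt;p&gt;One way to make those lifecycle rules executable is a policy map keyed by label. A sketch (property names are mine; the retention semantics come from the table):&lt;/p&gt;

```javascript
// Sketch: lifecycle policy per label. Property names are invented here;
// the semantics mirror the table above.
const LIFECYCLE = {
  // Identity layer: evergreen, only supersession or contradiction removes them
  Attribute:   { evergreen: true },
  Preference:  { evergreen: true },
  Belief:      { evergreen: true },
  Value:       { evergreen: true },
  Skill:       { evergreen: true },
  // Direction layer: active until completed or abandoned
  Goal:        { archiveWhen: 'completed_or_abandoned' },
  Project:     { archiveWhen: 'completed_or_abandoned' },
  Commitment:  { archiveWhen: 'never_auto' }, // commitments never auto-archive
  Decision:    { archiveWhen: 'completed_or_abandoned' },
  // Intelligence layer
  Behavior:    { evolvesInto: 'Insight' },
  Insight:     { evergreen: true }, // assumption: confirmed insights persist
  Opportunity: { expires: true },
  Event:       { immutable: true },
  // World layer: stale after disuse
  Reference:   { staleAfterDisuse: true },
  Resource:    { staleAfterDisuse: true },
};

// A consolidation job can now ask one question per node:
// what does its label say happens to it?
function isEvergreen(label) {
  const policy = LIFECYCLE[label];
  return policy ? Boolean(policy.evergreen) : false;
}
```

The point is less the exact property names than that every one of the 15 labels has a distinct, machine-readable answer to the 6-month question.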

&lt;p&gt;Here's what I axed (the more informative list, IMO):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fact became Attribute. "Alex weighs 178 lbs" is an attribute, not a fact floating in space.&lt;/li&gt;
&lt;li&gt;Observation became Behavior. An observation without a pattern is noise.&lt;/li&gt;
&lt;li&gt;Pattern became Insight. Patterns only earn that label when confirmed across multiple data points.&lt;/li&gt;
&lt;li&gt;Emotion got killed entirely. Stale 20 minutes after creation. No lifecycle, no value.&lt;/li&gt;
&lt;li&gt;Idea became Reference. Ideas are external input until someone acts on them.&lt;/li&gt;
&lt;li&gt;Metric became Attribute with &lt;code&gt;source: "health_import"&lt;/code&gt;. Just attributes with provenance tracking.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Dedup Gate Got Serious
&lt;/h2&gt;

&lt;p&gt;v1 had a write-time similarity check. Single threshold. It caught obvious duplicates and missed everything else.&lt;/p&gt;

&lt;p&gt;The failure mode I didn't anticipate: same-session rewording. During a long conversation, the extraction system would generate multiple phrasings of the same insight minutes apart. "Alex works in bursts" at 2:15 PM and "Alex has a sprinter mentality" at 2:22 PM. Same thing. Different enough embeddings to slip past the threshold.&lt;/p&gt;

&lt;p&gt;Two changes fixed it.&lt;/p&gt;

&lt;p&gt;First, a two-tier threshold. Items created within the past hour get a 0.75 similarity threshold, aggressive enough to catch rewording. Items older than an hour get 0.85, high enough that genuinely similar but distinct items can coexist. The recency window was the key insight. Duplicate risk is highest within the same conversation.&lt;/p&gt;

&lt;p&gt;Second, the gate went cross-label. In v1, it only checked within the same node type. So "Alex prefers direct feedback" could exist as both a Preference and a Behavior because they got classified differently by the extraction LLM. v2 checks across all 15 labels. Existing node always wins, regardless of label mismatch. A &lt;code&gt;DUPLICATES&lt;/code&gt; edge gets logged for audit.&lt;/p&gt;
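&lt;p&gt;A minimal sketch of that gate, assuming a &lt;code&gt;cosineSim&lt;/code&gt; function and a node shape of my own invention:&lt;/p&gt;

```javascript
// Sketch of the two-tier, cross-label dedup gate. Function and field
// names are mine; cosineSim is assumed to return similarity in [0, 1].
const ONE_HOUR_MS = 60 * 60 * 1000;
const RECENT_THRESHOLD = 0.75; // aggressive: catches same-session rewording
const OLD_THRESHOLD = 0.85;    // stricter: distinct-but-similar items coexist

function findDuplicate(candidate, existingNodes, cosineSim, now = Date.now()) {
  for (const node of existingNodes) {   // cross-label: check ALL nodes, not just same label
    const ageMs = now - node.createdAt;
    const recent = ONE_HOUR_MS > ageMs;
    const threshold = recent ? RECENT_THRESHOLD : OLD_THRESHOLD;
    if (cosineSim(candidate.embedding, node.embedding) >= threshold) {
      return node;                      // existing node always wins
    }
  }
  return null;                          // no duplicate: safe to create
}
```

If this returns a node, the write becomes a `DUPLICATES` edge instead of a new node.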

&lt;p&gt;The whole write pipeline is also mutex-serialized. One write at a time. I know that sounds like a bottleneck. But concurrent creates on a graph with a dedup gate produce race conditions that are miserable to debug. Two nodes with 0.92 similarity both pass the check because they ran simultaneously, and a week later you have duplicates everywhere.&lt;/p&gt;
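&lt;p&gt;The simplest version of that serialization is a promise chain acting as an in-process mutex (a sketch; the real pipeline surely does more):&lt;/p&gt;

```javascript
// Sketch: serialize all writes through a single promise chain so the
// dedup check and the create never run concurrently.
let writeChain = Promise.resolve();

function serializedWrite(writeFn) {
  // Each write waits for the previous one, whether it succeeded or failed.
  const result = writeChain.then(writeFn, writeFn);
  writeChain = result.catch(() => {}); // keep the chain alive after errors
  return result;
}
```

Every write — dedup lookup plus node/edge creation — goes through `serializedWrite`, so a second write cannot start until the first has either committed or failed.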

&lt;p&gt;One more thing I learned the hard way: when a node gets superseded, you need to migrate its edges. All the context edges (&lt;code&gt;ABOUT&lt;/code&gt;, &lt;code&gt;TAGGED_WITH&lt;/code&gt;, &lt;code&gt;SUPPORTS&lt;/code&gt;, etc.) move to the new node. Provenance edges (&lt;code&gt;EXTRACTED_FROM&lt;/code&gt;, &lt;code&gt;SUPERSEDES&lt;/code&gt;) stay on the old one for audit. Without this, supersession just creates orphans. Connected knowledge becomes disconnected, and your graph traversal hits dead ends you can't explain.&lt;/p&gt;
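&lt;p&gt;The migration decision reduces to partitioning edges by type. A sketch, using the edge types from the schema above (function names are mine):&lt;/p&gt;

```javascript
// Sketch: decide which edges move to the superseding node and which
// stay on the superseded one for audit.
const PROVENANCE_EDGES = new Set(['EXTRACTED_FROM', 'SUPERSEDES']);

function planEdgeMigration(edges) {
  const migrate = []; // context edges (ABOUT, TAGGED_WITH, SUPPORTS, ...): move
  const keep = [];    // provenance edges: stay on the old node for audit
  for (const edge of edges) {
    (PROVENANCE_EDGES.has(edge.type) ? keep : migrate).push(edge);
  }
  return { migrate, keep };
}
```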

&lt;h2&gt;
  
  
  CO_REFERENCED_WITH Is Dead. Long Live co_reference_count.
&lt;/h2&gt;

&lt;p&gt;The feedback loop was the centerpiece of the v1 post. It was also the first thing I had to tear out (RIP).&lt;/p&gt;

&lt;p&gt;Quick recap: nodes retrieved together got a &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; edge. More co-retrieval meant a stronger edge. Stronger edges meant those nodes were more likely to show up in future retrievals. Retrieval fed co-retrieval, co-retrieval strengthened edges, stronger edges pulled in more nodes.&lt;/p&gt;

&lt;p&gt;A self-reinforcing loop with no circuit breaker. Popular nodes got more popular. The graph calcified around whatever topics came up most in the first few weeks.&lt;/p&gt;

&lt;p&gt;v1's fix (covered in the original post) replaced &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; with an access event log. That was better, but it was still a separate data structure tracking behavior that should just live on the edges themselves.&lt;/p&gt;

&lt;p&gt;v2 killed the access event nodes too. Now &lt;code&gt;co_reference_count&lt;/code&gt; is a property directly on typed edges. When graph expansion traverses an existing edge, that edge's count increments. No new edges created. No separate tracking nodes.&lt;/p&gt;

&lt;p&gt;The reinforcement is still Hebbian (things that fire together wire together). But now it's scoped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;+0.05 confidence per co-retrieval, capped at 1.0&lt;/li&gt;
&lt;li&gt;Only strengthens existing typed edges&lt;/li&gt;
&lt;li&gt;Never creates new edges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last constraint is the important one. The system can reinforce connections it already knows about. It cannot hallucinate new ones. No more phantom relationships. The graph learns which connections matter without inventing connections that don't exist.&lt;/p&gt;
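&lt;p&gt;The scoped reinforcement rule fits in a few lines (a sketch; field names are mine):&lt;/p&gt;

```javascript
// Sketch of scoped Hebbian reinforcement: only an existing typed edge
// can be strengthened. No edge, no reinforcement.
const CONFIDENCE_STEP = 0.05;

function reinforce(edge) {
  if (!edge) return null; // never invent a connection that doesn't exist
  edge.co_reference_count = (edge.co_reference_count || 0) + 1;
  edge.confidence = Math.min(1.0, (edge.confidence || 0) + CONFIDENCE_STEP);
  return edge;
}
```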

&lt;h2&gt;
  
  
  Vector Similarity in Cypher Will Ruin Your Day
&lt;/h2&gt;

&lt;p&gt;Early on, I computed cosine similarity inside Cypher using &lt;code&gt;REDUCE&lt;/code&gt;. The embeddings were 3072-dimensional. With 500+ nodes, that's millions of interpreted loop operations per query.&lt;/p&gt;

&lt;p&gt;250+ seconds per retrieval.&lt;/p&gt;

&lt;p&gt;I moved the vector math to JavaScript. Native Float64 operations.&lt;/p&gt;

&lt;p&gt;Under 1 second.&lt;/p&gt;

&lt;p&gt;Graph databases are not vector databases. They're great at traversal. They are genuinely terrible at bulk numerical computation. I should have known this, but I had to watch a query spin for four minutes before it sank in.&lt;/p&gt;
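&lt;p&gt;For reference, the cosine math that moved out of Cypher is nothing exotic (a sketch of the approach, not the exact implementation):&lt;/p&gt;

```javascript
// Sketch: cosine similarity as plain Float64 arithmetic, plus top-k
// seed selection over all stored embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  a.forEach((ai, i) => {
    dot += ai * b[i];
    normA += ai * ai;
    normB += b[i] * b[i];
  });
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topKSeeds(queryEmbedding, nodes, k) {
  return nodes
    .map(n => ({ node: n, score: cosineSimilarity(queryEmbedding, n.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Native numeric loops instead of interpreted `REDUCE`. That difference alone is the 250-seconds-to-under-1 change.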

&lt;p&gt;The pattern that works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;JS computes cosine similarity over all embeddings. Returns top-k seed nodes.&lt;/li&gt;
&lt;li&gt;Graph traverses 1-2 hops along typed edges. Expands context around those seeds.&lt;/li&gt;
&lt;li&gt;Reciprocal Rank Fusion merges both result sets.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Vector similarity finds the starting points. Graph traversal understands the connections between them. Each system does what it's built for.&lt;/p&gt;
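&lt;p&gt;Step 3 is standard Reciprocal Rank Fusion. A sketch, using the conventional k = 60 constant from the RRF literature:&lt;/p&gt;

```javascript
// Sketch: Reciprocal Rank Fusion. Each input ranking is an array of
// node ids, best first; an item's fused score is the sum of
// 1 / (k + rank) across the rankings it appears in.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Items that show up in both the vector ranking and the graph-expansion ranking rise to the top, even if neither ranks them first.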

&lt;p&gt;&lt;em&gt;Side note: Memgraph 3.8 shipped native vector indexes after I built this. I benchmarked it at 31x over my JS implementation, with 100% recall@10. The Atomic GraphRAG pattern (single Cypher query: vector search into graph expansion) cuts the whole flow down to one call. The principle still holds: don't compute similarity in interpreted Cypher. Use native indexes or do it externally.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Forgetting Got Smarter Too
&lt;/h2&gt;

&lt;p&gt;v1 had "nightly dedup, weekly patterns, monthly compression." Correct in concept. I hadn't actually specified what any of those did.&lt;/p&gt;

&lt;p&gt;v2 has three named algorithms.&lt;/p&gt;

&lt;p&gt;Hebbian Reinforcement runs on every retrieval. Co-retrieval strengthens typed edges, +0.05 confidence, capped at 1.0. Never creates new edges, only reinforces existing ones. This is the remembering side. Things that get used together become more tightly associated over time.&lt;/p&gt;

&lt;p&gt;Synaptic Pruning runs monthly. Archives weak, unused edges. The criteria: not referenced in 30+ days, confidence below 0.3, &lt;code&gt;co_reference_count&lt;/code&gt; below 3. But it protects critical edges: &lt;code&gt;EXTRACTED_FROM&lt;/code&gt;, &lt;code&gt;SUPERSEDES&lt;/code&gt;, &lt;code&gt;COMMITTED_TO&lt;/code&gt;, &lt;code&gt;EVOLVED_INTO&lt;/code&gt;. You can prune stale associations. You should not prune provenance or commitments, even old ones.&lt;/p&gt;
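&lt;p&gt;The pruning criteria compose into a single predicate (a sketch; the thresholds are from the text, the names are mine):&lt;/p&gt;

```javascript
// Sketch: monthly synaptic pruning predicate. Protected edge types are
// exempt no matter how old or weak they are.
const PROTECTED_EDGES = new Set([
  'EXTRACTED_FROM', 'SUPERSEDES', 'COMMITTED_TO', 'EVOLVED_INTO',
]);
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function isPrunable(edge, now = Date.now()) {
  if (PROTECTED_EDGES.has(edge.type)) return false; // provenance and commitments stay
  const unused = now - edge.lastReferencedAt > THIRTY_DAYS_MS; // idle 30+ days
  const weak = 0.3 > edge.confidence;                          // low confidence
  const rare = 3 > edge.co_reference_count;                    // rarely co-retrieved
  return [unused, weak, rare].every(Boolean); // all three must hold
}
```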

&lt;p&gt;Schema Formation also runs monthly. This one compresses recurring patterns into canonical representations. Edges with &lt;code&gt;co_reference_count &amp;gt;= 10&lt;/code&gt; and confidence above 0.8 are candidates. Community detection (Louvain via Memgraph's MAGE library) finds clusters of 5+ related nodes that keep getting activated together. Those clusters become candidates for consolidation into higher-level knowledge.&lt;/p&gt;

&lt;p&gt;There's also an auto-staleness layer for deprecated architecture. When I replaced my old dispatcher with the wake-up engine, every node that referenced the dispatcher became stale. Rather than cleaning those up manually, the monthly maintenance pass auto-archives items referencing systems that no longer exist. Evergreen types (Preference, Belief, Value, Skill, Behavior, Commitment) are protected from this. The system can forget what architecture it used to run on. It should not forget what it knows about the person it's built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Tell You Before You Build This
&lt;/h2&gt;

&lt;p&gt;These are the things I figured out the slow way.&lt;/p&gt;

&lt;p&gt;Type your edges or don't bother with a graph. A graph where everything &lt;code&gt;RELATES_TO&lt;/code&gt; everything is a slower document store. The whole point is that edge types carry meaning. If your edges don't encode &lt;em&gt;how&lt;/em&gt; things are connected, you're paying the complexity cost without getting the benefit.&lt;/p&gt;

&lt;p&gt;Your dedup gate is your most important feature. Not your retrieval, not your embedding model, not your consolidation algorithm. A clean graph beats a complete graph. Noise accumulates faster than you think.&lt;/p&gt;

&lt;p&gt;Don't compute vectors in your graph query language. Graph databases do traversal. Vector databases do similarity. If you're writing &lt;code&gt;REDUCE&lt;/code&gt; over 3072-dimensional arrays in Cypher, stop. I lost a weekend to this.&lt;/p&gt;

&lt;p&gt;Every label needs a lifecycle answer. "What happens to this node in 6 months?" If you don't know, you don't have a schema. You have a bucket.&lt;/p&gt;

&lt;p&gt;Feedback loops hide in systems that learn from their own behavior. Any time your system creates signal from its own output, you need a circuit breaker. &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; taught me this. The system was getting better at retrieving the things it had already retrieved. That looks like learning. It isn't.&lt;/p&gt;

&lt;p&gt;Server-side guardrails beat prompt engineering. The LLM will drift. It will invent new edge types. It will create nodes that don't match your schema. It will store duplicates with slightly different wording. Hard constraints at the write layer are the only defense that actually holds over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Leaves Things
&lt;/h2&gt;

&lt;p&gt;The v1 post ended with "bigger context windows won't save you." Still true. But I'd add that a messy graph won't save you either.&lt;/p&gt;

&lt;p&gt;The gap isn't "use a graph instead of vectors." It's "use a graph with discipline." I suppose the same can be said about vector dbs...but the graph still just makes sense to me. Typed edges that carry meaning. Labels with lifecycle rules. A dedup gate that treats cleanliness as a first-class concern. And the willingness to kill features that create more noise than signal.&lt;/p&gt;

&lt;p&gt;The system that asked "do you like to cook?" is the same system that now tracks 15 types of knowledge across 4 semantic layers, connected by 18 typed edges. It still runs consolidation at 2 AM. It still mimics sleep. It just has a vocabulary for what it knows now, instead of a pile of things that are vaguely related.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 2 of a series on building graph-backed AI memory. &lt;a href="https://dev.to/infinimot/your-ai-doesnt-have-memory-it-has-search-4a5g"&gt;Part 1 is here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>database</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Your AI Doesn't Have "Memory". It Has Search.</title>
      <dc:creator>Alex P</dc:creator>
      <pubDate>Fri, 20 Feb 2026 01:13:21 +0000</pubDate>
      <link>https://dev.to/alextp/your-ai-doesnt-have-memory-it-has-search-2pom</link>
      <guid>https://dev.to/alextp/your-ai-doesnt-have-memory-it-has-search-2pom</guid>
      <description>&lt;p&gt;Ask me what "John" said last month and I'll have no idea. Ask me what I remember about John, and I'll tell you he's the guy who reminds me of my college roommate and knows a lot about wine.&lt;/p&gt;

&lt;p&gt;That's not retrieval. That's memory.&lt;/p&gt;

&lt;p&gt;Every AI memory system today is building retrieval and calling it memory. I think that's broken, and here's what I think actually works (so far).&lt;/p&gt;




&lt;h2&gt;
  
  
  "Memory Sucks" — But What Does That Actually Mean?
&lt;/h2&gt;

&lt;p&gt;"Memory sucks" is the #1 complaint in every AI assistant community. But when people say that, they mean different things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"It forgot what I told it" → context window limit/bad retrieval&lt;/li&gt;
&lt;li&gt;"It hallucinated a memory" → bad retrieval/bad LLM&lt;/li&gt;
&lt;li&gt;"It doesn't &lt;em&gt;know&lt;/em&gt; me" → this is the real one that I'm focused on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third one is the one I haven't seen addressed much yet. And the reason is: &lt;strong&gt;the fix isn't better search. It's better structure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what I actually wanted: an AI that recognizes patterns in how I think, act, and behave across my life without me having to spell them out. Not "remember this fact." Not "here's a document about me." I wanted it to &lt;em&gt;notice&lt;/em&gt; that when I gorge myself at a buffet over the weekend, the rest of the week is shot. That my limiting beliefs on x actually affect my behavior towards y. That my wife and I have the same underlying drive toward building things, even though hers shows up in art and mine shows up in cooking.&lt;/p&gt;

&lt;p&gt;None of those connections are obvious. They span work, health, relationships, habits, finances...none of these domains have anything to do with each other on the surface. But they're all &lt;em&gt;me&lt;/em&gt;. And if your AI can't cross-connect across those domains, it can never actually know you. It can only search you.&lt;/p&gt;

&lt;p&gt;Vector search finds things that are &lt;em&gt;similar&lt;/em&gt;. But memory isn't about similarity, it's about &lt;em&gt;connection&lt;/em&gt;. The fact that you sleep poorly before big demos and that your merge rates drop after you haven't been outside for a while aren't similar documents. They're connected insights. No amount of cosine similarity will link them. But your brain does it effortlessly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your Brain Is a Graph Database
&lt;/h2&gt;

&lt;p&gt;Your brain doesn't store memories in a database. It stores them in a network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synapses and connections:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every memory is a node. Every association is a connection.&lt;/li&gt;
&lt;li&gt;Use a connection and it strengthens. Ignore it and it decays.&lt;/li&gt;
&lt;li&gt;Use two things together enough and they merge; you don't remember the individual data points, you remember the &lt;em&gt;pattern&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The three stages:&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Short-term
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Raw intake. Everything goes in.&lt;/li&gt;
&lt;li&gt;Overnight (sleep), your brain deduplicates. Things that don't connect to anything get pruned. Things that reinforce existing knowledge get merged.&lt;/li&gt;
&lt;li&gt;This is why you "sleep on it." Your brain is literally running a cleanup job.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Medium-term
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connections form across domains. Your work stress connects to your sleep quality connects to your eating habits.&lt;/li&gt;
&lt;li&gt;These cross-domain links are where insight lives. They're not stored — they &lt;em&gt;emerge&lt;/em&gt; from the structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Long-term
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Nodes that are always accessed together consolidate. You don't remember 47 individual interactions with John. You remember "John = college roommate energy + wine guy."&lt;/li&gt;
&lt;li&gt;This is compression, not data loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The key insight: forgetting is a feature, not a bug.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every AI memory system treats forgetting as failure. Human memory treats it as signal extraction. The things that fade are the things that didn't connect to anything meaningful. That's not data loss! That's your brain telling you what matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  SQL Stores Documents. Vector Stores Vibes. Neither Stores Memory.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQL (relational databases):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forces artificial links via foreign keys and join tables&lt;/li&gt;
&lt;li&gt;Every possible connection needs a table designed for it in advance&lt;/li&gt;
&lt;li&gt;Can't discover new connection types at runtime&lt;/li&gt;
&lt;li&gt;"How is my sleep related to my deal close rate?" requires a schema that anticipated that question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vector stores (RAG):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Great at "find me something similar to this query"&lt;/li&gt;
&lt;li&gt;Terrible at "what connects these two unrelated things?"&lt;/li&gt;
&lt;li&gt;No concept of connection strength, decay, or consolidation&lt;/li&gt;
&lt;li&gt;Every retrieval is a fresh search — no learning from past access patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem isn't that these tools are bad. They're great at what they do. The problem is that what they do isn't exactly memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Graphs as Memory: The Data Model Your Brain Already Uses
&lt;/h2&gt;

&lt;p&gt;A graph database &lt;em&gt;is&lt;/em&gt; the memory model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nodes&lt;/strong&gt; = memories (facts, decisions, preferences, patterns, observations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges&lt;/strong&gt; = synapses (connections between memories, typed and weighted)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traversal&lt;/strong&gt; = recall (follow the connections, not just search the content)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mapping human memory to graph operations:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Human Memory&lt;/th&gt;
&lt;th&gt;Graph Operation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Form a new memory&lt;/td&gt;
&lt;td&gt;Create a node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Make an association&lt;/td&gt;
&lt;td&gt;Create an edge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strengthen a connection through use&lt;/td&gt;
&lt;td&gt;Increment edge weight (Hebbian learning)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forget unused details&lt;/td&gt;
&lt;td&gt;Decay nodes with low access + weak connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sleep consolidation / dedup&lt;/td&gt;
&lt;td&gt;Nightly: deduplicate similar nodes, merge redundant edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-domain insight&lt;/td&gt;
&lt;td&gt;Multi-hop traversal across different node types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term compression&lt;/td&gt;
&lt;td&gt;Monthly: merge frequently co-accessed nodes into patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What do I know about John?"&lt;/td&gt;
&lt;td&gt;Traverse all edges from the John node — not search for "John"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The John Example — Retrieval vs. Memory
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vector search (what everyone's using):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query: "What do I know about John?"&lt;br&gt;
→ Finds documents that mention "John"&lt;br&gt;
→ Returns: "John said he prefers the Q3 timeline" / "Met John at the conference" / "John's email is &lt;a href="mailto:john@"&gt;john@&lt;/a&gt;..."&lt;br&gt;
→ You get &lt;em&gt;documents&lt;/em&gt;. You don't get &lt;em&gt;understanding&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Graph traversal (what your brain does):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query: "What do I know about John?"&lt;br&gt;
→ Starts at the John node&lt;br&gt;
→ Traverses edges: MENTIONED_IN → meeting notes, RELATES_TO → wine preference, SIMILAR_TO → college roommate (via personality pattern), IMPACTS → Q3 deal timeline&lt;br&gt;
→ You get &lt;em&gt;connected knowledge&lt;/em&gt;. The system doesn't just find John — it tells you why John matters and how he connects to everything else.&lt;/p&gt;
&lt;/blockquote&gt;
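&lt;p&gt;In code, that recall step is just traversal over an adjacency list. A toy sketch with the edges from the example above (a real system would use the graph database, not an in-memory Map):&lt;/p&gt;

```javascript
// Sketch: recall as bounded traversal over typed edges, starting from
// the John node. Toy in-memory graph; edge types match the example.
const graph = new Map([
  ['John', [
    { type: 'MENTIONED_IN', to: 'meeting notes' },
    { type: 'RELATES_TO', to: 'wine preference' },
    { type: 'SIMILAR_TO', to: 'college roommate' },
  ]],
  ['wine preference', [{ type: 'IMPACTS', to: 'dinner plans' }]],
]);

// Everything reachable within maxHops of a node: recall, not search.
function recall(start, maxHops, seen = new Set([start]), hop = 0) {
  if (hop >= maxHops) return seen;
  for (const edge of graph.get(start) || []) {
    if (!seen.has(edge.to)) {
      seen.add(edge.to);
      recall(edge.to, maxHops, seen, hop + 1);
    }
  }
  return seen;
}
```

No text search for "John" anywhere. The answer is whatever the John node is connected to, and how.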




&lt;h2&gt;
  
  
  "Do You Like to Cook?" or The Moment I Knew It Worked
&lt;/h2&gt;

&lt;p&gt;I built the first version of this system on SQL and vector search. It stored everything I could put in there. Every fact, every preference, every offhand comment. I make Detroit-style pizza from scratch, from the dough up. I have a pellet smoker. I cook 90% of the meals for me and my wife. I've been toying with a spring roll food truck concept. All of that was in the database.&lt;/p&gt;

&lt;p&gt;Then one day, around Valentine's Day, I mentioned mug cakes. Just a question about a recipe. The system asked me: "Do you like to cook?"&lt;/p&gt;

&lt;p&gt;Srsly, dude? It had &lt;em&gt;all&lt;/em&gt; the data. But the data was in rows and embeddings: isolated records that didn't know about each other. "Mug cakes" didn't match "Detroit-style pizza" in vector space. There was no foreign key linking "pellet smoker" to "spring roll concept." Each fact existed in its own little silo, and no amount of searching could connect them into the obvious conclusion: Why yes, AI, I like to cook.&lt;/p&gt;

&lt;p&gt;That was the moment I knew the storage model was wrong. Not the data. The structure.&lt;/p&gt;

&lt;p&gt;After migrating to the graph, here's what happens now when cooking comes up in any context:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (SQL + vector search):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query context: user mentioned mug cakes&lt;br&gt;
→ Vector search: "mug cakes" → no strong matches above threshold&lt;br&gt;
→ SQL lookup: no "mug cakes" row in preferences table&lt;br&gt;
→ System has no cooking context → asks "Do you like to cook?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After (graph with connected nodes):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query context: user mentioned mug cakes&lt;br&gt;
→ Nearest node: cooking preference&lt;br&gt;
→ Traverse: cooking → Detroit pizza (from scratch) → pellet smoker → spring roll concept → food truck idea → FI goal → Wife's maker drive&lt;br&gt;
→ System knows cooking is a core identity thread, not a casual hobby&lt;br&gt;
→ Response builds on what it already knows instead of starting from zero&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the difference between storage and memory. The SQL/vector version had every fact. The graph version &lt;em&gt;understood&lt;/em&gt; what those facts meant together. One mention of food pulls in the entire identity thread: Cooking connects to a food truck dream, which connects to the financial independence plan, which connects to a shared drive with my wife to build things with our hands. Vector search would have returned "Alex likes Detroit-style pizza" if I'd searched for "pizza." The graph returns the whole picture without me having to ask the right question.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it's wired:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                        ┌─────────────────────────────────────┐
                        │         KNOWLEDGE GRAPH             │
                        │                                     │
  Conversation ──────►  │   ┌──────┐    RELATES_TO    ┌──────┐│
  "I asked about        │   │ Pizza├───────────────►  │Food  ││
   Detroit-style        │   │ Pref │                  │Truck ││
   pizza"               │   └──┬───┘                  │Dream ││
                        │      │                      └──┬───┘│
                        │      │ TAGGED_WITH             │    │
                        │      ▼                     SUPPORTS │
                        │   ┌──────┐                     │    │
                        │   │Cook- │                     ▼    │
                        │   │ ing  │              ┌──────────┐│
                        │   └──────┘              │    FI    ││
                        │                         │   Goal   ││
                        │   ┌──────┐  SHARED_WITH └────┬─────┘│
                        │   │Wife's├──────────────────►│      │
                        │   │Maker │                   │      │
                        │   │Drive │◄──────────────────┘      │
                        │   └──────┘   RELATES_TO             │
                        └─────────────────────────────────────┘

  Vector search returns:  "Alex likes Detroit-style pizza"
  Graph traversal returns: pizza → food truck dream → FI plan
                           → Wife's maker drive → build identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The overnight cycle (mimicking sleep):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌────────────┐     ┌─────────────┐     ┌──────────────┐
  │  NIGHTLY   │     │   WEEKLY    │     │   MONTHLY    │
  │            │     │             │     │              │
  │ • Dedup    │     │ • Cross-    │     │ • Merge co-  │
  │   similar  │     │   domain    │     │   accessed   │
  │   nodes    │     │   pattern   │     │   nodes into │
  │ • Prune    │     │   detect    │     │   patterns   │
  │   orphans  │     │ • Profile   │     │ • Compress   │
  │ • Decay    │     │   update    │     │   old detail │
  │   unused   │     │ • Coaching  │     │   into       │
  │   edges    │     │   recalib   │     │   insight    │
  └────────────┘     └─────────────┘     └──────────────┘
       ▲                    ▲                    ▲
       │                    │                    │
    Like sleep          Like weekly           Like the way
    consolidation       reflection            you compress
                                              47 John moments
                                              into "wine guy +
                                              college roommate
                                              energy"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it's actually built
&lt;/h3&gt;

&lt;p&gt;The graph runs on &lt;a href="https://memgraph.com/" rel="noopener noreferrer"&gt;Memgraph&lt;/a&gt;. It's an in-memory graph database that speaks openCypher. I chose it over Neo4j for three reasons: it's lighter weight (runs comfortably in a Docker container on a Mac Mini), it's genuinely in-memory so traversals are fast, and Cypher is a query language that maps naturally to "follow this connection, then that one, then that one." Graph traversal &lt;em&gt;is&lt;/em&gt; the query.&lt;/p&gt;

&lt;p&gt;The whole system runs as an always-on daemon on a Mac Mini sitting in my office. Knowledge gets into the graph in real time. As I'm talking to the AI during a conversation, it writes nodes and edges as they come up. No batch extraction, no end-of-day processing. If I mention something, it's in the graph before the conversation is over. A write-time dedup gate catches redundancy at the door: before any new node is created, it checks embedding similarity against existing nodes. If it's a duplicate or a rewording of something already stored, the existing node wins and its connections get reinforced instead. This means the graph stays clean without manual curation.&lt;/p&gt;
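&lt;p&gt;Here's a minimal sketch of what a write-time dedup gate can look like, assuming cosine similarity over embeddings and a made-up 0.90 cutoff (the real threshold would be tuned, and the real store is Memgraph, not a dict):&lt;/p&gt;

```python
import math

SIM_THRESHOLD = 0.90  # illustrative cutoff, not the tuned value

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def upsert_node(graph, new_id, new_vec):
    """Write-time dedup gate: if the incoming node is a near-duplicate
    of something already stored, reinforce the existing node instead
    of creating a new one."""
    for node_id, node in graph.items():
        if cosine(node["vec"], new_vec) >= SIM_THRESHOLD:
            node["reinforcement"] += 1  # existing node wins
            return node_id
    graph[new_id] = {"vec": new_vec, "reinforcement": 0}
    return new_id
```

&lt;p&gt;If the incoming node clears the similarity bar against anything already stored, the existing node is reinforced and its ID is returned; otherwise a fresh node is created. That's the whole gate.&lt;/p&gt;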

&lt;p&gt;The edges use Hebbian learning. I had to look it up, too. It's the same principle your synapses use. Every time two nodes are accessed in the same conversation, the edge between them gets stronger. Mention pizza and Wife in the same thread enough times, and the system learns that connection matters without anyone explicitly telling it to. Edges that stop getting used decay over time, just like synapses that aren't firing.&lt;/p&gt;
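&lt;p&gt;In code, the Hebbian rule and its decay counterpart are only a few lines. This is a sketch with assumed constants — &lt;code&gt;LEARN_RATE&lt;/code&gt; and &lt;code&gt;DECAY_RATE&lt;/code&gt; are illustrative, not the values the system actually uses:&lt;/p&gt;

```python
LEARN_RATE = 0.1   # assumed: strength gained per co-access
DECAY_RATE = 0.02  # assumed: strength lost per decay pass

def co_access(edges, a, b):
    """Hebbian update: two nodes accessed in the same conversation
    strengthen the edge between them, capped at 1.0."""
    key = tuple(sorted((a, b)))
    edges[key] = min(1.0, edges.get(key, 0.0) + LEARN_RATE)

def decay_all(edges):
    """Edges that stop getting used weaken over time, like idle
    synapses; fully decayed edges are forgotten entirely."""
    for key in list(edges):
        edges[key] = max(0.0, edges[key] - DECAY_RATE)
        if edges[key] == 0.0:
            del edges[key]
```

&lt;p&gt;Nobody declares "pizza relates to Wife." The connection earns its strength from repeated co-access, and loses it again if the co-access stops.&lt;/p&gt;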

&lt;p&gt;And here's the part that maps directly to sleep: &lt;strong&gt;a cron job runs at 2 AM every night while I'm literally asleep.&lt;/strong&gt; It deduplicates similar nodes, prunes orphans, decays weak edges, and fills in any gaps from the day's conversations that the real-time extraction missed. A weekly cycle does cross-domain pattern detection and profile updates. A monthly cycle merges nodes that are always accessed together into higher-order patterns. It feels like the same compression your brain does when 47 interactions with John become "wine guy + college roommate energy." The overnight cycle diagram above isn't a metaphor. It's the actual architecture. The system consolidates memories while I sleep, the same way my brain does.&lt;/p&gt;
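&lt;p&gt;One of those nightly steps, orphan pruning, is trivial once the graph is the data model. A sketch of the rule, assuming a dict of nodes, a dict of weighted edges, and a per-node access count (all stand-ins for the real store):&lt;/p&gt;

```python
def prune_orphans(graph, edges, access_counts):
    """Nightly prune: a node with no surviving edges and no recorded
    accesses isn't connected to anything meaningful, so it's safe
    to forget."""
    connected = {n for pair in edges for n in pair}
    for node in list(graph):
        if node not in connected and access_counts.get(node, 0) == 0:
            del graph[node]
```

&lt;p&gt;Everything the nightly job needs is already in the graph: edge weights, access counts, connectivity. Forgetting is a query, not a feature.&lt;/p&gt;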

&lt;h3&gt;
  
  
  The Feedback Loop I Didn't See Coming
&lt;/h3&gt;

&lt;p&gt;The Hebbian learning sounded elegant in theory. In practice, it created a feedback loop that took me a while to catch.&lt;/p&gt;

&lt;p&gt;Here's what happened: the retrieval system works in layers. First, vector similarity finds seed nodes that match the query. Then graph traversal expands 1-2 hops along typed edges to pull in connected context. Both of those are fine. But I'd added a third layer: an edge type called &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; that tracked which nodes were retrieved together in the same conversation. The idea was that co-retrieval is a signal: if two nodes keep showing up in the same conversations, they're probably related.&lt;/p&gt;

&lt;p&gt;The problem is that retrieval creates co-retrieval. Node A gets pulled in by vector similarity. Node B gets pulled in because it's one hop away on a &lt;code&gt;RELATES_TO&lt;/code&gt; edge. Now A and B have been "co-retrieved," so the system creates a &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; edge between them. Next time A shows up, that edge pulls B in again, even if B isn't relevant this time. Now the edge is stronger. Now B pulls in C, which was &lt;em&gt;its&lt;/em&gt; neighbor. C and A get a &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; edge. Repeat.&lt;/p&gt;

&lt;p&gt;Within a few weeks, I had nodes with &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; edges at 0.95 strength to nodes they had no real semantic relationship with. The graph was accumulating phantom connections: edges that existed purely because the retrieval system kept seeing its own previous retrievals. The monthly consolidation cycle made it worse: it looked for node pairs with strong co-reference edges and merged them into patterns. So the system was compressing noise into "insights" that were actually just retrieval artifacts.&lt;/p&gt;
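&lt;p&gt;The loop is easy to reproduce in miniature. In this toy sketch (not the production code), retrieval expands along both typed and co-reference edges, and every retrieved pair gets its co-reference edge bumped. After a few passes, A and C are strongly "co-referenced" even though no typed edge ever connected them directly:&lt;/p&gt;

```python
def retrieve(seeds, co_edges, typed_edges):
    """One retrieval pass: seeds plus expansion along both typed
    edges and CO_REFERENCED_WITH edges."""
    result = set(seeds)
    for a, b in list(typed_edges) + list(co_edges):
        if a in result:
            result.add(b)
        if b in result:
            result.add(a)
    return result

def record_co_retrieval(co_edges, retrieved):
    """The flawed step: every co-retrieved pair gets a stronger
    co-reference edge, which feeds the NEXT retrieval."""
    nodes = sorted(retrieved)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            co_edges[(a, b)] = co_edges.get((a, b), 0.0) + 0.1

# Simulate: typed edges A-B and B-C, seed is always A.
typed = [("A", "B"), ("B", "C")]
co = {}
for _ in range(3):
    got = retrieve({"A"}, co, typed)
    record_co_retrieval(co, got)
print(co)  # the phantom (A, C) edge keeps inflating
```

&lt;p&gt;Nothing in the data says A relates to C. The system invented that edge out of its own retrieval behavior, then kept reinforcing it.&lt;/p&gt;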

&lt;p&gt;The fix wasn't to tune the thresholds. It was to kill the edge type entirely.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; had exactly one job that the other systems couldn't do: boost the ranking of expansion candidates during graph traversal. "These two nodes are always accessed together, so if you hit one, the other is probably relevant." But once I dug into it, the nodes that &lt;code&gt;CO_REFERENCED_WITH&lt;/code&gt; was uniquely surfacing (nodes with no vector similarity to the query AND no structural edge path to a seed) were exactly the nodes that &lt;em&gt;shouldn't&lt;/em&gt; be there. They were only showing up because the feedback loop had inflated their edges.&lt;/p&gt;

&lt;p&gt;So I replaced it with something that can't self-inflate: an access event log. Every time a node contributes to a response, that gets logged with context: &lt;em&gt;why&lt;/em&gt; was it accessed, &lt;em&gt;what query&lt;/em&gt; triggered it, &lt;em&gt;what role&lt;/em&gt; did it play. Instead of a dumb edge weight that doesn't distinguish signal from noise, I can now ask: "Of the nodes I already retrieved via vector similarity and structural edges, which ones have historically been useful together in actual responses?" That's a ranking boost based on real utility, not a circular reference based on retrieval proximity.&lt;/p&gt;

&lt;p&gt;The access event log can't create a feedback loop because it doesn't create new retrieval paths. It only re-ranks nodes that were already pulled in by the two systems that actually work: vector similarity and typed edges. If neither of those surfaces a node, the access log can't conjure it into existence. The loop is broken by design.&lt;/p&gt;
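&lt;p&gt;A sketch of the replacement, assuming an append-only list of events and a simple count-based usefulness score (the real log stores richer context than this):&lt;/p&gt;

```python
from collections import defaultdict

def log_access(log, query, node, role):
    """Append-only event log: which node contributed, to what query,
    in what role. It never creates retrieval paths of its own."""
    log.append({"query": query, "node": node, "role": role})

def rerank(candidates, log):
    """Re-rank nodes that vector similarity and typed edges ALREADY
    surfaced, by how often each one actually contributed to past
    responses. Nodes the log has never seen sort last."""
    useful = defaultdict(int)
    for event in log:
        useful[event["node"]] += 1
    return sorted(candidates, key=lambda n: useful[n], reverse=True)
```

&lt;p&gt;Because &lt;code&gt;rerank&lt;/code&gt; only reorders the candidates it's handed, a node that neither vector similarity nor typed edges surfaced can never be conjured into the result by the log. That's what breaks the loop by design.&lt;/p&gt;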

&lt;p&gt;This was the most important lesson of the whole project: &lt;strong&gt;in a system that learns from its own behavior, you have to be obsessive about separating the signal from the system's own echo.&lt;/strong&gt; Hebbian learning works beautifully for typed edges: if I keep mentioning cooking and Wife in the same conversations, the &lt;code&gt;RELATES_TO&lt;/code&gt; edge between them should absolutely get stronger. But applying that same principle to co-retrieval (an artifact of the system's own behavior) turned the learning mechanism into an amplifier for noise. The fix was knowing which signals come from the user and which come from the system, and only letting the first kind drive reinforcement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key design decisions:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Typed knowledge nodes&lt;/strong&gt;: facts, decisions, preferences, patterns, commitments, observations — not just "documents"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weighted edges with decay&lt;/strong&gt;: connections that aren't used weaken over time, just like synapses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write-time dedup gate&lt;/strong&gt;: catch duplicates at ingestion, reinforce existing nodes instead of creating noise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hebbian edge strengthening&lt;/strong&gt;: co-access in conversation strengthens typed edges (&lt;code&gt;RELATES_TO&lt;/code&gt;, &lt;code&gt;SUPPORTS&lt;/code&gt;, etc.), but only between nodes the &lt;em&gt;user&lt;/em&gt; connected, not nodes the &lt;em&gt;system&lt;/em&gt; happened to retrieve together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access event log over co-retrieval edges&lt;/strong&gt;: tracks which nodes actually contributed to responses, replaces self-reinforcing edge types with a signal that can't create feedback loops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nightly consolidation&lt;/strong&gt;: deduplicate, merge, strengthen, which mimics sleep cycles (literally runs at 2 AM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly analysis&lt;/strong&gt;: detect patterns across domains, update the user's evolving profile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly compaction&lt;/strong&gt;: merge nodes that are always accessed together into higher-order patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What surprised me:
&lt;/h3&gt;

&lt;p&gt;The most valuable thing the graph did wasn't surfacing some brilliant cross-domain insight I never would have seen. It was &lt;strong&gt;stopping the system from asking a stupid question&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That mug cake moment? The system had everything it needed to know I'm a serious cook. It just didn't look. And that's the unsexy version of "memory works." It's not about impressive recalls or mind-blowing connections. It's about not making the user feel like they're talking to someone with amnesia every time the topic shifts slightly, because every time the AI breaks the illusion that it "knows" you, it's jarring.&lt;/p&gt;

&lt;p&gt;When I fixed the traversal so that &lt;em&gt;any&lt;/em&gt; food-adjacent mention would pull in the cooking identity thread, the system stopped asking basic questions it should already know the answers to. That felt more like memory than any perfect recall ever could. Your brain doesn't impress you by remembering your phone number. It impresses you by &lt;em&gt;not asking&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The second surprise: &lt;strong&gt;forgetting turned out to be the most important feature&lt;/strong&gt;. Early versions hoarded everything: every offhand comment, every half-thought, every correction. The graph got noisy. Retrieval degraded because signal was buried in noise. Adding decay (weaken unused connections, prune orphan nodes, consolidate redundant facts) didn't just save storage, it made the system &lt;em&gt;smarter&lt;/em&gt;. The things that faded were the things that didn't connect to anything meaningful. That's not data loss. That's your brain telling you what matters. And it turns out graph structure makes this trivially easy to implement: low edge weight + low access count + no inbound connections = safe to forget.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bigger Context Windows Won't Save You
&lt;/h2&gt;

&lt;p&gt;The AI memory problem isn't going away. Context windows are getting bigger, but bigger search isn't necessarily better memory. A million-token context window is a bigger filing cabinet, not a brain.&lt;/p&gt;

&lt;p&gt;The community is actively building AI assistants, AI companions, and AI coaches, and all of them need memory that actually works like memory! The infrastructure exists (graph databases are mature, battle-tested technology). The mental model just hasn't crossed over yet.&lt;/p&gt;

&lt;p&gt;Graph-backed memory isn't the only answer (I still use vector search for the initial seeding), but it's a lot closer to how memory actually works than anything else being widely discussed right now.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>abotwrotethis</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
