The binding problem is the real insight. Most memory systems treat recall as retrieval: find the relevant note and hand it over. But memory isn't a library. It's a network. The procedure without the episode is an instruction without evidence. The episode without the procedure is a story without structure. You built both systems separately. The gap wasn't recall. It was connection. That's fixable. Most people would have quit at 0/250 skills used. You kept going. That's the difference between a benchmark and a builder.
I'm Marco Somma, a cognitive systems architect, technologist, and builder of modular reasoning tools. I'm the creator of OrKa – an open framework for orchestrating explainable AI agents.
Exactly 😅. That was the turning point for me.

0/250 was not really a storage failure 😁. It was a binding failure. I had the procedure and I had the episode, but I did not yet have the mechanism that kept them attached when the next task arrived. Without that, skill memory is just archived text with a better label.
That is why I have stopped thinking about memory as retrieval alone. The real question is not just what the system can recall, but what prior should remain attached to the current decision, failure mode, and task shape.
And yes, I agree with your last point. A benchmark gives you the miss. Building starts when you treat that miss as the actual signal.
@theeagle You nailed it: "instruction without evidence" is exactly what 0/250 proved.
OrKa already stores skills as nodes in a graph (skill_graph.py). The episode system exists too: storage, semantic search, retention, scoring, all tested and production-ready. But today they're disconnected. Nodes with no edges. Evidence with no structure.
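To make that concrete, here is a stripped-down sketch of the two sides as they stand today. The field names are illustrative, not the actual skill_graph.py definitions:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A node in the skill graph: the stored procedure."""
    skill_id: str
    name: str
    instructions: str   # the procedure text

@dataclass
class Episode:
    """A recorded application: what actually happened on a task."""
    episode_id: str
    task: str
    outcome: str
    success: bool

# Today: two separate stores, with no reference from either side to the other.
skills: dict[str, Skill] = {}
episodes: dict[str, Episode] = {}
```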
The fix is conceptually simple: episodes become the edges. "implement [target]" alone is empty. But an edge saying "applied to ETL, validation before dedup caught 30% of bad records" gives the node weight. Another edge: "applied to log analysis, filtering before aggregation cut false positives by 40%." Now the model has a reason to use the skill.
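In sketch form, an edge is just an episode reference plus the evidence it carries. The ids here (validate_first, ep_017, ep_042) are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """An episode binding a skill node to evidence from a real task."""
    skill_id: str
    episode_id: str
    evidence: str
    weight: float = 1.0

# Two edges on the same skill node: the bare procedure now has a track record.
edges = [
    Edge("validate_first", "ep_017",
         "applied to ETL, validation before dedup caught 30% of bad records"),
    Edge("validate_first", "ep_042",
         "applied to log analysis, filtering before aggregation cut false positives by 40%"),
]
```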
The harder problem, as you say, isn't building edges; it's graph maintenance. Stale edges must decay, failed episodes should weaken their edges differently than successes do, and isolated nodes should expire. The graph has to self-organize around what's useful, not just accumulate.
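Something like this is the maintenance pass I'm picturing. The half-life, failure penalty, and pruning threshold are placeholder numbers, not tuned values:

```python
import time
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Edge:
    skill_id: str
    weight: float
    success: bool       # did the bound episode succeed?
    last_used: float    # unix timestamp of the last recall that touched it

HALF_LIFE = 30 * 24 * 3600   # assumption: weight halves after 30 days unused
FAILURE_PENALTY = 0.5        # assumption: failed episodes weaken twice as hard
PRUNE_BELOW = 0.05           # edges under this weight expire from the graph

def maintain(edges: List[Edge], now: Optional[float] = None) -> List[Edge]:
    """Decay stale edges, weaken failures harder than successes, drop dead weight."""
    now = time.time() if now is None else now
    kept = []
    for e in edges:
        e.weight *= 0.5 ** ((now - e.last_used) / HALF_LIFE)  # exponential time decay
        if not e.success:
            e.weight *= FAILURE_PENALTY                       # asymmetric weakening
        if e.weight >= PRUNE_BELOW:
            kept.append(e)
    return kept
```

Exponential decay plus a failure multiplier keeps recent successes dominant without deleting failures outright; they just fade faster.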
The binding is next: skill_id on Episode, episode_ids on Skill, and recall that returns nodes with their edges.
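Roughly this shape, with recall returning the node together with its edges. Again, a sketch of the plan, not shipped code:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Episode:
    episode_id: str
    outcome: str
    skill_id: Optional[str] = None   # new: back-reference to the skill it exercised

@dataclass
class Skill:
    skill_id: str
    instructions: str
    episode_ids: List[str] = field(default_factory=list)   # new: forward references

def recall(skill: Skill, episode_store: Dict[str, Episode]) -> Tuple[Skill, List[Episode]]:
    """Return the node with its edges: the procedure plus the evidence behind it."""
    evidence = [episode_store[e] for e in skill.episode_ids if e in episode_store]
    return skill, evidence
```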