Clive Wearing was a British musicologist. World-class conductor. In 1985 herpes encephalitis destroyed his hippocampus. The man plays Bach on the piano flawlessly - muscle memory intact, technique perfect, but doesn't recognize his wife when she leaves the room for thirty seconds. Every few minutes he writes in his diary: "NOW I am truly awake." Then crosses it out. Writes it again.
I think about Clive every time I open my agent logs. One of my OpenClaw agents has interacted with the same user 200+ times. The guy configured 15 automations. I personally fixed his billing issue on a Sunday evening. He came back from a trip last week, and the agent asked him for his timezone. For the fourth time.
My agents are Clive Wearing minus the love. The workflows execute. The automations fire. The cron jobs run on schedule. But between sessions, there's nobody home.
TL;DR: Current AI agent memory systems are databases pretending to be minds. Cognitive psychology (Conway, Damasio, Bruner) identified five components of human memory that nobody in AI implements. This article breaks down the five missing principles with concrete engineering analogs you can start building today. There's one you can ship tonight.

From n8n to OpenClaw: Same Failure, Better Architecture
Before OpenClaw, I had Telegram bots running through n8n with vector embeddings for memory. Six months of interactions with real users. And the retrieval was like opening a random drawer in a filing cabinet. The agent would pull fragments from three months ago that had nothing to do with the current conversation. Technically relevant by cosine similarity. Contextually insane. Like an NPC who responds to "where is the dungeon" with lore about a mushroom you picked in Act 1.
So I built an OpenClaw agent with a better architecture. I wrote a complete OpenClaw architecture guide: cron jobs, dashboard, memory layer included. The memory layer is the part that doesn't work.
It's the same problem with better plumbing. The data is there. The retrieval is faster. The chunks are cleaner. And the user still gets asked for his timezone after 200 messages. Working with these agents over months feels like managing an Alzheimer's patient. The procedural stuff runs fine. Just like Clive Wearing plays the piano. But there's zero emotional continuity, zero relational awareness. Between sessions, the lights are off.
My agents aren't worse than anyone else's. Letta, Mem0, Zep, every framework I tested does the same thing. They store data. They retrieve data. They call it memory. The plumbing is fine. The blueprint is broken.
And I only figured that out because of something unrelated to engineering.
I've always been obsessed with what consciousness actually is. Not the philosophy-major version. The mechanical version (I mean us...). What makes it so there's someone behind the eyes. And while debugging prompts on my terrace in Playa del Carmen, watching my agents forget people they've talked to hundreds of times, I realized the question I was asking about consciousness was the same question I was failing to answer in my code. Memory doesn't support consciousness. Memory IS the skeleton of consciousness. Conway, Damasio, Bruner - they all say the same thing from different angles. No structured memory, no self. No self, no continuity. No continuity, nobody home.
So I stopped reading GitHub docs and started reading psychology papers.
What Conway Figured Out in 2000 (and AI Still Ignores)
Martin Conway published the Self-Memory System in 2000. Updated it in 2005. It's the most cited framework in autobiographical memory research. He passed away in 2022 and left behind decades of work that maps exactly onto what our agents are missing.
Fair warning: I'm a dev, not a neuroscientist. I don't pretend to understand every mechanism Conway describes at the cellular level. But you don't need a PhD to see that his model maps directly onto what our agents lack. The engineering implications are what matter here.
Conway's core insight is that memory is not storage. It's reconstruction. Human memory is organized as a hierarchy: lifetime periods at the top ("when I lived in Thailand"), general events in the middle ("that month I was debugging the billing system"), specific episodes at the bottom ("the Sunday evening I fixed that billing issue"). When you recall something, your brain doesn't play back a recording. It reconstructs a memory from pieces across these levels, filtered by what Conway calls the "working self," your current goals, your active identity, your present situation.
This means the same event gets remembered differently depending on who you are right now. A job interview you had five years ago feels different when you're a manager hiring someone than when you're unemployed. The memory hasn't changed. Your self has. And your self reshapes the retrieval.
When my OpenClaw agent retrieves an embedding, none of this happens. It runs a cosine similarity on flat text.
No hierarchy. No goal-filtering. No reconstruction.
Every memory chunk sits at the same level, equally accessible regardless of context. It's SELECT * FROM memories ORDER BY similarity DESC LIMIT 5. That's not remembering. That's grepping.
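To make the critique concrete, here's what flat retrieval amounts to, stripped down to pure Python (the data shapes are invented for illustration, not any framework's actual schema). Every chunk is scored by cosine similarity alone. No hierarchy, no context, no salience.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flat_retrieve(query_vec, memories, k=5):
    # SELECT * FROM memories ORDER BY similarity DESC LIMIT k.
    # No hierarchy, no goal-filtering, no emotional weight: just grep.
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return ranked[:k]

memories = [
    {"text": "user asked about weather", "vec": [0.9, 0.1]},
    {"text": "billing crisis resolved Sunday", "vec": [0.1, 0.9]},
]
print(flat_retrieve([0.8, 0.2], memories, k=1)[0]["text"])
```

Whatever vector happens to sit closest wins, regardless of who is asking or why. That's the entire retrieval model.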
Klein and Nichols made this even more concrete in 2012: the self and memory bootstrap each other. You need a self to organize memories. You need organized memories to maintain a self. Remove either one and the whole thing collapses. Our agents have neither.
Then there's Rathbone and the reminiscence bump, the finding that humans disproportionately remember identity transitions. Your first job. Moving to a new country. The birth of a child. These moments anchor your timeline because they changed who you are. An agent that treats every interaction with equal weight violates this principle at the most basic level. The 200th message from a power user should not weigh the same as a one-off question from a stranger.
Conway gave us the blueprint. We built a filing cabinet.
Damasio and the Missing Emotional GPS
A kid was building a Lego set next to me on the terrace the other day. Four years old, zero instructions, vibes only. Basically the original vibe coder. He doesn't reason through which piece goes where. He grabs one, holds it near the structure, and either it feels right or it doesn't. If it doesn't, he drops it immediately. No analysis. No stack overflow search. Pure intuition trained by thousands of hours of play.
Antonio Damasio would call those somatic markers.
And this is the part engineers don't want to hear.
Engineers treat emotion as noise. Damasio's Somatic Marker Hypothesis, first laid out in 1994, proved it's the signal. Emotion is the shortcut that pre-filters your options before conscious reasoning even boots up. The Iowa Gambling Task nailed this experimentally: participants started avoiding bad card decks long before they could explain why. Their skin conductance response shifted first. The gut feeling arrived before the prefrontal cortex had time to open a JIRA ticket.
We like to think we're rational actors who sometimes get derailed by feelings. Damasio showed it's the other way around. Or wait, let me put it differently: he showed that feelings aren't the derailment. They're the rails. And Overskeid pushed it further in 2021, arguing that Damasio actually undersold his own theory. His paper title says it all: "Can Damasio's Somatic Marker Hypothesis Explain More Than Its Originator Will Admit?" Drawing on Hume: emotion doesn't just nudge you at the start of a decision. It rides shotgun the entire way. Reason is the slave of the passions. Always was.
Now think about what this means for agents.
My OpenClaw agent treats every interaction with the same emotional weight. Which is zero. The billing incident I resolved on a Sunday evening while everyone else was at the beach? Same retrieval priority as someone asking "what's the weather in Paris." A nurse who forgets whether the last surgery saved or killed the patient would lose her license. Our agents do this on every single query by default.
I don't think agents need subjective feelings. They don't need to experience anxiety or joy. But they need a salience signal (a fast marker that says "this matters, pay attention") that functions the way emotion does for humans. An automatic tag that says "this interaction mattered more than that one." Without it, every memory is equally flat, equally gray, equally forgettable.
Damasio proved emotion is the shortcut. We're building agents that take the long way every time.
The Five Missing Principles (With Engineering Analogs)
In December 2025, a team of 47 researchers published "Memory in the Age of AI Agents" on arXiv. Hit #1 on Hugging Face Daily Papers. 1,200+ GitHub stars. The ICLR 2026 MemAgents workshop in Rio this April is the first academic venue dedicated entirely to agent memory. The field is finally paying attention.
And it's still building filing cabinets.
The arXiv survey maps memory into forms, functions, and dynamics. Solid engineering taxonomy. But all three axes describe what gets stored and how it gets retrieved. None address why certain memories matter more than others. None mention identity construction. None reference Conway. Letta gives you self-editing memory blocks. Mem0 gives you vector search plus graph relationships. Zep gives you a temporal knowledge graph. All three are real engineering achievements. All three treat memory as data to retrieve, not identity to construct.
Jerome Bruner argued that narrative is the fundamental instrument of human thought. We remember in stories. Bruner figured that out decades ago, and none of these systems generate stories.
So here are the five principles cognitive psychology nailed decades ago that no agent memory framework implements. Each with what the research says, what our agents do instead, and what it would take to fix it.
1. Temporal hierarchy
Conway organizes autobiographical memory into three levels: lifetime periods, general events, specific episodes. Your brain doesn't dump everything into one flat timeline. It nests experiences inside contexts inside eras.
What our agents have instead: a vector store where every chunk sits at the same level. A message from yesterday and a message from six months ago are equally flat nodes in the same embedding space.
The graph database fix is almost obvious once you see it. Interactions become nodes in a hierarchical graph: session level, project level, relationship level. When the agent recalls something, it traverses levels instead of running cosine similarity on a flat index. A returning user first resolves to the relationship level (power user, 15 automations, billing history), then drills into specific episodes if needed. The retrieval path mirrors how you actually remember a person. You don't recall every conversation, you recall who they are and then zoom in.
I tested this partially with Neo4j on a side branch of OpenClaw. Even a crude two-level hierarchy (user-level summary + episode nodes) cut irrelevant retrievals by roughly half. Not scientific. But noticeable enough that users stopped getting asked the same questions.
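The two-level version can be sketched in a few lines. This is illustrative Python, not my actual Neo4j branch; the `users` dict stands in for user-level summary nodes and their child episode nodes.

```python
# A minimal sketch of two-level hierarchical recall: resolve the
# relationship-level summary first, then drill into episode nodes.
# Data shapes are invented for illustration.

users = {
    "phil": {
        "summary": "power user, 15 automations, billing history",
        "episodes": [
            {"topic": "billing", "text": "Sunday evening billing fix"},
            {"topic": "onboarding", "text": "asked for timezone once, never again"},
        ],
    }
}

def recall(user_id, topic=None):
    # Level 1: who is this person? Always returned first.
    node = users[user_id]
    context = [node["summary"]]
    # Level 2: zoom into specific episodes only when the current
    # conversation gives a reason to.
    if topic:
        context += [e["text"] for e in node["episodes"] if e["topic"] == topic]
    return context

print(recall("phil", topic="billing"))
```

The retrieval path mirrors human recall: "who they are" resolves before any specific episode does.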
2. Goal-filtering
Conway's "working self" actively filters what memories are accessible based on current goals. You don't remember everything. You remember what's relevant to what you're doing right now. But our agents don't do this. The embedding query is static. The same vector returns the same chunks regardless of whether the agent is debugging, onboarding, or handling a complaint.
So you need a pre-prompt layer that reshapes the retrieval query based on the agent's current context. Before searching memory, the agent asks itself "what do I need to know given what I'm doing right now." If the user is asking about billing, the query gets rewritten to prioritize billing-related memories. If they're setting up a new automation, the query shifts to their technical preferences. This is essentially what Prompt Contracts do at the code level. The agent negotiates what it needs to know before executing.
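A toy version of that pre-prompt layer looks like this. The mode-to-hints mapping is invented for illustration; in a real agent, an LLM call would do the rewriting instead of a lookup table.

```python
# A sketch of goal-filtered retrieval: the memory query gets reshaped
# by the agent's current mode before it ever hits the vector store.
# GOAL_HINTS is a hypothetical stand-in for an LLM rewriting step.

GOAL_HINTS = {
    "billing": "invoices payment subscription refunds",
    "onboarding": "preferences stack experience timezone",
    "debugging": "errors logs stack traces past incidents",
}

def rewrite_query(raw_query: str, agent_mode: str) -> str:
    # Conway's "working self" in miniature: the same question
    # retrieves different memories depending on what the agent
    # is doing right now.
    hints = GOAL_HINTS.get(agent_mode, "")
    return f"{raw_query} {hints}".strip()

print(rewrite_query("why was I charged twice?", "billing"))
```

Same user question, different retrieval surface depending on context. That's the whole point: the query is no longer static.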
3. Emotional weighting
Already covered this with Damasio, so I'll keep it short. The Sunday-evening billing crisis weighs more than the timezone question. Every current memory framework treats them identically. importance: undefined.
The fix is a sentiment_score FLOAT computed at write time. Derive it from tone analysis, interaction type (complaint vs. casual question), urgency signals, resolution status. The retrieval pipeline multiplies relevance by this score. I prototyped this with a simple 1-5 scale derived from keyword matching (words like "urgent," "broken," "frustrated" push the score up). Crude. But even that crude version changed the retrieval order enough that a returning user's first response felt less like talking to a stranger.
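The crude keyword version I prototyped looks roughly like this (the word list and weights are made up; swap in real tone analysis when you have it):

```python
# A 1-5 salience score computed at write time, multiplied into
# retrieval relevance. Keyword weights are illustrative placeholders
# for proper tone/urgency analysis.

URGENCY_WORDS = {"urgent": 2, "broken": 1, "frustrated": 1, "crisis": 2}

def salience(text: str) -> int:
    score = 1  # baseline: casual interaction
    for word, weight in URGENCY_WORDS.items():
        if word in text.lower():
            score += weight
    return min(score, 5)

def rank(candidates, similarity):
    # similarity: precomputed cosine score per candidate (0..1).
    # Relevance is similarity scaled by emotional weight.
    return sorted(candidates, key=lambda m: similarity[m] * salience(m), reverse=True)

memories = ["what's the weather in Paris", "URGENT: billing broken, user frustrated"]
sims = {memories[0]: 0.9, memories[1]: 0.7}
print(rank(memories, sims)[0])
```

Even with the lower raw similarity, the billing crisis outranks the weather question. That reordering is the entire effect I saw in practice.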
4. Narrative layer
This is the one that keeps me up at night. Bruner says we organize experience in narrative form. Not in JSON. Not in knowledge graphs. In stories with characters and arcs and turning points. And right now every agent memory system stores structured logs, extractive summaries, entity-relationship tuples. Accurate. Soulless.
What I want is a cron job, daily or weekly, that generates a narrative summary per user or per project. Not extractive. Narrative.
"This user came back three times about the billing issue last week. Each time more frustrated. Resolved it Sunday evening. He configured two new automations the next day. Been quiet since. Probably means it worked."
This summary gets injected into the context at the next interaction. The agent doesn't just know facts about the user. It knows the user's story.
But this is also the hardest principle to implement well. A cron job that hallucinates narratives about your users is worse than no narratives at all. I haven't cracked this one yet. The generation needs to be grounded strictly in interaction logs, with a verification step. Still working on it.
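I haven't solved this, but here's the shape I keep converging on: the generator only gets to use facts that exist in the interaction log, and a verification pass rejects anything it can't trace back. A hypothetical sketch; `generate_narrative` would be an LLM call in practice, and the fact format here is invented.

```python
# Grounding check for narrative summaries: every claimed fact must
# appear in the interaction log, or it gets dropped. A cron job that
# hallucinates stories about users is worse than no stories.

log = [
    {"day": "Mon", "event": "billing complaint"},
    {"day": "Wed", "event": "billing complaint"},
    {"day": "Sun", "event": "billing resolved"},
]

def grounded_facts(log):
    return {f"{e['day']}: {e['event']}" for e in log}

def verify(claimed_facts, log):
    # Keep only facts traceable to logged events.
    allowed = grounded_facts(log)
    return [f for f in claimed_facts if f in allowed]

# Imagine an LLM produced these three facts for its narrative:
claimed = ["Mon: billing complaint", "Sun: billing resolved", "Fri: praised the agent"]
print(verify(claimed, log))  # the hallucinated Friday praise is dropped
```

The open problem is the step before this: getting the model to emit facts in a checkable form without flattening the story back into a log.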
5. Strategic forgetting
Most counterintuitive one. Forgetting isn't a bug. It's a feature.
The brain actively prunes memories that are obsolete, contradictory, or no longer relevant to the current self. Conway calls this maintaining "self-coherence." Without pruning, old memories pollute current reasoning. And every agent memory system I've seen is append-only. Nothing gets deleted. Six-month-old preferences contradict current ones. Outdated context competes with fresh context during retrieval.
You need automated pruning with a decay score. Age times access frequency times relevance to current goals. Memories that haven't been accessed in months and don't connect to any active project get archived, then deleted. A garbage collector for the mind. Java got this right in 1995 and we're still running append-only logs in 2026. Your agent doesn't need to remember that a user was in UTC-5 if he's moved to UTC+1. The old fact actively hurts if it sticks around.
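The decay score is straightforward enough to sketch directly. Threshold and weights below are placeholders; tune them against real logs.

```python
# A sketch of decay-based pruning: access frequency times goal
# relevance, divided by age. Threshold is an illustrative placeholder.
import time

WEEK = 7 * 24 * 3600

def decay_score(memory, now):
    age_weeks = max((now - memory["last_access"]) / WEEK, 1.0)
    # Old + rarely accessed + unlinked to active goals -> low score.
    return (memory["access_count"] * memory["goal_relevance"]) / age_weeks

def prune(memories, now, threshold=0.5):
    keep, archive = [], []
    for m in memories:
        (keep if decay_score(m, now) >= threshold else archive).append(m)
    return keep, archive

now = time.time()
memories = [
    {"fact": "user in UTC+1", "last_access": now - WEEK, "access_count": 40, "goal_relevance": 1.0},
    {"fact": "user in UTC-5", "last_access": now - 26 * WEEK, "access_count": 3, "goal_relevance": 0.1},
]
keep, archive = prune(memories, now)
print([m["fact"] for m in archive])  # the stale timezone gets garbage-collected
```

Archive first, delete later: keeping a cold tier for a grace period makes the inevitable bad pruning decision recoverable.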
Honest status: principles 1, 3, and 5 are testable this week. Graph epochs, sentiment scoring, and decay pruning are straightforward engineering. Principle 2 (goal-filtering) needs careful prompt design. Principle 4 (narrative generation) needs serious work to avoid hallucination. This framework is a direction, not a finished product.
What You Can Do Tonight
You don't need a graph database to start. You need ten minutes and a text editor.
If you use Claude Code, open your CLAUDE.md right now. Add a section called ## Who I Am To This Agent. Don't write a config file. Write a paragraph. Not this:
```
timezone: UTC-5
language: EN
experience: senior
```
This:
```
Phil is a dev/devops based in Playa del Carmen who builds AI automations
daily. He's been working with Claude Code for 8+ months. He gets
frustrated when tools forget context between sessions. He cares about
shipping fast and hates unnecessary abstractions. When he asks a
question, he usually already tried the obvious solution and it didn't work.
```
That's principle #4, the narrative layer, applied at the simplest possible level. The agent doesn't just know facts about you. It knows your story. Even a two-paragraph story changes how the model responds.
If you build agents, open your system prompt or soul file. Add a field called relationship_summary and update it at the end of every session:
```markdown
## Session Evaluation & Memory Update Rules
At the end of each session, before entering standby mode, you must evaluate the interaction state.
Use your file editing tools to silently update the user profile file with a "relationship summary".
You must extract, synthesize, and record the following exact data points:
- total_sessions: [Increment the known session count]
- trust_level: [Determine trust level derived from interaction history, e.g., low, medium, high]
- last_interaction: [Brief summary of the completed task, e.g., "billing escalation, resolved"]
- emotional_tone: [User's final emotional state, e.g., "tense but grateful after fix"]
- next_likely_need: [Predictive analysis of the next required task, e.g., "new automation setup"]
Do not output this summary to the user interface. This is strictly for internal context persistence.
```
Not a log. A one-sentence narrative. "This user has been here 47 times. He trusts us with billing. Last interaction was tense but resolved." That's your minimum viable memory identity. Your agent's version of ~/.bashrc. Except it remembers who it's talking to, not just how to alias ls.
This won't solve the memory problem. It's a band-aid on a broken architecture. But it's a band-aid that makes your agent feel dramatically more human in ten minutes.
The best memory system is the one you ship tonight. The perfect one is the one nobody builds.
Why We Keep Building Filing Cabinets
The reason the industry keeps ignoring psychology is simple. Engineers read docs, not journals. "Memory" in CS means RAM and cache invalidation, not identity and narrative. And the benchmarks (LoCoMo, LongMemEval) measure retrieval accuracy, not identity coherence. You optimize what you measure. If your test suite only checks "did it find the right chunk," congrats, you've built a very expensive search engine.
The real test of agent memory isn't "did it retrieve the correct fact." It's "does the user feel known." There's a gap between a server that stores your photo and a friend who knows why you laugh at that joke. Every current memory framework lives on the server side of that gap.
Conway didn't build a database. He described a self. That's the part we're missing.
I write about what I build, break, and fix with AI agents. No theory without code, no code without scars. Follow if you want the engineering details nobody puts in the docs.