<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Timur Fatykhov</title>
    <description>The latest articles on DEV Community by Timur Fatykhov (@tfatykhov).</description>
    <link>https://dev.to/tfatykhov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3800957%2Ff88433ce-06a4-4f1c-962f-6e8cd8373599.jpg</url>
      <title>DEV Community: Timur Fatykhov</title>
      <link>https://dev.to/tfatykhov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tfatykhov"/>
    <language>en</language>
    <item>
      <title>A Vector Store Is Not an Agent Memory System</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Mon, 30 Mar 2026 22:56:22 +0000</pubDate>
      <link>https://dev.to/tfatykhov/a-vector-store-is-not-an-agent-memory-system-4f50</link>
      <guid>https://dev.to/tfatykhov/a-vector-store-is-not-an-agent-memory-system-4f50</guid>
      <description>&lt;p&gt;Your agent does not have memory just because it can retrieve old text.&lt;/p&gt;

&lt;p&gt;That is probably one of the biggest misconceptions in agent engineering right now. I maintain a curated research list of 25+ papers on agent memory systems and build with these ideas in my own agent work. The pattern I keep seeing is simple: teams equate retrieval with memory, and that shortcut breaks down fast once agents have to operate across time.&lt;/p&gt;

&lt;p&gt;Here is the gap at a glance:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ Retrieval (what most teams build first)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store everything as chunks
&lt;/li&gt;
&lt;li&gt;Embed and retrieve top-k by similarity
&lt;/li&gt;
&lt;li&gt;Prepend results to the prompt
&lt;/li&gt;
&lt;li&gt;Hope the model uses them well
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;✅ Memory (what production agents need)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gate what enters storage
&lt;/li&gt;
&lt;li&gt;Separate episodes from durable knowledge
&lt;/li&gt;
&lt;li&gt;Merge recurring patterns into reusable facts
&lt;/li&gt;
&lt;li&gt;Prune stale details before they pollute retrieval
&lt;/li&gt;
&lt;li&gt;Measure whether memory actually helps
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gap is where a lot of agent systems quietly fall apart. It is also where some of the most interesting work is happening right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  What developers usually build first
&lt;/h2&gt;

&lt;p&gt;Most teams start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The "memory" system every tutorial teaches you
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# On every turn:
&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have built this version myself. It works well for a while. It is simple, practical, and easy to ship.&lt;/p&gt;

&lt;p&gt;But once the agent runs longer, works across multiple tasks, or needs stable behavior over time, problems start piling up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Irrelevant memories keep coming back
&lt;/li&gt;
&lt;li&gt;Useful details get buried under noise
&lt;/li&gt;
&lt;li&gt;The prompt grows without getting smarter
&lt;/li&gt;
&lt;li&gt;Contradictions accumulate silently
&lt;/li&gt;
&lt;li&gt;The system never learns what to forget
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not really memory. It is unstructured recall.&lt;/p&gt;




&lt;h2&gt;
  
  
  What production memory actually needs
&lt;/h2&gt;

&lt;p&gt;In practice, agent memory needs several layers of intelligence around storage and retrieval. Here are five that matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Admission control
&lt;/h3&gt;

&lt;p&gt;Not every event deserves to become memory.&lt;/p&gt;

&lt;p&gt;I learned this the hard way. A useful memory system needs a gate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What admission control actually looks like
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;importance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;score_importance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;             &lt;span class="c1"&gt;# Was this consequential?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;novelty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;score_novelty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_memory&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Is this genuinely new?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reusability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;score_reusability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;            &lt;span class="c1"&gt;# Will this matter again?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consistency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;check_contradictions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_memory&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;durability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;estimate_shelf_life&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;          &lt;span class="c1"&gt;# How long is this relevant?
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;weighted_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ADMISSION_THRESHOLD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just a nice idea. Workday AI’s &lt;strong&gt;A-MAC&lt;/strong&gt; framework (&lt;a href="https://arxiv.org/abs/2603.04549" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.04549&lt;/a&gt;) operationalizes the same basic principle with a five-factor admission model that scores candidate memories before they enter long-term storage.&lt;/p&gt;

&lt;p&gt;Without admission control, memory becomes a junk drawer.&lt;/p&gt;
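&lt;p&gt;The scorer functions in the gate above are left abstract on purpose. As a rough illustration only, here is what naive keyword-and-duplicate heuristics might look like; every scorer, weight, and threshold below is an assumption made for this sketch, not part of A-MAC:&lt;/p&gt;

```python
# Naive heuristic version of the admission gate above. Every scorer,
# weight, and threshold here is an assumption for illustration; real
# systems would likely use an LLM judge or embedding similarity.

ADMISSION_THRESHOLD = 0.5

def score_importance(text: str) -> float:
    # Crude proxy: decisions, errors, and preferences are consequential.
    keywords = ("decided", "error", "prefer", "always", "never")
    return 1.0 if any(k in text.lower() for k in keywords) else 0.3

def score_novelty(text: str, existing: list) -> float:
    # Crude proxy: exact-duplicate check. A real system would compare
    # embeddings against the nearest stored memories.
    return 0.0 if text in existing else 1.0

def should_remember(text: str, existing: list) -> bool:
    scores = {
        "importance": score_importance(text),
        "novelty": score_novelty(text, existing),
    }
    weights = {"importance": 0.6, "novelty": 0.4}
    total = sum(weights[k] * v for k, v in scores.items())
    return total > ADMISSION_THRESHOLD

stored = ["user said hi"]
print(should_remember("User decided to deploy on Fridays", stored))  # True
print(should_remember("user said hi", stored))  # False: duplicate, low value
```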




&lt;h3&gt;
  
  
  2. Consolidation
&lt;/h3&gt;

&lt;p&gt;Raw events should not all stay raw forever.&lt;/p&gt;

&lt;p&gt;Some information should be merged into higher-level knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated user preferences → a stable profile
&lt;/li&gt;
&lt;li&gt;recurring operational patterns → reusable procedures
&lt;/li&gt;
&lt;li&gt;multiple related events → one summary with links back to sources
&lt;/li&gt;
&lt;li&gt;successful action sequences → learned policies
&lt;/li&gt;
&lt;/ul&gt;
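&lt;p&gt;As a minimal sketch of the first two bullets, consolidation can be as simple as grouping repeated claims and promoting them to one durable fact with provenance links back to its source episodes. The record shapes and the two-occurrence threshold here are illustrative assumptions:&lt;/p&gt;

```python
# Minimal consolidation sketch: group repeated claims and promote them
# to one durable fact with provenance links back to source episodes.
# The record shapes and two-occurrence threshold are assumptions.
from collections import defaultdict

def consolidate(episodes: list, min_occurrences: int = 2) -> list:
    groups = defaultdict(list)
    for ep in episodes:
        groups[ep["claim"]].append(ep["id"])
    return [
        {"type": "semantic", "claim": claim, "sources": sources}
        for claim, sources in groups.items()
        if len(sources) >= min_occurrences   # recurring patterns only
    ]

episodes = [
    {"id": "e1", "claim": "user prefers Python"},
    {"id": "e2", "claim": "user prefers Python"},
    {"id": "e3", "claim": "user asked about Rust"},
]
print(consolidate(episodes))
# one semantic fact for the repeated preference, sourced to e1 and e2
```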

&lt;p&gt;Human memory does this naturally through consolidation. Agent systems usually do not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A-MEM&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2502.12110" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2502.12110&lt;/a&gt;) moves in this direction with dynamic note evolution: memories can be linked, updated, and reorganized over time instead of only accumulating as flat records.&lt;/p&gt;

&lt;p&gt;That shift matters. A memory system should not just collect history. It should reshape history into something reusable.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Forgetting
&lt;/h3&gt;

&lt;p&gt;Forgetting is not a bug. It is part of intelligence.&lt;/p&gt;

&lt;p&gt;This was counterintuitive to me at first. A memory system that never forgets becomes noisy, expensive, and brittle. Some details should decay. Some should be archived. Some should be overwritten. Some should remain permanent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Strategic forgetting — not deleting blindly, but managing memory over time
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forget_cycle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relevance&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="nf"&gt;decay_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;access_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relevance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;ARCHIVE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relevance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;PRUNE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;superseded_by&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;superseded_by&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recent work on structured forgetting suggests that retaining everything can actively degrade retrieval under interference, while selective forgetting can improve long-horizon behavior. &lt;strong&gt;SleepGate&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2603.14517" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.14517&lt;/a&gt;) is one of the more striking recent examples, proposing selective eviction, compression, and consolidation mechanisms to reduce interference from stale context.&lt;/p&gt;

&lt;p&gt;The hard problem is not remembering more. It is remembering the right things for the right duration.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Hierarchy
&lt;/h3&gt;

&lt;p&gt;Not all memory is the same.&lt;/p&gt;

&lt;p&gt;Useful agent systems often need multiple memory types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What it holds&lt;/th&gt;
&lt;th&gt;Lifespan&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Active task context&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Past events, conversations&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Distilled facts, preferences&lt;/td&gt;
&lt;td&gt;Months to permanent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Learned skills, workflows&lt;/td&gt;
&lt;td&gt;Permanent until revised&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When everything is stored as flat text chunks, the system loses structure.&lt;/p&gt;
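&lt;p&gt;One low-cost way to keep that structure is to type memory records explicitly. Here is a minimal sketch; the field names and default lifespans are purely illustrative assumptions, not a standard:&lt;/p&gt;

```python
# Typed memory records mirroring the table above. Field names and
# default lifespans are illustrative assumptions, not a standard.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MemoryType(Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"

# Rough default lifespans in seconds; None = permanent until revised.
DEFAULT_TTL = {
    MemoryType.WORKING: 15 * 60,          # minutes
    MemoryType.EPISODIC: 14 * 24 * 3600,  # days to weeks
    MemoryType.SEMANTIC: None,            # months to permanent
    MemoryType.PROCEDURAL: None,          # permanent until revised
}

@dataclass
class MemoryRecord:
    kind: MemoryType
    content: str
    ttl_seconds: Optional[int] = None

    def __post_init__(self):
        if self.ttl_seconds is None:
            self.ttl_seconds = DEFAULT_TTL[self.kind]

rec = MemoryRecord(MemoryType.EPISODIC, "Discussed the deploy failure")
print(rec.kind.value, rec.ttl_seconds)  # episodic 1209600
```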

&lt;p&gt;The survey &lt;strong&gt;Memory in the Age of AI Agents&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2512.13564" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2512.13564&lt;/a&gt;) does not argue for one single canonical taxonomy, but it clearly shows the field moving beyond the idea that all memory is just retrieval. The direction is toward more differentiated memory forms, functions, and dynamics.&lt;/p&gt;

&lt;p&gt;That is a healthier framing than “just add a vector store.”&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Evaluation
&lt;/h3&gt;

&lt;p&gt;This is the part many teams skip. I did too, for longer than I should have.&lt;/p&gt;

&lt;p&gt;You cannot improve memory if your only metric is: &lt;em&gt;retrieval seemed okay in this demo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You need to evaluate questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did memory improve downstream decisions?
&lt;/li&gt;
&lt;li&gt;Did it reduce context cost?
&lt;/li&gt;
&lt;li&gt;Did it help over long horizons?
&lt;/li&gt;
&lt;li&gt;Did it preserve critical constraints?
&lt;/li&gt;
&lt;li&gt;Did it surface stale or misleading information?
&lt;/li&gt;
&lt;/ul&gt;
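&lt;p&gt;Most of these questions only have answers relative to a baseline, which suggests an ablation harness: run the same tasks with memory on and off and compare success and token cost. A minimal sketch, where &lt;code&gt;run_task&lt;/code&gt; is a stand-in (an assumption) for your real agent harness:&lt;/p&gt;

```python
# Ablation sketch: run the same tasks with memory enabled and disabled
# and compare outcomes. `run_task` is a stand-in for a real agent
# harness (an assumption); `fake_run` below just makes this runnable.

def evaluate_memory(tasks, run_task):
    results = {"with": [], "without": []}
    for task in tasks:
        results["with"].append(run_task(task, memory_enabled=True))
        results["without"].append(run_task(task, memory_enabled=False))

    def summarize(runs):
        n = len(runs)
        return {
            "success_rate": sum(r["success"] for r in runs) / n,
            "avg_prompt_tokens": sum(r["prompt_tokens"] for r in runs) / n,
        }

    return {k: summarize(v) for k, v in results.items()}

# Toy stand-in harness so the sketch runs end to end:
def fake_run(task, memory_enabled):
    return {"success": memory_enabled or task == "easy",
            "prompt_tokens": 800 if memory_enabled else 3000}

report = evaluate_memory(["easy", "hard"], fake_run)
print(report["with"]["success_rate"], report["without"]["success_rate"])
```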

&lt;p&gt;&lt;strong&gt;StructMemEval&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2602.11243" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2602.11243&lt;/a&gt;) is one of the first focused attempts to benchmark whether agents can organize memory into useful structures rather than just retrieve isolated facts.&lt;/p&gt;

&lt;p&gt;That is an uncomfortable but necessary shift. A lot of memory systems still look stronger in architecture diagrams than in measured outcomes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The economics are real too
&lt;/h2&gt;

&lt;p&gt;There is also a practical cost argument here.&lt;/p&gt;

&lt;p&gt;A March 2026 analysis, &lt;strong&gt;Memory Systems or Long Contexts? Comparing LLM Approaches to Factual Recall from Prior Conversations&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2603.04814" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.04814&lt;/a&gt;), compared a fact-based memory system against long-context LLM inference.&lt;/p&gt;

&lt;p&gt;The result was more nuanced than “memory always wins.” Long-context GPT-5-mini achieved higher factual recall on some benchmarks, but the memory system had a much flatter per-turn cost curve and became the cheaper option after roughly 10 turns, by which point the accumulated context had grown to roughly 100k tokens.&lt;/p&gt;
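&lt;p&gt;The shape of that result is easy to reproduce with back-of-envelope numbers. The token counts below are made up for illustration (they are not the paper's figures); the point is only that resending history grows linearly per turn while a distilled-memory prompt stays roughly flat, so cumulative cost crosses over:&lt;/p&gt;

```python
# Back-of-envelope crossover. The token counts are made up for
# illustration (NOT the paper's figures); only the shape matters:
# resent history grows linearly per turn, a distilled memory prompt
# stays roughly flat.

HISTORY_PER_TURN = 1_000      # new history tokens per turn (assumption)
MEMORY_PROMPT_TOKENS = 5_000  # flat distilled-memory prompt (assumption)

def long_context_tokens(turn: int) -> int:
    # Turn t resends all history accumulated so far.
    return turn * HISTORY_PER_TURN

def memory_tokens(turn: int) -> int:
    # Retrieval keeps the prompt size roughly constant.
    return MEMORY_PROMPT_TOKENS

def find_crossover(max_turns: int = 1000) -> int:
    """First turn at which cumulative memory-system tokens undercut
    cumulative long-context tokens (-1 if never within max_turns)."""
    mem = ctx = 0
    for t in range(1, max_turns + 1):
        mem += memory_tokens(t)
        ctx += long_context_tokens(t)
        if mem < ctx:
            return t
    return -1

print(find_crossover())  # 10 with these illustrative numbers
```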

&lt;p&gt;That means good memory design is not just an architectural choice. It is also a cost-shaping decision, especially once agents start accumulating enough history that long-context inference becomes expensive turn after turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to go deeper
&lt;/h2&gt;

&lt;p&gt;The industry is moving from “chat with tools” toward agents that operate over time. That changes the problem fundamentally.&lt;/p&gt;

&lt;p&gt;Short-lived chat interactions can get away with context stuffing. Long-lived agents cannot.&lt;/p&gt;

&lt;p&gt;I maintain a curated list of 25+ papers covering these areas:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;👉 awesome-agent-memory&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/tfatykhov/awesome-agent-memory" rel="noopener noreferrer"&gt;https://github.com/tfatykhov/awesome-agent-memory&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is organized by mechanism: admission, consolidation, forgetting, retrieval, evaluation, and cognitive or neuro-inspired memory. Venue metadata is verified where possible. Self-reported claims are flagged. My own synthesis is separated from the source material.&lt;/p&gt;

&lt;p&gt;This is not another generic awesome-list. It is organized around a simple thesis: &lt;strong&gt;memory is an engineering discipline, not a retrieval trick.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I also build with these ideas in &lt;strong&gt;Nous&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://github.com/tfatykhov/nous" rel="noopener noreferrer"&gt;https://github.com/tfatykhov/nous&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some of the ideas worked. Some of them failed. The wins went into the design. The failures went into the curation.&lt;/p&gt;

&lt;p&gt;If you are building agents that need to run longer than a single conversation, memory is probably the next systems problem you are going to hit.&lt;/p&gt;

&lt;p&gt;And if that is the problem you are hitting, the research is finally getting good enough to help.&lt;/p&gt;




&lt;p&gt;If you find the list useful, a ⭐ on the repo helps more people discover it. PRs are welcome, especially if there are papers I missed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aimemory</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Your AI Agent Doesn't Think. It Guesses. Here's What Thinking Actually Looks Like.</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Sun, 22 Mar 2026 05:28:12 +0000</pubDate>
      <link>https://dev.to/tfatykhov/your-ai-agent-doesnt-think-it-guesses-heres-what-thinking-actually-looks-like-1k80</link>
      <guid>https://dev.to/tfatykhov/your-ai-agent-doesnt-think-it-guesses-heres-what-thinking-actually-looks-like-1k80</guid>
      <description>&lt;p&gt;Every enterprise is racing to deploy AI agents. Most of them have the same fatal flaw: they're goldfish with PhDs.&lt;/p&gt;

&lt;p&gt;They can solve brilliant problems in the moment — then forget everything the second the conversation ends. No memory of past decisions. No learning from mistakes. No institutional knowledge. Every interaction starts from zero.&lt;/p&gt;

&lt;p&gt;That's not intelligence. That's autocomplete with a budget.&lt;/p&gt;

&lt;p&gt;I built something different. I built an AI that actually thinks. I call it &lt;strong&gt;Nous&lt;/strong&gt; — and the architecture that makes it possible is called &lt;strong&gt;FORGE&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tqxmi1b80oq42kydey1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tqxmi1b80oq42kydey1.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 40-Year-Old Blueprint Silicon Valley Forgot
&lt;/h2&gt;

&lt;p&gt;In 1986, Marvin Minsky — one of the founding fathers of AI — published &lt;em&gt;The Society of Mind&lt;/em&gt;. His thesis was radical and simple: intelligence isn't one thing. It's a society of specialized agents working together.&lt;/p&gt;

&lt;p&gt;The AI industry ignored this for decades, chasing bigger models instead of better architectures. Why? Because the monolithic approach — one giant model doing everything — kept hitting the same wall. In classical AI, it's called the &lt;em&gt;Frame Problem&lt;/em&gt;: a single system can't efficiently decide what's relevant in every situation. It either considers too little context and makes blind decisions, or considers too much and grinds to a halt. Bigger models masked this problem with brute force, but they never solved it.&lt;/p&gt;

&lt;p&gt;I went back to Minsky. And I built &lt;strong&gt;FORGE&lt;/strong&gt; — &lt;em&gt;Fetch, Orient, Resolve, Go, Extract&lt;/em&gt; — a cognitive architecture that treats AI cognition the way it actually works: not as a single monolithic brain, but as an organized society of mental organs, each with a clear job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cognition-engines.ai/nous.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Nous&lt;/strong&gt;&lt;/a&gt; is the first agent built on &lt;a href="https://cognition-engines.ai/#forge-the-decision-loop" rel="noopener noreferrer"&gt;FORGE&lt;/a&gt;. It's the living proof that this architecture works — an AI mind that remembers, learns, governs itself, and gets better with every interaction. Without retraining.&lt;/p&gt;

&lt;p&gt;Think of it this way: &lt;strong&gt;FORGE&lt;/strong&gt; is the blueprint. &lt;strong&gt;Nous&lt;/strong&gt; is the mind built from it. &lt;strong&gt;Cognition Engines&lt;/strong&gt; is where both were forged.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Organs. One Mind.
&lt;/h2&gt;

&lt;p&gt;At the core of FORGE are two primary cognitive organs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Heart&lt;/strong&gt; — This is memory. Not a simple database, but five distinct memory types working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Episodic Memory&lt;/strong&gt; — What happened. Conversations, events, context. The agent's autobiography.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Memory&lt;/strong&gt; — Facts. Verified knowledge extracted from every interaction, tagged with confidence scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural Memory&lt;/strong&gt; — Skills. Reusable capabilities that activate automatically when the task demands them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Censors&lt;/strong&gt; — Learned guardrails. Things the system has learned it must never do, should warn about, or must absolutely block. These aren't just prompt instructions — they're architecturally enforced constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Working Memory&lt;/strong&gt; — The scratchpad. What's relevant right now, assembled from all other memory types.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Brain&lt;/strong&gt; — This is decision intelligence. Every significant decision the agent makes is recorded with its reasoning, confidence level, supporting evidence, and outcome. Over time, this creates an auditable decision log that the agent uses to calibrate itself.&lt;/p&gt;

&lt;p&gt;Think of it this way: the Heart remembers. The Brain decides. Together, they give Nous its mind.&lt;/p&gt;
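&lt;p&gt;A decision-log entry of the kind described above might be sketched like this; the field names and the calibration metric are illustrative assumptions, not Nous's actual schema:&lt;/p&gt;

```python
# Sketch of a Brain-style decision-log entry. Field names and the
# calibration metric are illustrative assumptions, not Nous's schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionRecord:
    decision: str
    reasoning: str
    confidence: float              # 0.0-1.0, stated at decision time
    evidence: list = field(default_factory=list)
    outcome: Optional[str] = None  # filled in later; enables calibration

def calibration_error(log: list) -> float:
    """Mean gap between stated confidence and actual success, over
    decisions whose outcome is known. Lower means better calibrated."""
    scored = [(r.confidence, 1.0 if r.outcome == "success" else 0.0)
              for r in log if r.outcome is not None]
    if not scored:
        return 0.0
    return sum(abs(c - s) for c, s in scored) / len(scored)

log = [
    DecisionRecord("use cache", "hot path", 0.9, ["perf trace"], "success"),
    DecisionRecord("retry API", "transient error", 0.8, [], "failure"),
]
print(round(calibration_error(log), 2))  # 0.45
```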

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpleri1vb5igaro5lan9w.png" alt=" " width="800" height="533"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Cognitive Loop: How FORGE Actually Processes a Task
&lt;/h2&gt;

&lt;p&gt;Every interaction follows a structured cognitive loop — this is the core of what makes a FORGE-based agent like Nous fundamentally different from a stateless chatbot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Sense&lt;/strong&gt; — Receive input and extract intent&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Frame&lt;/strong&gt; — Match the cognitive mode to the task type (research? debugging? conversation? decision?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Framing isn't cosmetic — it changes &lt;strong&gt;how the agent thinks&lt;/strong&gt;. In &lt;strong&gt;Debugging&lt;/strong&gt; mode, Nous prioritizes Procedural Memory (known fixes and diagnostic skills) and relies on a systematic elimination approach. In &lt;strong&gt;Research&lt;/strong&gt; mode, it shifts the weight to Semantic Memory (accumulated facts and prior findings) and casts a wider recall net. A &lt;strong&gt;Decision&lt;/strong&gt; frame activates the Brain's decision log, pulling similar past decisions and their outcomes to inform the current choice. This means the same question gets a fundamentally different cognitive approach depending on context — just like a human expert switches mental gears between troubleshooting and strategic planning.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Recall&lt;/strong&gt; — Search across all memory types for relevant context, past decisions, and applicable skills&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Deliberate&lt;/strong&gt; — Reason through the problem using retrieved context&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Act&lt;/strong&gt; — Execute with the right tools and capabilities&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Monitor&lt;/strong&gt; — Track what happened, verify claims against the execution record, and flag discrepancies&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Learn&lt;/strong&gt; — Extract facts, record decisions, update memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a gimmick. It's what allows any agent built on FORGE to compound its effectiveness over time, rather than resetting every session. Nous has been running this loop in production, and the difference is night and day.&lt;/p&gt;
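&lt;p&gt;Stripped to its shape, the loop can be sketched in a few lines. Every stage below is a toy stand-in (an assumption, not Nous's implementation); what matters is that the Learn step writes back to memory, so the next pass starts from more than zero:&lt;/p&gt;

```python
# Toy rendering of the Sense/Frame/Recall/Deliberate/Act/Monitor/Learn
# loop described above. Every stage is a stand-in assumption, not
# Nous's code; the point is that Learn writes back, so runs compound.

def cognitive_loop(user_input: str, memory: dict) -> str:
    intent = user_input.strip().lower()                         # Sense
    frame = "debugging" if "error" in intent else "chat"        # Frame
    recalled = memory.get(frame, [])                            # Recall
    plan = f"[{frame}] respond using {len(recalled)} memories"  # Deliberate
    result = plan                                               # Act (no tools here)
    ok = result.startswith(f"[{frame}]")                        # Monitor
    if ok:                                                      # Learn
        memory.setdefault(frame, []).append(intent)
    return result

store: dict = {}
print(cognitive_loop("I got an error in deploy", store))  # 0 memories
print(cognitive_loop("another error appeared", store))    # now 1 memory
```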

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1w8qwoy9hisbllvg0td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1w8qwoy9hisbllvg0td.png" alt=" " width="800" height="800"&gt;&lt;/a&gt; resetting every session. Nous has been running this loop in production, and the difference is night and day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Execution Integrity: The AI That Can't Lie About What It Did
&lt;/h2&gt;

&lt;p&gt;Here's a problem nobody in the AI industry talks about: &lt;strong&gt;LLMs can fabricate actions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An AI agent can generate a confident, detailed response claiming it saved your file, sent your email, or deployed your code — without actually doing any of it. Not maliciously. The model simply confuses planning with execution. It writes "Here's what I'll save to the file..." and then treats the plan as the completed action.&lt;/p&gt;

&lt;p&gt;I call this &lt;em&gt;confabulation&lt;/em&gt;, and it's one of the most dangerous failure modes in enterprise AI. If your agent says "email sent to the client" and it wasn't — that's not a minor bug. That's a trust collapse.&lt;/p&gt;

&lt;p&gt;FORGE solves this with a three-layer execution integrity system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Execution Ledger&lt;/strong&gt; — An append-only, framework-managed record of every action taken in a session. The model cannot modify it. It lives outside the conversation history, immune to summarization and context pruning. When Nous says "I sent that email," there's a tamper-proof record that either confirms or contradicts the claim. Critically for enterprise compliance: the ledger is hosted locally or within your VPC — never transmitted to third-party services. Your execution data stays within your data residency boundaries, satisfying SOC 2, GDPR, and industry-specific regulatory requirements out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Gating&lt;/strong&gt; — A pre-execution checkpoint that classifies every action by risk level. Read-only operations pass through freely. Local writes get a consistency check — catching duplicate actions and replay loops. External and irreversible actions (sending emails, pushing code) go through a dedicated safety gate. The gate asks one question: &lt;em&gt;does this action match what the user actually asked for?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claim Verification&lt;/strong&gt; — A post-execution audit that scans the agent's response for action claims and cross-references them against the execution ledger. If Nous claims it sent an email but the ledger shows no email was sent, the response is blocked and the agent is forced to either execute the action or correct its claim. No fabricated completions reach the user.&lt;/p&gt;
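&lt;p&gt;Here is a minimal sketch of how the three layers fit together. Every name in it (&lt;code&gt;ExecutionLedger&lt;/code&gt;, &lt;code&gt;gate&lt;/code&gt;, &lt;code&gt;verify_claims&lt;/code&gt;) is illustrative, not FORGE's actual API, and the intent check inside &lt;code&gt;gate&lt;/code&gt; is a crude stand-in for the real classifier:&lt;/p&gt;

```python
# Illustrative sketch of the three-layer execution integrity idea.
# All names here are invented for this example, not FORGE's real API.

class ExecutionLedger:
    """Append-only record of actions; entries are never mutated or removed."""
    def __init__(self):
        self._entries = []

    def record(self, action, target):
        self._entries.append({"action": action, "target": target})

    def contains(self, action):
        return any(e["action"] == action for e in self._entries)

def gate(action, risk, user_request):
    """Pre-execution checkpoint: read-only actions pass freely,
    higher-risk actions must match what the user actually asked for."""
    if risk == "read":
        return True
    return action in user_request  # crude stand-in for intent matching

def verify_claims(response_claims, ledger):
    """Post-execution audit: return claims with no backing ledger entry."""
    return [c for c in response_claims if not ledger.contains(c)]

ledger = ExecutionLedger()
ledger.record("save_file", "report.md")          # actually executed
unbacked = verify_claims(["save_file", "send_email"], ledger)
print(unbacked)  # ['send_email'] is the fabricated claim
```

&lt;p&gt;The point of the sketch: the ledger is a separate data structure the model never writes to directly, so the audit step has ground truth to check against.&lt;/p&gt;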

&lt;p&gt;For enterprises, this means something radical: &lt;strong&gt;your AI agent's work is auditable down to the individual action.&lt;/strong&gt; Every tool call, every file write, every external communication — recorded, verified, and available for compliance review.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m3h9kgla9v8h3cy8guf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m3h9kgla9v8h3cy8guf.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sleep Cycles: The AI That Cleans Up After Itself
&lt;/h2&gt;

&lt;p&gt;Here's something most people don't expect: Nous has sleep cycles.&lt;/p&gt;

&lt;p&gt;During scheduled consolidation windows, the agent reviews its own memories — merging duplicates, pruning noise, strengthening important connections, and curating its knowledge graph. Just like biological sleep consolidates learning, FORGE's sleep architecture ensures memory quality improves over time rather than degrading.&lt;/p&gt;

&lt;p&gt;For enterprises, this means the system self-maintains. It doesn't accumulate junk data. It gets sharper.&lt;/p&gt;
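&lt;p&gt;As a toy illustration of the "merge duplicates, prune noise" step (the real consolidation is embedding-based and far more involved; the string-ratio similarity here is just a stand-in):&lt;/p&gt;

```python
# Toy consolidation pass: keep the first of each near-duplicate cluster.
# Stand-in for embedding similarity; real sleep cycles do much more.
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def consolidate(memories, threshold=0.9):
    """Drop any memory that is a near-duplicate of one already kept."""
    kept = []
    for m in memories:
        if all(similarity(m, k) < threshold for k in kept):
            kept.append(m)
    return kept

notes = [
    "User prefers dark mode",
    "User prefers dark mode.",   # near-duplicate, pruned
    "Deploys happen on Fridays",
]
print(consolidate(notes))  # ['User prefers dark mode', 'Deploys happen on Fridays']
```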

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1h32h4xk2567sj6y7zp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1h32h4xk2567sj6y7zp.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Censors: Governance That Gets Stronger With Use
&lt;/h2&gt;

&lt;p&gt;This is the feature that makes compliance teams smile.&lt;/p&gt;

&lt;p&gt;Censors are learned behavioural constraints at three severity levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Warn&lt;/strong&gt; — Flag the action, but allow it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block&lt;/strong&gt; — Prevent the action unless explicitly overridden&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Absolute&lt;/strong&gt; — Hard stop. No override. Period.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the key: censors aren't just pre-programmed rules. They're &lt;em&gt;learned&lt;/em&gt;. When Nous makes a mistake or discovers a boundary, it creates a censor so it never repeats that mistake. The governance layer strengthens with every interaction.&lt;/p&gt;

&lt;p&gt;And because they're architecturally enforced — not just prompt-level suggestions — they can't be jailbroken away by clever phrasing. This is a FORGE-level guarantee, not a prompt-level hope.&lt;/p&gt;
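&lt;p&gt;In pseudocode terms, the three severity levels behave roughly like this (names and shapes invented for illustration, not FORGE's API):&lt;/p&gt;

```python
# Illustrative severity-tiered censors. Each censor is a predicate plus a
# severity; 'block' can be explicitly overridden, 'absolute' never can.

class CensorViolation(Exception):
    pass

def apply_censors(action, censors, override=False):
    """Return warnings; raise on blocked or absolutely forbidden actions."""
    warnings = []
    for matches, severity in censors:
        if not matches(action):
            continue
        if severity == "warn":
            warnings.append(action)
        elif severity == "block" and not override:
            raise CensorViolation(f"blocked: {action}")
        elif severity == "absolute":
            raise CensorViolation(f"hard stop: {action}")
    return warnings

censors = [
    (lambda a: "delete" in a, "warn"),
    (lambda a: "prod" in a, "absolute"),
]
print(apply_censors("delete_temp_files", censors))  # ['delete_temp_files']
```

&lt;p&gt;The learning part would add a new &lt;code&gt;(predicate, severity)&lt;/code&gt; pair to the list whenever a mistake is diagnosed; the enforcement path itself never changes.&lt;/p&gt;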

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flisg74m9slonlswdzf5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flisg74m9slonlswdzf5z.png" alt=" " width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters for Enterprise
&lt;/h2&gt;

&lt;p&gt;Let's talk business value:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Institutional Memory That Compounds&lt;/strong&gt;&lt;br&gt;
Your FORGE-powered agent remembers every project, every decision, every lesson learned. New team members don't need to re-explain context. Nous already knows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Auditable Decision Intelligence&lt;/strong&gt;&lt;br&gt;
Every decision is logged with reasoning, confidence scores, and outcomes. When regulators ask, "Why did the AI do that?" — you have the answer. Brier-scored confidence calibration means the agent knows when it's uncertain, and says so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Execution Integrity You Can Prove&lt;/strong&gt;&lt;br&gt;
Every action is recorded in a tamper-proof ledger, verified against the agent's claims, and gated by risk level before execution. This isn't "trust the AI said it did it." This is "here's the timestamped, append-only record of exactly what happened." When auditors come knocking, you have the receipts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Self-Improving Governance&lt;/strong&gt;&lt;br&gt;
Compliance rules aren't static configurations. They're living constraints that evolve as the agent encounters new edge cases. Your governance posture strengthens automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. No Retraining Required&lt;/strong&gt;&lt;br&gt;
Traditional AI systems need expensive retraining cycles to incorporate new knowledge. FORGE agents learn continuously from interactions. Deploy once, improve forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Cognitive Framing Reduces Errors&lt;/strong&gt;&lt;br&gt;
By matching its thinking mode to the task type, FORGE avoids the "hammer looking for a nail" problem. Research tasks get research thinking. Debugging gets systematic elimination. Decisions get structured deliberation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2n99jrywgu8c6d1wee2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2n99jrywgu8c6d1wee2.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Haven't Solved Yet
&lt;/h2&gt;

&lt;p&gt;I believe in honesty over hype. Here's what's still in progress:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; — Nous can work with other agents, but true society-of-agents coordination (multiple FORGE agents collaborating) is still evolving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-horizon planning&lt;/strong&gt; — The system is strongest in tactical, session-level work; multi-week strategic planning is an active research area&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic claim detection&lt;/strong&gt; — Claim verification catches explicit action claims, but indirect phrasing ("all set — check your inbox") requires deeper semantic analysis that's still in development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm open about these gaps because I believe the best AI systems are the ones that can tell you what they don't know.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The AI industry is obsessed with making models bigger. I am obsessed with making agents smarter.&lt;/p&gt;

&lt;p&gt;FORGE isn't just another chatbot wrapper. It's a cognitive architecture — inspired by decades of intelligence research — that gives AI agents the ability to remember, learn, govern themselves, verify their own actions, and improve over time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cognition-engines.ai/nous.html" rel="noopener noreferrer"&gt;Nous&lt;/a&gt; is the proof. A living agent that thinks, not guesses.&lt;/p&gt;

&lt;p&gt;If your enterprise needs AI that thinks — let's talk.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://cognition-engines.ai" rel="noopener noreferrer"&gt;Cognition Engines&lt;/a&gt;. Inspired by Minsky. Forged for enterprise.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;FORGE is the architecture. &lt;a href="https://cognition-engines.ai/nous.html" rel="noopener noreferrer"&gt;Nous&lt;/a&gt; is the mind. Visit &lt;a href="https://cognition-engines.ai" rel="noopener noreferrer"&gt;cognition-engines.ai&lt;/a&gt; to see what cognitive AI looks like in practice.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>architecture</category>
      <category>discuss</category>
    </item>
    <item>
      <title>My AI Agent Forgot My Flight. So I Gave It a Brain.</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Tue, 17 Mar 2026 03:29:26 +0000</pubDate>
      <link>https://dev.to/tfatykhov/my-ai-agent-forgot-my-flight-so-i-gave-it-a-brain-1aeh</link>
      <guid>https://dev.to/tfatykhov/my-ai-agent-forgot-my-flight-so-i-gave-it-a-brain-1aeh</guid>
      <description>&lt;p&gt;&lt;em&gt;How a simple flight status question exposed a core limitation of vector-only retrieval for&lt;br&gt;
relational memory — and why graphs help.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Night Everything Broke&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It was 11 PM on a Friday. My flight was delayed. Bad weather. I asked my AI agent, Nous, a simple question: What's the latest on my flight? Give me the confirmation code, travel dates, and updated arrival time.&lt;br&gt;
Nous knew all of this. I had told it two weeks earlier. The confirmation code, the flight number, my travel dates — all stored as facts in its memory system. But when I asked, it drew a blank. It couldn't connect the dots.&lt;br&gt;
I had spent three weeks building what I thought was a sophisticated memory architecture: five memory types (episodic, semantic, procedural, working, censors), PostgreSQL with pgvector embeddings, a graph edges table, and even spreading activation inspired by cognitive science. And at the moment it mattered, it failed the simplest test a human brain passes without thinking.&lt;br&gt;
That failure became the most important feature I've shipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Illusion of Understanding&lt;/strong&gt;&lt;br&gt;
Here's what most AI agent builders don't realize until it's too late: storing information is not memory. A database full of facts is not a mind. The difference between a filing cabinet and a brain isn't storage capacity — it's the connections between what's stored.&lt;br&gt;
If you're building an AI agent with memory, you are doing OS design whether you realize it or not. You're making decisions about memory allocation, garbage collection, access patterns, and process scheduling — the same problems operating system designers have been solving for decades, just at a higher level of abstraction. The question isn't whether your agent needs a memory architecture. It already has one. The question is whether you designed it intentionally.&lt;br&gt;
When I diagnosed the failure, the root cause was embarrassingly clear. I queried the live graph and found 35 edges total: 24 decision-to-decision links, 10 fact-to-decision links, 1 episode-to-decision link. And zero fact-to-fact edges. Zero fact-to-episode edges. Every factual node was a disconnected island.&lt;br&gt;
My agent knew my flight number. It knew my confirmation code. It knew my travel dates. But it had no way to know these facts were about the same trip. The graph existed in the schema, but the wiring was missing. It was a brain with neurons but no synapses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Vector Search Isn't Enough&lt;/strong&gt;&lt;br&gt;
The most common starting point for AI agent memory in 2025–2026 is some variation of RAG: embed everything into vectors, store them in a vector database, retrieve by cosine similarity. It works surprisingly well for simple fact lookup. But it breaks down the moment you need to reason across related pieces of information.&lt;br&gt;
One recent paper describes this as "contextual tunneling." SYNAPSE (Jiang et al., January 2026) defines it as agents getting stuck in narrow semantic neighborhoods — retrieving facts that are textually similar to the query but missing facts that are semantically related through context, causality, or temporal proximity.&lt;br&gt;
When I searched for "flight delay," vector similarity returned facts about flights. But it didn't traverse to the confirmation code (different semantic space), to the travel dates (a temporal entity, not a flight entity), or to the original booking episode (an event, not a fact). Each of those lived in a different corner of the embedding space, connected only by the invisible thread of "this is all one trip."&lt;br&gt;
This isn't unique to Nous. It is a recurring limitation of vector-only memory when retrieval depends on explicit relational, temporal, or causal structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the Research Says&lt;/strong&gt;&lt;br&gt;
I've been tracking the agent memory research space closely. Over the past year, I've read through more than a dozen papers, and a notable pattern is emerging: graph-structured retrieval can outperform flat vector retrieval on multi-hop and relational recall tasks.&lt;br&gt;
&lt;strong&gt;SYNAPSE&lt;/strong&gt; (Jiang et al., 2026) models agent memory as a dynamic graph where relevance emerges from spreading activation — borrowed directly from Collins &amp;amp; Loftus' 1975 cognitive model. It uses lateral inhibition and temporal decay to highlight relevant subgraphs while suppressing noise. On the LoCoMo benchmark, it outperforms state-of-the-art on temporal and multi-hop reasoning tasks.&lt;br&gt;
&lt;strong&gt;MAGMA&lt;/strong&gt; (Jiang et al., 2026) goes further: it represents each memory across four orthogonal graph views — semantic, temporal, causal, and entity — and formulates retrieval as policy-guided traversal. This solves exactly the failure I experienced: a flight fact, a person fact, and a confirmation code live in different semantic views but share temporal and entity edges.&lt;br&gt;
&lt;strong&gt;A-MEM&lt;/strong&gt; (2025) showed that dynamically linked, self-organizing memory can outperform more static memory setups. Memories aren't static entries — they evolve, link, and sometimes contradict each other.&lt;br&gt;
The comprehensive survey &lt;strong&gt;"Memory in the Age of AI Agents"&lt;/strong&gt; (47 authors, Dec 2025) distinguished three memory dynamics: formation, evolution, and retrieval. Most systems focus only on formation and retrieval. Evolution — the ongoing process of relinking, consolidating, and forgetting — is where the real intelligence lives.&lt;br&gt;
And just this month, &lt;strong&gt;A-MAC&lt;/strong&gt; (Zhang et al., March 2026) formalized what the others implied: memory admission itself is a structured decision. Not everything should be remembered, and what you remember should be scored across five interpretable dimensions: future utility, factual confidence, semantic novelty, temporal recency, and content type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building the Fix: Graph-Augmented Recall&lt;/strong&gt;&lt;br&gt;
I shipped the fix as a four-phase update that transforms Nous's recall system from flat vector search to graph-augmented retrieval. Here's what each phase does:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Graph Expansion.&lt;/strong&gt; Every recall_deep query now follows graph edges. When you retrieve a fact, the system also pulls its 1-hop neighbors — related facts, connected decisions, linked episodes. A query about a flight number now surfaces the confirmation code, the travel dates, and the booking context.&lt;/p&gt;
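&lt;p&gt;The idea in miniature, with invented edge data standing in for &lt;code&gt;brain.graph_edges&lt;/code&gt;:&lt;/p&gt;

```python
# 1-hop neighbor expansion over an edge list (data invented for illustration).
edges = [
    ("flight_number", "confirmation_code", "related_to"),
    ("flight_number", "travel_dates", "related_to"),
    ("booking_episode", "flight_number", "discussed_in"),
]

def one_hop(seeds, edges):
    """Seeds plus every node one edge away (edges treated as undirected)."""
    result = set(seeds)
    for src, dst, _rel in edges:
        if src in seeds:
            result.add(dst)
        if dst in seeds:
            result.add(src)
    return result

print(sorted(one_hop({"flight_number"}, edges)))
# ['booking_episode', 'confirmation_code', 'flight_number', 'travel_dates']
```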

&lt;p&gt;&lt;strong&gt;Phase 2: Cross-Type Linking.&lt;/strong&gt; This was the missing piece. The system now creates polymorphic edges across memory types: fact-to-fact, fact-to-decision, fact-to-episode, episode-to-decision. When a new fact is learned, a FactGraphLinker handler fires on the EventBus, computing embedding similarity against existing decisions and creating "evidence_for" edges automatically. No manual wiring.&lt;/p&gt;
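&lt;p&gt;&lt;code&gt;EventBus&lt;/code&gt; and &lt;code&gt;FactGraphLinker&lt;/code&gt; are the names used above; everything else in this sketch, including the payload shape and the similarity shortcut, is invented to show the event-driven wiring:&lt;/p&gt;

```python
# Minimal event-driven linking sketch. The real linker computes embedding
# similarity against stored decisions; here the matches are pre-supplied.

class EventBus:
    def __init__(self):
        self.handlers = {}

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        for h in self.handlers.get(event, []):
            h(payload)

edges = []

def fact_graph_linker(fact):
    # Create 'evidence_for' edges for each sufficiently similar decision.
    for decision in fact["similar_decisions"]:
        edges.append((fact["id"], decision, "evidence_for"))

bus = EventBus()
bus.on("fact_learned", fact_graph_linker)
bus.emit("fact_learned", {"id": "fact-1", "similar_decisions": ["dec-7"]})
print(edges)  # [('fact-1', 'dec-7', 'evidence_for')]
```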

&lt;p&gt;&lt;strong&gt;Phase 3: Contradiction Detection.&lt;/strong&gt; When new information conflicts with existing memories, the system uses LLM classification to create "contradicts" or "supersedes" edges. Old facts aren't deleted — they're marked as superseded, maintaining an audit trail. This mirrors how human memory handles updates: the old memory doesn't vanish, it gets contextualized.&lt;/p&gt;
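&lt;p&gt;Supersession in miniature (data structures invented; the real system records these as rows and edges in Postgres):&lt;/p&gt;

```python
# Sketch of supersession: the old fact stays, flagged and linked, so the
# audit trail survives. Structures are invented for illustration.
facts = {"f1": {"text": "Flight departs 6pm", "superseded_by": None}}
edges = []

def supersede(old_id, new_id, new_text):
    facts[new_id] = {"text": new_text, "superseded_by": None}
    facts[old_id]["superseded_by"] = new_id
    edges.append((new_id, old_id, "supersedes"))  # audit trail, no deletion

supersede("f1", "f2", "Flight delayed to 9pm")
print(facts["f1"]["superseded_by"])  # f2
```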

&lt;p&gt;&lt;strong&gt;Phase 4: Spreading Activation.&lt;/strong&gt; Inspired by SYNAPSE and the Collins &amp;amp; Loftus model, the system implements density-gated spreading activation for multi-hop retrieval. Activation flows through the graph with configurable decay (default 0.5 per hop), and density gating prevents activation from spreading through highly connected hub nodes that would add noise.&lt;/p&gt;
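&lt;p&gt;Stripped of SQL and density gating, the activation rule itself is small. This toy version uses invented graph data and the default 0.5 decay:&lt;/p&gt;

```python
# Plain-Python spreading activation: propagate outward with per-hop decay;
# nodes reached by several paths accumulate activation. Gating omitted.
from collections import defaultdict

def spread(seeds, edges, decay=0.5, max_depth=2):
    activation = defaultdict(float)
    frontier = dict(seeds)               # node -> activation at this depth
    for node, a in frontier.items():
        activation[node] += a
    for _ in range(max_depth):
        nxt = defaultdict(float)
        for node, a in frontier.items():
            for src, dst, weight in edges:   # edges treated as undirected
                if src == node:
                    nxt[dst] += a * weight * decay
                elif dst == node:
                    nxt[src] += a * weight * decay
        for node, a in nxt.items():
            activation[node] += a
        frontier = nxt
    return dict(activation)

edges = [("flight", "code", 1.0), ("flight", "dates", 1.0)]
act = spread({"flight": 1.0}, edges)
print(act["code"])  # 0.5: one hop away at decay 0.5
```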

&lt;p&gt;&lt;strong&gt;Under the Hood: The Schema&lt;/strong&gt;&lt;br&gt;
The graph edge table is polymorphic — it connects any memory type to any other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;graph_edges&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'decision'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_type&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'decision'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'supports'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'contradicts'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'supersedes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'related_to'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'caused_by'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'informed_by'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'evidence_for'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'discussed_in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'extracted_from'&lt;/span&gt;
    &lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auto_linked&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'decision'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'fact'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'episode'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'procedure'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'decision'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'fact'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'episode'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'procedure'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key design choices: &lt;code&gt;source_type&lt;/code&gt;/&lt;code&gt;target_type&lt;/code&gt; make it polymorphic without foreign keys to every table. The &lt;code&gt;relation&lt;/code&gt; enum constrains edge semantics — you can't create arbitrary relationships, only ones the system knows how to traverse. &lt;code&gt;auto_linked&lt;/code&gt; flags edges created by the FactGraphLinker versus manually established ones. And the unique constraint on &lt;code&gt;(source_id, target_id, relation)&lt;/code&gt; prevents duplicate edges while allowing multiple relationship types between the same pair of nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Recursive CTE: How Activation Spreads&lt;/strong&gt;&lt;br&gt;
The heart of the retrieval engine is a recursive Common Table Expression that simulates spreading activation. Vector search produces seed nodes with initial scores; the CTE then propagates activation through graph edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;RECURSIVE&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- Base case: seed nodes from vector search&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;VALUES&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'uuid-1'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'fact'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'uuid-2'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'episode'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;87&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;seeds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
    &lt;span class="c1"&gt;-- Recursive case: spread to neighbors with decay&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
             &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
             &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;activation&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;graph_edges&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'contradicts'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_activation&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_activation&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This models the &lt;strong&gt;Fan Effect&lt;/strong&gt; from cognitive science: activation is diluted as it spreads across multiple neighbors. A node with one edge receives the full decayed signal; a node connected to ten others spreads only a fraction to each. The &lt;code&gt;SUM(activation)&lt;/code&gt; aggregation means nodes reached via multiple paths accumulate higher activation — exactly how associative recall works in biological memory. You remember something more strongly when multiple associations point to it. The &lt;code&gt;contradicts&lt;/code&gt; exclusion is deliberate — you don't want contradicted facts gaining activation through the very nodes that disprove them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cold Start Problem: Hybrid Fallback&lt;/strong&gt;&lt;br&gt;
There's a practical challenge the research papers don't emphasize enough: on a fresh system, the graph is sparse. In the first days or weeks of use, there aren't enough edges for spreading activation to find anything useful. The CTE runs, finds no neighbors, and returns only the vector search seeds — adding latency for no benefit.&lt;br&gt;
Nous handles this with &lt;strong&gt;density-gated activation&lt;/strong&gt;: before running the recursive CTE, the system computes a graph density metric (average edges per unique node). If density is below the threshold (default: 3.0), it falls back to standard vector search with simple 1-hop neighbor expansion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_use_spreading_activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cached_density&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spreading_activation_enabled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;   &lt;span class="c1"&gt;# Force on
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# Force off
&lt;/span&gt;    &lt;span class="c1"&gt;# "auto" mode: activate only when graph is dense enough
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached_density&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spreading_activation_density_threshold&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This "auto" mode means a fresh Nous instance behaves like a standard RAG system — fast, reliable, and limited. As the graph fills in through use and auto-linking, the system gradually transitions to full spreading activation. No manual switch, no cliff edge. The graph earns its way into the retrieval pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance: HNSW Tuning for Graph Seeding&lt;/strong&gt;&lt;br&gt;
Spreading activation is only as good as its seeds. Every recall starts with a pgvector similarity search to find the initial nodes, which means HNSW index performance is critical. Nous uses HNSW (Hierarchical Navigable Small World) indexes on all five memory type tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_facts_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;heart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;facts&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default pgvector HNSW parameters (&lt;code&gt;m=16&lt;/code&gt;, &lt;code&gt;ef_construction=64&lt;/code&gt;) work for most workloads, but there are trade-offs worth understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;m&lt;/code&gt; (connections per layer)&lt;/strong&gt; — Higher values improve recall accuracy at the cost of index size and build time. For agent memory where you have thousands to tens of thousands of vectors (not millions), the default of 16 is adequate. If you're seeing seed quality issues, bump to 32.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ef_construction&lt;/code&gt; (build-time search width)&lt;/strong&gt; — Controls index quality during construction. Higher values produce a better graph at the cost of slower inserts. For memory systems where writes happen at conversation pace (not batch ingestion), 64 is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ef_search&lt;/code&gt; (query-time search width)&lt;/strong&gt; — The runtime knob. Default is 40. Nous currently uses the default, but for graph seeding where seed quality directly determines activation quality, bumping this to 100-200 at query time is a recommended next optimization. The marginal latency cost is negligible compared to the downstream impact of bad seeds on activation spread.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical insight: pgvector's HNSW is fast enough that the bottleneck in spreading activation isn't the vector search — it's the recursive CTE. With a 2-hop depth limit and 20-node result cap, the CTE adds roughly 5-15ms on a well-indexed PostgreSQL instance. That's negligible for a system that's about to spend 500ms+ on an LLM call.&lt;br&gt;
The configuration is straightforward: &lt;code&gt;NOUS_GRAPH_RECALL_ENABLED=True&lt;/code&gt;, max depth of 2, decay of 0.5, cross-type linking threshold at 0.80 cosine similarity, contradiction detection on. It runs in production on PostgreSQL with pgvector — no separate graph database needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture vs. What It Learns&lt;/strong&gt;&lt;br&gt;
This distinction matters, and most writing about AI agents blurs it: there's a difference between the &lt;strong&gt;architecture&lt;/strong&gt; — the infrastructure that ships on day one — and the &lt;strong&gt;knowledge&lt;/strong&gt; the system acquires through use. Nous starts empty. It has a brain, but no memories. Everything it knows, it learned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Infrastructure (Built In)&lt;/strong&gt;&lt;br&gt;
These are the structural components — the cognitive machinery that makes learning possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The cognitive loop&lt;/strong&gt; — Sense → Frame → Recall → Deliberate → Act → Monitor → Learn. This is the processing cycle, the equivalent of a brain's neural architecture. It runs the same way on day one as on day one thousand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five memory type schemas&lt;/strong&gt; — The database tables and embedding infrastructure for episodes, facts, decisions, procedures, and censors. Think of these as empty filing systems: the drawers exist, but they're empty until the system starts interacting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The graph edges table and spreading activation engine&lt;/strong&gt; — The mechanism for connecting memories across types. The plumbing that enables "this flight fact is related to that confirmation code fact." This was the missing piece that caused the failure — the infrastructure existed but wasn't wired into the recall pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sleep consolidation&lt;/strong&gt; — A 5-phase offline maintenance process modeled on biological sleep: reviewing pending decision outcomes, pruning stale censors, compressing old episodes, reflecting across sessions to extract patterns as durable facts, and generalizing recurring behaviors into procedures. Phases 1 (review) and 4-5 (reflect/generalize) are fully operational — the system already converts episodic conversations into semantic facts through cross-session pattern recognition. Phases 2-3 (pruning and compression) have the scaffolding in place but are still being deepened. This is the episodic-to-semantic pipeline in action: short-term conversational memory consolidates into long-term knowledge, the way human sleep consolidates short-term memory into long-term storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory decay and confidence scoring&lt;/strong&gt; — Brier-scored calibration tracking, confidence decay over time, and freshness weighting. The math that ensures memories fade appropriately and the system knows how much to trust its own recall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The EventBus and cross-type auto-linking&lt;/strong&gt; — The reactive wiring (like the FactGraphLinker) that fires when new memories form, automatically creating graph edges. This is infrastructure — the handler is built in, but it only creates edges when there are memories to connect.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What It Learns (Starts Empty)&lt;/strong&gt;&lt;br&gt;
These are the contents that accumulate through interaction. On a fresh Nous instance, all of these are zero:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Facts&lt;/strong&gt; — Extracted from conversations, not hardcoded. My flight number, my confirmation code, my preferences for Celsius, where I live — all learned through dialogue. The system extracts facts proactively when it detects useful information, but it doesn't ship with any.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Episodes&lt;/strong&gt; — Every conversation creates episodic memories with summaries. These are the "what happened" layer, and they only exist because interactions happened.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decisions&lt;/strong&gt; — Recorded choices with context, reasoning, confidence levels, and calibration tracking. The decision &lt;em&gt;schema&lt;/em&gt; is architecture; the actual decisions and the patterns they reveal are learned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedures (Skills)&lt;/strong&gt; — This is a key distinction. Skills are &lt;em&gt;learned&lt;/em&gt;, not pre-loaded. They can be taught from URLs, local files, or inline markdown. A skill might be "how to review a pull request" or "how to search the Serper API" — registered through use, not shipped as features. The trigger patterns that auto-activate skills during recall are part of the learning, not the architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Censors (Guardrails)&lt;/strong&gt; — The censor &lt;em&gt;mechanism&lt;/em&gt; is architecture — the ability to match patterns and block or warn. But specific censors are learned from experience and user rules. "Never commit directly to main" is a censor that exists because the user established that rule. "Never store API keys as facts" exists because that's a security lesson. A fresh instance has no censors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph edges&lt;/strong&gt; — The connections between all of the above. Auto-created by the linking infrastructure as memories form, but starting at zero. The 35 edges I found during the failure diagnosis were the sum total of what the system had wired up over weeks of use — and the missing cross-type edges were the gap that caused the failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The punchline: &lt;strong&gt;Nous is an architecture for learning, not a pre-trained knowledge base.&lt;/strong&gt; The infrastructure enables a system that gets smarter with every interaction — building its own knowledge graph, developing its own skills, establishing its own guardrails. What you get out of the box is a brain. What you get after months of use is a mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters Beyond Engineering&lt;/strong&gt;&lt;br&gt;
The implications of graph-structured agent memory extend beyond developer productivity tools. Two stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust and Auditability.&lt;/strong&gt; In regulated industries — finance, healthcare, legal — being able to trace &lt;em&gt;why&lt;/em&gt; an agent made a decision is often more valuable than the decision itself. A flat vector store returns "these were the most similar documents." A graph with typed edges returns "this fact was extracted from this conversation, which informed this decision, which was later contradicted by this newer fact." That's an interpretable causal explanation. When an auditor asks "why did the agent recommend this?", the graph provides a traversable answer chain, not a similarity score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent Identity and Personalization.&lt;/strong&gt; Memory isn't a database feature — it's the foundation of identity. An agent that remembers your preferences, learns from your corrections, and builds a model of your work patterns over months is qualitatively different from one that starts fresh each session. This is the "Digital Twin" trajectory: AI partners that develop persistent, evolving models of the people and systems they work with. The graph is what makes this possible — not just storing facts about a user, but understanding how those facts relate to each other, how they change over time, and which ones matter in which contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Got Wrong (And What the Field Gets Wrong)&lt;/strong&gt;&lt;br&gt;
The biggest lesson from this experience isn't technical — it's philosophical. I had the right architecture on paper. The graph_edges table existed. The neighbors() function existed. The spreading activation concept was in the roadmap. But none of it was wired into the actual recall pipeline.&lt;br&gt;
This is the same mistake I see across the industry. Teams build sophisticated memory schemas, implement vector stores, maybe add a knowledge graph layer — and then retrieve exclusively via embedding similarity. The graph is decorative, not functional.&lt;br&gt;
The StructMemEval benchmark (Shutova et al., February 2026) confirmed this at the research level: LLMs can solve structured memory tasks when prompted with structure, but they don't autonomously recognize when to apply it. The agent needs to be explicitly wired to traverse its own graph — it won't discover the capability on its own.&lt;br&gt;
Another thing I got wrong: treating all memory operations as writes. The comprehensive survey "Memory in the Age of AI Agents" (Hu et al., December 2025) identified "memory evolution" as the most neglected dynamic. Most systems — mine included, until recently — focus on storing and retrieving. But memory is alive: facts get stale, confidence should decay, contradictions should be detected and resolved. Forgetting isn't a bug; it's a feature.&lt;/p&gt;
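&lt;p&gt;That kind of memory evolution can start as something very small: an exponential half-life on confidence. This is a hypothetical sketch — the function name, the 30-day half-life, and the numbers are mine, not Nous's implementation:&lt;/p&gt;

```python
def decayed_confidence(confidence, age_days, half_life_days=30.0):
    """Exponential staleness decay: a memory's effective confidence
    halves every `half_life_days`. Hypothetical parameters — the point
    is that recall should weight fresh facts over stale ones."""
    return confidence * 0.5 ** (age_days / half_life_days)

# A fact stored at confidence 0.8 is worth 0.4 after one half-life
# and 0.2 after two; once it drops below some relevance floor, it
# becomes a candidate for pruning rather than retrieval.
```

&lt;p&gt;The design choice this encodes: forgetting is a continuous score, not a delete statement — nothing is lost until a pruning pass decides the decayed value no longer earns its place in context.&lt;/p&gt;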

&lt;p&gt;&lt;strong&gt;The ICLR Signal&lt;/strong&gt;&lt;br&gt;
The fact that ICLR 2026 dedicated an entire workshop (MemAgents) to memory for agentic systems reflects rising research focus on memory as a key bottleneck. The workshop framing was telling: agent memory is fundamentally different from LLM memorization. It's online, interaction-driven, and under the agent's control.&lt;br&gt;
A reasonable takeaway from the workshop framing is that memory should be treated as part of the cognitive loop, not as a passive log. Episodic memories should consolidate into semantic knowledge. Explicit facts should eventually become implicit weights. Memory management should be an active process, not a storage problem.&lt;br&gt;
We're at an inflection point. A growing body of work suggests that the next major leap in agent capability may not come from bigger context windows or better models — but from memory systems that actually work like memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;br&gt;
Three things I'm building toward, informed by this research:&lt;br&gt;
&lt;strong&gt;1. Memory admission control.&lt;/strong&gt; Not everything should be stored. A-MAC's five-factor scoring (future utility, factual confidence, semantic novelty, temporal recency, content type) provides a principled framework. Right now Nous stores too aggressively — the next evolution is learning what to forget.&lt;br&gt;
&lt;strong&gt;2. Deeper consolidation.&lt;/strong&gt; The basic episodic-to-semantic pipeline is live — sleep consolidation already extracts patterns from conversations and stores them as durable facts. But the compress and prune phases need full implementation: old episodes should be distilled into summaries, stale facts should decay gracefully, and the system should learn &lt;em&gt;what&lt;/em&gt; to forget, not just what to remember. The Episodic Memory paper (Pink et al., 2025) maps the full roadmap, and we're partway through it.&lt;br&gt;
&lt;strong&gt;3. Multi-view graphs.&lt;/strong&gt; MAGMA's four-view approach (semantic, temporal, causal, entity) is the right target. Currently, Nous has a single graph with typed edges. Separating into orthogonal views would enable query-adaptive traversal — an "Intent-Aware Router" that detects the nature of a query and selects the corresponding relational view. A "Why" query triggers a topological sort on causal edges, ensuring causes precede effects in context. A "When" query traverses temporal timelines. A "Who" query walks entity edges. Decoupling the memory representation from the retrieval logic this way would improve both reasoning accuracy and token efficiency.&lt;br&gt;
The flight failure was a gift. It turned a theoretical architecture gap into a production incident with clear symptoms, a diagnosable root cause, and a measurable fix. That's how systems get better — not by anticipating every failure, but by learning from each one and wiring the fix into the system so it can't happen again.&lt;br&gt;
Your agent has amnesia. Mine did too. The cure isn't more storage — it's better connections.&lt;/p&gt;

&lt;p&gt;P.S. The Nous source is here: &lt;a href="https://github.com/tfatykhov/nous/blob/main/readme_new.md" rel="noopener noreferrer"&gt;https://github.com/tfatykhov/nous/blob/main/readme_new.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;References&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
[1] Jiang, H. et al. "SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation." arXiv:2601.02744, January 2026.&lt;br&gt;
[2] Jiang, D. et al. "MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents." arXiv:2601.03236, January 2026.&lt;br&gt;
[3] Jiang, D. et al. "Anatomy of Agentic Memory: Taxonomy and Empirical Analysis." arXiv:2602.19320, February 2026.&lt;br&gt;
[4] Shutova, A. et al. "Evaluating Memory Structure in LLM Agents (StructMemEval)." arXiv:2602.11243, February 2026.&lt;br&gt;
[5] Zhang, G. et al. "Adaptive Memory Admission Control for LLM Agents (A-MAC)." arXiv:2603.04549, March 2026.&lt;br&gt;
[6] Pink, M. et al. "Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents." arXiv:2502.06975, February 2025.&lt;br&gt;
[7] "Memory in the Age of AI Agents: A Survey." arXiv:2512.13564, December 2025 (updated January 2026). 47 authors.&lt;br&gt;
[8] Collins, A. M. &amp;amp; Loftus, E. F. "A Spreading-Activation Theory of Semantic Processing." Psychological Review, 82(6), 1975.&lt;br&gt;
[9] Tulving, E. "Episodic and Semantic Memory." In Organization of Memory, 1972.&lt;br&gt;
[10] Minsky, M. The Society of Mind. Simon &amp;amp; Schuster, 1986.&lt;br&gt;
[11] Kostka, A. &amp;amp; Chudziak, J. A. "Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems." arXiv:2603.00142, March 2026.&lt;br&gt;
[12] ICLR 2026 MemAgents Workshop: Memory for LLM-Based Agentic Systems.&lt;br&gt;
[13] Xu, W. et al. "A-MEM: Agentic Memory for LLM Agents." arXiv:2502.12110, February 2025.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Your AI Agent Has Amnesia. And You Designed It That Way.</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Mon, 09 Mar 2026 05:29:46 +0000</pubDate>
      <link>https://dev.to/tfatykhov/your-ai-agent-has-amnesia-and-you-designed-it-that-way-pf8</link>
      <guid>https://dev.to/tfatykhov/your-ai-agent-has-amnesia-and-you-designed-it-that-way-pf8</guid>
      <description>&lt;p&gt;__Every LLM call starts from nothing. No memory of what worked yesterday. No record of what failed last week. The industry calls this “stateless.” It’s not an architecture pattern — it’s a limitation we’ve been too slow to fix.&lt;br&gt;
I spent the last month reading nine papers from 2025–2026 on the cutting edge of agent memory research. Not theoretical memory. Real systems with benchmarks, architectures, and trade-offs.&lt;br&gt;
Here’s what changed how I build — and what it should change about how you build too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context Replay Is Not Memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most widespread approach to “giving agents memory” is context replay: retrieve relevant text, inject into the prompt, hope the model does something useful. RAG at its most basic.&lt;br&gt;
It works for simple recall. It falls apart for everything else.&lt;br&gt;
A-MEM made this concrete. The authors replaced flat memory stores with a Zettelkasten-style knowledge network. When a new memory is encoded, the agent generates structured notes with contextual tags — and critically, retroactive bidirectional links to existing memories. Memory is a graph, not a list.&lt;br&gt;
The difference isn’t subtle. Similarity search finds things that look like the query. Graph traversal finds things related to the query. Those are fundamentally different operations, and for complex multi-session reasoning, only one of them actually works.&lt;br&gt;
SYNAPSE extended this with spreading activation — the same neural mechanism that lets you hear “doctor” and prime “nurse.” Their dual-layer architecture achieves a weighted average F1 of 40.5 on the LoCoMo benchmark, a margin of +7.2 points over the next best agentic system — while reducing token consumption by 95% compared to full-context methods.&lt;br&gt;
The takeaway: If your agent’s memory is a vector store with cosine similarity, you’ve built a search engine — not a memory. Real memory has structure, relationships, and traversal paths.&lt;/p&gt;
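&lt;p&gt;The distinction is easy to see in a toy sketch: similarity search ranks by what &lt;em&gt;looks like&lt;/em&gt; the query, while traversal follows edges regardless of embedding distance. All names, vectors, and edges here are invented for illustration:&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_similar(query_vec, store, k=2):
    """Similarity search: returns what LOOKS like the query.
    `store` is a list of (name, embedding) pairs."""
    ranked = sorted(store, key=lambda item: -cosine(query_vec, item[1]))
    return [name for name, _ in ranked[:k]]

def related(start, edges, hops=2):
    """Graph traversal: returns what is RELATED to the start node
    within `hops` edges — even if its embedding is nothing like
    the query that surfaced the start node."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for src, dst in edges if src in frontier} - seen
        seen |= frontier
    return seen - {start}
```

&lt;p&gt;A query embedding near "flight" will never rank a confirmation code highly — but an edge from the flight fact to the code fact makes it one hop away. That is the operational difference between a search engine and a memory.&lt;/p&gt;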

&lt;p&gt;&lt;strong&gt;2. One Memory System Is Never Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 47-author Agent Memory Survey (Dec 2025) gave the field its first unified taxonomy. Three dimensions: memory forms (how it’s stored), functions (what it does), and dynamics (how it changes). Conflating them — which almost everyone does — leads to systems that are brittle at everything except the one task they were tuned for.&lt;br&gt;
Procedural Memory Is Not All You Need made this argument directly. LLMs are fundamentally constrained by their architecture, which mirrors human procedural memory: pattern-driven, automated, but lacking grounded factual knowledge. An agent that knows how to execute a task still can’t reliably reason about what that task involves without semantic memory.&lt;br&gt;
MAP addressed this structurally with a modular planner architecture — separate memory modules with clean interfaces between them, composed like microservices. Need procedural and factual? Activate both. Need only episodic? Use just that.&lt;br&gt;
The takeaway: Stop building one memory system. Build memory systems — plural — with clear interfaces. A fact store is not an episode log is not a skill library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Graphs Beat Flat Vectors For Anything That Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For anything beyond single-turn Q&amp;amp;A, graph-structured memory consistently outperforms flat vector retrieval. This appeared across every paper until it was impossible to ignore.&lt;br&gt;
Mem0 evolved in exactly this direction. Their latest architecture integrates graph-augmented memory via FalkorDB, with per-user graph isolation and sub-140ms p99 query latency. The paper demonstrates 26% relative improvement over OpenAI on LLM-as-a-Judge metrics, with graph memory adding another ~2% over the base vector configuration.&lt;br&gt;
The Agent Memory Survey confirmed the pattern with systematic analysis: systems with graph-augmented retrieval consistently outperform pure vector approaches on multi-hop reasoning, temporal reasoning, and contradiction detection. The gap widens as task complexity increases.&lt;br&gt;
One honest counter-benchmark deserves acknowledgement. Letta — the team behind MemGPT — demonstrated that a GPT-4o-mini agent equipped with basic filesystem tools (semantic file search and grep over raw conversational history) achieved 74.0% accuracy on the same LoCoMo benchmark where Mem0’s top-performing graph variant scored 68.5%. Letta themselves draw a cautious conclusion from this: that LoCoMo may be testing retrieval skill more than memory architecture, and that “memory is more about how agents manage context than the exact retrieval mechanism used.” This is worth holding onto. Specialized graph architectures offer real structural advantages — relationship traversal, contradiction detection, temporal reasoning — that simple file search cannot replicate. But the Letta result is a useful reminder that architectural sophistication is not a substitute for capable tool use, and that today’s benchmarks are still catching up to what agent memory actually requires.&lt;br&gt;
The takeaway: Vector search is necessary but insufficient. If your agents handle tasks spanning multiple turns, entity relationships, or temporal reasoning — you need graph structure. Not instead of vectors. On top of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Evolution Already Solved This. 600 Million Years Ago.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the lesson I didn’t expect from a stack of AI papers: the best architects in this space aren’t inventing new solutions. They’re reverse-engineering the one that already works.&lt;br&gt;
The Episodic Memory paper laid out five properties long-term agents genuinely need: temporally indexed, instance-specific, single-shot encodable, inspectable, and compositional. Without these, they argue, agents can’t maintain coherent context across sessions — a gap most current architectures don’t address. These properties are grounded in cognitive science going back to Endel Tulving’s 1972 taxonomy of human memory.&lt;br&gt;
SYNAPSE’s spreading activation is borrowed directly from Collins and Loftus’s 1975 model of human semantic memory. ACC’s cognitive compression mirrors the brain’s consolidation process during sleep — taking fragmented short-term memories and compressing them into stable long-term representations.&lt;br&gt;
The Survey acknowledged this convergence as “cognitive neuroscience as design language.” I’d go further: it’s a design proof. Evolution already ran the world’s longest A/B test on memory architectures. Structured, multi-system, consolidation-driven, forgetting-enabled associative memory won. Everything else went extinct.&lt;br&gt;
The takeaway: The hippocampus has already solved the problems you’re encountering. You’re not building from scratch. You’re standing on 600 million years of R&amp;amp;D.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Forgetting Is a Feature, Not a Bug&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every instinct in software engineering says store everything, delete nothing, disk is cheap. For agent memory, this instinct is actively harmful.&lt;br&gt;
ACC (Agent Cognitive Compressor) demonstrated this most clearly. Its commitment mechanism prevents unverified content from becoming persistent memory — memories pass through a compression-and-validation pipeline before they’re committed to long-term storage. Tested across IT operations, cybersecurity, and healthcare workflows, ACC consistently produced lower hallucination and drift than transcript replay approaches.&lt;br&gt;
The industry is moving in the opposite direction. Llama 4 Scout ships with a 10-million token context window — 50x larger than the previous generation — with the implicit promise that more context solves the memory problem. It doesn’t.&lt;br&gt;
Chroma Research established empirically that LLM performance degrades with increasing input length, across all 18 frontier models tested — even on trivially easy tasks. Stuffing more memories into context doesn’t help. It hurts. The degradation isn’t linear and it doesn’t wait until the context window is full. Independent analysis of Llama 4’s 10M window confirms the pattern: recall accuracy shows stochastic degradation as context grows past the million-token mark, with the “lost in the middle” phenomenon becoming more severe, not less, at extreme scale.&lt;br&gt;
A-MAC (March 2026) formalises this into a framework: memory admission as a structured decision across five dimensions — future utility, factual confidence, semantic novelty, temporal recency, and content type. What you don’t store matters as much as what you do. On the LoCoMo benchmark, A-MAC improved F1 to 0.583 while reducing latency 31% versus state-of-the-art systems.&lt;br&gt;
The takeaway: Build forgetting into your memory architecture from day one. Implement confidence decay, staleness signals, and explicit deletion policies. An agent that remembers everything isn’t smarter — it’s confused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Uncomfortable Conclusion&lt;/strong&gt;&lt;br&gt;
Most agent frameworks are optimized for stateless task execution. They treat memory as an afterthought, a plugin, a “nice to have.”&lt;br&gt;
The research says the opposite: memory architecture is the single most important design decision for any agent that persists beyond a single conversation.&lt;br&gt;
What research says works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured, graph-augmented memory with typed relationships&lt;/li&gt;
&lt;li&gt;Separate memory systems for different cognitive functions&lt;/li&gt;
&lt;li&gt;Biologically-inspired consolidation and forgetting&lt;/li&gt;
&lt;li&gt;Spreading activation for associative recall&lt;/li&gt;
&lt;li&gt;Explicit admission control over what enters long-term memory&lt;/li&gt;
&lt;/ul&gt;
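&lt;p&gt;Admission control of the A-MAC kind reduces, at its simplest, to a scoring function over those five dimensions. A toy sketch — the weights and 0-to-1 feature values are invented for illustration, not A-MAC’s actual formulation:&lt;/p&gt;

```python
def admission_score(candidate, weights=None):
    """Toy admission scoring in the spirit of A-MAC: a weighted sum
    over the five dimensions the paper names. The weights here are
    hypothetical; the paper's real scoring functions differ."""
    weights = weights or {
        "future_utility": 0.30,
        "factual_confidence": 0.25,
        "semantic_novelty": 0.20,
        "temporal_recency": 0.15,
        "content_type": 0.10,
    }
    # Each candidate feature is assumed pre-normalized to [0, 1]
    return sum(weights[dim] * candidate[dim] for dim in weights)

# A caller admits the memory only when the score clears a tuned
# threshold; everything below it is deliberately forgotten.
```

&lt;p&gt;The point of the sketch is the shape of the decision, not the numbers: storage becomes a gate with an explicit policy, rather than a default.&lt;/p&gt;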

&lt;p&gt;&lt;strong&gt;What most production agents actually have:&lt;/strong&gt;&lt;br&gt;
A vector database. Maybe RAG. Conversation history stuffed into context until it overflows.&lt;br&gt;
The field has published the answer. The industry hasn’t implemented it yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What This Means For What I’m Building&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I read these papers while building Nous (&lt;a href="https://github.com/tfatykhov/nous" rel="noopener noreferrer"&gt;https://github.com/tfatykhov/nous&lt;/a&gt;) — an open-source cognitive architecture for AI agents, grounded in Minsky’s Society of Mind thesis that intelligence emerges from many specialised, coordinated modules rather than a single monolithic system. The architecture maps directly onto what the research validated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Structured graph memory — shipped.&lt;/strong&gt; PostgreSQL + pgvector with polymorphic graph edges across all memory types. Density-gated spreading activation using recursive CTEs, built on the same Collins &amp;amp; Loftus spreading-activation principle as SYNAPSE.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Separate memory subsystems — shipped.&lt;/strong&gt; Brain (decisions, calibration, graph, guardrails), Heart (episodes, facts, procedures, censors, working memory), and Identity (character, values, protocols) operate as distinct modules with defined interfaces.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calibration with Brier scores — shipped.&lt;/strong&gt; Every decision records a confidence score. Outcomes are reviewed automatically by a background Decision Reviewer, so agents learn over time whether their confidence estimates are reliable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B-brain self-monitoring and symbolic control — shipped.&lt;/strong&gt; A Monitor engine watches each turn post-execution: did the action match intent? If not, it creates a censor. This is Minsky’s B-brain watching the A-brain work, implemented in code. The necessity of this governance layer is empirically validated by the SCL framework (Nov 2025), which bridges classical expert systems and neural reasoning: its Soft Symbolic Control applies symbolic constraints to the LLM’s probabilistic inference, not as rigid rules but as a metaprompt-based mechanism that guides reasoning while preserving the model’s generalization capabilities. SCL’s experiments show zero policy violations and no redundant tool calls. Our programmatic censors operationalise the same principle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sleep consolidation — shipped.&lt;/strong&gt; A Sleep Handler runs five phases during idle periods: review pending decisions, prune stale censors, compress old episodes into summaries, reflect on cross-session patterns, and generalise repeated facts. A direct implementation of the biological consolidation that both ACC and the Survey point to.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calibrated forgetting — shipped.&lt;/strong&gt; Staleness decay (half-life scoring), relevance-floor cutoffs, deduplication, and abandoned-decision filtering all enforce that not everything survives into long-term memory. Memory admission control as a formally scored framework is designed (F023) but not yet shipped.&lt;/li&gt;
&lt;/ul&gt;
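&lt;p&gt;For intuition, here is an in-memory sketch of the spreading-activation idea behind that graph traversal. The post describes it implemented as recursive CTEs inside PostgreSQL; the decay, threshold, and hop-limit values below are made-up illustrative parameters, not Nous's real ones:&lt;/p&gt;

```python
from collections import defaultdict

def spread_activation(edges, seeds, decay=0.5, threshold=0.1, max_hops=3):
    """Hop-by-hop spreading activation: each hop passes a decayed share of a
    node's activation along weighted edges, and paths whose energy falls
    below `threshold` stop propagating."""
    activation = defaultdict(float)
    frontier = dict(seeds)  # node -> activation energy entering this hop
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            activation[node] = max(activation[node], energy)
            for neighbour, weight in edges.get(node, []):
                passed = energy * decay * weight
                if passed >= threshold and passed > activation[neighbour]:
                    next_frontier[neighbour] = max(next_frontier[neighbour], passed)
        if not next_frontier:
            break
        frontier = next_frontier
    return dict(activation)
```

&lt;p&gt;Seeding the query node with energy 1.0 activates directly linked facts strongly and second-hop episodes more weakly, which is exactly the associative-recall behaviour the research argues for.&lt;/p&gt;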

&lt;p&gt;Minsky argued in 1986 that the power of intelligence stems from diversity of components, not any single principle. The papers say the same thing about memory, in 2025, with benchmarks.&lt;br&gt;
Nous is ~21,000 lines of Python, 1,200+ tests, deployed on PostgreSQL with 23 tables. The cognitive loop, graph memory, calibration, and sleep consolidation are live. Frame splitting and the growth engine are designed and in spec — next to build.&lt;br&gt;
The agents that will matter in 2027 aren’t the fastest ones. They’re the ones being built with real memory systems today.&lt;/p&gt;

&lt;p&gt;Real memory. That forms, evolves, consolidates, and yes — forgets.&lt;/p&gt;

&lt;p&gt;That’s the difference between a tool and a mind.&lt;/p&gt;

&lt;p&gt;Which of these five gaps is most visible in the agents you’re building or evaluating? I’d be curious what’s hardest to close in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Papers Referenced&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A-MEM — Agentic Memory for LLM Agents (Feb 2025)
Xu et al. | arxiv.org/abs/2502.12110&lt;/li&gt;
&lt;li&gt;Episodic Memory — Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents (Feb 2025)
Pink et al. | arxiv.org/abs/2502.06975&lt;/li&gt;
&lt;li&gt;Mem0 — Building Production-Ready AI Agents with Scalable Long-Term Memory (Apr 2025)
Chhikara et al. | arxiv.org/abs/2504.19413&lt;/li&gt;
&lt;li&gt;Procedural Memory Is Not All You Need — ACM UMAP Adjunct ’25 (May 2025)
Wheeler &amp;amp; Jeunen | arxiv.org/abs/2505.03434&lt;/li&gt;
&lt;li&gt;MAP — Modular Agentic Planner — Nature Communications, 2025
(exact DOI pending verification)&lt;/li&gt;
&lt;li&gt;SCL — Bridging Symbolic Control and Neural Reasoning in LLM Agents (Nov 2025)
arxiv.org/abs/2511.17673&lt;/li&gt;
&lt;li&gt;Memory in the Age of AI Agents — Survey, 47 authors (Dec 2025)
arxiv.org/abs/2512.13564&lt;/li&gt;
&lt;li&gt;ACC — AI Agents Need Memory Control Over More Context (Jan 2026)
Bousetouane | arxiv.org/abs/2601.11653&lt;/li&gt;
&lt;li&gt;SYNAPSE — LLM Agents with Episodic-Semantic Memory via Spreading Activation (Jan 2026)
arxiv.org/abs/2601.02744&lt;/li&gt;
&lt;li&gt;A-MAC — Adaptive Memory Admission Control for LLM Agents (Mar 2026)
arxiv.org/abs/2603.04549&lt;/li&gt;
&lt;li&gt;Context Rot — How Increasing Input Tokens Impacts LLM Performance — Chroma Research, 2025
research.trychroma.com/context-rot&lt;/li&gt;
&lt;li&gt;Collins &amp;amp; Loftus — A Spreading-Activation Theory of Semantic Processing (1975)
Psychological Review, 82(6), 407–428&lt;/li&gt;
&lt;li&gt;Tulving, E. — Episodic and Semantic Memory (1972)
In Organization of Memory. Academic Press.&lt;/li&gt;
&lt;li&gt;Minsky, M. — The Society of Mind (1986)
Simon &amp;amp; Schuster. ISBN 0671657135&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>The paradox of AI memory: remembering everything is easy. Remembering wisely is hard.</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Thu, 05 Mar 2026 06:07:21 +0000</pubDate>
      <link>https://dev.to/tfatykhov/the-paradox-of-ai-memory-remembering-everything-is-easy-remembering-wisely-is-hard-4mkn</link>
      <guid>https://dev.to/tfatykhov/the-paradox-of-ai-memory-remembering-everything-is-easy-remembering-wisely-is-hard-4mkn</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj2sb3wtgtmg9h7kbd7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj2sb3wtgtmg9h7kbd7a.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
I've been building a personal AI agent — not a chatbot, a companion. One that knows my projects, preferences, and decisions. That picks up where we left off without me re-explaining everything.&lt;/p&gt;

&lt;p&gt;But here's what nobody talks about: naive memory is expensive. And not just in dollars.&lt;/p&gt;

&lt;p&gt;Give an agent a massive context window and fill it with everything it's ever seen. More context doesn't mean more understanding — it means more noise. The signal-to-noise ratio collapses. The agent hallucinates connections between unrelated things, loses track of what matters right now, and slows down while becoming less accurate.&lt;/p&gt;

&lt;p&gt;Context isn't just a resource — it's a cognitive environment. Pollute it, and your agent gets dumber the more it "knows."&lt;/p&gt;

&lt;p&gt;The human brain doesn't work this way. You don't replay every conversation you've ever had before answering a question. You forget most things. That forgetting isn't a bug — it's the architecture.&lt;/p&gt;

&lt;p&gt;So I built memory that works more like ours:&lt;/p&gt;

&lt;p&gt;Structured extraction over raw storage. Facts are extracted and stored independently. Decisions are recorded with confidence levels, reasoning, and outcomes. Conversations get summarized when they close — the insight survives, the verbatim dies.&lt;/p&gt;

&lt;p&gt;Frame-aware budgets. Every interaction gets classified into a cognitive frame — conversation, task, decision, debug, research — each with a different token budget. A casual chat loads 3K tokens of context. A complex decision loads 12K with 3x more past decisions pulled in. The agent doesn't decide how much to remember — the frame does.&lt;/p&gt;
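&lt;p&gt;A sketch of that frame-to-budget mapping. The frame names and the 3K/12K figures come from the description above; the other numbers, and the classifier that assigns the frame, are assumptions:&lt;/p&gt;

```python
# Illustrative frame -> budget table. The 3K and 12K figures are from the
# post; the remaining numbers, and the frame classifier itself, are assumptions.
FRAME_BUDGETS = {
    "conversation": {"context_tokens": 3_000,  "past_decisions": 2},
    "task":         {"context_tokens": 6_000,  "past_decisions": 3},
    "decision":     {"context_tokens": 12_000, "past_decisions": 6},
    "debug":        {"context_tokens": 8_000,  "past_decisions": 2},
    "research":     {"context_tokens": 10_000, "past_decisions": 4},
}

def budget_for(frame: str) -> dict:
    """The frame, not the agent, decides how much memory gets loaded."""
    return FRAME_BUDGETS.get(frame, FRAME_BUDGETS["conversation"])
```

&lt;p&gt;Keeping the budget outside the agent's own reasoning is the point: memory load becomes a policy decision, not something the model negotiates turn by turn.&lt;/p&gt;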

&lt;p&gt;Batched retrieval. When the agent needs data from multiple sources, a single embedded script runs all the queries, filters and compresses results, and returns only what matters. Three tool calls that would each dump full results into context become one compact summary.&lt;/p&gt;
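&lt;p&gt;The batching pattern can be sketched like this: instead of three tool calls each dumping raw results into context, one function fans out the queries and returns a single compact digest. The &lt;code&gt;run_batched&lt;/code&gt; helper and its truncation limits are illustrative, not the actual implementation:&lt;/p&gt;

```python
def run_batched(queries, max_chars_per_result=400):
    """Run every query in one pass and return one compact digest,
    instead of letting each tool call dump its full output into context.
    `queries` maps a source name to a zero-argument callable returning rows."""
    sections = []
    for name, fetch in queries.items():
        rows = fetch()
        kept = rows[:3]  # keep only the top few rows per source
        body = "; ".join(str(r) for r in kept)[:max_chars_per_result]
        suffix = " …" if len(rows) > 3 else ""
        sections.append(f"{name}: {body}{suffix}")
    return "\n".join(sections)
```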

&lt;p&gt;Aggressive pruning. Tool outputs get automatically trimmed as they age — results over 4K characters are soft-trimmed to the first and last 1,500 characters. After 6 tool calls, old outputs are cleared entirely. The agent never carries dead weight.&lt;/p&gt;
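&lt;p&gt;The trimming policy above, roughly sketched (the 4K and 1,500-character figures are from the post; the marker strings and bookkeeping details are assumptions):&lt;/p&gt;

```python
SOFT_TRIM_AT = 4_000   # trim outputs longer than this (figure from the post)
KEEP_EDGE = 1_500      # keep the first and last 1,500 characters
MAX_LIVE_OUTPUTS = 6   # after 6 tool calls, older outputs are cleared

def soft_trim(output: str) -> str:
    """Keep the head and tail of an oversized tool output, dropping the middle."""
    if SOFT_TRIM_AT >= len(output):
        return output
    return output[:KEEP_EDGE] + "\n…[trimmed]…\n" + output[-KEEP_EDGE:]

def prune_history(tool_outputs: list) -> list:
    """Soft-trim every output, then clear everything but the newest six."""
    trimmed = [soft_trim(o) for o in tool_outputs]
    for i in range(len(trimmed) - MAX_LIVE_OUTPUTS):
        trimmed[i] = "[cleared]"
    return trimmed
```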

&lt;p&gt;Intentional forgetting. Some things are forgotten on purpose.&lt;/p&gt;

&lt;p&gt;The result? An agent that knows me across hundreds of conversations while using fewer tokens per turn than a basic chat with no memory at all. That is the idea :)&lt;/p&gt;

&lt;p&gt;This is the real challenge in agentic AI. Not making agents that can do things — that's mostly solved. Making agents that can think economically. That carry context without carrying cost. That remember like a trusted colleague, not like a court stenographer.&lt;/p&gt;

&lt;p&gt;We're entering an era where your AI's memory architecture matters more than its model. The smartest model with wasteful memory loses to a good model with intelligent recall.&lt;/p&gt;

&lt;p&gt;Build agents that remember wisely. Not agents that remember everything.&lt;br&gt;
&lt;a href="https://github.com/tfatykhov/nous" rel="noopener noreferrer"&gt;https://github.com/tfatykhov/nous&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;P.S. Still a work in progress, but a lot has been done.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Stop Renting AI. Build Your Own Agents.</title>
      <dc:creator>Timur Fatykhov</dc:creator>
      <pubDate>Mon, 02 Mar 2026 05:40:30 +0000</pubDate>
      <link>https://dev.to/tfatykhov/stop-renting-ai-build-your-own-agents-2cji</link>
      <guid>https://dev.to/tfatykhov/stop-renting-ai-build-your-own-agents-2cji</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoka2n19rm37mhwi9j5b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoka2n19rm37mhwi9j5b.jpg" alt=" " width="800" height="422"&gt;&lt;/a&gt;Something has quietly shifted in what we mean when we say an AI agent is intelligent — and most organizations are still optimizing for the wrong thing.&lt;/p&gt;

&lt;p&gt;The dominant enterprise AI pattern today is what you might call stateless sophistication. The model is capable. The outputs are impressive. And then the session ends, and everything resets. Your agent doesn't remember what failed last month. It can't connect a decision made in engineering to a pattern emerging in sales. Every conversation is the first conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's not an edge case. It's the architecture.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Why RAG Doesn't Close the Gap&lt;/h2&gt;

&lt;p&gt;The standard response to this problem is Retrieval-Augmented Generation — the now-standard approach of connecting AI models to your internal documents so the system can "know" your organization. Most enterprise AI vendors offer some version of this, and it's worth being precise about what it actually solves.&lt;/p&gt;

&lt;p&gt;What RAG cannot do is reason over time. It cannot notice that three separate teams have independently hit the same architectural dead end. It cannot track that a compliance policy was applied inconsistently across six contracts last quarter and flag the drift. It cannot connect a customer objection raised in a sales call to a product decision made six months earlier that created the gap.&lt;/p&gt;

&lt;p&gt;What it can do — retrieve relevant documents quickly — is genuinely useful. But it inherits every gap in your documentation along the way. The knowledge that actually differentiates organizations rarely makes it into clean, queryable documents. It lives in the accumulated residue of real decisions: what was tried, what was abandoned, and why.&lt;/p&gt;

&lt;p&gt;That kind of institutional memory requires a fundamentally different architecture — one where the memory layer isn't a plugin sitting on top of the agent, but the foundation the agent reasons from. That distinction is why ownership of the stack matters. A vendor can give you retrieval. They cannot give you continuity.&lt;/p&gt;

&lt;h2&gt;What Memory-First Architecture Actually Changes&lt;/h2&gt;

&lt;p&gt;An agent built around persistent, structured decision memory operates differently in ways that compound over time. It also requires the organization to treat decision-making itself as a data problem: not just storing documents, but structuring choices, capturing outcomes, and making the reasoning behind both available to the system going forward.&lt;/p&gt;

&lt;p&gt;Consider what that looks like in practice. An engineering team encounters a recurring integration failure during client onboarding. In a stateless system, each instance is treated as a new problem — diagnosed, patched, and forgotten. In a memory-first system, the agent surfaces that the same failure pattern appeared across three separate onboardings over six months, connects it to an architectural decision made during a product migration, and recommends a structural fix before the fourth client hits the same wall. That's not retrieval. That's reasoning over accumulated organizational experience.&lt;/p&gt;

&lt;p&gt;That kind of architecture demands more than engineering effort. It requires the organizational discipline to treat decisions as structured data — logging choices, reviewing outcomes, surfacing patterns. That's a cultural commitment, not just a technical one. But what it produces is a category of organizational knowledge that no vendor can productize — because it's yours alone. Your regulatory history. Your process failures. The exceptions your team has earned through years of edge cases.&lt;/p&gt;
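&lt;p&gt;One way to make "decisions as structured data" concrete is a record like the following. This schema is a hypothetical illustration, not a product spec; every field name here is an assumption:&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DecisionRecord:
    """One logged decision. Field names are illustrative, not a spec."""
    summary: str                   # what was decided
    reasoning: str                 # why, as understood at the time
    confidence: float              # 0..1 estimate recorded up front
    decided_on: date
    outcome: Optional[str] = None  # filled in when the outcome is reviewed
    tags: list = field(default_factory=list)

    def review(self, outcome: str) -> None:
        """Capture what actually happened, so later patterns can be surfaced."""
        self.outcome = outcome
```

&lt;p&gt;The discipline is in the workflow, not the schema: the record is only valuable if someone comes back and calls the review step.&lt;/p&gt;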

&lt;h2&gt;The Compounding Argument&lt;/h2&gt;

&lt;p&gt;BCG's 2026 AI Radar report highlights a clear divide: just 15% of organizations are 'Trailblazers' achieving disruptive ROI from AI, while most remain stuck in pilot stages. The successful few share a key trait: they view AI as a capability to develop rather than a product to purchase. A vendor contract gives you a static capability at a specific moment in time. Owning your architecture allows intelligence to compound.&lt;/p&gt;

&lt;p&gt;In practice, that means every decision logged, every outcome reviewed, and every pattern surfaced adds to an organizational knowledge base that becomes structurally more valuable over time. That compounding effect is the real moat, and a competitor cannot replicate it by signing a better vendor contract next quarter.&lt;/p&gt;

&lt;p&gt;Here's the twist most leaders miss: &lt;strong&gt;&lt;em&gt;writing the code is no longer the hard part. AI agents can scaffold their own tooling&lt;/em&gt;&lt;/strong&gt;. Models generate working integrations in minutes. The engineering effort to stand up a memory-first agent is shrinking rapidly, and it will only continue to shrink. That means the real constraint has migrated from technical execution to something much harder to automate: knowing which decisions to log, which knowledge is genuinely proprietary, which patterns are worth surfacing, and what it actually means to build toward continuity rather than capability.&lt;/p&gt;

&lt;p&gt;That is an organizational design problem, not a software engineering one. And it's the reason most companies will continue renting — not because they can't build, but because building requires a kind of institutional self-awareness that no vendor can supply and no model can generate.&lt;/p&gt;

&lt;p&gt;What would it mean for your organization if your AI actually remembered — not just conversations, but decisions, failures, and the reasoning behind both?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The teams that have answered that question are already accumulating. Everyone else resets on Monday.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
