After three months of watching my AI trading bot re-reason from scratch every single session, I built something to fix it. This is the technical story of what I built, why the obvious solutions didn’t work, and what we learned along the way.
The problem no one talks about honestly
Every AI agent framework demo looks impressive. The agent reasons well, remembers context within a conversation, and produces coherent output.
Then you restart it.
Everything is gone. Every preference the user stated. Every decision the agent made. Every pattern it noticed. The agent wakes up like it was born five minutes ago, ready to re-discover everything it already learned.
We call this goldfish syndrome. And it’s not a minor inconvenience — it’s a fundamental architectural problem that makes most production AI agents significantly less useful than they could be.
The session window is not memory. Stuffing previous conversations into the context window is not memory. It’s expensive, it has hard limits, and it doesn’t scale. Real memory means the agent builds a persistent model of the world that grows smarter over time, not a transcript it re-reads every morning.
Why the existing solutions didn’t work for me
When I started looking for solutions, I found three main players: Mem0, Zep, and Letta. I evaluated all three seriously.
Mem0 is well-engineered but Python-first. My agent stack is Node.js. The Python bridge options are ugly and the cloud API charges per memory operation, which means costs scale with every agent interaction — the opposite of what infrastructure should do.
Zep has similar problems. Cloud-dependent, Python-first, subscription pricing. It also focuses heavily on conversation history rather than structured knowledge — useful for chatbots, less useful for agents that need to reason about past decisions.
Letta (formerly MemGPT) is the most ambitious of the three. The architecture is genuinely interesting. But it’s a full agent framework, not a memory layer. I didn’t want to rebuild my agent inside someone else’s framework. I wanted to add memory to the agent I already had.
All three share a deeper problem: they treat memory as vector search. Store embeddings, retrieve by similarity, inject into context. This works for surface-level recall but fails for the kind of reasoning I needed.
My trading bot doesn’t just need to remember what happened. It needs to remember why it made decisions, who the relevant entities were, and how events relate causally to outcomes. Vector search alone can’t reconstruct that.
The architecture I ended up building
We call it VEKTOR, from vector memory. The core insight is that agent memory isn’t one problem — it’s four problems that need to be solved simultaneously.
Graph 1: Semantic edges
The foundation. Every memory gets embedded using a local model (all-MiniLM-L6-v2, runs entirely on-device) and connected to semantically similar memories via weighted edges. This handles the “find things like this” retrieval that vector search is good at.
The key difference from standard RAG is that I’m building a graph of relationships between memories, not just an index of individual embeddings. A memory doesn’t just exist in isolation — it exists in relation to every other memory the agent has formed.
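A minimal sketch of the idea, with toy vectors and illustrative names (the threshold and function names are my own, not VEKTOR's actual API): every new memory is compared against the existing ones, and a weighted edge is stored whenever cosine similarity clears a threshold.

```javascript
// Sketch: connect a new memory to semantically similar ones via weighted edges.
// Embeddings here are toy 3-d vectors; in practice they come from an embedding model.

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Add a memory to the graph, creating weighted edges to similar existing memories.
function addMemory(graph, memory, threshold = 0.7) {
  for (const existing of graph.nodes) {
    const weight = cosineSimilarity(memory.embedding, existing.embedding);
    if (weight >= threshold) {
      graph.edges.push({ from: memory.id, to: existing.id, weight });
    }
  }
  graph.nodes.push(memory);
}

const graph = { nodes: [], edges: [] };
addMemory(graph, { id: 'm1', embedding: [1, 0, 0] });
addMemory(graph, { id: 'm2', embedding: [0.9, 0.1, 0] }); // close to m1
addMemory(graph, { id: 'm3', embedding: [0, 0, 1] });     // unrelated
console.log(graph.edges.length); // 1 — only m2 ↔ m1
```

The point of storing the edge (rather than recomputing similarity at query time) is that retrieval can then walk the graph: pull a memory, then pull its neighbours, which often surface context a pure top-k similarity search would miss.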
Graph 2: Causal chains
This is where it gets interesting. When an agent makes a decision, it reasons about why. I extract that reasoning and build directed edges between the triggering conditions and the decision outcomes.
Example from my trading bot: “Fear index dropped to 22 → entered long position → BTC rallied 4.2% → closed with profit.” That’s a causal chain. Three months later, when the fear index drops again, the agent can recall not just that this situation is similar to a past situation, but specifically what happened and what worked.
Vector search would retrieve the memory. The causal graph tells the agent what to do with it.
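In sketch form (the node names and edge format are illustrative, not VEKTOR's internal representation), a causal chain is just a sequence of directed edges that can be walked forward from a trigger:

```javascript
// Sketch: a causal chain stored as directed edges from trigger to outcome.

const causalEdges = [
  { from: 'fear_index_dropped_to_22', to: 'entered_long_position' },
  { from: 'entered_long_position', to: 'btc_rallied_4.2pct' },
  { from: 'btc_rallied_4.2pct', to: 'closed_with_profit' },
];

// Follow directed edges from a trigger to reconstruct what happened next.
function traceChain(edges, start) {
  const chain = [start];
  let current = start;
  let next;
  while ((next = edges.find(e => e.from === current))) {
    chain.push(next.to);
    current = next.to;
  }
  return chain;
}

const chain = traceChain(causalEdges, 'fear_index_dropped_to_22');
console.log(chain.join(' → '));
// fear_index_dropped_to_22 → entered_long_position → btc_rallied_4.2pct → closed_with_profit
```

When a similar trigger fires again, the agent retrieves the trigger node via semantic search and then walks the chain to recover the decision and its outcome.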
Graph 3: Entity relationships
Agents interact with entities — people, assets, concepts, systems. Over time they should build a model of those entities and how they relate to each other.
My trading bot tracks assets, indicators, and market conditions as entities with properties and relationships. When BTC and ETH start decorrelating, that’s a relationship change the entity graph can capture and make available for future reasoning.
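As a sketch (all names here are hypothetical, not VEKTOR's API): entities carry properties, and relationships between them are typed and weighted, so a relationship change like decorrelation is an in-place update of an existing edge rather than a new memory.

```javascript
// Sketch: entities with properties plus typed, weighted relationships between them.

const entities = new Map();
const relations = [];

function upsertEntity(id, props) {
  entities.set(id, { ...(entities.get(id) || {}), ...props });
}

function relate(a, b, type, weight) {
  const existing = relations.find(r => r.a === a && r.b === b && r.type === type);
  if (existing) existing.weight = weight; // relationship changed: update in place
  else relations.push({ a, b, type, weight });
}

upsertEntity('BTC', { kind: 'asset' });
upsertEntity('ETH', { kind: 'asset' });
relate('BTC', 'ETH', 'correlation', 0.85); // historically tightly coupled
relate('BTC', 'ETH', 'correlation', 0.40); // decorrelation observed later

console.log(relations.length, relations[0].weight); // 1 0.4
```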
Graph 4: Scene memory
Raw memories are noisy. Individual events need to be grouped into coherent episodic chunks — scenes — that represent meaningful units of experience.
The scene layer sits between raw input and the semantic graph. New memories are first grouped into scenes by temporal and thematic proximity, then the scenes are integrated into the semantic and causal graphs. This compression keeps the graph manageable as it grows and improves retrieval quality by providing episodic context.
The memory lifecycle
Memories don’t just get written and forgotten. They move through a pipeline:
Raw → every input gets stored immediately in its original form.
Scene → a background process groups recent raw memories into coherent episodes, compresses them, and extracts key entities and causal relationships.
Graph → scene-level memories get integrated into all four graphs, with edges created to existing memories based on semantic similarity, causal relationships, and entity overlap.
The AUDN (Autonomous Update Decision Network) layer runs before every write and classifies each candidate memory as ADD or NOOP. If a memory is too similar to something already in the graph, it gets dropped rather than creating noise. This deduplication step turned out to be more important than I initially expected — without it, the graph fills with near-identical memories and retrieval quality degrades quickly.
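The core of that ADD/NOOP decision can be sketched as a similarity check against everything already stored (the 0.95 threshold is illustrative, not VEKTOR's actual value, and the real classifier considers more than raw embedding distance):

```javascript
// Sketch of the ADD/NOOP decision: drop a candidate memory whose embedding is
// too close to something already in the graph.

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function classify(candidate, stored, dupThreshold = 0.95) {
  const maxSim = Math.max(0, ...stored.map(s => cosine(candidate, s)));
  return maxSim >= dupThreshold ? 'NOOP' : 'ADD';
}

const stored = [[1, 0, 0], [0, 1, 0]];
console.log(classify([0.99, 0.01, 0], stored)); // NOOP — near-duplicate of an existing memory
console.log(classify([0, 0, 1], stored));       // ADD  — genuinely new information
```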
What surprised me
Three things I didn’t expect going in:
Deduplication matters more than retrieval. I spent most of my early effort optimising the retrieval algorithm. The bigger win came from being more aggressive about what gets written in the first place. A clean graph with 500 high-quality memories outperforms a noisy graph with 5,000.
Causal memory changes agent behaviour qualitatively. With only semantic memory, the agent would recall that a situation was similar to a past situation. With causal memory, it recalls what it decided and what happened as a result. The difference in reasoning quality is significant.
Local embeddings are good enough. I was concerned that all-MiniLM-L6-v2 would produce inferior embeddings compared to OpenAI’s models. In practice, for the kind of agent memory retrieval I’m doing, the quality difference is negligible and the latency and cost advantages are substantial.
Results after three months
My trading agent has accumulated 1,847 semantic edges, 501 causal chain links, and 16 tracked entities across three months of operation. Memory consumption is around 180MB. Query latency is under 50ms on the server it runs on.
More importantly: the agent reasons differently. It references specific past trades. It notices when current conditions match historical patterns. It doesn’t repeat analyses it’s already done. The improvement in output quality is noticeable and consistent.
The implementation
The full system is Node.js, built on sqlite-vec for graph storage, better-sqlite3 for the database layer, and the Transformers.js port of all-MiniLM-L6-v2 for local embeddings. It works with any LLM via adapters for Groq, OpenAI, and Ollama.
The drop-in API is three lines:
```javascript
const vektor = require('vektor-memory');
await vektor.remember('agent-id', { event: 'BTC broke 95k support', signal: 'fear_index_low' });
const context = await vektor.recall('agent-id', 'what happened near 95k?');
```
We have packaged it as a commercial library at vektormemory.com. But the architectural ideas here are the more interesting part — I’d encourage anyone building agents to think carefully about what kind of memory their agents actually need, rather than defaulting to vector search because it’s what everyone else is doing.
What’s next
A few directions I’m exploring:
Federated memory — multiple agents sharing a memory graph, contributing observations and learning from each other’s experiences.
Memory pruning — intelligently forgetting low-value memories as the graph grows, analogous to how human memory consolidates during sleep.
Cross-modal memory — storing and retrieving memories that include structured data, not just text.
If you’re building agents and have hit the memory wall, I’d genuinely like to hear how you’re approaching it. The space is early and the right architecture isn’t obvious yet.
