DEV Community

Ana Julia Bittencourt

Posted on • Originally published at blog.memoclaw.com

Atomic memories vs context dumps: how memory granularity affects recall quality

You just had a productive session with your AI agent. It learned your deploy workflow, your naming conventions, and that you hate tabs. Time to store all of that. Do you dump the whole session summary in one big memory, or break it into individual facts?

This isn't a style question. It changes how well your agent remembers things later.

What happens when you store a context dump

Say your agent stores this after a session:

memoclaw store "Session with Ana on March 6. Discussed the deploy pipeline, prefers 2-space indentation, wants PR descriptions to include testing section, moving API from Railway to Fly.io next month, likes English for work but Portuguese for casual chat." \
  --importance 0.7 --namespace personal

That's 5 distinct facts crammed into one memory. MemoClaw generates a single embedding vector for the whole block. That vector represents the average meaning of everything in there.

Two weeks later, the agent needs to recall indentation preferences:

memoclaw recall "code formatting preferences" --namespace personal

Will it find the session dump? Maybe. The embedding for "code formatting preferences" has to match a vector that also encodes deploy pipelines, PR templates, hosting plans, and language preferences. The semantic similarity might be high enough. Or it might not.

Ten sessions of dumped context mean ten blobs, each encoding 5-8 different topics. Recall becomes a lottery.

What happens with atomic memories

Break that session into individual facts:

memoclaw store "Ana prefers 2-space indentation. No tabs." \
  --importance 0.8 --namespace personal --tags "code-style"

memoclaw store "PR descriptions should include a testing section" \
  --importance 0.7 --namespace personal --tags "code-review,pr"

memoclaw store "Planning to migrate API from Railway to Fly.io (April 2026)" \
  --importance 0.5 --namespace personal --tags "infra,migration"

Five memories instead of one (three of them shown above). Each has a focused embedding vector. When the agent recalls "code formatting preferences," the 2-space indentation memory lights up because that's exactly what it's about.

Five stores cost $0.025 instead of $0.005. Those extra two cents buy dramatically better recall accuracy.

How embeddings work (the short version)

MemoClaw converts each memory into a vector: a list of numbers representing meaning. When you recall, it converts your query into a vector too, and finds stored memories whose vectors are closest.

Short, focused text produces tight, specific vectors. Long, multi-topic text produces diffuse vectors: the vector lands somewhere in the middle of all those topics, not close to any one of them.

This is why atomic memories recall better. Their vectors are sharp.
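The dilution effect is easy to see with toy vectors. The sketch below uses hand-made 3-dimensional "embeddings" where each axis stands for one topic; real embedding models use hundreds of dimensions and MemoClaw's internals aren't shown here, but the averaging math works the same way:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: one axis per topic.
indentation = [1.0, 0.0, 0.0]  # code-style
deploy      = [0.0, 1.0, 0.0]  # infra
language    = [0.0, 0.0, 1.0]  # language preference

# A context dump embeds roughly as the average of its topics.
dump = [(a + b + c) / 3 for a, b, c in zip(indentation, deploy, language)]

query = [1.0, 0.0, 0.0]  # "code formatting preferences"

print(cosine(query, indentation))  # 1.0  — atomic memory matches exactly
print(round(cosine(query, dump), 3))  # 0.577 — dump diluted by unrelated topics
```

The atomic memory scores a perfect match; the dump scores barely above half, and it gets worse with every extra topic packed into the blob.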

Practical splitting rules

One fact = one memory. "User prefers dark mode" is one memory. "User prefers dark mode and uses vim bindings and likes responses in Portuguese" should be three memories.

One decision = one memory. Don't combine it with the reasoning. The decision and the reasoning are separate memories with different importance levels.

One correction = one memory. When the user says "actually, don't do X anymore," that's a high-importance standalone memory. Don't bury it in a session summary.
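One way to enforce these rules is to split at write time. A minimal Python sketch, assuming the `memoclaw store` flags shown earlier; `store_command` is a hypothetical helper for building the invocations, not part of MemoClaw:

```python
def store_command(text, importance, namespace, tags):
    """Build one `memoclaw store` argv list for a single atomic fact."""
    return [
        "memoclaw", "store", text,
        "--importance", str(importance),
        "--namespace", namespace,
        "--tags", tags,
    ]

# One entry per fact — never one entry per session.
facts = [
    ("Ana prefers 2-space indentation. No tabs.", 0.8, "code-style"),
    ("PR descriptions should include a testing section", 0.7, "code-review,pr"),
]

for text, importance, tags in facts:
    print(" ".join(store_command(text, importance, "personal", tags)))
```

Each fact gets its own store call, its own importance score, and its own tags, which is what makes individual recall and deletion possible later.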

Importance scores compound the problem

Context dumps with wrong importance scores are doubly bad. If you store a 5-topic dump at importance 0.8, all five topics get boosted equally.

Atomic memories let you score accurately:

  • "Prefers 2-space indentation": 0.8 (permanent)
  • "Migrating to Fly.io in April": 0.5 (temporary)
  • "Hates tabs": 0.9 (the agent really shouldn't forget this)

After April, you delete the Fly.io memory. The others stay. If they were all in one dump, you'd have to rewrite the entire memory to remove the stale part.
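That cleanup step can be sketched in a few lines. This is a toy in-memory model, not MemoClaw's actual storage, and the `expires` field is an assumption added for illustration:

```python
from datetime import date

# Atomic memories with individually-scored importance.
# `expires` marks temporary facts; permanent ones use None.
memories = [
    {"text": "Prefers 2-space indentation", "importance": 0.8, "expires": None},
    {"text": "Migrating to Fly.io in April", "importance": 0.5, "expires": date(2026, 4, 30)},
    {"text": "Hates tabs", "importance": 0.9, "expires": None},
]

def prune(memories, today):
    """Drop only memories whose expiry has passed; the rest survive untouched."""
    return [m for m in memories if m["expires"] is None or m["expires"] >= today]

remaining = prune(memories, date(2026, 5, 1))
print([m["text"] for m in remaining])  # only the two permanent memories remain
```

With a context dump, there is no equivalent of `prune`: the stale migration note and the permanent preferences live in one record, so removing one means rewriting all of them.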

When bigger memories are fine

Atomic isn't always the answer. Meeting notes with a single topic, code snippets with explanation, step-by-step procedures: these work as larger memories because the content is cohesive.

The test: would someone recalling this memory always want all of it? If yes, keep it together. If they might want just one part, split it.

TL;DR

Store small, specific facts. One idea per memory. Score importance individually. It costs a couple extra cents per session and makes recall dramatically more accurate.

Your agent's memory should work like flash cards, not like a diary.
