Becomer.net

I Built a Memory API That Beats Mem0 on LongMemEval Without Using a Single LLM Token

The problem I kept hitting
Every time I built a multi-agent pipeline I hit the same four walls.
Memory dies when the process ends. Agents can't share context without message passing. Every recall burns 500-7,000 tokens on an LLM reasoning pass. And memory is locked to one LLM provider — your GPT app and your Claude app can't share the same user context.
I spent months building something that fixes all four.

What I built
BECOMER is a persistent memory API for AI agents. Zero tokens per recall. Works with any LLM. Agents share memory across a namespace without coordination code.
The open-source client framework, becomer-agents, attaches to any existing agent stack without replacing it: LangChain, CrewAI, AutoGen, LlamaIndex, LangGraph. You don't rebuild anything. You just add memory to what you already have.
```bash
pip install becomer-agents
```

The benchmark result
94.4% on LongMemEval. Full n=500, no sampling, string-based scoring — no LLM judge inflating the numbers.
Mem0 scores 93.4% on the same benchmark. They spend approximately 6,787 tokens per query to get there. BECOMER spends zero.
| System | LongMemEval | Tokens/query |
| --- | --- | --- |
| BECOMER | 94.4% | 0 |
| Mem0 v2 | 93.4% | ~6,787 |
| Standard RAG | ~75-80% | 0 |
On LOCOMO, BECOMER scores 69.5% with retrieval only. Mem0 scores 91.6%, but it runs an LLM reasoning pass to get there. The gap is architectural, not a retrieval failure: BECOMER retrieves, and your LLM reasons on top. That's by design.
Full methodology, 30 iterations, zero regressions: becomer.net/benchmarks.html
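To make "string-based scoring" concrete, here is a minimal sketch of what such a scorer can look like. The normalization rules and the helper names (`normalize`, `string_match`, `accuracy`) are my assumptions for illustration, not the exact scorer behind the numbers above; the benchmark page has the real methodology.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def string_match(gold_answer: str, scored_text: str) -> bool:
    """True if the normalized gold answer appears inside the normalized text."""
    return normalize(gold_answer) in normalize(scored_text)


def accuracy(examples: list[tuple[str, str]]) -> float:
    """Fraction of (gold_answer, scored_text) pairs that match."""
    if not examples:
        return 0.0
    hits = sum(string_match(gold, text) for gold, text in examples)
    return hits / len(examples)


# Illustrative pairs only, not benchmark data.
examples = [
    ("Rate limit: 100 req/s", "Notes say the rate limit: 100 req/s applies."),
    ("OAuth2", "Auth uses an API key header."),
]
print(accuracy(examples))
```

The point of a scorer like this is that it is deterministic: the same outputs always get the same score, with no judge model in the loop.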

How the architecture works
Most memory systems run an LLM under the hood. Every recall is actually an LLM call that extracts, reasons, and returns. That's why they burn thousands of tokens per query.
BECOMER is a purpose-built retrieval engine. There is no LLM in the memory layer: storage and retrieval happen in our engine, and your LLM receives the retrieved context and reasons on top of it.
The result — zero token cost at retrieval, ~150ms P50, and memory that persists across sessions, LLMs, and processes.
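The retrieve-then-reason split can be sketched in a few lines. This is a toy stand-in, assuming plain word overlap as the ranking signal; it is not how BECOMER's engine scores memories, and `recall` and `build_prompt` are hypothetical helpers. What it shows is the shape of the flow: retrieval runs without any model call, and the LLM only sees the final prompt.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall(memories: list[str], query: str, top_k: int = 3) -> list[str]:
    """Rank memories by word overlap with the query. No LLM call involved."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(m)), m) for m in memories]
    matching = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matching, key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt that your own LLM would reason over."""
    lines = "\n".join(f"- {item}" for item in context)
    return f"Context:\n{lines}\n\nQuestion: {question}"


memories = [
    "API endpoint: POST /v1/payments, OAuth2 bearer",
    "Rate limit: 100 req/s",
    "Alice prefers TypeScript and dark mode",
]
question = "What is the payments API rate limit?"
context = recall(memories, question, top_k=2)
prompt = build_prompt(question, context)
print(prompt)
```

Tokens are only spent when `prompt` is sent to whichever model you choose, which is why the memory layer itself can stay provider-independent.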

The multi-agent namespace pattern
This is the part I'm most proud of.
Every agent gets a namespace keyed by task ID and role.
task-abc.researcher  ← private to researcher
task-abc.executor    ← private to executor
task-abc.shared      ← readable and writable by all agents
A researcher agent stores findings. An executor agent recalls them. No message passing. No coordination code. Different LLMs can hit the same namespace simultaneously.
```python
import os

from becomer_agents import MultiAgentPipeline


def researcher(task, own_ns, shared_ns):
    # Store findings in the shared namespace so other roles can recall them;
    # anything written to own_ns stays private to the researcher.
    shared_ns.store("API endpoint: POST /v1/payments, OAuth2 bearer")
    shared_ns.store("Rate limit: 100 req/s")
    return "Research complete"


def executor(task, own_ns, shared_ns):
    # Recall what the researcher found: zero tokens, no message passing
    findings = shared_ns.recall("payment API details", top_k=5)
    return "Plan written"


pipeline = MultiAgentPipeline(
    api_key=os.environ["BECOMER_API_KEY"],
    task_id="payments-task-001",
    roles=["researcher", "executor", "reviewer"],
)
```
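If you want to see the key scheme without the API, here is a toy in-memory sketch of the `task_id.role` layout. `ToyStore` and `ToyNamespace` are hypothetical names I made up for illustration, not part of becomer-agents, and this sketch only scopes data by key; it does not enforce who is allowed to open which namespace.

```python
from collections import defaultdict


class ToyNamespace:
    """A view onto one namespace key, e.g. 'task-abc.shared'."""

    def __init__(self, key: str, items: list[str]):
        self.key = key
        self._items = items

    def store(self, text: str) -> None:
        self._items.append(text)

    def recall(self, query: str, top_k: int = 5) -> list[str]:
        # Naive match: keep items sharing at least one word with the query.
        words = set(query.lower().split())
        hits = [t for t in self._items if words & set(t.lower().split())]
        return hits[:top_k]


class ToyStore:
    """Maps '<task_id>.<role>' keys to independent lists of memories."""

    def __init__(self):
        self._data = defaultdict(list)

    def namespace(self, task_id: str, role: str) -> ToyNamespace:
        key = f"{task_id}.{role}"
        return ToyNamespace(key, self._data[key])


store = ToyStore()
researcher_ns = store.namespace("task-abc", "researcher")
executor_ns = store.namespace("task-abc", "executor")
shared_ns = store.namespace("task-abc", "shared")

researcher_ns.store("draft note about rate limit")  # private scratch note
shared_ns.store("Rate limit: 100 req/s")            # visible to every role

print(executor_ns.recall("rate limit"))
print(shared_ns.recall("rate limit"))
```

The executor's own namespace comes back empty, while the shared namespace returns the finding, which is the behavior the pattern relies on.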

LangChain drop-in — 3 lines
```python
from becomer_agents import BecomerMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=BecomerMemory(api_key="your-key"),
)
```
Memory persists across sessions automatically. Works with CrewAI, AutoGen, LlamaIndex — same pattern.

Self-improving agents
```python
import os

from becomer_agents import SelfImprovingPipeline

pipeline = SelfImprovingPipeline(
    api_key=os.environ["BECOMER_API_KEY"],
    task_id="optimizer-001",
)


def my_agent(task, context):
    # context = what worked in previous iterations
    approach = "zero-shot" if not context else "few-shot + CoT"
    # run_eval is your own evaluation function; it is not defined here
    score = run_eval(task, approach)
    return {"approach": approach, "score": score}


for i in range(5):
    result = pipeline.run_iteration(
        task="classify customer sentiment",
        fn=my_agent,
    )
```
Each iteration stores its outcome. The next iteration recalls what scored highest, so results can compound across runs at zero extra token cost for recall.
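The store-then-recall-the-best loop can be sketched without the API. This is a toy version under stated assumptions: `ToyOutcomeMemory` and this `run_iteration` are hypothetical stand-ins, not the `SelfImprovingPipeline` internals, and the scores are made-up placeholders rather than measurements.

```python
class ToyOutcomeMemory:
    """Keeps every iteration's outcome and returns the best one so far."""

    def __init__(self):
        self._outcomes = []

    def store(self, outcome: dict) -> None:
        self._outcomes.append(outcome)

    def best(self):
        # None on the first iteration, when nothing has been stored yet.
        return max(self._outcomes, key=lambda o: o["score"], default=None)


def run_iteration(memory: ToyOutcomeMemory, task: str, fn) -> dict:
    """Recall the best prior outcome, run the agent, store the new outcome."""
    context = memory.best()
    result = fn(task, context)
    memory.store(result)
    return result


# Placeholder scores so the sketch runs; these are NOT real eval results.
PLACEHOLDER_SCORES = {"zero-shot": 0.5, "few-shot + CoT": 0.75}


def my_agent(task, context):
    approach = "zero-shot" if context is None else "few-shot + CoT"
    return {"approach": approach, "score": PLACEHOLDER_SCORES[approach]}


memory = ToyOutcomeMemory()
results = [
    run_iteration(memory, "classify customer sentiment", my_agent)
    for _ in range(3)
]
print(memory.best())
```

The first run has no context and falls back to the default approach; later runs see the best stored outcome and can change strategy based on it.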

Multi-tenant — one key, many users
```python
import os

from becomer_agents import AgentNamespace

api_key = os.environ["BECOMER_API_KEY"]

# Alice's agent
alice = AgentNamespace(api_key, task_id="app", role="alice-123")
alice.store("Alice prefers TypeScript and dark mode")

# Bob's agent: completely isolated
bob = AgentNamespace(api_key, task_id="app", role="bob-456")
bob.recall("preferences")  # → [] (can't see Alice's memories)
```
Isolation is enforced at the database level. One master key covers your entire user base.

The honest part
On LOCOMO multi-hop, BECOMER scores 59.6% vs 93.3% for Mem0. That gap is real. Multi-hop questions require bridging terms ("Vancouver" → "Canada") that aren't in stored content. Mem0 runs an LLM reasoning pass to bridge that gap. BECOMER returns the retrieved context and your LLM bridges it. Different architecture, different trade-off.
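Here is a toy illustration of why a bridging term hurts retrieval-only scoring. It assumes plain word overlap as the matching signal, which is a simplification and not a description of BECOMER's engine; `tokens` and `overlap` are hypothetical helpers.

```python
import re


def tokens(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, memory: str) -> int:
    """Number of distinct words the query shares with the stored memory."""
    return len(tokens(query) & tokens(memory))


memory = "Alice moved to Vancouver last year"

direct_question = "Where did Alice move to?"  # shares "alice" and "to"
multi_hop_question = "Who lives in Canada?"   # "Canada" is never stored

print(overlap(direct_question, memory))
print(overlap(multi_hop_question, memory))
```

The multi-hop question shares no words with the stored memory because the link from "Vancouver" to "Canada" is world knowledge, not stored content. Something has to supply that link, and in this architecture it is the LLM that reads the retrieved context.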
If you need inference inside the memory layer — use Mem0. If you need zero token cost, any LLM, and shared agent memory — use BECOMER.

The backend-agnostic part
The framework works with BECOMER's retrieval engine out of the box. Bring your own backend if you prefer: the framework is backend-agnostic. The namespace pattern, pipeline architecture, and multi-agent coordination logic all have value independent of the API.

Getting started
```bash
pip install becomer-agents
export BECOMER_API_KEY=bcm_your-key-here
python examples/demo.py
```
Free API key — 1,000 calls/month: becomer.net/signup.html
GitHub: github.com/Becomer-net/Becomer-Agents
Full benchmark methodology: becomer.net/benchmarks.html

Top comments (2)

zep1997
Self-Correcting Systems

The zero-token retrieval architecture is the right call. Every LLM pass inside the
memory layer is a latency and cost tax that compounds fast across a multi-agent
pipeline — especially with concurrent shared namespace hits. Separating retrieval from
reasoning is architecturally cleaner than entangling them.

The LOCOMO multi-hop gap is honest and I respect you naming it directly. That gap isn't
a retrieval failure — it's a signal that multi-hop bridging belongs in the LLM layer.
BECOMER retrieves, your LLM reasons. Clean boundary.

One question the shared namespace pattern raises for me: within task-abc.shared, a
researcher agent can store something and an executor agent retrieves and acts on it.
The identity isolation is clearly enforced (alice vs bob). But is there anything in the
architecture that determines which stored memories are authorized to govern which
action types? The namespace controls who can read and write. But authorization — "is
this retrieved item allowed to trigger an execute-class action?" — seems like a
separate question.

Asking because I've been working on exactly this from the other direction: attribution
traces that check whether a retrieved memory's metadata is sufficient to authorize the
action taken on it. Your architecture actually makes this tractable in a way entangled
systems can't — you can't run a meaningful authorization gate if retrieval and
reasoning happen in the same LLM pass. The split you've built is the prerequisite.

The namespace pattern plus explicit authority metadata per stored item might be closer
to a complete solution than either approach alone. Following the project.

becomernet
Becomer.net

You've identified the exact gap accurately. Namespace isolation handles identity — alice vs bob, task-abc.researcher
vs task-abc.executor — but there's no authorization metadata at the memory item level. A retrieved fact in
task-abc.shared carries no signal about what action classes it's permitted to trigger. The namespace says who can read
it, not what it can authorize.

The attribution trace framing is the right abstraction for this. The separation between retrieval and reasoning is the
prerequisite you named — if those happen in the same LLM pass you can't meaningfully inspect the causal chain between
"what was retrieved" and "what action was taken." The split makes the gate tractable. The gate itself isn't built
here yet.

The natural extension is per-item metadata at store time — action_scope or authority_level alongside content.
Retrieval returns the item plus its authority metadata, and the orchestration layer checks before acting. The hard
problem is that authority metadata needs to be trustworthy at store time, which means the storing agent must already
be authorized to grant the authority it's claiming — otherwise you've just moved the problem one level up.
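A minimal sketch of that extension, to make the shape concrete. None of this exists in BECOMER today: `MemoryItem`, `GRANTABLE`, `store_item`, and `may_trigger` are hypothetical names, and the role-to-scope policy is an invented example of "the storing agent must already be authorized to grant the authority it's claiming."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    content: str
    stored_by: str
    action_scope: frozenset[str]  # action classes this item may trigger


# Hypothetical policy: which action classes each role may grant at store time.
GRANTABLE = {
    "researcher": frozenset({"read"}),
    "reviewer": frozenset({"read", "execute"}),
}


def store_item(content: str, stored_by: str, requested_scope: set[str]) -> MemoryItem:
    """Reject the write if the storing role claims authority it cannot grant."""
    allowed = GRANTABLE.get(stored_by, frozenset())
    requested = frozenset(requested_scope)
    if not requested <= allowed:
        denied = sorted(requested - allowed)
        raise PermissionError(f"{stored_by} cannot grant: {denied}")
    return MemoryItem(content, stored_by, requested)


def may_trigger(item: MemoryItem, action_class: str) -> bool:
    """Orchestration-layer gate, checked after retrieval and before acting."""
    return action_class in item.action_scope


item = store_item("Rate limit: 100 req/s", "researcher", {"read"})
print(may_trigger(item, "read"))
print(may_trigger(item, "execute"))
```

The store-time check is where the trust question lands: the policy table itself has to come from somewhere outside the storing agent.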

What does your attribution trace approach use as ground truth for authorization at store time? Caller identity,
namespace position, or something external to the memory layer entirely?