DEV Community: Becomer.net

Open-source agent framework with persistent memory and zero token costs

Becomer.net — Tue, 02 Jun 2026 04:55:20 +0000


I Built a Memory API That Beats Mem0 on LongMemEval Without Using a Single LLM Token

#showdev #agents #api #llm



Becomer.net — Tue, 02 Jun 2026 04:19:02 +0000

The problem I kept hitting
Every time I built a multi-agent pipeline, I hit the same four walls.
Memory dies when the process ends. Agents can't share context without message passing. Every recall burns 500 to 7,000 tokens on an LLM reasoning pass. And memory is locked to one LLM provider: your GPT app and your Claude app can't share the same user context.
I spent months building something that fixes all four.

What I built
BECOMER is a persistent memory API for AI agents. Zero tokens per recall. Works with any LLM. Agents share memory across a namespace without coordination code.
The open-source client framework, becomer-agents, attaches to any existing agent stack without replacing it: LangChain, CrewAI, AutoGen, LlamaIndex, LangGraph. You don't rebuild anything; you just add memory to what you already have.
```bash
pip install becomer-agents
```
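"Attaches without replacing" usually means wrapping the call you already make: recall before it, store after it. The sketch below shows that idea as a plain Python decorator. It makes some assumptions: `with_memory`, `ListMemory`, and `my_chat` are hypothetical names, the memory is a trivial in-process list, and none of this is the becomer-agents API.

```python
from functools import wraps
from typing import Callable


class ListMemory:
    """Trivial stand-in memory (hypothetical): stores strings, recalls the latest ones."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def store(self, text: str) -> None:
        self.items.append(text)

    def recall(self, query: str, top_k: int = 5) -> list[str]:
        # Toy behaviour: ignores the query and returns the most recent items
        return self.items[-top_k:]


def with_memory(memory: ListMemory) -> Callable:
    """Wrap an existing chat function: inject recalled context, then store the exchange."""
    def decorator(chat: Callable[[str, list[str]], str]) -> Callable[[str], str]:
        @wraps(chat)
        def wrapper(user_msg: str) -> str:
            context = memory.recall(user_msg)
            reply = chat(user_msg, context)
            memory.store(f"user: {user_msg} | assistant: {reply}")
            return reply
        return wrapper
    return decorator


memory = ListMemory()


@with_memory(memory)
def my_chat(user_msg: str, context: list[str]) -> str:
    # Your existing agent/LLM call goes here; this stub just reports the context size
    return f"seen {len(context)} earlier exchanges"


print(my_chat("hi"))     # seen 0 earlier exchanges
print(my_chat("again"))  # seen 1 earlier exchanges
```

The wrapped function keeps its original body. Only the decorator knows about memory, which is the property a drop-in adapter is after.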

The benchmark result
94.4% on LongMemEval. Full n=500, no sampling, string-based scoring — no LLM judge inflating the numbers.
Mem0 scores 93.4% on the same benchmark. They spend approximately 6,787 tokens per query to get there. BECOMER spends zero.
| System | LongMemEval | Tokens/query |
|---|---|---|
| BECOMER | 94.4% | 0 |
| Mem0 v2 | 93.4% | ~6,787 |
| Standard RAG | ~75-80% | 0 |
On LOCOMO, BECOMER scores 69.5% (retrieval only). Mem0 scores 91.6%, but it runs an LLM reasoning pass to get there. The gap is architectural, not a retrieval failure: BECOMER retrieves, and your LLM reasons on top. That's by design.
Full methodology, 30 iterations, zero regressions: becomer.net/benchmarks.html
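The post says scoring is string-based rather than LLM-judged, but it doesn't spell out the exact rule; that is on the benchmarks page. Purely as an illustration, here is a minimal sketch of one common string-based approach. It assumes a prediction counts as correct when the normalized gold answer appears in the normalized prediction, where normalizing means lowercasing and removing punctuation and English articles. `normalize`, `is_correct`, and `accuracy` are hypothetical names, not BECOMER's scorer.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def is_correct(prediction: str, gold: str) -> bool:
    """Assumed rule: the normalized gold answer must appear in the normalized prediction."""
    return normalize(gold) in normalize(prediction)


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, gold) pairs scored correct; no LLM judge involved."""
    if not pairs:
        return 0.0
    return sum(is_correct(p, g) for p, g in pairs) / len(pairs)


results = [
    ("She moved to Vancouver in 2021.", "Vancouver"),
    ("The user prefers dark mode.", "dark mode"),
    ("I don't know.", "TypeScript"),
]
print(f"{accuracy(results):.1%}")  # 66.7%
```

A rule like this is deterministic and cheap, but it is also strict: a correct paraphrase that doesn't contain the gold string scores as wrong.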

How the architecture works
Most memory systems run an LLM under the hood. Every recall is actually an LLM call that extracts, reasons, and returns. That's why they burn thousands of tokens per query.
BECOMER is a purpose-built retrieval engine with no LLM in the memory layer. Storage and retrieval happen in our engine; your LLM receives the prepared context and reasons on top of it.
The result: zero token cost at retrieval, ~150 ms P50 latency, and memory that persists across sessions, LLMs, and processes.
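P50 is the median latency of recall calls. To check a figure like ~150 ms against your own deployment, one approach is to time repeated calls and take percentiles with the standard library. In this minimal sketch, `measure_latency_ms` and `fake_recall` are hypothetical; `fake_recall` stands in for your real recall call.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 50) -> dict[str, float]:
    """Time `call` repeatedly and report P50/P95 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 1st..99th percentile cut points; index 94 is P95
    percentiles = statistics.quantiles(samples, n=100)
    return {"p50": statistics.median(samples), "p95": percentiles[94]}


def fake_recall() -> list[str]:
    # Hypothetical stand-in for a real recall call; replace with your client
    time.sleep(0.005)
    return ["Alice prefers TypeScript and dark mode"]


stats = measure_latency_ms(fake_recall, runs=20)
print(f"P50 {stats['p50']:.1f} ms, P95 {stats['p95']:.1f} ms")
```

Measuring from your own network location matters, because a hosted API's P50 includes the round trip.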

The multi-agent namespace pattern
This is the part I'm most proud of.
Every agent gets a namespace keyed by task ID and role.
```
task-abc.researcher  ← private to researcher
task-abc.executor    ← private to executor
task-abc.shared      ← readable and writable by all agents
```
A researcher agent stores findings. An executor agent recalls them. No message passing. No coordination code. Different LLMs can hit the same namespace simultaneously.
```python
import os

from becomer_agents import MultiAgentPipeline


def researcher(task, own_ns, shared_ns):
    # Write findings to the shared namespace so other roles can recall them
    # (own_ns is private to the researcher and not visible to the executor)
    shared_ns.store("API endpoint: POST /v1/payments, OAuth2 bearer")
    shared_ns.store("Rate limit: 100 req/s")
    return "Research complete"


def executor(task, own_ns, shared_ns):
    # Recall what the researcher found: zero tokens, no message passing
    findings = shared_ns.recall("payment API details", top_k=5)
    return "Plan written"


pipeline = MultiAgentPipeline(
    api_key=os.environ["BECOMER_API_KEY"],
    task_id="payments-task-001",
    roles=["researcher", "executor", "reviewer"],
)
```
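To make the access rules concrete, here is a minimal in-memory sketch of the `task.role` / `task.shared` key scheme. It is not the becomer-agents implementation. `NamespaceStore` and its methods are hypothetical, and recall is a naive keyword-overlap filter standing in for real semantic retrieval.

```python
from collections import defaultdict


class NamespaceStore:
    """Hypothetical in-memory store keyed by '<task_id>.<role>' namespaces."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = defaultdict(list)

    @staticmethod
    def _can_access(caller: str, namespace: str) -> bool:
        task_id = caller.partition(".")[0]
        # A caller may use its own private namespace or its task's shared one
        return namespace in (caller, f"{task_id}.shared")

    def store(self, caller: str, namespace: str, text: str) -> None:
        if not self._can_access(caller, namespace):
            raise PermissionError(f"{caller} cannot write {namespace}")
        self._data[namespace].append(text)

    def recall(self, caller: str, namespace: str, query: str) -> list[str]:
        if not self._can_access(caller, namespace):
            raise PermissionError(f"{caller} cannot read {namespace}")
        terms = set(query.lower().split())
        # Naive keyword overlap as a stand-in for embedding similarity
        return [t for t in self._data[namespace] if terms & set(t.lower().split())]


store = NamespaceStore()
store.store("task-abc.researcher", "task-abc.shared", "Rate limit: 100 req/s")
print(store.recall("task-abc.executor", "task-abc.shared", "rate limit"))
# ['Rate limit: 100 req/s']
```

Trying to read `task-abc.researcher` as `task-abc.executor` raises `PermissionError`. This is why findings meant for other roles have to be written to the shared namespace.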

LangChain drop-in — 3 lines
```python
from becomer_agents import BecomerMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=BecomerMemory(api_key="your-key"),
)
```
Memory persists across sessions automatically. Works with CrewAI, AutoGen, LlamaIndex — same pattern.

Self-improving agents
```python
import os

from becomer_agents import SelfImprovingPipeline

pipeline = SelfImprovingPipeline(
    api_key=os.environ["BECOMER_API_KEY"],
    task_id="optimizer-001",
)


def my_agent(task, context):
    # context = what worked in previous iterations (empty on the first run)
    approach = "zero-shot" if not context else "few-shot + CoT"
    # run_eval is your own evaluation function; it should return a numeric score
    score = run_eval(task, approach)
    return {"approach": approach, "score": score}


for _ in range(5):
    result = pipeline.run_iteration(
        task="classify customer sentiment",
        fn=my_agent,
    )
```
Each iteration stores its outcome, and the next iteration recalls what scored highest. Over repeated runs, the pipeline builds up a record of what worked, and the memory layer adds no extra token cost.
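The store-outcome, recall-best loop can be sketched without any API. Here a plain list plays the role of the memory namespace. `OutcomeMemory`, `choose_approach`, and `toy_eval` are hypothetical, and the scores in `toy_eval` are toy values for illustration, not real evaluation results.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OutcomeMemory:
    """Hypothetical stand-in for a memory namespace that records attempts."""
    records: list[dict] = field(default_factory=list)

    def store(self, approach: str, score: float) -> None:
        self.records.append({"approach": approach, "score": score})

    def best(self) -> Optional[dict]:
        return max(self.records, key=lambda r: r["score"], default=None)


CANDIDATES = ["zero-shot", "few-shot", "few-shot + CoT"]


def choose_approach(memory: OutcomeMemory) -> str:
    # Try each candidate once, then keep reusing the best-scoring one
    tried = {r["approach"] for r in memory.records}
    for candidate in CANDIDATES:
        if candidate not in tried:
            return candidate
    return memory.best()["approach"]


def toy_eval(approach: str) -> float:
    # Toy scores for illustration only; replace with a real evaluation
    return {"zero-shot": 0.61, "few-shot": 0.70, "few-shot + CoT": 0.74}[approach]


memory = OutcomeMemory()
for _ in range(5):
    approach = choose_approach(memory)
    memory.store(approach, toy_eval(approach))

print([r["approach"] for r in memory.records])
# ['zero-shot', 'few-shot', 'few-shot + CoT', 'few-shot + CoT', 'few-shot + CoT']
```

This is a simple "explore each option once, then exploit" policy. A real optimizer might keep exploring occasionally, because eval scores are noisy.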

Multi-tenant — one key, many users
```python
import os

from becomer_agents import AgentNamespace

api_key = os.environ["BECOMER_API_KEY"]

# Alice's agent
alice = AgentNamespace(api_key, task_id="app", role="alice-123")
alice.store("Alice prefers TypeScript and dark mode")

# Bob's agent: completely isolated from Alice's
bob = AgentNamespace(api_key, task_id="app", role="bob-456")
bob.recall("preferences")  # → [] (can't see Alice's memories)
```
Isolation is enforced at the database level. One master key covers your entire user base.
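One way to picture per-user isolation in the storage layer is a single table where every row carries a namespace and every query filters on it. This sqlite3 sketch illustrates the general idea under my own assumptions; it is not BECOMER's schema. `ScopedMemory` is hypothetical, and here the filter lives in a wrapper class rather than in the database engine. A production system would more likely rely on an engine feature such as PostgreSQL row-level security.

```python
import sqlite3


class ScopedMemory:
    """Hypothetical per-tenant view over one shared memories table."""

    def __init__(self, conn: sqlite3.Connection, namespace: str) -> None:
        self.conn = conn
        self.namespace = namespace

    def store(self, text: str) -> None:
        self.conn.execute(
            "INSERT INTO memories (namespace, text) VALUES (?, ?)",
            (self.namespace, text),
        )

    def recall(self, keyword: str) -> list[str]:
        # Every read is filtered by namespace; LIKE stands in for real retrieval
        rows = self.conn.execute(
            "SELECT text FROM memories WHERE namespace = ? AND text LIKE ?",
            (self.namespace, f"%{keyword}%"),
        )
        return [text for (text,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (namespace TEXT NOT NULL, text TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_memories_ns ON memories (namespace)")

alice = ScopedMemory(conn, "app.alice-123")
bob = ScopedMemory(conn, "app.bob-456")
alice.store("Alice prefers TypeScript and dark mode")
print(alice.recall("TypeScript"))  # ['Alice prefers TypeScript and dark mode']
print(bob.recall("TypeScript"))    # []
```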

The honest part
LOCOMO multi-hop scores 59.6% for BECOMER vs 93.3% for Mem0. That gap is real. Multi-hop questions require bridging terms — "Vancouver" → "Canada" — that aren't in stored content. Mem0 runs an LLM reasoning pass to bridge that gap. BECOMER returns the retrieved context and your LLM bridges it. Different architecture, different trade-off.
If you need inference inside the memory layer — use Mem0. If you need zero token cost, any LLM, and shared agent memory — use BECOMER.
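A toy example shows why the Vancouver-to-Canada hop is hard for retrieval alone. With a naive keyword-overlap retriever (a hypothetical stand-in for real search), a question phrased around "Canada" shares no words with a memory about Vancouver. Adding a bridging step closes the gap. Here that step is a tiny hard-coded dictionary, `BRIDGES`, which is an assumption standing in for the world knowledge an LLM reasoning pass would supply.

```python
import string

# Hypothetical "world knowledge" that an LLM would normally supply
BRIDGES = {"canada": {"vancouver", "toronto", "montreal"}}


def terms(text: str) -> set[str]:
    return {w.strip(string.punctuation).lower() for w in text.split()}


def keyword_recall(memories: list[str], query_terms: set[str]) -> list[str]:
    # Naive retriever: keep memories sharing at least one term with the query
    return [m for m in memories if query_terms & terms(m)]


def expand(query_terms: set[str]) -> set[str]:
    # Bridging step: add related terms the stored text might use instead
    extra: set[str] = set()
    for t in query_terms:
        extra |= BRIDGES.get(t, set())
    return query_terms | extra


memories = ["Alice moved to Vancouver last spring"]
query = "Who lives in Canada?"

print(keyword_recall(memories, terms(query)))          # []
print(keyword_recall(memories, expand(terms(query))))  # ['Alice moved to Vancouver last spring']
```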

The backend-agnostic part
The framework works with BECOMER's retrieval engine out of the box, but it is backend-agnostic, so you can bring your own backend if you prefer. The namespace pattern, pipeline architecture, and multi-agent coordination logic are all useful independently of the API.
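Backend-agnostic means the coordination logic only needs something that can store and recall text. A minimal way to express that contract in Python is a `typing.Protocol`. `MemoryBackend`, `InMemoryBackend`, and `handoff` are hypothetical names used for illustration, not the becomer-agents interface.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """Assumed contract: anything with these two methods can back a namespace."""

    def store(self, namespace: str, text: str) -> None: ...

    def recall(self, namespace: str, query: str, top_k: int = 5) -> list[str]: ...


class InMemoryBackend:
    """Toy backend using substring match, newest first; satisfies MemoryBackend structurally."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str]] = []

    def store(self, namespace: str, text: str) -> None:
        self._rows.append((namespace, text))

    def recall(self, namespace: str, query: str, top_k: int = 5) -> list[str]:
        hits = [t for ns, t in reversed(self._rows)
                if ns == namespace and query.lower() in t.lower()]
        return hits[:top_k]


def handoff(backend: MemoryBackend, task_id: str) -> list[str]:
    # Coordination logic written once against the protocol, not a vendor SDK
    shared = f"{task_id}.shared"
    backend.store(shared, "Rate limit: 100 req/s")
    return backend.recall(shared, "rate limit")


print(handoff(InMemoryBackend(), "task-abc"))  # ['Rate limit: 100 req/s']
```

Because `Protocol` uses structural typing, a hosted client or a local vector store can slot in without inheriting from anything, as long as it exposes the same two methods.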

Getting started
```bash
pip install becomer-agents
export BECOMER_API_KEY=bcm_your-key-here
python examples/demo.py
```
Free API key — 1,000 calls/month: becomer.net/signup.html
GitHub: github.com/Becomer-net/Becomer-Agents
Full benchmark methodology: becomer.net/benchmarks.html

How I built a zero-token memory layer for LLMs (and why it outperforms vector store approaches)

Becomer.net — Mon, 01 Jun 2026 14:35:50 +0000

If you've built an AI chatbot or agent, you've hit the same problem: the LLM forgets everything between sessions. The standard solution is to stuff your conversation history into a vector store and retrieve relevant chunks before each call. It works — but it has a hidden cost.

The token problem nobody talks about

Popular memory solutions such as mem0, Zep, and LangChain's ConversationSummaryMemory run an LLM under the hood as part of the memory layer. That's anywhere from 500 to 7,000 tokens per recall call, on top of your actual LLM call.

For a chatbot with 1,000 daily active users doing 10 messages each, that's 10,000 recall calls × ~2,000 tokens = 20 million extra tokens per day. Before your LLM has said a single word.
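The arithmetic is easy to rerun with your own numbers. The defaults below are the post's example figures: 1,000 daily active users, 10 messages each, and ~2,000 tokens per recall. The monthly line assumes a 30-day month. `extra_recall_tokens` is a hypothetical helper.

```python
def extra_recall_tokens(daily_users: int, messages_per_user: int,
                        tokens_per_recall: int) -> int:
    """Extra tokens per day spent by an LLM-backed memory layer (one recall per message)."""
    recalls_per_day = daily_users * messages_per_user
    return recalls_per_day * tokens_per_recall


per_day = extra_recall_tokens(1_000, 10, 2_000)
print(f"{per_day:,} tokens/day")         # 20,000,000 tokens/day
print(f"{per_day * 30:,} tokens/month")  # 600,000,000 tokens/month
```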

The retrieval-only approach

I built BECOMER around a different idea: semantic retrieval using embeddings, no LLM inside the memory layer. Store → embed → index → retrieve. Your LLM receives the retrieved context and reasons over it — exactly what it's already doing.

```python
from becomer import Client

mem = Client("bcm_your-api-key")

# Before your LLM call
context = mem.recall("what does this user prefer?", top_k=5)

# Inject into your system prompt
context_block = "\n".join(context)
system_prompt = f"User context:\n{context_block}"

# After your LLM call
mem.store("User asked about Python decorators, found list comprehension more intuitive")
```
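The store, embed, index, retrieve flow can be sketched end to end in plain Python. This is not BECOMER's engine. The "embedding" is a toy hashed bag-of-words vector (a real system would use a learned embedding model), and the "index" is a brute-force list scanned with cosine similarity. `embed` and `TinyIndex` are hypothetical. The point is that recall is vector arithmetic, with no LLM call and therefore no LLM tokens.

```python
import math
import re
import zlib

DIM = 1024  # toy embedding size


def embed(text: str) -> list[float]:
    """Toy embedding: hash each lowercase word into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TinyIndex:
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def store(self, text: str) -> None:
        # store -> embed -> index
        self.items.append((text, embed(text)))

    def recall(self, query: str, top_k: int = 5) -> list[str]:
        # retrieve: cosine similarity (dot product of unit vectors), best first
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for t, v in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for s, t in scored[:top_k] if s > 0]


index = TinyIndex()
index.store("User prefers TypeScript and dark mode")
index.store("User asked about Python decorators")
index.store("Rate limit is 100 requests per second")
print(index.recall("what python topics did the user ask about?", top_k=1))
```

Brute-force scanning is fine for a few thousand memories. Beyond that, an approximate nearest-neighbour index is the usual replacement for the list.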

Benchmark results

Tested against LongMemEval (n=500), a widely used academic benchmark for long-term conversational memory:

| System | Score | Tokens/recall |
|---|---|---|
| BECOMER | 94.4% | 0 |
| mem0 | 93.4% | ~6,787 |
| Hindsight | 91.4% | ~6,787 |

The honest caveat: on LOCOMO, mem0 scores 91.6% vs our 69.5%, and the gap is even wider on the multi-hop reasoning questions (93.3% vs 59.6%). Their system adds an LLM reasoning pass over retrieved results. We return the context; your LLM reasons. For most agent use cases where you control the final LLM call, this gap disappears.

Multi-tenant in two lines

For developers building apps with multiple end-users, pass a user_id:

```python
# Each user gets a fully isolated namespace
mem_alice = Client("bcm_key", user_id="alice-123")
mem_alice.store("Alice prefers TypeScript and dark mode")

mem_bob = Client("bcm_key", user_id="bob-456")
mem_bob.recall("preferences")  # → [] (completely isolated)
```

Isolation is enforced at the database layer, not just application code. One master key covers your entire user base.

Agent use cases

The pattern that makes BECOMER useful beyond chatbots is shared namespaces for multi-agent systems:

```python
# Research agent (GPT-4o) stores findings
mem = Client("bcm_key", user_id="task-abc")
mem.store("API endpoint: POST /v2/payments, OAuth2")
mem.store("Rate limit: 100 req/min")

# Executor agent (Claude): different process, same namespace
ctx = Client("bcm_key", user_id="task-abc").recall("payment API details")
# → gets exactly what the research agent found
# No message passing. No state files. No coordination code.
```

Self-improving systems work the same way: store every attempt with its outcome, recall what worked before the next run.

What's available today

REST API
Python SDK: pip install becomer
JS/Node SDK: npm install @becomerpackage/sdk (zero deps, TypeScript types)
MCP: works with Claude Desktop and Cursor, set BECOMER_API_KEY and go
Framework adapters: LangChain, LlamaIndex, LangGraph, CrewAI, AutoGen

Free tier: 1,000 calls/month. Pro: $12/month.

https://becomer.net — full docs, benchmarks, and free API key.

I'm curious how others are handling the token cost problem for memory. What approaches have you found that work at scale?