Most agent memory systems are bags of strings. Markdown files, JSONL
journals, key-value stores with vector embeddings bolted on. They work
for demos. They break when you need to answer questions like "what did
the agent know last Tuesday?" or "why does it think I prefer tabs over
spaces?"
We wanted something better. After
surveying how five products handle persistent memory,
we designed OpenWalrus's memory around a single idea: everything is a
temporal knowledge graph.
No SOUL.md. No User.toml. No journal files. One embedded database.
Six tools. And a schema that grows with the agent's capabilities —
without touching framework code.
The case for graphs
Agent memory has structure. "User prefers async/await" isn't a string —
it's a relationship between a user entity and a coding pattern entity.
"The auth system uses JWT RS256" connects a system component to an
implementation decision. "User is on vacation until March 15" has a
temporal bound.
Flat files lose this structure. Vector databases lose it too — they can
find semantically similar text, but they can't traverse relationships
or answer temporal queries. The research is clear on where graphs win:
[Interactive chart — see original post]
The data comes from multiple sources: FalkorDB benchmarks for the
industry, healthcare, and schema-bound comparisons; Zep's LongMemEval
for temporal (+38.4% improvement); and a real-world migration case
study for multi-hop (43% → 91%).
The pattern is consistent: graph RAG dominates on multi-hop, temporal,
and relationship queries. Vector RAG is often sufficient for simple
single-hop factual lookup — but agent memory is rarely simple. Agents
need to traverse decisions, track changing preferences, and connect
entities across sessions.
The critical finding from
Zep's temporal knowledge graph paper:
bi-temporal tracking (separating when a fact was recorded from when
it was actually true) achieves 18.5% higher accuracy and ~90%
lower latency compared to vector-only retrieval on temporal reasoning
tasks.
The landscape of graph-based memory
Graph-based agent memory has gone from academic curiosity to funded
infrastructure in 2025-2026. Here's where the major systems stand:
[Interactive chart — see original post]
| System | Approach | Key metric | Scale |
|--------|----------|-----------|-------|
| Graphiti (Zep) | Temporal KG, Neo4j backend, bi-temporal | 94.8% DMR accuracy, 300ms P95 | 20K+ stars |
| Mem0 | Vector + Graph variants, hierarchical | 68.4% LOCOMO, 0.48s P95 | $24M raised, 41K stars |
| Cognee | KG triplets + vector (LanceDB), CoT retrieval | 92.5% human-like correctness | $7.5M seed, 70+ companies |
| Microsoft GraphRAG | Hierarchical community summaries | 72-83% comprehensiveness | 29.8K stars |
| LightRAG | Lightweight graph RAG | ~30% latency reduction | EMNLP 2025 |
Two recent papers push the field further:
- MAGMA (Jan 2026) maintains four orthogonal graph views (semantic, temporal, causal, entity) and achieves +45.5% reasoning accuracy with 95%+ token reduction.
- SimpleMem (Jan 2026) uses LanceDB with semantic lossless compression, achieving 43.24% LOCOMO F1 with only 531 tokens/query — vs Mem0's 34.20 F1 with 973 tokens.
The trend is unmistakable. Gartner named Knowledge Graphs a "Critical
Enabler" with immediate GenAI impact. The
ICLR 2026 MemAgents Workshop
is dedicated entirely to memory for LLM-based agentic systems. A
comprehensive survey from DEEP-PolyU
(Feb 2026) covers the full taxonomy of graph-based agent memory.
The important caveat
Graph RAG isn't universally better. A
systematic evaluation (Feb 2025)
found that "GraphRAG frequently underperforms vanilla RAG on many
real-world tasks." The advantage concentrates in temporal reasoning,
multi-hop inference, and relationship queries. For simple factual
lookup, vector RAG often wins on both speed and accuracy.
GraphRAG also averages 2.4x higher latency than vector RAG. And
the original Microsoft GraphRAG indexing cost ~$33K for large datasets —
though LazyGraphRAG
reduced this to 0.1% of that cost while winning all 96 comparisons
on 5,590 AP news articles.
This is why we chose a hybrid approach — graph traversal for
structural queries, vector similarity for semantic search, combined in
a single pipeline.
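This hybrid pattern can be sketched in plain Python. Everything below (the entity names, the 3-dimensional toy vectors, the `hybrid_query` helper) is invented for illustration; it is not the walrus pipeline, just the shape of it: filter candidates by a graph relation first, then rank the survivors by embedding similarity.

```python
import math

# Toy memory graph: entity nodes with tiny 3-d vectors, plus directed edges.
# All names, vectors, and the helper below are invented for illustration.
entities = {
    "user":        {"vec": [0.9, 0.1, 0.0]},
    "async_await": {"vec": [0.2, 0.9, 0.1]},
    "tabs":        {"vec": [0.1, 0.8, 0.3]},
    "jwt_rs256":   {"vec": [0.0, 0.2, 0.9]},
}
relations = [("user", "prefers", "async_await"), ("user", "prefers", "tabs")]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def hybrid_query(src, rel, query_vec):
    """Graph filter first (1-hop neighbours via rel), then vector rerank."""
    candidates = [dst for s, r, dst in relations if s == src and r == rel]
    return sorted(candidates,
                  key=lambda e: cosine(entities[e]["vec"], query_vec),
                  reverse=True)

# The structural filter keeps only `prefers` neighbours; similarity orders them.
print(hybrid_query("user", "prefers", [0.1, 0.7, 0.4]))
```

Entities that are semantically close but structurally unrelated (here, `jwt_rs256`) never enter the ranking, which is the whole point of filtering on the graph first.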
Why LanceDB + lance-graph
Every graph memory system in the landscape above requires a separate
server — Neo4j for Graphiti, various backends for Mem0, cloud services
for Microsoft GraphRAG. That's a non-starter for a single-binary
runtime.
We needed a graph database that:
- Embeds in-process (no separate server — walrus is a single binary)
- Is Rust-native (walrus is written in Rust)
- Supports both vector search and graph traversal
- Provides versioning at the storage layer
LanceDB +
lance-graph checks every
box.
[Interactive chart — see original post]
LanceDB: battle-tested at scale
LanceDB isn't a toy. It's backed by $41M in funding (Series A led
by Theory Ventures, June 2025) and used in production by Midjourney,
Netflix, Uber, ByteDance, and Character.AI — handling billion-scale
vector search.
| Metric | Value |
|---|---|
| GitHub stars | 15.2K combined (lancedb + lance format) |
| PyPI downloads | ~2.6M/month |
| Monthly contributors | 40+, from Uber, Netflix, Hugging Face, ByteDance |
| Production users | Midjourney, Netflix, Uber, Character.AI, Harvey |
| Vector search latency | 25ms typical, 3ms at 0.9 recall |
| File read throughput | 6-9x faster than Parquet |
| Storage IOPS | 1.5M peak on NVMe |
The Lance columnar format
(SDK 1.0 since Dec 2025) provides the foundation: ACID transactions,
zero-copy schema evolution, and immutable versioning at the storage
layer. You can query "what did the agent know yesterday?" without
journal files — Lance versioning handles it natively.
LanceDB is the default vector database for
AnythingLLM,
powers local semantic search in
Continue.dev
(40% improvement in auto-completion relevance), and is the default
vector store for
Microsoft GraphRAG.
lance-graph: graph queries over columnar data
lance-graph adds a
Cypher query engine on top — graph nodes and edges stored as Lance
tables, with hybrid GraphRAG queries built in.
| Capability | Details |
|---|---|
| Query language | Cypher (read-only subset with MATCH, WHERE, WITH, aggregation) |
| Vector search | Native ANN, L2/Cosine/Dot metrics |
| GraphRAG | `execute_with_vector_rerank()` — graph filter then vector rank |
| Basic filter latency | ~680µs (100 items) to ~743µs (1M items) |
| Single-hop expand | ~3.70ms (1M nodes) |
| Two-hop expand | ~6.16ms (1M nodes) |
| Language | Rust 83.7%, Python bindings via PyO3 |
The sub-linear latency growth — 680µs to 743µs for a 10,000x data
increase on basic filters — reflects DataFusion's columnar batch
processing and predicate pushdown.
Honest assessment of maturity
lance-graph is young. v0.5.3, ~128 GitHub stars, still an
incubating subproject under the Lance
governance model. APIs may change without notice. No confirmed
production deployments of lance-graph specifically (though LanceDB
itself is heavily battle-tested).
The query engine is read-only — no CREATE, DELETE, or MERGE in
Cypher. Writes go through the LanceGraphStore Python/Rust API. For
an agent runtime that handles writes through its own tools (not Cypher),
this constraint doesn't matter. For other use cases, it might.
We're betting on a young project with a strong parent ecosystem. The
risk is real, but the alternative — requiring users to run a Neo4j
server alongside walrus — contradicts our
single-binary philosophy.
Comparison with alternatives
| | LanceDB + lance-graph | SQLite + sqlite-vec | Neo4j + Graphiti |
|---|---|---|---|
| Deployment | Embedded, in-process | Embedded, in-process | Separate server |
| Graph queries | Cypher (read-only) | Manual SQL joins | Full Cypher |
| Vector search | Native ANN | sqlite-vec extension | Plugin |
| Temporal tracking | Lance versioning | Manual | Built-in |
| Rust native | Yes | Via bindings | No |
| Fits single binary | Yes | Yes | No |
| Maturity | Early (v0.5.3, $41M ecosystem) | Very mature | Very mature (13K+ stars) |
SQLite + sqlite-vec is the pragmatic alternative — mature, embedded,
battle-tested. But graph queries require manual SQL joins, and there's
no native hybrid GraphRAG pattern. For traversing relationships and
searching semantically in a single query, the graph-native approach
wins.
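To make the "manual SQL joins" point concrete, here is a toy sqlite3 session. The schema and data are made up for illustration; the takeaway is that every extra hop costs another self-join, where Cypher expresses the whole path in a single MATCH pattern.

```python
import sqlite3

# A toy entities/relations schema, illustrative only, not walrus's actual DDL.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entities  (id TEXT PRIMARY KEY, type TEXT, value TEXT);
    CREATE TABLE relations (src TEXT, rel TEXT, dst TEXT);
""")
db.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("user",      "profile", "the user"),
    ("decision1", "fact",    "auth uses JWT RS256"),
    ("reason1",   "fact",    "RS256 allows key rotation"),
])
db.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
    ("user",      "decided", "decision1"),
    ("decision1", "because", "reason1"),
])

# A 2-hop traversal (user -> decision -> reason) needs one self-join per hop;
# in Cypher the same path is one MATCH (u)-[]->()-[]->(r) pattern.
rows = db.execute("""
    SELECT e.value
    FROM relations r1
    JOIN relations r2 ON r2.src = r1.dst
    JOIN entities  e  ON e.id = r2.dst
    WHERE r1.src = 'user'
""").fetchall()
print(rows)
```

Three hops means three joins, and variable-length paths need recursive CTEs; that is the ergonomic gap the graph-native approach closes.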
Neo4j with Graphiti is the most
capable graph memory system available — 94.8% DMR accuracy, full Cypher,
production-proven. But it requires a separate server — a non-starter
for a single-binary runtime.
Everything is the graph
Agent identity, user preferences, conversation history, extracted facts,
relationships between entities — all stored in two tables in one LanceDB
database: entities and relations. Compacted conversation summaries
live in a third table: journals.
The framework ships seven built-in entity types:
- `identity` — the agent's identity and personality. Queried at session start and injected into the system prompt. This is what SOUL.md used to be.
- `profile` — the user's profile. Also injected at session start. This replaces User.toml.
- `preference` — user preferences. Linked to `profile` via relations.
- `fact` — general facts the agent has learned.
- `person`, `event`, `concept` — structured entity types for people, events, and abstract concepts the agent encounters.
Entity types are configurable — you can add domain-specific types in
walrus.toml without touching framework code.
Six tools, no queries
The agent interacts with memory through six tools. It never writes Cypher
or SQL — the framework handles storage and retrieval internally.
| Tool | What it does |
|---|---|
| `remember` | Store a typed entity (type, key, value). Upserts with FTS indexing. |
| `recall` | Full-text search across entities, optionally filtered by type. Returns top-K matches. |
| `relate` | Create a directed edge between two existing entities. |
| `connections` | Traverse the graph from a given entity — 1-hop via lance-graph Cypher, optionally filtered by relation or direction. |
| `compact` | Trigger compaction: summarize the conversation, embed the summary, store as a journal entry. |
| `distill` | Semantic search over past journal summaries. Find relevant context from previous sessions. |
Why not expose raw Cypher? Because
Text2Cypher is unreliable even
with frontier LLMs — they hallucinate syntax and miss schema constraints.
The framework knows the schema because it created it.
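As a rough sketch of the tool surface, here is an in-memory toy version of four of the six tools. Substring matching stands in for real FTS, and a dict stands in for LanceDB; only the shape of the API mirrors the table above.

```python
# In-memory toy of remember / recall / relate / connections.
# Not the walrus implementation; substring match stands in for FTS.
entities = {}   # (type, key) -> value
relations = []  # (src_key, relation, dst_key)

def remember(etype, key, value):
    entities[(etype, key)] = value          # upsert keyed by (type, key)

def recall(query, etype=None, top_k=5):
    hits = [(t, k, v) for (t, k), v in entities.items()
            if (etype is None or t == etype)
            and (query in k or query in v)]
    return hits[:top_k]

def relate(src, relation, dst):
    relations.append((src, relation, dst))

def connections(key, relation=None):
    """1-hop traversal from key, optionally filtered by relation type."""
    return [(r, d) for s, r, d in relations
            if s == key and (relation is None or r == relation)]

remember("preference", "indentation", "prefers tabs over spaces")
remember("profile", "user", "works on the auth system")
relate("user", "has_preference", "indentation")
print(recall("tabs"))
print(connections("user"))
```

The agent only ever sees these four verbs (plus `compact` and `distill`); the query language underneath is an implementation detail it cannot get wrong.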
Two retrieval paths
The six tools reflect two distinct retrieval mechanisms:
**Graph + FTS path** (`recall`, `connections`): Entity nodes are stored
with full-text search indexing. `recall` runs FTS on key and value
fields, agent-scoped. `connections` runs a 1-hop Cypher traversal via
lance-graph. These are fast (sub-millisecond) and exact.
**Vector semantic path** (`distill`): Journal entries — compaction
summaries — are embedded with all-MiniLM-L6-v2 (384 dimensions via
fastembed). `distill` runs ANN search over these embeddings to find
semantically similar past context.
This means different queries hit different stores: `recall` and
`connections` hit the entity and relation tables, while `distill` hits
the journal embeddings.
At session start, walrus injects identity entities, profile entities, and
the three most recent journal summaries into the system prompt — giving the
agent its identity and continuity context before the first turn.
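The injection step can be sketched as a filter plus a sort. Field names and values here are illustrative, not the actual walrus record layout:

```python
from datetime import datetime, timedelta

# Sketch of session-start context injection: identity and profile entities
# plus the three most recent journal summaries. All records are made up.
now = datetime(2026, 3, 1)
entities = [
    {"type": "identity", "value": "You are Walrus, a concise coding agent."},
    {"type": "profile",  "value": "User: prefers Rust, works on auth."},
    {"type": "fact",     "value": "auth uses JWT RS256"},  # not injected
]
journals = [
    {"summary": f"session {i} summary", "created_at": now - timedelta(days=i)}
    for i in range(5)
]

def session_context():
    # Only identity and profile entities are injected, never plain facts.
    injected = [e["value"] for e in entities
                if e["type"] in ("identity", "profile")]
    recent = sorted(journals, key=lambda j: j["created_at"], reverse=True)[:3]
    return injected + [j["summary"] for j in recent]

print(session_context())
```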
Temporal tracking
Every entity carries `created_at` and `updated_at`. Relations carry
`created_at`. Journal entries carry `created_at` and their 384-dim embedding.
This is enough to answer "what did the agent learn this session?" or "when
was this fact recorded?" — without separate journal files.
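With creation timestamps on every record, "what did the agent learn this session?" reduces to a single filter. A toy sketch with made-up records:

```python
from datetime import datetime

# created_at on every entity makes "learned this session" a timestamp filter.
# Records below are invented for illustration.
session_start = datetime(2026, 3, 1, 9, 0)
entities = [
    {"key": "indentation", "created_at": datetime(2026, 2, 20, 14, 0)},
    {"key": "vacation",    "created_at": datetime(2026, 3, 1, 9, 30)},
]

learned_this_session = [e["key"] for e in entities
                        if e["created_at"] >= session_start]
print(learned_this_session)  # only entities recorded after session start
```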
Full bi-temporal tracking (`valid_from`, `valid_until`) — the
Zep approach that achieves +38.4%
on temporal reasoning — is on the roadmap. The current implementation
tracks creation time; expiration is a future addition.
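The roadmap feature can be previewed with plain records. This sketch assumes a `valid_from`/`valid_until` pair alongside the recording timestamp, which is the Zep-style bi-temporal split described above; none of these field names are final:

```python
from datetime import date

# Bi-temporal sketch: recorded_at is when the agent stored the fact,
# valid_from/valid_until bound when the fact was actually true.
facts = [
    {"value": "user is on vacation",
     "recorded_at": date(2026, 3, 1),
     "valid_from": date(2026, 3, 1), "valid_until": date(2026, 3, 15)},
    {"value": "user is at work",
     "recorded_at": date(2026, 3, 16),
     "valid_from": date(2026, 3, 16), "valid_until": None},
]

def true_at(when):
    """Facts valid at `when`, regardless of when they were recorded."""
    return [f["value"] for f in facts
            if f["valid_from"] <= when
            and (f["valid_until"] is None or when < f["valid_until"])]

print(true_at(date(2026, 3, 10)))  # during the vacation window
print(true_at(date(2026, 3, 20)))  # after it expired
```

Separating the two timelines is what lets the system answer both "what was true last Tuesday?" and "what did the agent believe last Tuesday?", which creation time alone cannot distinguish.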
Extending the schema
The entity and relation types are open and configurable. The defaults give
you a working memory system. Your walrus.toml extends it without touching
framework code:
```toml
[memory]
entities = ["ticket", "sprint", "arch_decision"]
relations = ["blocks", "supersedes", "owned_by"]
```
These merge with the built-in types — the framework never loses the default
`fact`, `preference`, `identity`, and `profile` types. Extensions only add.
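The merge rule is simple enough to state as code. A hedged sketch, assuming the seven built-in types listed earlier:

```python
# Sketch of the merge rule: configured types extend the built-ins,
# they never replace them. Names mirror the walrus.toml example above.
BUILTIN_ENTITIES = {"identity", "profile", "preference", "fact",
                    "person", "event", "concept"}

def merged_entity_types(configured):
    # Set union: extensions only add, defaults always survive.
    return BUILTIN_ENTITIES | set(configured)

types = merged_entity_types(["ticket", "sprint", "arch_decision"])
assert BUILTIN_ENTITIES <= types
print(sorted(types))
```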
This follows the same pattern as everything else in walrus (see
less code, more skills): the framework ships
a working default, skills and configuration extend it, MCP servers add
entity types at runtime.
Compaction as memory formation
When the context window fills up, the agent calls compact. The runtime
summarizes the full conversation history with the LLM, embeds the summary
with all-MiniLM-L6-v2, stores it as a journal entry, and replaces the
conversation history with just that summary. The agent continues with a
clean context window and the summary as continuity.
This turns compaction from a lossy operation into a memory-formation event.
Past journal entries are searchable via distill in future sessions —
semantic search over the embedding finds relevant past context even when the
agent doesn't remember the exact session it came from.
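The compaction flow can be sketched end to end, with stubs standing in for the LLM summarizer and the all-MiniLM-L6-v2 embedder so the control flow is runnable:

```python
# Compaction as memory formation, sketched with stand-ins: the real runtime
# calls an LLM to summarize and a 384-dim embedder; here both are stubs.
journals = []  # persisted across sessions

def summarize(history):            # stand-in for the LLM summarization call
    return f"summary of {len(history)} turns"

def embed(text):                   # stand-in for the sentence embedding
    return [float(ord(c)) for c in text[:8]]

def compact(history):
    summary = summarize(history)
    journals.append({"summary": summary, "embedding": embed(summary)})
    # The conversation is replaced by just the summary: clean context window.
    return [{"role": "system", "content": summary}]

history = [{"role": "user", "content": "..."}] * 40
history = compact(history)
print(len(history), journals[0]["summary"])
```

The key design point is the side effect: the summary is not only re-injected into the live context, it is also embedded and persisted, so `distill` can find it in any later session.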
Mem0's research shows that smart
compaction improves reasoning: 91% lower P95 latency and 90%+
token reduction compared to full-context approaches, while achieving
a 26% relative improvement in LLM-as-a-Judge scores.
What you lose, what you gain
This design has a real tradeoff: you can't `cat SOUL.md` anymore.
What you lose:
- Human-readable files you can open in a text editor
- Git-diffable memory changes
- The simplicity of `echo "prefer tabs" >> CLAUDE.md`
What you gain:
- Queryable structure — `recall` and `connections` over a typed entity graph
- Temporal tracking — `created_at` on everything, full bi-temporal expiration on the roadmap
- Relationship-aware traversal — `connections` follows edges from decisions to reasons to related entities
- Semantic history — `distill` finds relevant past journal summaries across sessions
- Versioning via Lance — the storage layer tracks history at the columnar level
Every product in our
memory survey that gained
developer trust stores memory in human-readable formats. We're breaking
that pattern. The bet is that queryable, structured memory is worth more
than `cat`-ability — and that the `recall`, `connections`, and `distill`
tools give the agent (and eventually the user) direct inspection paths.
References
- Zep temporal knowledge graph — bi-temporal tracking, LongMemEval benchmarks
- Mem0 production memory — LOCOMO benchmarks, graph vs vector comparison
- RAG vs GraphRAG: systematic evaluation — when graphs help and when they don't
- MAGMA: multi-graph agentic memory — four-view graph architecture
- SimpleMem: efficient memory with LanceDB — lossless compression, token efficiency
- Graph-based agent memory survey — comprehensive taxonomy (DEEP-PolyU)
- lance-graph on GitHub — the graph engine we're building on
- LanceDB docs — the vector database layer
- How AI agents remember — our survey of five products
- Less code, more skills — the design principle behind the three-layer extension model
Originally published at OpenWalrus.