Most agent memory systems are bags of strings. Markdown files, JSONL
journals, key-value stores with vector embeddings bolted on. They work
for demos. They break when you need to answer questions like "what did
the agent know last Tuesday?" or "why does it think I prefer tabs over
spaces?"
We wanted something better. After
surveying how five products handle persistent memory,
we designed OpenWalrus's memory around a single idea: everything is a
temporal knowledge graph.
No SOUL.md. No User.toml. No journal files. One embedded database.
Six tools. And a schema that grows with the agent's capabilities —
without touching framework code.
The case for graphs
Agent memory has structure. "User prefers async/await" isn't a string —
it's a relationship between a user entity and a coding pattern entity.
"The auth system uses JWT RS256" connects a system component to an
implementation decision. "User is on vacation until March 15" has a
temporal bound.
Flat files lose this structure. Vector databases lose it too — they can
find semantically similar text, but they can't traverse relationships
or answer temporal queries. The research is clear on where graphs win:
[Interactive chart — see original post]
The data comes from multiple sources: FalkorDB benchmarks for the
industry, healthcare, and schema-bound comparisons; Zep's LongMemEval
for temporal (+38.4% improvement); and a real-world migration case
study for multi-hop (43% → 91%).
The pattern is consistent: graph RAG dominates on multi-hop, temporal,
and relationship queries. Vector RAG is often sufficient for simple
single-hop factual lookup — but agent memory is rarely simple. Agents
need to traverse decisions, track changing preferences, and connect
entities across sessions.
The critical finding from
Zep's temporal knowledge graph paper:
bi-temporal tracking (separating when a fact was recorded from when
it was actually true) achieves 18.5% higher accuracy and ~90%
lower latency compared to vector-only retrieval on temporal reasoning
tasks.
The landscape of graph-based memory
Graph-based agent memory has gone from academic curiosity to funded
infrastructure in 2025-2026. Here's where the major systems stand:
[Interactive chart — see original post]
| System | Approach | Key metric | Scale |
|--------|----------|-----------|-------|
| Graphiti (Zep) | Temporal KG, Neo4j backend, bi-temporal | 94.8% DMR accuracy, 300ms P95 | 20K+ stars |
| Mem0 | Vector + Graph variants, hierarchical | 68.4% LOCOMO, 0.48s P95 | $24M raised, 41K stars |
| Cognee | KG triplets + vector (LanceDB), CoT retrieval | 92.5% human-like correctness | $7.5M seed, 70+ companies |
| Microsoft GraphRAG | Hierarchical community summaries | 72-83% comprehensiveness | 29.8K stars |
| LightRAG | Lightweight graph RAG | ~30% latency reduction | EMNLP 2025 |
Two recent papers push the field further:
- MAGMA (Jan 2026) maintains four orthogonal graph views (semantic, temporal, causal, entity) and achieves +45.5% reasoning accuracy with 95%+ token reduction.
- SimpleMem (Jan 2026) uses LanceDB with semantic lossless compression, achieving 43.24% LOCOMO F1 with only 531 tokens/query — vs Mem0's 34.20 F1 with 973 tokens.
The trend is unmistakable. Gartner named Knowledge Graphs a "Critical
Enabler" with immediate GenAI impact. The
ICLR 2026 MemAgents Workshop
is dedicated entirely to memory for LLM-based agentic systems. A
comprehensive survey from DEEP-PolyU
(Feb 2026) covers the full taxonomy of graph-based agent memory.
The important caveat
Graph RAG isn't universally better. A
systematic evaluation (Feb 2025)
found that "GraphRAG frequently underperforms vanilla RAG on many
real-world tasks." The advantage concentrates in temporal reasoning,
multi-hop inference, and relationship queries. For simple factual
lookup, vector RAG often wins on both speed and accuracy.
GraphRAG also averages 2.4x higher latency than vector RAG. And
the original Microsoft GraphRAG indexing cost ~$33K for large datasets —
though LazyGraphRAG
reduced this to 0.1% of that cost while winning all 96 comparisons
on 5,590 AP news articles.
This is why we chose a hybrid approach — graph traversal for
structural queries, vector similarity for semantic search, combined in
a single pipeline.
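This hybrid pattern can be sketched in plain Python. Everything below (the entity names, the 3-dimensional toy vectors, the `hybrid_query` helper) is invented for illustration; it is not the walrus pipeline, just the shape of it: filter candidates by a graph relation first, then rank the survivors by embedding similarity.

```python
import math

# Toy memory graph: entity nodes with tiny 3-d vectors, plus directed edges.
# All names, vectors, and the helper below are invented for illustration.
entities = {
    "user":        {"vec": [0.9, 0.1, 0.0]},
    "async_await": {"vec": [0.2, 0.9, 0.1]},
    "tabs":        {"vec": [0.1, 0.8, 0.3]},
    "jwt_rs256":   {"vec": [0.0, 0.2, 0.9]},
}
relations = [("user", "prefers", "async_await"), ("user", "prefers", "tabs")]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def hybrid_query(src, rel, query_vec):
    """Graph filter first (1-hop neighbours via rel), then vector rerank."""
    candidates = [dst for s, r, dst in relations if s == src and r == rel]
    return sorted(candidates,
                  key=lambda e: cosine(entities[e]["vec"], query_vec),
                  reverse=True)

# The structural filter keeps only `prefers` neighbours; similarity orders them.
print(hybrid_query("user", "prefers", [0.1, 0.7, 0.4]))
```

Entities that are semantically close but structurally unrelated (here, `jwt_rs256`) never enter the ranking, which is the whole point of filtering on the graph first.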
Why LanceDB + lance-graph
Every graph memory system in the landscape above requires a separate
server — Neo4j for Graphiti, various backends for Mem0, cloud services
for Microsoft GraphRAG. That's a non-starter for a single-binary
runtime.
We needed a graph database that:
- Embeds in-process (no separate server — walrus is a single binary)
- Is Rust-native (walrus is written in Rust)
- Supports both vector search and graph traversal
- Provides versioning at the storage layer
LanceDB +
lance-graph checks every
box.
[Interactive chart — see original post]
LanceDB: battle-tested at scale
LanceDB isn't a toy. It's backed by $41M in funding (Series A led
by Theory Ventures, June 2025) and used in production by Midjourney,
Netflix, Uber, ByteDance, and Character.AI — handling billion-scale
vector search.
| Metric | Value |
|---|---|
| GitHub stars | 15.2K combined (lancedb + lance format) |
| PyPI downloads | ~2.6M/month |
| Monthly contributors | 40+, from Uber, Netflix, Hugging Face, ByteDance |
| Production users | Midjourney, Netflix, Uber, Character.AI, Harvey |
| Vector search latency | 25ms typical, 3ms at 0.9 recall |
| File read throughput | 6-9x faster than Parquet |
| Storage IOPS | 1.5M peak on NVMe |
The Lance columnar format
(SDK 1.0 since Dec 2025) provides the foundation: ACID transactions,
zero-copy schema evolution, and immutable versioning at the storage
layer. You can query "what did the agent know yesterday?" without
journal files — Lance versioning handles it natively.
LanceDB is the default vector database for
AnythingLLM,
powers local semantic search in
Continue.dev
(40% improvement in auto-completion relevance), and is the default
vector store for
Microsoft GraphRAG.
lance-graph: graph queries over columnar data
lance-graph adds a
Cypher query engine on top — graph nodes and edges stored as Lance
tables, with hybrid GraphRAG queries built in.
| Capability | Details |
|---|---|
| Query language | Cypher (read-only subset with MATCH, WHERE, WITH, aggregation) |
| Vector search | Native ANN, L2/Cosine/Dot metrics |
| GraphRAG | `execute_with_vector_rerank()` — graph filter then vector rank |
| Basic filter latency | ~680µs (100 items) to ~743µs (1M items) |
| Single-hop expand | ~3.70ms (1M nodes) |
| Two-hop expand | ~6.16ms (1M nodes) |
| Language | Rust 83.7%, Python bindings via PyO3 |
The sub-linear latency growth — 680µs to 743µs for a 10,000x data
increase on basic filters — reflects DataFusion's columnar batch
processing and predicate pushdown.
Honest assessment of maturity
lance-graph is young. v0.5.3, ~128 GitHub stars, still an
incubating subproject under the Lance
governance model. APIs may change without notice. No confirmed
production deployments of lance-graph specifically (though LanceDB
itself is heavily battle-tested).
The query engine is read-only — no CREATE, DELETE, or MERGE in
Cypher. Writes go through the LanceGraphStore Python/Rust API. For
an agent runtime that handles writes through its own tools (not Cypher),
this constraint doesn't matter. For other use cases, it might.
We're betting on a young project with a strong parent ecosystem. The
risk is real, but the alternative — requiring users to run a Neo4j
server alongside walrus — contradicts our
single-binary philosophy.
Comparison with alternatives
| | LanceDB + lance-graph | SQLite + sqlite-vec | Neo4j + Graphiti |
|---|---|---|---|
| Deployment | Embedded, in-process | Embedded, in-process | Separate server |
| Graph queries | Cypher (read-only) | Manual SQL joins | Full Cypher |
| Vector search | Native ANN | sqlite-vec extension | Plugin |
| Temporal tracking | Lance versioning | Manual | Built-in |
| Rust native | Yes | Via bindings | No |
| Fits single binary | Yes | Yes | No |
| Maturity | Early (v0.5.3, $41M ecosystem) | Very mature | Very mature (13K+ stars) |
SQLite + sqlite-vec is the pragmatic alternative — mature, embedded,
battle-tested. But graph queries require manual SQL joins, and there's
no native hybrid GraphRAG pattern. For traversing relationships and
searching semantically in a single query, the graph-native approach
wins.
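To make the "manual SQL joins" point concrete, here is a toy sqlite3 session. The schema and data are made up for illustration; the takeaway is that every extra hop costs another self-join, where Cypher expresses the whole path in a single MATCH pattern.

```python
import sqlite3

# A toy entities/relations schema, illustrative only, not walrus's actual DDL.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entities  (id TEXT PRIMARY KEY, type TEXT, value TEXT);
    CREATE TABLE relations (src TEXT, rel TEXT, dst TEXT);
""")
db.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("user",      "profile", "the user"),
    ("decision1", "fact",    "auth uses JWT RS256"),
    ("reason1",   "fact",    "RS256 allows key rotation"),
])
db.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
    ("user",      "decided", "decision1"),
    ("decision1", "because", "reason1"),
])

# A 2-hop traversal (user -> decision -> reason) needs one self-join per hop;
# in Cypher the same path is one MATCH (u)-[]->()-[]->(r) pattern.
rows = db.execute("""
    SELECT e.value
    FROM relations r1
    JOIN relations r2 ON r2.src = r1.dst
    JOIN entities  e  ON e.id = r2.dst
    WHERE r1.src = 'user'
""").fetchall()
print(rows)
```

Three hops means three joins, and variable-length paths need recursive CTEs; that is the ergonomic gap the graph-native approach closes.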
Neo4j with Graphiti is the most
capable graph memory system available — 94.8% DMR accuracy, full Cypher,
production-proven. But it requires a separate server — a non-starter
for a single-binary runtime.
Everything is the graph
Agent identity, user preferences, conversation history, extracted facts,
relationships between entities — all stored in two tables in one LanceDB
database: entities and relations. Compacted conversation summaries
live in a third table: journals.
The framework ships seven built-in entity types:
- `identity` — the agent's identity and personality. Queried at session start and injected into the system prompt. This is what SOUL.md used to be.
- `profile` — the user's profile. Also injected at session start. This replaces User.toml.
- `preference` — user preferences. Linked to `profile` via relations.
- `fact` — general facts the agent has learned.
- `person`, `event`, `concept` — structured entity types for people, events, and abstract concepts the agent encounters.
Entity types are configurable — you can add domain-specific types in
walrus.toml without touching framework code.
Six tools, no queries
The agent interacts with memory through six tools. It never writes Cypher
or SQL — the framework handles storage and retrieval internally.
| Tool | What it does |
|---|---|
| `remember` | Store a typed entity (type, key, value). Upserts with FTS indexing. |
| `recall` | Full-text search across entities, optionally filtered by type. Returns top-K matches. |
| `relate` | Create a directed edge between two existing entities. |
| `connections` | Traverse the graph from a given entity — 1-hop via lance-graph Cypher, optionally filtered by relation or direction. |
| `compact` | Trigger compaction: summarize the conversation, embed the summary, store as a journal entry. |
| `distill` | Semantic search over past journal summaries. Find relevant context from previous sessions. |
Why not expose raw Cypher? Because
Text2Cypher is unreliable even
with frontier LLMs — they hallucinate syntax and miss schema constraints.
The framework knows the schema because it created it.
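As a rough sketch of the tool surface, here is an in-memory toy version of four of the six tools. Substring matching stands in for real FTS, and a dict stands in for LanceDB; only the shape of the API mirrors the table above.

```python
# In-memory toy of remember / recall / relate / connections.
# Not the walrus implementation; substring match stands in for FTS.
entities = {}   # (type, key) -> value
relations = []  # (src_key, relation, dst_key)

def remember(etype, key, value):
    entities[(etype, key)] = value          # upsert keyed by (type, key)

def recall(query, etype=None, top_k=5):
    hits = [(t, k, v) for (t, k), v in entities.items()
            if (etype is None or t == etype)
            and (query in k or query in v)]
    return hits[:top_k]

def relate(src, relation, dst):
    relations.append((src, relation, dst))

def connections(key, relation=None):
    """1-hop traversal from key, optionally filtered by relation type."""
    return [(r, d) for s, r, d in relations
            if s == key and (relation is None or r == relation)]

remember("preference", "indentation", "prefers tabs over spaces")
remember("profile", "user", "works on the auth system")
relate("user", "has_preference", "indentation")
print(recall("tabs"))
print(connections("user"))
```

The agent only ever sees these four verbs (plus `compact` and `distill`); the query language underneath is an implementation detail it cannot get wrong.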
Two retrieval paths
The six tools reflect two distinct retrieval mechanisms:
**Graph + FTS path** (`recall`, `connections`): Entity nodes are stored
with full-text search indexing. `recall` runs FTS on key and value
fields, agent-scoped. `connections` runs a 1-hop Cypher traversal via
lance-graph. These are fast (sub-millisecond) and exact.
**Vector semantic path** (`distill`): Journal entries — compaction
summaries — are embedded with all-MiniLM-L6-v2 (384 dimensions via
fastembed). `distill` runs ANN search over these embeddings to find
semantically similar past context.
This means different queries hit different stores: `recall` and
`connections` hit the entity and relation tables, while `distill` hits
the journal embeddings.
At session start, walrus injects identity entities, profile entities, and
the three most recent journal summaries into the system prompt — giving the
agent its identity and continuity context before the first turn.
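The injection step can be sketched as a filter plus a sort. Field names and values here are illustrative, not the actual walrus record layout:

```python
from datetime import datetime, timedelta

# Sketch of session-start context injection: identity and profile entities
# plus the three most recent journal summaries. All records are made up.
now = datetime(2026, 3, 1)
entities = [
    {"type": "identity", "value": "You are Walrus, a concise coding agent."},
    {"type": "profile",  "value": "User: prefers Rust, works on auth."},
    {"type": "fact",     "value": "auth uses JWT RS256"},  # not injected
]
journals = [
    {"summary": f"session {i} summary", "created_at": now - timedelta(days=i)}
    for i in range(5)
]

def session_context():
    # Only identity and profile entities are injected, never plain facts.
    injected = [e["value"] for e in entities
                if e["type"] in ("identity", "profile")]
    recent = sorted(journals, key=lambda j: j["created_at"], reverse=True)[:3]
    return injected + [j["summary"] for j in recent]

print(session_context())
```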
Temporal tracking
Every entity carries `created_at` and `updated_at`. Relations carry
`created_at`. Journal entries carry `created_at` and their 384-dim embedding.
This is enough to answer "what did the agent learn this session?" or "when
was this fact recorded?" — without separate journal files.
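With creation timestamps on every record, "what did the agent learn this session?" reduces to a single filter. A toy sketch with made-up records:

```python
from datetime import datetime

# created_at on every entity makes "learned this session" a timestamp filter.
# Records below are invented for illustration.
session_start = datetime(2026, 3, 1, 9, 0)
entities = [
    {"key": "indentation", "created_at": datetime(2026, 2, 20, 14, 0)},
    {"key": "vacation",    "created_at": datetime(2026, 3, 1, 9, 30)},
]

learned_this_session = [e["key"] for e in entities
                        if e["created_at"] >= session_start]
print(learned_this_session)  # only entities recorded after session start
```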
Full bi-temporal tracking (`valid_from`, `valid_until`) — the
Zep approach that achieves +38.4%
on temporal reasoning — is on the roadmap. The current implementation
tracks creation time; expiration is a future addition.
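The roadmap feature can be previewed with plain records. This sketch assumes a `valid_from`/`valid_until` pair alongside the recording timestamp, which is the Zep-style bi-temporal split described above; none of these field names are final:

```python
from datetime import date

# Bi-temporal sketch: recorded_at is when the agent stored the fact,
# valid_from/valid_until bound when the fact was actually true.
facts = [
    {"value": "user is on vacation",
     "recorded_at": date(2026, 3, 1),
     "valid_from": date(2026, 3, 1), "valid_until": date(2026, 3, 15)},
    {"value": "user is at work",
     "recorded_at": date(2026, 3, 16),
     "valid_from": date(2026, 3, 16), "valid_until": None},
]

def true_at(when):
    """Facts valid at `when`, regardless of when they were recorded."""
    return [f["value"] for f in facts
            if f["valid_from"] <= when
            and (f["valid_until"] is None or when < f["valid_until"])]

print(true_at(date(2026, 3, 10)))  # during the vacation window
print(true_at(date(2026, 3, 20)))  # after it expired
```

Separating the two timelines is what lets the system answer both "what was true last Tuesday?" and "what did the agent believe last Tuesday?", which creation time alone cannot distinguish.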
Extending the schema
The entity and relation types are open and configurable. The defaults give
you a working memory system. Your walrus.toml extends it without touching
framework code:
```toml
[memory]
entities = ["ticket", "sprint", "arch_decision"]
relations = ["blocks", "supersedes", "owned_by"]
```
These merge with the built-in types — the framework never loses the default
`fact`, `preference`, `identity`, and `profile` types. Extensions only add.
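The merge rule is simple enough to state as code. A hedged sketch, assuming the seven built-in types listed earlier:

```python
# Sketch of the merge rule: configured types extend the built-ins,
# they never replace them. Names mirror the walrus.toml example above.
BUILTIN_ENTITIES = {"identity", "profile", "preference", "fact",
                    "person", "event", "concept"}

def merged_entity_types(configured):
    # Set union: extensions only add, defaults always survive.
    return BUILTIN_ENTITIES | set(configured)

types = merged_entity_types(["ticket", "sprint", "arch_decision"])
assert BUILTIN_ENTITIES <= types
print(sorted(types))
```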
This follows the same pattern as everything else in walrus (see
less code, more skills): the framework ships
a working default, skills and configuration extend it, MCP servers add
entity types at runtime.
Compaction as memory formation
When the context window fills up, the agent calls compact. The runtime
summarizes the full conversation history with the LLM, embeds the summary
with all-MiniLM-L6-v2, stores it as a journal entry, and replaces the
conversation history with just that summary. The agent continues with a
clean context window and the summary as continuity.
This turns compaction from a lossy operation into a memory-formation event.
Past journal entries are searchable via distill in future sessions —
semantic search over the embedding finds relevant past context even when the
agent doesn't remember the exact session it came from.
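The compaction flow can be sketched end to end, with stubs standing in for the LLM summarizer and the all-MiniLM-L6-v2 embedder so the control flow is runnable:

```python
# Compaction as memory formation, sketched with stand-ins: the real runtime
# calls an LLM to summarize and a 384-dim embedder; here both are stubs.
journals = []  # persisted across sessions

def summarize(history):            # stand-in for the LLM summarization call
    return f"summary of {len(history)} turns"

def embed(text):                   # stand-in for the sentence embedding
    return [float(ord(c)) for c in text[:8]]

def compact(history):
    summary = summarize(history)
    journals.append({"summary": summary, "embedding": embed(summary)})
    # The conversation is replaced by just the summary: clean context window.
    return [{"role": "system", "content": summary}]

history = [{"role": "user", "content": "..."}] * 40
history = compact(history)
print(len(history), journals[0]["summary"])
```

The key design point is the side effect: the summary is not only re-injected into the live context, it is also embedded and persisted, so `distill` can find it in any later session.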
Mem0's research shows that smart
compaction improves reasoning: 91% lower P95 latency and 90%+
token reduction compared to full-context approaches, while achieving
a 26% relative improvement in LLM-as-a-Judge scores.
What you lose, what you gain
This design has a real tradeoff: you can't `cat SOUL.md` anymore.
What you lose:
- Human-readable files you can open in a text editor
- Git-diffable memory changes
- The simplicity of `echo "prefer tabs" >> CLAUDE.md`
What you gain:
- Queryable structure — `recall` and `connections` over a typed entity graph
- Temporal tracking — `created_at` on everything, full bi-temporal expiration on the roadmap
- Relationship-aware traversal — `connections` follows edges from decisions to reasons to related entities
- Semantic history — `distill` finds relevant past journal summaries across sessions
- Versioning via Lance — the storage layer tracks history at the columnar level
Every product in our
memory survey that gained
developer trust stores memory in human-readable formats. We're breaking
that pattern. The bet is that queryable, structured memory is worth more
than `cat`-ability — and that the `recall`, `connections`, and `distill`
tools give the agent (and eventually the user) direct inspection paths.
References
- Zep temporal knowledge graph — bi-temporal tracking, LongMemEval benchmarks
- Mem0 production memory — LOCOMO benchmarks, graph vs vector comparison
- RAG vs GraphRAG: systematic evaluation — when graphs help and when they don't
- MAGMA: multi-graph agentic memory — four-view graph architecture
- SimpleMem: efficient memory with LanceDB — lossless compression, token efficiency
- Graph-based agent memory survey — comprehensive taxonomy (DEEP-PolyU)
- lance-graph on GitHub — the graph engine we're building on
- LanceDB docs — the vector database layer
- How AI agents remember — our survey of five products
- Less code, more skills — the design principle behind the three-layer extension model
Originally published at OpenWalrus.