
Julien L for WiScale


Your AI agent forgets. Mine doesn't - and it works on a plane, in a hospital, with wifi off.

Six months ago you recommended switching your client's invoicing tool. Last week they asked why. You have no idea - the conversation happened in three meetings, a Slack thread, and a spreadsheet comparison no one archived. Your AI assistant is useless here too: it only knows what you paste into the prompt.

This is not a context-window problem. It is a memory architecture problem.

Why vector search alone is not enough

Most "persistent memory" solutions for LLMs work by storing past exchanges as text chunks and retrieving them by cosine similarity. Ask "what did we decide about the invoicing tool?" and a chunk mentioning the decision floats to the top - if your query looks like the answer.

It breaks the moment you ask why. The reason the CFO pushed back on the original tool was buried in a budget meeting note that shares no words with "invoicing decision". Pure vector search is blind to it by construction.
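To make that failure mode concrete, here is a toy sketch. It uses a bag-of-words cosine as a stand-in for an embedding model, which is an assumption made purely for illustration: real dense embedders tolerate some paraphrase, but a causally related note written in unrelated vocabulary still tends to rank at or near the bottom. The query and both notes are invented examples.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase word counts; a crude stand-in for an embedding (assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "what did we decide about the invoicing tool"
decision_note = "We decided to replace the invoicing tool next quarter"
budget_note = "CFO flagged that license renewal exceeds annual budget cap"

decision_score = cosine(bag_of_words(query), bag_of_words(decision_note))
budget_score = cosine(bag_of_words(query), bag_of_words(budget_note))

print(f"decision note: {decision_score:.2f}")
print(f"budget note:   {budget_score:.2f}")
```

The budget note is the reason behind the decision, yet it scores zero here because similarity only measures resemblance to the query, not causal connection.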

What you actually need is three distinct memory structures - the same ones cognitive science has described since the 1970s:

+-----------------+------------------------------+----------------------------------+
| Type            | What it stores               | Answers                          |
+-----------------+------------------------------+----------------------------------+
| Semantic        | Facts, decisions             | What? Why? What is our position? |
| Episodic        | Events with a timestamp      | When? Who said what?             |
| Procedural      | Learned patterns + steps     | How do we usually handle this?   |
+-----------------+------------------------------+----------------------------------+

velesdb-memory is an MCP server that exposes exactly these three subsystems - as five high-level tools your agent can call without knowing anything about vectors, graphs, or databases.
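As a rough mental model of the three memory types in the table above, the records could be shaped like the dataclasses below. This is only a sketch: the field names mirror the arguments used in the Python example later in this article, but the class names are hypothetical and say nothing about how velesdb-memory lays out data internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SemanticFact:          # hypothetical name: a fact or decision
    id: int
    content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class EpisodicEvent:         # hypothetical name: an event with a timestamp
    event_id: int
    description: str
    timestamp: int           # Unix seconds

@dataclass
class Procedure:             # hypothetical name: a learned pattern with steps
    procedure_id: int
    name: str
    steps: list
    confidence: float = 0.5  # default chosen arbitrarily for the sketch

decision = SemanticFact(1, "Pennylane chosen over Sage for Acme Corp invoicing",
                        {"project": "acme-corp"})
meeting = EpisodicEvent(
    2,
    "CFO meeting: budget cap is 12k EUR per year",
    int(datetime(2026, 1, 15, tzinfo=timezone.utc).timestamp()),
)
playbook = Procedure(3, "SME vendor selection",
                     ["map hard constraints", "shortlist to 3"], confidence=0.9)

print(decision, meeting, playbook, sep="\n")
```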


What velesdb-memory actually is

It is a single binary that speaks the Model Context Protocol over stdio. Client and server run on the same machine. Memory never leaves your machine.

+------------------+        stdio/MCP        +-------------------+
| Claude Code      |  ───────────────────►   | velesdb-memory    |
| Cursor           |                         | (one binary)      |
| Cline / Zed      |  ◄───────────────────   |                   |
| Codex / opencode |                         | vector + graph    |
+------------------+                         | + columnar store  |
                                             +-------------------+
                                                      │
                                               ~/.velesdb-memory/
                                               (stays on your disk)

Five tools, all JSON:

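+----------+--------------------------------------------------------------+
| Tool     | What it does                                                 |
+----------+--------------------------------------------------------------+
| remember | store a fact, optionally tagged and linked to other memories |
| recall   | semantic search, with optional metadata filter               |
| relate   | create a typed edge between two memories                     |
| forget   | delete a memory by id                                        |
| why      | recall + multi-hop graph traversal (the differentiator)      |
+----------+--------------------------------------------------------------+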

There is a sixth tool, remember_extracted, that passes raw text through a local LLM and builds the graph automatically - but you do not need it to understand the core idea.
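For readers curious what "all JSON" means on the wire: an MCP tool call is a JSON-RPC 2.0 request with the method tools/call, and over stdio each message travels as a single line of JSON. The sketch below only builds and prints such a request; it does not launch the binary, and a real client performs an initialize handshake first. The argument names (fact, metadata) are taken from the examples later in this article, and the helper name tool_call and the request id are arbitrary.

```python
import json

def tool_call(request_id, tool, arguments):
    # Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

message = tool_call(
    1,
    "remember",
    {
        "fact": "Budget cap is 12k EUR per year.",
        "metadata": {"project": "acme-corp", "type": "meeting"},
    },
)

# The stdio transport is newline-delimited: one JSON message per line.
line = json.dumps(message)
print(line)
```

In practice the MCP client (Claude Code, Cursor, and so on) builds these messages for you; the agent only sees the tool names and their JSON arguments.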


A scenario: Sofia, management consultant

Sofia advises companies on digital transformation. She runs three to five simultaneous engagements, each lasting six months. She needs her AI assistant to remember:

  • Strategic decisions and their rationale (semantic)
  • Key conversations: the CFO meeting, the risk workshop, the board presentation (episodic)
  • Learned procedures: how she runs a vendor selection, her risk assessment checklist (procedural)

Let us build her memory layer.


Setting up

# build the binary (Rust toolchain required)
cargo build --release -p velesdb-memory

# or: cargo install velesdb-memory (when published on crates.io)

The default build is dependency-free. For real semantic recall, build with Ollama support:

cargo build --release -p velesdb-memory --features ollama
ollama pull all-minilm

Then configure your client. For Claude Code:

claude mcp add velesdb-memory \
  --env VELESDB_MEMORY_PATH="$HOME/.velesdb-memory" \
  -- /path/to/velesdb-memory

For Cursor (~/.cursor/mcp.json), Cline (cline_mcp_settings.json), or any other MCP client:

{
  "mcpServers": {
    "velesdb-memory": {
      "command": "/path/to/velesdb-memory",
      "env": { "VELESDB_MEMORY_PATH": "/home/you/.velesdb-memory" }
    }
  }
}

Zed uses a slightly different key (context_servers), and Codex uses codex mcp add or a TOML config; full snippets are in the README.

Once configured, the agent discovers the tools automatically. No restarts, no plugins, no API keys.


What the agent does with the tools

Storing a strategic decision

At the end of a vendor selection meeting, Sofia's agent calls remember. The link target, 4820193847, is the id of the CFO meeting note shown in the next section:

// remember - store a fact with metadata and a typed link to another memory
remember {
  "fact": "We recommended Pennylane over Sage for Acme Corp invoicing because Sage lacks multi-currency support and Pennylane's API team offered a 6-month implementation guarantee.",
  "metadata": { "project": "acme-corp", "type": "decision", "author": "sofia" },
  "links": [ { "target": 4820193847, "relation": "follows_from" } ]
}
 { "id": 9876543210 }

The returned id is stable and derived from the content - storing the same fact twice is idempotent.
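One way to get that behaviour is to hash the content itself. The article does not say which hashing scheme velesdb-memory uses, so the sketch below is an assumption: it truncates a SHA-256 digest to 64 bits and uses a plain dict as a stand-in for the store. The helper names content_id and remember are local to this example.

```python
import hashlib

def content_id(fact):
    # Assumption: SHA-256 truncated to 64 bits; the real scheme is not documented here.
    digest = hashlib.sha256(fact.strip().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

store = {}  # in-memory stand-in for the on-disk store

def remember(fact):
    # Same content always maps to the same id, so a repeat call changes nothing.
    mem_id = content_id(fact)
    store.setdefault(mem_id, fact)
    return mem_id

first = remember("Budget cap is 12k EUR per year.")
second = remember("Budget cap is 12k EUR per year.")
other = remember("Sage renewal is 14.8k EUR.")

print(first == second, len(store))
```

Because the id is a pure function of the text, an agent that re-stores a fact it has already seen neither duplicates it nor invalidates links that point at it.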

Recording a key conversation

// remember - the CFO meeting that triggered the re-evaluation
remember {
  "fact": "CFO at Acme Corp: budget cap is 12k EUR per year. Sage renewal is 14.8k. This is the hard constraint that ruled out Sage.",
  "metadata": { "project": "acme-corp", "type": "meeting", "date": "2026-01-15" },
  "links": [ { "target": 9876543210, "relation": "motivated" } ]
}
 { "id": 4820193847 }

Storing a learned procedure

remember {
  "fact": "Vendor selection for SME finance tools - step 1: map hard constraints (budget, compliance, integration). Step 2: shortlist to 3. Step 3: run a 2-week pilot on live data. Step 4: present with a documented decision matrix.",
  "metadata": { "type": "procedure", "domain": "vendor-selection" }
}
 { "id": 1122334455 }

Linking things together

After the client signed the contract:

relate {
  "from": 9876543210,
  "to": 4820193847,
  "relation": "decided_in"
}
 { "edge_id": 7 }

The why query: what changes everything

Six months later, Acme Corp asks Sofia why they switched invoicing tools. She asks her agent:

why {
  "decision": "why did we switch from Sage to Pennylane",
  "filter": { "project": "acme-corp" },
  "max_hops": 2
}

The response:

{
  "nodes": [
    { "id": 9876543210, "hop": 0, "content": "We recommended Pennylane over Sage... multi-currency... 6-month implementation guarantee." },
    { "id": 4820193847, "hop": 1, "content": "CFO at Acme Corp: budget cap is 12k EUR... Sage renewal is 14.8k. This is the hard constraint that ruled out Sage." }
  ],
  "edges": [
    { "from": 9876543210, "to": 4820193847, "relation": "decided_in" }
  ]
}

A plain recall query would have returned the decision text (hop 0, shares words with the query). It would most likely not have returned the CFO meeting note (hop 1): that note is about a "budget cap" and a "14.8k" renewal, and apart from the name "Sage" it has no words in common with "why did we switch from Sage to Pennylane".

The graph reaches it because the relation exists. That is the gap.
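The mechanics of "recall, then follow edges" can be sketched in a few lines. This is not the velesdb-memory implementation: word overlap stands in for vector recall, the graph is a plain list of edges, and treating edges as undirected during traversal is an assumption. The ids and the decided_in relation are the ones from the example above.

```python
import re
from collections import deque

memories = {
    9876543210: "We recommended Pennylane over Sage for Acme Corp invoicing.",
    4820193847: "CFO at Acme Corp: budget cap is 12k EUR per year. Sage renewal is 14.8k.",
    1122334455: "Vendor selection for SME finance tools: map hard constraints first.",
}
edges = [(9876543210, 4820193847, "decided_in")]

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def recall_top1(query):
    # Stand-in for vector recall: the memory sharing the most words with the query.
    q = words(query)
    return max(memories, key=lambda mem_id: len(q & words(memories[mem_id])))

def why(query, max_hops=2):
    # Seed with the best recall hit, then breadth-first search along edges.
    start = recall_top1(query)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for src, dst, _relation in edges:
            if node in (src, dst):  # assumption: edges are followed in both directions
                neighbour = dst if node == src else src
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    queue.append(neighbour)
    return [{"id": i, "hop": h, "content": memories[i]} for i, h in hops.items()]

result = why("why did we switch from Sage to Pennylane")
for node in result:
    print(node["hop"], node["content"])
```

The unrelated procedure note is never returned, because nothing links it to the seed; the CFO note is returned even though it would lose a pure similarity contest.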


How big is the gap, exactly?

The advantage of why over plain recall is measured, not just claimed. The repo ships three reproducible benchmarks with no LLM in the scoring loop (pure retrieval metrics on public datasets):

Multi-hop recall (graph engine) - HotpotQA, 3000 dev questions:

vector only:   both bridge facts recalled  →  baseline
vector + graph: both bridge facts recalled →  +7.2 percentage points on bridge questions

The win replicates on 2WikiMultiHopQA (+3.1pp on bridged types).

Time-scoped recall (ColumnStore) - TimeQA (real Wikipedia bios):

vector only:   gold-sentence recall  →  baseline
vector + filter: year-range predicate →  +9.7 percentage points

A pure cosine score cannot distinguish "she won the award in 1987" from "she won the award in 2003". A numeric filter can.
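A small sketch shows why. Word overlap (Jaccard) stands in for cosine similarity here, and recall_where is a local helper written for this example, not the library's API. The two facts tie on similarity, and only the year predicate separates them.

```python
facts = [
    {"content": "She won the award in 1987", "year": 1987},
    {"content": "She won the award in 2003", "year": 2003},
]

def similarity(query, content):
    # Stand-in for cosine similarity: Jaccard overlap of lowercase words.
    q, c = set(query.lower().split()), set(content.lower().split())
    return len(q & c) / len(q | c)

def recall_where(query, year_from, year_to):
    # Apply the numeric range predicate first, then rank the survivors by similarity.
    in_range = [f for f in facts if year_from <= f["year"] <= year_to]
    return sorted(in_range, key=lambda f: similarity(query, f["content"]), reverse=True)

query = "which award did she win in the 1980s"
scores = [similarity(query, f["content"]) for f in facts]
hits = recall_where(query, 1980, 1989)

print(scores)
print([h["content"] for h in hits])
```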

The engines compound (tri-engine benchmark):

On a task that requires both multi-hop traversal and time-scoped filtering:

graph alone:       +7.2pp
columnstore alone: +9.7pp
both together:     +29pp  (more than the sum)

Run any of these yourself:

# multi-hop benchmark
cargo run --release -p velesdb-memory --example bench_multihop

# time-scoped benchmark
cargo run --release -p velesdb-memory --example timeqa

What "offline" means in practice

The default binary has zero network dependencies. The memory store is a directory on your disk (~/.velesdb-memory/). The binary is around 9 MB.

With the default hash embedder, recall is keyword-style (deterministic, good for why because the graph does the heavy lifting). For real semantic recall, add Ollama - the model runs locally, so memory still never reaches the internet:

VELESDB_MEMORY_EMBEDDER=ollama \
VELESDB_MEMORY_OLLAMA_MODEL=all-minilm \
  /path/to/velesdb-memory

This is not "privacy-preserving mode" - it is the only mode. There is no cloud path.
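To see why a hash embedder behaves "keyword-style", here is a minimal feature-hashing sketch. It is an assumption about the general technique, not a description of the embedder velesdb-memory ships: each word is hashed into one of 384 buckets, so two texts are similar only when they share words, and the same text always produces the same vector without any model or network call.

```python
import hashlib
import math
import re

DIM = 384  # matches the 384-dim embeddings used elsewhere in this article

def hash_embed(text, dim=DIM):
    # Feature hashing: each word increments one bucket chosen by a stable hash.
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

a = hash_embed("Sage renewal is 14.8k")
b = hash_embed("Sage renewal quote")

print(f"shared words -> similarity {cosine(a, b):.2f}")
```

hashlib is used instead of Python's built-in hash() because the latter is salted per process, which would break determinism across restarts.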


The auto-extraction shortcut

If you do not want to call remember and relate manually, the remember_extracted tool does it in one step. It sends raw text to a local LLM (via Ollama), extracts individual facts, wires the entity graph automatically, and stores everything:

remember_extracted {
  "text": "Met Yannick from the Acme procurement team. He confirmed the board approved the Pennylane migration. The CFO's concern about training cost has been resolved by the vendor's onboarding package."
}
 { "ids": [11122233, 44455566, 77788899] }

Three facts stored, entity relationships auto-wired, all reachable by why. To enable it:

cargo build --release -p velesdb-memory --features extract
VELESDB_MEMORY_EXTRACTOR=ollama \
VELESDB_MEMORY_EXTRACTOR_MODEL=qwen3:8b \
  /path/to/velesdb-memory

The standard build does not include this - it keeps the default binary tiny and offline.


Using the Python library directly

If you prefer to embed memory into your own application rather than use the MCP server, the same engine is available as a Python package:

import velesdb
from sentence_transformers import SentenceTransformer

db = velesdb.Database("./sofia_memory")
memory = db.agent_memory(384, snapshot_dir="./sofia_memory/snapshots")  # 384-dim embeddings

# load the embedder once instead of on every call
# (sentence-transformers shown here; Ollama or any other embedder works too)
model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors

def embed(text):
    return model.encode(text, normalize_embeddings=True).tolist()

# store a fact
memory.semantic.store(
    id=1,
    content="Pennylane chosen over Sage: multi-currency support + budget fits 12k EUR cap",
    embedding=embed("Pennylane Sage invoicing decision")
)

# query
results = memory.semantic.query(embed("why Pennylane"), top_k=3)
for r in results:
    print(f"[{r['score']:.2f}] {r['content']}")

# episodic: the CFO meeting
import time
memory.episodic.record(
    event_id=2,
    description="CFO confirmed: Sage renewal quote is 14.8k, over 12k cap",
    timestamp=int(time.time()) - 30 * 86400,  # 30 days ago
    embedding=embed("CFO budget constraint Sage renewal")
)

# procedural: a reusable pattern
memory.procedural.learn(
    procedure_id=3,
    name="SME vendor selection",
    steps=["map hard constraints", "shortlist to 3", "run 2-week pilot", "present decision matrix"],
    embedding=embed("vendor selection SME procedure"),
    confidence=0.9
)

# reinforce if the pattern worked well
memory.procedural.reinforce(procedure_id=3, success=True)

# snapshot to survive restarts
memory.snapshot()

To install the Python package and check its version:

pip install velesdb
python3 -c "import velesdb; print(velesdb.__version__)"
# 3.4.0

Using the Node.js package directly

The same engine ships as an npm package with prebuilt platform binaries — no Rust toolchain needed at install time:

npm install @wiscale/velesdb-memory-node

The API is a single async class — no subsystems, no embeddings to manage yourself:

import { MemoryService } from '@wiscale/velesdb-memory-node'

// Open (or create) a persistent store. Sync factory, all methods are async.
const mem = MemoryService.open('./sofia_memory', 'hash')
// Use 'ollama' as second arg for real semantic recall (requires Ollama running locally)

// Store a fact — returns its id as a decimal string
const decisionId = await mem.remember(
  'We recommended Pennylane over Sage: multi-currency support + 12k EUR budget cap',
  [],
  { project: 'acme-corp', type: 'decision' }
)

// Store the reason and link it
const reasonId = await mem.remember(
  'CFO confirmed: Sage renewal quote is 14.8k EUR, over the 12k annual cap',
  [],
  { project: 'acme-corp', type: 'meeting', date: '2026-01-15' }
)

// Typed link: decision was motivated by the CFO meeting
await mem.relate(decisionId, reasonId, 'decided_in')

// Plain recall — vector similarity
const hits = await mem.recall('why Pennylane', 3)
hits.forEach(h => console.log(`[${h.score.toFixed(2)}] ${h.content}`))

// why() — vector seed + multi-hop graph traversal
const { nodes, edges } = await mem.why('why did we switch from Sage to Pennylane', 2)
nodes.forEach(n => console.log(`hop ${n.hop}: ${n.content}`))
// hop 0: the decision  →  hop 1: the CFO meeting (shares only "Sage" with the query; the graph found it)

One feature is exclusive to the Node.js binding: recallWhere, which combines vector search with ColumnStore range filters in a single call — no Python counterpart:

// Recall only memories dated on or after 2026-01-01
const recent = await mem.recallWhere(
  'budget constraint',
  [{ field: 'date', op: 'ge', value: '2026-01-01' }],
  5
)

What it is not

velesdb-memory is a single-process embedded library. It is not designed for concurrent access from multiple processes, nor for storing millions of memories on behalf of many users. It fits one agent, one user, one machine - which is exactly the shape the use cases above require.

Extraction quality depends on the local model you point remember_extracted at. A smaller model extracts noisier facts than a larger one. The graph and the retrieval engine are solid; the extraction layer is as good as the model you bring.


Getting started

git clone https://github.com/cyberlife-coder/VelesDB
cd VelesDB
cargo build --release -p velesdb-memory
./target/release/velesdb-memory --help

Documentation and examples are at velesdb.com. If this was useful, a star on the GitHub repo helps other developers find the project, and we are always looking for partners with local-first or sovereign data requirements - details on velesdb.com.


Which use case resonates most with you - knowledge work (consulting, research, legal), coding assistance, or something else entirely? Drop a comment below.
