Agent memory has gotten good at answering "what does the agent know?" Your agent can store that Alice lives in Berlin and retrieve it later. That part works.
The harder question is when. When did Alice move to Berlin? Where did she live before? Did her job change at the same time? And if you had asked the system last Tuesday, before it learned about the move, what would it have told you?
These are temporal questions. And they come up constantly in any agent that operates over more than a single session. Hu et al.'s 107-page survey on agent memory, StructMemEval's benchmarks, Memori's semantic triple work: the research keeps converging on the same finding. The interesting unsolved problems in agent memory are not about storage or retrieval. They are about how facts change over time, how experience consolidates into knowledge, and how an agent's understanding of the world evolves.
We built MinnsDB to make those temporal problems first-class primitives rather than things you build on top of a general-purpose store.
the ceiling of fact-based memory
Most agent memory today is what the research calls token-level factual memory: discrete text entries stored in a vector database and retrieved by similarity. It works. It handles "what does the agent know?" reasonably well.
But the research identifies two other functional categories that flat stores handle poorly: experiential memory (how does the agent improve from past actions?) and working memory (what should the agent be thinking about right now?). More fundamentally, the dynamics axis (how memory is formed, evolved, and retrieved over time) is where flat stores fall apart entirely.
Consider what happens when your agent learns a new fact that contradicts an old one. Alice moved from London to Berlin. In a flat store, you have two options: overwrite the old entry (destroying history) or keep both (creating contradictions during retrieval). Some systems add a created_at timestamp so you can sort by recency. But recency is not the same as validity. Suppose Alice's move was recorded on March 15th but she actually moved on March 1st. Two weeks of queries between those dates should have returned Berlin, but the system did not know that yet.
This is not an edge case. It is the normal state of affairs for any agent that operates over time. Facts change. The world your agent monitors evolves. And the question "what is true?" always has a hidden companion: "true as of when?"
Most approaches to this problem treat time as a column you add to your schema. A created_at field. Maybe an updated_at. If you are careful, a valid_from and valid_until pair that you maintain in application code. This gets you partway there, but every temporal question beyond "what is the most recent value?" becomes a query you have to build yourself: versioning logic, interval overlap checks, snapshot reconstruction, change detection between two points. The database stores your timestamps. It does not reason about them.
The deeper issue is that storage is not memory. You can put facts into any database. The hard part is everything that happens after: detecting when a new fact supersedes an old one, consolidating repeated experiences into generalized knowledge, noticing temporal patterns across facts, and knowing what changed between two points in time without diffing snapshots in application code. Those are memory operations, and they need to be primitives in the system, not things you build on top.
bi-temporal storage as first principle
MinnsDB takes a different position. Every edge in the knowledge graph and every row in a relational table carries two independent time dimensions from the moment it is created. You do not opt into temporal tracking. There is no temporal extension to install. Time is the foundation the rest of the system is built on.
Valid-time records when a fact was true in the real world. Alice lived in London from January 2023 to March 2024. She has lived in Berlin since March 2024.
Transaction-time records when the database learned about it. The London-to-Berlin move was recorded on March 15th. If you query the database as of March 10th, it still tells you London, because that is what it believed at that point.
This distinction matters for the same reason the research keeps coming back to temporal dynamics: agent memory is not a snapshot. It is a timeline. And a timeline needs two clocks, one for the world and one for the observer, to be complete.
-- What is true right now? (default: only active facts)
MATCH (a:Person)-[r:lives_in]->(city)
WHERE a.name = "Alice"
RETURN city.name
-- What was true on a specific date? (valid-time filter)
MATCH (a:Person)-[r:lives_in]->(city)
WHEN "2024-01-15"
WHERE a.name = "Alice"
RETURN city.name
-- What did the database believe on March 10th? (transaction-time filter)
MATCH (a:Person)-[r:lives_in]->(city)
WHEN ALL AS OF "2024-03-10T00:00:00Z"
WHERE a.name = "Alice"
RETURN city.name, valid_from(r), valid_until(r)
The WHEN clause filters on valid-time. AS OF filters on transaction-time. Compose them for full bi-temporal queries. This is compiled into the query plan at the planner stage. It is not a post-filter applied after results are already materialised.
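Outside the engine, the two-clock semantics can be sketched in a few lines. The Python model below is illustrative (field and variable names are mine, not MinnsDB internals); it shows why the Alice example needs two recorded versions of the London fact, one per belief state:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    fact: str
    valid_from: str             # when it became true in the world
    valid_until: Optional[str]  # None = still true
    tx_from: str                # when the database learned it
    tx_until: Optional[str]     # None = current belief

def visible(e: Edge, valid_at: str, tx_at: str) -> bool:
    """WHEN valid_at AS OF tx_at: true in the world AND believed at the time."""
    true_then = e.valid_from <= valid_at and (e.valid_until is None or valid_at < e.valid_until)
    known_then = e.tx_from <= tx_at and (e.tx_until is None or tx_at < e.tx_until)
    return true_then and known_then

# London as believed before the correction, and as corrected on March 15th
london_v1 = Edge("lives_in London", "2023-01-01", None,         "2023-01-05", "2024-03-15")
london_v2 = Edge("lives_in London", "2023-01-01", "2024-03-01", "2024-03-15", None)
berlin    = Edge("lives_in Berlin", "2024-03-01", None,         "2024-03-15", None)

# As of March 10th, the database still believed Alice lived in London
assert visible(london_v1, "2024-03-10", "2024-03-10")
assert not visible(berlin, "2024-03-10", "2024-03-10")
# With today's knowledge, March 10th resolves to Berlin
assert visible(berlin, "2024-03-10", "2024-04-01")
assert not visible(london_v2, "2024-03-10", "2024-04-01")
```

The lexicographic comparison of ISO dates stands in for real timestamps; the point is that a single fact can carry conflicting answers depending on which clock you query.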
what lives inside an edge
The storage model makes bi-temporality concrete. Every GraphEdge carries:
GraphEdge {
source, target, edge_type,
weight: f32, // relationship strength
confidence: f32, // extraction certainty
created_at: Timestamp, // transaction-time (nanoseconds)
updated_at: Timestamp,
valid_from: Option<Timestamp>, // when this became true
valid_until: Option<Timestamp>, // when this stopped being true (None = still active)
confidence_history: TCell<f32>, // how confidence changed over time
weight_history: TCell<f32>, // how weight changed over time
properties: HashMap<String, Value>,
}
When a fact is superseded (Alice moves from London to Berlin), the old edge gets a valid_until timestamp. The new edge is created with valid_until = None. No soft-delete flags. No tombstone records. No application-level version tracking. The database maintains the full timeline as a chain of successor states.
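The supersession behaviour can be sketched as follows. This is a minimal Python illustration of the pattern, not MinnsDB's implementation; names and types are assumptions:

```python
from typing import Optional

class Edge:
    def __init__(self, source: str, rel: str, target: str, valid_from: float):
        self.source, self.rel, self.target = source, rel, target
        self.valid_from = valid_from
        self.valid_until: Optional[float] = None  # None = still active

def supersede(edges: list, source: str, rel: str, new_target: str, at: float) -> Edge:
    """Functional-property update: close the active edge, open its successor."""
    for e in edges:
        if e.source == source and e.rel == rel and e.valid_until is None:
            e.valid_until = at                 # the fact stops being true; the row is kept
    successor = Edge(source, rel, new_target, valid_from=at)
    edges.append(successor)
    return successor

edges = [Edge("Alice", "lives_in", "London", valid_from=1.0)]
supersede(edges, "Alice", "lives_in", "Berlin", at=2.0)

assert [e.target for e in edges if e.valid_until is None] == ["Berlin"]
assert len(edges) == 2                 # full timeline preserved
assert edges[0].valid_until == 2.0     # London closed, never deleted
```

An append-only property would skip the closing loop entirely, which is the only difference between lives_in and payment in this model.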
The TCell (temporal cell) deserves attention because it solves a specific problem: tracking how a property changes over time without requiring a separate history table. It is a tiered data structure that adapts to cardinality:
TCell<V>:
Empty // zero allocation for static values
One(timestamp, value) // single snapshot (common case)
Small(Vec<(timestamp, value)>) // 2-64 entries, binary search
Large(BTreeMap<timestamp, value>) // 65+ entries, log(n) lookup
This means you can query the confidence of a relationship at any point in its history:
MATCH (a:Person)-[r:works_with]->(b:Person)
WHEN ALL
WHERE a.name = "Alice"
RETURN b.name, confidence_at(r, ago("30d")), confidence_history(r)
The history lives on the edge itself. Not in a separate audit table. Not in application logs. On the edge.
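The lookup behind confidence_at is an ordinary sorted-history binary search. A simplified Python sketch (collapsing the four tiers into one sorted list, which is what the Small and Large tiers amount to semantically):

```python
import bisect

class TCell:
    """Sorted (timestamp, value) history. The real TCell promotes
    Empty -> One -> Small -> Large by cardinality; one list suffices here."""
    def __init__(self):
        self.entries = []  # kept sorted by timestamp

    def record(self, ts: float, value: float):
        bisect.insort(self.entries, (ts, value))

    def value_at(self, ts: float):
        """Most recent value at or before ts, via binary search."""
        i = bisect.bisect_right(self.entries, (ts, float("inf")))
        return self.entries[i - 1][1] if i else None

conf = TCell()
conf.record(10, 0.6)
conf.record(20, 0.9)
assert conf.value_at(15) == 0.6   # between updates: last known value
assert conf.value_at(25) == 0.9
assert conf.value_at(5) is None   # before any observation
```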
MinnsQL: one language across graph and tables
MinnsDB does not force a choice between a graph model and a relational model. MinnsQL handles both, and they can reference each other within a single query through NodeRef columns that bridge table rows to graph nodes.
Graph patterns for relationships:
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE r.confidence > 0.8
RETURN a.name, b.name, r.weight
ORDER BY r.weight DESC
Table queries for structured records:
FROM orders
WHERE orders.status = "pending" AND orders.amount > 50.0
RETURN orders.id, orders.customer, orders.amount
Cross-model joins bridging the two:
MATCH (n:Person)
JOIN orders ON orders.node = n
WHERE orders.status = "shipped"
RETURN n.name, orders.amount
Tables support full DDL and DML. Column types include String, Int64, Float64, Bool, Timestamp, Json, and NodeRef. Tables are bi-temporal by default. Every UPDATE creates a new row version and closes the old one. You can query all historical versions with WHEN ALL.
The research makes a strong case that the most useful agent memory topology is planar (2D): entries connected via explicit relationships, enabling multi-hop reasoning. MinnsDB's graph is exactly this. But the research also shows that many queries are simple factual lookups that do not need graph traversal. Tables handle those efficiently. Having both in the same query language means you use the right structure for each kind of data without needing two databases and a sync layer.
inside the query engine
A MinnsQL query goes through three stages: parse, plan, execute.
Planning. The parser produces an AST. The planner converts it into an ExecutionPlan with a sequence of physical operators:
| Operator | Purpose |
|---|---|
| ScanNodes | Find graph nodes by label and property filters. Uses the type index or concept-name index. |
| Expand | Traverse edges from bound nodes. Single-hop or variable-length BFS (configurable hop range). |
| ScanTable | Full scan or index scan of a relational table. |
| JoinTable | Hash join between tables, or graph-to-table lookup via NodeRef. |
| IndexScan | Point or range lookup on a primary key index. |
| Filter | Evaluate boolean expressions against intermediate rows. |
Each variable in a MATCH pattern gets a numeric SlotIdx during planning, eliminating string lookups at execution time.
Temporal visibility is compiled into the plan as a TemporalViewport:
ActiveOnly -- valid_until is None (default)
PointInTime(ts) -- edges valid at a specific moment
Range(start, end) -- edges valid during an interval
All -- full history
Edges outside the viewport are rejected at the scan/expand step. They never enter the intermediate result set. This is critical for performance: a graph with millions of historical edges still executes queries quickly because the executor only materializes what is temporally relevant.
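The viewport check itself is cheap. A Python sketch of the four variants, modelled as tuples (the real enum is a compiled plan node, not a runtime dispatch like this):

```python
from typing import Optional

def admits(viewport, valid_from: float, valid_until: Optional[float]) -> bool:
    """Scan-time check: does this edge fall inside the TemporalViewport?"""
    kind = viewport[0]
    if kind == "active":                          # ActiveOnly (default)
        return valid_until is None
    if kind == "point":                           # PointInTime(ts)
        ts = viewport[1]
        return valid_from <= ts and (valid_until is None or ts < valid_until)
    if kind == "range":                           # Range(start, end)
        start, end = viewport[1], viewport[2]
        return valid_from < end and (valid_until is None or start < valid_until)
    return True                                   # All: full history

assert admits(("active",), 1.0, None)
assert not admits(("active",), 1.0, 5.0)          # historical edge, rejected at scan
assert admits(("point", 3.0), 1.0, 5.0)           # valid during [1, 5)
assert not admits(("point", 6.0), 1.0, 5.0)
assert admits(("range", 4.0, 8.0), 1.0, 5.0)      # intervals overlap
assert admits(("all",), 1.0, 5.0)
```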
Graph execution stores intermediate results in BindingRow structs with a fixed-width slot array. Queries with 16 or fewer variables (the common case) use inline stack storage. Larger queries spill to heap. Variable-length paths use bounded BFS capped at 10,000 visited nodes. Total intermediate rows are capped at 100,000. A 30-second deadline prevents runaway queries.
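The bounded expansion can be sketched as a standard BFS with the two caps wired in. This is an illustrative reachability version (real path semantics may differ); the cap values come from the text:

```python
from collections import deque

def bounded_bfs(adj, start, min_hops, max_hops, node_cap=10_000, row_cap=100_000):
    """Variable-length expansion with the executor's safety caps."""
    results, visited = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > 0 and min_hops <= depth <= max_hops:
            results.append((start, node, depth))
            if len(results) >= row_cap:
                break  # intermediate-row cap: stop materialising results
        if depth < max_hops:
            for nxt in adj.get(node, []):
                if nxt not in visited and len(visited) < node_cap:
                    visited.add(nxt)   # node cap bounds total graph touched
                    queue.append((nxt, depth + 1))
    return results

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
paths = bounded_bfs(adj, "a", 1, 2)
assert ("a", "b", 1) in paths and ("a", "d", 2) in paths
assert all(1 <= d <= 2 for _, _, d in paths)
```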
Table execution uses a separate path optimised for relational access patterns. Predicate pushdown short-circuits to index lookup when a ScanTable is followed by a primary key equality filter. Tables use 8KB slotted pages with blake3 checksums and a binary row codec with O(1) column access.
Both paths share the same post-processing: projection, aggregation (with implicit GROUP BY), HAVING, ORDER BY, DISTINCT, LIMIT.
temporal algebra
The research on temporal reasoning in agent memory is thin. Most systems treat time as a sort key. MinnsDB treats it as a query algebra.
Allen's Interval Algebra is built into MinnsQL as native predicates:
-- Find where one relationship ended exactly as another began
MATCH (a)-[r1]->(b), (a)-[r2]->(c)
WHEN ALL
WHERE meets(r1, r2)
RETURN a.name, type(r1), type(r2)
The full set: overlap, precedes, meets, covers, starts, finishes, equals. Plus SUCCESSIVE(r1, r2) for edges that are temporally adjacent within a tolerance (default 1 second).
This is what lets an agent answer "did Alice change jobs and cities at the same time?" without application logic. The database expresses temporal relationships between facts as first-class query predicates.
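The predicates reduce to interval arithmetic once open-ended facts are mapped to intervals. A Python sketch of a few of them (treating valid_until = None as infinity; the tolerance default comes from the text):

```python
def interval(valid_from, valid_until):
    """Map an edge's validity to a half-open interval; None = still active."""
    return (valid_from, valid_until if valid_until is not None else float("inf"))

def precedes(r1, r2): return r1[1] < r2[0]
def meets(r1, r2):    return r1[1] == r2[0]
def overlap(r1, r2):  return r1[0] < r2[1] and r2[0] < r1[1]
def covers(r1, r2):   return r1[0] <= r2[0] and r2[1] <= r1[1]
def successive(r1, r2, tolerance=1.0):
    return abs(r2[0] - r1[1]) <= tolerance   # temporally adjacent within tolerance

london_job  = interval(0.0, 10.0)
berlin_job  = interval(10.0, None)
berlin_home = interval(10.5, None)

assert meets(london_job, berlin_job)        # one ended exactly as the other began
assert successive(london_job, berlin_home)  # within the default 1-second tolerance
assert overlap(berlin_job, berlin_home)
assert not precedes(berlin_job, london_job)
```

The "job and city at the same time" question is literally a meets or successive check between two edges.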
Time-bucketed aggregations:
MATCH (a:Person)-[r:payment]->(b:Person)
WHEN LAST "90d"
RETURN time_bucket("1w", valid_from(r)) AS week,
sum(r.amount) AS total,
count(*) AS txn_count
ORDER BY week DESC
Change detection across windows:
MATCH (a:Person)-[r]->(b)
WHERE CHANGED(r, ago("7d"), now())
RETURN a.name, type(r), b.name, change_type(r, ago("7d"), now())
The change_type function returns whether a fact started, ended, or was created during the window. Your agent queries for exactly what shifted in any time range. No diffing. No snapshot comparison. The database knows what changed because it never lost track of what was true before.
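Because every edge carries its own validity interval, change classification is a pair of interval-membership tests rather than a diff. A hedged sketch of the logic (return labels are mine, not MinnsDB's exact vocabulary):

```python
def change_type(valid_from, valid_until, window_start, window_end):
    """Classify what happened to a fact inside [window_start, window_end)."""
    started = window_start <= valid_from < window_end
    ended = valid_until is not None and window_start <= valid_until < window_end
    if started and ended:
        return "created_and_ended"
    if started:
        return "started"
    if ended:
        return "ended"
    return None  # unchanged during the window

def changed(valid_from, valid_until, start, end):
    return change_type(valid_from, valid_until, start, end) is not None

# A fact that started on day 3, within a window covering days 0-7
assert change_type(3, None, 0, 7) == "started"
# A fact that ended on day 5
assert change_type(-10, 5, 0, 7) == "ended"
# A fact stable through the whole window produces no change at all
assert not changed(-10, None, 0, 7)
```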
reactive subscriptions
The research on agent working memory emphasises that agents should not passively buffer context. They should actively control what enters their attention. MinnsDB's subscription system is the reactive version of this principle: instead of polling for changes, the agent declares what it cares about and the database pushes deltas.
SUBSCRIBE MATCH (a:Agent)-[r:KNOWS]->(b:Agent)
RETURN a.name, b.name, r.confidence
This registers a live query. Every graph mutation produces a DeltaBatch on an internal broadcast channel. The SubscriptionManager checks each delta against each subscription's trigger set: a precompiled set of node type discriminants and edge type strings. If a delta cannot affect a subscription, it is rejected in O(1).
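The trigger set is just precompiled set membership. A minimal Python sketch of the rejection path (delta shape and class names are illustrative):

```python
class Subscription:
    def __init__(self, node_types, edge_types):
        # precompiled trigger set: frozen sets make the reject path O(1)
        self.node_types = frozenset(node_types)
        self.edge_types = frozenset(edge_types)

    def might_affect(self, delta) -> bool:
        """Cheap pre-filter; only passing deltas reach query maintenance."""
        if delta["kind"] == "edge":
            return delta["edge_type"] in self.edge_types
        return delta["node_type"] in self.node_types

sub = Subscription(node_types={"Agent"}, edge_types={"KNOWS"})
assert sub.might_affect({"kind": "edge", "edge_type": "KNOWS"})
assert not sub.might_affect({"kind": "edge", "edge_type": "lives_in"})  # rejected in O(1)
assert not sub.might_affect({"kind": "node", "node_type": "Order"})
```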
For deltas that pass the trigger check, the system chooses between two maintenance strategies:
Incremental maintenance for simple patterns. Operator states track scan candidates, expanded edges, filter results, and aggregation accumulators. Only the delta-affected portion is reprocessed. Cost is proportional to the change, not the result set.
Structural diffing for complex patterns (variable-length paths, node merges). The full query re-executes and the output is compared against the cached previous result. Only actual insertions and deletions are emitted.
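The diffing strategy amounts to a set difference between the cached and fresh result sets. A sketch, assuming hashable result rows:

```python
def structural_diff(previous_rows, current_rows):
    """Re-execute, then emit only actual insertions and deletions."""
    prev, curr = set(previous_rows), set(current_rows)
    inserts = sorted(curr - prev)
    deletes = sorted(prev - curr)
    return inserts, deletes

before = {("Alice", "lives_in", "London"), ("Alice", "works_at", "Acme")}
after  = {("Alice", "lives_in", "Berlin"), ("Alice", "works_at", "Acme")}

inserts, deletes = structural_diff(before, after)
assert inserts == [("Alice", "lives_in", "Berlin")]
assert deletes == [("Alice", "lives_in", "London")]
# the unchanged works_at row produces no delta at all
```

This is exactly the shape of the WebSocket payload below: an inserts list and a deletes list, nothing for rows that did not change.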
// WebSocket: subscribe to a live query
{"type": "subscribe", "query": "MATCH (a)-[r]->(b) RETURN a.name, type(r), b.name"}
// WebSocket: delta pushed on change (~100ms)
{"type": "update", "inserts": [["Alice", "lives_in", "Berlin"]], "deletes": [["Alice", "lives_in", "London"]]}
The subscription fires when Alice moves. Your agent does not poll. It reacts. This is what the working memory research calls "active control over what enters the context." Subscriptions are the mechanism.
from events to episodes to consolidated memory
This is where MinnsDB diverges most sharply from "give the agent a database" approaches. A database stores what you put in it. MinnsDB has an opinion about how raw events become structured memory.
The research describes a dynamic lifecycle: formation (what to store), evolution (how to maintain it), and retrieval (how to access it). Most memory systems implement retrieval well and punt on formation and evolution entirely. MinnsDB implements the full lifecycle.
formation: the ingestion pipeline
Events enter through the HTTP API. Each event flows through a multi-stage pipeline:
Claims extraction. An LLM cascade runs entity extraction, relationship discovery, and fact production. In parallel, a financial NER extractor handles structured patterns (payer, amount, payee). Claims are deduplicated by a normalised (subject, object, category) tuple and scored for confidence. This is the knowledge distillation strategy the research recommends over raw summarisation: extracting discrete facts rather than compressing everything into a summary.
Graph construction. Claims become temporal edges. Edge behaviour is governed by an OWL/RDFS ontology loaded from Turtle files. Functional properties (like location:lives_in) automatically supersede their predecessors: the old edge gets a valid_until stamp, the new edge opens. Append-only properties (like financial:payment) accumulate without supersession. Symmetric properties create edges in both directions. The ontology is the schema. Adding a new domain is writing a Turtle file, not modifying code.
Episode detection. The EpisodeDetector segments the event stream into coherent episodes. This is the part the research calls experiential memory: not just "what happened" but "what was the experience and what was the outcome." Episodes are cut on four boundary triggers:
- Context shift: embedding cosine similarity and goal-set Jaccard distance drop below 0.4
- Time gap: more than 1 hour of silence between events
- Outcome event: a success or failure after at least 2 events in the episode
- Cognitive event: a goal formation or planning event starts a fresh episode
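The first two triggers can be sketched directly. This Python illustration treats both the embedding signal and the goal-set signal as similarities for symmetry (the event shape and helper names are assumptions; thresholds come from the text):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def is_boundary(prev_event, event, shift_threshold=0.4, max_gap_s=3600):
    """Cut an episode on a context shift or a time gap."""
    if event["ts"] - prev_event["ts"] > max_gap_s:
        return True  # more than an hour of silence
    similarity = cosine(prev_event["embedding"], event["embedding"])
    goal_overlap = jaccard(prev_event["goals"], event["goals"])
    return similarity < shift_threshold and goal_overlap < shift_threshold

e1 = {"ts": 0,   "embedding": [1.0, 0.0], "goals": {"deploy"}}
e2 = {"ts": 60,  "embedding": [0.9, 0.1], "goals": {"deploy"}}     # same context
e3 = {"ts": 120, "embedding": [0.0, 1.0], "goals": {"vacation"}}   # context shift

assert not is_boundary(e1, e2)
assert is_boundary(e2, e3)
assert is_boundary(e2, {"ts": 9999, "embedding": [0.9, 0.1], "goals": {"deploy"}})
```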
Each episode gets a significance score computed from six weighted signals: goal relevance, causal chain depth, duration, novelty (tracked via decay counters over seen contexts and event types), event-type baseline, and optional manual override. A prediction error score measures surprise: novel situations score higher than familiar ones. This drives what gets consolidated and what fades.
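Structurally the score is a clamped weighted sum over those six signals. A sketch with made-up weights (MinnsDB's actual weights are not published here):

```python
def significance(signals, weights=None):
    """Weighted sum of the six signals, clamped to [0, 1]. Weights illustrative."""
    weights = weights or {
        "goal_relevance": 0.30, "causal_depth": 0.20, "duration": 0.10,
        "novelty": 0.20, "event_type_baseline": 0.10, "manual_override": 0.10,
    }
    score = sum(weights[k] * signals.get(k, 0.0) for k in weights)
    return max(0.0, min(1.0, score))

routine = {"goal_relevance": 0.2, "novelty": 0.1, "event_type_baseline": 0.5}
surprising_failure = {"goal_relevance": 0.9, "causal_depth": 0.7, "novelty": 0.95,
                      "event_type_baseline": 0.5, "duration": 0.4}

# Surprising, goal-relevant episodes outrank routine ones for consolidation
assert significance(surprising_failure) > significance(routine)
```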
evolution: three-tier consolidation
This is where the "dynamics" axis of the research maps most directly onto MinnsDB's architecture. The system implements a three-tier consolidation pipeline modeled on how biological memory actually works, and more specifically, on the strategy-based experiential memory the research identifies as the most underexplored area for practitioners.
Tier 1: Episodic. Raw episodes with summaries, takeaways, outcomes, strength scores. High fidelity, high cost per entry. The case-based level of experiential memory.
Tier 2: Semantic. When 3+ episodic memories share a goal bucket, they are clustered and synthesized into generalized knowledge. Success and failure rates are aggregated. Individual experiences become patterns. "Three deployments that all required a migration" becomes "deployments usually need a migration step." This is the strategy-based level: distilled insights that transfer across tasks.
Tier 3: Schema. When 3+ semantic memories cluster by embedding similarity (threshold 0.80), they are synthesized into reusable mental models. The highest-compression, most general memories. This is what the research calls schematic patterns: high-level templates applicable to novel situations.
Consolidation runs automatically. Source memories decay by a configurable factor (default 0.3x strength) after consolidation. This is not the naive "forget old stuff" approach the research warns against (which destroys rare but critical long-tail knowledge). It is selective consolidation: the knowledge moves up through the tiers, the raw experience decays but is not deleted. The agent builds understanding from experience without you writing consolidation logic.
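The episodic-to-semantic step can be sketched as cluster-then-decay. A hedged Python illustration (memory shapes are mine; the 3-episode threshold and 0.3x decay come from the text):

```python
from collections import defaultdict

DECAY = 0.3       # source strength multiplier after consolidation
MIN_CLUSTER = 3   # episodic memories sharing a goal bucket

def consolidate(episodic):
    """Promote clusters of 3+ episodes into semantic memories; decay, don't delete."""
    buckets = defaultdict(list)
    for mem in episodic:
        buckets[mem["goal"]].append(mem)
    semantic = []
    for goal, cluster in buckets.items():
        if len(cluster) >= MIN_CLUSTER:
            successes = sum(1 for m in cluster if m["outcome"] == "success")
            semantic.append({"goal": goal, "n": len(cluster),
                             "success_rate": successes / len(cluster)})
            for m in cluster:
                m["strength"] *= DECAY   # raw experience fades but survives
    return semantic

episodes = [{"goal": "deploy", "outcome": "success", "strength": 1.0} for _ in range(2)]
episodes.append({"goal": "deploy", "outcome": "failure", "strength": 1.0})
episodes.append({"goal": "review", "outcome": "success", "strength": 1.0})

semantic = consolidate(episodes)
assert semantic == [{"goal": "deploy", "n": 3, "success_rate": 2 / 3}]
assert episodes[0]["strength"] == 0.3   # decayed, not deleted
assert episodes[3]["strength"] == 1.0   # too few to consolidate yet
```

The schema tier repeats the same move one level up, clustering by embedding similarity instead of goal bucket.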
retrieval: hybrid search
Retrieval uses BM25 keyword search plus embedding similarity plus reciprocal rank fusion. This is the hybrid approach the research consistently recommends over any single retrieval method. BM25 catches exact matches. Embeddings catch paraphrases. RRF merges rankings without requiring score calibration across methods.
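Reciprocal rank fusion itself is a few lines: each ranked list contributes 1/(k + rank) per document, and scores are summed. A sketch with the conventional k = 60 (MinnsDB's constant is an assumption):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25  = ["alice-moved", "berlin-weather", "alice-job"]    # exact keyword matches
embed = ["alice-relocated", "alice-moved", "alice-job"]   # paraphrase matches

fused = rrf([bm25, embed])
assert fused[0] == "alice-moved"  # top of one list, second in the other, wins overall
assert set(fused) == {"alice-moved", "berlin-weather", "alice-job", "alice-relocated"}
```

Because only ranks matter, no score calibration between BM25 and cosine similarity is needed, which is the entire appeal of RRF.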
The natural language query pipeline goes further: question → LLM intent classification → graph projection (a dynamic walk, not a precomputed view) → multi-source ranking across claims, memories, and strategies.
concurrency model
Write operations flow through 2 to 8 write lanes, sharded by session_id % num_lanes. This preserves FIFO ordering per session while allowing parallel writes across sessions. Each lane is a bounded channel with backpressure: if full, the caller gets a 503 after 5 seconds. Per-lane metrics track in-flight counts, completions, rejections, and latency percentiles via a ring-buffer tracker.
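The sharding invariant is simple enough to sketch: the modulo routes every event for a session to the same bounded queue, so per-session order is preserved while different sessions land on different lanes. An illustrative Python version (lane count and capacity are examples, not the server's defaults):

```python
from collections import deque

NUM_LANES = 4
LANE_CAPACITY = 8  # bounded channel; overflow triggers backpressure (the server's 503)

lanes = [deque() for _ in range(NUM_LANES)]

def submit(session_id: int, event) -> bool:
    """Shard by session so each session stays FIFO while sessions run in parallel."""
    lane = lanes[session_id % NUM_LANES]
    if len(lane) >= LANE_CAPACITY:
        return False  # backpressure: caller must retry or give up
    lane.append((session_id, event))
    return True

for i in range(3):
    assert submit(session_id=7, event=f"e{i}")
assert submit(session_id=8, event="other")  # different session, different lane

lane7 = [e for sid, e in lanes[7 % NUM_LANES] if sid == 7]
assert lane7 == ["e0", "e1", "e2"]  # per-session FIFO preserved
```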
Read operations acquire permits from a read gate: a tokio semaphore with num_cpus * 2 permits. Each query holds a permit for its duration (RAII release on drop). Latency is tracked per-permit lifecycle.
Writes never block reads. Reads never block each other up to the permit limit. The graph sits behind Arc<RwLock<GraphInference>> with a strict lock ordering protocol that makes deadlocks structurally impossible.
storage
The knowledge graph uses SlotVec<GraphNode>, a dense arena giving O(1) node access by ID. Edges, outgoing adjacency lists, and incoming adjacency lists each have their own SlotVec. This layout keeps graph traversal cache-friendly.
Relational tables use 8KB slotted pages with blake3 checksums. Updates create new row versions and close old ones, so table storage is bi-temporal by default. Indexes are rebuilt in memory from persisted pages on startup.
Everything persists to a single embedded ReDB file with a 256MB page cache. The full state (graph, tables, episodes, memories, strategies, WASM modules, auth keys) lives in one file. Binary export and import for snapshots.
what ships in the binary
One Rust binary. One process. No sidecars. No plugins to assemble.
- Bi-temporal knowledge graph with OWL/RDFS ontology-driven edge behavior
- Relational table engine with bi-temporal row versioning
- MinnsQL parser, planner, graph executor, table executor
- Reactive subscription manager with incremental maintenance
- Episode detector with novelty-weighted significance scoring
- Three-tier memory consolidation (episodic, semantic, schema)
- Claims extraction via LLM cascade plus financial NER
- Hybrid search (BM25 + embeddings + reciprocal rank fusion)
- Natural language query pipeline
- WASM agent runtime with instruction metering, 64MB memory cap, permission system, cron scheduling
- API key authentication with group-scoped multi-tenancy
- 60+ HTTP endpoints, REST and WebSocket
get started
git clone https://github.com/minnsdb/minnsdb.git
cargo build --release -p minnsdb-server
./target/release/minnsdb-server
Ingest a state change:
curl -X POST http://localhost:3000/api/events/state-change \
-H "Content-Type: application/json" \
-d '{"entity": "Alice", "new_state": "Berlin", "old_state": "London", "category": "location"}'
Query the timeline:
curl -X POST http://localhost:3000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "MATCH (a:Person)-[r:lives_in]->(city) WHEN ALL WHERE a.name = \"Alice\" RETURN city.name, valid_from(r), valid_until(r)"}'
Subscribe over WebSocket:
wscat -c ws://localhost:3000/api/subscriptions/ws
> {"type": "subscribe", "query": "MATCH (a)-[r]->(b) RETURN a.name, type(r), b.name"}
The research is clear: the hard problems in agent memory are temporal. Formation, evolution, and consolidation over time. Not storage. Not retrieval. The lifecycle.
MinnsDB is the database that implements the lifecycle. Try it out at https://minns.ai/
