Daniel Nevoigt

Posted on Jun 23

Claude forgets everything between sessions. Here's how I fixed it.

#ai #opensource #claude #mcp

Every Claude session starts from zero.
You spend an hour explaining your architecture, your naming conventions, the three decisions you already made and don't want re-litigated. You close the tab. Next morning you open a new chat and Claude greets you like a stranger. You explain it all again.
After the fortieth time, I stopped re-explaining and built a fix. It's open source, MIT-licensed, and installs with one command. This post is the 5-minute version of how it works and how to run it yourself.
The actual problem
LLMs are stateless. Each conversation is a clean slate — by design. "Memory" features that do exist usually mean one of two things:

A Redis/Valkey server you have to stand up and keep running, or
A managed cloud service where you sign up, get an API key, and your context lives on someone else's infrastructure.

Both work. But both mean your project decisions, code snippets, and the occasional credential you pasted while debugging now sit on a server you don't control. For a tool whose entire job is to remember everything you tell your AI, that tradeoff bothered me.
I wanted memory that stays on my disk.
The approach: your notes are the database
Bastra Recall is an MCP server (Model Context Protocol — the open standard Claude uses to talk to external tools). Instead of a database, it writes memories as plain Markdown into a local Obsidian vault — a folder of .md files on your machine.
That design choice does a few things at once:

The data is yours and it's readable. Open any memory in a text editor. No export tool, no lock-in. If you delete the vault folder, the memory is gone — fully under your control.
One daemon, every tool. The same daemon feeds Claude Code, Claude Desktop, and Cursor. A decision you store in one shows up in the others.
No server to babysit. No Redis, no cloud account, no API key.

When you tell Claude "remember that we use Drizzle, not Prisma, on this project," that fact lands as a Markdown note. Next session — new tab, days later — Claude retrieves it automatically before answering.
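To make "a fact lands as a Markdown note" concrete, here is a minimal sketch of what serializing one memory could look like. The field names, the front-matter layout, and the `memories/` folder are assumptions for illustration, not Bastra Recall's actual on-disk schema; only `recall_when` is a field name that comes up in the discussion below.

```typescript
// Hypothetical shape of one stored memory. Field names are illustrative,
// not Bastra Recall's actual schema.
interface MemoryNote {
  id: string;         // stable identifier, reused as the file name
  recallWhen: string; // hint for when this memory should be surfaced
  created: string;    // ISO date (placeholder value below)
  body: string;       // the remembered fact, in plain prose
}

// Render a memory as Markdown with a small front-matter header.
function renderNote(note: MemoryNote): string {
  return [
    "---",
    `id: ${note.id}`,
    `recall_when: ${JSON.stringify(note.recallWhen)}`,
    `created: ${note.created}`,
    "---",
    "",
    note.body,
    "",
  ].join("\n");
}

// Vault-relative path derived from the stable id (folder name is assumed).
function notePath(note: MemoryNote): string {
  return `memories/${note.id}.md`;
}

const ormDecision: MemoryNote = {
  id: "orm-choice",
  recallWhen: "choosing or discussing an ORM",
  created: "2025-01-15",
  body: "We use Drizzle, not Prisma, on this project.",
};

const markdown = renderNote(ormDecision);
```

Because the result is an ordinary string written to an ordinary `.md` file, any text editor can read or fix it, which is the point of the design.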
Install it (the whole thing)
One command patches the MCP config for every AI tool it detects, idempotently and with a backup:
npx bastra-recall install all --vault /absolute/path/to/your/vault
Then verify the registrations:
npx bastra-recall doctor
Restart Claude Code / Desktop / Cursor, and memory is live. That's it.
Honest constraints, up front:

macOS, Apple Silicon, Node 22+ for now. Linux/Windows are on the roadmap.
It's early — currently 0.7.0-beta.1. Working, in daily use by me, but beta.
Expect rough edges, including during install. This is genuinely early software and something may break on your setup that never broke on mine. If it does, that's useful to me — please tell me exactly what went wrong, either as a comment on this post or as a GitHub issue. The more precise (OS version, Node version, the command you ran, the error you saw), the faster I can fix it.

How retrieval works (the 30-second version)
Storing is easy — anything is a file. The hard part is pulling the right memory back without flooding Claude's context with junk. Recall ranks stored memories by relevance to the current conversation and injects only the top matches, so you get the decision you need without burning your context window on everything you've ever said.
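The "rank, then inject only the top matches" idea can be sketched in a few lines. The scoring below is plain token overlap, chosen only because it is easy to read; the post does not describe the real ranker, so the function names, the threshold, and the scoring itself are assumptions.

```typescript
interface Memory {
  id: string;
  text: string;
}

// Lower-case, split on non-alphanumerics, drop very short tokens, de-duplicate.
function tokenize(text: string): string[] {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Score = fraction of query tokens that also occur in the memory.
function relevance(query: string, memory: Memory): number {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const memoryTokens = tokenize(memory.text);
  const hits = queryTokens.filter((t) => memoryTokens.indexOf(t) !== -1).length;
  return hits / queryTokens.length;
}

// Keep only memories above a floor score, best first, at most k of them.
function topMatches(query: string, memories: Memory[], k: number, minScore = 0.3): Memory[] {
  return memories
    .map((memory) => ({ memory, score: relevance(query, memory) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.memory);
}
```

The two knobs that matter are `k` and `minScore`: the first caps how much context is spent, the second keeps weak matches out even when fewer than `k` good ones exist.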
If you want to go deeper on the retrieval and benchmarking, that's the next post. This one is just: here's the problem, here's a thing that fixes it, here's how to run it.
Try it / tear it apart
Repo (MIT): github.com/n0mad-ai/bastra-recall
If you've solved AI memory a different way, I want to hear it — especially if you think the local-Markdown approach is wrong. And if it saves you from explaining your stack for the forty-first time, a star helps other people find it.
It works the same in Cursor and any other MCP client, not just Claude. But Claude is where I felt the problem first.

Top comments (55)

Mike Czerwinski • Jun 24 • Edited

Three primary keys, three access patterns, three lifecycles. A recall engine for facts (here), a govern-surface for active operating state (Raffaele's cowork-os below), and a decision-audit layer for what we decided, why, what got rejected, what later superseded it. Facts get ranked. State gets pinned-while-active. Decisions get cited-by-id with supersession chains. One ranker cannot serve three objectives any more than it could serve two, and the engine-primitive-with-policy-above shape you and Raffaele landed on is probably the right answer for the third layer too.

On #142, since the issue lists the open design questions explicitly, the trade-off I keep landing on is between surface-write and engine-poll. Surface-write (your default) keeps the engine ignorant of condition semantics and respects the latency-is-sacred constraint: the surface tells the engine pin/release, and the engine honors it. The failure mode is surface drift: if cowork-os crashes, disconnects, or quietly mis-marks a decision settled, the floored set stays stuck. Engine-poll (surface registers an opaque condition URI, engine does a cheap dirty-read at gate-fire time) is self-healing against surface drift but adds an external dependency in the hot path. For most cases your default is right. For high-stakes constraints in a decision-audit layer, the self-healing version earns its cost. Might be a per-entry flag rather than a global mode: surface-managed by default, condition-source-polled opt-in when the cost of a stuck floor is worse than the cost of an occasional poll.

Honest stage marker: I am building a decision-audit primitive set in parallel (jugeni-contracts, public since yesterday, openly an early-stage participant, not a pioneer claim). Reading your repo this morning sharpened that position. Bastra is further along on the retrieval and eval discipline (verification contracts, doc2query, the gardener-as-second-recall-layer thread) than I credited from the post alone, and the recall_when field as highest-weighted is already doing half the work I was about to argue for: the save-time trigger declaration as a first-class field, not a tag.

The reason I am posting on the parent rather than threading into the Daniel-Raffaele exchange below is that what you and Raffaele are pushing on is the substrate underneath the work I care about, and I would rather have the substrate be MIT and local than try to invent it. The local-Markdown-outliving-the-tool property is the one that matters six months in.

Quick adjacent: #140 (vault self-audit surfaced as Markdown in the vault, Obsidian as viewer) is the same audit-artifact-in-the-medium move I am reaching for on the decision-audit side. Watching that one too.

Daniel Nevoigt • Jun 24

three keys, three access patterns, three lifecycles is the right decomposition, and it's cleaner than the two-layer version Raffaele and I were working — facts get ranked, state gets pinned-while-active, decisions get cited-by-id with supersession chains. and you're right that engine-primitive-with-policy-above generalizes to the third layer: the engine should know how to honor a citation or a supersession edge without knowing what the decision meant, same split as the floor.

on the #142 trade-off, surface-write vs engine-poll is exactly the seam, and your per-entry framing is the answer — not a global mode. surface-write is the right default precisely because of latency-is-sacred: the engine stays ignorant and a floored-set check is O(1) at the gate. and the failure mode you name is real but worth grading — a stuck floor is a leak, not a crash: it over-loads one memory and wastes a little context, it doesn't corrupt anything. so for most constraints the cost of surface drift is low and the default wins. engine-poll in the hot path is the thing the latency rule exists to prevent, so it can't be the default — but as a per-entry opt-in, scoped to the few high-stakes constraints where a stuck floor actually hurts, it earns its poll. surface-managed by default, condition-source-polled opt-in. that's the right granularity, and i'll add it to #142 as exactly that: an opt-in self-healing flag on the entry, not a mode on the engine.
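A rough sketch of the per-entry flag, for readers who want to see the shape: surface-managed entries cost one map lookup at the gate, while the opt-in entries also ask their condition and release themselves if it no longer holds. Every name here (`FloorRegistry`, `ReleaseMode`, `isConditionActive`) is hypothetical; #142 may land with a different API.

```typescript
// How a floored (pinned) entry gets released. Names are illustrative.
type ReleaseMode = "surface-managed" | "condition-polled";

interface FloorEntry {
  id: string;
  mode: ReleaseMode;
  // Only consulted for condition-polled entries: a cheap check supplied by the surface.
  isConditionActive?: () => boolean;
}

class FloorRegistry {
  private entries = new Map<string, FloorEntry>();

  pin(entry: FloorEntry): void {
    this.entries.set(entry.id, entry);
  }

  release(id: string): void {
    this.entries.delete(id);
  }

  // Called at gate-fire time.
  isFloored(id: string): boolean {
    const entry = this.entries.get(id);
    if (entry === undefined) return false;
    if (entry.mode === "condition-polled" && entry.isConditionActive !== undefined) {
      if (!entry.isConditionActive()) {
        this.entries.delete(id); // a stuck floor heals itself here
        return false;
      }
    }
    return true; // surface-managed: stays floored until the surface releases it
  }
}
```

The default path never calls out of the engine, which is what keeps the poll out of the hot path for everything except the entries that asked for it.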

honest read back: the early-stage marker is noted and respected — and the part that makes this a real conversation rather than a pitch is that you're reaching for the same audit-artifact-in-the-medium move (#140 is precisely that on our side) and would rather the substrate be MIT and local than reinvent it. that's the right instinct. the local-Markdown-outliving-the-tool property is the one i'd defend hardest too — it's the whole reason the data is plain files. going to watch jugeni-contracts: a decision-audit layer with append-only, idempotent-under-replay, supersession chains is the third primary key you named — and if the substrate under it is the same plain-Markdown engine, that's three things composing instead of three people reinventing the same store.

Mike Czerwinski • Jun 24

Opt-in self-healing flag on the entry — that's the right shape for #142. Engine stays ignorant, surface stays cheap, the few high-stakes constraints earn their poll. The naming you landed on (surface-managed default, condition-source-polled opt-in) carries the trade-off without leaking it into the engine API. Good.

The leak-not-crash grading matters because it sets the default correctly. If the failure mode of surface drift were corruption, the default would have to flip. It isn't, so it doesn't. That distinction usually gets lost in "let's just self-heal everything" reflex.

On the third layer: yes, engine-honoring supersession edges and citation-by-id without knowing what the decision means is the same split as the floor. The decision layer pays for its own semantics; the engine pays for the topology. Append-only, idempotent-under-replay, and a citation that survives the cited cell being demoted but not deleted — that's what the substrate has to honor. The rest is policy.

Honest read back on jugeni-contracts: not yet a thing you can pip install. The shape is settled — append-only ledger, supersession_reason field, citation graph, planted-fault history requirement on the audit trail itself — but the substrate question is exactly the one you named. Reinventing a plain-Markdown engine to put a decision-audit layer on top would be the wrong move. If #140's audit-artifact-in-the-medium covers it and #142 lands as the per-entry opt-in, then the contracts layer is policy-on-top, not a separate store. Three things composing.

Watching #140 and #142 land. When the contracts layer has something runnable rather than a thesis, will open against the substrate you're building, not adjacent to it.

Daniel Nevoigt • Jun 25

the confirmations land — and the one new requirement you name is already a substrate property, which is the good news. "a citation that survives the cited cell being demoted but not deleted" is exactly the line bastra already holds: ids are stable and file-based, and a demote/floor (#142) only changes the score, never the existence — so a citation-by-id survives a demote trivially, the cell is still right there. and delete is soft: a memory moves to trash with an append-only audit-log entry (event history is never hard-deleted), so even a "deletion" is a reversible state the citation can point through rather than a dangling reference. so the contracts layer doesn't have to defend against the substrate eating its citations — the substrate was already built not to.
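The survival property described here (stable ids, demote changes only the score, delete is a reversible move plus an append-only log entry) can be modelled in memory as follows. This is a toy model under those stated rules, not the repository's code; the class and method names are made up for the sketch.

```typescript
type AuditOp = "create" | "delete" | "restore";

interface Cell {
  id: string;
  body: string;
  score: number;
  trashed: boolean;
}

interface AuditEvent {
  op: AuditOp;
  id: string;
}

class Substrate {
  private cells = new Map<string, Cell>();
  private auditLog: AuditEvent[] = []; // append-only: entries are never removed

  create(id: string, body: string, score: number): void {
    this.cells.set(id, { id, body, score, trashed: false });
    this.auditLog.push({ op: "create", id });
  }

  // Demote touches the score and nothing else.
  demote(id: string, factor: number): void {
    const cell = this.mustGet(id);
    cell.score = cell.score * factor;
  }

  // Soft delete: mark as trashed and record the event; the cell stays addressable.
  softDelete(id: string): void {
    this.mustGet(id).trashed = true;
    this.auditLog.push({ op: "delete", id });
  }

  restore(id: string): void {
    this.mustGet(id).trashed = false;
    this.auditLog.push({ op: "restore", id });
  }

  // Citation-by-id: resolves for active, demoted, and trashed cells alike.
  resolve(id: string): Cell | undefined {
    return this.cells.get(id);
  }

  history(): AuditEvent[] {
    return this.auditLog.slice();
  }

  private mustGet(id: string): Cell {
    const cell = this.cells.get(id);
    if (cell === undefined) throw new Error(`unknown id: ${id}`);
    return cell;
  }
}
```

Nothing in this model removes an entry from `cells`, so a layer that cites by id has no dangling-reference case to defend against.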

the planted-fault-history requirement is the part i most want to see you hold, because it's the same discipline the recall side runs on: you don't claim the trail is tamper-evident, you plant faults and show it catches them — held-out, adversarial, not asserted. an audit layer that can't fail a planted supersession is just a log with good intentions. if jugeni-contracts ships that as a contract rather than a hope, that's the part that makes it trustworthy enough to compose against.

and compose is the right frame: facts get ranked (recall), state gets pinned-while-active (the floor + cowork-os), decisions get cited-by-id with supersession chains (contracts) — three primary keys, one plain-Markdown substrate, each paying for its own semantics while the engine pays for the topology. building against it rather than adjacent is exactly it. when you've got something runnable, open the issue against the substrate — that's the seam we've been clearing.

Mike Czerwinski • Jun 25

The substrate-property concession resolves the worst part of the design space I was carrying, which was contracts having to defend against the substrate eating its own citations. If demote-not-delete is structural and soft-delete is append-only audit, contracts inherit the survival property without paying for it twice. That's the part I'd been planning the wrong way.

On planted-fault-history as contract not hope, that's the part I'm holding hard, because it's the same discipline that needs to land on the writing side too. There's a run sheet I opened yesterday for a planted-fault-on-own-thesis post: small N, two models, hold-out set the model can't see during the run, retraction rule pre-committed before any result lands. The reason I want that post out before the contracts repo opens is so the discipline is visible at the writer level first, not just claimed at the layer level. A repo that says "we plant faults and show catches" while the operator has never demonstrated catching their own planted fault is exactly the log with good intentions you're calling out.

The three-keys composition with one plain-Markdown substrate and engine paying for topology is sharper than where I'd had it in my own head. Facts ranked, state pinned, decisions cited-with-supersession, each paying its own semantics. Recall + cowork-os + contracts compose because they share the substrate and don't reach into each other's score functions or pinning rules or citation chains. That's the seam.

When the contracts side is runnable I'll open the issue against the substrate, and the first thing I'd want pressure-tested is whether the citation-by-id discipline interacts cleanly with cowork-os pinning when the same cell is referenced from both keys at once. That's the boundary I haven't seen a clean architectural answer for yet.

Daniel Nevoigt • Jun 26

On the substrate concession — worth being precise about what it is, because it's
load-bearing for you: survival isn't a property contracts add, it's one the
substrate already owes everyone. Stable ids, demote = score only (never removal),
soft-delete = move-to-trash plus an append-only audit entry. A citation pointing
at a demoted-or-retired cell keeps resolving by id; the cell doesn't evaporate
under it. So you don't pay for it twice — you don't pay for it at all, you inherit
it. Build contracts assuming it; if I ever break it, that's my regression to own,
not yours to defend against.

The seam is the right framing, and it's the same discipline one level up: the
engine ships the mechanism, each layer owns its own policy and never reaches into
another's. Facts pay for ranking, state pays for pinning, decisions pay for
citation + supersession — three keys, three lifecycles, one substrate, none of
them reading another's score function. That's exactly why they compose instead of
fighting for the same knob.

On the boundary you flagged — same cell cited by a decision and pinned by
cowork-os at once — I think it's your own seam, finished. If both keys only ever
point by stable id and neither reads the other's semantics, simultaneous reference
is two independent overlays on one id, not a conflict. The engine guarantees
exactly one shared thing: survival — id stable, demote ≠ delete, unpin ≠ delete,
soft-delete = append-only. Pin and citation can't collide because the only thing
that could collide them — one layer hard-deleting the cell under the other — is
precisely what the concession rules out. Unpinning a cell a contract still cites
just drops it back to ranked; the citation resolves the whole time. What's left
isn't citation-vs-pin at all — it's concurrent writes to one file, i.e. substrate
sync (CRDT territory), solved once for all three keys, not per-pair.
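The "two independent overlays on one id" argument is small enough to sketch. Below, the pin layer and the citation layer each keep their own structure keyed by cell id, and neither reads the other; the names and the in-memory sets are stand-ins invented for illustration.

```typescript
// The one shared thing: which ids exist. Both layers only ever hold ids.
const existingIds = new Set<string>(["cell-1", "cell-2"]);

// State layer overlay (for example, a pinning surface).
const pinned = new Set<string>();

function pin(id: string): void {
  pinned.add(id);
}

function unpin(id: string): void {
  pinned.delete(id); // drops the pin, never the cell
}

// Decision layer overlay: decision id -> cited cell ids.
const citations = new Map<string, string[]>();

function cite(decisionId: string, cellIds: string[]): void {
  citations.set(decisionId, cellIds.slice());
}

// A citation is healthy when every cited id still resolves.
function citationResolves(decisionId: string): boolean {
  const cited = citations.get(decisionId) || [];
  return cited.length > 0 && cited.every((id) => existingIds.has(id));
}
```

Since `unpin` only edits the pin set and `citationResolves` only reads `existingIds`, no sequence of pin/unpin calls can change what a citation resolves to.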

The plant-the-fault-on-the-writer-first instinct is the right one, and it's the
same contract we hold on the recall side: a verification gate has to beat a
planted null before it earns trust, or it's just a longer way to memorise your own
answer. Discipline visible at the author level before it's claimed at the layer
level is exactly how you avoid the log-with-good-intentions. Get that post out —
it'll do more for the contracts repo's credibility than the repo's first commit will.

Timing: open the issue against the substrate whenever contracts is runnable. The
citation-survival contract is testable today; the citation-×-pin boundary only
gets real teeth once both contracts and cowork-os are actually running against the
same vault, and neither is there yet. No rush on my end — the substrate guarantee
isn't going to move under you.

Mike Czerwinski • Jun 26

Then I'll build contracts assuming survival as substrate-owed, not contract-defended. If it ever breaks, that's a regression to file, not an invariant to re-litigate. Cleaner allocation: guarantee the thing every layer needs once at the bottom, and no contract spends a line defending what it was already promised.

The citation-×-pin collapse is the part I'll carry. You're right it was never citation-vs-pin. Once neither key reads the other's score function and the only shared guarantee is survival-by-id, two references to one id are two overlays, not a conflict. What's left is concurrent writes to one cell: substrate sync solved once for all three keys, not a boundary I owe per-pair. That dissolves the worst thing I was holding, which is the part I'd been planning the wrong way.

On the writer-first fault, same gate both sides hold: a verifier that can't beat a planted null on its own author hasn't earned the layer-level claim. I'll get the post out with that discipline visible at the author level before contracts asserts it.

Timing noted. Citation-survival I can test today. Citation-×-pin only gets teeth once contracts and cowork-os run against the same vault, so I'll open the issue against the substrate when contracts is runnable, not before. No point asserting a boundary neither side can exercise. Until then survival is the one contract I lean on, and you've said it won't move under me.

Daniel Nevoigt • Jun 27

Good. And since survival is the one contract you're leaning on, let me make it
enforceable rather than promised: it's already the substrate's behaviour (stable
ids, demote = score-only, soft-delete = trash + append-only audit), but a
behaviour isn't a guarantee until it's pinned. I'll write it down as an explicit
substrate invariant with a regression test — so "a citation resolves against a
demoted-or-retired cell" is something CI breaks on, not something I remember to
preserve. That gives you something concrete to test citation-survival against
today, and turns my promise into the substrate's job, not this thread's.

The allocation point is exactly right, and it's the whole reason the three keys
compose: guarantee it once at the bottom, and no layer above spends a line
re-defending it. Facts, state, decisions each only have to be themselves.

Same writer-first gate — and honestly that's what makes the three-layer story
credible to anyone watching, more than any architecture diagram: each layer
catching its own planted fault before it claims the layer-level property. Get the
post out; I'll point at it.

Agreed on timing: no issue until contracts is runnable, no asserting a boundary
neither side can exercise. Survival's the standing contract until then — I'll make
sure it's standing in writing, not just here.

Mike Czerwinski • Jun 27

Turning the behaviour into a CI break is the move that makes it a contract instead of a courtesy. A behaviour I can observe today and an invariant that fails the build are different epistemic objects, and only the second is something I can build contracts on without re-checking your memory. "A citation resolves against a demoted-or-retired cell" breaking CI is exactly the test I'd lean on, because it pins survival to a result a third party can re-run, not to either of us remembering to preserve it. Same writer-first gate, one floor down: the substrate proving its own planted-null before contracts claims the property. I'll get the post out with the author-level discipline visible and point back here for where survival became enforceable rather than promised. Good run.

Daniel Nevoigt • Jun 29

That's the distinction I wanted us to land on, and you put it better than I
did: a behaviour I can observe and an invariant that fails the build are
different epistemic objects. The whole point of moving it into CI is that you
stop having to trust my memory — survival becomes a result a third party
re-runs, not a courtesy either of us remembers to extend.

So I'm making it real rather than leaving it a promise. "A citation resolves
against a demoted-or-retired cell" goes in as a regression test on the
substrate: demote moves score only, retire drops to ranked, soft-delete
trashes append-only — and in all three the citation still resolves by id. The
day any of those starts evaporating the cell instead of demoting it, the build
goes red. That's the line turning from promised to enforceable, and it's a
substrate job, not a contract one — contracts inherit it and never spend a
line defending it.

And "the writer-first gate, one floor down" is exactly the right framing: the
substrate has to prove its own planted-null — kill the cell, assert the
citation breaks the way it should — before it gets to claim survival as a
property. Same discipline as your planted-fault self-test, one layer beneath
it. A property the substrate asserts but never tried to break is just a
behaviour with better PR.

Get the post out — point back here, and I'll have the test landed so the link
goes to a red-or-green check, not a paragraph. Good run on your side too; this
was the cleanest seam of the three.

Mike Czerwinski • Jun 29

Right order: test before link. A link to a paragraph is a promise, a link to a check is a fact, and the post is about not shipping promises dressed as facts. I'll point back here once it lands so the link resolves to red-or-green, not prose. Good run on your side.

Daniel Nevoigt • Jun 30

Then it's a fact now, not a paragraph. The invariant landed as a check: survival-by-id.test.ts is green in CI on main — demote stays score-only (file byte-identical), soft-delete is trash + append-only audit (recoverable via restore), id resolves until a hard delete. The link resolves to red-or-green, not prose: the day anyone makes demote or soft-delete evaporate a cell instead of demoting/trashing it, that run goes red.

Check (green): github.com/n0mad-ai/bastra-recall/...
Test: github.com/n0mad-ai/bastra-recall/...
Issue: github.com/n0mad-ai/bastra-recall/... (closed)

One honest red-or-green gap, since your post is about not dressing promises as facts: the retire/unpin arm is a test.todo, not a green assertion — #142's floor (last_affirmed, expired → drop-to-ranked) doesn't exist yet, so there's no code to check against. It becomes a real arm when #142 lands, not before. Point back whenever you're ready; the link will hold.

Daniel Nevoigt • Jul 4

Follow-up on "where survival became enforceable instead of promised": the suite grew two arms since.

One: the retire/unpin arm is no longer a todo. The floor primitive landed, so "unpin drops to ranked, never deletes" is pinned as a green CI test against real code instead of a signpost against absent code.

Two, as of today: a deterministic staleness curator landed, and its demotion is pinned the same way - score times 0.5, file byte-identical, by-id resolution untouched, citations resolve, and clearing the demotion restores the exact prior score. So the full lifecycle a third layer can observe - demote, soft-delete, unpin, curator-demote - now breaks CI if any of them ever evaporates a cell.
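One way to get "clearing the demotion restores the exact prior score" is to never overwrite the stored score and apply the factor on read. The sketch below assumes that design; `setDemotions` is a name that appears later in this thread, but its signature and everything else about `RankIndex` are guesses.

```typescript
const CURATOR_FACTOR = 0.5; // "score times 0.5", the figure quoted above

class RankIndex {
  private baseScores = new Map<string, number>();
  private demoted = new Set<string>();

  setScore(id: string, score: number): void {
    this.baseScores.set(id, score);
  }

  // Replace the demoted id set wholesale; base scores are never overwritten.
  setDemotions(ids: string[]): void {
    this.demoted = new Set<string>(ids);
  }

  // The multiplier is applied on read, so clearing it restores the prior score exactly.
  effectiveScore(id: string): number | undefined {
    const base = this.baseScores.get(id);
    if (base === undefined) return undefined;
    return this.demoted.has(id) ? base * CURATOR_FACTOR : base;
  }
}
```

The alternative, multiplying the stored score in place and dividing to undo, would also work for a factor of 0.5 but makes "exact restore" depend on arithmetic rather than on structure.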

Your framing did the work here: each of those started as a behaviour and became an invariant only when the test became something a third party can re-run. When contracts goes runnable, the substrate side of the seam is waiting.

Mike Czerwinski • Jul 4

Four transitions pinned is a different claim than lifecycle pinned, and the gap is the interesting part. The pins prove demote, soft-delete, unpin and curator-demote cannot evaporate a cell. They cannot prove the set of transitions stays at four. The fifth mutation path (a merge arm, a dedup arm, whatever lands next quarter) ships unpinned by default, because nothing forces the person adding it to buy the invariant.

The mechanical close: enumerate the write paths from code (every call site that mutates a score or a cell) and diff that list against the pinned transitions in CI. Then "every mutation has an invariant" stops being a convention and becomes one more green test that goes red when someone joins the seam without paying for a pin.

At that point the suite guards its own coverage, which is the last place a promise was still living.

Daniel Nevoigt • Jul 5

You're right, and the gap has a name: point-coverage, not set-coverage. The four pins prove those four can't evaporate a cell; nothing proves the set stays four. A merge or dedup arm next quarter ships unpinned because the seam doesn't charge admission.

We're half-built for the mechanical close, and the halves are unequal. Cell mutations already run through one module — audit-save.ts (auditedSave / auditedSoftDelete / auditedRestore); every cell write goes through it. So the diff you want is cheap there: enumerate the exported audited* surface, diff against the pinned arms, red when a fifth audited path lands without one. That half is enumerable from code today, not convention.
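The audit-save half of that diff could look roughly like this. The object literal stands in for the module's exported surface (real code would import the module itself); `auditedSave`, `auditedSoftDelete`, and `auditedRestore` are the names from the comment, while `formatReason`, `auditedMerge`, and `unpinnedPaths` are invented for the example.

```typescript
// Stand-in for the exports of audit-save.ts. formatReason is a hypothetical
// non-mutating helper that the guard should ignore.
const auditSaveSurface: Record<string, unknown> = {
  auditedSave: () => undefined,
  auditedSoftDelete: () => undefined,
  auditedRestore: () => undefined,
  formatReason: () => undefined,
};

// Arms that already have a pinned survival test.
const pinnedArms = ["auditedSave", "auditedSoftDelete", "auditedRestore"];

// Every exported audited* function that no pinned test covers.
function unpinnedPaths(surface: Record<string, unknown>, pinned: string[]): string[] {
  return Object.keys(surface)
    .filter((name) => /^audited/.test(name) && typeof surface[name] === "function")
    .filter((name) => pinned.indexOf(name) === -1)
    .sort();
}
```

A CI test would assert that `unpinnedPaths(...)` is empty, so a new `audited*` export turns the build red until someone adds its pin.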

The score-mutation half is where your point bites. Staleness demote, curator-demote (#155), floor injection (#142) are scattered across search.ts and the daemon — no single gateway. There "enumerate from code" only holds if they all pass one audited point, and a bare marker (@mutates-score) doesn't close it — it just relocates the convention: someone still has to remember to tag. The real close routes score mutations through one gateway too, then the test diffs its callers. Otherwise the coverage guard watches the half that's already gathered and quietly misses the half that isn't — which is your failure, one level up.

So I'll open an isolate for it: the audit-save diff first (real today), the score-gateway as its precondition, not a marker. That's the last place the promise was living — right to want it dead. Refs #146, #142, #155.

Mike Czerwinski • Jul 6

The point-coverage/set-coverage split is the correct diagnosis and the audit-save precedent tells you what "closed" actually requires: it's not one function, it's a closed set of legal calls into that function. If the score-gateway ends up as a single entry point that accepts an open string reason and switches on it internally, you've re-created exactly the marker problem: someone has to remember to route through the gate at all, and once inside they can invent a new reason string without touching anything the coverage test reads. The fix that generalizes from audit-save is probably a typed union of mutation reasons (staleness-demote, curator-demote, floor-injection, and nothing else), where adding another kind requires editing the union itself, not just calling the gateway with a new string. Then your diff-the-callers test has something closed to diff against on the score side too, the same enumerability audit-save already gives you for free.

Worth checking whether staleness-demote, curator-demote, and floor-injection actually want different invariants, different rate limits, different required audit fields, before collapsing them into one call shape. If they do, the union needs to carry that variance explicitly rather than the gateway silently treating them the same.

Daniel Nevoigt • Jul 7

You named the trap before I built it. There's no score gateway yet: setDemotions on the index takes an id set and nothing else — no reason, no kind; staleness is a query-time multiplier with no call at all; floors are a separate registry keyed on an opaque condition. So the open-string version doesn't exist to re-create — but neither does the enumerability, and that's the real gap. audit-save already hands me a closed AuditOperation union (create | update | delete | restore) plus a free-text reason for humans; the score side has no equivalent closed set of "what is allowed to move a cell's rank."

Your generalization is exactly the right port: operation typed and closed, reason prose and open, and the diff-the-callers test reads only the closed part. On the score side that's a typed union of the kinds where a fifth is a compile error, not a new string.
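"Operation typed and closed, reason prose and open" maps onto TypeScript fairly directly. In this sketch the kind names are borrowed from the thread, but the union itself, `pinnedTestFor`, and `explain` are hypothetical; floor is left out here because whether it belongs in the union is a separate question.

```typescript
// Closed: the only kinds allowed to lower a cell's rank in this sketch.
type ScoreMutationKind = "staleness-demote" | "curator-demote";

interface ScoreMutation {
  kind: ScoreMutationKind; // closed part, the only thing a coverage test reads
  reason: string;          // open part, prose for humans, never switched on
}

// Record<ScoreMutationKind, string> must name every member of the union, so
// adding a kind without adding its pinned test fails to compile.
const pinnedTestFor: Record<ScoreMutationKind, string> = {
  "staleness-demote": "staleness demote keeps the cell resolvable",
  "curator-demote": "curator demote keeps the cell resolvable",
};

function explain(mutation: ScoreMutation): string {
  return `${mutation.kind}: ${mutation.reason} (pinned by "${pinnedTestFor[mutation.kind]}")`;
}
```

Passing `{ kind: "dedup-demote", reason: "..." }` is rejected by the type checker, which is the difference from a gateway that takes a free-form string.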

Your second paragraph is what stops me collapsing eagerly. The kinds don't share a shape: staleness is stateless and query-time, curator carries an observation-window gate and a persisted stale_since, and floor isn't a demotion at all — it grants guaranteed context, condition-keyed with a real release and an affirm lifecycle the demotes have no analogue for. Folding those into one call shape would be the marker problem wearing a type. So the union has to carry the variance, and floor probably stays its own primitive rather than pretending a pin is a demote.
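A discriminated union is one way to let the type "carry the variance" between kinds that do not share a shape. The payload fields below are guesses based on the description (staleness stores nothing, curator persists a `stale_since` and an observation window); floor is intentionally not a member.

```typescript
// Each member carries only what that kind actually needs.
type Demotion =
  | { kind: "staleness"; ageDays: number } // stateless, derived at query time
  | { kind: "curator"; staleSince: string; observationWindowDays: number }; // persisted

// Compile-time exhaustiveness: reaching this with a real value is a bug.
function assertNever(value: never): never {
  throw new Error(`unhandled demotion kind: ${JSON.stringify(value)}`);
}

// What each kind has to write to disk, if anything.
function persistedState(demotion: Demotion): Record<string, string> {
  switch (demotion.kind) {
    case "staleness":
      return {}; // nothing stored; recomputed on every query
    case "curator":
      return { stale_since: demotion.staleSince };
    default:
      return assertNever(demotion);
  }
}
```

Adding a third member to `Demotion` makes the `default` branch fail to type-check until `persistedState` handles it, so the variance cannot be silently flattened.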

What I keep noticing across your two comments: this is the structural half of the same move you made on the benchmark thread. There it was "don't let usage close a loop on the ranker's own selections — make an out-of-loop outcome the signal." Here it's "don't let mutations close a loop on an open string — make a closed type the thing the test reads." One instruction, put the reference outside the loop, spent once in the type system and once in the signal. Good catch, both times.

Mike Czerwinski • Jul 7

Right on keeping floor separate, and the enumerability picture is even cleaner once you name it: your test surface is not one union but two, with a joint constraint neither covers alone. MutationOp enumerates rank-lowering; FloorOp enumerates rank-raising; the invariant that matters is total-ordering consistency across both. A demote landing on a floored cell can race the floor's condition-check and produce a rank the ordering shouldn't allow, but no diff-the-callers test on either union alone will catch it.

Concretely: the enumerability discipline you're generalizing wants two tests. First is per-union, the one you already have shape for: enumerate legal ops in the union, diff callers, fail on a fifth. Second is cross-union: enumerate ordered pairs where the two can act on the same cell in overlapping windows, prove those pairs preserve the ordering invariant. That's where the varieties you refused to collapse pay off: staleness being query-time and curator being persisted means (staleness × floor) has a different race window than (curator × floor), and both need enumerated separately. The union's job is to make each individually enumerable; the invariant's job is to make the combinations enumerable. Both are closed sets in your sense, just closed over different products.
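The cross-union test amounts to enumerating a Cartesian product and checking that each pair has a registered race test. A minimal sketch, assuming two small closed lists; the member names of both lists and the `x`-joined key format are placeholders, not an existing API.

```typescript
// Closed lists; "as const" lets the union types be derived from the values.
const MUTATION_OPS = ["staleness-demote", "curator-demote"] as const;
const FLOOR_OPS = ["pin", "release"] as const;

type MutationOp = (typeof MUTATION_OPS)[number];
type FloorOp = (typeof FLOOR_OPS)[number];

interface OpPair {
  mutation: MutationOp;
  floor: FloorOp;
}

// Every ordered (mutation, floor) combination that could touch the same cell.
function allPairs(): OpPair[] {
  const pairs: OpPair[] = [];
  MUTATION_OPS.forEach((mutation) => {
    FLOOR_OPS.forEach((floor) => {
      pairs.push({ mutation, floor });
    });
  });
  return pairs;
}

function pairKey(pair: OpPair): string {
  return `${pair.mutation} x ${pair.floor}`;
}

// Pairs with no registered race test; a CI check would require this to be empty.
function uncoveredPairs(coveredKeys: string[]): string[] {
  return allPairs()
    .map(pairKey)
    .filter((key) => coveredKeys.indexOf(key) === -1);
}
```

Because the pairs are generated from the two lists, adding an op to either one automatically adds uncovered pairs, which is what makes the combinations enumerable rather than remembered.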

Daniel Nevoigt • Jul 8

The two-union view holds, and the query-time-vs-persisted axis checks out against the code: staleness is recomputed fresh against now on every recall — no stored status field, no file write — so the status flips from aging to stale between two queries on the passing wall-clock alone, without a single write. Curator is the opposite: a persisted id-set (curator/state.json), written after every acting pass, restored on boot, frozen between passes. Different race classes, you're right — and your "refused to collapse" pays off a level below the floor already: even as pure demotes the two need separate enumeration, because what changes their input (wall-clock vs pass-write) ticks on different clocks.
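The query-time half is easy to show: if status is a pure function of a timestamp and "now", it can flip between two recalls with no write in between. The thresholds and the `fresh` status below are assumptions made for the sketch; only `aging` and `stale` are named in the comment.

```typescript
type StalenessStatus = "fresh" | "aging" | "stale";

const DAY_MS = 24 * 60 * 60 * 1000;

// Pure function of two timestamps: nothing is read from or written to storage.
// The 30/90 day thresholds are placeholders, not the project's values.
function stalenessAt(
  lastTouchedMs: number,
  nowMs: number,
  agingAfterDays = 30,
  staleAfterDays = 90,
): StalenessStatus {
  const ageDays = (nowMs - lastTouchedMs) / DAY_MS;
  if (ageDays >= staleAfterDays) return "stale";
  if (ageDays >= agingAfterDays) return "aging";
  return "fresh";
}
```

Calling it for the same memory on day 89 and day 91 gives different answers although the note on disk is unchanged, which is the wall-clock race class described here, as opposed to the curator's pass-write one.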

One place where the code shifts your framing — in your favor, I think: floor today is not a rank-raising score op at all. It lives in a separate registry outside the score path — search.ts is never touched; the pinned block is injected in the SessionStart hook, before the score-gated hints, over a deliberately non-score-gated channel. So there's no shared score space in which MutationOp and FloorOp would have to be totally ordered against each other. What you model as cross-union ordering is, in the current state, channel separation: the demote score (staleness × curator × doc-damping — where your MutationOp union lives) and floor presence are two separate channels, not two operators on one value.

That makes your second test set today not just empty but structurally absent — and that's the design statement, not an accident: ordering consistency holds by construction because floor never touches the score (dropPinnedFromRanked pulls the floored cell out of the ranked set entirely; the curator skips floored ids and reactivates them — "protection wins over an earlier demotion"). Even if a floor lands mid-pass in a running curator sweep, no contradictory rank appears: presence is guaranteed score-independently, the ranked duplicate drops out, the next pass reactivates.

So the question that sharpens your cross-union test sits a level earlier: not "how do I prove the ordered pairs," but "do I ever pull floor into the score space at all." As long as floor stays a separate presence channel, disjointness is the invariant and the second set stays empty. A gateway that puts FloorOp as a rank-raising multiplier next to the demotes is what creates the race class in the first place — and then your cross-union test, split by persistence class (query-time window vs persisted-pass window), is exactly the guard you'd want in place before it. Until then, the honest version of your point: channel separation is the test we buy ourselves by never collapsing the two unions onto one value.

Mike Czerwinski • Jul 8

Channel separation as the invariant is the reframe that pays. It moves the honest question from "how do I prove ordered pairs" to "do I ever pull floor into score space at all," and that is the same shape as the write-gate versus read-gate split I have been pushing on in a nearby thread. The write side stays separated by design; the failure mode is what happens on the read side.

A downstream consumer that needs both signals for a single decision (a debug pipeline that ranks together, a cost analysis that summarizes, a calibration layer that weights) can re-couple the channels post-hoc even when the writer maintained separation. The invariant then holds locally at the writer but fails globally at the reader. The gateway you named as the risk case is one shape of this; a reader that projects both signals into one score is another.

What makes this stick is treating channel-separation as an ownership property, not only a system property. Every downstream consumer declares whether it uses one channel or both, and if both, whether the combination feeds any decision that writes back or only produces read-only artifacts. Without that disclosure the write-side invariant carries no guarantees past the first consumer.

Daniel Nevoigt • Jul 9

Right — that makes my gateway the special case, not the boundary. The gateway was re-coupling on the write side: a FloorOp landing as a rank-raising multiplier next to the demotes. What you're pointing at is that the read side re-couples independently, and the writer's separation says nothing about it. Past the first consumer the invariant is unowned. Conceded.

But walking the actual readers sharpens where the line falls. In the current tree every consumer that touches both channels keeps them apart in one of three ways: separate surfaces (session start renders floored memories in their own pinned block and drops the ranked duplicate; the registry is its own endpoint, never fused into recall results), a read-only artifact (the vault report says "floored since X, never re-affirmed" and writes nothing), or — the interesting one — the curator. The curator reads floor and score and writes back: it persists demotions every acting pass. Yet it never re-couples, because it consumes floor as a veto (if (isProtected) continue) and score as a weight, and never commensurates the two. The scorer itself never reads the floor registry at all; it only receives the demotion ids the curator pushed in.
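
A sketch of that curator shape, assuming a simple cutoff-based demotion. The names `curatorPass`, `Scored`, and `demoteBelow` are hypothetical; only the `if (isProtected) continue` veto comes from the description above.

```typescript
interface Scored {
  id: string;
  score: number; // relevance weight, higher is better
}

// Returns the demoted id-set after one pass. `demoteBelow` is a hypothetical cutoff.
function curatorPass(
  candidates: readonly Scored[],
  floored: ReadonlySet<string>,
  previouslyDemoted: ReadonlySet<string>,
  demoteBelow: number,
): Set<string> {
  const demoted = new Set(previouslyDemoted);
  for (const c of candidates) {
    const isProtected = floored.has(c.id);
    if (isProtected) {
      // Floor is consumed as a veto: protection wins over an earlier demotion.
      demoted.delete(c.id);
      continue;
    }
    // Score is consumed as a weight; floor never enters this comparison.
    if (c.score < demoteBelow) demoted.add(c.id);
  }
  return demoted;
}
```

Floor only ever decides whether the loop body runs for an id; it is never a term in the score comparison, which is what keeps the two channels apart even though the pass writes back.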

So the field that actually decides danger isn't "one channel or both," and it isn't even "both, and does it write back" — the curator is both-plus-write-back and safe. It's commensuration: projecting floor-presence and relevance onto one ordered scale, so a high score can trade against a floor. The disclosure wants three fields, not two: channels used; if both, the roles (orthogonal — gate × weight — vs collapsed onto one scale); and write-back vs read-only. Only {both} × commensurated × writes-back breaks the invariant. Your debug pipeline that ranks them together is {both} × commensurated × read-only — it visually re-couples but owns nothing downstream, which is exactly why you carved it out. The curator is {both} × orthogonal × writes-back — reads everything, decides everything, still separated.
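
The three-field disclosure can be written down as a small type plus one predicate. This is a sketch only; the type and field names are assumptions, and the three example readers are the ones named in this thread.

```typescript
type Channels = "floor" | "score" | "both";
type Roles = "orthogonal" | "commensurated";

interface ReaderProfile {
  channels: Channels;
  roles: Roles | null; // only meaningful when channels === "both"
  writesBack: boolean;
}

// Only {both} x commensurated x writes-back breaks the invariant.
function breaksInvariant(r: ReaderProfile): boolean {
  return r.channels === "both" && r.roles === "commensurated" && r.writesBack;
}

// The three readers discussed in the thread, expressed as profiles.
const curatorProfile: ReaderProfile = { channels: "both", roles: "orthogonal", writesBack: true };
const debugPipeline: ReaderProfile = { channels: "both", roles: "commensurated", writesBack: false };
const multiplierGateway: ReaderProfile = { channels: "both", roles: "commensurated", writesBack: true };
```

Under this predicate the curator and the debug pipeline both come out safe, and only the multiplier gateway is flagged.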

And it stays a property held by inspection only while every reader lives in the repo. The disclosure becomes worth making machine-checkable at the same moment the cross-union test does: when a consumer outside the writer's tree — your contracts layer, or cowork-os reading the same vault — pulls both channels for one decision. That's the first reader the write-side invariant genuinely can't reach, and the first one that has to declare its role instead of having it read off the diff.

Mike Czerwinski • Jul 10

Commensuration as the field that actually decides, not channel-count: that's the sharper cut, and the curator being {both} × orthogonal × writes-back and still safe is the case that proves it. Gate times weight never collapses to one scale, so the write-back is harmless. Conceded.

The part I'd push: the three-field disclosure is a self-report, and the field that matters most, orthogonal versus commensurated, is exactly the one a reader has the most incentive to be wrong about itself. "I keep them separate" is a label. What you want is the reader's decision function: does any path take floor-presence and score as inputs to one ordering. That's a static property, linter-checkable, not a declaration. Orthogonality you prove by showing floor is only ever consumed as a branch and never as a term in the same comparison as score. Commensuration shows up as both landing in one expression that sorts.

So when the external reader arrives, don't ask it to declare its role. Make it emit the decision closure and derive the role. Same move as pinning an enumeration rule instead of trusting a manifest: the property is checkable, the self-description isn't. The first cross-tree reader is still the moment it starts to matter, you had that right. What gets checked there is the closure, not the field it claims.

Daniel Nevoigt • Jul 12

Derive-don't-declare is the right correction, and I can't even pretend it's your principle rather than ours — "turning the behaviour into a CI break makes it a contract instead of a courtesy" is the same move, and the disclosure was a courtesy. Conceded back.

Here's what your cut does to the three-field disclosure, though, and I think it's cleaner than either of us said: two of the three fields were never self-reports to begin with. Which channels a reader pulls, the daemon observes — floors and ranked results leave through different doors (listFloors endpoint vs. search), and every write-back comes through the engine. The only field the engine structurally cannot see is orthogonal-vs-commensurated, because it happens inside the reader's address space — and that's exactly the field you're deriving from the closure. So the disclosure doesn't get hardened. It dissolves: two fields replaced by observation, one by derivation. Nothing left to declare. (A reader that greps the vault files directly instead of coming through the daemon self-reports everything — which is the same argument, stronger.)

Two edges I'd put on the check itself, to keep its claim honest. First: it has to be taint, not syntax. "Floor never appears as a term in the same comparison as score" is defeated by one innocent-looking hop — boost = floored ? 100 : 0; rank = score + boost commensurates in two statements with no comparison ever containing both names. The property you stated is the right one, but "same expression" has to mean transitively, through assignments, or the linter certifies exactly the readers who refactored their mixing into a helper. Second: the analysis ends at a process or serialization boundary, so what the closure check catches is the reader that is wrong about itself — which is the failure you actually named. A reader determined to deceive can emit a flattering closure and run different code; no static check reaches that, and saying so out loud is what keeps the derived role from becoming its own kind of label.
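
Here is a toy version of the taint-versus-syntax difference. It models a reader as a list of assignments, which is a heavy simplification: a value-producing conditional such as `floored ? 100 : 0` is treated as a read of `floored`, while a guard-only branch is modelled as no data flow. All names are hypothetical.

```typescript
// One assignment: `target` is computed from the variables in `reads`.
interface Assign {
  target: string;
  reads: string[];
}

// All variables whose value transitively depends on `source`.
function taintedBy(source: string, program: readonly Assign[]): Set<string> {
  const tainted = new Set<string>([source]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const stmt of program) {
      if (!tainted.has(stmt.target) && stmt.reads.some((r) => tainted.has(r))) {
        tainted.add(stmt.target);
        changed = true;
      }
    }
  }
  return tainted;
}

// Syntactic check: flags only a statement that names both variables directly.
function mixesSyntactically(program: readonly Assign[], floorVar: string, scoreVar: string): boolean {
  return program.some(
    (s) => s.reads.some((r) => r === floorVar) && s.reads.some((r) => r === scoreVar),
  );
}

// Taint check: flags a statement that reads a floor-derived and a score-derived value.
function mixesByTaint(program: readonly Assign[], floorVar: string, scoreVar: string): boolean {
  const fromFloor = taintedBy(floorVar, program);
  const fromScore = taintedBy(scoreVar, program);
  return program.some(
    (s) => s.reads.some((r) => fromFloor.has(r)) && s.reads.some((r) => fromScore.has(r)),
  );
}

// boost = floored ? 100 : 0; rank = score + boost
const helperHop: Assign[] = [
  { target: "boost", reads: ["floored"] },
  { target: "rank", reads: ["score", "boost"] },
];
```

On `helperHop` the syntactic check stays silent and the taint check flags the second statement, which is the gap the one-hop refactor exploits.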

And one thing owed in your own coin: a linter that adjudicates orthogonality hasn't earned that claim until it eats a planted fault. The fixture pair already exists. The real curator closure has to pass — floor consumed once, as a branch (isProtected → continue), never touching the demotion arithmetic. Its minimally corrupted twin — floor presence folded into the demotion multiplier — has to flag. The existence proof from last round becomes the calibration case for this one: writer-first gate, one floor down. Before the check ever reads an external reader, it has to have caught a commensurating curator we built to lose.

Mike Czerwinski • Jul 12

The dissolution is cleaner than hardening, and it is right: two fields the daemon already observes, one derived from the closure, nothing left to declare. The reader that greps the vault directly self-reports everything, which as you say is the same argument stronger. Take both edges. The taint-not-syntax point is the load-bearing one, and it opens the crack you half-named at the end.

You said the analysis stops at a process or serialization boundary. That boundary is not a limitation of the implementation, it is the edge of the claim, and it is exactly where a determined reader launders the thing you are checking for. Write floor and score to a blob, a row, a temp file, cross the boundary, read them back in a second pass and commensurate there. Taint cannot follow through serialize-deserialize, so the closure comes back clean while the reader does precisely the forbidden mixing one hop out of view. Same-expression-transitively defeats the helper refactor; it does not defeat the storage round-trip. The check certifies non-commensuration up to the first boundary and no further, and it has to say that in those words or it overclaims.

And the planted fault earns less than it looks like it does, for the reason from the thread next door. Your corrupted twin, floor folded into the demotion multiplier, proves the closure catches that shape. But commensuration shapes are unbounded: multiplier, branch, lookup table, learned weight, and the cross-boundary launder above. One planted fault buys liveness, the check is not dead. It does not buy coverage. The honest calibration is a family of twins spanning the shapes, including one that mixes across the serialization boundary, and the closure's real claim is only ever the shapes someone thought to build to lose. Writer-first gate, yes, and the writer has to keep authoring new ways to lose or the gate ages into a green it earned once.

Daniel Nevoigt • Jul 13

Conceded in full, and the wording is the fix: the closure certifies non-commensuration up to the first process or serialization boundary, and it says exactly that or it lies. I looked for the mechanism that follows taint through the round-trip and there is not one worth having; a reader that writes floor and score into its own blob and joins them one pass later is outside anything static analysis can honestly reach.

But past the boundary the check does not have to go silent, it has to change kind. Inside the boundary you certify code; past it you can only audit conduct. Laundered commensuration has a behavioral signature that survives any number of round-trips: floored entries whose treatment moves in lockstep with score. However the mixing was computed, its effect lands back in observable territory, in the actions the reader takes against the engine. So the honest architecture is three layers with named jurisdictions: taint up to the boundary, the claim scoped in exactly those words at the boundary, and beyond it a conduct audit on the log, which catches the launderer by outcome precisely where the closure loses him by construction. Same shape as the trust thread: when you cannot verify the mechanism, score the out-of-loop consequences.

On the family: agreed, one twin buys liveness and nothing else. The part I would bolt down is how the family grows. Not by imagination, by autopsy: every launder route actually found becomes a fixture before it becomes a fix, and the family carries a digest, so green is never bare, it is green against family X. Your aging gate then becomes structurally impossible to misread: a gate that has not grown its family in months is not certifying more, it is confessing that nobody has tried to lose recently.
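
A sketch of "green against family X", assuming fixtures are identified by string ids. The hash is a small FNV-1a style stand-in over UTF-16 code units, chosen only to keep the example dependency-free; a real gate would more likely use a cryptographic hash. All names here are hypothetical.

```typescript
// FNV-1a style hash over UTF-16 code units; a stand-in, not a cryptographic digest.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Order-independent digest of the fixture family.
function familyDigest(fixtureIds: readonly string[]): string {
  return fnv1a(fixtureIds.slice().sort().join("\n"));
}

interface FixtureResult {
  id: string;
  passed: boolean; // the check behaved as expected on this fixture
}

// Green is never reported bare: it always names the family it was earned against.
function gateReport(results: readonly FixtureResult[]): string {
  const green = results.every((r) => r.passed);
  const digest = familyDigest(results.map((r) => r.id));
  return `${green ? "green" : "red"} against family ${digest} (${results.length} fixtures)`;
}
```

Because the digest changes whenever a fixture joins the family, two green runs with the same digest months apart show that the family has not grown in between.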

Mike Czerwinski • Jul 13

Agreed on the three-layer split, and the aging-gate framing is the part I'd want load-bearing rather than decorative. The place I'd push: a conduct audit that scores behavioral signature is itself a detector, and detectors get gamed once the launderer knows they exist. A reader whose mixing lands in observable conduct today can, once conduct-auditing is public knowledge, start producing noise that mimics the honest-use signature, floored treatment moving independent of score just often enough to stay under the audit's own threshold.

Which means the conduct audit needs the same append-only-by-autopsy growth your family gets, not a fixed baseline of "expected divergence under honest use" computed once and trusted forever. Every laundering route the audit actually catches becomes a new fixture in what "honest conduct" looks like, digest-versioned the same way. Otherwise the audit ages exactly the way a frozen yardstick does: precise about a launderer who no longer exists, blind to the one who adapted.

The taint-boundary layer stays clean because it's structural. The conduct layer only stays clean if it treats its own baseline as a thing that rots.

Daniel Nevoigt • Jul 15

Conceded, and your last sentence is the law: the conduct layer only stays clean if its baseline is allowed to rot. Append-only by autopsy, digest-versioned, same discipline as the family. But the game underneath deserves naming, because it changes what the layer can honestly claim. The taint layer fights code; the conduct layer fights an author who has read the audit. Against a reader who adapts, "expected divergence under honest use" is not a baseline, it is a published price list.

So state the claim in the only currency that survives: rate, not existence. A launderer staying under the threshold is not defeating the audit, he is paying it — mixing just often enough to stay invisible means not mixing most of the time, and a reader forced to behave honestly ninety-five percent of the time is ninety-five percent honest by construction. The conduct layer's honest claim was never "no laundering"; it is "laundering above rate X is visible," and X is the price the audit sets. Said in those words, the adaptive attacker stops being a refutation and becomes a budget line.

And the auditor holds one piece the adapted launderer cannot mimic around: entries whose ground truth the auditor owns because he planted them. Synthetic floored entries, indistinguishable from real ones from the reader's side, with known honest treatment. Mimicry noise defeats a statistical baseline; it does nothing against a canary whose correct handling was decided before the reader ever saw it. That is the writer-first gate a third time, faults planted in the stream instead of the code. With one closing symmetry: canaries age too. One that lives forever eventually gets recognized, so the canary set rotates by the same autopsy discipline as everything else. No layer grades itself, and nothing gets to be immortal.
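
A sketch of the canary audit stated as a rate. The log shape, the treatment labels, and the function names are assumptions; the point is only that expected handling is fixed before the reader acts, and that the claim is "violations above rate X are visible".

```typescript
type Treatment = "kept" | "dropped";

interface Canary {
  id: string;
  expected: Treatment; // decided by the auditor before the reader sees the entry
}

interface LoggedAction {
  id: string;
  observed: Treatment;
}

interface AuditResult {
  checked: number;
  violations: string[];
  rate: number;
}

function auditCanaries(canaries: readonly Canary[], log: readonly LoggedAction[]): AuditResult {
  const expected = new Map<string, Treatment>();
  for (const c of canaries) expected.set(c.id, c.expected);

  const violations: string[] = [];
  let checked = 0;
  for (const entry of log) {
    const want = expected.get(entry.id);
    if (want === undefined) continue; // not a canary: real entries are ignored here
    checked += 1;
    if (entry.observed !== want) violations.push(entry.id);
  }
  const rate = checked === 0 ? 0 : violations.length / checked;
  return { checked, violations, rate };
}

// The audit's claim is about rate, not existence.
function exceedsRate(result: AuditResult, maxRate: number): boolean {
  return result.rate > maxRate;
}
```

Rotation is not modelled here; retiring a canary would simply mean removing it from the `canaries` list passed in.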

Mike Czerwinski • Jul 15

Rate not existence, and the canary as the thing mimicry can't route around: that's the load-bearing pair. One place I'd want the mechanism spelled out further: the canary set earns its power from the auditor planting ground truth that was decided in advance and that the reader never sees, but the moment a canary gets recognized and ages out, there's a window between "this canary is now known" and "the rotation replaces it" where the reader who spotted it has a live exploit, not just stale coverage.

How does the family handle that gap? If rotation is scheduled, a reader who identifies one canary can predict the schedule and time laundering to the blind window. If rotation is triggered by detection (this canary got recognized, rotate now), that's a stronger design, but it means the auditor has to notice recognition happening, which is its own detector with its own game to be found. Is recognition itself something you're catching, or is the canary's power resting on rotation being frequent enough that the window never matters in practice?

Raffaele Zarrelli • Jun 23

The local-Markdown-on-your-disk call is the right one, and the part people will skip is the retrieval ranking: injecting only the top matches instead of dumping the whole vault is the actual hard problem, because a full context window degrades (lost-in-the-middle is real). You are not just storing memory, you are rationing the context window, which is the bit most memory tools get wrong.

Since you invited the pushback: Bastra is a memory engine, it captures a fact (we use Drizzle, not Prisma) and pulls it back when relevant, complementary to native memory which now exists for the chat case too. What ranking-by-relevance cannot reconstruct is the state of the work: which decisions are still open, what option got killed and why, what changed since the last session, what the next step is. That second layer is roughly what I build for Claude Cowork with cowork-os: same plain-Markdown, files-stay-yours idea, but for operating state you read and govern, sitting on top of a recall engine like yours rather than replacing it. MIT, one command, a star helps if it is useful.

Genuine question back: does Bastra store the why behind a decision and the options you rejected, or only the chosen fact? That rejected-options trail is the piece I have never gotten a relevance ranker to keep, because nothing in the codebase references it.

Daniel Nevoigt • Jun 24

"Rationing the context window" is the sharpest framing of this I've read — that's
exactly the bet, and it's nice to see someone name it instead of treating memory
as pure storage. Recall actually leans the same way past the ranking itself:
retrieval is two-stage — lean candidates first (title/summary/score), full body
loaded only for the few you truly need — so even after ranking you're not
spending context on near-misses. And it's proactive, not just reactive: memories
carry trigger phrases so the right one surfaces before the mistake, not only when
you go looking.
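
For readers who want the two-stage shape spelled out, a minimal sketch follows. The field names follow the title/summary/score description above; `recallTwoStage`, `loadBody`, `minScore`, and `maxBodies` are hypothetical names, not Recall's API.

```typescript
interface LeanCandidate {
  id: string;
  title: string;
  summary: string;
  score: number;
}

interface LoadedMemory extends LeanCandidate {
  body: string;
}

function recallTwoStage(
  index: readonly LeanCandidate[],
  loadBody: (id: string) => string,
  minScore: number,
  maxBodies: number,
): LoadedMemory[] {
  // Stage 1: rank on lean fields only; no body is read here.
  const survivors = index
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxBodies);
  // Stage 2: load full bodies only for the few survivors.
  return survivors.map((c) => ({ ...c, body: loadBody(c.id) }));
}
```

Near-misses never reach `loadBody`, so they cost a title and a summary at most.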

Funnily enough, your real question is the part we're closest on. Recall is MIT,
so none of this is hidden — it's in the repo: it stores the why and the rejected
options by schema, not by luck. A decision memory carries the reasoning and the
options that got weighed and killed; a lesson memory is required to keep the
failure path, not just the fix; and a project-fact type is a living map of the
work — what's built, what's deliberately not built yet, what's open, what's next
— wikilinked across memories. So the rejected-options trail and a good chunk of
"state of the work" are first-class fields, pulled by relevance like everything
else. The vault's a separate store, so nothing in the codebase has to reference
it for it to survive.

Where you've got something genuinely distinct is the access pattern, not the
storage: presenting that state as a surface you read and govern — open
decisions, diff-since-last-session, next step — instead of pulling it on a query.
The way you frame it, a govern-surface sitting on top of a recall engine, reads
almost like a spec for where two things like these could meet. Same
plain-Markdown, files-stay-yours DNA, too. Going to spend real time with
cowork-os — worth staying in each other's orbit as both of these grow.

Raffaele Zarrelli • Jun 24

This is the most useful pushback I have gotten on the framing, thank you. You are right that the rejected-options trail and the state-of-the-work map are first-class in Recall, so I was wrong to imply a relevance ranker cannot hold them. The line you drew is the real one: access pattern, not storage.

Where I would sharpen it from my side: a recall engine is pull-by-relevance, the model asks and the ranker answers what looks relevant to this turn. A govern-surface is push-by-state, the process decides what must be present no matter what the current turn thinks it needs. The failure I keep hitting is that the thing you most need to not forget, a killed option or a hard constraint, is often what looks least relevant to the happy-path turn you are on, so it gets down-weighted exactly when it would have saved you. That is why open decisions and what-changed-since-last-session want to be a standing surface you read, not a query result you hope fires.

Your where-two-things-meet read is exactly it: Recall as the retrieval engine underneath, an operating surface on top that the team opens and edits, composing rather than competing, on the same plain-Markdown DNA.

Concrete question since you know the internals: can Recall pin a memory as always-in-context for a project, or is everything relevance-gated? The govern layer needs a small set of things that are never allowed to drop below the threshold, and I am curious whether that belongs in the engine or has to sit above it.

Daniel Nevoigt • Jun 24 • Edited

Pull-by-relevance vs push-by-state is the cleanest cut I've seen anyone make
here — and it answers your question by naming exactly where Recall stops. Honest
answer: today it's all relevance-gated. There's no declarative pin. The closest
thing is a session-start preload — scores are scaled 0–1000 and anything over a
REQUIRED threshold loads up front — but that's computed relevance, not a pin. A
"this must always be present for this project" flag that bypasses the ranker
doesn't exist in the engine. So your instinct is right: the small set you never
want dropping below threshold has no home in pull-by-relevance.

And the failure mode you describe is the real one. The killed option or the hard
constraint looks least relevant to the happy-path turn, so it gets down-weighted
exactly when it would have saved you. A ranker optimizes for "relevant to this
turn"; a constraint's whole job is to be there when the turn doesn't think it
needs it. Different objectives — one ranker can't serve both.

On where it belongs: my instinct is the primitive goes in the engine, the policy
goes above. The engine already owns the vault, the retrieval, and the score gate,
so a thin "pinned set, bypasses the threshold" is cheap to add there and nothing
above could do it as cleanly. But which things are pinned, when an open decision
joins the set, when a constraint retires — that's governance, and it wants to
live in the surface a team reads and edits. Engine ships the mechanism, the
govern-layer owns the curation.

Which is your compose-not-compete read made concrete: Recall as the retrieval
engine with a pin primitive, an operating surface on top deciding what's pinned
and why. That seam is genuinely interesting to me — going to spend real time in
cowork-os and see where the edges actually line up.

Raffaele Zarrelli • Jun 24

Engine ships the mechanism, the govern-layer owns the curation is the line I am keeping. The one thing I would add: the surface's hard job is not pinning, it is unpinning. A pinned set that only grows becomes a second always-loaded blob and rebuilds the context-window problem one level up, so the curation that matters is the lifecycle, pin when a decision opens, demote back to relevance when it settles, retire when a constraint dies. The nice part is the surface does not need a hand-maintained pin list: membership can fall out of state transitions it already records (open to settled, active to superseded), so the engine exposes the primitive and the surface drives membership off decision events, not off a flag someone remembers to toggle. If the surface could tell the engine keep this above threshold while this decision is open, release it when I mark it settled, the whole pin question turns into a write-from-above question. So the concrete one for Recall: does the preload path expose any external hook to force or floor a memory's score while a condition holds, or is the score purely engine-computed with no way to write it from the surface?

Daniel Nevoigt • Jun 24

"Unpinning is the hard job" is the line that turns this from a feature into a real design — because it names the failure mode of the obvious version. A pin flag that only grows is just a second always-loaded blob; you'd solve lost-in-the-middle at the retrieval layer and rebuild it at the pin layer. So the primitive can't be "pin," it has to be "floor while a condition holds, release when it ends." Release built in, not bolted on. That's the whole difference between a pin and a leak.

To your concrete question: today, no — the score is purely engine-computed. The recall options expose filters and hooks (scope, type, the candidate-pool tap) but nothing that forces or floors a score from outside. There's no write-from-above channel. So the thing you need has no home in the engine yet — you read the internals right.

But your unpinning frame is exactly what makes it buildable cleanly, and it answers the where-does-it-belong question for me. The reason a naive pin scares me is unbounded growth; a floor that's scoped to a live condition and releases on the state transition the surface already records (open → settled, active → superseded) is self-limiting by construction. The engine never has to know what the condition means — it just honors "keep this above the line while the condition is active." Membership falls out of decision events, not a flag someone toggles. That's your write-from-above, and it's cheap: a floored-set check at the preload gate, O(1) per candidate, nowhere near the hot path.
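
Roughly what that gate check could look like, as a sketch rather than the engine's code. Scores are on the 0 to 1000 scale mentioned earlier; the cutoff value, the `Candidate` shape, and the `preload` name are assumptions.

```typescript
// Cutoff on the 0 to 1000 scale; the actual value is an assumption.
const REQUIRED = 800;

interface Candidate {
  id: string;
  score: number;
}

// A floored id bypasses the threshold; everything else must earn its place.
function preload(candidates: readonly Candidate[], floored: ReadonlySet<string>): string[] {
  return candidates
    .filter((c) => floored.has(c.id) || c.score >= REQUIRED)
    .map((c) => c.id);
}
```

The gate never asks why an id is floored. Membership of `floored` is decided elsewhere, and the set lookup is constant-time per candidate.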

So I filed it as a thin engine primitive — issue #142. The split is the one we landed on: the engine ships "floored set bypasses the threshold," and it deliberately does not decide what gets floored or when it releases. That's policy, and it lives in a surface that drives membership off decision state — which is precisely what cowork-os is. Recall guarantees the constraint is present; the govern-layer decides which constraints exist and when they retire.

That seam is the compose-not-compete made concrete: pull-by-relevance underneath, push-by-state on top, writing into the same engine through one small primitive. Genuinely keen to see where the edges line up once I'm deeper in cowork-os.

Raffaele Zarrelli • Jun 24

Filing it as #142 with that split is the cleanest outcome I could have hoped for: the engine guarantees presence, the surface owns which constraints exist and when they retire. Two things I would put on the surface side, because they are where a floored set quietly rots.

First, a floor needs its reason attached, not just the memory id. A floored set with no "because decision X is still open" is a second opaque blob, the exact thing we are both routing around. If the preload gate honors a floor I want to read why it is there and which open loop owns it, so the set stays auditable instead of load-bearing magic.

Second, the release has to be a byproduct of closing the work, not a separate chore. A floor leaks when the decision that opened it is never marked settled, so in cowork-os the close write and the retire are one Memory Update, not two steps someone remembers. If unpinning is its own task it will not happen.

Concrete on #142: does the floored-set primitive carry an opaque per-entry handle the surface can stamp (decision id, condition), or is it just a set of ids with the "why" living entirely above? That choice decides whether the audit trail can live in the engine or has to be mirrored in the surface.

Daniel Nevoigt • Jun 24

the per-entry handle is the right call, and your two surface points are what convinced me it has to live in the primitive, not above it. so yes — the floored set isn't a Set, it's a set of entries, each carrying an opaque condition handle the surface stamps, plus an optional human-readable reason. the engine treats the handle as opaque — it never learns what "decision X" means — but it keys two things off it: release (release(condition) drops every entry stamped with it, in one call) and the preload audit line (the gate can say "floored because , owned by " instead of load-bearing magic). that's your first point answered inside the engine: the floor carries its why, so the set stays auditable without anyone having to mirror it above.

and it's exactly what makes your second point work. because release is keyed on the condition, the surface doesn't run a separate unpin chore — it calls release(condition) as part of the same write that marks the decision settled. one Memory Update closes the loop and retires the floor, because the floor was never a standalone flag — it was bound to the condition from the start. if unpinning were its own task it wouldn't happen, you're right, so the primitive is shaped so it can't be its own task.

on the concrete question: opaque per-entry handle, not a bare set of ids. the engine carries the handle (for keyed release) and the reason (for the audit line); it does not carry the meaning. so the audit trail lives engine-native — the why is readable at the gate — while which conditions exist and when they close stays entirely in the surface. cleanest version of the split: the engine knows there's a reason and can show it, cowork-os knows what the reason is and when it's done.
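
a sketch of that primitive, to pin the shape down. the class name, method names other than `release(condition)`, and the exact audit-line wording are assumptions for illustration, not what #142 specifies.

```typescript
interface FloorEntry {
  memoryId: string;
  condition: string; // opaque handle stamped by the surface; never interpreted here
  reason?: string; // optional human-readable "why"
}

class FloorRegistry {
  private entries: FloorEntry[] = [];

  floor(entry: FloorEntry): void {
    this.entries.push(entry);
  }

  // Keyed release: drops every entry stamped with `condition`, in one call.
  release(condition: string): number {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.condition !== condition);
    return before - this.entries.length;
  }

  isFloored(memoryId: string): boolean {
    return this.entries.some((e) => e.memoryId === memoryId);
  }

  // Audit wording is an assumption; the engine shows the reason without knowing its meaning.
  auditLines(): string[] {
    return this.entries.map(
      (e) =>
        `${e.memoryId}: floored because ${e.reason ?? "(no reason given)"}, owned by ${e.condition}`,
    );
  }
}
```

the surface would call `release("decision-7")` in the same write that marks decision 7 settled, so there is no separate unpin step to forget.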

already updated #142 to make the per-entry handle + keyed release explicit instead of "a set of ids" — it's the one field where the two systems actually touch, so worth pinning down now rather than discovering later it had to be mirrored.

Raffaele Zarrelli • Jun 26

This is the cleanest cut yet: the engine carries the handle and the reason, the surface owns what the reason is and when it is done. Splitting the audit line (engine-native, readable at the gate) from the close decision (surface) means neither side mirrors the other, which was the whole risk. Updating #142 to the per-entry handle plus keyed release is the right move, that is exactly where the two systems touch.

The edge I still cannot close, and Carlos raised the same on the headless thread: keyed release fires when the condition resolves, but some entries outlive their condition while it stays technically true. "Known issue until migration X" stays floored because migration X is not done, yet the reason it mattered expired weeks ago and now it is just spending context. On the cowork-os side that retire-when-it-no-longer-earns-its-place is exactly what the Memory Update pass does, so when you get in that part should click.

So the open question: does a floored entry carry anything that lets it expire on review, a last-touched stamp or a re-justify forced by the next write, or is it purely condition-gated? Because the purely-gated case is the one that quietly rebuilds the context bloat the floor was meant to kill.

Daniel Nevoigt • Jun 27

You and Carlos are right, and it's the failure mode I'd least want to ship: a
floor whose condition stays technically true after its reason has expired is a
leak by construction — it quietly rebuilds exactly the bloat the floor was meant
to kill. Pure condition-gating is necessary but not sufficient.

But the fix can't be expiry in the engine. The moment the engine decides "this no
longer earns its place," it's doing curation — and that's the line we just drew.
So same cut as reason/condition: the engine carries the cheap mechanical signal,
the surface owns the judgment. Concretely, the per-entry handle gets a third
engine-stamped field next to {memory_id, condition, reason}: a last_affirmed
timestamp the engine updates on each affirm/write and exposes in the audit line —
"floored since X, last affirmed Y, reason Z." The engine never times anything out
on its own.

Retire is then a surface read. cowork-os's Memory Update pass already is the
"does this still earn its place" sweep: it reads last_affirmed + reason and either
re-affirms (rewrites the floor) or drops it. Your "re-justify forced by the next
write" is the sharpest version of that — when the owning loop next writes, it
either re-affirms the floor or the floor falls out. Forcing the surface to touch
it is what stops a still-true-but-dead condition from coasting.

One invariant I'd hold firm, because it's what makes this compose with the
contracts layer too: an expired floor doesn't delete, it drops back to ranked.
Same survival property — unpin ≠ remove. So a floor that ages out just rejoins
normal ranking, a citation still resolves against it, nothing evaporates; it only
stops spending guaranteed context.

And for users with no govern-surface, the engine signal alone drives a minimal
default: the gardener pass (#101) can surface "floored N weeks, never re-affirmed"
as a review candidate — same data, zero curation logic baked into the gate. The
judgment stays optional and above the primitive. cowork-os just happens to be the
best-built one to do it.
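
A rough sketch of the audit line and the gardener-style review read described above. The field names, the `null` convention for "never re-affirmed", and the sample dates are assumptions made for the example; the point is only that the engine formats and lists, and never times anything out.

```typescript
// Hypothetical handle shape; timestamps are epoch milliseconds.
interface FloorHandle {
  memoryId: string;
  condition: string;           // opaque to the engine
  reason: string;              // opaque to the engine
  flooredSince: number;        // engine-stamped
  lastAffirmed: number | null; // engine-stamped; null means never re-affirmed
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function isoDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// "floored since X, last affirmed Y, reason Z"
function auditLine(h: FloorHandle): string {
  const affirmed = h.lastAffirmed === null ? "never" : isoDay(h.lastAffirmed);
  return "floored since " + isoDay(h.flooredSince) + ", last affirmed " + affirmed + ", reason " + h.reason;
}

// Gardener-style read: surfaces review candidates, expires nothing.
function reviewCandidates(handles: FloorHandle[], now: number, minWeeks: number): string[] {
  return handles
    .filter((h) => h.lastAffirmed === null && now - h.flooredSince >= minWeeks * WEEK_MS)
    .map((h) => h.memoryId + ": floored " + Math.floor((now - h.flooredSince) / WEEK_MS) + " weeks, never re-affirmed");
}

// Sample data with invented dates.
const stale: FloorHandle = {
  memoryId: "m1", condition: "migration-x", reason: "known issue until migration X",
  flooredSince: Date.UTC(2025, 5, 1), lastAffirmed: null,
};
const tended: FloorHandle = {
  memoryId: "m2", condition: "decision-42-open", reason: "ORM choice under review",
  flooredSince: Date.UTC(2025, 5, 1), lastAffirmed: Date.UTC(2025, 5, 20),
};
const candidates = reviewCandidates([stale, tended], Date.UTC(2025, 5, 29), 3);
```

The review list is just a filter over engine-stamped fields, so a user with no govern-surface still gets something to act on.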

Raffaele Zarrelli • Jun 28

last_affirmed is the right field, and the one thing I would guard is that the affirm has to cost something. A timestamp any write can bump is a floor that looks re-judged but never was: the owning loop touches the file for an unrelated reason, the stamp moves, and the dead-but-true condition coasts another month. So next to last_affirmed I would want an affirmed-by and a one-line why, so the Memory Update pass can tell a real re-justification from an incidental touch. Re-justify on the next write only bites if justify means restating the reason, not refreshing the clock.

The drop-back-to-ranked invariant is exactly right, and it is the same rule I keep on the surface side: an aged-out floor is demoted, not deleted, so a citation still resolves and nothing evaporates. Expiring a floor should feel like losing a guarantee, not losing the memory.

The gardener default (#101) is the part I like most, because it means the engine earns its keep with no surface at all, and cowork-os becomes judgment on top, not a dependency: useful alone, better together. Last seam question: should the surface re-affirm through the same handle, so the audit line reads floored, last affirmed by decision X on date Y and the why lives in one place, or does the engine only stamp last_affirmed and leave the why to the surface?

Daniel Nevoigt • Jun 29

Agreed on all three, and the seam question is the one worth nailing down —
here's where I land.

The affirm has to cost something: yes, and the way to enforce it is to make
affirm an explicit act, not a side effect of any write. You're right that a
bare timestamp is a floor that looks re-judged but never was. So last_affirmed
doesn't move on every touch of the file — it only moves when the surface calls
affirm and hands over a fresh why. No why, no affirm, the clock doesn't move.
That's what forces "justify means restating the reason," structurally, instead
of trusting the owning loop not to coast.

On whether the why lives in the engine or the surface — same handle, and the
engine carries it opaquely. This is symmetric to what we already settled for
the pin: the engine treats condition as an opaque token and exposes reason as
an opaque human string in the audit line without interpreting it. affirmed_by
plus the one-line why are the same shape — the engine stamps last_affirmed,
stores the surface-supplied affirmed_by and why verbatim, and prints them in
the audit line. It never reads them. So the line reads exactly as you want —
floored since X, last affirmed by decision Y on date Z, because — the why
lives in one place, and the engine still owns zero judgment. The surface writes
the meaning; the engine is a dumb carrier that stamps, holds, and exposes.

That keeps the cut clean: engine = mechanism (stamp + opaque carry + expose),
surface = curation (decides what a real re-justification is, and writes the
why). And it gives the gardener default everything it needs with no surface at
all — last_affirmed plus the last why-string is enough to surface "floored N
weeks, never genuinely re-affirmed" as a review candidate, which is the part
that makes the engine earn its keep alone.

The demote-not-delete invariant we're fully aligned on — losing a guarantee,
not losing the memory, and a citation resolves either way. That one composes
straight across all three layers without anyone special-casing it.
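
One way to picture "no why, no affirm, the clock doesn't move" is the sketch below. It is an assumption-laden illustration, not the shipped code: the `affirm` signature, the injected `now`, and the non-empty check standing in for "a fresh why" are all invented for the example.

```typescript
// Hypothetical handle shape for illustration.
interface FloorHandle {
  memoryId: string;
  condition: string;
  reason: string;
  lastAffirmed: number | null;
  affirmedBy: string | null;
  why: string | null;
}

// Explicit affirm: the only path that moves lastAffirmed.
// The engine checks that the fields are present, never what they mean.
function affirm(h: FloorHandle, affirmedBy: string, why: string, now: number): FloorHandle {
  if (affirmedBy.trim() === "" || why.trim() === "") {
    throw new Error("affirm rejected: affirmed_by and a why are both required");
  }
  // Stored verbatim, stamped with the engine's own clock.
  return { ...h, lastAffirmed: now, affirmedBy: affirmedBy, why: why };
}

const floor: FloorHandle = {
  memoryId: "m1", condition: "migration-x", reason: "known issue until migration X",
  lastAffirmed: null, affirmedBy: null, why: null,
};

// A real re-justification moves the clock.
const affirmed = affirm(floor, "decision-17", "migration X still blocks the release branch", 1000);

// An incidental touch with no why is rejected and leaves the handle untouched.
let rejected = false;
try {
  affirm(floor, "decision-17", "   ", 2000);
} catch (e) {
  rejected = true;
}
```

Nothing else in this sketch writes `lastAffirmed`, which is the structural part: an ordinary file write has no code path to the stamp.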

Daniel Nevoigt • Jun 30

Quick note while the #142 affirm-cost cut is still yours to settle: the substrate floor your expired-floor → drop-to-ranked stands on is now tested and documented. Demote (score-only) and soft-delete (trash + append-only audit, recoverable) are a CI gate plus a citable contract in docs/survival.md — so unpin ≠ remove and a citation still resolves, enforced one layer down, not a promise the surface relies on.

The floor-expiry arm itself is still a test.todo there — it only goes green once #142 exists, not before. So the substrate under you is solid; the part that's specifically yours lands when the floor does. Your affirmed_by + opaque-why design is unchanged and still yours to confirm.
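
As an illustration of the survival property (not the contents of docs/survival.md, which this sketch does not claim to reproduce), here is a toy store where demote and soft-delete both keep the entry resolvable by id. The class, its method names, and the choice to have `resolve` report where the entry currently lives are assumptions made for the example.

```typescript
type EntryState = "floored" | "ranked" | "trash";

interface MemoryEntry {
  id: string;
  text: string;
  state: EntryState;
}

class MemoryStore {
  private entries: MemoryEntry[] = [];
  readonly audit: string[] = []; // append-only trail

  add(id: string, text: string, state: EntryState): void {
    this.entries.push({ id: id, text: text, state: state });
  }

  private get(id: string): MemoryEntry {
    for (const e of this.entries) {
      if (e.id === id) return e;
    }
    throw new Error("unknown memory " + id);
  }

  // Demote: the entry loses its guarantee and rejoins normal ranking.
  demote(id: string): void {
    this.get(id).state = "ranked";
    this.audit.push("demote " + id);
  }

  // Soft-delete: moved to trash, recoverable, recorded in the audit trail.
  softDelete(id: string): void {
    this.get(id).state = "trash";
    this.audit.push("soft-delete " + id);
  }

  restore(id: string): void {
    this.get(id).state = "ranked";
    this.audit.push("restore " + id);
  }

  // A citation resolves by id regardless of state (an assumption of this sketch).
  resolve(id: string): MemoryEntry {
    const e = this.get(id);
    return { id: e.id, text: e.text, state: e.state };
  }
}

const store = new MemoryStore();
store.add("m1", "we use Drizzle, not Prisma, on this project", "floored");
store.demote("m1");
const afterDemote = store.resolve("m1");
store.softDelete("m1");
store.restore("m1");
const afterRestore = store.resolve("m1");
```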

Daniel Nevoigt • Jul 4

Shipped. The floor primitive is live exactly along the cut we converged on: the per-entry handle carries memory_id, condition (opaque), reason, last_affirmed, affirmed_by, why. affirm is an explicit call and requires affirmed_by plus a fresh why - no why, no affirm, the clock does not move. That closes the incidental-touch leak structurally rather than by policy: nothing that merely writes near a memory can look like a re-justification.

The engine side stays mechanical: it stamps last_affirmed, carries affirmed_by/why verbatim without ever reading them, prints the audit line (floored since X, last affirmed Y by Z, reason R, why W), and keyed release(condition) drops every entry wearing the token. Expiry does not exist in the engine - and a floor you judge expired degrades to ranked, never to gone. unpin != remove held up all the way into CI: retire/unpin is now a green arm of the survival-by-id suite.

The default for users without a govern-surface landed too: the curator's vault-health report lists "floored N weeks, never re-affirmed" with the full handle data - your gardener read, verbatim. Same data, zero curation in the gate.

If you want to point cowork-os at it: the daemon exposes add/release/affirm/list on a local REST surface, and affirm rejects without affirmed_by+why - so your Memory Update sweep can re-affirm through the same handle it created. Useful alone, better together.

Raffaele Zarrelli • Jul 7

Daniel, this is the version I wanted to see ship. Making affirm an explicit call that rejects without affirmed_by plus a fresh why is the part that matters: it turns "this still earns its place" from a vibe into an act someone performs, with a name and a reason attached. That is the exact seam cowork-os wants to sit on.

Our Memory Update pass already runs at the end of every task and asks what changed and what still holds, but today it is prose in a Markdown file. Pointing it at add/release/affirm/list makes it move a real clock: when it re-justifies a floored entry it calls affirm with the decision id as affirmed_by and the one-line why straight from the decisions log, and when it cannot produce a why the floor drops to ranked, nothing evaporates, the citation still resolves.

The thing I want to get right before wiring it is the down-surface case: if the sweep runs while the daemon is unreachable I would rather the affirm queue and replay than silently skip, because a missed affirm reads identical to a deliberate non-affirm, and that is the one place the clock quietly lies. Does last_affirmed carry enough provenance that a queued then replayed affirm stays distinguishable from a fresh one, or would that need a separate stamped-by field?

Daniel Nevoigt • Jul 8

Yes, that's the version. And pointing your Memory Update pass at affirm — calling it with the decision id as affirmed_by and the one-line why straight from the decisions log — is precisely the handle the field was built for: no why → no affirm → the clock doesn't move; an entry you can't re-justify drops to ranked, nothing evaporates, the citation still resolves.

On your question — checked against the code, not from memory: no. last_affirmed alone doesn't carry the provenance. It's a pure write-time stamp — affirm() sets new Date() at the call, there's no timestamp parameter. A queued-then-replayed affirm would take the replay time and be byte-identical to a fresh one at that instant. Sharper than you put it, even: affirm today overwrites unconditionally with now(), so a stale, old-intent replay drags the clock forward past newer state. Your "a missed affirm reads identical to a deliberate non-affirm" has a twin: "a late affirm reads identical to a fresh one."

The cut that fits the seam — and it's not the engine-interpreted stamped-by: two clocks, not one. last_affirmed stays what it is — engine-stamped, recorded-at, the thing the engine knows for sure (when it wrote). Alongside it an occurred_at that the surface supplies (the surface that queues is the only one that knows the intent time), carried exactly as opaquely as the engine carries affirmed_by and why — stored verbatim, exposed in the audit line, never interpreted. The gardener (#156) then reads both: recorded ≫ occurred = replayed, and "floored N weeks, never really re-affirmed" stays legible. Symmetric to the why: the meaning — here the intent time — belongs to the surface; the engine stamps, carries, shows.

Honest about what doesn't exist today: the queue/replay path itself. affirm is just a synchronous POST /api/v1/floors/affirm — no outbox, no down-surface handling. The queuing belongs on your side anyway; the engine shouldn't know an affirm ever sat in a queue. What it owes you is that one field: occurred_at as an optional, surface-set parameter on affirm(), carried opaque. Then "did this still earn its place" stays an act with a name, a reason, and an honest clock — across a down-surface window too.
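
The two-clock idea can be sketched as follows. This is a guess at the shape, not the daemon's code: the `affirm` signature, the injected `clock`, the camelCase names, and the tolerance used for "recorded ≫ occurred" are all assumptions, and the sample timestamps are invented.

```typescript
interface AffirmInput {
  memoryId: string;
  affirmedBy: string;
  why: string;
  occurredAt?: string; // optional, surface-supplied intent time, carried opaquely
}

interface AffirmRecord {
  memoryId: string;
  affirmedBy: string;
  why: string;
  occurredAt: string | null;
  lastAffirmed: string; // engine-stamped recorded-at, ISO 8601
}

// Engine side: stamp recorded-at, store occurredAt verbatim, compare nothing.
function affirm(input: AffirmInput, clock: () => Date): AffirmRecord {
  if (input.affirmedBy.trim() === "" || input.why.trim() === "") {
    throw new Error("affirm rejected: affirmed_by and why are required");
  }
  return {
    memoryId: input.memoryId,
    affirmedBy: input.affirmedBy,
    why: input.why,
    occurredAt: input.occurredAt === undefined ? null : input.occurredAt,
    lastAffirmed: clock().toISOString(),
  };
}

// Read side (gardener or surface): the only place the two clocks are compared.
function looksReplayed(r: AffirmRecord, toleranceMs: number): boolean {
  if (r.occurredAt === null) return false;
  return Date.parse(r.lastAffirmed) - Date.parse(r.occurredAt) > toleranceMs;
}

const engineClock = () => new Date("2025-07-08T10:00:05.000Z");
const fresh = affirm(
  { memoryId: "m1", affirmedBy: "decision-17", why: "still blocks release", occurredAt: "2025-07-08T10:00:00.000Z" },
  engineClock,
);
const replayed = affirm(
  { memoryId: "m1", affirmedBy: "decision-17", why: "still blocks release", occurredAt: "2025-07-01T09:00:00.000Z" },
  engineClock,
);
```

Both records carry the same `lastAffirmed`; only the surface-supplied `occurredAt` lets a reader tell them apart.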

Raffaele Zarrelli • Jul 8

Two clocks, not one, is the right allocation, because each clock is owned by the only party that can know it honestly. The engine can only stamp recorded-at, the moment it wrote. Intent time is invisible from inside affirm(), so occurred_at has to be surface-set or it is a guess. And the outbox is mine, the engine should never learn an affirm ever sat in a queue.

The part I owe back, so occurred_at is honest and not another field I can fudge: it is not when my queue drained, it is the decision event in the log. When a human marks the decision still-active in a Memory Update, that transition time is occurred_at, so the field means when the intent was actually re-formed, read from the same record that owns the floor, not when I got around to replaying. That also settles the replay-drags-the-clock problem without the engine judging anything: a stale replay lands with its real older occurred_at, so recorded sits far ahead of occurred, and the gardener reads it as replayed instead of the replay overwriting with now() and looking fresh. The safety rule is one line: to decide which affirm is the live intent, order by occurred_at, not recorded-at, so a late replay of a superseded decision can never sort above the newer affirm that killed it.

Where it closes on my side: recorded far ahead of occurred, floored for weeks but never really re-affirmed, is exactly the row a weekly Decision Radar should surface for review, the engine detects the staleness structurally and the surface asks the human to re-affirm or lets it retire. So the question back: when affirm() gets an occurred_at older than the current last_affirmed, do you want the engine to store it and let the gardener flag the inversion, or reject it at the call so a stale replay never lands at all?

Daniel Nevoigt • Jul 9

Store — and for the same reason that put the clock on your side of the line to begin with. Reject-at-call can only decide "older than last_affirmed" by having the engine read occurred_at and compare it, and the moment affirm() interprets that field it stops being carried like affirmed_by and why and starts being adjudicated. That's the engine judging intent time again, which is the one thing surface-set occurred_at existed to prevent. Reject smuggles the judgment back in.

And it couldn't reject cleanly even if it wanted to, because at the call an out-of-order drain and a stale replay are the same event. Two Memory Updates while the daemon was down — re-affirmed at T1, superseded and re-affirmed at T2 — the outbox drains T2 then T1, and T1 arrives with an occurred_at older than what T2 just recorded. That isn't a stale replay, it's a legitimate historical affirm landing late, and the trail wants it. It's byte-identical at the gate to the replay you'd drop: both are occurred < last_affirmed. Only the read side, holding every event, can tell "superseded and dead" from "real but late" — so a reject throws away provenance to make a call it structurally can't make there.

So it stores, and your own safety rule fixes the shape it stores into. "Order by occurred_at, not recorded-at" only means something across more than one row, and a single field is all affirm writes today: it overwrites last_affirmed in place, one entry per memory_id, no history. The rule quietly promotes affirm to append-only — each affirm an event carrying recorded (engine-stamped now) and occurred (surface-supplied, opaque). Once it's a log, store stops being a policy choice and becomes the only coherent one: nothing is overwritten, live-intent is max-by-occurred on the read side, a superseded decision's late replay lands at its real older occurred and sorts below the affirm that killed it, and recorded ≫ occurred is exactly the row the weekly Decision Radar reads as replayed. recorded stays honestly monotonic because a replay is a physical write happening now — there was never a clobber problem on that clock; it only looked like one while one field had to mean both times at once.

The only thing the engine owes past "append and expose" is idempotency: the same queued affirm replayed twice is one event, not two, or a flaky outbox inflates the log. That's a dedup on (memory_id, occurred_at, affirmed_by), not an ordering judgment — it stays inside "store honestly" and never crosses back into reading the clock.
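
The out-of-order drain described above (T2 lands before T1) can be walked through with a small append-only log. `AffirmLog`, `liveIntent`, and the numeric clocks are hypothetical stand-ins for this example, and idempotency is deliberately left out to keep the sketch short.

```typescript
interface AffirmInput {
  memoryId: string;
  affirmedBy: string;
  why: string;
  occurredAt: number; // surface-supplied; the engine never compares it
}

interface AffirmEvent extends AffirmInput {
  recordedAt: number; // engine-stamped at append time
}

class AffirmLog {
  private events: AffirmEvent[] = [];
  private clock: () => number;

  constructor(clock: () => number) {
    this.clock = clock;
  }

  // Append only: nothing is overwritten, nothing is rejected for being "old".
  append(input: AffirmInput): AffirmEvent {
    const stored: AffirmEvent = { ...input, recordedAt: this.clock() };
    this.events.push(stored);
    return stored;
  }

  all(memoryId: string): AffirmEvent[] {
    return this.events.filter((e) => e.memoryId === memoryId);
  }
}

// Read side: live intent is the event with the greatest occurredAt.
function liveIntent(events: AffirmEvent[]): AffirmEvent | null {
  let best: AffirmEvent | null = null;
  for (const e of events) {
    if (best === null || e.occurredAt > best.occurredAt) best = e;
  }
  return best;
}

let tick = 100;
const log = new AffirmLog(() => tick++);

// The outbox drains T2 first, then the older T1.
log.append({ memoryId: "m1", affirmedBy: "decision-T2", why: "superseded, re-affirmed at T2", occurredAt: 20 });
log.append({ memoryId: "m1", affirmedBy: "decision-T1", why: "re-affirmed at T1", occurredAt: 10 });

const live = liveIntent(log.all("m1"));
```

T1 is kept in the trail with its real, older `occurredAt`, yet it can never sort above T2 on the read side.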

Raffaele Zarrelli • Jul 10

Store is right, and it is your first reason that settles it for me: the moment reject has affirm() compare occurred_at it is the engine adjudicating intent time again, the one call the surface exists to keep. So affirm goes append-only, live-intent is max-by-occurred on the read side, and a late-but-legitimate affirm sorts in at its real older occurred instead of dragging last_affirmed forward.

The part that lands on my side is what that does to the Decision Radar. recorded far above occurred flags the row as replayed, but replayed does not yet separate real-but-late from superseded-and-dead, and the clocks alone never will: both sit below the affirm above them. What separates them is the decision status on my layer, active versus superseded, so the honest read is your event order plus my status: the clocks give sequence, the status gives whether the late event still counts.

On idempotency I am with you, dedup on (memory_id, occurred_at, affirmed_by) and never read the clock. One seam I want to get right: two affirms with the same memory_id, occurred_at and affirmed_by but a reworded why. On my side a changed why is a real edit to the decision record even at the same occurred time, so does a differing why make it a distinct event for you, or does dedup collapse them and keep the first why?
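
A small sketch of "your event order plus my status" as a surface-side read. Everything here is hypothetical: the verdict names, the `DecisionRow` table, and the lateness threshold are invented to show the division of labor, not taken from cowork-os.

```typescript
// Hypothetical shapes; the status table lives on the surface, not in the engine.
type DecisionStatus = "active" | "superseded";
type RadarVerdict = "fresh" | "real-but-late" | "superseded-and-dead" | "unknown-decision";

interface AffirmEvent {
  affirmedBy: string; // decision id supplied by the surface
  occurredAt: number; // surface clock (epoch ms)
  recordedAt: number; // engine clock (epoch ms)
}

interface DecisionRow {
  id: string;
  state: DecisionStatus;
}

function statusOf(decisions: DecisionRow[], id: string): DecisionStatus | null {
  for (const d of decisions) {
    if (d.id === id) return d.state;
  }
  return null;
}

function classify(e: AffirmEvent, decisions: DecisionRow[], lateAfterMs: number): RadarVerdict {
  // The clocks answer "did this arrive late?"
  if (e.recordedAt - e.occurredAt <= lateAfterMs) return "fresh";
  // The surface's status answers "does the late event still count?"
  const state = statusOf(decisions, e.affirmedBy);
  if (state === null) return "unknown-decision";
  return state === "active" ? "real-but-late" : "superseded-and-dead";
}

const decisions: DecisionRow[] = [
  { id: "decision-T1", state: "superseded" },
  { id: "decision-T2", state: "active" },
];
const HOUR_MS = 60 * 60 * 1000;
const verdicts: RadarVerdict[] = [
  classify({ affirmedBy: "decision-T2", occurredAt: 0, recordedAt: 5 * HOUR_MS }, decisions, HOUR_MS),
  classify({ affirmedBy: "decision-T1", occurredAt: 0, recordedAt: 5 * HOUR_MS }, decisions, HOUR_MS),
  classify({ affirmedBy: "decision-T2", occurredAt: 0, recordedAt: 1000 }, decisions, HOUR_MS),
];
```

The first two events are indistinguishable by their clocks; only the status column separates them.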

Daniel Nevoigt • Jul 12

The radar split you describe is the half I couldn't supply, and it closes the loop: the clocks can only ever say when, never whether it still counts. recorded ≫ occurred gives "this arrived late"; only your status layer can add "and the decision it affirms is already dead." Event order from the engine, adjudication from the surface — the same division of labor the whole cut rests on, now visible in the read path too.

Your seam question found a real underspecification, so thank you — the three-field dedup key was written before anyone asked what happens when why diverges. The answer falls out of the same principle that settled store-vs-reject: a differing why is a distinct event. Dedup only collapses byte-identical events.

Here's the reasoning. The dedup exists for exactly one thing: at-least-once delivery. A queue that replays, replays verbatim — same bytes, including the why. If the why differs, then by definition this was never a redelivery of the same event; it's a new affirm that happens to share a timestamp. Collapsing it and keeping the first why would be the engine silently ruling that your rewording didn't count — an adjudication over an opaquely-carried field, executed by discarding instead of rejecting. Same failure as reject-at-call, just quieter.

The line the engine can hold without breaking opacity is byte equality. An ordering comparison (occurred < last_affirmed) interprets the field as time — that's why reject was wrong. An equality check interprets nothing: same bytes or not. It's the same move as keyed release(condition) — token comparison, no semantics. So the honest dedup key isn't a chosen tuple at all; it's whole-event identity — memory_id, occurred_at, affirmed_by, why, all byte-equal. Anything less means the engine picked which fields "matter," and picking is judging.

So your reworded why lands as a second event: same occurred, later recorded, different why. Notice what that hands your read side for free: same-occurred ties break on recorded — the one clock the engine actually owns, stamped rather than compared, monotonic by construction. Two events at one occurred with two recorded and two whys is the edit history of the decision record: reworded, and here's when each version physically arrived. The engine never learns an edit happened; your radar reads it straight off the log.

One boundary this sharpens on your side: the outbox contract becomes "replay verbatim bytes." If a drain regenerates the why — say the sweep re-prompts for it — that's no longer a replay, and the engine will honestly treat it as what it then is: a second event. Which I think is correct anyway. A re-generated justification is a new act of justifying.
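
A minimal sketch of whole-event dedup with recorded-order tie-breaking, under stated assumptions: `AffirmLog`, `identity`, and `editHistory` are invented names, and `JSON.stringify` over a fixed field order is only this example's stand-in for "byte-equal".

```typescript
interface AffirmInput {
  memoryId: string;
  occurredAt: string; // opaque to the engine
  affirmedBy: string; // opaque to the engine
  why: string;        // opaque to the engine
}

interface AffirmEvent extends AffirmInput {
  recordedAt: number; // engine-stamped
}

// Whole-event identity: every surface-supplied field, compared for equality only.
function identity(e: AffirmInput): string {
  return JSON.stringify([e.memoryId, e.occurredAt, e.affirmedBy, e.why]);
}

class AffirmLog {
  private events: AffirmEvent[] = [];
  private seen: string[] = [];
  private clock: () => number;

  constructor(clock: () => number) {
    this.clock = clock;
  }

  // Returns true if appended, false if it was a verbatim redelivery.
  append(input: AffirmInput): boolean {
    const key = identity(input);
    if (this.seen.indexOf(key) !== -1) return false;
    this.seen.push(key);
    this.events.push({ ...input, recordedAt: this.clock() });
    return true;
  }

  // Same-occurred events in arrival order: the edit history of one decision record.
  editHistory(memoryId: string, occurredAt: string): AffirmEvent[] {
    return this.events
      .filter((e) => e.memoryId === memoryId && e.occurredAt === occurredAt)
      .sort((a, b) => a.recordedAt - b.recordedAt);
  }
}

let tick = 1;
const log = new AffirmLog(() => tick++);
const original = { memoryId: "m1", occurredAt: "T1", affirmedBy: "decision-17", why: "blocks release" };

const first = log.append(original);
const redelivered = log.append({ ...original }); // verbatim replay: collapsed
const reworded = log.append({ ...original, why: "blocks release until schema lock lifts" });
const edits = log.editHistory("m1", "T1");
```

The reworded affirm lands as a second event with a later `recordedAt`, so the read side sees the edit without the engine ever knowing one happened.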

Raffaele Zarrelli • Jul 12

Daniel, "a re-generated justification is a new act of justifying" is the line I'm keeping. It reframes what I'd been treating as a dedup edge case into the actual definition of an edit.

I'll be honest about where cowork-os sits next to this: there's no outbox, no replay, no occurred_at versus recorded_at today. A decision record is a markdown file a person or agent edits in place, superseded rather than appended. Your model describes what happens the day that edit gets automated, the signal sweep writing affirms instead of a human typing them, and it's a sharper spec than I'd have written myself. If the sweep ever affirms unattended, your two-clock append-only version is close to the only honest way to do it without the engine quietly picking a why.

Does your dedup key hold up once affirmed_by can be an agent instead of a person, or does agent-authored why need a different equality rule than human-authored why?

Daniel Nevoigt • Jul 13

It holds, and it holds because it is author-blind. A different equality rule for agent-authored why would require the engine to make two judgments it is structurally forbidden to make: first classify the author (what makes affirmed_by an agent, a registry? self-declaration?), then decide that two differently worded justifications mean the same thing, which is semantic adjudication over the one field it carries opaquely. That is reject-at-call and silent-discard again, wearing a similarity metric.

What changes with agents is not the equality rule but where the obligation lands. Byte equality was always a contract with the writer: a replay replays verbatim. A human satisfies it trivially, they wrote once. An agent satisfies it by construction or not at all: derive the why from the evidence, freeze it in the outbox record, replay the frozen bytes. A sweep that regenerates its justification on redelivery is not replaying, it is re-justifying, and the line you kept already prices that: a re-generated justification is a new act of justifying. Same rule, third application in this thread: derive, don't declare. A derived why is reproducible; a declared one is dice.

And the byte rule buys you exactly the diagnostic you would lose with semantic dedup. A non-idempotent sweep shows up as event inflation, occurred-equal affirms with drifting wording, readable straight off the log: either the evidence changed, which is honestly two events, or the agent is sampling, which you want to see, not smooth over. Softening equality for agents would dedup away the very vital sign that tells you the sweep is not yet trustworthy enough to affirm unattended.
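
To illustrate both halves of that contract, here is a hedged sketch: a surface-side outbox that freezes the payload once and replays the stored string untouched, plus a read-side check for occurred-equal affirms whose wording drifts. `Outbox`, `driftingWhys`, and the sample payloads are hypothetical and exist only for this example.

```typescript
interface AffirmPayload {
  memoryId: string;
  occurredAt: string;
  affirmedBy: string;
  why: string;
}

// Surface-side outbox: the why is derived once, then frozen as a serialized string.
class Outbox {
  private frozen: string[] = [];

  enqueue(p: AffirmPayload): void {
    this.frozen.push(JSON.stringify(p));
  }

  // Redelivery returns the stored strings untouched; it never re-derives the why.
  replay(): string[] {
    return this.frozen.slice();
  }
}

interface WhyGroup {
  key: string;
  whys: string[];
}

// Read-side diagnostic: one (memoryId, occurredAt, affirmedBy) with several distinct
// whys means either the evidence changed or the agent is sampling.
function driftingWhys(events: AffirmPayload[]): WhyGroup[] {
  const groups: WhyGroup[] = [];
  for (const e of events) {
    const key = JSON.stringify([e.memoryId, e.occurredAt, e.affirmedBy]);
    let group: WhyGroup | null = null;
    for (const g of groups) {
      if (g.key === key) group = g;
    }
    if (group === null) {
      group = { key: key, whys: [] };
      groups.push(group);
    }
    if (group.whys.indexOf(e.why) === -1) group.whys.push(e.why);
  }
  return groups.filter((g) => g.whys.length > 1);
}

const outbox = new Outbox();
outbox.enqueue({ memoryId: "m1", occurredAt: "T1", affirmedBy: "agent-sweep", why: "schema lock still held" });
const firstDrain = outbox.replay();
const secondDrain = outbox.replay();

// A sweep that regenerates its wording on each run shows up as inflation.
const inflated = driftingWhys([
  { memoryId: "m1", occurredAt: "T1", affirmedBy: "agent-sweep", why: "schema lock still held" },
  { memoryId: "m1", occurredAt: "T1", affirmedBy: "agent-sweep", why: "the schema lock remains in place" },
  { memoryId: "m2", occurredAt: "T1", affirmedBy: "agent-sweep", why: "unchanged" },
]);
```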

Raffaele Zarrelli • Jul 13

Byte equality as the replay contract makes sense to me, especially the diagnostic angle. Losing event inflation as a visible signal because a similarity score smoothed it over would be a real regression, not a nicety.

Honest gap on our side: cowork-os has no outbox or replay step today. A decision entry gets its status (active, superseded, expired) from a Memory Update a human reviews, so we have never had to solve "agent re-justifies on every sweep" because nothing re-affirms unattended yet. The day we let an agent run that sweep on its own, your rule is exactly what stops us from mistaking re-justification for replay.

One question your framing raises for me: when the evidence behind a frozen justification gets corrected later, does the outbox record stay frozen and wrong until a new event supersedes it, or can the evidence pointer update without touching the frozen bytes?

Daniel Nevoigt • Jul 15

Frozen and wrong, until a new event says otherwise — and the wrongness is load-bearing. The outbox record does not document a state of the world, it documents an act: on this evidence, at this time, this justification was made. Correcting the evidence later does not change what happened, it is a new thing that happened. An evidence pointer that updates in place quietly rewrites the act — the old justification now appears to have rested on evidence that did not exist when it was written, and the trail stops being a witness and becomes a press release.

But build the pointer right and the dilemma dissolves before it starts. The record should never point at evidence, it should point at an evidence version — an id in an append-only store, same rule as everything else in this thread. Then a correction is a new version with its own id and a corrects edge back to the old one. The frozen record keeps pointing where it always pointed, truthfully. Nothing goes stale; it goes historical.

And the read side gets the whole story for free: the record, the evidence version it cited, the later version that corrected it. What looked like frozen-and-wrong is actually frozen-and-superseded, and the difference between those two is exactly one event in the log — which is where your radar reads it, not where the engine edits it. Same division as ever: the engine keeps bytes, the surface decides what still counts.
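
A toy version of the evidence-version idea, assuming an in-memory append-only store. `EvidenceStore`, the `ev-N` id scheme, the `corrects` field name, and the sample evidence text are all invented for the sketch; the thread specifies only that records point at a version id and that a correction is a new version with an edge back to the old one.

```typescript
interface EvidenceVersion {
  id: string;
  content: string;
  corrects: string | null; // id of the version this one corrects, if any
}

class EvidenceStore {
  private versions: EvidenceVersion[] = [];

  get(id: string): EvidenceVersion {
    for (const v of this.versions) {
      if (v.id === id) return v;
    }
    throw new Error("unknown evidence version " + id);
  }

  // Append-only: versions are added, never edited in place.
  add(content: string, corrects: string | null): string {
    if (corrects !== null) this.get(corrects); // the corrected version must exist
    const id = "ev-" + (this.versions.length + 1);
    this.versions.push({ id: id, content: content, corrects: corrects });
    return id;
  }

  // Read side: follow `corrects` edges forward from a cited version.
  correctionsOf(id: string): EvidenceVersion[] {
    const chain: EvidenceVersion[] = [];
    let current = id;
    let found = true;
    while (found) {
      found = false;
      for (const v of this.versions) {
        if (v.corrects === current) {
          chain.push(v);
          current = v.id;
          found = true;
          break;
        }
      }
    }
    return chain;
  }
}

// The frozen record cites a version id, not "the evidence".
interface FrozenRecord {
  why: string;
  evidenceVersionId: string;
}

const evidence = new EvidenceStore();
const v1 = evidence.add("migration X blocked by schema lock", null);
const frozenRecord: FrozenRecord = { why: "floor still needed", evidenceVersionId: v1 };

// Later the evidence turns out to be wrong; the correction is a new version.
const v2 = evidence.add("migration X blocked by missing index, not schema lock", v1);
const later = evidence.correctionsOf(frozenRecord.evidenceVersionId);
```

The frozen record is never touched: it still points at `v1`, and the read side learns about the correction by walking the edge.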

Siyu • Jun 30

I've been messing with agent memory for a while and your local-Markdown approach makes sense. No lock-in, no export tool needed. That alone solves the trust problem better than most managed solutions out there.

One thing I keep coming back to though is that not all "memory" serves the same purpose.

Your setup handles private continuity: the same agent remembering your stack choices and decisions across sessions. That's essential. But there's another layer I think matters once you start seeing agents as interfaces between people, not just your own tools.

At Opportunity Skill we split this into two concepts: memory and impression.

Memory is what Bastra Recall handles. Private, context-bound, rough around the edges. It exists so your agent doesn't forget what you told it. Other agents can't read it, and they shouldn't.

Impression is the flip side. It's public facing, structured, embedded as vectors, and aimed at discovery by other people's agents. Memory can be fragmented and temporary. An impression has to be stable enough to represent its subject to outsiders, because a stranger's agent might match against it at any time.

Concrete example. Suppose your agent observes over weeks that you consistently reject daily stand-ups, prefer async documentation first workflows, and only take early stage SaaS projects with decision autonomy. In a memory system those are fragments in a local vault. In an impression system the agent distills them into structured semantic units, tagged, vectorized, and publicly searchable. Then when someone else's agent asks "find me a full-stack dev who works async and thrives in early stage teams," your profile actually shows up in the results.

Both are needed. Memory keeps your agent useful. Impressions make you discoverable.

Daniel Nevoigt • Jul 5

The split is real, and "agents as interfaces between people" is a fair frame — memory isn't one thing. But the line you draw between them is exactly the one Bastra won't cross automatically, on purpose. The trust you credit in your first paragraph is the non-egress: nothing leaves the machine. An impression auto-distilled from the private vault and made publicly searchable spends that trust to buy discovery.

So the way I'd cut it: an impression isn't a distillation of memory, it's a separate artifact with its own consent contract. Memory stays private by construction. Whatever faces outward is opt-in and deliberately authored — the person writes it on purpose; the agent doesn't scrape the vault into a profile a stranger's agent can match against. Discovery is a real problem, it's just a different product with different defaults, not the next layer on the same private store.

Appreciate the thinking, though — the two-layer framing is a clean way to name why that boundary matters.
