Saravanan Jaichandaran

Posted on Jun 5

The fifth layer is forming: six memory-tool authors wrote a Claude Code spec

#opensource #claude #knowledgegraph #aiagents

A GitHub issue on anthropics/claude-code is quietly becoming the most interesting agent-infrastructure document of the year. It is not a roadmap published by Anthropic. It is a spec being written by the people who actually build memory layers for Claude Code, contributing into a thread that the maintainers have not weighed in on.

The issue is #47023, titled "[PROPOSAL] Expose compact/session lifecycle hooks for external memory layers." It was opened on April 12 by Isaac (immartian), who builds Bellamem. Over the following eight weeks, six other tool authors arrived and contributed substantive comments. By early June it had become a working group writing a four-hook lifecycle spec, complete with payload shapes, edge cases, and a per-item provenance schema. None of the contributors were assigned. They showed up because they all kept hitting the same wall.

This piece names the wall, walks through who said what, and argues that the working group is implicitly defining a fifth layer of the agentic stack: runtime memory and shared state. The standard taxonomy (AGENTS.md for context, SKILL.md for knowledge, methodology frameworks for discipline, orchestration for scale) does not have a slot for it. Six independent builders are now proving the slot exists.

The proposal
Isaac opened the thread with a clean technical observation. Claude Code's existing hook surface covers tool-call lifecycle (PreToolUse, PostToolUse) and session boundaries (SessionStart, SessionEnd), but it has no first-class lifecycle event around compaction. The agent's working context can be reset by the model's compactor at any time, dropping facts that an external memory layer had carefully recorded, with no notification, no opportunity to inject a curated summary, and no way to record what was lost.

His proposal: four hooks, named PreCompact, PostCompact, SessionEnd, and a refined SessionStart that signals via a source enum (compact | resume | startup | clear). Each hook gets structured metadata. PreCompact lets the memory layer save state before compaction; PostCompact lets it inject the right context back in; the enum on SessionStart lets the memory layer differentiate "warm restart from a checkpoint" from "fresh start that should not inject anything."

That is the entire architectural seam external memory needs. It is also, notably, the seam that Anthropic's built-in memory primitive does not need, because the platform's own memory has internal visibility into the compactor.

The arrivals
What happened next is the interesting part.

wazionapps arrived a day later, on April 13, building NEXO Brain. He confirmed transcript_path is unreliable at PreCompact fire time and proposed a Stop + PreCompact two-hook contract: Stop writes a lightweight checkpoint on every turn, PreCompact reads it as a guaranteed-fresh fallback when transcript access fails. This is the kind of nuance you only know if you have run a memory layer in production long enough to see transcript_path come back null mid-session.

junaidtitan arrived on April 17 with a different angle. He is the curator of the awesome-claude-code and related lists, so he sees the entire memory-tool ecosystem from above. His contribution was the recognition that this is a coordination problem: every memory tool implements lifecycle adapter logic, and every one will rediscover the same edge cases unless the surface is documented.

Haustorium12 arrived on April 26 with Continuity v2, a 32-star memory layer that already ships PreCompact/PostCompact hooks against the current undocumented surface. He sharpened the SessionStart.source enum claim, arguing it needs formal semantics: compact triggers warm restart from checkpoint, resume triggers project state re-orientation, startup and clear deliberately inject nothing. Without that contract, every memory layer reverse-engineers the same branching, gets it subtly wrong, and ships silent regressions on Claude Code updates.

By the end of April, four memory-tool authors were converging on the same spec from four different production codebases.

I arrived on April 29 with world-model-mcp, a temporal knowledge graph with PreToolUse constraint enforcement and PostCompact auto-injection. My contribution was naming the defer enforcement tier (a third decision between deny and warn that pauses headless agents on recurring violations) and pointing at the v0.6.0 transcript-pointer primitive as evidence that the spec was implementable today on the existing hook surface, even partly.

kehansama arrived on June 3 with AgentRelay, a cross-agent memory middleware. He added three concrete spec items the thread had not named: context budget awareness (the memory layer needs to know how many tokens are available post-compaction), session provenance (knowing whether this is a fresh start, a resume, or a continuation), and project scope signals (which files and paths were touched in this session). He also proposed a fifth hook, compact-override, that lets the memory layer provide a replacement compaction summary instead of accepting the default LLM summarization.

Patdolitse arrived on June 4 with piia-engram, a 162-star local-first persistent identity layer that launched the same day. He made what is, in my reading, the load-bearing technical contribution of the entire thread. His argument:

The thread has converged on faithfulness of the summary, whether the replace-mode output dropped a key or hallucinated a value. That matters, but it's faithfulness of the summary to the transcript. There's a second axis underneath it, the trustworthiness of the underlying facts themselves. When a memory layer injects recalled context back into the post-compaction window, the model has no way to tell which of those items the user actually confirmed and which the layer inferred on its own and never had checked. Replace mode makes this sharper, because a confidently re-injected inference now carries the same authority as a user-stated fact, and the one place a human could have caught it, the original conversation, is exactly what just got flattened.

His proposed addition: additionalContext should carry per-item provenance, not just content. At minimum asserted_by (user or agent) and confirmation_state. So PostCompact can return "here are three things I'm restoring that you never actually confirmed" instead of silently promoting inferences to ground truth.

That comment turned the thread from a lifecycle-hook proposal into a memory-correctness spec. The two are related but they are different categories of guarantee. Patdolitse's framing collapses them.

What seven people independently agreed on
By June 4, with seven contributors writing into a single GitHub issue with no maintainer participation, the working group has implicitly produced a spec sketch:

Four lifecycle hooks: Stop (per-turn checkpoint), PreCompact (structured save before flattening), PostCompact (re-injection with receipts), and SessionStart with a normative source enum (compact | resume | startup | clear).

Two payload extensions: every hook payload carries structured metadata: context budget (tokens available), project scope signals (files touched), and session provenance (fresh/resume/continuation).

One correctness primitive: additionalContext carries per-item provenance, not just content. Each restored item declares asserted_by and confirmation_state. The model can then weight a re-injected inference differently from a re-injected user assertion.

One escape hatch: a compact-override hook that lets the memory layer provide its own compaction summary. The default summary is a lossy LLM compression; a memory layer with structured state can produce something higher fidelity.

None of these came from a single author. Each emerged from a specific production failure mode that one of the contributors hit and the others recognized. That is what makes the thread credible.

The fifth layer
Ry Walker, in a survey of agentic-skills frameworks published in February, articulated a four-layer aspirational stack: AGENTS.md for context, SKILL.md for knowledge, methodology for discipline, orchestration for scale. He flagged a gap in his own conclusion: "there's no standard for agent-to-agent coordination, shared state, or cross-agent review."

The #47023 working group is filling that gap. The fifth layer is runtime memory and shared state: the substrate that runs underneath the other four, persists what the agent learned, enforces constraints at the tool-call boundary, and survives the compactor.

Six of the seven thread participants ship production implementations of pieces of this layer:

Author Tool What it ships
Isaac (immartian) Bellamem Belief-graph memory, importance over recency
wazionapps NEXO Brain Persistent memory + identity across sessions
Haustorium12 Continuity v2 Session indexing with PreCompact/PostCompact hooks
SaravananJaichandar world-model-mcp Temporal knowledge graph with constraint enforcement
kehansama AgentRelay Cross-agent memory middleware (pre-public)
Patdolitse piia-engram Local-first persistent identity layer
Notice the variation. Belief graphs, session indexes, temporal knowledge graphs, identity layers, cross-agent middleware. Six different architectures, none competing with each other in any direct sense. The standardization they are asking for is at the lifecycle boundary, not the implementation: give us reliable hooks with structured payloads, and we will each pick the data model and storage substrate that fits our users.

This is the shape of a real category forming. Not a vendor war. Not a winner-take-all consolidation. Builders converging on shared infrastructure because they all need the same seams, and writing the seams together because nobody upstream has written them yet.

What this means for Anthropic
The thread has had no maintainer response in fifty-three days. That is not a complaint. Spec proposals on busy issue trackers often take months to be triaged, and the absence of a response is consistent with the proposal being substantive enough that the right Anthropic engineer has not had the budget to engage with it.

But the substance is now too coherent to ignore. The proposal is no longer one author asking for four hooks. It is six independent production memory layers, three of them with stars in the double or triple digits, converging on the same four hooks, the same payload extensions, and the same correctness primitive. Anthropic could ship this spec straightforwardly and the existing memory-tool ecosystem would adopt it within a release cycle. Or Anthropic could ship something different and the ecosystem will spend another six months reverse-engineering whatever they pick. The thread reads as evidence for the former.

There is also a non-trivial commercial pressure. Claude Managed Agents shipped a built-in memory primitive in April. The self-hosted sandboxes shipped in May with the explicit caveat that built-in memory is not yet supported for self-hosted execution. That caveat is a feature gap that external memory layers are positioned to fill, and the lifecycle hook spec is exactly the surface those external layers need to fill it reliably. The two announcements compose: an open lifecycle spec lets the open-source ecosystem fill the self-hosted memory gap without Anthropic having to ship a parallel implementation.

What this means for builders
If you are building anything in this category, the highest-leverage move available right now is reading #47023 in full and contributing the production edge case you have hit that the existing comments have not yet named. The thread is converging, not because anyone declared a deadline, but because each contribution sharpens the spec by a small specific amount. Add yours.

If you are building agents that use any of the six tools listed above, the spec emerging from the thread is the implicit contract that the next generation of these tools will be built against. Knowing the shape of the contract changes how you evaluate them. A memory layer that ships PreCompact + PostCompact + Stop + SessionStart(source) with per-item provenance is doing something architecturally different from a memory layer that ships generic vector retrieval.

If you are an analyst or VC writing about agent infrastructure, the four-layer aspirational stack is the wrong map. Walker's gap is the right map. The fifth layer is forming in public on a GitHub issue, and the people forming it are the authors of the production tools you should be tracking.

Postscript: how this artifact was produced
This piece is a synthesis of seven months of comments on a single GitHub issue. Every quote and architectural claim above is traceable to a specific named author and a specific comment in the thread. None of the contributors knew this synthesis was coming when they wrote their contributions. I am one of the contributors and I have done my best to flatten my own voice and represent the others' arguments at their strongest, but readers should treat that disclosure as load-bearing.

The thread itself is at https://github.com/anthropics/claude-code/issues/47023. It is open to new contributions.

DEV Community

The fifth layer is forming: six memory-tool authors wrote a Claude Code spec

Top comments (0)