<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saravanan Jaichandaran</title>
    <description>The latest articles on DEV Community by Saravanan Jaichandaran (@saravananj2294).</description>
    <link>https://dev.to/saravananj2294</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943319%2F74793650-b3ea-4bef-b3bb-96dd9417e39e.jpg</url>
      <title>DEV Community: Saravanan Jaichandaran</title>
      <link>https://dev.to/saravananj2294</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saravananj2294"/>
    <language>en</language>
    <item>
      <title>The fifth layer is forming: six memory-tool authors wrote a Claude Code spec</title>
      <dc:creator>Saravanan Jaichandaran</dc:creator>
      <pubDate>Fri, 05 Jun 2026 05:57:36 +0000</pubDate>
      <link>https://dev.to/saravananj2294/the-fifth-layer-is-forming-six-memory-tool-authors-wrote-a-claude-code-spec-23c5</link>
      <guid>https://dev.to/saravananj2294/the-fifth-layer-is-forming-six-memory-tool-authors-wrote-a-claude-code-spec-23c5</guid>
      <description>&lt;p&gt;A GitHub issue on anthropics/claude-code is quietly becoming the most interesting agent-infrastructure document of the year. It is not a roadmap published by Anthropic. It is a spec being written by the people who actually build memory layers for Claude Code, contributing into a thread that the maintainers have not weighed in on.&lt;/p&gt;

&lt;p&gt;The issue is #47023, titled "[PROPOSAL] Expose compact/session lifecycle hooks for external memory layers." It was opened on April 12 by Isaac (immartian), who builds Bellamem. Over the following eight weeks, six other tool authors arrived and contributed substantive comments. By early June it had become a working group writing a four-hook lifecycle spec, complete with payload shapes, edge cases, and a per-item provenance schema. None of the contributors were assigned. They showed up because they all kept hitting the same wall.&lt;/p&gt;

&lt;p&gt;This piece names the wall, walks through who said what, and argues that the working group is implicitly defining a fifth layer of the agentic stack: runtime memory and shared state. The standard taxonomy (AGENTS.md for context, SKILL.md for knowledge, methodology frameworks for discipline, orchestration for scale) does not have a slot for it. Six independent builders are now proving the slot exists.&lt;/p&gt;

&lt;p&gt;The proposal&lt;br&gt;
Isaac opened the thread with a clean technical observation. Claude Code's existing hook surface covers tool-call lifecycle (PreToolUse, PostToolUse) and session boundaries (SessionStart, SessionEnd), but it has no first-class lifecycle event around compaction. The agent's working context can be reset by the model's compactor at any time, dropping facts that an external memory layer had carefully recorded, with no notification, no opportunity to inject a curated summary, and no way to record what was lost.&lt;/p&gt;

&lt;p&gt;His proposal: four hooks, named PreCompact, PostCompact, SessionEnd, and a refined SessionStart that signals via a source enum (compact | resume | startup | clear). Each hook gets structured metadata. PreCompact lets the memory layer save state before compaction; PostCompact lets it inject the right context back in; the enum on SessionStart lets the memory layer differentiate "warm restart from a checkpoint" from "fresh start that should not inject anything."&lt;/p&gt;

&lt;p&gt;That is the entire architectural seam external memory needs. It is also, notably, the seam that Anthropic's built-in memory primitive does not need, because the platform's own memory has internal visibility into the compactor.&lt;/p&gt;

&lt;p&gt;The arrivals&lt;br&gt;
What happened next is the interesting part.&lt;/p&gt;

&lt;p&gt;wazionapps arrived a day later, on April 13, building NEXO Brain. He confirmed transcript_path is unreliable at PreCompact fire time and proposed a Stop + PreCompact two-hook contract: Stop writes a lightweight checkpoint on every turn, PreCompact reads it as a guaranteed-fresh fallback when transcript access fails. This is the kind of nuance you only know if you have run a memory layer in production long enough to see transcript_path come back null mid-session.&lt;/p&gt;
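
&lt;p&gt;A minimal sketch of that two-hook contract, under stated assumptions: the checkpoint file, the function names, and the checkpoint fields are hypothetical and not taken from NEXO Brain. Only the hook names and the transcript_path field come from the thread.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def on_stop(checkpoint_path, session_id, turn, summary):
    """Stop hook: overwrite a small checkpoint at the end of every turn."""
    record = {"session_id": session_id, "turn": turn, "summary": summary}
    Path(checkpoint_path).write_text(json.dumps(record), encoding="utf-8")


def on_pre_compact(payload, checkpoint_path):
    """PreCompact hook: prefer the transcript, fall back to the checkpoint."""
    transcript = payload.get("transcript_path")
    if transcript and Path(transcript).is_file():
        return {"source": "transcript", "path": transcript}
    checkpoint = Path(checkpoint_path)
    if checkpoint.is_file():
        saved = json.loads(checkpoint.read_text(encoding="utf-8"))
        return {"source": "checkpoint", **saved}
    return {"source": "none"}


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    on_stop(ckpt, "demo-session-1", 7, "refactoring the hook adapter")
    # transcript_path comes back null, so the checkpoint is used instead.
    recovered = on_pre_compact({"transcript_path": None}, ckpt)
    print(recovered["source"], recovered["turn"])
```

&lt;p&gt;The point of the sketch is the ordering: the checkpoint is written on every turn, so the fallback never depends on the transcript being readable at the moment compaction fires.&lt;/p&gt;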

&lt;p&gt;junaidtitan arrived on April 17 with a different angle. He is the curator of the awesome-claude-code and related lists, so he sees the entire memory-tool ecosystem from above. His contribution was the recognition that this is a coordination problem: every memory tool implements lifecycle adapter logic, and every one will rediscover the same edge cases unless the surface is documented.&lt;/p&gt;

&lt;p&gt;Haustorium12 arrived on April 26 with Continuity v2, a 32-star memory layer that already ships PreCompact/PostCompact hooks against the current undocumented surface. He sharpened the SessionStart.source enum claim, arguing it needs formal semantics: compact triggers warm restart from checkpoint, resume triggers project state re-orientation, startup and clear deliberately inject nothing. Without that contract, every memory layer reverse-engineers the same branching, gets it subtly wrong, and ships silent regressions on Claude Code updates.&lt;/p&gt;
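
&lt;p&gt;The branching he describes can be sketched as follows. This is an illustration under assumptions, not Continuity's code: the memory dictionary, its keys, and the function name are hypothetical, and the response shape (hookSpecificOutput with additionalContext) is assumed rather than quoted from the thread.&lt;/p&gt;

```python
import json

# Assumed contract, following the semantics proposed in the thread:
# compact and resume inject context; startup and clear inject nothing.
INJECTING_SOURCES = {"compact": "checkpoint", "resume": "project_state"}


def session_start_response(payload, memory):
    """Return hook output for a SessionStart payload, or None to inject nothing."""
    key = INJECTING_SOURCES.get(payload.get("source"))
    if key is None:
        # startup, clear, and unknown values deliberately inject nothing.
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": memory[key],
        }
    }


memory = {
    "checkpoint": "Warm restart: resuming the hook adapter refactor.",
    "project_state": "Project re-orientation: 3 active constraints.",
}
for source in ("compact", "resume", "startup", "clear"):
    print(source, json.dumps(session_start_response({"source": source}, memory)))
```

&lt;p&gt;Treating unknown source values the same as startup keeps the layer from injecting stale context if the enum grows in a later release.&lt;/p&gt;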

&lt;p&gt;By the end of April, four memory-tool authors were converging on the same spec from four different production codebases.&lt;/p&gt;

&lt;p&gt;I arrived on April 29 with world-model-mcp, a temporal knowledge graph with PreToolUse constraint enforcement and PostCompact auto-injection. My contribution was naming the defer enforcement tier (a third decision between deny and warn that pauses headless agents on recurring violations) and pointing at the v0.6.0 transcript-pointer primitive as evidence that the spec was implementable today, at least in part, on the existing hook surface.&lt;/p&gt;

&lt;p&gt;kehansama arrived on June 3 with AgentRelay, a cross-agent memory middleware. He added three concrete spec items the thread had not named: context budget awareness (the memory layer needs to know how many tokens are available post-compaction), session provenance (knowing whether this is a fresh start, a resume, or a continuation), and project scope signals (which files and paths were touched in this session). He also proposed a fifth hook, compact-override, that lets the memory layer provide a replacement compaction summary instead of accepting the default LLM summarization.&lt;/p&gt;

&lt;p&gt;Patdolitse arrived on June 4 with piia-engram, a 162-star local-first persistent identity layer that launched the same day. He made what is, in my reading, the load-bearing technical contribution of the entire thread. His argument:&lt;/p&gt;

&lt;p&gt;The thread has converged on faithfulness of the summary, whether the replace-mode output dropped a key or hallucinated a value. That matters, but it's faithfulness of the summary to the transcript. There's a second axis underneath it, the trustworthiness of the underlying facts themselves. When a memory layer injects recalled context back into the post-compaction window, the model has no way to tell which of those items the user actually confirmed and which the layer inferred on its own and never had checked. Replace mode makes this sharper, because a confidently re-injected inference now carries the same authority as a user-stated fact, and the one place a human could have caught it, the original conversation, is exactly what just got flattened.&lt;/p&gt;

&lt;p&gt;His proposed addition: additionalContext should carry per-item provenance, not just content. At minimum asserted_by (user or agent) and confirmation_state. So PostCompact can return "here are three things I'm restoring that you never actually confirmed" instead of silently promoting inferences to ground truth.&lt;/p&gt;
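
&lt;p&gt;A rough sketch of what per-item provenance could look like on the memory-layer side. The field names asserted_by and confirmation_state come from the comment; the class name, the allowed values ("confirmed" / "unconfirmed"), and the rendered text format are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class RestoredItem:
    content: str
    asserted_by: str          # "user" or "agent"
    confirmation_state: str   # assumed values: "confirmed" or "unconfirmed"


def render_additional_context(items):
    """Group restored items so unconfirmed inferences are visibly flagged."""
    confirmed = [i for i in items if i.confirmation_state == "confirmed"]
    unconfirmed = [i for i in items if i.confirmation_state != "confirmed"]
    lines = ["Confirmed:"]
    lines += [f"- {i.content} (asserted by {i.asserted_by})" for i in confirmed]
    lines.append("Restored but never confirmed:")
    lines += [f"- {i.content} (asserted by {i.asserted_by})" for i in unconfirmed]
    return "\n".join(lines)


items = [
    RestoredItem("Use logger.debug, not console.log", "user", "confirmed"),
    RestoredItem("Deploy target is staging", "agent", "unconfirmed"),
]
print(render_additional_context(items))
```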

&lt;p&gt;That comment turned the thread from a lifecycle-hook proposal into a memory-correctness spec. The two are related, but they are different categories of guarantee. Patdolitse's framing folds them into a single spec.&lt;/p&gt;

&lt;p&gt;What seven people independently agreed on&lt;br&gt;
By June 4, with seven contributors writing into a single GitHub issue and no maintainer participation, the working group had implicitly produced a spec sketch:&lt;/p&gt;

&lt;p&gt;Four lifecycle hooks: Stop (per-turn checkpoint), PreCompact (structured save before flattening), PostCompact (re-injection with receipts), and SessionStart with a normative source enum (compact | resume | startup | clear).&lt;/p&gt;

&lt;p&gt;Three payload extensions: every hook payload carries structured metadata covering context budget (tokens available), project scope signals (files touched), and session provenance (fresh/resume/continuation).&lt;/p&gt;

&lt;p&gt;One correctness primitive: additionalContext carries per-item provenance, not just content. Each restored item declares asserted_by and confirmation_state. The model can then weight a re-injected inference differently from a re-injected user assertion.&lt;/p&gt;

&lt;p&gt;One escape hatch: a compact-override hook that lets the memory layer provide its own compaction summary. The default summary is a lossy LLM compression; a memory layer with structured state can produce something higher fidelity.&lt;/p&gt;

&lt;p&gt;None of these came from a single author. Each emerged from a specific production failure mode that one of the contributors hit and the others recognized. That is what makes the thread credible.&lt;/p&gt;

&lt;p&gt;The fifth layer&lt;br&gt;
Ry Walker, in a survey of agentic-skills frameworks published in February, articulated a four-layer aspirational stack: AGENTS.md for context, SKILL.md for knowledge, methodology for discipline, orchestration for scale. He flagged a gap in his own conclusion: "there's no standard for agent-to-agent coordination, shared state, or cross-agent review."&lt;/p&gt;

&lt;p&gt;The #47023 working group is filling that gap. The fifth layer is runtime memory and shared state: the substrate that runs underneath the other four, persists what the agent learned, enforces constraints at the tool-call boundary, and survives the compactor.&lt;/p&gt;

&lt;p&gt;Six of the seven thread participants ship production implementations of pieces of this layer:&lt;/p&gt;

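&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Author&lt;/th&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;What it ships&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Isaac (immartian)&lt;/td&gt;&lt;td&gt;Bellamem&lt;/td&gt;&lt;td&gt;Belief-graph memory, importance over recency&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;wazionapps&lt;/td&gt;&lt;td&gt;NEXO Brain&lt;/td&gt;&lt;td&gt;Persistent memory + identity across sessions&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Haustorium12&lt;/td&gt;&lt;td&gt;Continuity v2&lt;/td&gt;&lt;td&gt;Session indexing with PreCompact/PostCompact hooks&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;SaravananJaichandar&lt;/td&gt;&lt;td&gt;world-model-mcp&lt;/td&gt;&lt;td&gt;Temporal knowledge graph with constraint enforcement&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;kehansama&lt;/td&gt;&lt;td&gt;AgentRelay&lt;/td&gt;&lt;td&gt;Cross-agent memory middleware (pre-public)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Patdolitse&lt;/td&gt;&lt;td&gt;piia-engram&lt;/td&gt;&lt;td&gt;Local-first persistent identity layer&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Notice the variation. Belief graphs, session indexes, temporal knowledge graphs, identity layers, cross-agent middleware. Six different architectures, none competing with each other in any direct sense. The standardization they are asking for is at the lifecycle boundary, not the implementation: give us reliable hooks with structured payloads, and we will each pick the data model and storage substrate that fits our users.&lt;/p&gt;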

&lt;p&gt;This is the shape of a real category forming. Not a vendor war. Not a winner-take-all consolidation. Builders converging on shared infrastructure because they all need the same seams, and writing the seams together because nobody upstream has written them yet.&lt;/p&gt;

&lt;p&gt;What this means for Anthropic&lt;br&gt;
The thread has had no maintainer response in fifty-three days. That is not a complaint. Spec proposals on busy issue trackers often take months to be triaged, and the absence of a response is consistent with the proposal being substantive enough that the right Anthropic engineer has not had the budget to engage with it.&lt;/p&gt;

&lt;p&gt;But the substance is now too coherent to ignore. The proposal is no longer one author asking for four hooks. It is six independent production memory layers, three of them with stars in the double or triple digits, converging on the same four hooks, the same payload extensions, and the same correctness primitive. Anthropic could ship this spec straightforwardly and the existing memory-tool ecosystem would adopt it within a release cycle. Or Anthropic could ship something different and the ecosystem will spend another six months reverse-engineering whatever they pick. The thread reads as evidence for the former.&lt;/p&gt;

&lt;p&gt;There is also a non-trivial commercial pressure. Claude Managed Agents shipped a built-in memory primitive in April. The self-hosted sandboxes shipped in May with the explicit caveat that built-in memory is not yet supported for self-hosted execution. That caveat is a feature gap that external memory layers are positioned to fill, and the lifecycle hook spec is exactly the surface those external layers need to fill it reliably. The two announcements compose: an open lifecycle spec lets the open-source ecosystem fill the self-hosted memory gap without Anthropic having to ship a parallel implementation.&lt;/p&gt;

&lt;p&gt;What this means for builders&lt;br&gt;
If you are building anything in this category, the highest-leverage move available right now is reading #47023 in full and contributing the production edge case you have hit that the existing comments have not yet named. The thread is converging, not because anyone declared a deadline, but because each contribution sharpens the spec by a small specific amount. Add yours.&lt;/p&gt;

&lt;p&gt;If you are building agents that use any of the six tools listed above, the spec emerging from the thread is the implicit contract that the next generation of these tools will be built against. Knowing the shape of the contract changes how you evaluate them. A memory layer that ships PreCompact + PostCompact + Stop + SessionStart(source) with per-item provenance is doing something architecturally different from a memory layer that ships generic vector retrieval.&lt;/p&gt;

&lt;p&gt;If you are an analyst or VC writing about agent infrastructure, the four-layer aspirational stack is the wrong map. Walker's gap is the right map. The fifth layer is forming in public on a GitHub issue, and the people forming it are the authors of the production tools you should be tracking.&lt;/p&gt;

&lt;p&gt;Postscript: how this artifact was produced&lt;br&gt;
This piece is a synthesis of roughly eight weeks of comments on a single GitHub issue. Every quote and architectural claim above is traceable to a specific named author and a specific comment in the thread. None of the contributors knew this synthesis was coming when they wrote their contributions. I am one of the contributors and I have done my best to flatten my own voice and represent the others' arguments at their strongest, but readers should treat that disclosure as load-bearing.&lt;/p&gt;

&lt;p&gt;The thread itself is at &lt;a href="https://github.com/anthropics/claude-code/issues/47023" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-code/issues/47023&lt;/a&gt;. It is open to new contributions.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>claude</category>
      <category>knowledgegraph</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Five primitives I exercised end-to-end on world-model-mcp's own repo</title>
      <dc:creator>Saravanan Jaichandaran</dc:creator>
      <pubDate>Thu, 21 May 2026 05:36:21 +0000</pubDate>
      <link>https://dev.to/saravananj2294/five-primitives-i-exercised-end-to-end-on-world-model-mcps-own-repo-moo</link>
      <guid>https://dev.to/saravananj2294/five-primitives-i-exercised-end-to-end-on-world-model-mcps-own-repo-moo</guid>
      <description>&lt;p&gt;I shipped four releases of world-model-mcp in twelve days. v0.6.1 to v0.7.2. The pitch is "AI coding agents lose context across compaction, repeat the same mistakes, and hallucinate APIs that do not exist." Before I write more about it I wanted to demonstrate the primitives on a real codebase, with real outputs, not screenshots someone has to take my word for.&lt;/p&gt;

&lt;p&gt;The codebase is the project's own repo. I ran python -m world_model_server.cli setup (it auto-seeded 598 entities from the source), then ran scripts/demo_seed.py which inserts the small set of constraints, facts, and a compaction audit row that real PostToolUse / record_correction hook activity would write organically over one to two weeks of development with Claude Code installed.&lt;/p&gt;

&lt;p&gt;Every output block below is verbatim from the actual SQLite database after running the actual command. You can reproduce every output here by cloning the repo, running python -m world_model_server.cli setup, then python scripts/demo_seed.py. The script is idempotent and supports --dry-run and --reset.&lt;/p&gt;

&lt;p&gt;Install: pip install world-model-mcp. Source: github.com/SaravananJaichandar/world-model-mcp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A learned constraint denying an edit at the PreToolUse boundary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a developer corrects the agent (rewrites console.log to logger.debug), the PostToolUse hook records the diff and infers a rule. Once that rule's violation count crosses the hard-threshold (severity=error, count ≥ 3), the next attempt is denied at PreToolUse before the tool runs.&lt;/p&gt;

&lt;p&gt;The constraint as the graph stores it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "rule_name": "no-console-log",
  "severity": "error",
  "violation_count": 5,
  "description": "Use logger.debug() not console.log() in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.",
  "file_pattern": "*.ts",
  "examples": [
    {"incorrect": "console.log", "correct": "logger.debug"}
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The PreToolUse hook's actual JSON response when an edit containing console.log reaches it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "Hard constraint violation: no-console-log (Use logger.debug() not console.log() in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.). Violated 5 times previously."
  },
  "violations": [
    {
      "rule": "no-console-log",
      "severity": "error",
      "violation_count": 5,
      "is_hard": true,
      "is_defer": false
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Rules in CLAUDE.md or AGENTS.md are advisory and the model treats them as suggestions. Rules with a violation count and an enforcement boundary at the edit step are binding. Both have the same source (a developer correcting the agent) but very different effects.&lt;/p&gt;
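
&lt;p&gt;To make the matching step concrete, here is a minimal sketch of how a stored constraint could be checked against a proposed edit. It is not world-model-mcp's implementation: the check_edit function, the use of fnmatch for file_pattern, and plain substring matching on the "incorrect" example are all assumptions. Only the constraint fields and the threshold (severity=error, count of at least 3) come from the text above.&lt;/p&gt;

```python
from fnmatch import fnmatch

constraint = {
    "rule_name": "no-console-log",
    "severity": "error",
    "violation_count": 5,
    "file_pattern": "*.ts",
    "examples": [{"incorrect": "console.log", "correct": "logger.debug"}],
}

HARD_THRESHOLD = 3  # from the text: severity=error and a count of at least 3


def check_edit(constraint, file_path, new_content):
    """Return a violation record if the edit matches the constraint, else None."""
    if not fnmatch(file_path, constraint["file_pattern"]):
        return None
    # Assumed matching rule: the "incorrect" example appears verbatim in the edit.
    if not any(ex["incorrect"] in new_content for ex in constraint["examples"]):
        return None
    is_hard = (
        constraint["severity"] == "error"
        and constraint["violation_count"] >= HARD_THRESHOLD
    )
    return {
        "rule": constraint["rule_name"],
        "severity": constraint["severity"],
        "violation_count": constraint["violation_count"],
        "is_hard": is_hard,
    }


print(check_edit(constraint, "src/api/client.ts", "console.log('ready')"))
print(check_edit(constraint, "src/api/client.py", "console.log('ready')"))
```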

&lt;p&gt;&lt;strong&gt;2. A regression warning that flags edits to a file with a recorded bug fix&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;get_related_bugs walks decision traces and prior bug-fix facts. When validate_change runs on a file with a recorded fix, the related-bugs query surfaces the prior fix and flags the proposed change.&lt;/p&gt;

&lt;p&gt;The project has a bug fix on file world_model_server/knowledge_graph.py:120-135 for content-hash backfill (the migration logic must run on every initialize(), not just when the column is created). I proposed a refactor that removed the backfill loop and ran the related-bugs check:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "risk_score": 0.6,
  "bugs": [
    {
      "bug_id": "12457e2a-5638-46ec-a9df-02fe13b9c104",
      "description": "Bug fix: NULL content_hash backfill must run on every initialize() to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge_from rows un-hashed and broke dedup.",
      "fixed_at": "2026-05-10T10:17:51.737046",
      "critical_regions": [
        {"file": "world_model_server/knowledge_graph.py", "lines": "120-135"}
      ]
    }
  ],
  "warnings": [
    "Lines 120-135 preserve fix for 12457e2a-5638-46ec-a9df-02fe13b9c104: Bug fix: NULL content_hash backfill must run on every initialize() to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge_from rows un-hashed and broke dedup."
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The risk score is 0.6 because the proposed change touched a critical region without re-implementing the fix. The warning text quotes the original bug description directly so the agent (or the human) can see why the region matters, not just that it does.&lt;/p&gt;
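
&lt;p&gt;The "touched a critical region" test reduces to a line-range overlap check. The sketch below shows only that step, with hypothetical helper names; how world-model-mcp actually derives the 0.6 risk score is not described here, so the sketch does not attempt it.&lt;/p&gt;

```python
def parse_range(text):
    """Turn a "120-135" string into an inclusive (start, end) tuple."""
    start, end = text.split("-")
    return int(start), int(end)


def touched_regions(bug, file_path, edit_start, edit_end):
    """Return the critical regions of a bug that overlap an edited line range."""
    hits = []
    for region in bug["critical_regions"]:
        if region["file"] != file_path:
            continue
        start, end = parse_range(region["lines"])
        # Two inclusive ranges overlap when each one starts before the other ends.
        if end >= edit_start and edit_end >= start:
            hits.append(region)
    return hits


bug = {
    "bug_id": "12457e2a-5638-46ec-a9df-02fe13b9c104",
    "critical_regions": [
        {"file": "world_model_server/knowledge_graph.py", "lines": "120-135"}
    ],
}
path = "world_model_server/knowledge_graph.py"
print(touched_regions(bug, path, 130, 140))  # overlaps lines 130-135
print(touched_regions(bug, path, 10, 40))    # no overlap
```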

&lt;p&gt;&lt;strong&gt;3. A contradiction resolved by confidence + source-count weighting&lt;/strong&gt;&lt;br&gt;
The temporal layer assigns each fact a confidence score, a source_count, and a valid_at timestamp. When two facts about the same entity disagree, find_contradictions surfaces them with both sides' metadata, and resolve_contradiction picks a winner using the strategy you set.&lt;/p&gt;

&lt;p&gt;Two facts both pointing at the same entity (http_transport_port):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "fact_a_id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
  "fact_b_id": "7fe854f9-d64a-4304-b43a-7d1b126c6ebb",
  "fact_a_text": "HTTP transport listen port default is 8080",
  "fact_b_text": "HTTP transport listen port default is 8765",
  "similarity_score": 0.929,
  "both_valid": true,
  "reason": "same entity, similar text",
  "confidence_a": 0.7,
  "confidence_b": 0.95,
  "source_count_a": 1,
  "source_count_b": 3
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;resolve_contradiction(strategy="auto") picks the strategy with the largest signal gap. Here source count differs 3:1, so it picks keep_most_sources:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "strategy": "keep_most_sources",
  "winner_id": "7fe854f9-d64a-4304-b43a-7d1b126c6ebb",
  "loser_id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
  "resolved_at": "2026-05-21T10:24:16.287368"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The loser is updated in place:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
  "fact_text": "HTTP transport listen port default is 8080",
  "status": "superseded",
  "invalid_at": "2026-05-21T10:24:16.285184",
  "confidence": 0.7
}
&lt;/code&gt;&lt;/pre&gt;
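
&lt;p&gt;One way to read "largest signal gap" is as a ratio comparison. The sketch below is an assumption-heavy illustration, not the library's code: the ratio definition of a gap, the helper names, and the strategy name keep_highest_confidence are hypothetical. Only keep_most_sources and the input fields appear in the output above.&lt;/p&gt;

```python
def ratio(x, y):
    """Ratio of the larger value to the smaller; equal values mean no gap."""
    if x == y:
        return 1.0
    low = min(x, y)
    return float("inf") if low == 0 else max(x, y) / low


def pick_strategy(c):
    """Pick the strategy whose signal separates the two facts the most."""
    gaps = {
        # Hypothetical strategy name; not confirmed by the source.
        "keep_highest_confidence": ratio(c["confidence_a"], c["confidence_b"]),
        "keep_most_sources": ratio(c["source_count_a"], c["source_count_b"]),
    }
    return max(gaps, key=gaps.get)


def resolve(c):
    """Return the chosen strategy plus winner and loser ids."""
    strategy = pick_strategy(c)
    field = "source_count" if strategy == "keep_most_sources" else "confidence"
    a_wins = c[field + "_a"] >= c[field + "_b"]
    winner, loser = ("fact_a_id", "fact_b_id") if a_wins else ("fact_b_id", "fact_a_id")
    return {"strategy": strategy, "winner_id": c[winner], "loser_id": c[loser]}


contradiction = {
    "fact_a_id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
    "fact_b_id": "7fe854f9-d64a-4304-b43a-7d1b126c6ebb",
    "confidence_a": 0.7,
    "confidence_b": 0.95,
    "source_count_a": 1,
    "source_count_b": 3,
}
print(resolve(contradiction))
```

&lt;p&gt;With these inputs the confidence ratio is about 1.36 and the source-count ratio is 3, so the source-count strategy wins, which matches the result shown above.&lt;/p&gt;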

&lt;p&gt;Queries that ask "what's true now?" silently skip the superseded fact. Queries that ask "what was true on 2026-05-18?" still see it. That's what the temporal layer earns.&lt;/p&gt;
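
&lt;p&gt;A small sketch of that as-of behavior, using the invalid_at timestamp from the superseded fact. The function names and the "canonical" status value are assumptions, and valid_at is left out for brevity; a fuller version would also check that a fact had become valid by the query time.&lt;/p&gt;

```python
from datetime import datetime

facts = [
    {
        "fact_text": "HTTP transport listen port default is 8080",
        "status": "superseded",
        "invalid_at": "2026-05-21T10:24:16.285184",
    },
    {
        "fact_text": "HTTP transport listen port default is 8765",
        "status": "canonical",  # assumed status value for a live fact
        "invalid_at": None,
    },
]


def visible_at(fact, as_of):
    """A fact is visible if it had not been invalidated by the as_of instant."""
    invalid_at = fact["invalid_at"]
    return invalid_at is None or datetime.fromisoformat(invalid_at) > as_of


def facts_as_of(facts, as_of):
    """Return the text of every fact that was still valid at as_of."""
    return [f["fact_text"] for f in facts if visible_at(f, as_of)]


print(facts_as_of(facts, datetime(2026, 5, 18)))  # both facts still visible
print(facts_as_of(facts, datetime(2026, 5, 22)))  # only the 8765 fact remains
```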

&lt;p&gt;&lt;strong&gt;4. The PostCompact injection bundle&lt;/strong&gt;&lt;br&gt;
v0.7.0 added a PostCompact hook that re-injects the top constraints and recent canonical facts after the agent's context is compacted. The bundle is small (configurable, default ~10 constraints + 10 facts) and prioritized.&lt;/p&gt;

&lt;p&gt;The actual bundle returned by get_injection_context(event_type="PostCompact", max_constraints=5, max_facts=5):&lt;/p&gt;

&lt;h2&gt;
  
  
  Active constraints (top by violation count)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;no-console-log: Use logger.debug() not console.log() in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers. (violated 5x)&lt;/li&gt;
&lt;li&gt;check-twine-before-tag: Run &lt;code&gt;python3 -m twine check dist/*&lt;/code&gt; before tagging. Catches PyPI metadata errors before the tag is pushed; saves a retraction. (violated 5x)&lt;/li&gt;
&lt;li&gt;tag-before-upload: Always run git tag + git push --tags before twine upload. PyPI is permanent; an untagged upload pins a wheel to no git ref. (violated 2x)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recent canonical facts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Never run twine upload before git tag. Always tag, push, then upload to PyPI so the published wheel maps to a real git ref.&lt;/li&gt;
&lt;li&gt;Cursor hooks.json uses object-keyed schema with version: 1 (integer), preToolUse / preCompact / beforeSubmitPrompt event names, failClosed (not fail_open), timeout in seconds.&lt;/li&gt;
&lt;li&gt;PostCompact and UserPromptSubmit hooks emit additionalContext to splice constraints + recent facts back into agent context after compaction.&lt;/li&gt;
&lt;li&gt;HTTP transport defaults to port 8765 in Dockerfile.http; do not change without updating docs/deployment/mcp-tunnel.md and docker-compose.yml together.&lt;/li&gt;
&lt;li&gt;BetaAbstractMemoryTool subclass lives at world_model_server/memory_backend.py; required by the Anthropic SDK Managed Agents memory path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That bundle is what gets spliced into the agent's working context as additionalContext after a compaction event. The same query also runs on UserPromptSubmit, biased toward whatever the user just asked about.&lt;/p&gt;
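
&lt;p&gt;Assembling a bundle of that shape can be sketched in a few lines. This is a hedged illustration rather than the body of get_injection_context: build_bundle is a hypothetical name, the constraint descriptions are abbreviated, and the facts are assumed to arrive already ordered most recent first.&lt;/p&gt;

```python
def build_bundle(constraints, facts, max_constraints=10, max_facts=10):
    """Render top constraints (by violation count) and recent facts as text."""
    top = sorted(constraints, key=lambda c: c["violation_count"], reverse=True)
    lines = ["## Active constraints (top by violation count)"]
    for c in top[:max_constraints]:
        lines.append(
            f"- {c['rule_name']}: {c['description']} (violated {c['violation_count']}x)"
        )
    lines.append("## Recent canonical facts")
    lines += [f"- {text}" for text in facts[:max_facts]]
    return "\n".join(lines)


constraints = [
    {"rule_name": "tag-before-upload", "violation_count": 2,
     "description": "Always tag and push before twine upload."},
    {"rule_name": "no-console-log", "violation_count": 5,
     "description": "Use logger.debug() not console.log()."},
    {"rule_name": "check-twine-before-tag", "violation_count": 5,
     "description": "Run twine check before tagging."},
]
facts = [
    "Never run twine upload before git tag.",
    "HTTP transport defaults to port 8765 in Dockerfile.http.",
]
bundle = build_bundle(constraints, facts, max_constraints=2, max_facts=1)
print(bundle)
```

&lt;p&gt;Capping both lists keeps the injected text small, which matters because the bundle competes with everything else for the post-compaction window.&lt;/p&gt;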

&lt;p&gt;The compaction audit log records what happened, queryable via the CLI:&lt;/p&gt;

&lt;p&gt;$ world-model audit-compactions --limit 5&lt;br&gt;
1 compaction audit rows&lt;br&gt;
  2026-05-21T10:38:01.606771  session=demo-session-1  pre=84320&lt;br&gt;
  post=22150  facts_injected=10  constraints_injected=3  event=PostCompact&lt;/p&gt;

&lt;p&gt;pre=84320, post=22150: the compaction dropped ~62k tokens of context. The injection put 10 facts + 3 constraints back. The audit row exists so a human can later answer "what did the agent see vs what did it lose."&lt;/p&gt;
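
&lt;p&gt;The arithmetic behind that reading can be checked directly from the audit row. The parsing below is an illustrative sketch that treats the row as whitespace-separated key=value pairs; it is not a documented output format of the CLI.&lt;/p&gt;

```python
import re

row = (
    "2026-05-21T10:38:01.606771  session=demo-session-1  pre=84320 "
    "post=22150  facts_injected=10  constraints_injected=3  event=PostCompact"
)

# Collect every key=value pair; the leading timestamp has no "=" and is skipped.
fields = dict(re.findall(r"(\w+)=(\S+)", row))
dropped = int(fields["pre"]) - int(fields["post"])
restored_items = int(fields["facts_injected"]) + int(fields["constraints_injected"])
print(dropped, restored_items)
```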

&lt;p&gt;&lt;strong&gt;5. A defer decision that pauses a headless agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.7.0 added a defer enforcement tier between deny and warn. Warning-severity violations with violation_count ≥ 5 return permissionDecision: "defer" when the client advertises support, so headless agents pause instead of silently passing or hard-blocking. Clients that do not advertise support fall back to ask automatically.&lt;/p&gt;

&lt;p&gt;I have a check-twine-before-tag constraint with violation_count=5, severity=warning. When a Bash tool input matches it, the hook returns:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "defer",
    "permissionDecisionReason": "Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the same payload and the same constraint, but supports_defer: false in the request, the hook falls back to ask:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "ask",
    "permissionDecisionReason": "Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The defer tier exists because the binary deny / warn choice forces you to either be too strict or too permissive. Recurring warnings that don't rise to error-level should pause for a human, not block, not pass.&lt;/p&gt;
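
&lt;p&gt;Put together, the tiers described in this post amount to a small decision function. The sketch below uses the thresholds stated above (error with a count of at least 3, warning with a count of at least 5); the function name and the None return for below-threshold cases are assumptions, since the post does not show what the hook emits in that case.&lt;/p&gt;

```python
def permission_decision(severity, violation_count, supports_defer):
    """Map a matched constraint to a PreToolUse decision, or None if no tier applies."""
    if severity == "error" and violation_count >= 3:
        return "deny"
    if severity == "warning" and violation_count >= 5:
        # Clients that do not advertise defer support fall back to ask.
        return "defer" if supports_defer else "ask"
    # Assumed: below both thresholds the hook makes no blocking decision.
    return None


for case in [("error", 5, True), ("warning", 5, True),
             ("warning", 5, False), ("warning", 2, True)]:
    print(case, permission_decision(*case))
```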

&lt;p&gt;What this means if you are building agents&lt;/p&gt;

&lt;p&gt;The reason this works is not that the tool is clever. It is that the substrate — a temporal knowledge graph with facts, constraints, contradictions, and decision traces — captures the right shape of information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plain markdown rules in CLAUDE.md cannot answer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many times has this rule been violated?&lt;/li&gt;
&lt;li&gt;Which fact is true now vs three sessions ago?&lt;/li&gt;
&lt;li&gt;Which constraints should I re-inject after compaction?&lt;/li&gt;
&lt;li&gt;Which prior fix does this proposed change risk re-introducing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A graph can. The cost is one MCP server, ~2,000 lines of Python, and a SQLite database that sits at ~155 KB empty (mine grew to about 2 MB after running this exercise plus the auto-seed). The payoff is a memory layer that survives compaction, enforces at the edit boundary, and tracks evidence chains back to the source.&lt;/p&gt;

&lt;p&gt;If you are building anything with Claude Code, Cursor, or any harness that supports MCP + hooks:&lt;/p&gt;

&lt;p&gt;pip install world-model-mcp&lt;br&gt;
cd /your/project&lt;br&gt;
python -m world_model_server.cli setup&lt;/p&gt;

&lt;p&gt;For Claude Managed Agents with self-hosted sandboxes (where Anthropic's built-in Memory primitive is not yet supported), v0.7.2 added streamable HTTP transport so the same 25 MCP tools also work behind an MCP tunnel.&lt;/p&gt;

&lt;p&gt;Source: github.com/SaravananJaichandar/world-model-mcp.&lt;/p&gt;

&lt;p&gt;If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>claude</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
