Rahul Reddy Talatala

Posted on May 31

prism-mem: Automatic Knowledge Extraction for AI Coding Agents

#ai #coding #agents #programming

AI coding agents are stateless between sessions. Every time you start a new session, the agent knows nothing about what you built yesterday, why you made certain decisions, or what you explicitly decided to stop doing. You write a CLAUDE.md by hand, it goes stale after a few sessions, and you spend the first few minutes of every session re-explaining context the agent should already know.

In a 3-session demo on a real project, prism-mem compressed 411,463 bytes of raw Claude Code transcripts into 11,707 characters of structured, queryable knowledge -- a 35x reduction -- with zero manual updates. The agent that started session 4 had accurate context from everything that happened in sessions 1, 2, and 3, including a database migration it never witnessed.

prism-mem does this by reading what actually happened instead of asking you to write it down.

🤔 The Problem With Manual Context Files

Here is what typically happens on a real project:

Session 1: You write a clean CLAUDE.md. Tech stack, decisions, conventions. It is accurate.
Session 3: You migrated from SQLite to PostgreSQL. You forgot to update CLAUDE.md.
Session 5: The agent confidently uses SQLite APIs because that is what the context file says.

Manual context files are snapshots. They reflect what you thought was important enough to write down at the moment you wrote them. They do not capture the dozens of small decisions that happen during a session: the library you tried and abandoned, the architecture pattern you switched away from, the refactor that changed the data model.

prism-mem reads session transcripts and git diffs after every commit, extracts structured knowledge as (subject, predicate, object) triples, links them into a graph, automatically detects when old facts are contradicted by new ones, and regenerates your context files. No manual input at any step.

📖 Where the Idea Comes From

Two research papers shaped how prism-mem works.

Memori (arXiv:2603.19935, March 2026) showed that storing memories as semantic triples instead of raw text blobs leads to 67% fewer tokens consumed at retrieval and more precise answers. The core insight: when an agent needs context, it does not need the full transcript of what happened. It needs the facts extracted from that transcript, in a compact structured form. Triples are that form.

A-MEM (arXiv:2502.12110, NeurIPS 2025) introduced the idea that memories should evolve and link to each other rather than just accumulate. It draws from the Zettelkasten method: every new note connects to existing notes, and old notes get updated when new information contradicts them. prism-mem's graph edge creation and staleness detection come directly from this idea. When the triple (snap-url, uses, PostgreSQL) arrives, the old (snap-url, uses, SQLite) triple does not get deleted. It gets marked stale, and an edge connects the two, preserving the history of the decision change.

Neither paper had a cross-agent, open-source, installable implementation. prism-mem is the attempt to build one.

⚙️ How It Works

The main entry point is the prism crystallize command in prism_mem/cli.py. It runs four phases in sequence. Each phase is self-contained and testable independently.

Phase 1: Ingest

prism reads from two sources: Claude Code session transcripts and git history.

Session transcripts are handled by read_latest_session() in prism_mem/ingestion/session_reader.py. Claude Code stores every session as a .jsonl file under ~/.claude/projects/<encoded-path>/. Each line in the file is a JSON object representing a single event: a user message, an assistant response, a tool call result, or a summary. prism parses all of these, extracts the text and thinking blocks, and also merges in any subagent transcripts found at <session-uuid>/subagents/*.jsonl. The final output is a flat list of chunks, each with role, content_type, content, timestamp, session_id, and source.
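A minimal sketch of the transcript-flattening step follows. It assumes a Claude Code JSONL layout where each event carries message.role, message.content (either a string or a list of typed blocks), timestamp, and sessionId; that schema, and the names Chunk, parse_jsonl(), and read_session(), are assumptions for illustration rather than prism's actual code. Summary events and tool-use/tool-result blocks are simply skipped here.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    role: str
    content_type: str  # "text" or "thinking"
    content: str
    timestamp: str
    session_id: str
    source: str  # file the chunk was read from


def parse_jsonl(path: Path) -> list[Chunk]:
    """Flatten one transcript file into text/thinking chunks (assumed schema)."""
    chunks: list[Chunk] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        message = event.get("message")
        if not isinstance(message, dict):
            continue  # e.g. summary events; skipped in this sketch
        content = message.get("content")
        # Content is either a bare string or a list of typed blocks.
        blocks = [{"type": "text", "text": content}] if isinstance(content, str) else (content or [])
        for block in blocks:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            if kind not in ("text", "thinking"):
                continue  # tool_use / tool_result blocks are dropped
            text = block.get(kind) or ""  # text lives under "text" or "thinking"
            if text.strip():
                chunks.append(Chunk(
                    role=message.get("role", event.get("type", "unknown")),
                    content_type=kind,
                    content=text,
                    timestamp=event.get("timestamp", ""),
                    session_id=event.get("sessionId", ""),
                    source=str(path),
                ))
    return chunks


def read_session(session_file: Path) -> list[Chunk]:
    """Main transcript plus any <session-uuid>/subagents/*.jsonl files."""
    chunks = parse_jsonl(session_file)
    subagent_dir = session_file.with_suffix("") / "subagents"
    for sub in sorted(subagent_dir.glob("*.jsonl")):  # empty if the dir is missing
        chunks.extend(parse_jsonl(sub))
    return chunks
```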

Critically, prism uses watermark-based incremental ingestion. The session_watermarks table in SQLite stores the last processed timestamp for each session file. On subsequent runs, get_session_watermark() retrieves this timestamp and read_latest_session() skips any chunks with an earlier timestamp. This means you never re-process content you have already seen, and the pipeline stays fast even on long-running projects.
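The watermark logic can be sketched with the standard sqlite3 module. get_session_watermark() is named above; set_session_watermark(), new_chunks(), ingest_incrementally(), and the column names are hypothetical. The sketch assumes every chunk has an ISO-8601 UTC timestamp string in one consistent format, so plain string comparison orders them correctly, and it skips chunks at or before the stored watermark.

```python
import sqlite3


def init_watermarks(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_watermarks ("
        " session_file TEXT PRIMARY KEY,"
        " last_timestamp TEXT NOT NULL)"
    )


def get_session_watermark(conn: sqlite3.Connection, session_file: str):
    row = conn.execute(
        "SELECT last_timestamp FROM session_watermarks WHERE session_file = ?",
        (session_file,),
    ).fetchone()
    return row[0] if row else None


def set_session_watermark(conn: sqlite3.Connection, session_file: str, ts: str) -> None:
    conn.execute(
        "INSERT INTO session_watermarks (session_file, last_timestamp) VALUES (?, ?) "
        "ON CONFLICT(session_file) DO UPDATE SET last_timestamp = excluded.last_timestamp",
        (session_file, ts),
    )
    conn.commit()


def new_chunks(chunks, watermark):
    # ISO-8601 UTC strings in one format sort chronologically as plain strings.
    if watermark is None:
        return list(chunks)
    return [c for c in chunks if c["timestamp"] > watermark]


def ingest_incrementally(conn, session_file, chunks, process) -> int:
    fresh = new_chunks(chunks, get_session_watermark(conn, session_file))
    if fresh:
        process(fresh)  # if this raises, the watermark is not advanced
        set_session_watermark(conn, session_file, max(c["timestamp"] for c in fresh))
    return len(fresh)
```

Advancing the watermark only after process() returns means a failed run re-reads the same chunks next time instead of silently dropping them.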

Git history is handled by read_git_diff(), read_git_log(), and read_git_head() in prism_mem/ingestion/git_reader.py. These run git diff HEAD~1 HEAD, git log --oneline -20, and git rev-parse HEAD as subprocesses. The HEAD commit hash is checked against the processed_commits table. If it is already there, the git phase is skipped entirely. The hash is recorded after successful processing via mark_commit_processed().
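The git phase and its commit-level dedup might look like the following. read_git_diff(), read_git_log(), read_git_head(), and mark_commit_processed() are named in the text, but these bodies, plus run_git(), is_commit_processed(), git_phase(), and the commit_hash column, are assumptions. One edge case worth handling: git diff HEAD~1 HEAD fails on a repository's first commit because HEAD~1 does not exist, so the sketch treats any git failure as empty output.

```python
import sqlite3
import subprocess


def run_git(repo: str, *args: str) -> str:
    """Run a git command; return stdout, or "" if git or the repo path is unusable."""
    try:
        result = subprocess.run(["git", *args], cwd=repo, capture_output=True,
                                text=True, check=False)
    except FileNotFoundError:  # git not installed, or repo path does not exist
        return ""
    return result.stdout if result.returncode == 0 else ""


def read_git_head(repo: str) -> str:
    return run_git(repo, "rev-parse", "HEAD").strip()


def read_git_diff(repo: str) -> str:
    return run_git(repo, "diff", "HEAD~1", "HEAD")  # "" on the root commit


def read_git_log(repo: str) -> str:
    return run_git(repo, "log", "--oneline", "-20")


def init_commits(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS processed_commits (commit_hash TEXT PRIMARY KEY)")


def is_commit_processed(conn: sqlite3.Connection, commit: str) -> bool:
    row = conn.execute("SELECT 1 FROM processed_commits WHERE commit_hash = ?",
                       (commit,)).fetchone()
    return row is not None


def mark_commit_processed(conn: sqlite3.Connection, commit: str) -> None:
    conn.execute("INSERT OR IGNORE INTO processed_commits (commit_hash) VALUES (?)", (commit,))
    conn.commit()


def git_phase(conn: sqlite3.Connection, repo: str, process) -> bool:
    head = read_git_head(repo)
    if not head or is_commit_processed(conn, head):
        return False  # not a repo, or this commit was already ingested
    process(read_git_diff(repo) + "\n" + read_git_log(repo))
    mark_commit_processed(conn, head)  # recorded only after processing succeeds
    return True
```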

Phase 2: Extract

extract_triples() in prism_mem/extraction/extractor.py passes the combined session and git text to kg-gen, a library built specifically for extracting knowledge graphs from unstructured text. kg-gen handles chunking (8,000 characters per chunk), LLM calls via LiteLLM, and entity clustering. The output is a list of (subject, predicate, object) tuples.

Triples from a real session look like this:

("snap-url", "uses", "PostgreSQL")
("snap-url API", "handles", "authentication via JWT")
("Redis", "stores", "rate limiting state")
("prism-mem", "embeds triples using", "all-MiniLM-L6-v2")

kg-gen runs with cluster=True by default. Clustering groups similar entity names so "PostgreSQL", "Postgres", and "the database" all resolve to the same canonical entity. If the LLM returns a malformed entity (which triggers a Pydantic validation error in kg-gen), extract_triples() catches the exception and retries the same chunk with cluster=False. Triples are still extracted cleanly in the fallback path, and cosine similarity linking in the next phase handles residual duplication.
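The fallback in extract_triples() reduces to a per-chunk try/except. To avoid guessing at kg-gen's API, this sketch takes the per-chunk extractor as a callable; extract_chunk(chunk_text, cluster) is a hypothetical signature, and a real version would wrap the kg-gen call there. It catches ValueError by default because Pydantic's ValidationError is a ValueError subclass.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]
# Hypothetical signature: (chunk_text, cluster) -> triples for that chunk.
ChunkExtractor = Callable[[str, bool], Iterable[Triple]]


def extract_with_fallback(
    chunks: Iterable[str],
    extract_chunk: ChunkExtractor,
    retry_on: tuple[type[BaseException], ...] = (ValueError,),
) -> list[Triple]:
    triples: list[Triple] = []
    for chunk in chunks:
        try:
            triples.extend(extract_chunk(chunk, True))
        except retry_on:
            # Clustering choked on a malformed entity: retry this chunk unclustered.
            triples.extend(extract_chunk(chunk, False))
    return triples
```

Only the failing chunk loses clustering; every other chunk keeps the clustered path.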

Phase 3: Store and Link 🔗

This is where the knowledge graph actually gets built. For each extracted triple, ingest_triple() in prism_mem/linking/linker.py runs three operations in a specific order.

Step 1 (Store): store_triple() in prism_mem/storage/db.py inserts the triple into the triples table and calls embed() to compute a 384-dimensional vector embedding using sentence-transformers/all-MiniLM-L6-v2. The model runs entirely locally (no API call needed) and returns normalized float32 vectors. The embedding is stored as a binary blob in the embeddings table using the struct format string 384f (384 float32 values, 1,536 bytes per triple). vec_from_bytes() unpacks it back to a numpy array at query time.
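The embedding round trip can be sketched with struct and numpy. vec_to_bytes() is a hypothetical name; vec_from_bytes() mirrors the function named above, but its body here is an assumption. The commented lines show how an embedding could be produced with the sentence-transformers package; they stay commented so the sketch runs without downloading the model.

```python
import struct

import numpy as np

DIM = 384  # output size of all-MiniLM-L6-v2

# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# vec = model.encode("snap-url uses PostgreSQL", normalize_embeddings=True)


def vec_to_bytes(vec) -> bytes:
    # "384f": 384 native-order float32 values, 4 bytes each -> 1,536-byte blob.
    return struct.pack(f"{DIM}f", *vec)


def vec_from_bytes(blob: bytes) -> np.ndarray:
    vec = np.frombuffer(blob, dtype=np.float32)  # same native float32 layout
    if vec.shape != (DIM,):
        raise ValueError(f"expected {DIM} floats, got {vec.size}")
    return vec
```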

The choice of numpy over sqlite-vec is intentional. sqlite-vec requires enable_load_extension, which is disabled in many Python builds, including pyenv installs and the system Python on macOS. At prism's scale (thousands of triples), a numpy O(n) cosine scan takes around 50ms. The portability cost of sqlite-vec is not worth the index speed at this scale.
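The portability concern is easy to probe at runtime. In CPython builds compiled without loadable-extension support, sqlite3.Connection simply has no enable_load_extension method; the helper name can_load_sqlite_extensions() is hypothetical.

```python
import sqlite3


def can_load_sqlite_extensions() -> bool:
    """True if this interpreter's sqlite3 module allows loading extensions."""
    # Builds without loadable-extension support lack the method entirely.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```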

Step 2 (Link): search_similar() loads all existing embeddings from the database into a numpy matrix and computes cosine similarity against the new triple's embedding in a single matrix dot product. Because the embeddings are pre-normalized, the dot product is equivalent to cosine similarity. Pairs above the similarity threshold get recorded as edges in the edges table via create_edge(), with the similarity score stored as the edge weight.
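A sketch of the linking step, assuming all stored embeddings are already unit-length rows of one matrix. The function names follow the text, but the signatures, the 0.75 threshold, and the in-memory edges list standing in for create_edge() are assumptions; the post does not state the actual threshold. Because the new triple is stored before linking, the sketch skips the triple's match against itself.

```python
import numpy as np


def search_similar(query, ids, matrix, threshold=0.75):
    """Return (id, cosine) for rows scoring above threshold, best first.

    matrix: (n, d) array of L2-normalized embeddings, one row per stored triple.
    query:  (d,) L2-normalized embedding of the new triple.
    """
    if len(ids) == 0:
        return []
    sims = matrix @ query  # dot product of unit vectors == cosine similarity
    order = np.argsort(-sims)
    return [(ids[i], float(sims[i])) for i in order if sims[i] > threshold]


def link_new_triple(new_id, query, ids, matrix, edges, threshold=0.75):
    """Append (new_id, other_id, weight) edges; stands in for create_edge()."""
    for other_id, score in search_similar(query, ids, matrix, threshold):
        if other_id != new_id:  # the new triple is already stored: skip self-match
            edges.append((new_id, other_id, score))
```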

Step 3 (Stale detection): check_and_mark_stale() queries for triples that share the same subject and predicate but have a different object. When it finds one, it calls mark_stale() to flip the stale boolean on the old triple. This is how prism knows that (snap-url, uses, SQLite) is no longer current after (snap-url, uses, PostgreSQL) is ingested.
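The stale rule can be expressed as a single UPDATE. This sketch assumes a triples table with an integer stale flag; the column names and the return value (the number of triples newly marked stale) are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS triples (
    id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    stale INTEGER NOT NULL DEFAULT 0
)
"""


def check_and_mark_stale(conn: sqlite3.Connection, new_id: int) -> int:
    """Mark active triples with the new triple's subject+predicate but another object."""
    subject, predicate, obj = conn.execute(
        "SELECT subject, predicate, object FROM triples WHERE id = ?", (new_id,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE triples SET stale = 1 "
        "WHERE subject = ? AND predicate = ? AND object != ? AND id != ? AND stale = 0",
        (subject, predicate, obj, new_id),
    )
    conn.commit()
    return cur.rowcount
```

Applied literally, this rule treats any two objects under the same subject and predicate as conflicting, so a predicate that legitimately holds several values at once (snap-url uses both PostgreSQL and Redis) would need extra handling, for example a list of single-valued predicates. The sketch does not attempt that.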

The ordering here matters. Linking runs before staleness marking, so edges are created while the old triple is still active. This preserves the graph structure: the connection between the old SQLite triple and the new PostgreSQL triple stays visible in the graph as evidence of the migration.

Phase 4: Generate 📄

write_constitution() in prism_mem/constitution/generator.py calls select_top_triples() first. This takes all non-stale triples and scores each one using:

score = 1 / (1 + age_seconds / 86400) + confidence

The recency component decays hyperbolically on a one-day scale: it is 1.0 for a triple created just now, 0.5 after one day, and about 0.33 after two days. The confidence component is set by kg-gen during extraction based on how many chunks corroborated the triple. The top 30 triples by score are selected, formatted as a bullet list of (subject) [predicate] (object) facts, and passed to the LLM via _call_haiku().
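A sketch of select_top_triples() using the scoring formula above. The dictionary keys (created_at as a Unix timestamp, confidence, stale) and format_facts() are assumptions; passing now explicitly keeps the scoring deterministic.

```python
import time


def score(created_at: float, confidence: float, now: float) -> float:
    age_seconds = max(0.0, now - created_at)
    recency = 1.0 / (1.0 + age_seconds / 86400)  # 1.0 now, 0.5 after one day
    return recency + confidence


def select_top_triples(triples, now=None, limit=30):
    now = time.time() if now is None else now
    active = [t for t in triples if not t["stale"]]
    active.sort(key=lambda t: score(t["created_at"], t["confidence"], now), reverse=True)
    return active[:limit]


def format_facts(triples) -> str:
    return "\n".join(f"- ({t['subject']}) [{t['predicate']}] ({t['object']})" for t in triples)
```

Because the two terms are simply added, a high-confidence fact from yesterday (0.5 + 0.9) still outranks a low-confidence fact from a minute ago (about 1.0 + 0.2).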

The LLM is called three times with different prompts, producing three files:

CLAUDE.md: Full structured documentation covering project description, tech stack, architecture, decisions, and conventions
.cursorrules: Under 40 lines of imperative bullets for Cursor IDE
AGENTS.md: Agent-oriented guide with repo layout, build commands, and behavioral rules

All three files are written to the project root. Every session that follows reads a context file derived from the actual current state of the project.

📊 Demo Results: 3 Sessions on snap-url

snap-url is a URL shortening service built with FastAPI and Python. You POST a long URL, get a 6-character short code back, and the redirect path hits Redis first for sub-millisecond cache lookups with a fallback to PostgreSQL. Click analytics are recorded asynchronously via background tasks without adding latency to the redirect.

The project was built across three sessions:

Session 1: Core API. POST /shorten generates short codes from a 62-symbol alphabet, GET /{code} handles redirects, SQLAlchemy models backed by SQLite for local development
Session 2: Click analytics. Added a Click model, a record_click background task, and a GET /stats/{code} endpoint that returns total clicks, last-24h count, and per-click metadata
Session 3: Redis caching layer and a PostgreSQL migration. Added cache.py with a 0.5-second timeout and graceful DB fallback, then swapped SQLite out for PostgreSQL in production
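One way to produce 6-character codes over a 62-symbol alphabet (digits plus upper- and lowercase ASCII letters) is to draw them at random with the secrets module. The post does not say whether snap-url draws codes randomly or encodes a counter, so treat generate_code() as a hypothetical illustration.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 10 + 52 = 62 symbols


def generate_code(length: int = 6) -> str:
    # Each position is drawn independently and uniformly from ALPHABET.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Six base62 symbols give 62^6 = 56,800,235,584 possible codes, which makes collisions rare but not impossible, so an insert should still check for an existing code and retry.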

That last session is where the staleness story gets interesting.

Here are the numbers after running prism across all three sessions:

411,463 bytes of raw transcripts in. 11,707 characters of structured knowledge out. That is a 35x compression ratio.

Metric	Value
Raw session transcript size	411,463 bytes
Structured knowledge (active triples)	11,707 characters
Compression ratio	35x
Total triples extracted	700
Active triples	380
Stale triples (auto-detected contradictions)	320 (45.7%)
Graph edges (cross-session links)	416
Pipeline time	5 min 7 sec
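The headline ratios can be recomputed from the table. The numerator is in bytes and the denominator in characters; the two units coincide only for ASCII text, so the 35x figure is approximate.

```python
raw_transcript_bytes = 411_463
knowledge_chars = 11_707
total_triples, active_triples, stale_triples = 700, 380, 320

compression = raw_transcript_bytes / knowledge_chars  # about 35.15
stale_share = stale_triples / total_triples            # about 0.457
print(f"compression: {compression:.1f}x")  # compression: 35.1x
print(f"stale share: {stale_share:.1%}")   # stale share: 45.7%
```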

What the 35x compression actually means: A raw session transcript is full of tool call outputs, retried commands, syntax errors, intermediate reasoning, and large file reads. prism throws all of that away and keeps only the structured facts: what the project uses, what was decided, and what changed. The 11,707-character output is what an agent actually needs at session start.

What 320 stale triples means: Nearly half of everything extracted was eventually superseded by newer information. This is not a signal of poor extraction. It is evidence that the graph is tracking decisions over time. A flat CLAUDE.md would have accumulated all 700 facts with no way to tell which ones were still true. The staleness mechanism is what separates a live knowledge graph from a snapshot.

What 416 edges means: Facts from different sessions are linked by semantic similarity. A triple about the PostgreSQL migration connects to the schema design triple from two sessions earlier. A triple about Redis connects to the caching strategy decision. These links make the graph queryable across time, not just within a single session.

🔄 The Staleness Proof

This is the clearest way to see what prism actually does:

After session 1:

(snap-url) [uses] (SQLite)   ← active

After session 3 (PostgreSQL migration):

(snap-url) [uses] (SQLite)       ← stale, automatically detected
(snap-url) [uses] (PostgreSQL)   ← active
(snap-url) [uses] (Redis)        ← active

When check_and_mark_stale() encountered (snap-url, uses, PostgreSQL), it found the existing triple (snap-url, uses, SQLite) with the same subject and predicate but a different object. It marked the old triple stale without any manual input, config change, or prompt engineering.

The CLAUDE.md written after session 3 says PostgreSQL. The one written after session 1 said SQLite. Neither required a human to update it.

🔌 Using prism Across Agents

prism ships a Model Context Protocol (MCP) server in prism_mem/server/mcp_server.py built with FastMCP. It exposes exactly three tools:

Tool	What it does
`get_context()`	Returns the current CLAUDE.md content for injection at session start
`query_knowledge(question)`	Embeds the question, runs `search_similar()` over the triple graph, returns top-5 triples with cosine similarity scores
`crystallize(session_id)`	Spawns `prism crystallize` as a background subprocess and returns immediately
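Returning immediately from crystallize() only requires starting the CLI without waiting on it. A minimal sketch with subprocess.Popen follows. spawn_background() is hypothetical, start_new_session only has an effect on POSIX, and the sketch passes the project path explicitly; how session_id reaches the CLI is not described in the post, so it is left unused. The commented registration uses the MCP Python SDK's FastMCP decorator style as one possible way to expose the tool, not necessarily how prism_mem/server/mcp_server.py is written.

```python
import subprocess


def spawn_background(cmd: list[str]) -> int:
    """Start cmd without waiting for it and return the child's PID."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: detach from the server's session
    )
    return proc.pid


def crystallize(session_id: str, project: str) -> str:
    # session_id is unused: the post does not say how it is passed to the CLI.
    pid = spawn_background(["prism", "crystallize", "--project", project])
    return f"crystallize started in background (pid {pid})"


# Possible MCP registration (MCP Python SDK), shown for illustration only:
# from mcp.server.fastmcp import FastMCP
# mcp = FastMCP("prism")
# mcp.tool()(crystallize)
```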

query_knowledge is the most useful for mid-session use. An agent can ask "what database does this project use?" and get back a ranked list of relevant triples instead of having to read the whole context file.

Claude Code:

claude mcp add prism -- prism serve --project /path/to/your/project

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "prism": {
      "command": "prism",
      "args": ["serve", "--project", "/path/to/your/project"]
    }
  }
}

Codex and other MCP-compatible agents: The protocol is the same. Point the server config at prism serve and the three tools are available.

The Graph UI

prism ui --project /path/to/your/project starts a FastAPI server at http://localhost:7823 with three views:

/constitution: The current CLAUDE.md with a Regenerate button to rebuild it on demand
/memory: A searchable table of all triples, filterable by text, with active and stale badges and truncated session IDs
/graph: An interactive force-directed graph rendered by Pyvis + vis.js. Click any node to see a sidebar with all contributing session IDs and the active/stale status of each connected triple

The graph view is especially useful for debugging extraction quality: you can see which triples are well-connected (high confidence, corroborated across sessions) and which are isolated (extracted once, low confidence).

🚀 Install and Setup

Install:

pip install prism-mem

Configure your LLM provider (Anthropic, OpenAI, Gemini, Ollama, and 20+ others via LiteLLM):

prism config set provider anthropic
prism config set model claude-haiku-4-5-20251001
prism config set api-key <your-api-key>

Run the first crystallization manually:

prism crystallize --project /path/to/your/project

This takes 3-10 minutes depending on session length. The bottleneck is kg-gen's LLM calls for extraction. On subsequent runs, watermark caching skips content that was already processed, so incremental runs are much faster.

Install the post-commit hook so crystallization runs automatically after every commit:

prism hook install --project /path/to/your/project

The hook calls prism crystallize in the background (&) so it does not block your commit. prism hook uninstall removes only the prism block, leaving any other hook content intact.
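The uninstall guarantee is easiest to get by wrapping prism's lines in marker comments. This sketch assumes the hook lives at .git/hooks/post-commit (ignoring core.hooksPath) and uses made-up marker strings; the exact hook body prism writes is not shown in the post.

```python
import shlex
from pathlib import Path

BEGIN = "# >>> prism-mem >>>"  # hypothetical marker lines
END = "# <<< prism-mem <<<"


def prism_block(project: str) -> str:
    # Trailing & backgrounds the run so the commit returns immediately.
    cmd = f"prism crystallize --project {shlex.quote(project)} >/dev/null 2>&1 &"
    return f"{BEGIN}\n{cmd}\n{END}\n"


def install_hook(hook: Path, project: str) -> None:
    text = hook.read_text() if hook.exists() else "#!/bin/sh\n"
    if BEGIN in text:
        return  # already installed; keep install idempotent
    if not text.endswith("\n"):
        text += "\n"
    hook.write_text(text + prism_block(project))
    hook.chmod(hook.stat().st_mode | 0o111)  # git skips non-executable hooks


def uninstall_hook(hook: Path) -> None:
    """Remove only the lines between (and including) the markers."""
    if not hook.exists():
        return
    kept, inside = [], False
    for line in hook.read_text().splitlines(keepends=True):
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    hook.write_text("".join(kept))
```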

Start the MCP server for your agent:

prism serve --project /path/to/your/project

Open the graph UI:

prism ui --project /path/to/your/project

Storage layout:

~/.prism/
└── projects/
    └── <project-hash>/
        └── graph.db     ← SQLite, five tables, one database per project

Everything lives locally. No cloud service, no telemetry, no shared state.

What This Enables

An agent that starts a new session and calls get_context gets back a CLAUDE.md that was built from what actually happened in previous sessions, scored by recency and confidence, with stale facts already removed. It does not matter if the last session was on Claude Code and this one is on Cursor. The knowledge graph is independent of the agent.

The 35x compression and 45.7% staleness detection rate from the snap-url demo are not the interesting numbers. The interesting number is zero: the number of times a human had to manually update a context file across three sessions and a database migration.

The agent that starts tomorrow does not start from scratch.

🛣️ What's Next

The current version handles one developer, one machine, post-commit extraction from Claude Code sessions. Here is what would make it significantly more useful:

Multi-agent ingestion. Right now prism only reads Claude Code .jsonl transcripts. Cursor, Codex, Gemini CLI, and Windsurf all produce their own session formats. Supporting them on the ingestion side (not just the MCP read side) would make the knowledge graph richer without any extra work from the user.

Retrieval frequency scoring. The current triple scoring formula uses recency and confidence. The original design also included retrieval_count -- how often an agent queried a given triple via query_knowledge. Triples that get looked up repeatedly are clearly load-bearing. Adding this signal would push the most-referenced facts to the top of the constitution more reliably than recency alone.

Cross-project knowledge. Each project currently lives in its own isolated graph.db. If two projects share a library, a pattern, or a team convention, there is no way to surface that. A lightweight cross-project index could let you query "have I solved this problem before?" across your entire project history.

Constitution diff view. After each crystallization, the only way to see what changed is to open CLAUDE.md and read it. A diff view showing which triples were added, which went stale, and how the constitution changed between runs would make it much easier to verify the pipeline is working correctly and trust the output.

Team support. The v1 scope is deliberately single-user, single-machine. The obvious next step is merging knowledge graphs from multiple developers on the same project -- each developer's sessions contribute triples, conflicts are resolved by staleness and confidence, and the team shares one constitution. This is the harder problem but also the most valuable one for larger projects.

Smarter conflict resolution. Right now staleness is binary: same subject and predicate, different object, the old triple loses. This works well for clear-cut migrations (SQLite to PostgreSQL) but less well for nuanced decisions where both old and new triples might be partially true. A confidence-weighted resolution that surfaces conflicts for human review rather than auto-marking would reduce false staleness on ambiguous facts.

prism-mem is open source. Source at github.com/rahult18/prism-mem. Install via pip install prism-mem.

Top comments (2)

Gilder Miller • May 31

This is a good snapshot of where multi-agent systems actually break in practice.
The failure point is usually not bad reasoning inside a single agent. It’s state divergence once work is split. Once that happens, every downstream step is just amplifying inconsistent assumptions.

The hard part is deciding how much disagreement the system is allowed to tolerate. If every mismatch becomes a hard overwrite, you lose nuance. If nothing gets resolved, you get contradiction buildup until the graph becomes unreliable.
So the real question I keep coming back to: are you designing this more like a database with strict consistency rules, or more like a versioned log where conflicting truths can coexist until something explicitly resolves them?

Harjot Singh • May 31

Automatic knowledge extraction for coding agents is going after the right bottleneck, because the thing that makes a coding agent feel junior is that it re-learns your codebase from scratch every session: the conventions, the gotchas, the why-we-did-it-this-way that lives in nobody's docs. Capturing that into durable memory is what would let it accrue expertise instead of resetting. The automatic part is the hard and valuable part: manual memory curation never gets done, so extracting knowledge from what actually happens (the diffs, the decisions, the corrections) is the only version that scales. Two things I'd watch, because they're where memory systems quietly go wrong. First, extraction precision: auto-capturing everything fills memory with noise that later drowns the signal at retrieval time, so being selective about what's worth remembering matters as much as capturing it. Second, staleness: code changes, so an extracted fact (this module does X, we always use Y) can silently become false, and a confidently recalled outdated convention is worse than none, which means memories need provenance and a way to expire or get corrected when the code moves. Extract selectively, stamp it with where/when, and let it be revised. That durable-but-inspectable-and-forgettable instinct is core to how I think about agent memory in Moonshift. How is prism-mem deciding what's worth extracting, and does it re-validate stored knowledge against the current codebase or trust it once captured?