DEV Community

Raphael de Almeida Southall

Why Your PKM Needs Prediction Errors

Here's a problem that doesn't get talked about enough in the PKM world: your notes go stale, and you have no idea which ones.

You wrote a note about how to deploy something, or how a system works, or what you understood about a topic. Six months later you've learned more. The mental model has shifted. But the note sits there, unchanged, filed under the same heading, returned by every search on that topic — and now it's actively misleading you. Or worse, it's misleading the AI assistant you've given access to it.

Most knowledge management tools don't have a concept of a note being wrong in context. A note either exists or it doesn't. It either matches a query or it doesn't. There's no signal that says "this note keeps showing up in places where it doesn't quite fit."

That signal exists in neuroscience. It's called a prediction error.

What a Prediction Error Is

In memory research, a prediction error is the gap between what the brain expected and what it actually retrieved. When you recall something and it doesn't match the current context — what you know now, what you're trying to do — that mismatch generates a signal. That signal triggers memory reconsolidation: the brain updates the memory to incorporate the new information.

Sinclair and Bhatt (PNAS, 2022) showed that prediction errors specifically disrupt hippocampal representations and force episodic memory updating. It's not a bug in memory — it's the mechanism that keeps memory accurate over time.

The key insight is that retrieval itself is evaluative. Every time a memory is recalled, it's also assessed for fit. High-fit recalls reinforce. Low-fit recalls flag for update.

Your PKM has no equivalent of this. Notes are retrieved, used, and filed away with no record of whether they fit the query that found them.

How NeuroStack Implements It

NeuroStack is a local MCP server that indexes your Markdown vault and exposes it to Claude Code and Cursor. It runs entirely on your machine.

During retrieval, when a note's embedding is far from the query embedding — cosine distance above 0.62 — that retrieval event is logged as a prediction error. The note was returned by the search, but it doesn't semantically belong in that context. That's the signal.
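To make the check concrete, here is a minimal sketch in plain Python. This is illustrative, not NeuroStack's actual code: the helper name log_if_prediction_error and the event-dict shape are my assumptions; only the 0.62 threshold comes from the description above.

```python
import math

THRESHOLD = 0.62  # cosine distance above which a retrieval counts as a prediction error

def cosine_distance(a, b):
    """1 minus cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def log_if_prediction_error(query_vec, note_vec, note_path, query, log):
    """Log the retrieval event when the note sits far from the query. (Hypothetical helper.)"""
    dist = cosine_distance(query_vec, note_vec)
    if dist > THRESHOLD:
        log.append({"note_path": note_path, "query": query,
                    "cosine_distance": round(dist, 3)})
    return dist
```

The note is still returned to the caller; the flag is a side effect, so retrieval quality is unchanged while the drift signal accumulates.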

Over time, the vault_prediction_errors tool surfaces these:

{
  "total_flagged_notes": 12,
  "errors": [
    {
      "note_path": "infra/kubernetes-networking.md",
      "error_type": "low_overlap",
      "avg_cosine_distance": 0.71,
      "occurrences": 8,
      "sample_query": "how does service mesh handle mTLS"
    },
    {
      "note_path": "reading/distributed-systems.md",
      "error_type": "contextual_mismatch",
      "avg_cosine_distance": 0.65,
      "occurrences": 3,
      "sample_query": "CAP theorem and eventual consistency"
    }
  ]
}

That kubernetes-networking.md note has been retrieved 8 times in contexts where it scored poorly. It keeps appearing in searches but not fitting them. That's a note to review.

A Worked Example

Say you have a note from two years ago: "How Kubernetes schedules pods." You wrote it when you were learning the basics. Since then you've gone deep on affinity rules, taints, topology spread constraints. Your mental model has completely changed.

You never updated the note.

Now when you or Claude searches for anything related to pod scheduling, that note surfaces. The search retrieves it. But the embedding distance is high because your query is specific (affinity rules, topology constraints) and the note talks about general concepts (node selection, resource requests). The note is there, but it doesn't fit.

NeuroStack logs this. After a few queries it shows up in vault_prediction_errors with low_overlap. You open the note, see it's from two years ago, rewrite it.

That's the loop. Retrieval → mismatch detection → flag → review → update.
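The flag step of that loop, rolling per-retrieval events up into a report shaped like the JSON above, can be sketched in a few lines. This is illustrative only; the min_occurrences cutoff of 3 is my assumption, not a documented NeuroStack value.

```python
from collections import defaultdict

def aggregate_prediction_errors(events, min_occurrences=3):
    """Roll per-retrieval flags up into a per-note review list."""
    by_note = defaultdict(list)
    for event in events:
        by_note[event["note_path"]].append(event)
    report = []
    for path, evs in by_note.items():
        if len(evs) < min_occurrences:
            continue  # a one-off mismatch is noise, not drift
        report.append({
            "note_path": path,
            "occurrences": len(evs),
            "avg_cosine_distance": round(
                sum(e["cosine_distance"] for e in evs) / len(evs), 2),
            "sample_query": evs[-1]["query"],
        })
    # worst offenders first: the notes that keep not fitting
    return sorted(report, key=lambda r: r["occurrences"], reverse=True)
```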

The Rest of the Architecture

Prediction errors are one feature. The full picture:

Hybrid retrieval: In full mode, FTS5 full-text search is combined with semantic embeddings (nomic-embed-text via Ollama) and cross-encoder reranking. Lite mode uses FTS5 only — no GPU, no Ollama needed.
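I haven't verified how NeuroStack merges the keyword and semantic result lists; reciprocal rank fusion is one common way to combine rankings from FTS5 and an embedding index, sketched here as an assumption rather than the actual implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one. Items that rank
    well in any list rise; k damps the advantage of the very top slots."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A cross-encoder reranker would then rescore only the fused head of this list, which is why it stays affordable.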

Tiered depth: The MCP tools operate at three depths: triples (~15 tokens of Subject-Predicate-Object facts), summaries (~75 tokens), and full content. Claude requests the depth it needs rather than loading everything into context.
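The depth dispatch is easy to picture; the note shape and function below are hypothetical, just to show what the three tiers trade off.

```python
def note_at_depth(note, depth):
    """Return a note's content at the requested context depth."""
    if depth == "triples":
        # ~15 tokens: bare Subject-Predicate-Object facts
        return [" ".join(triple) for triple in note["triples"]]
    if depth == "summaries":
        return note["summary"]  # ~75-token summary
    return note["content"]      # full markdown body
```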

Knowledge graph: Wiki-links are indexed into a graph with PageRank scoring. vault_graph returns the neighbourhood of any note — what links to it, what it links to, its relative centrality in the vault.
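A toy version of that pipeline, assuming standard [[wiki-link]] syntax and plain power-iteration PageRank (NeuroStack's internals may differ):

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def extract_links(markdown_text):
    """Pull [[wiki-link]] targets out of a note body, ignoring aliases and anchors."""
    return [m.strip() for m in WIKILINK.findall(markdown_text)]

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {note: [linked notes]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        # mass held by notes with no outgoing links is spread evenly
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        for src, targets in graph.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank
```

A note many others link to ends up with high centrality, which is exactly the "relative centrality in the vault" that vault_graph reports.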

Community detection: The optional Leiden algorithm clusters the wiki-link graph into thematic communities. vault_communities runs a global query across the cluster summaries rather than individual notes — useful for questions that span a whole topic area.

Hot notes: Notes marked status: active receive preferential linking weight, analogous to CREB-mediated neuronal excitability — the mechanism by which active neurons are preferentially recruited into new memory engrams.
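The boost itself is simple to picture. A hedged sketch follows; the 1.5x multiplier is my placeholder, not a documented value.

```python
def link_weight(base_weight, frontmatter, boost=1.5):
    """Weight links into 'hot' notes more heavily, the way excitable
    neurons are preferentially recruited into new engrams."""
    if frontmatter.get("status") == "active":
        return base_weight * boost
    return base_weight
```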

Install and Try It

# Lite mode — FTS5 only, no GPU, no Ollama
pip install neurostack
neurostack init ~/path/to/your/vault
neurostack index
neurostack serve

Then add to your Claude Code MCP config (~/.claude/.mcp.json):

{
  "mcpServers": {
    "neurostack": {
      "command": "neurostack",
      "args": ["serve"]
    }
  }
}

For full mode with embeddings and LLM summaries:

pip install "neurostack[full]"
ollama pull nomic-embed-text
ollama pull qwen2.5:3b
neurostack index

After a few days of use, run neurostack doctor to check vault health, index status, and which notes have accumulated prediction errors.


The prediction error mechanism is the part I find most defensible about this approach. Everything else — search, graph, communities — is useful retrieval machinery. But the drift detection is the only thing I've seen that closes the feedback loop: notes that mislead retrieval eventually get flagged for review. Without that loop, a growing vault is also a growing liability.

Repo: https://github.com/raphasouthall/neurostack
PyPI: https://pypi.org/project/neurostack/
Website: https://neurostack.sh
