Stacey Schneider

Posted on Jun 26

Study: stale documents are RAG poisoning without the attacker

#ai #rag #security #llm

RAG poisoning gets attention as a security problem — an attacker injects a bad fact into the retrieval index, the pipeline serves it confidently, the model answers from it.

Poisoning is the adversarial version of a problem every RAG system already has in production: stale retrieval. A document gets updated. The old version stays in the index. The pipeline retrieves it. The model answers faithfully from outdated information.

Stale retrieval isn't adversarial, but it is always present, and it is far more likely to hit you than an attacker is.

Two architectures for private AI knowledge

Most teams building private AI — giving a model access to internal documents, policies, or product knowledge — end up on one of two architectures.

The first is vector RAG. Documents are chunked, embedded into a vector store, and retrieved via approximate nearest-neighbor search. The model gets whatever chunks scored highest against the query. This is the dominant approach and what most hosted "private GPT" products use under the hood.

The second is wiki-style context. Documents live as versioned markdown files with structured metadata. Retrieval is handled by a deterministic selector — you specify what the agent is allowed to read based on document identity, version, and eligibility. No embeddings, no approximate search. ContextNest uses this approach.

Both architectures solve the same problem: getting private knowledge into a model's context window. They diverge sharply on how they handle updates, access control, and auditability — which is exactly what this experiment measures.

In conjunction with Emory University and IBM Research, my company ran an experiment to measure how often stale retrieval actually produces wrong answers. The results support our thesis that governed context is an effective way to solve this problem.

The setup

We created a medium-sized vault of 1,060 documents, then tested three retrieval configurations against it:

Dense vector retrieval with HNSW (production-configured)
BM25 sparse retrieval
A governed selector that gates retrieval on document eligibility — superseded versions are ineligible by definition, not filtered after the fact

For the stale-version test, we updated a set of documents, marked the old versions superseded, and left them in the index (standard behavior, since most pipelines don't delete). We then ran 30 queries and measured the answer-quality pass rate.

This is structurally the same as a poisoning scenario: a bad document is present in the index, retrieval has no mechanism to exclude it, and, predictably, the model ends up answering from the deprecated document.

What we found

Retrieval method	Pass rate	Avg tokens/query
Governed selector (ctx)	97%	~215
BM25	90–93%	655–725
Dense + HNSW	below BM25	similar

BM25 outperforms dense under stale conditions because term matching is less susceptible to approximate similarity between a superseded document and its replacement. The governed selector reaches 97% because the superseded document isn't eligible for retrieval at all — the eligibility check runs before similarity scoring, not after.

That's also why it's a better poisoning defense. Filtering retrieved results after the fact means the poisoned document still reached the ranking step. Eligibility gating means it never did.
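To make the ordering difference concrete, here is a minimal sketch in Python. It is not ContextNest's implementation: the `Doc` class, the `is_head` flag, and the scores are all hypothetical stand-ins, and the ranking is a plain sort rather than a real retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    version: int
    is_head: bool  # hypothetical flag: True only for the current version
    score: float   # stand-in for a similarity score against some query


def rank(docs, k):
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]


def filter_after_ranking(docs, k):
    # Superseded versions compete for the top-k slots and are dropped afterwards.
    return [d for d in rank(docs, k) if d.is_head]


def gate_before_ranking(docs, k):
    # Superseded versions are removed from the candidate set before scoring.
    return rank([d for d in docs if d.is_head], k)


index = [
    Doc("rate-limits", 2, False, 0.93),  # superseded, but scores highest
    Doc("rate-limits", 3, True, 0.91),
    Doc("auth-policy", 1, True, 0.40),
]

post = filter_after_ranking(index, k=2)
pre = gate_before_ranking(index, k=2)
print([(d.doc_id, d.version) for d in post])  # [('rate-limits', 3)]
print([(d.doc_id, d.version) for d in pre])   # [('rate-limits', 3), ('auth-policy', 1)]
```

In this toy index the superseded version takes one of the two top-k slots under post-filtering, so the model gets one document instead of two. With gating, the stale version never enters the ranking at all.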

Dense retrieval produced one more finding: it was non-deterministic on 80% of queries, meaning the same query did not always return the same set of documents (worst-case Jaccard similarity 0.21). Under a poisoning scenario, that variance becomes an attack surface, because you can't predict or audit which documents actually reached the model.
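For readers who haven't used Jaccard similarity as a determinism check, here is a rough sketch of how such a number can be computed. I'm assuming repeated runs of the same query are compared pairwise by the set of document IDs they return. The function names and the sample runs are made up for illustration and are not the experiment's code or data.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of document IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty result sets are treated as identical
    return len(a & b) / len(a | b)


def worst_case_jaccard(runs):
    """Lowest pairwise Jaccard across repeated runs (needs at least two runs)."""
    return min(
        jaccard(runs[i], runs[j])
        for i in range(len(runs))
        for j in range(i + 1, len(runs))
    )


# Hypothetical result sets from running one query three times.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d5", "d6"],
]
print(worst_case_jaccard(runs))  # 2 shared IDs out of 6 distinct ones
```

A fully deterministic retriever scores 1.0 on every query. Anything lower means the retrieved set changed between runs.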

One gap to flag: we didn't benchmark a hybrid retrieval baseline (BM25 + dense combined), which is common in production. The paper isolates eligibility and determinism specifically, not general retrieval quality.

What eligibility gating looks like in practice

The unit of governance is the document version. Every document in ContextNest carries frontmatter:

---
title: API Rate Limiting Policy
type: document
status: published
version: 3
created_at: '2026-05-01T09:00:00.000Z'
updated_at: '2026-06-15T14:22:11.000Z'
checksum: 'sha256:9f4a2e...'
---

Rate limiting is enforced at 1,000 requests per minute per API key...

Version history lives in a separate history.yaml — one entry per version, hash-chained:

keyframe_interval: 10
versions:
  - version: 1
    keyframe: true
    edited_by: misha@promptowl.ai
    edited_at: '2026-05-01T09:00:00.000Z'
    published_at: '2026-05-01T09:00:00.000Z'
    content_hash: sha256:a3f2c1...
    chain_hash: sha256:b7c144...
  - version: 2
    edited_by: stacey@promptowl.ai
    edited_at: '2026-05-20T11:15:00.000Z'
    published_at: '2026-05-20T11:15:00.000Z'
    content_hash: sha256:e4d981...
    chain_hash: sha256:f8a203...
  - version: 3
    edited_by: misha@promptowl.ai
    edited_at: '2026-06-15T14:22:11.000Z'
    published_at: '2026-06-15T14:22:11.000Z'
    content_hash: sha256:9f4a2e...
    chain_hash: sha256:2c8b77...

The chain hash is computed as:

chain_hash[n] = SHA-256(chain_hash[n-1] : content_hash[n] : version[n] : author[n] : timestamp[n])
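As a sketch of that formula in Python, with several assumptions of mine: the fields are joined with a literal colon, `author` and `timestamp` map to `edited_by` and `edited_at`, the first version chains from an empty string, and the content hashes and email addresses below are placeholders. The real encoding may differ, so treat this as an illustration of the idea rather than a compatible implementation.

```python
import hashlib

GENESIS = ""  # assumption: previous chain hash used for the first version


def chain_hash(prev, content_hash, version, author, timestamp):
    """SHA-256 over the colon-joined fields, following the formula above."""
    payload = ":".join([prev, content_hash, str(version), author, timestamp])
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(entries):
    """Return copies of the entries with chain_hash filled in, in order."""
    prev, sealed = GENESIS, []
    for e in entries:
        prev = chain_hash(prev, e["content_hash"], e["version"],
                          e["edited_by"], e["edited_at"])
        sealed.append({**e, "chain_hash": prev})
    return sealed


def first_break(entries):
    """Return the first version whose stored chain_hash fails to verify."""
    prev = GENESIS
    for e in entries:
        expected = chain_hash(prev, e["content_hash"], e["version"],
                              e["edited_by"], e["edited_at"])
        if e["chain_hash"] != expected:
            return e["version"]
        prev = e["chain_hash"]
    return None  # chain is intact


# Placeholder history; hashes and authors are hypothetical.
history = seal([
    {"version": 1, "content_hash": "sha256:aaa",
     "edited_by": "alice@example.com", "edited_at": "2026-05-01T09:00:00.000Z"},
    {"version": 2, "content_hash": "sha256:bbb",
     "edited_by": "bob@example.com", "edited_at": "2026-05-20T11:15:00.000Z"},
    {"version": 3, "content_hash": "sha256:ccc",
     "edited_by": "alice@example.com", "edited_at": "2026-06-15T14:22:11.000Z"},
])

tampered = [dict(e) for e in history]
tampered[1]["content_hash"] = "sha256:evil"  # silently edit version 2

print(first_break(history))   # None
print(first_break(tampered))  # 2
```

Because each entry's hash folds in the previous one, deleting version 2 outright also fails verification, this time at version 3.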

Retrieval serves the current version only — version 3 in this case. Versions 1 and 2 stay in history for audit and rollback, but agents never see them. There is no status: superseded flag to forget to set — old versions simply aren't the head.
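Head selection can be pictured with a few lines of Python. This is a sketch under my own assumptions, not the project's code: I treat an entry as published when it has a `published_at` value, and I add a hypothetical unpublished version 4 to show that it is skipped.

```python
def head_version(versions):
    """Return the highest-numbered published entry, or None if none is published."""
    published = [v for v in versions if v.get("published_at")]
    return max(published, key=lambda v: v["version"], default=None)


versions = [
    {"version": 1, "published_at": "2026-05-01T09:00:00.000Z"},
    {"version": 2, "published_at": "2026-05-20T11:15:00.000Z"},
    {"version": 3, "published_at": "2026-06-15T14:22:11.000Z"},
    {"version": 4, "published_at": None},  # hypothetical unpublished edit
]

head = head_version(versions)
print(head["version"])  # 3
```

Nothing has to be marked stale here. Older versions are excluded simply because they are not the maximum.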

This also closes the poisoning vector. A tampered or injected document that isn't the current approved version doesn't reach the retrieval surface. Any break in the chain hash is detectable — a silently edited version, a deleted entry, a backdated change.

Stewardship adds the approval layer. In governed mode, edits save as drafts, invisible to agents until a Reviewer steward approves them. As with required reviews on GitHub pull requests, authors can't approve their own work (enforced server-side). Stewards are scoped by document, tag, or nest, so a #compliance tag can route that document class to the legal team without touching anything else. This also guards against poisoned responses, because the access-control gate runs before retrieval, not after.
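The approval rule is easy to sketch. The Python below is a hypothetical model, not the ContextNest API: `Draft`, `Steward`, and `approve` are names I invented, and scoping is reduced to tags only (document and nest scoping are left out).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Draft:
    doc_id: str
    author: str
    tags: set = field(default_factory=set)
    approved_by: Optional[str] = None


@dataclass
class Steward:
    user: str
    scoped_tags: set


def approve(draft, steward):
    """Approve a draft, enforcing the no-self-approval and scope rules."""
    if steward.user == draft.author:
        raise PermissionError("authors cannot approve their own drafts")
    if not (steward.scoped_tags & draft.tags):
        raise PermissionError("steward is not scoped to this document")
    draft.approved_by = steward.user
    return draft


def visible_to_agents(draft):
    # Unapproved drafts never reach the retrieval surface.
    return draft.approved_by is not None


draft = Draft("retention-policy", author="alice", tags={"compliance"})
legal = Steward("carol", scoped_tags={"compliance"})

print(visible_to_agents(draft))                  # False
print(visible_to_agents(approve(draft, legal)))  # True
```

The useful property is that visibility is decided by the approval state alone, so a draft can sit in the vault without ever becoming retrievable.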

Running it yourself

ContextNest is an open source project that runs behind tools you already use, such as Claude, Cursor, or Antigravity. It can also connect them through one unified context, which pulls your work together and makes it less likely you'll run out of tokens and be stuck waiting for the next 5 hours.

The community edition is free and self-hosted. You'll need a free PromptOwl account to generate a license key — go to Overview → Community License → Create a Community License key. It starts with pk_ and takes about 30 seconds.

Once you have a key, spin up the server:

npx @promptowl/contextnest-community

That starts the server at http://localhost:3838. On first boot it drops you into a setup page where you paste the key. After that it's live — no restart required. To scaffold a new vault from scratch first:

npx @promptowl/contextnest-cli init

Claude Code and Cursor pick up the MCP server automatically. Plain markdown files, no hosted database, no embeddings service, no additional API keys.

The community edition source is on GitHub at github.com/PromptOwl/ContextNest.

Full methodology, tables, and reproducible experiment code: promptowl.ai/resources/verifiable-context-governance/

Disclosure: I work at PromptOwl. Joint research with Benn Konsynski (Emory/Goizueta) and Gabe Goodhart (AI Open Innovation).
