Originally published on PrepStack.
Everyone's first RAG pipeline is the same four boxes: documents, chunk, vector DB, LLM. It demos in an afternoon and then quietly betrays you in production — stale answers, no relationships, no governance, and a model guessing from fragments. The fix is not a bigger vector index. It is to stop storing documents and start storing knowledge. That is Open Knowledge Format (OKF).
To be clear up front, because the title is deliberately provocative: OKF does not kill embeddings. Vectors still do the recall. What OKF kills is blind chunking: slicing opaque documents into context-free fragments and hoping cosine similarity reassembles meaning. On Mattrx, a multi-tenant marketing-analytics SaaS (.NET 9 + Azure SQL + a Python FastAPI AI service), replacing blind chunking with OKF + a Context Engine took the assistant's hallucination rate from 18% to 3% and its stale-answer rate from 11% to 1.5%.
TL;DR
| Dimension | Documents → chunk → vector DB (before) | OKF + Context Engine (after) |
|---|---|---|
| Unit of knowledge | Opaque chunk of text | Typed, governed knowledge unit |
| Structure | None — chunks are islands | Metadata + relationships + schemas |
| Freshness | Snapshot, rots silently | valid_until + live API refs |
| Rules | Buried in prose, ignorable | First-class data the engine enforces |
| Retrieval | Top-k cosine | Hybrid + vector + graph |
| Multi-hop questions | Unanswerable | Answered via relationships |
Results after the rebuild:
- Knowledge base restructured into ~11,000 OKF units (Markdown + metadata + relationships + APIs + schemas + business rules).
- Hallucination 18% -> 3%; faithfulness 0.96; answer-relevance 0.91.
- Context tokens/call 14k -> 3.5k — structure lets the engine attach the right thing, not everything.
- Outdated-answer rate 11% -> 1.5% (valid_until + metadata freshness).
- Multi-hop questions unanswerable -> answered via graph retrieval.
- Deprecated-plan recommendations recurring -> 0 (business rules enforced as data).
The one mental shift: a chunk is a fragment of text with no identity, no owner, and no expiry. An OKF unit is a governed, typed, related piece of knowledge your context engine can reason about. Stop indexing text. Start indexing knowledge.
Part 1 — The OKF Knowledge Base
1. From documents to knowledge units. A chunk has no identity — you can't say who owns it, when it expires, or what it relates to. The OKF atomic unit is a Markdown body plus structured frontmatter: id, type, owner, version, valid_from / valid_until, visibility, plus relationships, apis, schemas, and business rules. The body is still embedded; the metadata is what the engine filters and ranks on. The moment a unit has valid_until, the engine can refuse to ground an answer in expired knowledge — which dropped outdated answers 11% -> 1.5%.
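For readers who want something concrete, here is a minimal Python sketch of the unit shape just described. The field names follow the frontmatter list above; the `OkfUnit` dataclass itself, the `groundable` helper, and the choice to treat `valid_until` as inclusive are assumptions for illustration, not the article's C# records.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class OkfUnit:
    """Hypothetical in-memory shape of one OKF unit: Markdown body plus frontmatter."""
    id: str
    type: str                  # e.g. "product", "policy", "runbook"
    owner: str                 # team accountable for keeping the unit correct
    version: int
    body: str                  # Markdown text; this is the part that gets embedded
    valid_from: date
    valid_until: Optional[date] = None    # None means no scheduled expiry
    visibility: str = "internal"
    # Typed edges, e.g. {"supersedes": ["plan-pricing-v2"]}
    relationships: dict[str, list[str]] = field(default_factory=dict)
    apis: list[str] = field(default_factory=list)            # live-data references
    schemas: list[str] = field(default_factory=list)         # schema ids for structured units
    business_rules: list[str] = field(default_factory=list)  # ids of rule units

    def is_valid_on(self, day: date) -> bool:
        """True if this unit may ground an answer on `day` (valid_until assumed inclusive)."""
        if day < self.valid_from:
            return False
        return self.valid_until is None or day <= self.valid_until


def groundable(units: list[OkfUnit], today: date) -> list[OkfUnit]:
    """Drop expired and not-yet-valid units before retrieval can return them."""
    return [u for u in units if u.is_valid_on(today)]
```

In this sketch a unit whose `valid_until` has passed never reaches ranking at all, which is the kind of check the article credits for the drop in outdated answers.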
2. Relationships and schemas: the knowledge graph. Chunks are islands; vector similarity finds text that sounds alike, not text that's connected. OKF makes edges first-class (relates_to, supersedes, governed_by, depends_on) and defines schemas for structured units — together a knowledge graph over the corpus. Graph retrieval starts from semantic seeds and expands along those edges, which is how "if a Growth customer downgrades mid-cycle, how is it prorated?" (an answer spanning three documents) became answerable. Vectors find the door; the graph walks the building.
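The seed-and-expand walk can be sketched as a bounded breadth-first search. This is a hypothetical Python illustration: `expand_from_seeds`, the adjacency-dict layout, the `max_hops` default, and directed edges are all assumptions; only the four edge names come from the article.

```python
from __future__ import annotations

from collections import deque

# Edge names from the article; treating each edge as directed source -> target is an assumption.
EDGE_TYPES = ("relates_to", "supersedes", "governed_by", "depends_on")


def expand_from_seeds(
    seeds: list[str],
    edges: dict[str, dict[str, list[str]]],
    max_hops: int = 2,
    follow: tuple[str, ...] = EDGE_TYPES,
) -> list[str]:
    """Breadth-first walk from seed unit ids along typed edges.

    edges[unit_id][edge_type] lists the unit ids that unit points to.
    Returns ids in discovery order (seeds first), each at most once,
    never more than max_hops edges away from some seed.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue: deque[tuple[str, int]] = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append((seed, 0))

    while queue:
        unit_id, depth = queue.popleft()
        if depth >= max_hops:
            continue  # hop limit reached; do not expand further
        for edge_type in follow:
            for neighbor in edges.get(unit_id, {}).get(edge_type, []):
                if neighbor not in seen:  # the `seen` set also guards against cycles
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append((neighbor, depth + 1))
    return order
```

In a hypothetical corpus where a Growth plan unit `relates_to` a downgrade policy that is `governed_by` a proration rule, seeding with the plan unit and allowing two hops pulls in both, even though neither needs to sound like the query.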
3. APIs and business rules: live data and governance as data. A snapshot like "Our plans are Starter, Growth, Scale…" is wrong the day the catalog changes, and a rule written in a paragraph is a suggestion the model can ignore. OKF units reference APIs (resolved at query time, through a governed tool layer) so answers reflect the current catalog, and link to business-rule units (enforcement: hard) that the engine injects as constraints. "Recommended a deprecated plan" went from a recurring complaint to 0.
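Resolving API references through a governed tool layer and extracting hard rules might look like the following sketch. Everything here is hypothetical: the `ToolLayer` allowlist, the `BusinessRule` record, the `"catalog.plans"` ref name used in testing, and treating any enforcement value other than `hard` as advisory. The point is only that live data is fetched per query and that hard rules leave as explicit constraints rather than prose.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BusinessRule:
    id: str
    text: str
    enforcement: str  # "hard" is from the article; other values are treated as advisory here


# A resolver takes a tenant id and returns live data for that tenant.
Resolver = Callable[[str], Any]


class ToolLayer:
    """Hypothetical governed tool layer: only allowlisted API refs can be called."""

    def __init__(self, allowlist: dict[str, Resolver]) -> None:
        self._allowlist = dict(allowlist)

    def resolve(self, api_ref: str, tenant_id: str) -> Any:
        if api_ref not in self._allowlist:
            raise PermissionError(f"API ref not allowlisted: {api_ref}")
        return self._allowlist[api_ref](tenant_id)  # called at query time, never cached here


def build_live_context(
    api_refs: list[str],
    rules: list[BusinessRule],
    tools: ToolLayer,
    tenant_id: str,
) -> tuple[dict[str, Any], list[str]]:
    """Resolve a unit's API refs now and return (live_data, hard_constraints)."""
    live = {ref: tools.resolve(ref, tenant_id) for ref in api_refs}
    constraints = [rule.text for rule in rules if rule.enforcement == "hard"]
    return live, constraints
```

Because the catalog is read on every call, a plan removed from the catalog disappears from the next answer without re-indexing anything.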
Part 2 — The Context Engine
OKF is how knowledge is stored; the Context Engine is how it becomes a prompt.
4. Retrieval: hybrid + vector + graph. Top-k cosine is one signal — it misses exact-term matches (a campaign id, an error code) and related-but-not-similar units. The engine runs hybrid search (BM25 + vector) for recall, vector retrieval for nuance, and graph retrieval for connected units — all tenant-scoped and filtered to currently-valid units (onlyValid: true, metadata doing security and quality work for free). This combination is the core of the 18% -> 3% hallucination drop and 0.96 faithfulness.
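The article does not say how the BM25 and vector rankings are combined; reciprocal rank fusion (RRF) is a common choice and is used here as a stand-in. This self-contained Python sketch applies the tenant and validity filters before any scoring (mirroring `onlyValid: true`), ranks the candidates with a small Okapi BM25 and with cosine similarity over precomputed embeddings, then fuses the two lists. The `Doc` record, `hybrid_search`, and the defaults (`k1=1.5`, `b=0.75`, RRF `k=60`) are assumptions; graph expansion would then run on the fused seeds.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Doc:
    id: str
    tenant_id: str
    text: str
    embedding: list[float]   # precomputed; the embedding model is out of scope here
    valid_from: date
    valid_until: Optional[date] = None


def _tokens(text: str) -> list[str]:
    return text.lower().split()


def bm25_rank(query: str, docs: list[Doc], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Okapi BM25 over the candidate set; returns ids with a nonzero score, best first."""
    toks = {d.id: _tokens(d.text) for d in docs}
    n = len(docs)
    avgdl = sum(len(t) for t in toks.values()) / n if n else 0.0
    df = Counter(term for t in toks.values() for term in set(t))
    scores: dict[str, float] = {}
    for d in docs:
        tf = Counter(toks[d.id])
        dl = len(toks[d.id])
        score = 0.0
        for term in set(_tokens(query)):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        if score > 0:
            scores[d.id] = score
    return sorted(scores, key=lambda i: -scores[i])


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_rank(query_vec: list[float], docs: list[Doc], top_k: int = 10) -> list[str]:
    ranked = sorted(docs, key=lambda d: -_cosine(query_vec, d.embedding))
    return [d.id for d in ranked[:top_k]]


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per id."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda i: -fused[i])


def hybrid_search(query: str, query_vec: list[float], docs: list[Doc], tenant_id: str,
                  today: date, only_valid: bool = True, top_k: int = 5) -> list[str]:
    # Tenant scope and validity are hard filters applied before any scoring.
    candidates = [
        d for d in docs
        if d.tenant_id == tenant_id
        and (not only_valid or (d.valid_from <= today
                                and (d.valid_until is None or today <= d.valid_until)))
    ]
    return rrf([bm25_rank(query, candidates), vector_rank(query_vec, candidates)])[:top_k]
```

Filtering first means an expired or other-tenant unit never receives a score, so it cannot leak in through either ranker; BM25 is what lets an exact token such as an error code win even when its embedding is far from the query's.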
5. Memory, tool calls, prompt assembly. Memory supplies what's already known; tool calls fetch the live data OKF units reference; prompt assembly packs everything to a hard token budget in priority order — constraints first (never trimmed), then memory, then ranked knowledge, then live data. Ordering is a design decision: a budget squeeze can never silently drop the rule that keeps an answer compliant. This is how context holds at 3.5k tokens (down from 14k) while accuracy improves.
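A minimal sketch of priority-ordered packing, assuming a crude whitespace word count in place of the model's real tokenizer. `assemble_prompt` and its tier arguments are hypothetical. Constraints are counted first and never trimmed; if they alone exceed the budget the function raises instead of dropping one, which is one way to guarantee a rule is never silently lost.

```python
from __future__ import annotations

from typing import Callable


def approx_tokens(text: str) -> int:
    # Crude stand-in for the model's tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_prompt(
    constraints: list[str],
    memory: list[str],
    knowledge: list[str],   # assumed already ranked best-first
    live_data: list[str],
    budget: int,
    count: Callable[[str], int] = approx_tokens,
) -> tuple[str, int]:
    """Pack context in priority order under `budget`; returns (prompt_text, tokens_used)."""
    used = sum(count(c) for c in constraints)
    if used > budget:
        # Refuse loudly rather than silently dropping a rule.
        raise ValueError("constraints alone exceed the token budget")
    parts = list(constraints)  # constraints go first and are never trimmed
    for tier in (memory, knowledge, live_data):
        for item in tier:
            cost = count(item)
            if used + cost <= budget:
                parts.append(item)
                used += cost
            # else: skip this item but keep trying smaller, lower-priority ones
    return "\n\n".join(parts), used
```

Skipping rather than stopping at the first overflow lets a short live-data snippet still fit after a long knowledge unit is rejected; whether that trade is right depends on how the engine ranks within each tier.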
When NOT to adopt OKF
It's an investment in structure — pay it where structure exists and matters. Skip it for a small/static corpus, genuinely unstructured knowledge, when no one can own authoring (rotten metadata is worse than none), or before you've nailed plain hybrid RAG + a cross-encoder rerank (OKF is the next layer, not the first). And don't big-bang it — convert high-value, high-churn domains first (we did billing, then product, then runbooks) and leave the long tail as plain chunks.
And the honest framing of the title: OKF does not replace embeddings. Vector retrieval is still inside the Context Engine doing recall. OKF replaces blind chunking and adds the structure, governance, and graph that embeddings alone cannot provide. If someone sells you "vectors are dead," walk away.
The model to carry forward
Documents are what you have; knowledge is what the model needs. Give every unit an identity, an owner, and an expiry; model relationships explicitly (the graph answers the multi-hop questions vectors structurally can't); encode rules as data, not prose.
👉 The full article — the complete OKF format, the C# records and graph/retrieval/assembly code, the end-to-end query walkthrough, the migration checklist, and the full "when not to do this" — is on PrepStack:
Stop Chunking Documents: The Open Knowledge Format (OKF)