
snapsynapse

Posted on • Originally published at airef.snapsynapse.com

Knowledge as Code: A Pattern for Knowledge Bases That Verify Themselves

Documentation rots. You know this. You've seen internal wikis with pages last updated in 2023 that everyone still treats as authoritative. You've inherited a knowledge base where half the links are dead and nobody knows which facts have drifted.

The usual response is to assign someone to "maintain the docs." This works until that person gets busy, changes roles, or leaves. Then the decay resumes, silently, until the wrong person relies on the wrong fact.

What if the knowledge base could detect its own decay?

The pattern

Knowledge as Code applies software engineering practices to knowledge management. The knowledge lives in version-controlled plain text files. It is validated by automated processes. It produces multiple outputs from a single source. And it actively resists becoming outdated.

This pattern emerged from building the AI Capability Reference, an open-source site that tracks AI capabilities, pricing tiers, and platform support across 12 products. The data changes constantly: vendors update pricing, features move between tiers, platforms add or deprecate capabilities. A traditional static site would be stale within weeks. Knowledge as Code is how it stays current.

Six properties

Plain text canonical. Knowledge lives in human-readable, version-controlled files. No database, no CMS, no vendor lock-in. In this project: markdown and YAML files in data/.

Self-healing. Automated verification detects when the knowledge has drifted from reality. The system flags decay before humans notice it. In this project: a multi-model AI cascade cross-checks all data twice weekly and opens GitHub issues for human review.

Multi-output. One source produces every format needed. The results are human-readable, machine-readable, agent-queryable, and search-optimized. In this project: HTML site, JSON API, MCP server, 125 SEO bridge pages, sitemap, llms.txt.
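
To make the multi-output idea concrete, here is a minimal sketch of one record rendered into two formats at once. The record shape, field names, and markup are illustrative assumptions, not the project's actual code:

```javascript
// Hypothetical sketch: one canonical record, multiple derived outputs.
// A real build would also emit MCP responses, bridge pages, and a sitemap.
function renderOutputs(record) {
  return {
    // Human-readable output for the HTML site.
    html: `<article><h2>${record.name}</h2><p>${record.status}</p></article>`,
    // Machine-readable output for the JSON API.
    json: JSON.stringify({ name: record.name, status: record.status }),
  };
}
```

Because both outputs derive from the same object, they cannot disagree with each other; a correction to the record propagates everywhere on the next build.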

Zero-dependency. No external packages. The build uses only language built-ins. Nothing breaks when you come back in a year. In this project: one Node.js script, no package.json, no node_modules.

Git-native. Git is the collaboration layer, the audit trail, the deployment trigger, and the contribution workflow. Issues, PRs, CI/CD, version history -- all through Git.

Ontology-driven. A vendor-neutral taxonomy of concepts maps to vendor-specific implementations. The structure is the data model. In this project: 18 capabilities map to 72 implementations across 12 products.
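
One way to picture "the structure is the data model": a vendor-neutral capability list, with implementations keyed to it. The capability and product names below are invented examples, not the project's actual taxonomy:

```javascript
// Illustrative ontology sketch: vendor-neutral capabilities mapped to
// vendor-specific implementations. Names are hypothetical examples.
const ontology = {
  capabilities: [
    { id: "web-search", label: "Web search" },
    { id: "code-execution", label: "Code execution" },
  ],
  implementations: [
    { capability: "web-search", product: "ExampleChat", tier: "Pro" },
    { capability: "web-search", product: "OtherAI", tier: "Free" },
    { capability: "code-execution", product: "ExampleChat", tier: "Pro" },
  ],
};

// Pages, navigation, and API payloads can all be derived by lookups
// against the same structure.
function implementationsOf(capabilityId) {
  return ontology.implementations.filter((i) => i.capability === capabilityId);
}
```

Every capability page, comparison table, and API endpoint is a query over this shape rather than a hand-maintained document.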

Why these compound

Any one of these is a reasonable design choice. The value is in the combination.

Plain text plus Git means anyone can contribute with no dev environment. Edit a file, open a PR. Plain text plus zero-dependency build means the project still builds in five years. Nothing to update, nothing to break.

Ontology plus multi-output means one correction fixes the site, the API, the MCP server, and every bridge page at once. Self-healing plus Git means verification results are tracked as issues with full audit trail. Nothing is silently changed. Zero-dependency plus self-healing means maintenance cost stays low even as the knowledge grows. The system scales through automation, not staffing.

The self-healing mechanism

This is the piece that makes Knowledge as Code more than "docs as code with a new name."

Twice a week, a three-layer multi-model verification cascade runs. Gemini, Perplexity, Grok, and Claude each cross-check every tracked data point: pricing tiers, platform availability, feature status, gating, regional restrictions. To prevent provider bias, models are skipped when verifying their own platform (Gemini doesn't verify Google features). A change only gets flagged when at least three models agree on a discrepancy.

Flagged changes become GitHub issues for human review. Nothing auto-merges. Every data point carries a Checked date; anything not re-verified within seven days is treated as stale.
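
The two rules above, consensus before flagging and a hard staleness window, can be sketched in a few lines. The model names, blocklist, and verdict shape are assumptions for illustration; only the "at least three models agree" and "seven days" thresholds come from the text:

```javascript
// Hedged sketch of the consensus and staleness rules described above.
// A model never verifies its own vendor's data (illustrative mapping).
const SELF_VERIFY_BLOCKLIST = { gemini: "google", grok: "xai" };

// verdicts: [{ model, disagrees: boolean }]
function flagDiscrepancy(dataPoint, verdicts) {
  const eligible = verdicts.filter(
    (v) => SELF_VERIFY_BLOCKLIST[v.model] !== dataPoint.vendor
  );
  // Flag only when at least three eligible models agree on a discrepancy.
  return eligible.filter((v) => v.disagrees).length >= 3;
}

// Anything not re-verified within seven days is treated as stale.
function isStale(checkedDate, now = new Date()) {
  const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
  return now - new Date(checkedDate) > SEVEN_DAYS_MS;
}
```

Note how the blocklist changes the outcome: a vendor's own model disagreeing does not count toward the threshold for that vendor's data.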

Link integrity gets checked every week too, using both automated CI checks and a browser-based checker that runs through real Chrome to bypass bot protection.
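
A minimal version of the automated side of that check needs nothing beyond Node's built-in `fetch` (Node 18+). This is a sketch under that assumption, not the project's checker; a real one would retry, rate-limit, and hand bot-protected hosts off to the browser-based pass:

```javascript
// Hypothetical sketch of a zero-dependency link check using Node 18+'s
// built-in fetch. No external packages.
function isBroken(status) {
  // null means the request itself failed (DNS, timeout, refused).
  return status === null || status >= 400;
}

async function checkLinks(urls) {
  const results = [];
  for (const url of urls) {
    try {
      const res = await fetch(url, { method: "HEAD", redirect: "follow" });
      results.push({ url, status: res.status, broken: isBroken(res.status) });
    } catch {
      results.push({ url, status: null, broken: true });
    }
  }
  return results;
}
```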

This is anti-entropy for knowledge. In distributed systems like Dynamo and Cassandra, anti-entropy is the background process that detects and repairs divergence between replicas. The verification cascade does the same thing: it finds where reality has moved away from what the files say and flags the gap.

Standing on shoulders

This pattern draws from established work:

File over app. Steph Ango's argument that durable digital artifacts must be files you can control, in formats that are easy to retrieve and read. Derek Sivers on plain text permanence. The permacomputing movement on resilient, minimal-dependency software.

Docs as code. Managing documentation with the same tools as software -- version control, pull requests, CI, plain text formats. Popularized by the Write the Docs community. Tom Preston-Werner (Jekyll, 2008), Eric Holscher (Read the Docs), Anne Gentle (Docs Like Code, 2017), Andrew Etter (Modern Technical Writing, 2016).

Living documentation. Cyrille Martraire's framework for documentation that evolves at the same pace as the system it describes. His approach generates docs from code annotations and tests. This pattern extends the idea: the knowledge isn't derived from code, and verification uses AI models rather than test suites.

GitOps. Coined by Weaveworks (2017). Git as single source of truth, with automated agents that detect drift between declared state and actual state, then reconcile. Originally for infrastructure, but it maps directly to knowledge:

| GitOps (infrastructure) | Knowledge as Code |
| --- | --- |
| YAML declares desired state | Markdown declares what's true |
| Controller detects drift | AI cascade detects drift from reality |
| Auto-reconciliation or alert | GitHub issues for human review |
| Git as single source of truth | Git as single source of truth |

Multi-model verification. Academic foundations for using multiple AI models as cross-checking judges: Zheng et al.'s "Judging LLM-as-a-Judge" (NeurIPS 2023), Verga et al.'s "PoLL: Panel of LLM Evaluators" (2024), Du et al.'s "Multiagent Debate" (2023), Huang & Zhou's "LLMs Cannot Self-Correct Reasoning Yet" (ICLR 2024).

What we think is new

We haven't found prior art for these specific applications. If you know of any, we'd like to hear:

  • "Knowledge as Code" as a named pattern -- the "-as-code" lineage is well-established, but this specific application to maintained knowledge bases doesn't appear to be named
  • AI verification cascades for documentation -- multi-model evaluation exists in the academic literature, but applying it as a scheduled process that maintains a knowledge base's factual accuracy doesn't appear to be documented
  • Multi-format output from the same plain text -- HTML, JSON API, MCP endpoints, and SEO bridge pages, all from markdown/YAML, with zero dependencies
  • Ontology-driven static site generation -- using a formal taxonomy to drive site structure, navigation, and programmatic pages

Try it

The entire project is open source. There is nothing to install.

```shell
git clone https://github.com/snapsynapse/ai-capability-reference.git
cd ai-capability-reference
node scripts/build.js
open docs/index.html
```

Architecture: ARCHITECTURE_PATTERNS.md
Data model: ONTOLOGY.md
Verification system: VERIFICATION.md
Contributing: CONTRIBUTING.md

GitHub repo | Live site | Discussion


This is a working title and an active discussion. If you've seen this pattern elsewhere, named or unnamed, tell us. Built by SnapSynapse.
