Paul Chen

Posted on Jun 15 • Edited on Jun 21

Google's OKF Is Proposing a Standard for How AI Agents Consume Domain Knowledge - Here's How Synthadoc Already Fits

#ai #agents #architecture #automation

On June 12, 2026, Google Cloud published Open Knowledge Format (OKF) v0.1 - a vendor-neutral specification for how AI agents should consume structured knowledge. If you've been thinking about how to give your agents reliable, domain-specific memory beyond RAG over raw documents, this is worth your attention.

We built Synthadoc to solve exactly this problem, and the OKF spec reads like a formal description of what we built. Independent convergence on the same design is usually a sign that both teams found something real.

This post breaks down what OKF is, where Synthadoc already aligns with it, how we're converging on the remaining gaps, and why we think Synthadoc is one of the most important implementations the OKF ecosystem needs.

What Problem OKF Is Solving

Organizations today fragment their knowledge across metadata catalogs, wikis, Notion pages, code comments, and internal documents. When you want an AI agent to reason over that knowledge, you face a choice: either build a custom pipeline for each source, or dump everything into a vector store and hope similarity search finds the right facts.

OKF proposes a third path: a portable, file-based format for curated knowledge that any AI agent can consume without vendor-specific APIs or custom parsers. The format is deliberately minimal - markdown files with YAML frontmatter, organized in directories whose paths represent concept identity. Standard markdown links between files create a traversable knowledge graph.

The only required field is type. Everything else - title, description, tags, timestamp, resource - is optional. The philosophy is producer/consumer independence: the same format works whether you generated it from a data catalog, a wiki, or a code annotation pipeline.

This is a strong design philosophy. It means OKF bundles are readable by humans in any text editor, versionable in git, and consumable by any agent that can read files.
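To make the format concrete, here is a minimal sketch of the first thing an OKF consumer might do with a file. It splits off the YAML frontmatter, checks that the one required field, `type`, is present, and derives a concept identifier from the file's path. It assumes flat `key: value` frontmatter; a real consumer would use a full YAML parser such as PyYAML. The function names are illustrative and are not part of the spec.

```python
from pathlib import PurePosixPath


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown file into (frontmatter, body).

    Simplification: only top-level `key: value` lines are read; indented
    (nested) YAML is skipped, and values are kept as raw strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and not line[:1].isspace():
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def concept_id(path: str) -> str:
    """The path is the concept identity: people/alan-turing.md -> people/alan-turing."""
    return str(PurePosixPath(path).with_suffix(""))


def validate(path: str, text: str) -> list[str]:
    """Return problems that make this file non-conforming (empty list means OK)."""
    meta, _ = split_frontmatter(text)
    if not meta.get("type"):
        return [f"{concept_id(path)}: missing required field 'type'"]
    return []


page = "---\ntype: person\ntitle: Alan Turing\n---\n# Alan Turing\n"
print(validate("people/alan-turing.md", page))  # []
print(validate("people/ada.md", "---\ntitle: Ada\n---\n"))
# ["people/ada: missing required field 'type'"]
```

Because everything except `type` is optional, the validator checks only that one field. Anything richer, such as type vocabularies or link checks, would be a producer or consumer policy rather than a format rule.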

Google's reference implementation

Alongside the spec, Google shipped three things: an Enrichment Agent that ingests BigQuery metadata and emits OKF bundles (built on Google's Agent Development Kit with Gemini as the backend), a Visualizer that renders any OKF bundle as a self-contained interactive HTML file using Cytoscape.js, and three pre-built sample bundles for GA4 e-commerce, Stack Overflow, and Bitcoin datasets.

The Enrichment Agent is worth studying. It is a working OKF producer, a concrete proof that the format is implementable, not just specifiable. But its source material is structured: BigQuery tables, schemas, foreign-key relationships, data catalog metadata. This is one important slice of the OKF producer space.

The other slice - the harder one for most organizations - is unstructured source material: internal PDFs, research papers, spreadsheets, meeting notes, product documentation. That is the gap Synthadoc fills. Google's Enrichment Agent and Synthadoc are complementary producers in the same ecosystem, covering different source types with the same output format.

| | Google Enrichment Agent | Synthadoc |
|---|---|---|
| Source material | BigQuery datasets, data catalogs | PDFs, XLSX, PNG, plain text |
| OKF role | Producer | Producer |
| Lifecycle management | None | 5-state (draft → archived) |
| Conflict resolution | None | Contradicted state + auto-resolve |
| Human editorial gate | None | draft → active transition |
| LLM backend | Gemini only | Gemini, Anthropic, OpenAI, Groq, Qwen, Ollama, DeepSeek, MiniMax |

Synthadoc's Design Was Already There

Synthadoc compiles raw documents (PDFs, spreadsheets, images, YouTube videos, plain text) into a structured wiki: a collection of markdown files in a directory, each with YAML frontmatter, linked together by wikilinks. That wiki is then queryable by both humans (through Obsidian or a web UI) and AI agents (through the API).

Here's what a Synthadoc wiki page looks like:

```markdown
---
title: Alan Turing
tags: [biography, theory, cryptography]
status: active
confidence: high
created: 2026-04-08
sources:
  - file: turing-biography.pdf
    hash: sha256:9f3a...
    ingested: 2026-04-08
---

# Alan Turing

Alan Mathison Turing (1912–1954) was an English mathematician and computer scientist...

The [[von-neumann-architecture]] that underlies modern computers shares conceptual roots
with Turing's stored-program ideas. Turing also influenced [[programming-languages-overview]]
by formalising what it means for a problem to be computable.
```

When we mapped this against the OKF specification, the alignment was immediate:

| OKF requirement | Synthadoc |
|---|---|
| Markdown + YAML frontmatter | Native format for all wiki pages |
| `index.md` reserved for navigation | `wiki/index.md` in every wiki, auto-generated and maintained |
| Hierarchical concept organization | `ROUTING.md` groups pages into topic branches (`People`, `Hardware`, etc.): a logical hierarchy via manifest rather than physical directory structure |
| Markdown links as knowledge graph edges | `[[wikilinks]]` compile to a traversable knowledge graph |
| `tags` field | First-class frontmatter field on every page |
| `timestamp` (ISO 8601) | `created` and lifecycle timestamps throughout |
| No proprietary SDK required | Standard files, readable by any tool |
| Git-compatible | Every wiki is a git-managed directory |

The core insight OKF encodes, that knowledge for agents should live in plain files with structured metadata rather than locked in a database, is the same insight Synthadoc was built on.
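The wikilink-to-graph step can be sketched in a few lines: scan each page body for `[[target]]` or `[[target|label]]` links and collect an adjacency map keyed by page slug. The in-memory `pages` dict stands in for reading a wiki directory, and the alias syntax is an assumption. This illustrates the idea and does not reproduce Synthadoc's compiler.

```python
import re

# Matches [[target]] and [[target|display text]]; captures only the target slug.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page slug to the set of slugs its body links to (outgoing edges)."""
    graph: dict[str, set[str]] = {}
    for slug, body in pages.items():
        targets = graph.setdefault(slug, set())  # pages with no links still appear
        targets.update(t.strip() for t in WIKILINK.findall(body))
    return graph


def backlinks(graph: dict[str, set[str]], slug: str) -> set[str]:
    """Reverse edges: which pages link *to* this slug."""
    return {src for src, targets in graph.items() if slug in targets}


pages = {
    "alan-turing": "Shares roots with [[von-neumann-architecture]] and "
                   "[[programming-languages-overview]].",
    "von-neumann-architecture": "Builds on ideas from [[alan-turing|Turing]].",
    "programming-languages-overview": "No outgoing links yet.",
}
graph = build_graph(pages)
print(sorted(graph["alan-turing"]))
# ['programming-languages-overview', 'von-neumann-architecture']
print(backlinks(graph, "alan-turing"))  # {'von-neumann-architecture'}
```

The same adjacency map serves both directions of traversal. An agent can follow outgoing links for context, and backlinks answer "what else depends on this page?"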

Converging with OKF

Synthadoc was designed independently before OKF was published, so a few intentional design choices don't yet match the spec exactly. Here's what they are and how we're closing each gap.

Wikilinks → standard markdown links.
OKF uses `[Alan Turing](people/alan-turing.md)`; Synthadoc uses `[[alan-turing]]`. The graph structure is identical; only the syntax differs. We're adding an OKF export mode that rewrites wikilinks to standard markdown links at export time, so any Synthadoc wiki can be published as a fully spec-compliant OKF bundle.
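At its core, that export-time rewrite is a regex substitution. This sketch assumes the exporter already has a slug-to-path mapping (for instance, derived from the `ROUTING.md` topic branches) and a slug-to-title mapping. Both mappings and the `export_links` name are hypothetical. Paths are written relative to the bundle root, matching the `people/alan-turing.md` form above.

```python
import re

# [[slug]] or [[slug|label]]; group 1 = slug, group 2 = optional label.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def export_links(body: str, paths: dict[str, str], titles: dict[str, str]) -> str:
    """Rewrite wikilinks as standard markdown links for an OKF bundle."""

    def replace(match: re.Match) -> str:
        slug = match.group(1).strip()
        if slug not in paths:
            return match.group(0)  # unresolved: keep the wikilink visible
        label = match.group(2) or titles.get(slug, slug)
        return f"[{label}]({paths[slug]})"

    return WIKILINK.sub(replace, body)


# Hypothetical lookup tables an exporter would build from the wiki.
paths = {"alan-turing": "people/alan-turing.md"}
titles = {"alan-turing": "Alan Turing"}
print(export_links("See [[alan-turing]] and [[missing-page]].", paths, titles))
# See [Alan Turing](people/alan-turing.md) and [[missing-page]].
```

In this sketch, unresolved slugs are deliberately left as wikilinks so that a broken reference stays visible in the exported bundle instead of silently turning into plain text. A real exporter may also need to compute paths relative to the linking file rather than the bundle root.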

Adding the type field.
OKF's only required field is `type`, a classifier for what kind of knowledge the page contains (concept, person, technology, event). Synthadoc's frontmatter uses `status` for lifecycle state and `confidence` for epistemic weight: complementary metadata, not a replacement. We're adding `type` to the schema in the next release. With that one field, every Synthadoc wiki page will be OKF-compliant out of the box.
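Until `type` ships in the schema, an exporter could backfill it. The following is a minimal sketch, assuming a page's type is inferred from its `ROUTING.md` topic branch. The branch-to-type table, the `concept` fallback, and the `with_type` name are hypothetical. Existing `status` and `confidence` values pass through untouched.

```python
from typing import Optional

# Hypothetical mapping from a ROUTING.md topic branch to an OKF `type` value.
BRANCH_TO_TYPE = {
    "People": "person",
    "Hardware": "technology",
}
DEFAULT_TYPE = "concept"  # assumed fallback for unmapped or missing branches


def with_type(frontmatter: dict, branch: Optional[str]) -> dict:
    """Return a copy of the frontmatter with `type` filled in if it is absent."""
    result = dict(frontmatter)  # status, confidence, etc. are carried over as-is
    if not result.get("type"):
        result["type"] = BRANCH_TO_TYPE.get(branch, DEFAULT_TYPE)
    return result


page = {"title": "Alan Turing", "status": "active", "confidence": "high"}
print(with_type(page, "People")["type"])  # person
```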

sources → resource alignment.
OKF's resource field is a single URL pointing to an external source. Synthadoc's sources array goes deeper: each entry includes the file path, SHA-256 hash, file size, and ingestion date, and individual claims carry line-level citations. This is a superset of what OKF specifies, not a conflict. The OKF export will surface the primary source URL in the resource field while preserving Synthadoc's full provenance in the body.
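The export could perform that mapping in the following way, sketched under stated assumptions. The first `sources` entry is treated as primary, and its file path stands in for the `resource` URL. The full provenance is rendered as a `## Sources` list appended to the body. The primary-source rule, the heading, and the removal of `sources` from the exported frontmatter are all illustrative choices, not documented Synthadoc behavior.

```python
def export_provenance(frontmatter: dict, body: str) -> tuple[dict, str]:
    """Put the primary source in `resource`; move the full source list into the body."""
    sources = frontmatter.get("sources", [])
    meta = {k: v for k, v in frontmatter.items() if k != "sources"}
    if not sources:
        return meta, body
    meta["resource"] = sources[0]["file"]  # assumption: first entry is the primary source
    lines = ["", "## Sources", ""]
    for src in sources:
        # Keep every provenance detail (hash, size, ingest date, ...) in readable form.
        details = ", ".join(f"{k}: {v}" for k, v in src.items() if k != "file")
        lines.append(f"- {src['file']} ({details})" if details else f"- {src['file']}")
    return meta, body.rstrip("\n") + "\n" + "\n".join(lines) + "\n"


fm = {
    "title": "Alan Turing",
    "sources": [
        {"file": "turing-biography.pdf", "hash": "sha256:9f3a...", "ingested": "2026-04-08"},
    ],
}
meta, body = export_provenance(fm, "# Alan Turing\n")
print(meta["resource"])  # turing-biography.pdf
print(body)
```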

These three changes give Synthadoc full OKF compliance while keeping the richer metadata that makes the wiki useful for humans and auditors, not just agents.

Where Synthadoc Goes Further

OKF v0.1 explicitly lists several open design questions - problems the specification hasn't solved yet. These are interesting to us because they're exactly what Synthadoc was designed to handle.

Contradiction resolution. When two source documents say conflicting things about the same topic, OKF has no answer yet. Synthadoc flags the affected page as contradicted and can automatically propose a reconciled view using LLM-assisted lint. The resolution is recorded in the page's lifecycle audit trail.
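The contradicted state is easiest to picture as one transition in a small state machine that logs every change. The post names five states across its sections (draft, active, stale, contradicted, archived). This sketch assumes those are the five. The allowed-transition table and the audit-record shape are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical transition table; the real lifecycle rules may differ.
ALLOWED = {
    "draft": {"active", "archived"},
    "active": {"stale", "contradicted", "archived"},
    "stale": {"active", "archived"},
    "contradicted": {"active", "archived"},
    "archived": set(),
}


@dataclass
class Page:
    slug: str
    status: str = "draft"
    audit: list[dict] = field(default_factory=list)  # append-only lifecycle trail

    def transition(self, new_status: str, reason: str) -> None:
        """Move to a new state, rejecting illegal moves and recording legal ones."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.slug}: {self.status} -> {new_status} not allowed")
        self.audit.append({
            "from": self.status,
            "to": new_status,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.status = new_status


page = Page("alan-turing")
page.transition("active", "cleared by a human editor")
page.transition("contradicted", "two sources disagree on a date")
page.transition("active", "reconciled view accepted")
print(page.status, len(page.audit))  # active 3
```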

Stale information management. OKF acknowledges this as an open problem. Synthadoc has a stale lifecycle state triggered when a source document changes after initial ingest. Scheduled lint runs detect drift and surface pages that need review.
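Drift detection can be as simple as recomputing each source file's SHA-256 hash and comparing it with the hash recorded at ingest, which the `sources` frontmatter already stores. The following is a minimal sketch, assuming hashes are stored as `sha256:<hex>` strings as in the example page earlier and that sources are local files under a known root. A missing file is also treated as changed. The function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks, formatted like the `hash` frontmatter values."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def changed_sources(sources: list[dict], root: Path) -> list[str]:
    """Files whose current hash differs from the one recorded at ingest (or that vanished)."""
    changed = []
    for src in sources:
        path = root / src["file"]
        if not path.exists() or sha256_of(path) != src["hash"]:
            changed.append(src["file"])
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "spec.txt").write_text("v1")
    sources = [{"file": "spec.txt", "hash": sha256_of(root / "spec.txt")}]
    print(changed_sources(sources, root))  # []
    (root / "spec.txt").write_text("v2")  # the source is edited after ingest
    print(changed_sources(sources, root))  # ['spec.txt']
```

Any page whose sources produce a non-empty result would be a candidate for the stale state.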

Adversarial review. OKF doesn't address how to audit knowledge for unsupported claims or overconfident assertions. Synthadoc's lint pass runs an adversarial review on each page, flagging claims that lack source support and writing findings directly into the `lint_warnings` frontmatter field.
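The review itself is LLM-driven, but the bookkeeping around it is simple. In this sketch, two crude heuristics replace the adversarial judgement: a claim with no citations is unsupported, and a claim using absolute wording ("always", "never", "proven") may be overconfident. The heuristics, the `Claim` structure, the citation format, and the warning strings are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass, field

# Crude stand-in for the LLM's "is this overconfident?" judgement.
ABSOLUTE = re.compile(r"\b(always|never|proven|certainly|undeniably)\b", re.IGNORECASE)


@dataclass
class Claim:
    text: str
    # Hypothetical line-level citation format, e.g. "turing-biography.pdf:L12".
    citations: list[str] = field(default_factory=list)


def lint_page(frontmatter: dict, claims: list[Claim]) -> dict:
    """Return a copy of the frontmatter with findings recorded under lint_warnings."""
    warnings = []
    for i, claim in enumerate(claims):
        if not claim.citations:
            warnings.append(f"claim {i}: no source support: {claim.text!r}")
        if ABSOLUTE.search(claim.text):
            warnings.append(f"claim {i}: possibly overconfident: {claim.text!r}")
    result = dict(frontmatter)
    result["lint_warnings"] = warnings
    return result


claims = [
    Claim("Turing was born in 1912.", ["turing-biography.pdf:L12"]),
    Claim("Turing always worked alone."),
]
for warning in lint_page({"title": "Alan Turing"}, claims)["lint_warnings"]:
    print(warning)
```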

These aren't listed as criticisms of OKF; v0.1 is explicitly a "starting point for community feedback." They're listed because they represent the class of problems any serious knowledge management system needs to solve as it matures.

Synthadoc as an OKF Producer: Bridging Agents to Domain Knowledge

Here's the framing we find most compelling.

OKF describes a consumption format: a way for AI agents to read structured knowledge. What it doesn't describe is how that knowledge gets created, validated, kept current, and made queryable. That's the production problem.

Synthadoc solves the production problem. It ingests raw documents from your organization, compiles them into a structured wiki, validates each page through adversarial lint, manages the page lifecycle as sources change over time, and serves the result through a query API.

Synthadoc's output (a directory of markdown files with YAML frontmatter, linked by wikilinks, with an `index.md`) is naturally aligned with what OKF describes as agent-consumable knowledge.

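This means you can think of Synthadoc wikis as OKF-compatible knowledge bundles that cover the full lifecycle: ingestion, compilation, validation, maintenance as sources change, and serving through a query API.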

If you're building an agent that needs domain expertise (a customer support agent that needs product knowledge, a research agent that needs a summary of the scientific literature, a coding assistant that needs your internal architecture decisions), Synthadoc turns your existing documents into a structured, validated, self-maintaining knowledge base the agent can trust.

Our Perspective: What the Standard Still Needs

OKF v0.1 is a strong starting point, and we think the direction is right. But having built in this space, we have a few opinions worth putting on record, not as criticism, but as a contribution to where v0.2 should go.

A format standard without a quality standard is garbage-in, garbage-out.
If any tool can produce an OKF bundle, agents will inevitably consume contradictory, stale, or unsupported content formatted as valid OKF. The spec defines how knowledge is structured, but not how it is validated. The ecosystem needs production tooling that enforces quality - adversarial review, conflict detection, lifecycle management - before knowledge reaches an agent. A format alone doesn't make a knowledge base trustworthy.

Agent memory needs trust signals, not just content.
The OKF spec currently gives an agent no way to know how confident to be in what it's reading. A page written two years ago from a single unverified source should carry different weight than one reviewed last week, cross-referenced against multiple documents, and cleared by an adversarial lint pass. Fields like `confidence` and `status`, along with claim-level provenance, need to become first-class concepts in the spec, not optional extensions, if agents are to make consequential decisions from OKF knowledge.
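To show what a consumer could do if such signals were first-class, here is a hypothetical trust weighting built from the factors named above: confidence, status, review age, number of sources, and whether lint came back clean. The `reviewed` field, the weights, and the 180-day half-life are invented for illustration. This score is not defined by OKF v0.1, and it is not presented as Synthadoc's.

```python
from datetime import date

# Invented weights for illustration only.
CONFIDENCE = {"high": 1.0, "medium": 0.7, "low": 0.4}
STATUS = {"active": 1.0, "draft": 0.5, "stale": 0.3, "contradicted": 0.1, "archived": 0.0}
HALF_LIFE_DAYS = 180  # assumed: a review's weight halves every ~6 months


def trust(page: dict, today: date) -> float:
    """Score in [0, 1]; higher means an agent can lean on the page more heavily."""
    score = CONFIDENCE.get(page.get("confidence"), 0.4) * STATUS.get(page.get("status"), 0.0)
    age_days = (today - page["reviewed"]).days
    score *= 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential decay with review age
    # More sources help with diminishing returns (0 -> 0, 1 -> 0.5, 2 -> 0.75, ...),
    # so a page with no sources at all scores zero.
    score *= 1 - 0.5 ** len(page.get("sources", []))
    if page.get("lint_warnings"):
        score *= 0.5  # unresolved lint findings halve the weight
    return round(score, 3)


today = date(2026, 6, 15)
fresh = {"confidence": "high", "status": "active", "reviewed": date(2026, 6, 8),
         "sources": ["a.pdf", "b.pdf", "c.pdf"], "lint_warnings": []}
old = {"confidence": "low", "status": "active", "reviewed": date(2024, 6, 15),
       "sources": ["memo.pdf"], "lint_warnings": ["claim 0: no source support"]}
print(trust(fresh, today) > trust(old, today))  # True
```

Multiplying the factors means any single disqualifying signal (archived status, no sources) drives the score to zero, which is arguably the safe default for an agent acting on the result.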

Domain-specific knowledge is where this matters most.
General facts are already baked into LLM weights. What agents actually need is proprietary, time-sensitive, domain-specific knowledge that cannot be in training data - internal architecture decisions, evolving product specifications, specialist research. This is precisely where OKF is most valuable, and also where freshness and accuracy matter most. A standard that doesn't address knowledge currency will struggle in exactly the use case it's most suited for.

The human editorial layer is non-negotiable for consequential decisions.
Fully automated pipelines - ingest everything, trust everything - break down when agents act on the results. Knowledge intended for agent consumption needs a defined editorial state: has a human reviewed this page and cleared it for use? Synthadoc's draft → active lifecycle transition is exactly this gate. We think OKF should define a concept of editorial provenance so consumers can distinguish machine-generated drafts from human-validated knowledge.
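On the consumer side, editorial provenance would let an agent refuse unreviewed knowledge by default. The following is a minimal sketch, assuming each page carries `status` plus a hypothetical `reviewed_by` field naming the human who cleared it. Neither OKF v0.1 nor this post defines `reviewed_by`, so treat it as a placeholder.

```python
def agent_visible(pages: list[dict], allow_drafts: bool = False) -> list[dict]:
    """Keep only pages an agent may act on.

    Human-validated pages are `active` and name a reviewer; machine-generated
    drafts pass only when the caller explicitly opts in (e.g. for exploration).
    """
    visible = []
    for page in pages:
        validated = page.get("status") == "active" and bool(page.get("reviewed_by"))
        if validated or (allow_drafts and page.get("status") == "draft"):
            visible.append(page)
    return visible


pages = [
    {"title": "Alan Turing", "status": "active", "reviewed_by": "editor@example.com"},
    {"title": "Quantum roadmap", "status": "draft"},
    {"title": "Old pricing", "status": "stale", "reviewed_by": "editor@example.com"},
]
print([p["title"] for p in agent_visible(pages)])  # ['Alan Turing']
print([p["title"] for p in agent_visible(pages, allow_drafts=True)])
# ['Alan Turing', 'Quantum roadmap']
```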

Synthadoc v0.8.0

The OKF alignment discussed in this post ships alongside Synthadoc v0.8.0, released June 12, the same day as OKF v0.1. The release focuses on three areas:

Multi-turn conversation: the web UI maintains full session history across turns. Follow-up questions resolve correctly through context injection and automatic phrase rewriting. Long sessions compress older turns into summaries rather than discarding them. Sessions persist server-side and survive page refreshes; the sidebar shows history as a collapsible tree with turn-count badges.
Qwen DashScope + Claude Opus 4.8: Qwen cloud models (qwen-plus, qwen-max, qwq-32b) now work via DashScope's OpenAI-compatible endpoint. New DashScope accounts receive 1 million free tokens valid for 90 days. Claude Opus 4.8 is supported for maximum answer quality.
Conversational job operations: the Action Agent handles live job queries directly in chat. Ask for job status and receive a real-time table; job IDs appear as clickable chips for drill-down. Multi-status filters ("show failed and skipped jobs") work across turns.
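The session compression in the multi-turn item can be sketched as a rolling window. Recent turns stay verbatim, and older ones are folded into a running summary that is injected ahead of the next question. The `summarize` function is a stub that just keeps truncated questions (a real system would call an LLM). The window size and names are assumptions rather than Synthadoc's implementation.

```python
from dataclasses import dataclass, field

KEEP_VERBATIM = 4  # assumed number of recent turns kept word-for-word


def summarize(previous: str, turns: list[tuple[str, str]]) -> str:
    """Stub summarizer: a real system would ask an LLM to merge these turns."""
    notes = [f"Q: {q[:40]}" for q, _ in turns]
    return " | ".join(filter(None, [previous, *notes]))


@dataclass
class Session:
    summary: str = ""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        """Append a turn; fold anything beyond the window into the summary."""
        self.turns.append((question, answer))
        if len(self.turns) > KEEP_VERBATIM:
            overflow = self.turns[:-KEEP_VERBATIM]
            self.summary = summarize(self.summary, overflow)
            self.turns = self.turns[-KEEP_VERBATIM:]

    def context(self) -> str:
        """Text injected ahead of the next question so follow-ups resolve."""
        parts = [f"Summary of earlier turns: {self.summary}"] if self.summary else []
        parts += [f"User: {q}\nAssistant: {a}" for q, a in self.turns]
        return "\n".join(parts)


session = Session()
for i in range(6):
    session.add_turn(f"question {i}", f"answer {i}")
print(len(session.turns))  # 4
print(session.summary)  # Q: question 0 | Q: question 1
```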

If you find Synthadoc useful, a ⭐ on GitHub helps the project reach more people: https://github.com/axoviq-ai/synthadoc.

References

OKF v0.1 Specification: GoogleCloudPlatform/knowledge-catalog
Synthadoc: github.com/axoviq-ai/synthadoc

DEV Community