AI agents are only as smart as the context you give them. OKF is a new open specification that packages your organizational knowledge as plain markdown files so any agent can read it without custom integrations or proprietary SDKs.
Every team building AI agents hits the same wall. The model is capable. The agent framework is set up. But the agent doesn't know anything about your organization. It doesn't know what your orders table means, what the churn_score metric formula is, or what the on-call runbook says to do when the pipeline breaks.
That knowledge exists. It's scattered across Confluence pages, Notion wikis, data catalog entries, Slack threads, and the heads of senior engineers. Getting it into an agent means building a custom integration for every source. Every team solves this from scratch.
Published on June 12, 2026, the Open Knowledge Format (OKF) is a vendor-neutral specification that solves this with the simplest possible approach: a directory of markdown files.
What OKF Actually Is
An OKF bundle is a directory of markdown files representing concepts: anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file.
That's the entire model. A directory of .md files with YAML frontmatter. The format is deliberately minimal: one required field (type), optional metadata (title, description, resource, tags, timestamp), and a free-form markdown body.
A concept document looks like this:
---
type: table
title: "orders"
description: "One row per customer order. Source of truth for revenue reporting."
resource: "postgresql://prod-db/ecommerce/orders"
tags: [revenue, core, sla]
timestamp: 2026-06-15T10:00:00Z
---
# orders
The `orders` table records every purchase event. It is the join root for all
revenue queries. Do not filter on `status = 'complete'` unless you specifically
want to exclude in-flight orders from the count.
## Key columns
- `order_id` - UUID primary key
- `customer_id` - FK to [customers](./customers.md)
- `amount_usd` - Total order value in USD
- `status` - Enum: pending / processing / complete / refunded
## Common joins
Joins to [customers](./customers.md) on `customer_id`.
Joins to [order_items](./order_items.md) on `order_id`.
## Known issues
The `created_at` column is UTC but the BI tool displays it in PST without
conversion. Always convert at query time.
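To make the format concrete, here is a minimal sketch of how a consumer might split a concept file into frontmatter and body. It assumes the frontmatter is limited to flat `key: value` pairs and inline `[a, b]` lists, as in the example above. The function name `parse_concept` is hypothetical, and a real consumer would more likely hand the frontmatter to a full YAML parser.

```python
def parse_concept(text: str) -> tuple[dict, str]:
    """Split an OKF concept file into (metadata, markdown body).

    Simplified on purpose: only flat `key: value` pairs and inline lists
    are understood. This is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block")
    try:
        # Index of the closing `---` delimiter.
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter block") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so URLs and timestamps survive.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip().strip("\"'")
                                 for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value.strip("\"'")

    if "type" not in meta:
        raise ValueError("frontmatter must declare the required `type` field")
    return meta, "\n".join(lines[end + 1:]).strip()


doc = """---
type: table
title: "orders"
tags: [revenue, core, sla]
---
# orders
The `orders` table records every purchase event.
"""
meta, body = parse_concept(doc)
print(meta["type"], meta["tags"])
```

Rejecting files without a `type` mirrors the one required field; everything else is treated as optional.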
Three things make this powerful:
1. File path is identity. The file tables/orders.md has the concept identifier tables/orders. No registry, no ID generation, no database.
2. Markdown links are the knowledge graph. A link from concept A to concept B asserts a relationship; the kind of relationship is conveyed by the surrounding prose, not by the link itself. Because concepts link to each other, the directory becomes a navigable graph.
3. No tooling required. No API, no authentication, no SDK. The file system IS the API.
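The first two points can be sketched in a few lines: derive each concept's identity from its path, then resolve relative markdown links into edges between concept IDs. This is an illustrative sketch, not part of the spec. The names `build_graph` and `LINK_RE` are hypothetical, and the regex only covers plain inline links of the form `[text](target.md)`.

```python
import posixpath
import re

# Matches inline markdown links and captures the target, e.g. ./customers.md
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def build_graph(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each concept ID to the concept IDs its markdown links point at.

    `docs` maps concept IDs (file path minus ".md") to raw markdown text.
    """
    graph = {}
    for concept_id, text in docs.items():
        base_dir = posixpath.dirname(concept_id)
        targets = set()
        for href in LINK_RE.findall(text):
            href = href.split("#", 1)[0]  # drop any in-page anchor
            if "://" in href or not href.endswith(".md"):
                continue  # skip external URLs and non-concept links
            # Resolve the link relative to the linking file's directory.
            resolved = posixpath.normpath(posixpath.join(base_dir, href))
            targets.add(resolved[: -len(".md")])
        graph[concept_id] = targets
    return graph


docs = {
    "tables/orders": "FK to [customers](./customers.md). "
                     "Feeds [MRR](../metrics/monthly_recurring_revenue.md).",
    "tables/customers": "One row per customer.",
}
print(build_graph(docs)["tables/orders"])
```

The edges are untyped, which matches the model described above: the link says that two concepts are related, and the prose around it says how.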
Bundle Structure
knowledge-bundle/
├── index.md          # Optional: directory listing
├── log.md            # Optional: changelog
├── tables/
│   ├── index.md
│   ├── orders.md
│   ├── customers.md
│   └── order_items.md
├── metrics/
│   ├── churn_score.md
│   └── monthly_recurring_revenue.md
├── runbooks/
│   ├── pipeline_failure.md
│   └── incident_response.md
└── apis/
    └── payments_api.md
Two reserved filenames carry defined meaning at any directory level. An index.md file enumerates the directory's contents to support progressive disclosure, letting a human or agent see what is available before opening individual documents. A log.md file records changes in date-grouped entries, newest first.
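As a rough illustration of progressive disclosure, the sketch below renders an `index.md` body for one directory from a mapping of file names to one-line descriptions. The bullet-list layout and the function name `render_index` are assumptions made for this example; the text above does not prescribe how an index must be laid out.

```python
def render_index(directory: str, entries: dict[str, str]) -> str:
    """Render an index.md body for one directory.

    `entries` maps file names (e.g. "orders.md") to one-line descriptions.
    The bullet-list layout is an assumption, not taken from the spec.
    """
    lines = [f"# {directory}", ""]
    for name in sorted(entries):
        if name in ("index.md", "log.md"):
            continue  # reserved files are not concepts
        label = name.removesuffix(".md")
        lines.append(f"- [{label}](./{name}) - {entries[name]}")
    return "\n".join(lines) + "\n"


print(render_index("tables", {
    "orders.md": "One row per customer order.",
    "customers.md": "One row per customer.",
}))
```

An agent can read a listing like this first and open only the concepts whose descriptions look relevant.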
Real-World Examples
Example 1: Data Team Knowledge Base
Your data team documents tables, metrics, and known data quality issues. An agent building a SQL query can read the bundle before generating the query, understanding column semantics, join patterns, and gotchas:
---
type: metric
title: monthly_recurring_revenue
description: Sum of all active subscription charges normalized to a monthly value.
tags: [revenue, finance, sla]
---
# Monthly Recurring Revenue (MRR)
MRR = SUM(subscription_amount / billing_period_months) WHERE status = 'active'
Source table: [subscriptions](../tables/subscriptions.md)
## Caveats
- Annual plans divide by 12. Do not count them as 12x monthly.
- Trial accounts are excluded (status = 'trial').
- See [churn_score](./churn_score.md) for related attrition metric.
An agent reading this before writing a revenue dashboard query avoids the annual-plan division error that junior analysts make constantly.
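The caveat about annual plans is easy to check with a small worked example. The sketch below applies the formula from the concept to three made-up subscriptions; the `Subscription` class and the sample amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    amount: float               # charge per billing period, in USD
    billing_period_months: int  # 1 for monthly plans, 12 for annual plans
    status: str                 # "active", "trial", ...


def monthly_recurring_revenue(subs: list[Subscription]) -> float:
    """MRR = SUM(amount / billing_period_months) over active subscriptions."""
    return sum(s.amount / s.billing_period_months
               for s in subs if s.status == "active")


subs = [
    Subscription(50.0, 1, "active"),     # monthly plan: contributes 50
    Subscription(1200.0, 12, "active"),  # annual plan: contributes 1200 / 12 = 100
    Subscription(30.0, 1, "trial"),      # trial: excluded
]
print(monthly_recurring_revenue(subs))  # 150.0
```

Summing the raw amounts of the two active plans would give 1250 instead of 150, which is the kind of mistake the caveat is there to prevent.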
Example 2: Platform Team Runbook
Your SRE team writes runbooks as OKF concepts. An on-call agent can navigate the bundle to diagnose and respond to incidents:
---
type: runbook
title: ML Pipeline Failure Response
tags: [on-call, mlops, critical]
---
# ML Pipeline Failure Response
## Symptoms
- CloudWatch alarm: `prod-pipeline-5xx-errors`
- SageMaker Pipeline execution status: `Failed`
## Diagnosis steps
1. Check [pipeline logs](./pipeline_logs.md) in CloudWatch
2. Verify [feature store](../systems/feature_store.md) sync completed
3. Check [training data](../tables/training_data.md) for schema drift
## Resolution
If root cause is data schema drift:
- Update the [preprocessing component](../components/preprocess.md)
- Re-run pipeline manually via EventBridge
Escalate to [ML Platform team](../teams/ml_platform.md) if unresolved in 30 minutes.
The agent reads the runbook, follows cross-links to related concepts, and can diagnose or escalate - all from the same markdown files a human uses.
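Following cross-links can be sketched as a bounded breadth-first walk over the link graph. The example below assumes the links have already been extracted into a mapping from concept ID to linked concept IDs. The function name `reachable`, the hop limit, and the second edge in the sample graph are illustrative assumptions.

```python
from collections import deque


def reachable(graph: dict[str, set[str]], start: str, max_hops: int = 2) -> list[str]:
    """Return concept IDs within `max_hops` links of `start`, nearest first."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return order


graph = {
    "runbooks/pipeline_failure": {
        "runbooks/pipeline_logs",
        "systems/feature_store",
        "tables/training_data",
    },
    # Hypothetical second-hop link, added only to show the hop limit.
    "systems/feature_store": {"teams/ml_platform"},
}
print(reachable(graph, "runbooks/pipeline_failure", max_hops=1))
```

Capping the number of hops keeps the agent's context focused on the runbook and its immediate neighbours instead of pulling in the whole bundle.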
Example 3: API Documentation Bundle
Your platform team ships an OKF bundle alongside their service. Any agent integrating with the API reads the bundle for endpoint semantics, auth patterns, and rate limits, without needing to parse OpenAPI specs or call an MCP server:
---
type: api
title: Payments API
resource: "https://api.internal/payments/v2"
tags: [payments, core, pci]
---
# Payments API
REST API for all payment processing operations.
## Auth
Bearer token from [auth service](../services/auth.md). Tokens expire after 1 hour.
## Key endpoints
- `POST /charges` - Create a new charge. See [charge schema](./schemas/charge.md).
- `GET /charges/{id}` - Retrieve charge status.
- `POST /refunds` - Issue a refund. Requires `refunds:write` scope.
## Rate limits
100 req/min per API key. Burst to 200 for 10 seconds. Returns 429 on breach.
## PCI scope
All fields tagged `pci:true` must not be logged. See [PCI policy](../policies/pci.md).
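An agent that has read this concept could turn the stated rate limit into a client-side guard before it ever sees a 429. The sketch below is one way to do that, assuming a simple sliding window of 100 calls per 60 seconds. It deliberately ignores the burst allowance, the class name `SlidingWindowLimiter` is hypothetical, and the server remains the source of truth.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock     # injectable so the limiter can be tested
        self._sent = deque()   # timestamps of calls inside the current window

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if over the limit."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    print("ok to call the API")
```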
OKF vs RAG vs MCP
These three are complementary, not competing:
| | OKF | RAG | MCP |
|---|---|---|---|
| What it is | Curated knowledge format | Query-time retrieval from chunks | Runtime tool/data connection protocol |
| When knowledge is prepared | Ahead of time (authored) | At query time (derived) | Live (at request time) |
| Best for | Stable org knowledge, schemas, runbooks | Large unstructured corpora | Live data, actions, tool calls |
| Requires infrastructure | No (just files) | Yes (vector DB, embeddings) | Yes (MCP server) |
RAG re-derives knowledge at query time from raw chunks. An OKF bundle stores curated, cross-linked concepts that an agent reads and updates directly.
MCP governs how AI agents connect to tools and live data sources - the runtime plumbing. OKF does not replace MCP.
The practical pattern: use OKF for stable knowledge (table schemas, metric definitions, runbooks), RAG for large document archives, and MCP for live APIs and tool calls.
Getting Started in 5 Minutes
# Create a bundle with a directory for table concepts
mkdir -p my-knowledge-bundle/tables && cd my-knowledge-bundle
# Create your first concept
cat > tables/orders.md << 'EOF'
---
type: table
title: orders
description: One row per customer order.
resource: "postgresql://prod/ecommerce/orders"
tags: [revenue, core]
---
# orders
Source of truth for revenue. Joins to customers on customer_id.
EOF
# Validate conformance (run from inside the bundle directory)
npx okf-validate .
Run the open-source OKF validator over your bundle: node validator/okf-validate.mjs ./your-bundle. It returns pass or fail, names every rule a file tripped, and exits with a code you can gate CI on.
Point any agent at the directory:
import pathlib

def load_okf_bundle(bundle_path: str) -> dict[str, str]:
    """Map each concept ID (path relative to the bundle, minus .md) to its raw text."""
    root = pathlib.Path(bundle_path)
    concepts = {}
    for path in sorted(root.rglob("*.md")):
        # index.md and log.md are reserved files, not concepts
        if path.name in ("index.md", "log.md"):
            continue
        # as_posix() keeps IDs slash-separated on every OS
        concept_id = path.relative_to(root).with_suffix("").as_posix()
        concepts[concept_id] = path.read_text(encoding="utf-8")
    return concepts

# Load the bundle and inject it into the agent's context
bundle = load_okf_bundle("./my-knowledge-bundle")
context = "\n\n".join(f"# {k}\n{v}" for k, v in bundle.items())
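Injecting the whole bundle works for small directories, but larger ones can overflow a context window. Because the file path is the concept's identity, one simple option is to select concepts by path prefix and stop at a size budget. The sketch below assumes a bundle already loaded as a dict of concept ID to text; `select_concepts` and the character budget are hypothetical, and the budget is approximate because it does not count the separators between concepts.

```python
def select_concepts(bundle: dict[str, str], prefix: str, max_chars: int) -> str:
    """Join concepts whose ID starts with `prefix`, staying within `max_chars`."""
    parts = []
    used = 0
    for concept_id in sorted(bundle):
        if not concept_id.startswith(prefix):
            continue
        chunk = f"# {concept_id}\n{bundle[concept_id]}"
        if used + len(chunk) > max_chars:
            break  # budget reached; remaining concepts are left out
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)


bundle = {
    "tables/orders": "Source of truth for revenue.",
    "tables/customers": "One row per customer.",
    "metrics/churn_score": "Attrition metric.",
}
# Only table concepts, capped at roughly 2,000 characters.
print(select_concepts(bundle, prefix="tables/", max_chars=2000))
```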
What to Know Before Adopting
It's v0.1 and explicitly experimental. Google calls the spec a starting point, not a finished standard. Expect it to evolve, and don't build mission-critical tooling that assumes field names won't change.
Structural interoperability, not semantic. OKF already gives agents a shared way to find and read context. It does not yet give them shared semantics for what that context means. Two teams using OKF can exchange bundles. Whether their agents interpret them identically depends on conventions that aren't yet standardized.
No vendor lock-in by design. An OKF bundle lives in any git repository, on any filesystem, readable by any tool that can parse markdown. You can switch consumers without migrating your content.
Complements AGENTS.md and CLAUDE.md. Those convention files tell an agent how to behave in a repo. OKF describes a body of data and knowledge. They solve different layers of the same problem.
Audit-friendly if you maintain it. Any directory in an OKF bundle can carry an optional log.md that records changes under ISO 8601 dates. For regulated industries, that gives the knowledge base a readable change trail, provided the log is actually kept up to date.
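Keeping a log current is easier when entries are added by a small helper rather than by hand. The sketch below inserts a change note under a date heading and keeps the newest date first. The `## YYYY-MM-DD` heading layout and the function name `add_log_entry` are assumptions for illustration, since the exact log.md layout is not spelled out here.

```python
def add_log_entry(log_text: str, date: str, message: str) -> str:
    """Insert `message` under a `## <date>` heading, keeping newest dates first.

    Assumes `date` is an ISO 8601 calendar date (YYYY-MM-DD), so plain string
    comparison orders dates correctly.
    """
    heading = f"## {date}"
    lines = log_text.splitlines()
    if heading in lines:
        # The date group already exists: put the new entry at its top.
        lines.insert(lines.index(heading) + 1, f"- {message}")
    else:
        # Place the new group before the first existing group that is older.
        at = len(lines)
        for i, line in enumerate(lines):
            if line.startswith("## ") and line[3:] < date:
                at = i
                break
        new_block = [heading, f"- {message}", ""]
        if at == len(lines) and lines and lines[-1].strip():
            new_block.insert(0, "")  # blank line before an appended group
        lines[at:at] = new_block
    return "\n".join(lines).rstrip() + "\n"


log = "# Log\n\n## 2026-06-15\n- Added orders\n"
print(add_log_entry(log, "2026-06-20", "Added churn_score metric"))
```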
The Early Mover Case
The spec is Apache 2.0, free to use, and takes five minutes to learn. The honest case for adopting now is the same as schema markup a decade ago: it is cheap to ship, it makes your knowledge legible to the agents that are starting to answer questions about you, and early movers learn the format before it matters.
Start with one team's most pain-prone knowledge: the tables that always get misused, the metrics that always get mis-defined, the runbook that nobody can find at 2am. Write three OKF concept files. Point an agent at them. See what changes.
The spec, reference implementations, and sample bundles are on GitHub: github.com/GoogleCloudPlatform/knowledge-catalog