Suhas Mallesh

Posted on Jul 3

Open Knowledge Format (OKF): The Markdown Standard Your AI Agents Have Been Waiting For 📚

#ai #mlops #opensource #architecture

AI agents are only as smart as the context you give them. OKF is a new open specification that packages your organizational knowledge as plain markdown files so any agent can read it without custom integrations or proprietary SDKs.

Every team building AI agents hits the same wall. The model is capable. The agent framework is set up. But the agent doesn't know anything about your organization. It doesn't know what your orders table means, what the churn_score metric formula is, or what the on-call runbook says to do when the pipeline breaks.

That knowledge exists. It's scattered across Confluence pages, Notion wikis, data catalog entries, Slack threads, and the heads of senior engineers. Getting it into an agent means building a custom integration for every source. Every team solves this from scratch.

Published on June 12, 2026, the Open Knowledge Format (OKF) is a vendor-neutral specification that solves this with the simplest possible approach: a directory of markdown files. 🎯

🏗️ What OKF Actually Is

An OKF bundle is a directory of markdown files representing concepts: anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file.

That's the entire model. A directory of .md files with YAML frontmatter. The format is deliberately minimal: one required field (type), optional metadata (title, description, resource, tags, timestamp), and a free-form markdown body.

A concept document looks like this:

---
type: table
title: "orders"
description: "One row per customer order. Source of truth for revenue reporting."
resource: "postgresql://prod-db/ecommerce/orders"
tags: [revenue, core, sla]
timestamp: 2026-06-15T10:00:00Z
---

# orders

The `orders` table records every purchase event. It is the join root for all
revenue queries. Do not filter on `status = 'complete'` unless you specifically
want to exclude in-flight orders from the count.

## Key columns

- `order_id` - UUID primary key
- `customer_id` - FK to [customers](../customers.md)
- `amount_usd` - Total order value in USD
- `status` - Enum: pending / processing / complete / refunded

## Common joins

Joins to [customers](../customers.md) on `customer_id`.
Joins to [order_items](../order_items.md) on `order_id`.

## Known issues

The `created_at` column is UTC but the BI tool displays it in PST without
conversion. Always convert at query time.

Three things make this powerful:

1. File path is identity. The file tables/orders.md has the concept identifier tables/orders. No registry, no ID generation, no database.

2. Markdown links are the knowledge graph. A link from concept A to concept B asserts a relationship. The specific kind of relationship is conveyed by the surrounding prose, not by the link itself. Concepts linking to each other turns the directory into a navigable graph.

3. No tooling required. No API, no authentication, no SDK. The file system IS the API.

📁 Bundle Structure

knowledge-bundle/
├── index.md                    # Optional: directory listing
├── log.md                      # Optional: changelog
├── tables/
│   ├── index.md
│   ├── orders.md
│   ├── customers.md
│   └── order_items.md
├── metrics/
│   ├── churn_score.md
│   └── monthly_recurring_revenue.md
├── runbooks/
│   ├── pipeline_failure.md
│   └── incident_response.md
└── apis/
    └── payments_api.md

Two reserved filenames carry defined meaning at any directory level. An index.md file enumerates the directory's contents to support progressive disclosure, letting a human or agent see what is available before opening individual documents. A log.md file records changes in date-grouped entries, newest first.

🧪 Real-World Examples

Example 1: Data Team Knowledge Base

Your data team documents tables, metrics, and known data quality issues. An agent building a SQL query can read the bundle before generating the query, understanding column semantics, join patterns, and gotchas:

---
type: metric
title: monthly_recurring_revenue
description: Sum of all active subscription charges normalized to a monthly value.
tags: [revenue, finance, sla]
---

# Monthly Recurring Revenue (MRR)

MRR = SUM(subscription_amount / billing_period_months) WHERE status = 'active'

Source table: [subscriptions](../tables/subscriptions.md)

## Caveats

- Annual plans divide by 12. Do not count them as 12x monthly.
- Trial accounts are excluded (status = 'trial').
- See [churn_score](./churn_score.md) for related attrition metric.

An agent reading this before writing a revenue dashboard query avoids the annual-plan division error that junior analysts make constantly.

Example 2: Platform Team Runbook

Your SRE team writes runbooks as OKF concepts. An on-call agent can navigate the bundle to diagnose and respond to incidents:

---
type: runbook
title: ML Pipeline Failure Response
tags: [on-call, mlops, critical]
---

# ML Pipeline Failure Response

## Symptoms
- CloudWatch alarm: `prod-pipeline-5xx-errors`
- SageMaker Pipeline execution status: `Failed`

## Diagnosis steps

1. Check [pipeline logs](./pipeline_logs.md) in CloudWatch
2. Verify [feature store](../systems/feature_store.md) sync completed
3. Check [training data](../tables/training_data.md) for schema drift

## Resolution

If root cause is data schema drift:
- Update the [preprocessing component](../components/preprocess.md)
- Re-run pipeline manually via EventBridge

Escalate to [ML Platform team](../teams/ml_platform.md) if unresolved in 30 minutes.

The agent reads the runbook, follows cross-links to related concepts, and can diagnose or escalate - all from the same markdown files a human uses.

Example 3: API Documentation Bundle

Your platform team ships an OKF bundle alongside their service. Any agent integrating with the API reads the bundle for endpoint semantics, auth patterns, and rate limits, without needing to parse OpenAPI specs or call an MCP server:

---
type: api
title: Payments API
resource: "https://api.internal/payments/v2"
tags: [payments, core, pci]
---

# Payments API

REST API for all payment processing operations.

## Auth

Bearer token from [auth service](../services/auth.md). Tokens expire after 1 hour.

## Key endpoints

- `POST /charges` - Create a new charge. See [charge schema](./schemas/charge.md).
- `GET /charges/{id}` - Retrieve charge status.
- `POST /refunds` - Issue a refund. Requires `refunds:write` scope.

## Rate limits

100 req/min per API key. Burst to 200 for 10 seconds. Returns 429 on breach.

## PCI scope

All fields tagged `pci:true` must not be logged. See [PCI policy](../policies/pci.md).

🔄 OKF vs RAG vs MCP

These three are complementary, not competing:

	OKF	RAG	MCP
What it is	Curated knowledge format	Query-time retrieval from chunks	Runtime tool/data connection protocol
When knowledge is prepared	Ahead of time (authored)	At query time (derived)	Live (at request time)
Best for	Stable org knowledge, schemas, runbooks	Large unstructured corpora	Live data, actions, tool calls
Requires infrastructure	No (just files)	Yes (vector DB, embeddings)	Yes (MCP server)

RAG re-derives knowledge at query time from raw chunks. An OKF bundle stores curated, cross-linked concepts that an agent reads and updates directly.

MCP governs how AI agents connect to tools and live data sources - the runtime plumbing. OKF does not replace MCP.

The practical pattern: use OKF for stable knowledge (table schemas, metric definitions, runbooks), RAG for large document archives, and MCP for live APIs and tool calls.

🔧 Getting Started in 5 Minutes

# Create a bundle
mkdir my-knowledge-bundle && cd my-knowledge-bundle

# Create your first concept
cat > tables/orders.md << 'EOF'
---
type: table
title: orders
description: One row per customer order.
resource: "postgresql://prod/ecommerce/orders"
tags: [revenue, core]
---

# orders

Source of truth for revenue. Joins to customers on customer_id.
EOF

# Validate conformance
npx okf-validate ./my-knowledge-bundle

Run the open-source OKF validator over your bundle: node validator/okf-validate.mjs ./your-bundle. It returns pass or fail, names every rule a file tripped, and exits with a code you can gate CI on.

Point any agent at the directory:

import pathlib

def load_okf_bundle(bundle_path: str) -> dict:
    concepts = {}
    for path in pathlib.Path(bundle_path).rglob("*.md"):
        if path.name in ("index.md", "log.md"):
            continue
        text = path.read_text()
        concept_id = str(path.relative_to(bundle_path)).removesuffix(".md")
        concepts[concept_id] = text
    return concepts

# Load bundle and inject into agent context
bundle = load_okf_bundle("./my-knowledge-bundle")
context = "\n\n".join(f"# {k}\n{v}" for k, v in bundle.items())

⚠️ What to Know Before Adopting

It's v0.1 and explicitly experimental. OKF v0.1 is an early, experimental spec that Google calls a starting point, not a finished standard. Expect the spec to evolve. Don't build mission-critical tooling that assumes field names won't change.

Structural interoperability, not semantic. OKF already gives agents a shared way to find and read context. It does not yet give them shared semantics for what that context means. Two teams using OKF can exchange bundles. Whether their agents interpret them identically depends on conventions that aren't yet standardized.

No vendor lock-in by design. An OKF bundle lives in any git repository, on any filesystem, readable by any tool that can parse markdown. You can switch consumers without migrating your content.

Complements AGENTS.md and CLAUDE.md. Those convention files tell an agent how to behave in a repo. OKF describes a body of data and knowledge. They solve different layers of the same problem.

Audit-ready by default. Every OKF bundle has optional log.md files at any directory level. Changes are tracked in ISO 8601 format. For regulated industries, this means your knowledge base is audit-ready by default.

⏭️ The Early Mover Case

The spec is Apache 2.0, free to use, and takes five minutes to learn. The honest case for adopting now is the same as schema markup a decade ago: it is cheap to ship, it makes your knowledge legible to the agents that are starting to answer questions about you, and early movers learn the format before it matters.

Start with one team's most pain-prone knowledge: the tables that always get misused, the metrics that always get mis-defined, the runbook that nobody can find at 2am. Write three OKF concept files. Point an agent at them. See what changes.

The spec, reference implementations, and sample bundles are on GitHub: github.com/GoogleCloudPlatform/knowledge-catalog

Found this helpful? Follow for more AI architecture deep-dives! 💬