Sukru Can
The web should be readable by machines. Here's a simple way to do it.

The problem is simple

Every AI agent on the internet is doing the same thing: fetching HTML, guessing what's content, and getting it wrong.

When an AI tool tries to use your website, this is what it sees:

<div class="post-container mx-4">
  <div class="flex items-center gap-2">
    <img src="/avatars/sarah.jpg" />
    <span class="text-sm">Sarah Chen</span>
  </div>
  <h2 class="font-bold mt-4">
    AI Agents Need Structure
  </h2>
  <div class="prose mt-2">
    <p>The web was built for...</p>
    <div class="ad-banner">BUY NOW</div>
  </div>
</div>

Is "Sarah Chen" the author or a commenter? Where does the article end and the ad begin? The machine has to guess. It often guesses wrong.

We have robots.txt to tell machines what to stay away from. We have nothing to tell them what we have.

What I believe

  1. The web should be readable by machines — not just humans. AI agents are becoming how people find information. If your content isn't structured for them, it's increasingly invisible.

  2. Structured data shouldn't require a custom API. Every integration today is bespoke. Every scraper is a hack. There should be one simple convention that works everywhere.

  3. Attribution shouldn't be optional. If a machine reads your content, it should know who made it and how to credit them. That should be part of the protocol, not an afterthought.

  4. Open beats proprietary. If we don't build an open standard for this, every AI company will build their own closed pipeline. That's worse for everyone.

So I built something

FlyWeb is a JSON file at /.well-known/flyweb.json. It lets any website describe its content in a way machines can understand.

{
  "flyweb": "1.0",
  "entity": "My Tech Blog",
  "type": "blog",
  "attribution": {
    "required": true,
    "must_link": true
  },
  "resources": {
    "posts": {
      "path": "/.flyweb/posts",
      "format": "jsonl",
      "fields": ["title", "author", "date", "tags", "content", "url"],
      "access": "free",
      "query": "?tag={tag}&limit={n}"
    }
  }
}
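For illustration, a consumer could sanity-check a manifest before trusting it. This mirrors the fields shown above; nothing here is a published schema, and `isFlywebManifest` is a hypothetical helper:

```typescript
// Minimal structural check on a parsed flyweb.json manifest.
// Verifies only the fields used in the example above.
function isFlywebManifest(m: unknown): boolean {
  if (typeof m !== 'object' || m === null) return false;
  const o = m as Record<string, unknown>;
  return (
    typeof o.flyweb === 'string' &&
    typeof o.entity === 'string' &&
    typeof o.resources === 'object' &&
    o.resources !== null
  );
}
```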

One file. An AI agent that finds it knows what content you have, where to get it as clean data, how to query it, and how to credit you.

That's it. No SDK. No API key. No OAuth. Just a file and a convention.

How it works

Discovery — AI agents check /.well-known/flyweb.json, just as crawlers check robots.txt.
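As a sketch, the discovery step is a single well-known fetch. The `discoveryUrl` and `discover` helpers below are hypothetical, not part of any shipped client:

```typescript
// Build the well-known discovery URL for a site origin.
function discoveryUrl(origin: string): string {
  return new URL('/.well-known/flyweb.json', origin).toString();
}

// Fetch and parse the manifest; returns null when the site
// doesn't publish one (or responds with an error status).
async function discover(origin: string): Promise<Record<string, unknown> | null> {
  const res = await fetch(discoveryUrl(origin));
  if (!res.ok) return null;
  return res.json();
}
```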

Structure — Content is served as clean JSON or JSONL at paths you define.

GET /.flyweb/posts
{"title": "Why AI Needs Structure", "author": "Sarah Chen", "date": "2026-02-15", "content": "..."}
{"title": "The Future of Web Protocols", "author": "Sarah Chen", "date": "2026-02-10", "content": "..."}
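JSONL is just one JSON object per line, so an agent can turn a response like the one above into records in a few lines. A sketch, not the official client — the `Post` shape here just follows the `fields` list in the example manifest:

```typescript
interface Post {
  title: string;
  author: string;
  date: string;
  content: string;
}

// Split a JSONL body into records, skipping blank lines.
function parseJsonl(body: string): Post[] {
  return body
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Post);
}
```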

Query — Standard URL parameters. Nothing fancy.

GET /.flyweb/posts?tag=ai&limit=5
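The `query` field in the manifest is a URL template, so filling it in is plain string substitution. A minimal sketch; `fillQuery` is a hypothetical helper, not part of any spec:

```typescript
// Substitute {name} placeholders in a query template,
// e.g. "?tag={tag}&limit={n}" with { tag: 'ai', n: 5 }.
function fillQuery(template: string, params: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) =>
    encodeURIComponent(String(params[name] ?? '')),
  );
}
```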

Before and after

Without FlyWeb, the AI guesses. It parses your Tailwind classes, hopes it finds the right <div>, and gives you zero credit.

With FlyWeb, the AI gets this:

{
  "title": "AI Agents Need Structure",
  "author": "Sarah Chen",
  "date": "2026-02-15",
  "tags": ["ai", "web"],
  "content": "The web was built for...",
  "url": "https://example.com/posts/42"
}

No guessing. No scraping. No hallucinated metadata.

Attribution is not optional

This is the part I care about most.

"attribution": {
  "required": true,
  "license": "CC-BY-4.0",
  "must_link": true
}

You can give your content away for free. You shouldn't have to give up credit. In FlyWeb, attribution is part of the protocol. Not a suggestion. Not a best practice. Part of the spec.
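An agent honoring this block could, for instance, refuse to reuse content unless it attaches a credit line. A sketch against the config shape shown above; the `creditLine` helper and its output format are my own illustration, not spec text:

```typescript
interface Attribution {
  required: boolean;
  must_link?: boolean;
  license?: string;
}

// Build the credit string an agent should attach when attribution
// is required; returns null when the site doesn't require it.
function creditLine(entity: string, url: string, attr: Attribution): string | null {
  if (!attr.required) return null;
  const base = attr.license ? `Source: ${entity} (${attr.license})` : `Source: ${entity}`;
  return attr.must_link ? `${base}, ${url}` : base;
}
```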

Adding it takes minutes

CLI:

npx flyweb init

Framework plugins:

npm i next-flyweb      # Next.js
npm i astro-flyweb     # Astro
npm i sveltekit-flyweb # SvelteKit
npm i nuxt-flyweb      # Nuxt
npm i express-flyweb   # Express

WordPress:
There's a plugin that auto-generates the config from your posts and pages.

Validate:

npx flyweb check https://your-site.com

For AI developers

Client SDK for consuming FlyWeb data:

import { discover, fetchResource } from 'flyweb/client';

const site = await discover('https://techcrunch.com');
const articles = await fetchResource(
  'https://techcrunch.com',
  site.config.resources.articles,
  { params: { tag: 'ai' }, limit: 10 }
);
// Clean JSON. No scraping.

MCP server for Claude Code, Cursor, and similar tools:

{
  "mcpServers": {
    "flyweb": {
      "command": "npx",
      "args": ["-y", "flyweb-mcp"]
    }
  }
}

I don't know if this will work

I'm not going to pretend this is guaranteed to succeed. Protocols are hard. Adoption is harder.

But the problem is real. AI agents are scraping the web blind, and content creators are getting zero credit. Every month that passes without an open standard is another month where proprietary pipelines get more entrenched.

FlyWeb is a small bet that a simple, open convention can fix this before it's too late to fix.

The protocol is open

MIT licensed. No vendor lock-in. No payment. If you think the web should be readable by machines, try it out. If you have ideas, PRs are open.

The web was built for human eyes. It shouldn't stay that way.
