Sukru Can
The web should be readable by machines. Here's a simple way to do it.

The problem is simple

Every AI agent on the internet is doing the same thing: fetching HTML, guessing what's content, and getting it wrong.

When an AI tool tries to use your website, this is what it sees:

<div class="post-container mx-4">
  <div class="flex items-center gap-2">
    <img src="/avatars/sarah.jpg" />
    <span class="text-sm">Sarah Chen</span>
  </div>
  <h2 class="font-bold mt-4">
    AI Agents Need Structure
  </h2>
  <div class="prose mt-2">
    <p>The web was built for...</p>
    <div class="ad-banner">BUY NOW</div>
  </div>
</div>

Is "Sarah Chen" the author or a commenter? Where does the article end and the ad begin? The machine has to guess. It often guesses wrong.

We have robots.txt to tell machines what to stay away from. We have nothing to tell them what we have.

What I believe

  1. The web should be readable by machines — not just humans. AI agents are becoming how people find information. If your content isn't structured for them, it's increasingly invisible.

  2. Structured data shouldn't require a custom API. Every integration today is bespoke. Every scraper is a hack. There should be one simple convention that works everywhere.

  3. Attribution shouldn't be optional. If a machine reads your content, it should know who made it and how to credit them. That should be part of the protocol, not an afterthought.

  4. Open beats proprietary. If we don't build an open standard for this, every AI company will build their own closed pipeline. That's worse for everyone.

So I built something

FlyWeb is a JSON file at /.well-known/flyweb.json. It lets any website describe its content in a way machines can understand.

{
  "flyweb": "1.0",
  "entity": "My Tech Blog",
  "type": "blog",
  "attribution": {
    "required": true,
    "must_link": true
  },
  "resources": {
    "posts": {
      "path": "/.flyweb/posts",
      "format": "jsonl",
      "fields": ["title", "author", "date", "tags", "content", "url"],
      "access": "free",
      "query": "?tag={tag}&limit={n}"
    }
  }
}
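For illustration, a consumer could sanity-check a manifest before trusting it. This mirrors the fields shown above; nothing here is a published schema, and `isFlywebManifest` is a hypothetical helper:

```typescript
// Minimal structural check on a parsed flyweb.json manifest.
// Verifies only the fields used in the example above.
function isFlywebManifest(m: unknown): boolean {
  if (typeof m !== 'object' || m === null) return false;
  const o = m as Record<string, unknown>;
  return (
    typeof o.flyweb === 'string' &&
    typeof o.entity === 'string' &&
    typeof o.resources === 'object' &&
    o.resources !== null
  );
}
```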

One file. An AI agent that finds it knows what content you have, where to get it as clean data, how to query it, and how to credit you.

That's it. No SDK. No API key. No OAuth. Just a file and a convention.

How it works

Discovery — AI agents check /.well-known/flyweb.json, just as crawlers check robots.txt.
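As a sketch, the discovery step is a single well-known fetch. The `discoveryUrl` and `discover` helpers below are hypothetical, not part of any shipped client:

```typescript
// Build the well-known discovery URL for a site origin.
function discoveryUrl(origin: string): string {
  return new URL('/.well-known/flyweb.json', origin).toString();
}

// Fetch and parse the manifest; returns null when the site
// doesn't publish one (or responds with an error status).
async function discover(origin: string): Promise<Record<string, unknown> | null> {
  const res = await fetch(discoveryUrl(origin));
  if (!res.ok) return null;
  return res.json();
}
```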

Structure — Content is served as clean JSON or JSONL at paths you define.

GET /.flyweb/posts
{"title": "Why AI Needs Structure", "author": "Sarah Chen", "date": "2026-02-15", "content": "..."}
{"title": "The Future of Web Protocols", "author": "Sarah Chen", "date": "2026-02-10", "content": "..."}
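JSONL is just one JSON object per line, so an agent can turn a response like the one above into records in a few lines. A sketch, not the official client — the `Post` shape here just follows the `fields` list in the example manifest:

```typescript
interface Post {
  title: string;
  author: string;
  date: string;
  content: string;
}

// Split a JSONL body into records, skipping blank lines.
function parseJsonl(body: string): Post[] {
  return body
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Post);
}
```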

Query — Standard URL parameters. Nothing fancy.

GET /.flyweb/posts?tag=ai&limit=5
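The `query` field in the manifest is a URL template, so filling it in is plain string substitution. A minimal sketch; `fillQuery` is a hypothetical helper, not part of any spec:

```typescript
// Substitute {name} placeholders in a query template,
// e.g. "?tag={tag}&limit={n}" with { tag: 'ai', n: 5 }.
function fillQuery(template: string, params: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) =>
    encodeURIComponent(String(params[name] ?? '')),
  );
}
```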

Before and after

Without FlyWeb, the AI guesses. It parses your Tailwind classes, hopes it finds the right <div>, and gives you zero credit.

With FlyWeb, the AI gets this:

{
  "title": "AI Agents Need Structure",
  "author": "Sarah Chen",
  "date": "2026-02-15",
  "tags": ["ai", "web"],
  "content": "The web was built for...",
  "url": "https://example.com/posts/42"
}

No guessing. No scraping. No hallucinated metadata.

Attribution is not optional

This is the part I care about most.

"attribution": {
  "required": true,
  "license": "CC-BY-4.0",
  "must_link": true
}

You can give your content away for free. You shouldn't have to give up credit. In FlyWeb, attribution is part of the protocol. Not a suggestion. Not a best practice. Part of the spec.
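An agent honoring this block could, for instance, refuse to reuse content unless it attaches a credit line. A sketch against the config shape shown above; the `creditLine` helper and its output format are my own illustration, not spec text:

```typescript
interface Attribution {
  required: boolean;
  must_link?: boolean;
  license?: string;
}

// Build the credit string an agent should attach when attribution
// is required; returns null when the site doesn't require it.
function creditLine(entity: string, url: string, attr: Attribution): string | null {
  if (!attr.required) return null;
  const base = attr.license ? `Source: ${entity} (${attr.license})` : `Source: ${entity}`;
  return attr.must_link ? `${base}, ${url}` : base;
}
```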

Adding it takes minutes

CLI:

npx flyweb init

Framework plugins:

npm i next-flyweb      # Next.js
npm i astro-flyweb     # Astro
npm i sveltekit-flyweb # SvelteKit
npm i nuxt-flyweb      # Nuxt
npm i express-flyweb   # Express

WordPress:
There's a plugin that auto-generates the config from your posts and pages.

Validate:

npx flyweb check https://your-site.com

For AI developers

Client SDK for consuming FlyWeb data:

import { discover, fetchResource } from 'flyweb/client';

const site = await discover('https://techcrunch.com');
const articles = await fetchResource(
  'https://techcrunch.com',
  site.config.resources.articles,
  { params: { tag: 'ai' }, limit: 10 }
);
// Clean JSON. No scraping.

MCP server for Claude Code, Cursor, and similar tools:

{
  "mcpServers": {
    "flyweb": {
      "command": "npx",
      "args": ["-y", "flyweb-mcp"]
    }
  }
}

I don't know if this will work

I'm not going to pretend this is guaranteed to succeed. Protocols are hard. Adoption is harder.

But the problem is real. AI agents are scraping the web blind, and content creators are getting zero credit. Every month that passes without an open standard is another month where proprietary pipelines get more entrenched.

FlyWeb is a small bet that a simple, open convention can fix this before it's too late to fix.

The protocol is open

MIT licensed. No vendor lock-in. No payment. If you think the web should be readable by machines, try it out. If you have ideas, PRs are open.

The web was built for human eyes. It shouldn't stay that way.
