Arcede

agent.json: The Missing robots.txt for AI Agents

Your website probably has a robots.txt. It tells search engine crawlers what they can and can't access. Simple, universal, effective.

But there's nothing equivalent for AI agents.

When a browser agent visits your site, it has zero context about what it can do there. So it dumps your entire DOM into an LLM (85,000+ tokens for a typical page), asks "what should I click?", and hopes for the best. This is expensive, slow, and error-prone.

What agent.json does

agent.json is a proposed standard that lets websites declare their capabilities for AI agents. A site publishes it at /.well-known/agent.json:

{
  "capabilities": {
    "search": {
      "selector": "input#search-box",
      "method": "fill_and_submit"
    },
    "add_to_cart": {
      "api": "/api/cart/add",
      "method": "POST"
    }
  }
}

Instead of parsing the entire page, an agent reads the manifest and knows exactly how to interact. The difference in token cost: roughly 85,000 tokens for a full-DOM dump versus ~200 for the manifest.
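A minimal sketch of what the consuming side might look like, using only the manifest shape from the example above (the function names here are illustrative, not part of any spec or SDK):

```python
import json
import urllib.request

def fetch_manifest(origin: str) -> dict:
    """Fetch a site's agent.json from the well-known path."""
    with urllib.request.urlopen(f"{origin}/.well-known/agent.json") as resp:
        return json.load(resp)

def plan_action(manifest: dict, capability: str) -> dict:
    """Look up a declared capability instead of parsing the DOM."""
    caps = manifest.get("capabilities", {})
    if capability not in caps:
        raise KeyError(f"site does not declare '{capability}'")
    return caps[capability]

# With a manifest in hand, the agent knows the exact selector or API
# endpoint to act on -- no full-page scrape, no LLM call.
manifest = {
    "capabilities": {
        "search": {"selector": "input#search-box", "method": "fill_and_submit"}
    }
}
action = plan_action(manifest, "search")
print(action["selector"])  # input#search-box
```

The key property is that the lookup is deterministic: the agent either finds a declared capability or falls back to its usual DOM pipeline.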

Why this matters now

AI agents are proliferating fast. Claude, GPT, Gemini — they all increasingly need to interact with websites. Without a standard, every agent framework invents its own discovery mechanism. Sites get hammered with full-page scrapes. Everyone pays more.

agent.json gives site owners control over how agents interact with their properties, while giving agents a fast, cheap path to action.

Current state

Over 2,225 domains are indexed in the public capability graph. Sites can publish their own agent.json, or capabilities can be learned automatically through agent interaction reports (a collective intelligence approach where agents teach each other what works).

The spec is open and evolving: agentinternetruntime.com/spec/agent-json

For site owners

Publishing agent.json means:

  • You control what agents can do on your site
  • Agent interactions become predictable and auditable
  • Your users get faster, cheaper agent experiences
  • You reduce unnecessary DOM scraping load on your servers
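For most sites, publishing is just serving a static file. A small sketch of generating and sanity-checking that file before deploying it (the validation rules here are my own assumption, inferred from the example manifest; the spec linked above is authoritative):

```python
import json

def build_manifest(capabilities: dict) -> str:
    """Serialize a capability map for publishing at /.well-known/agent.json."""
    for name, cap in capabilities.items():
        # Assumed convention from the example manifest: each capability
        # declares either a DOM selector or an API route.
        if "selector" not in cap and "api" not in cap:
            raise ValueError(f"capability '{name}' needs a 'selector' or 'api'")
    return json.dumps({"capabilities": capabilities}, indent=2)

manifest_json = build_manifest({
    "search": {"selector": "input#search-box", "method": "fill_and_submit"},
    "add_to_cart": {"api": "/api/cart/add", "method": "POST"},
})
# Write manifest_json to a static file served at /.well-known/agent.json.
```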

For agent developers

Consuming agent.json means:

  • Skip DOM-to-LLM on every page visit
  • Deterministic interactions instead of probabilistic LLM guesses
  • Dramatically lower token costs (we measured 7,000x reduction on cached sites)
  • Works as an MCP server — install once, benefit everywhere

The reference implementation is open source: AIR SDK on GitHub


The web needs standards that work for AI agents the same way HTTP, robots.txt, and sitemap.xml work for traditional crawlers. agent.json is a step toward that.
