AI Manifest: How I Cut AI Agent Tokens by 82% on Multi-Step Web UIs

#ai #webdev #opensource #standards

## TL;DR

Today I shipped an open protocol that lets AI agents (Claude, MCP clients, Playwright-driven bots) execute multi-step web workflows without repeatedly analyzing the DOM.

Benchmark results (30 iterations, ERP-style two-step order entry, tiktoken cl100k_base):

| Metric | Baseline (DOM analysis) | AI Manifest | Improvement |
|---|:---:|:---:|:---:|
| Mean input tokens | 1887.6 | 341.0 | −81.9% |
| Task success rate | 20% (6/30) | 100% (30/30) | +80 pp |

Released today under the MIT license (code) + FRAND terms (patent claims):

- Repo: github.com/11pyo/AINavManifest
- IETF Internet-Draft: draft-han-ai-manifest-00
- Korean Patent Application: KR 10-2026-0071716

---

## The problem

Watch any modern AI agent operate a complex web UI — an ERP transaction, a journal submission form, a government e-service — and you see the same pattern:

1. Load the page
2. Read the entire DOM (or a screenshot) into the LLM context
3. Ask the LLM: "which element should I click?"
4. Click it
5. Repeat for every single step

Every step burns thousands of input tokens on DOM content the agent has already seen. And even then, the agent misidentifies ambiguous form fields and fails on roughly 80% of multi-step transactions in my benchmarks.

It's inefficient and unreliable — both problems share the same root cause.

## The insight

For many workflows, the website operator already knows the correct UI operation sequence. There is no ambiguity on their side. The ambiguity exists only inside the agent, which rediscovers deterministic knowledge every single session.

So instead of making the agent guess, let the site publish an executable declaration of the workflow.

## The protocol

*AI Manifest* is a JSON document the site embeds in (or alongside) the page:

```json
{
  "version": "1.0",
  "publisher": "acme.com",
  "manifestId": "po_submission_v1",
  "registry_url": "https://registry.aimanifest.io/verify",
  "task": {
    "id": "submit_purchase_order",
    "steps": [
      {"step": 1, "action": "fill", "selector": "#vendor", "value": "{vendor}"},
      {"step": 2, "action": "fill", "selector": "#item_code", "value": "{item_code}"},
      {"step": 3, "action": "fill", "selector": "#quantity", "value": "{quantity}"},
      {"step": 4, "action": "click", "selector": "#btn_next"},
      {"step": 5, "action": "fill", "selector": "#ship_to", "value": "{ship_to}"},
      {"step": 6, "action": "fill", "selector": "#delivery_date", "value": "{delivery_date}"},
      {"step": 7, "action": "click", "selector": "#btn_submit"}
    ]
  }
}
```
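To make the execution model concrete, here's a minimal agent-side sketch: substitute the `{placeholder}` values with the task parameters, then replay the steps in order. The `render_steps`/`execute` names and the `driver` interface (a thin `fill()`/`click()` wrapper, e.g. around a Playwright page) are my own illustration, not part of the spec.

```python
def render_steps(manifest: dict, params: dict) -> list[dict]:
    """Substitute {placeholder} values in the manifest's steps with
    concrete task parameters supplied by the user or the LLM."""
    rendered = []
    for step in manifest["task"]["steps"]:
        step = dict(step)  # copy so the original manifest stays untouched
        if "value" in step:
            # "{vendor}" -> params["vendor"]; str.format resolves the braces
            step["value"] = step["value"].format(**params)
        rendered.append(step)
    return rendered

def execute(steps: list[dict], driver) -> None:
    """Replay rendered steps in declared order against any driver that
    exposes fill()/click() (hypothetical interface, e.g. Playwright)."""
    for step in sorted(steps, key=lambda s: s["step"]):
        if step["action"] == "fill":
            driver.fill(step["selector"], step["value"])
        elif step["action"] == "click":
            driver.click(step["selector"])
        else:
            raise ValueError(f"action not in safe set: {step['action']}")
```

No LLM call is needed here at all — the only inference left is choosing the parameter values, which is exactly the part the agent is good at.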

### Three ways to embed it

**Method A — Well-Known URI** (recommended):


```html
<meta name="ai-manifest" content="/.well-known/ai-manifest.json">
```

with the JSON served at `/.well-known/ai-manifest.json` per IETF RFC 8615.

**Method B — Hidden DOM element:**

```html
<div id="ai-manifest" style="display:none" aria-hidden="true"
     data-manifest='{"version":"1.0", ...}'></div>
```
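For Method B, extraction needs nothing beyond an HTML parser: find the first element carrying a `data-manifest` attribute and decode its JSON payload. A sketch with the standard library (the `extract_manifest` helper is mine, not part of the spec):

```python
import json
from html.parser import HTMLParser

class ManifestExtractor(HTMLParser):
    """Find the first element with a data-manifest attribute (Method B)
    and decode its JSON payload."""
    def __init__(self):
        super().__init__()
        self.manifest = None

    def handle_starttag(self, tag, attrs):
        if self.manifest is None:
            attrs = dict(attrs)
            if "data-manifest" in attrs:
                self.manifest = json.loads(attrs["data-manifest"])

def extract_manifest(html: str):
    parser = ManifestExtractor()
    parser.feed(html)
    return parser.manifest  # None if the page carries no manifest
```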

**Method C — HTTP response header:**

```http
X-AI-Manifest: url=/.well-known/ai-manifest.json; hash=sha256:...
```
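The `hash` in that header (and in the agent-side verification flow later in this post) is a SHA-256 over a canonical form of the manifest — keys sorted, UTF-8 encoded. A sketch of how an agent might compute it and build the registry request body; note the exact separator/whitespace rules are my assumption here, and the draft would need to pin them down for hashes to interoperate:

```python
import hashlib
import json

def canonical_hash(manifest: dict) -> str:
    """SHA-256 over a canonical JSON form: keys sorted, compact
    separators (an assumption), UTF-8 encoded."""
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

def verification_payload(manifest: dict) -> dict:
    """Body the agent would POST to registry_url over HTTPS to get a
    white/black/unknown verdict."""
    return {
        "publisher": manifest["publisher"],
        "manifestId": manifest["manifestId"],
        "hash": canonical_hash(manifest),
    }
```

Because the hash covers a canonical form, two serializations of the same manifest (different key order, different whitespace) verify identically.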

Sites can combine methods (A + C is particularly clean — a single request-response confirms both the URL and the hash before the body is fetched).

### Agent-side flow

1. Fetch headers / check for the meta tag / search the DOM (in that priority order)
2. If a manifest is found, compute its SHA-256 hash over a canonical form (keys sorted, UTF-8 encoded)
3. POST `{publisher, manifestId, hash}` to the registry URL over HTTPS
4. The registry returns `{status: "white" | "black" | "unknown"}`
5. On `white`: execute the steps array in declared order, skipping any additional DOM-based LLM reasoning
6. On `black`: refuse and warn the user
7. On `unknown`: warn the user, optionally falling back to DOM inference

## Why this matters: prompt injection defense

The obvious worry with "just execute what the page tells you" is that a malicious page could tell the agent to exfiltrate credentials or click the wrong button.

The central trust registry is the mitigation:

- Publishers pre-register their manifests (hash-only — the registry doesn't need the body).
- The registry performs static analysis at registration time: it rejects or blacklists manifests whose steps contain suspicious patterns — selectors targeting an iframe for cross-origin form submission, actions outside the registered safe set (fill / click / select / upload), or value fields that would trigger requests to external URLs.
- A community reporting channel exists for blacklisting compromised manifests.

Multiple interoperable registries can coexist — each manifest declares which one is authoritative via `registry_url`.

## The benchmark in detail

The experimental setup:

- A Flask server implementing an ERP-style two-step order-entry flow (vendor, item_code, quantity, currency → delivery info → submit)
- Two AI agents:
  - Baseline: reads the full DOM into the LLM context at every step
  - Manifest-aware: fetches the manifest, verifies it with the registry, executes deterministically
- 30 iterations per agent with seeds 1000-1029
- Input tokens counted with tiktoken and the cl100k_base encoding (Anthropic/OpenAI compatible)

Raw numbers:

| | Baseline mean | Baseline std | Manifest mean | Manifest std |
|---|:---:|:---:|:---:|:---:|
| Input tokens | 1887.6 | 634.8 | 341.0 | 0.0 |
| Success rate | 20.0% | — | 100.0% | — |
| Mean LLM calls | 1.4 | — | 1.0 | — |
| Wall time (ms) | 28.6 | 16.2 | 54.2 | 8.6 |

The manifest agent is slightly slower in wall-clock time because it makes 3.4 more HTTP requests on average (the registry verification). But each of those requests carries a payload two orders of magnitude smaller than DOM-based inference, and the result is cacheable by hash, so the real cost is dominated by tokens — which dropped 5.5×.

The full benchmark harness is in the repo: clone it, `pip install -r requirements.txt`, then `python benchmark/run_benchmark.py --repeats 30`.

## Licensing model

The code and schema are MIT-licensed. The patent claims are offered under Fair, Reasonable, and Non-Discriminatory (FRAND) terms — any good-faith implementer of the published specification is covered, with defensive termination only if someone sues the applicant over technology covered by this spec.

Rationale: the value of a protocol like this lies in its ubiquity, not in a rent on each implementation. The patent's role is to keep anyone else from enclosing it.

This mirrors how Apple offered FRAND terms on H.264-era codec patents, how Microsoft did on some OOXML patents, and how the Apache/Mozilla foundations coordinate with their members: broad freedom to implement, with the defensive shield preserved.

## What's next

- IETF draft revisions: draft-01 in ~6 months based on implementer feedback.
- Reference registry deployment: registry.aimanifest.io goes live once I have feedback from at least one serious browser-agent implementer.
- SDK auto-generators: tools that read a site's UI and propose a starting manifest.
- Integration with MCP / OpenAI Agents SDK / Gemini Deep Research: the protocol composes cleanly with all of them; looking for partners.

## Feedback wanted

If you're building agent systems, running a site with known AI agent traffic, or thinking about agent-web interop standards: please open a GitHub issue, comment below, or email me (pk102h@naver.com).

Specific things I'd love input on:

- Registry governance: single operator, federated, or fully site-declared?
- Manifest versioning strategy when a site's UI changes
- Interaction with existing agents.txt / llms.txt / ai-plugin.json
- Benchmark scenarios beyond ERP (ScholarOne submission, SAP MM01, e-gov) — send yours

## Links

- Repo: github.com/11pyo/AINavManifest
- JSON Schema: ai-manifest.schema.json
- IETF Internet-Draft: draft-han-ai-manifest-00
- FRAND declaration: docs/FRAND.md
- Benchmark code: validation/

If this is useful to you, a GitHub star helps with visibility for the IETF process.

— Won-pyo
