AI agents crawl your site now. Here's what they look for (and why fixing it is fragmented)

#ai #seo #webdev #tutorial

AI agents are crawling the web at a rate the SEO playbook never planned for. GPTBot, PerplexityBot, ClaudeBot, and a growing zoo of MCP-driven agents hit sites daily, and most sites give them a 200 OK page full of divs and JavaScript, with no machine-readable intent. Search bots did fine with that. Agents do not.

I have been building a tool in this space for a few months and reading every scanner and spec I can find. Here is the short, honest version of what agent-readiness actually means in 2026, why it is worth caring about, and where the real work is.

The signals that agents look for

There is no single spec. There is a stack of overlapping conventions, and different scanners weight them differently. The ones that matter today:

llms.txt (llmstxt.org). A markdown index at your site root that tells an LLM what your site is about and which URLs are canonical. Roughly the robots.txt of the agent era, except it exists for the LLM's benefit, not yours.
agents.txt. A newer convention that describes which agents you welcome and under what terms. Less standardized than llms.txt, more of a signal than a rule.
MCP (Model Context Protocol). A protocol for exposing tools to agents. If your site sells anything, an MCP endpoint lets an agent buy without scraping a checkout form.
Structured data. Old-school schema.org JSON-LD, but with fresh weight. Agents parse it before they parse your prose. Missing Product, Offer, FAQPage, or Organization is a quiet way to be invisible.
Payment protocols. x402, L402, MPP. Emerging standards for machine-payable resources. Not table stakes yet, but scanners already grade for them.
Server signals. Reachable canonical URL, no JS-only content, sensible robots.txt, working sitemap, valid TLS, correct Content-Type. Boring but decisive.
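To make the llms.txt item concrete, here is a minimal Python sketch that renders one from a list of pages. It assumes the layout described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections of markdown links). The `Page` class, the function name, and the example.com URLs are hypothetical, not part of any spec or tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One canonical URL to list in llms.txt (hypothetical helper type)."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # The note after the colon is optional free text about the link.
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Shop",
        "Sells widgets online.",
        {"Pages": [Page("Pricing", "https://example.com/pricing", "current plans")]},
    ))
```

Generating the file from the same page list that feeds your sitemap is one way to keep the two from drifting apart.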

You can check any of this on your own site right now. Cloudflare has a free URL scanner that includes an Agent Readiness Score. AgentGrade at agentgrade.com is fully free and checks protocol coverage in more depth. Agent Checker runs a full audit (paid tier around £19) that actually drives a browser as an agent would.

What the scanners will not tell you

The scanners are useful. They give you a score, a list of failed checks, and a screenshot of the problem. What they do not do is fix anything.

That is the gap I kept walking into. A WordPress owner runs a scan, gets a 42/100, sees "missing llms.txt, missing JSON-LD Product schema, robots.txt blocks GPTBot," and closes the tab. Because the fix is not one file. It is:

editing wp-content/themes/<theme>/functions.php without breaking updates
installing a schema plugin, configuring it, dealing with conflicts
writing an llms.txt that matches your actual sitemap
adjusting robots.txt served by a plugin, not the filesystem
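The "matches your actual sitemap" part is easy to get wrong by hand, so here is a rough Python sketch of a consistency check. It assumes a standard sitemaps.org `urlset` file and an llms.txt that uses markdown links. The function names are mine, and it compares URLs as exact strings, so trailing-slash differences will show up as mismatches.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Matches markdown links like [Title](https://example.com/page).
MD_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def llms_txt_urls(llms_txt: str) -> set[str]:
    """Collect every absolute URL that llms.txt links to."""
    return set(MD_LINK.findall(llms_txt))


def stale_links(llms_txt: str, sitemap_xml: str) -> set[str]:
    """URLs that llms.txt points at but the sitemap does not list."""
    return llms_txt_urls(llms_txt) - sitemap_urls(sitemap_xml)
```

Anything `stale_links` returns is a page you are advertising to agents that your own sitemap no longer vouches for.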

Multiply that by Webflow (custom code injection, different constraints), Shopify (Liquid, theme layer), and Tilda (built-in SEO tab, closed platform), and the fix surface is very different per CMS. That is why the scanners stop at "here is your score." Actually fixing it is either a $200/hour consulting problem or a weekend of docs-diving.

A minimal checklist per platform

If you are doing this by hand, here is where I would start, per CMS. Nothing exotic, just the highest-leverage moves.

WordPress.

Serve a static llms.txt at the root (Yoast can do it, or drop it in the theme root and route via functions.php).
Install a schema plugin (Rank Math, WP SEO Structured Data) and enable at least Organization, WebSite, and Product (if you sell).
Audit robots.txt: some SEO plugins block GPTBot without making it obvious. If you want AI traffic, allow it.
If you sell, add Offer schema per product. Agents cannot buy from an unlabelled div.
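For the robots.txt audit, a small Python sketch using the standard library's `urllib.robotparser` can tell you which agents a given robots.txt body turns away. The user-agent list is just the three crawlers named earlier, and `blocked_agents` is a hypothetical helper name. It works on text you pass in, so you would fetch the live, plugin-served robots.txt yourself first.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens mentioned in this post; extend as needed.
AGENT_USER_AGENTS = ("GPTBot", "PerplexityBot", "ClaudeBot")


def blocked_agents(robots_txt: str, url: str, agents=AGENT_USER_AGENTS) -> list[str]:
    """Return the agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_agents(sample, "https://example.com/products/1"))
```

Check the robots.txt your site actually serves, not the file on disk, since on WordPress the two can differ.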

Webflow.

Custom Code -> Head Code: paste JSON-LD for Organization and WebSite.
Publish a static llms.txt (upload it via Assets, or use a Netlify redirect if you host outside Webflow).
In Project Settings -> SEO, verify robots.txt allows agent user agents.
For CMS collections (e.g. products), inject per-item schema via the collection page's embed component.
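If you would rather generate the head snippet than hand-write it, here is a minimal Python sketch that builds Organization and WebSite JSON-LD and wraps each in a script tag you can paste into a head-code field. The helper names and example values are hypothetical, and only a handful of schema.org properties are shown.

```python
import json


def organization(name: str, url: str, logo: str) -> dict:
    """Minimal schema.org Organization object."""
    return {"@context": "https://schema.org", "@type": "Organization",
            "name": name, "url": url, "logo": logo}


def website(name: str, url: str) -> dict:
    """Minimal schema.org WebSite object."""
    return {"@context": "https://schema.org", "@type": "WebSite",
            "name": name, "url": url}


def json_ld_script(data: dict) -> str:
    """Wrap a JSON-LD object in a script tag suitable for a page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" inside a string cannot close the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(json_ld_script(organization("Example Co", "https://example.com",
                                      "https://example.com/logo.png")))
    print(json_ld_script(website("Example Co", "https://example.com")))
```

Run the output through a structured-data validator before publishing, since platforms differ in how they treat pasted head code.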

Shopify.

Use a theme that supports application/ld+json blocks in theme.liquid.
Add Product schema per product page. Most modern themes do this, but many custom ones do not.
Serve llms.txt via a page template or app (the storefront root does not accept arbitrary files by default).
Consider Organization schema (or its OnlineStore subtype) for merchant identity.
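As a reference for what per-product markup should contain, here is a small Python sketch of a Product object with a nested Offer. The function name and sample product are made up, and it covers only the fields an agent most obviously needs (name, price, currency, availability). In a real Shopify theme you would emit the same shape from Liquid.

```python
import json
from decimal import Decimal


def product_json_ld(name: str, url: str, price: Decimal, currency: str, in_stock: bool) -> dict:
    """Build a schema.org Product with a single nested Offer."""
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            # Always two decimal places, so 19 becomes "19.00".
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
            "url": url,
        },
    }


if __name__ == "__main__":
    data = product_json_ld("Blue Widget", "https://example.com/products/blue-widget",
                           Decimal("19"), "USD", in_stock=True)
    print(json.dumps(data, indent=2))
```

Using `Decimal` rather than a float keeps prices from picking up rounding noise on the way into the markup.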

Tilda.

Use the built-in SEO tab per page, but add JSON-LD via the T123 (HTML) block.
Tilda does not let you drop a file at root easily. You can host llms.txt on a subdomain and reference it, or set up a URL redirect at your DNS provider if it offers one.

None of this is glamorous. All of it moves the score.

What I would want as a site owner

Honestly, a scan by itself is not that useful. If I already know I have a problem, I want a package I can drop in. This is the direction I took with AgentFix: scan for free, and if you want the fix, buy a per-platform pack ($1 to $99) with the exact files, the exact snippets, and the exact settings changes for your CMS. No consulting, no plugin bloat, no weekend of research.

That is not a pitch, it is the answer to "why does this exist." Nothing about that model is proprietary. If you want to build the same thing for a niche I do not cover, the ingredients are all public: the scanner logic is 33 signals with known rules, and the packs are just Git-managed CMS-specific bundles. The moat is remediation, not diagnosis.

Takeaways

Agent-readiness is a real, measurable thing in 2026. Free tools will grade you today.
The signals are stackable but fragmented. For most sites the priority order is llms.txt, then JSON-LD, then MCP, then payment protocols.
The scanners tell you the "what." The "how to fix" is CMS-specific and, right now, mostly manual.
If you own a site and want AI-agent traffic, put 30 minutes into a scan and a checklist before you spend anything on it.

If you found this useful and want a shortcut for the fix step, that is what agentfix.pro is for. Otherwise the checklists above are a solid start.

Canonical: agentfix.pro