Every scanner in the AI-readiness space checks for llms.txt. Half the sites that fail the check fail it by having the wrong file, not a missing one. A 200 OK response with an llms.txt full of hand-waving does not help an agent, and it does not fool the scoring either. This is what I have learned actually shipping and testing these files across WordPress, Webflow, Shopify, and static sites.
What llms.txt actually is
The spec lives at llmstxt.org. Short version: a markdown file at your site root that gives an LLM a map of your site written for the LLM, not for humans. Not a legal agreement (that is closer to robots.txt plus agents.txt). Not a training-data opt-out. It is a discovery and disambiguation aid, the way a hand-drawn map is easier for a stranger than a satellite photo.
Two files are worth knowing:
- llms.txt at the root: index. Short, hand-curated links to canonical pages.
- llms-full.txt at the root: expanded. Concatenated content of the pages you want the LLM to actually ingest.
Most sites need only the first. Docs-heavy sites benefit from both.
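For the second file, a minimal sketch of the concatenation step might look like the following. The layout (one H2 per page with a Source line) is my own assumption for illustration, not something the spec prescribes, and build_llms_full and its parameters are hypothetical names.

```python
def build_llms_full(site_name: str, summary: str,
                    pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, markdown_body) tuples into one document.

    The H2-per-page layout with a 'Source:' line is an assumed convention.
    """
    parts = [f"# {site_name}", f"> {summary}"]
    for title, url, body in pages:
        # Keep the absolute URL next to the content so the origin survives
        # when the file is processed somewhere else.
        parts.append(f"## {title}\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    text = build_llms_full(
        "Example",
        "Docs for Example.",
        [("Getting started", "https://example.com/docs/start", "Install it.\n")],
    )
    print(text)
    print(f"{len(text.encode('utf-8'))} bytes")
```

Printing the byte count at the end is a cheap way to notice when the file starts growing past what you want to serve.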
The anatomy of a file that scores
The spec allows more than what agents actually seem to use. In practice, the sections that matter are the ones a scanner or a well-behaved agent can parse deterministically.
# Site Name
> One sentence describing what this site is and who it is for.
Optional short paragraph, up to a few sentences. This is the freeform block.
Keep it factual. Agents will summarize it verbatim.
## Docs
- [Getting started](https://example.com/docs/start): the fastest path from zero to a running install.
- [API reference](https://example.com/docs/api): every endpoint, request and response schemas.
## Products
- [Basic pack](https://example.com/pricing/basic): one-time $29, covers X, Y, Z.
## Optional
- [Changelog](https://example.com/changelog)
- [Blog](https://example.com/blog)
A few things this example gets right that most sites get wrong.
The H1 is the site name, not the file title. A lot of generators produce # llms.txt for Example.com. That is noise. The H1 is what the LLM will most likely use to name your site when it summarizes.
The blockquote is the one-liner. Not marketing copy. A single sentence that a person could actually read out loud. If an agent has to pick one line to describe your site, this is the line it picks.
Links are absolute. Relative links break for anyone consuming the file outside its origin, which is the entire point (the file exists to be pulled and processed elsewhere).
Each link has a colon-separated description. Agents need to know why to follow a link. - [Getting started](https://example.com/docs/start) alone is uninformative. Descriptions after the colon give the agent enough context to rank links before fetching them.
Sections are semantic, not aesthetic. ## Docs, ## Products, ## API, ## Optional. Avoid decorative headings like ## Everything you need to know. The scanner splits on ##, and the H2 is metadata.
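The rules above are mechanical enough to lint. What follows is a rough sketch of such a check, not how any particular scanner works: lint_llms_txt is a hypothetical helper, the regex only understands the "- [title](url): description" shape shown in the example, and exempting the Optional section from the description rule is my assumption.

```python
import re
from urllib.parse import urlparse

# Matches "- [Title](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?\s*$"
)


def lint_llms_txt(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line is not an H1")
    elif "llms.txt" in lines[0].lower():
        problems.append("H1 names the file, not the site")

    if not any(ln.startswith("> ") for ln in lines):
        problems.append("no blockquote one-liner")

    section = None
    for ln in lines:
        if ln.startswith("## "):
            section = ln[3:].strip()
            continue
        m = LINK_RE.match(ln)
        if not m:
            continue
        parsed = urlparse(m["url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"relative link: {m['url']}")
        # Assumption: links under "Optional" may go without a description.
        if not m["desc"] and section != "Optional":
            problems.append(f"link without description: {m['title']}")
    return problems
```

Run it on your own file before any scanner does; each string it returns maps to one of the points above.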
What breaks it in practice
Six mistakes I keep seeing when I scan sites for a living:
JavaScript-rendered llms.txt. A Next.js or Nuxt route that serves llms.txt from an API handler with the wrong Content-Type. Agents fetch with a minimal HTTP client, no JS. Serve the file statically as text/plain; charset=utf-8. If your framework prevents that, put it in /public or the CDN.
CDN caching stale versions. You updated the file two hours ago, the scanner still sees yesterday's version. Purge after every edit. Cloudflare users can hit POST /zones/:id/purge_cache with { "purge_everything": true } or purge by URL.
robots.txt disagreement. llms.txt says "here is my content, come look," robots.txt blocks GPTBot or ClaudeBot. Agents that respect robots will not read your llms.txt. Decide once: either you want AI traffic and you allow the crawlers, or you do not. Do not send both signals at once.
llms-full.txt bigger than 1 MB and served gzipped only. Some agents pull the file with clients that do not send an Accept-Encoding header. Serve identity-encoded too, keep total size under a few MB, and split into per-topic files if you must go bigger.
No canonical URL match. Links in llms.txt should match the URLs the LLM will actually fetch. If your llms.txt links to example.com/docs but the docs live on docs.example.com, agents get a redirect and lose context. Match canonicals to what the rest of your site declares.
Marketing copy where facts should be. "Best-in-class AI-powered platform for modern teams" is worse than useless in an llms.txt. An LLM ingesting this will either quote it (bad) or ignore it (worse, because now you have no summary). Write plainly. What does the site do, who is it for, what does it cost.
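The robots.txt disagreement is the easiest of the six to catch automatically. Here is a small sketch using Python's standard urllib.robotparser; blocked_agents is a hypothetical helper, and the agent list only covers the two crawlers named above, so extend it for whatever bots you care about.

```python
from urllib.robotparser import RobotFileParser

# Only the two crawlers mentioned above; this list is an assumption to extend.
AI_AGENTS = ("GPTBot", "ClaudeBot")


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_agents(robots, "https://example.com/llms.txt"))
```

If the returned list is non-empty and you also publish an llms.txt, you are sending both signals at once.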
How to test that agents actually read it
You can spend an afternoon guessing, or you can check.
Static check. curl -sSI https://yoursite.com/llms.txt and verify: 200, Content-Type: text/plain, Content-Length non-trivial, Last-Modified recent. Then curl -s https://yoursite.com/llms.txt | head -60 and eyeball it.
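If you would rather script the static check than eyeball curl output, something like this sketch works. The thresholds (200 bytes as "non-trivial", 90 days as "recent") are my own assumptions, check_llms_headers and head are hypothetical names, and a missing Content-Length or Last-Modified is skipped rather than flagged because chunked or CDN-served responses often omit them.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def check_llms_headers(status, headers, now=None, min_bytes=200, max_age_days=90):
    """Return a list of problems found in a HEAD response for llms.txt."""
    h = {k.lower(): v for k, v in headers.items()}
    now = now or datetime.now(timezone.utc)
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    ctype = h.get("content-type", "")
    if not ctype.lower().startswith("text/plain"):
        problems.append(f"Content-Type is {ctype or 'missing'}, expected text/plain")
    length = h.get("content-length")
    if length is not None and int(length) < min_bytes:
        problems.append(f"Content-Length is only {length} bytes")
    modified = h.get("last-modified")
    if modified:
        dt = parsedate_to_datetime(modified)
        if dt.tzinfo is None:  # treat a naive timestamp as UTC
            dt = dt.replace(tzinfo=timezone.utc)
        if now - dt > timedelta(days=max_age_days):
            problems.append(f"Last-Modified is older than {max_age_days} days")
    return problems


def head(url):
    """Issue a HEAD request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return resp.status, dict(resp.headers.items())
```

Usage would be along the lines of check_llms_headers(*head("https://yoursite.com/llms.txt")), with an empty list meaning the headers look sane.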
Agent check. Ask a real agent. Open Claude, ChatGPT with browsing, or Perplexity. Prompt: "Fetch https://yoursite.com/llms.txt and summarize what this site is, in one sentence, based only on that file." If the answer is close to what you wrote in the blockquote, you are good. If the answer is generic or wrong, your file is not doing its job.
Scanner check. Cloudflare's URL Scanner at radar.cloudflare.com/scan surfaces an Agent Readiness Score that partly hinges on llms.txt. AgentGrade at agentgrade.com is fully free and gives a letter grade with sub-scores. My own tool at agentfix.pro runs 33 signals and shows per-signal pass/fail, which is what I built when I got tired of scores that did not tell me what to fix.
Where the industry is heading
Two things are worth watching in the next six months.
agents.txt is stabilizing. It complements llms.txt by expressing permission and terms, closer to robots.txt in spirit but with structured fields for pricing, rate limits, and preferred contact for commercial use. Expect scanners to add it as a signal soon.
MCP endpoints alongside llms.txt. Some sites already advertise an mcp link inside llms.txt pointing at a Model Context Protocol server. That is the practical bridge from "an agent can read your site" to "an agent can do things on your site." If you sell anything, this is where the real leverage lives.
Closing thought
An llms.txt file takes twenty minutes to write and five minutes to serve. Getting it right is worth those twenty-five minutes for anyone whose site an agent might visit, which by 2026 is roughly every site. The interesting question is not whether to write one, but whether the one you have already written is doing what you think it is.
If you want a second opinion on your own file, the tools above will tell you. If you want the fix as a file you can drop in, that is what I have been building for WordPress, Webflow, Shopify, and Tilda at agentfix.pro. Either way, do not trust a scanner score without reading your own llms.txt as if you were the agent.
Cross-posted from agentfix.pro.