<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Roger Remacle - Lab451.org</title>
    <description>The latest articles on DEV Community by Roger Remacle - Lab451.org (@lab451).</description>
    <link>https://dev.to/lab451</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951426%2Fddbaa89c-c81a-4d31-bc95-b8cba8952a59.png</url>
      <title>DEV Community: Roger Remacle - Lab451.org</title>
      <link>https://dev.to/lab451</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lab451"/>
    <language>en</language>
    <item>
      <title>llms.txt vs llms-full.txt: What's the Difference? (2026)</title>
      <dc:creator>Roger Remacle - Lab451.org</dc:creator>
      <pubDate>Wed, 17 Jun 2026 15:10:00 +0000</pubDate>
      <link>https://dev.to/lab451/llmstxt-vs-llms-fulltxt-whats-the-difference-2026-3lhl</link>
      <guid>https://dev.to/lab451/llmstxt-vs-llms-fulltxt-whats-the-difference-2026-3lhl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a cross-post from &lt;a href="https://lab451.org/blog/llms-txt-vs-llms-full-txt" rel="noopener noreferrer"&gt;lab451.org/blog&lt;/a&gt;. I've been working on tooling for the llms.txt standard for a few months and the most common question I get is "which file do I actually need." Here's the answer at length.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They sound nearly identical. They live next to each other at the root of your site. They serve the same broad audience. But they do fundamentally different jobs — and knowing which one to ship (or whether to ship both) is the difference between AI models &lt;em&gt;understanding&lt;/em&gt; your site and AI models actually &lt;em&gt;answering questions&lt;/em&gt; about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  On this page
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The TL;DR&lt;/li&gt;
&lt;li&gt;Two files, two jobs&lt;/li&gt;
&lt;li&gt;Side by side&lt;/li&gt;
&lt;li&gt;Format differences in detail&lt;/li&gt;
&lt;li&gt;Size, budget, and context windows&lt;/li&gt;
&lt;li&gt;Which AI models read which file&lt;/li&gt;
&lt;li&gt;When to ship just llms.txt&lt;/li&gt;
&lt;li&gt;When to ship both&lt;/li&gt;
&lt;li&gt;How they work together at inference time&lt;/li&gt;
&lt;li&gt;Generating llms-full.txt without going insane&lt;/li&gt;
&lt;li&gt;Common mistakes&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The TL;DR
&lt;/h2&gt;

&lt;p&gt;If you only read one paragraph: &lt;code&gt;llms.txt&lt;/code&gt; is a small Markdown file that tells AI models &lt;em&gt;what your site is and where the important pages live&lt;/em&gt;. &lt;code&gt;llms-full.txt&lt;/code&gt; is a much larger Markdown file that contains &lt;em&gt;the actual text of those pages&lt;/em&gt; already extracted, cleaned, and concatenated. The first is a table of contents; the second is the book.&lt;/p&gt;

&lt;p&gt;Marketing sites and small blogs usually only need &lt;code&gt;llms.txt&lt;/code&gt;. Documentation sites, API references, knowledge bases, and anything technical where you actually want models to be able to &lt;em&gt;answer questions&lt;/em&gt; about your content should ship both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two files, two jobs
&lt;/h2&gt;

&lt;p&gt;The reason there are two files instead of one comes down to a tradeoff every LLM has to make: &lt;strong&gt;context window space is expensive, but fetching pages is slow&lt;/strong&gt;. Different situations call for different sides of that tradeoff.&lt;/p&gt;

&lt;p&gt;When a user asks ChatGPT "what does Lab451 do," the model needs the smallest possible amount of context to answer accurately. It doesn't need your full pricing page, your terms of service, or every blog post — it needs a sentence or two. &lt;code&gt;llms.txt&lt;/code&gt; is exactly that: a tiny file the model can fetch in milliseconds, parse in a few tokens, and use to give a quick, accurate one-paragraph answer.&lt;/p&gt;

&lt;p&gt;When a user asks ChatGPT "what's the best way to set up &lt;code&gt;llms-full.txt&lt;/code&gt; for a 500-page documentation site," the model needs much more. It needs to understand your full documentation, find the relevant sections, and synthesize specifics. Following links one by one from &lt;code&gt;llms.txt&lt;/code&gt; would mean ten or twenty separate fetches and a lot of wasted context. &lt;code&gt;llms-full.txt&lt;/code&gt; sidesteps that entire dance: download once, answer in detail.&lt;/p&gt;

&lt;p&gt;The two files don't compete — they complement each other. The model can grab whichever is the right tool for the question being asked. Some models check both in sequence; some pick one based on the query depth. Either way, having both available means you've covered both ends of the tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side by side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;llms.txt&lt;/th&gt;
&lt;th&gt;llms-full.txt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Map of your site&lt;/td&gt;
&lt;td&gt;Full text of your site&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured Markdown (H1 + blockquote + H2 link lists)&lt;/td&gt;
&lt;td&gt;Free-form Markdown (concatenated page bodies)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typical size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1–10 KB&lt;/td&gt;
&lt;td&gt;100 KB – several MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token cost when read&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~250 – 2,500 tokens&lt;/td&gt;
&lt;td&gt;~25,000 – 1,000,000+ tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update frequency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When site structure changes&lt;/td&gt;
&lt;td&gt;Every meaningful content change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any site (marketing, blog, SaaS landing)&lt;/td&gt;
&lt;td&gt;Docs, API references, knowledge bases, technical content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read once, follow links as needed&lt;/td&gt;
&lt;td&gt;Read once, answer from the loaded text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hosted at&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/llms.txt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/llms-full.txt&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Required?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recommended for everyone&lt;/td&gt;
&lt;td&gt;Recommended for content-heavy sites&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Format differences in detail
&lt;/h2&gt;

&lt;p&gt;Both files are Markdown, both are plain text, both live at the root of your domain. The differences are structural.&lt;/p&gt;

&lt;h3&gt;
  
  
  llms.txt: prescriptive structure
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;llms.txt&lt;/code&gt; follows a tight, parseable shape. There's exactly one H1, an optional blockquote summary, optional free-form Markdown, and then H2 sections each containing nothing but link lists. A spec-compliant parser can extract these elements deterministically. Here's the canonical shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Lab451&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; Lab451 generates llms.txt, llms-full.txt, sitemap.xml, and robots.txt&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; for any public website in about 30 seconds.&lt;/span&gt;

&lt;span class="gu"&gt;## Docs&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Getting started&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://lab451.org/docs/quickstart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Generate your first set of files
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;API reference&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://lab451.org/docs/api&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: Endpoints, authentication, rate limits

&lt;span class="gu"&gt;## Optional&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Terms of service&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://lab451.org/terms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Privacy policy&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://lab451.org/privacy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's &lt;em&gt;not&lt;/em&gt; there: no page content. No paragraphs from the Getting Started doc. No code samples from the API reference. Just titles, URLs, and one-line descriptions. The file is a finger pointing at the pages, not the pages themselves.&lt;/p&gt;
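&lt;p&gt;That determinism is easy to demonstrate. Below is a minimal Python sketch, not a reference implementation of the spec: it assumes the simple shape shown above (one H1, a blockquote summary, and H2 sections of &lt;code&gt;- [Title](url): description&lt;/code&gt; bullets) and silently skips any other free-form Markdown. The function name &lt;code&gt;parse_llms_txt&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import re

# Matches "- [Title](https://example.com/page): optional description"
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, and H2 link sections.

    A minimal sketch: assumes the simple shape used in this post and skips
    any free-form Markdown that isn't a heading, blockquote, or link bullet.
    """
    result = {"title": None, "summary": "", "sections": {}}
    summary_lines = []
    current = None  # name of the H2 section we are inside, if any

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line.lstrip("> ").strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                title, url, desc = match.groups()
                result["sections"][current].append(
                    {"title": title, "url": url, "description": desc or ""}
                )

    result["summary"] = " ".join(summary_lines)
    return result
```

&lt;p&gt;Fed the example above, &lt;code&gt;parse_llms_txt(text)["sections"]["Docs"][0]["url"]&lt;/code&gt; comes back as &lt;code&gt;https://lab451.org/docs/quickstart&lt;/code&gt;, and the two Optional links come back with empty descriptions.&lt;/p&gt;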

&lt;h3&gt;
  
  
  llms-full.txt: prescriptive content, flexible structure
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;llms-full.txt&lt;/code&gt; takes a different approach. The format is much looser — there's no required shape — but the &lt;em&gt;content&lt;/em&gt; requirements are stricter. It should contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full text of every important page on your site&lt;/li&gt;
&lt;li&gt;Cleaned of navigation, headers, footers, cookie banners, and chrome&lt;/li&gt;
&lt;li&gt;Converted to clean Markdown (or at minimum, structured plain text)&lt;/li&gt;
&lt;li&gt;Concatenated into a single file with clear section breaks between pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a shortened example of what one looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Lab451 — Full Documentation&lt;/span&gt;
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## Getting Started&lt;/span&gt;

Lab451 generates the four files AI models need to understand your site.
You give it a URL, choose a file type, and click Generate. The crawler
maps your site, extracts content, and produces spec-compliant output.

To get started, visit lab451.org and paste your domain. The free plan
handles sites up to 50 pages without an account...
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## API Reference&lt;/span&gt;

&lt;span class="gu"&gt;### Authentication&lt;/span&gt;

All API requests require a Bearer token in the Authorization header.
Get your token from the Account page under "API Keys"...

&lt;span class="gu"&gt;### Endpoints&lt;/span&gt;

&lt;span class="gu"&gt;#### POST /api/generate&lt;/span&gt;

Generates a single file type for a given domain. Required parameters:
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`domain`&lt;/span&gt; — the target URL (must include https://)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`fileType`&lt;/span&gt; — one of: llms, llms-full, sitemap, robots
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`maxPages`&lt;/span&gt; — page cap (defaults to plan limit)
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;---&lt;/code&gt; horizontal rule between sections is convention, not requirement. What matters is that each page is identifiable as its own chunk, the headings reflect the original page hierarchy, and the model can navigate to any subsection by scanning H2s and H3s.&lt;/p&gt;
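&lt;p&gt;Because each page keeps its own headings, a consumer can jump straight to one subsection without reading everything else. The sketch below splits an &lt;code&gt;llms-full.txt&lt;/code&gt; into chunks at every H2 and H3, drops the &lt;code&gt;---&lt;/code&gt; separators, and returns the first chunk whose heading contains a query string. It illustrates heading-based navigation under those assumptions; it is not a description of how any particular model reads the file, and both function names are hypothetical.&lt;/p&gt;

```python
import re

# An H2 or H3 heading line; H1 and H4+ stay inside the current chunk.
HEADING_RE = re.compile(r"^(#{2,3})\s+(.*)$")


def split_by_headings(text: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs, starting a new chunk at each H2 or H3."""
    chunks = []
    heading, body = "", []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the previous chunk, skipping an empty preamble.
            if heading or any(b.strip() for b in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        elif line.strip() != "---":  # drop the page-separator rules
            body.append(line)
    chunks.append((heading, "\n".join(body).strip()))
    return chunks


def find_section(text: str, query: str):
    """Return the first (heading, body) whose heading contains query, else None."""
    for heading, body in split_by_headings(text):
        if query.lower() in heading.lower():
            return heading, body
    return None
```

&lt;p&gt;On the example above, &lt;code&gt;find_section(text, "auth")&lt;/code&gt; returns the Authentication heading and its two-line body, while the H4 endpoint heading stays inside the Endpoints chunk.&lt;/p&gt;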

&lt;h2&gt;
  
  
  Size, budget, and context windows
&lt;/h2&gt;

&lt;p&gt;The size difference between the two files is enormous, and it has real consequences for how models consume them.&lt;/p&gt;

&lt;p&gt;A typical &lt;code&gt;llms.txt&lt;/code&gt; file for a small-to-medium site is between 1 KB and 10 KB. That's roughly 250 to 2,500 tokens — a tiny fraction of any modern model's context window. Reading &lt;code&gt;llms.txt&lt;/code&gt; is essentially free, which is why models will cheerfully fetch it on almost any query that touches your domain.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;llms-full.txt&lt;/code&gt; is a different beast. A documentation site with 200 pages of meaningful content might produce a 500 KB file, around 125,000 tokens. That fits in today's long-context models, but with very different headroom: it nearly fills GPT-4o's 128K-token window, while Claude 4 (200K) and Gemini 2.5 Pro (1M) have room to spare. Either way, it's a real chunk of context the model has to weigh against everything else in the conversation.&lt;/p&gt;

&lt;p&gt;The practical limits as of mid-2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File size&lt;/th&gt;
&lt;th&gt;Token equivalent&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Under 50 KB&lt;/td&gt;
&lt;td&gt;Under ~12,500 tokens&lt;/td&gt;
&lt;td&gt;Read fully by every major model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 KB – 500 KB&lt;/td&gt;
&lt;td&gt;~12,500 – 125,000 tokens&lt;/td&gt;
&lt;td&gt;Read fully by long-context models; chunked or summarized by smaller ones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500 KB – 2 MB&lt;/td&gt;
&lt;td&gt;~125,000 – 500,000 tokens&lt;/td&gt;
&lt;td&gt;Read partially; models may retrieve only relevant sections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Over 2 MB&lt;/td&gt;
&lt;td&gt;500,000+ tokens&lt;/td&gt;
&lt;td&gt;Usually retrieval-only; rarely loaded whole&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
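&lt;p&gt;The token figures in this section follow a common rule of thumb of roughly 4 bytes of English text per token, with 1 KB counted as 1,000 bytes. Real tokenizers vary by model and by content (code and non-English text often tokenize quite differently), so the Python sketch below is a back-of-the-envelope check on your own file, not a measurement. The tier boundaries come from the table above; the function names are hypothetical.&lt;/p&gt;

```python
BYTES_PER_TOKEN = 4  # rough heuristic for English prose; varies by tokenizer


def estimate_tokens(size_bytes: int) -> int:
    """Approximate token count from file size using the 4-bytes-per-token heuristic."""
    return size_bytes // BYTES_PER_TOKEN


def size_tier(size_bytes: int) -> str:
    """Map a file size onto the tiers in the table above (1 KB = 1,000 bytes)."""
    kb = size_bytes / 1000
    if kb > 2000:
        return "usually retrieval-only; rarely loaded whole"
    if kb > 500:
        return "read partially; models may retrieve only relevant sections"
    if kb >= 50:
        return "read fully by long-context models; chunked or summarized by smaller ones"
    return "read fully by every major model"
```

&lt;p&gt;For the 500 KB documentation example above, &lt;code&gt;estimate_tokens(500_000)&lt;/code&gt; gives 125,000 and &lt;code&gt;size_tier(500_000)&lt;/code&gt; lands in the long-context tier. To check a real file, pass in &lt;code&gt;os.path.getsize("llms-full.txt")&lt;/code&gt;.&lt;/p&gt;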

&lt;p&gt;The honest takeaway: if your &lt;code&gt;llms-full.txt&lt;/code&gt; is over a megabyte, you're approaching the practical ceiling. Beyond that, models increasingly fall back to retrieval-style consumption (grep for relevant chunks) rather than holistic reading. That's not necessarily bad — it still works — but it changes the equation. For the largest sites, the answer isn't "bigger llms-full.txt"; it's "smarter chunking and well-named sections."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Rule of thumb:&lt;/strong&gt; aim for an &lt;code&gt;llms-full.txt&lt;/code&gt; under 500 KB if possible. If you're over that, scrutinize what's actually in there. Old blog posts, deprecated docs, terms of service, and changelogs rarely earn their place. The point is to give models the content that actually answers questions, not every word you've ever published.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Which AI models read which file
&lt;/h2&gt;

&lt;p&gt;As of mid-2026, the picture is uneven but converging. Some crawlers explicitly fetch both files; some only fetch &lt;code&gt;llms.txt&lt;/code&gt;; some only honor &lt;code&gt;llms-full.txt&lt;/code&gt; when explicitly linked. The practical state of play:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Crawler&lt;/th&gt;
&lt;th&gt;llms.txt&lt;/th&gt;
&lt;th&gt;llms-full.txt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT (GPTBot, OAI-SearchBot)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (when discoverable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude (ClaudeBot)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perplexity (PerplexityBot)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google (Googlebot, Google-Extended)&lt;/td&gt;
&lt;td&gt;Indexed&lt;/td&gt;
&lt;td&gt;Indexed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bing / Copilot (Bingbot)&lt;/td&gt;
&lt;td&gt;Indexed&lt;/td&gt;
&lt;td&gt;Indexed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok (xAI-Bot)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral, Meta&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"Yes" means the crawler reliably fetches the file and there's reasonable evidence it's used. "Indexed" means the file gets indexed alongside other site content but its specific use is unclear. "Partial" means fetching happens but isn't consistent across all queries.&lt;/p&gt;

&lt;p&gt;The pragmatic conclusion: a crawler that fetches &lt;code&gt;llms.txt&lt;/code&gt; can follow a link from it to &lt;code&gt;llms-full.txt&lt;/code&gt; if one is listed, typically in the Optional section. So even if a model doesn't request &lt;code&gt;llms-full.txt&lt;/code&gt; as a well-known URL, linking it from your &lt;code&gt;llms.txt&lt;/code&gt; makes it discoverable.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to ship just llms.txt
&lt;/h2&gt;

&lt;p&gt;There are good reasons to skip &lt;code&gt;llms-full.txt&lt;/code&gt; entirely. Ship only &lt;code&gt;llms.txt&lt;/code&gt; if any of these apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You're a marketing site.&lt;/strong&gt; Your homepage, About, Pricing, and Contact pages don't need to be read in detail by an AI model. Users asking AI about you want a one-paragraph summary; &lt;code&gt;llms.txt&lt;/code&gt; delivers that perfectly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're a small blog.&lt;/strong&gt; Individual posts are better read in their original context (where you have analytics, ads, related-posts widgets). Pointing models at the posts via &lt;code&gt;llms.txt&lt;/code&gt; is enough; serving the text again in &lt;code&gt;llms-full.txt&lt;/code&gt; just duplicates content without strategic benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your content is rapidly changing.&lt;/strong&gt; A news site, stock-tracker, or live-event dashboard would have to regenerate &lt;code&gt;llms-full.txt&lt;/code&gt; constantly. The maintenance overhead outweighs the benefit; better to let crawlers hit the live pages via &lt;code&gt;llms.txt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your content is gated.&lt;/strong&gt; If most of your pages require authentication or payment, you can't put their full text in a publicly-served &lt;code&gt;llms-full.txt&lt;/code&gt;. Listing the public-facing summaries in &lt;code&gt;llms.txt&lt;/code&gt; is the right level of disclosure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your content is primarily visual or interactive.&lt;/strong&gt; Tools, calculators, configurators, and data visualizations don't flatten into text well. Pointing models at the tool URL via &lt;code&gt;llms.txt&lt;/code&gt; is fine; trying to describe the UI in &lt;code&gt;llms-full.txt&lt;/code&gt; is usually worse than nothing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to ship both
&lt;/h2&gt;

&lt;p&gt;The case for shipping both is strongest when you have text-heavy content where the value is in the &lt;em&gt;specifics&lt;/em&gt; — exact API parameters, exact installation steps, exact configuration syntax. Specifically, ship both when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You have documentation.&lt;/strong&gt; If a user could plausibly ask an AI "how do I do X with your product," and the answer requires pointing to specific function signatures, command flags, or exact configuration syntax, you want &lt;code&gt;llms-full.txt&lt;/code&gt;. Models will quote your docs back at users; you want them quoting from a canonical, clean source.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have an API.&lt;/strong&gt; API references thrive in &lt;code&gt;llms-full.txt&lt;/code&gt;. The whole point is that a model can pull up your endpoint table, parameter list, and response format in a single fetch and answer questions accurately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have a knowledge base or help center.&lt;/strong&gt; Support articles often answer "how do I X" questions that AI assistants now field on your behalf. Putting them in &lt;code&gt;llms-full.txt&lt;/code&gt; means AI gives the same answer your support team would.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have evergreen technical content.&lt;/strong&gt; Tutorials, guides, and walkthroughs benefit from being in &lt;code&gt;llms-full.txt&lt;/code&gt; for the same reason — the value is in the specifics, and you want models quoting your version rather than a stale paraphrase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to be the canonical source on a topic.&lt;/strong&gt; Being in &lt;code&gt;llms-full.txt&lt;/code&gt; raises the odds that a model quotes your phrasing when summarizing the topic. If you've written the definitive guide on something, having it in &lt;code&gt;llms-full.txt&lt;/code&gt; is the difference between "according to Lab451" and "according to a guide I read somewhere."&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How they work together at inference time
&lt;/h2&gt;

&lt;p&gt;Watching a real model handle a query that triggers both files is instructive. Here's a simplified trace of how a request like "how do I add llms.txt to a Next.js site" might flow through a system that supports both files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model needs to answer a specific technical question. It identifies potentially relevant sites (Lab451 included).&lt;/li&gt;
&lt;li&gt;It fetches &lt;code&gt;lab451.org/llms.txt&lt;/code&gt; first — small, fast, cheap. From this, it learns Lab451 is an llms.txt generator and that there's a "Docs" section containing relevant pages.&lt;/li&gt;
&lt;li&gt;It sees that &lt;code&gt;llms.txt&lt;/code&gt; mentions a &lt;code&gt;/llms-full.txt&lt;/code&gt; in the Optional section. The model decides — based on query complexity — to fetch it.&lt;/li&gt;
&lt;li&gt;It pulls down &lt;code&gt;lab451.org/llms-full.txt&lt;/code&gt;, finds the "Adding llms.txt to your site" section, finds the Next.js example, and quotes the relevant configuration directly.&lt;/li&gt;
&lt;li&gt;The user gets a precise answer with a citation pointing back to lab451.org.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without &lt;code&gt;llms-full.txt&lt;/code&gt;, step 3 would instead trigger a chain of fetches — first the Docs index, then the "Adding llms.txt" page, then the Next.js page — each one a separate request with its own HTML parsing, navigation chrome stripping, and context cost. The model probably still gets there, but it takes longer, costs more context, and is more likely to grab the wrong content along the way.&lt;/p&gt;

&lt;p&gt;Both files are tools. &lt;code&gt;llms.txt&lt;/code&gt; is the cheap, fast tool that handles most queries. &lt;code&gt;llms-full.txt&lt;/code&gt; is the heavier tool for the smaller share of queries where specifics matter. Shipping both means the model can pick the right one.&lt;/p&gt;
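&lt;p&gt;The trace above can be sketched as a small decision function. This is a hypothetical illustration of the logic described there, not how ChatGPT or any other product is built: &lt;code&gt;fetch&lt;/code&gt; stands in for an HTTP client, &lt;code&gt;needs_detail&lt;/code&gt; is a crude keyword stand-in for however a real system judges query complexity, and all names are invented for this example.&lt;/p&gt;

```python
from typing import Callable

# Phrases that, in this toy heuristic, suggest the user wants specifics.
DETAIL_HINTS = ("how do i", "configure", "parameter", "endpoint", "example", "set up")


def needs_detail(question: str) -> bool:
    """Toy stand-in for judging query complexity."""
    q = question.lower()
    return any(hint in q for hint in DETAIL_HINTS)


def answer_context(site: str, question: str, fetch: Callable[[str], str]) -> str:
    """Return the text a model would load: llms.txt, or llms-full.txt when warranted."""
    index = fetch(f"https://{site}/llms.txt")  # step 2: small, cheap map
    full_url = f"https://{site}/llms-full.txt"
    if needs_detail(question) and full_url in index:  # step 3: linked and worth it?
        return fetch(full_url)  # step 4: one fetch, all the specifics
    return index  # quick questions are answered from the map alone
```

&lt;p&gt;To try it offline, pass a dictionary lookup as the fetcher, for example &lt;code&gt;answer_context("lab451.org", question, pages.__getitem__)&lt;/code&gt;, where &lt;code&gt;pages&lt;/code&gt; maps the two URLs to sample file contents. The design point is the guard on step 3: the full file is only fetched when it is both linked from &lt;code&gt;llms.txt&lt;/code&gt; and judged worth the context.&lt;/p&gt;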

&lt;h2&gt;
  
  
  Generating llms-full.txt without going insane
&lt;/h2&gt;

&lt;p&gt;The reason a lot of sites only ship &lt;code&gt;llms.txt&lt;/code&gt; isn't that they don't want both — it's that maintaining &lt;code&gt;llms-full.txt&lt;/code&gt; by hand is miserable. Concatenating every page's content into a single file, stripping nav and chrome, keeping it in sync as the site evolves — that's a job for a script, not a human.&lt;/p&gt;

&lt;p&gt;A few practical approaches:&lt;/p&gt;

&lt;h3&gt;
  
  
  Static-site generators with built-in support
&lt;/h3&gt;

&lt;p&gt;Mintlify and Fern can generate &lt;code&gt;llms-full.txt&lt;/code&gt; out of the box, Docusaurus supports it via plugin, and many other modern docs platforms have added similar support. If your docs already build to static HTML, check whether your generator can also emit &lt;code&gt;llms-full.txt&lt;/code&gt; in the same build step. This is by far the lowest-effort path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build-time scripts
&lt;/h3&gt;

&lt;p&gt;For sites without built-in support, a build script can extract Markdown from your content directory, strip frontmatter or normalize it, and concatenate everything into a single file at &lt;code&gt;/llms-full.txt&lt;/code&gt;. This works especially well for Hugo, Eleventy, Astro, and Next.js sites where content already lives as Markdown files.&lt;/p&gt;
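&lt;p&gt;Such a script can be short. The sketch below assumes your pages are &lt;code&gt;.md&lt;/code&gt; files under one content directory, each with optional frontmatter between &lt;code&gt;---&lt;/code&gt; lines; it drops the frontmatter, takes a &lt;code&gt;title:&lt;/code&gt; field when present (falling back to the file name), and joins pages with &lt;code&gt;---&lt;/code&gt; rules as in the example earlier. The directory layout, the &lt;code&gt;title:&lt;/code&gt; field, and the function names are assumptions to adapt to your generator, and the frontmatter handling only understands flat &lt;code&gt;key: value&lt;/code&gt; lines.&lt;/p&gt;

```python
from pathlib import Path


def strip_frontmatter(text: str) -> tuple[dict, str]:
    """Split leading '---' frontmatter from the body; parse flat 'key: value' lines only."""
    meta = {}
    if text.startswith("---\n"):
        end = text.find("\n---", 4)  # closing delimiter of the frontmatter block
        if end != -1:
            for line in text[4:end].splitlines():
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip().strip("\"'")
            text = text[end + 4:]
    return meta, text.strip()


def build_llms_full(content_dir: str, site_title: str) -> str:
    """Concatenate every Markdown page under content_dir into one llms-full.txt string."""
    parts = [f"# {site_title}"]
    for path in sorted(Path(content_dir).rglob("*.md")):
        meta, body = strip_frontmatter(path.read_text(encoding="utf-8"))
        # Prefer the frontmatter title; otherwise turn "getting-started" into "Getting Started".
        title = meta.get("title") or path.stem.replace("-", " ").title()
        parts.append(f"---\n\n## {title}\n\n{body}")
    return "\n\n".join(parts) + "\n"
```

&lt;p&gt;A build step would then write the result to wherever your static files are served from, for example &lt;code&gt;Path("public/llms-full.txt").write_text(build_llms_full("content", "Lab451"), encoding="utf-8")&lt;/code&gt;; both paths are placeholders.&lt;/p&gt;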

&lt;h3&gt;
  
  
  Crawl-based generation
&lt;/h3&gt;

&lt;p&gt;For sites that don't have source Markdown — WordPress sites, headless CMS sites, or anything served dynamically — the right answer is to crawl your own site and convert each page's rendered HTML to clean Markdown. This is what &lt;a href="https://lab451.org" rel="noopener noreferrer"&gt;Lab451&lt;/a&gt; does, and what other tools in this space do too.&lt;/p&gt;
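&lt;p&gt;At its core, crawl-based generation means pulling the main content out of rendered HTML while skipping page chrome. Production tools typically rely on dedicated content-extraction and HTML-to-Markdown libraries; the sketch below uses only Python's standard-library &lt;code&gt;html.parser&lt;/code&gt; to show the idea. It assumes chrome lives in &lt;code&gt;nav&lt;/code&gt;, &lt;code&gt;header&lt;/code&gt;, &lt;code&gt;footer&lt;/code&gt;, &lt;code&gt;aside&lt;/code&gt;, &lt;code&gt;form&lt;/code&gt;, &lt;code&gt;script&lt;/code&gt;, and &lt;code&gt;style&lt;/code&gt; elements (a common convention, not a guarantee), and it only converts headings, paragraphs, and list items.&lt;/p&gt;

```python
from html.parser import HTMLParser

# Elements treated as page chrome; a common convention, not a guarantee.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}
# Block elements we keep, mapped to their Markdown prefix.
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### ", "li": "- ", "p": ""}


class MainTextExtractor(HTMLParser):
    """Collect headings, paragraphs, and list items as Markdown, skipping chrome."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # nesting depth inside chrome elements
        self.blocks = []      # finished Markdown blocks
        self.prefix = None    # Markdown prefix of the block being read, if any
        self.buffer = []      # text collected for that block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in PREFIX and self.skip_depth == 0:
            self.prefix, self.buffer = PREFIX[tag], []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in PREFIX and self.prefix is not None:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.prefix is not None and self.skip_depth == 0:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a rendered page to simple Markdown with chrome removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

&lt;p&gt;Tables and code blocks are dropped entirely here and links are reduced to their text, which is exactly where real converters earn their keep; treat this as a way to understand the pipeline rather than something to ship.&lt;/p&gt;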

&lt;h3&gt;
  
  
  Hybrid: generate, then curate
&lt;/h3&gt;

&lt;p&gt;The highest-quality &lt;code&gt;llms-full.txt&lt;/code&gt; files combine automated generation with editorial review. Auto-generate the file from your docs source, then have a human pass over it once to remove anything that shouldn't be there (deprecated content, internal-only notes, accidental duplicates). This is overkill for most sites, but for teams whose AI presence matters strategically, it's worth the quarterly hour.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;p&gt;A few specific failure modes to avoid:&lt;/p&gt;

&lt;h3&gt;
  
  
  Putting full page content in llms.txt
&lt;/h3&gt;

&lt;p&gt;The biggest mistake by far. &lt;code&gt;llms.txt&lt;/code&gt; is a map, not the territory. If your &lt;code&gt;llms.txt&lt;/code&gt; contains paragraphs of body content, you've confused it with &lt;code&gt;llms-full.txt&lt;/code&gt;. Move the body content into &lt;code&gt;llms-full.txt&lt;/code&gt;, and put only links and descriptions in &lt;code&gt;llms.txt&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not linking to llms-full.txt from llms.txt
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;llms-full.txt&lt;/code&gt; isn't typically discovered by crawlers as a well-known URL the way &lt;code&gt;llms.txt&lt;/code&gt; is. The most reliable way for crawlers to find it is by reading &lt;code&gt;llms.txt&lt;/code&gt; and following a link. So if you ship both, make sure &lt;code&gt;llms.txt&lt;/code&gt; links to &lt;code&gt;llms-full.txt&lt;/code&gt; in its Optional section.&lt;/p&gt;
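&lt;p&gt;A check for this mistake is easy to add to a deploy pipeline. The sketch assumes the section heading is &lt;code&gt;## Optional&lt;/code&gt; (matched case-insensitively), as in the example earlier, and that the entry is a standard Markdown link such as &lt;code&gt;- [Full documentation](https://lab451.org/llms-full.txt)&lt;/code&gt;; the function name is hypothetical.&lt;/p&gt;

```python
import re

# Captures the URL part of a Markdown link: "[text](URL)"
LINK_URL_RE = re.compile(r"\]\(([^)\s]+)\)")


def links_full_from_optional(llms_txt: str) -> bool:
    """True if a Markdown link under '## Optional' points at /llms-full.txt."""
    in_optional = False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Entering a new H2 section; track whether it is the Optional one.
            in_optional = line[3:].strip().lower() == "optional"
        elif in_optional:
            for url in LINK_URL_RE.findall(line):
                if url.endswith("/llms-full.txt"):
                    return True
    return False
```

&lt;p&gt;Failing the build when this returns &lt;code&gt;False&lt;/code&gt; keeps the link from silently disappearing when someone edits &lt;code&gt;llms.txt&lt;/code&gt; by hand.&lt;/p&gt;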

&lt;h3&gt;
  
  
  Leaving navigation chrome in llms-full.txt
&lt;/h3&gt;

&lt;p&gt;If your &lt;code&gt;llms-full.txt&lt;/code&gt; generator pulls page HTML and converts to Markdown without first stripping the header, footer, sidebar, and cookie banner, you end up with a file where 30% of the content is "Home | About | Contact" repeated 200 times. Models will still parse it, but they waste context on noise. Clean extraction is table stakes.&lt;/p&gt;
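&lt;p&gt;One cheap way to catch leftover chrome is to look for lines that repeat across many pages, since genuine content rarely does. The sketch below counts how many pages each distinct non-blank line appears in and flags lines found in at least half of them. The 50% threshold and the function name are arbitrary choices for illustration, and legitimate repeated lines (a shared disclaimer, say) will trip it too, so treat the output as a review list rather than a delete list.&lt;/p&gt;

```python
from collections import Counter


def repeated_lines(pages: list[str], min_share: float = 0.5) -> list[str]:
    """Return lines that appear in at least min_share of the pages (likely chrome)."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    # Require at least two pages, so a single page never flags anything.
    threshold = max(2, min_share * len(pages))
    return sorted(line for line, n in counts.items() if n >= threshold)
```

&lt;p&gt;If you only have the finished file, splitting it on the separator rules, e.g. &lt;code&gt;repeated_lines(full_text.split("\n---\n"))&lt;/code&gt;, gives a rough per-page list to run it on.&lt;/p&gt;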

&lt;h3&gt;
  
  
  Including pages that shouldn't be there
&lt;/h3&gt;

&lt;p&gt;Search result pages, tag archives, pagination, login pages, and user-account pages don't belong in &lt;code&gt;llms-full.txt&lt;/code&gt;. Filter them out at generation time. The rule of thumb: if a page has unique, canonical, evergreen content that someone would want to read, it belongs. Otherwise it doesn't.&lt;/p&gt;
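&lt;p&gt;In practice, filtering at generation time comes down to a list of URL patterns. The patterns below are assumptions based on common conventions (WordPress-style &lt;code&gt;/tag/&lt;/code&gt;, &lt;code&gt;/category/&lt;/code&gt;, and &lt;code&gt;/page/2/&lt;/code&gt; archives, &lt;code&gt;?s=&lt;/code&gt; or &lt;code&gt;?q=&lt;/code&gt; search queries, login and account paths); adjust them to your site's real routes. The function name is hypothetical.&lt;/p&gt;

```python
import re
from urllib.parse import urlparse, parse_qs

# Path segments for pages that rarely carry unique, canonical content.
EXCLUDED_PATH_RE = re.compile(
    r"/(tag|tags|category|search|login|logout|signin|signup|account)(/|$)"
    r"|/page/\d+/?$",
    re.IGNORECASE,
)
# Query parameters that usually mean search results or pagination.
SEARCH_PARAMS = {"s", "q", "query", "search", "page"}


def should_include(url: str) -> bool:
    """Decide whether a crawled URL belongs in llms-full.txt."""
    parts = urlparse(url)
    if EXCLUDED_PATH_RE.search(parts.path):
        return False
    if SEARCH_PARAMS.intersection(parse_qs(parts.query)):
        return False  # search results and ?page=N pagination
    return True
```

&lt;p&gt;Matching on whole path segments means &lt;code&gt;/tag/seo/&lt;/code&gt; is excluded while a page such as &lt;code&gt;/blog/tagging-guide&lt;/code&gt; is kept.&lt;/p&gt;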

&lt;h3&gt;
  
  
  Letting it go stale
&lt;/h3&gt;

&lt;p&gt;An &lt;code&gt;llms-full.txt&lt;/code&gt; that's six months out of date is worse than no &lt;code&gt;llms-full.txt&lt;/code&gt; at all — models will confidently quote outdated pricing, deprecated API endpoints, and old product names. Tie regeneration to your deploy pipeline if you can, or set a monthly cron job. Make staleness impossible.&lt;/p&gt;
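&lt;p&gt;If regeneration isn't wired into deploys yet, a CI guard can at least make staleness loud. This sketch assumes the generator writes a &lt;code&gt;Generated: YYYY-MM-DD&lt;/code&gt; line into the file; that line is a convention invented for this example, not part of any spec. The file counts as stale when the date is missing or older than a maximum age (30 days here, to match a monthly cron). The function name is hypothetical.&lt;/p&gt;

```python
import re
from datetime import date, timedelta

# A line like "Generated: 2026-06-01" anywhere in the file.
GENERATED_RE = re.compile(r"^Generated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def is_stale(llms_full: str, today: date, max_age_days: int = 30) -> bool:
    """True if the file has no 'Generated:' date or that date is older than max_age_days."""
    match = GENERATED_RE.search(llms_full)
    if match is None:
        return True  # no timestamp: assume the worst
    generated = date.fromisoformat(match.group(1))
    return today - generated > timedelta(days=max_age_days)
```

&lt;p&gt;In CI you would call &lt;code&gt;is_stale(text, date.today())&lt;/code&gt; and fail the build when it returns &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;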

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I have llms-full.txt without llms.txt?
&lt;/h3&gt;

&lt;p&gt;Technically yes; practically no. &lt;code&gt;llms-full.txt&lt;/code&gt; is discovered via &lt;code&gt;llms.txt&lt;/code&gt; in most crawler implementations. Without an &lt;code&gt;llms.txt&lt;/code&gt; linking to it, your &lt;code&gt;llms-full.txt&lt;/code&gt; sits unread. Always ship them together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is llms-full.txt just a single big Markdown file?
&lt;/h3&gt;

&lt;p&gt;Yes. The whole point is that a model can fetch one URL and get everything. Splitting it across multiple files defeats the purpose. If your content is genuinely too big for one file, the answer is smarter content curation, not file splitting.&lt;/p&gt;

&lt;h3&gt;
  
  
  What if my site is mostly visual or interactive?
&lt;/h3&gt;

&lt;p&gt;Skip &lt;code&gt;llms-full.txt&lt;/code&gt;. There's no useful way to flatten a Figma template gallery or an interactive calculator into Markdown. &lt;code&gt;llms.txt&lt;/code&gt; alone, pointing at descriptive pages, is the right choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does llms-full.txt hurt my SEO?
&lt;/h3&gt;

&lt;p&gt;No. &lt;code&gt;llms-full.txt&lt;/code&gt; isn't indexed as a web page in the traditional sense — it's a resource file, like &lt;code&gt;robots.txt&lt;/code&gt; or &lt;code&gt;sitemap.xml&lt;/code&gt;. Google and Bing index its existence but don't treat it as a duplicate of your pages. The content within isn't competing with your real pages for rankings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I gate llms-full.txt behind authentication?
&lt;/h3&gt;

&lt;p&gt;No. If you want models to read it, it has to be publicly fetchable. If you have content you don't want models to see, leave it out of &lt;code&gt;llms-full.txt&lt;/code&gt; rather than trying to gate the file itself. A gated &lt;code&gt;llms-full.txt&lt;/code&gt; is the same as no &lt;code&gt;llms-full.txt&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about pricing or rapidly-changing content?
&lt;/h3&gt;

&lt;p&gt;Two options. The simpler one: leave volatile content out of &lt;code&gt;llms-full.txt&lt;/code&gt; entirely, and rely on the model to fetch the live page via the &lt;code&gt;llms.txt&lt;/code&gt; link when asked. The more sophisticated one: regenerate &lt;code&gt;llms-full.txt&lt;/code&gt; on a short cadence (hourly cron for pricing, daily for changelogs) so it stays close enough to current.&lt;/p&gt;

&lt;h3&gt;
  
  
  How big is too big?
&lt;/h3&gt;

&lt;p&gt;Practical ceiling: around 2 MB. Past that, most models stop reading the file whole and switch to retrieval-style access on chunks. That still works, but you lose the "everything in one shot" property that's the file's whole point. Aim under 500 KB if you can.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful, the &lt;a href="https://lab451.org/blog/llms-txt-vs-llms-full-txt" rel="noopener noreferrer"&gt;original post on lab451.org&lt;/a&gt; has a few extra resources, and there's a &lt;a href="https://lab451.org/blog/llms-txt-complete-guide-2026" rel="noopener noreferrer"&gt;longer guide on the llms.txt spec itself&lt;/a&gt; if you want the deeper background. Happy to answer questions in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>seo</category>
      <category>ai</category>
    </item>
    <item>
      <title>llms.txt — what it is, what AI models use it, how to write one, and the mistakes that quietly tank the whole exercise.</title>
      <dc:creator>Roger Remacle - Lab451.org</dc:creator>
      <pubDate>Tue, 26 May 2026 08:12:46 +0000</pubDate>
      <link>https://dev.to/lab451/complete-llmstxt-guide-for-2026-57d</link>
      <guid>https://dev.to/lab451/complete-llmstxt-guide-for-2026-57d</guid>
      <description>&lt;p&gt;&lt;strong&gt;What is llms.txt?&lt;/strong&gt;&lt;br&gt;
llms.txt is a plain-text file you put at the root of your domain — at &lt;a href="https://yoursite.com/llms.txt" rel="noopener noreferrer"&gt;https://yoursite.com/llms.txt&lt;/a&gt; — that tells large language models what your site is about, which pages matter, and how to navigate it. It's written in Markdown. It's small. It exists for one reason: AI models have very limited context windows, and they can't read your whole site, so you give them a curated map instead.&lt;/p&gt;

&lt;p&gt;The standard was proposed by Jeremy Howard of Answer.AI in September 2024, and over the following eighteen months it became the de facto convention for what's now called Generative Engine Optimization, or GEO — the practice of getting your site cited by ChatGPT, Claude, Perplexity, Gemini, and the other large language models that have started to replace the traditional search box.&lt;/p&gt;

&lt;p&gt;If you've ever written a robots.txt or a sitemap.xml, you already understand the shape of llms.txt. It's the same idea — a small file at a well-known URL that gives automated systems structured hints about your site — except the audience is language models rather than search-engine crawlers, and the format is Markdown rather than text directives or XML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists (the actual problem)&lt;/strong&gt;&lt;br&gt;
When you ask ChatGPT "what is FastHTML and how do I get started," ChatGPT might do three things behind the scenes: search the web, pick a handful of pages, and try to read enough of them to give you a sensible answer. The second and third steps are where modern sites fall over.&lt;/p&gt;

&lt;p&gt;A typical modern web page is 200 KB to several megabytes of HTML, CSS, JavaScript, third-party scripts, cookie banners, navigation chrome, and ads. An LLM's context window — even a generous one — is small relative to that. The model has to either render the page through some kind of browser-like pipeline (slow, expensive, fragile) or strip the markup down to text (loses structure, mangles tables, drops semantics). Either way, by the time the actual content reaches the model, it's degraded.&lt;/p&gt;

&lt;p&gt;Worse, the model has no idea which pages on your site are important. Is your homepage the canonical statement of what you do? Is the About page the one to read for context? Is your documentation under /docs, /help, /wiki, or somewhere else? Without explicit guidance, the model guesses. It often guesses wrong.&lt;/p&gt;

&lt;p&gt;llms.txt solves both problems at once. It's a small, pre-cleaned, Markdown-formatted document that says: "This is what this site is. Here are the pages that matter. Here's where to go for deeper information." The model spends a fraction of its context budget and walks away with an accurate picture of your site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who actually reads llms.txt&lt;/strong&gt;&lt;br&gt;
This is the question every skeptical post about llms.txt opens with, and it deserves an honest answer. As of mid-2026, support is uneven but growing. Here's the realistic state of play:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgz89qr2azm80ovuwm9dn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgz89qr2azm80ovuwm9dn.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of these companies have published a formal commitment that says "we use llms.txt and weight it at X." What you'll see in practice is that crawlers fetch the file, log it, and use it as one signal among many. The honest framing is closer to schema.org markup in 2014 than to robots.txt in 2024: not strictly required, not universally honored, but adopted quickly enough that not having one is starting to look like a tell.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The defensible position in 2026:&lt;/strong&gt; ship an llms.txt because it costs you almost nothing, it makes your site more legible to anything that does read it, and the downside if no model ever reads it is zero. It's a hedge with no premium.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The spec, line by line&lt;/strong&gt;&lt;br&gt;
The formal specification at llmstxt.org is short — short enough to walk through end to end. Here's the structure, with every section explained.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Site or project name

&amp;gt; A one-paragraph blockquote summary.
&amp;gt; This is the only section besides the H1 that is parsed structurally.

Optional free-form Markdown details about the project. Paragraphs,
lists, anything except headings.

## Docs

- [Page name](https://example.com/page): Optional one-line description
- [Another page](https://example.com/another): What it covers

## Examples

- [Example](https://example.com/example): One-line context

## Optional

- [Less critical link](https://example.com/extra): Skippable if context is tight
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Exactly one H1.&lt;/strong&gt; This is the only required element. It's the name of the site or project, not a tagline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;An optional blockquote.&lt;/strong&gt; This is your "elevator pitch" summary. Models often quote it verbatim when asked "what is this site about." Make it good. One or two sentences, plain English.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optional free-form Markdown&lt;/strong&gt; can follow the blockquote, up to the first H2 section: paragraphs, lists, emphasis, anything except headings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;H2 sections containing link lists.&lt;/strong&gt; Each H2 is a category (Docs, Tutorials, API Reference, Blog, Examples). Each list item is a Markdown link, optionally followed by a colon and a one-line description.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "Optional" H2 is special.&lt;/strong&gt; Links here can be skipped by parsers that need a shorter context. Use it for secondary material — appendices, deeper references, anything not essential to understanding what your site is.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
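&lt;p&gt;To make those rules concrete, here is a minimal Python sketch that splits a file into the parts listed above. It is not the reference parser published alongside the spec; &lt;code&gt;parse_llms_txt&lt;/code&gt; and its output shape are illustrative, and link descriptions that wrap onto a second line are ignored.&lt;/p&gt;

```python
import re

# "- [Title](url)" optionally followed by ": description"
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, info and H2 link sections.

    Minimal sketch: no validation, and list items that are not
    Markdown links (or that wrap onto a second line) are skipped.
    """
    result = {"title": None, "summary": None, "info": [], "sections": {}}
    summary_lines = []
    current = None  # name of the H2 section we are inside, if any
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            result["sections"][current] = []
        elif stripped.startswith("# ") and result["title"] is None:
            result["title"] = stripped[2:].strip()
        elif current is None and stripped.startswith(">"):
            summary_lines.append(stripped.lstrip(">").strip())
        elif current is not None:
            match = LINK_RE.match(stripped)
            if match:
                title, url, desc = match.groups()
                result["sections"][current].append(
                    {"title": title, "url": url, "description": desc})
        elif stripped:
            result["info"].append(stripped)  # free-form Markdown before the H2s
    result["summary"] = " ".join(summary_lines) or None
    return result
```

&lt;p&gt;A consumer that is short on context could drop the Optional list with &lt;code&gt;parsed["sections"].pop("Optional", None)&lt;/code&gt; before following any links.&lt;/p&gt;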

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What to avoid:&lt;/strong&gt; images, raw HTML, tables, code blocks, additional H1s, and any headings inside the H2 link sections. The spec itself only rules out headings in the free-form section, but most parsers expect plain Markdown text and Markdown lists. The simpler you keep it, the more reliably it's read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;llms.txt vs llms-full.txt&lt;/strong&gt;&lt;br&gt;
The base llms.txt file is a map. It tells a model where to go, but to follow the links the model still has to fetch each page. For documentation sites and other content where you want the model to have everything in one shot, there's a companion file: llms-full.txt.&lt;/p&gt;

&lt;p&gt;llms-full.txt contains the actual content of every page on your site, concatenated into a single Markdown document. No navigation, no boilerplate, no chrome — just the words. A model can download one file and have a complete picture of your product without making fifty follow-up requests.&lt;/p&gt;
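&lt;p&gt;Since llms-full.txt has no widely agreed layout, the sketch below assumes one: reuse the H1 and blockquote from llms.txt, then give each page an H2 and its source URL so a model can cite it. The &lt;code&gt;Page&lt;/code&gt; type and &lt;code&gt;build_llms_full&lt;/code&gt; are hypothetical, and converting HTML to Markdown is left to whatever your site generator already does.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    markdown: str  # page body, already converted to clean Markdown


def build_llms_full(site_name: str, summary: str, pages: list[Page]) -> str:
    """Concatenate page bodies into one Markdown document.

    Assumed layout: H1 and blockquote summary as in llms.txt, then one
    H2 per page followed by its source URL and its body.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for page in pages:
        parts += [f"## {page.title}", "", f"Source: {page.url}", "",
                  page.markdown.strip(), ""]
    return "\n".join(parts)
```

&lt;p&gt;If page bodies contain their own H1 or H2 headings, you may want to demote them so each page stays nested under its own H2.&lt;/p&gt;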

&lt;p&gt;Here's a simplified mental model of the two:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9e5f44nh8mnuyya1a6zt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9e5f44nh8mnuyya1a6zt.png" alt=" " width="786" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a marketing site or a small blog, llms.txt alone is plenty. For a documentation site, an API reference, a product manual, or anything you actually want models to be able to answer questions about, ship both. The pairing was popularized by Mintlify and Anthropic in early 2025 and has stuck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it relates to robots.txt and sitemap.xml&lt;/strong&gt;&lt;br&gt;
These three files cover three different audiences with three different purposes. They complement each other; they don't replace each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwacgoa1yhllwyd9hcpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwacgoa1yhllwyd9hcpj.png" alt=" " width="799" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A common confusion: llms.txt doesn't block anything. It's not a permissions file. If you want to keep crawlers off certain paths, that goes in robots.txt. If you want them to know where your pages live for indexing, that's sitemap.xml. llms.txt is purely a "here's what we are" document.&lt;/p&gt;

&lt;p&gt;One subtle interaction: your llms.txt should not link to pages you've blocked in robots.txt. Some crawlers will silently drop the inconsistent links; others might log an error and deprioritize the whole file. Keep the three in agreement.&lt;/p&gt;
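&lt;p&gt;This is easy to check mechanically. A minimal sketch using Python's standard-library &lt;code&gt;urllib.robotparser&lt;/code&gt;: it extracts every absolute link from llms.txt and reports the ones robots.txt disallows for a given crawler. &lt;code&gt;blocked_links&lt;/code&gt; is a hypothetical helper, GPTBot is just an example user-agent, and the standard-library parser does plain prefix matching, so wildcard rules that some crawlers honor are not evaluated.&lt;/p&gt;

```python
import re
from urllib.robotparser import RobotFileParser

# Absolute URLs inside Markdown links: "](https://...)"
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_links(llms_txt: str, robots_txt: str, user_agent: str = "GPTBot") -> list[str]:
    """Return llms.txt link targets that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = LINK_URL_RE.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
llms = ("- [Docs](https://example.com/docs/intro)\n"
        "- [Internal](https://example.com/private/roadmap)\n")
print(blocked_links(llms, robots))  # ['https://example.com/private/roadmap']
```

&lt;p&gt;Running it for each crawler you care about, and failing the build when the list is not empty, keeps the two files in agreement.&lt;/p&gt;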

&lt;p&gt;&lt;strong&gt;Writing a good llms.txt&lt;/strong&gt;&lt;br&gt;
The mechanical rules above are necessary but not sufficient. A good llms.txt — one that actually changes how models talk about your site — follows a few principles that aren't in the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lead with the summary that you want quoted back&lt;/strong&gt;&lt;br&gt;
The blockquote at the top is the single most important sentence in the file. When a model is asked "what is yoursite.com," there's a strong chance it will paraphrase your blockquote. Write it the way you'd want it to appear in a Perplexity answer or a ChatGPT citation. No marketing fluff, no superlatives, no "leading platform for." Plain language, what the site does, who it's for.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Lab451

&amp;gt; Lab451 generates llms.txt, llms-full.txt, sitemap.xml, and robots.txt
&amp;gt; for any public website in about 30 seconds. Free for sites under 50
&amp;gt; pages. No signup required for the basic flow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to what a marketing draft would have produced ("Lab451 is the leading AI-discoverability platform empowering websites to maximize their generative engine presence"). The first version is what a model will repeat. The second is what a model will quietly rewrite into the first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Group links by intent, not by site architecture&lt;/strong&gt;&lt;br&gt;
The H2 sections shouldn't mirror your top nav. They should mirror the questions a user might ask a model. If someone asks "how do I get started with X," the model should find a section called Getting Started or Quickstart. If they ask "what does X cost," there should be a Pricing section. Think of the H2s as the answers to the questions you'd expect, not as a sitemap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-line descriptions earn their keep&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;[Title](url): description&lt;/code&gt; pattern is optional, but the description is often what tips a model toward citing one page over another. Keep descriptions to a single line. State what the page covers, not how good it is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Docs

- [Getting started](https://lab451.org/docs/quickstart):
  Generate your first set of files in under a minute.
- [API reference](https://lab451.org/docs/api):
  Endpoints, authentication, rate limits, response formats.
- [WordPress integration](https://lab451.org/docs/wordpress):
  Drop-in instructions for self-hosted and managed WordPress sites.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use the Optional section for ballast&lt;/strong&gt;&lt;br&gt;
Old blog posts, deep references, legal pages, anything you'd be happy for a model to read if it had spare context but wouldn't lose sleep over. The Optional section is permission to include them without crowding the must-reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep the file under 10 KB if you can&lt;/strong&gt;&lt;br&gt;
The spec doesn't set a size limit, but in practice the entire llms.txt for most sites should fit comfortably under 10 KB. If yours is larger, you're probably listing pages that belong in the sitemap, not the llms.txt. The map is curated; the sitemap is comprehensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real examples worth copying&lt;/strong&gt;&lt;br&gt;
Three llms.txt files in the wild that get the format right and are worth studying:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.fastht.ml/docs/llms.txt" rel="noopener noreferrer"&gt;FastHTML&lt;/a&gt; — the original reference implementation by Jeremy Howard. Clean H1, tight blockquote, sensible H2 grouping, well-used Optional section. The canonical example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/llms.txt" rel="noopener noreferrer"&gt;Anthropic Docs&lt;/a&gt; — large documentation site, organized by product surface. Notice how the H2s map to how a developer would ask for help, not how the docs are filed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lab451.org/llms.txt" rel="noopener noreferrer"&gt;Lab451&lt;/a&gt; — short and direct. Useful as a template for marketing sites that don't need the full content treatment.&lt;/p&gt;

&lt;p&gt;If you open these in a browser or a text editor, you'll notice they all share something: there's no clever formatting. No bullet points masquerading as paragraphs, no headings used for emphasis, no nested structures. The spec rewards restraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common mistakes&lt;/strong&gt;&lt;br&gt;
The errors that quietly tank an llms.txt are mostly format violations the model parses silently and then ignores. Here's the list, in rough order of how often we see them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treating the H1 as a tagline&lt;/strong&gt;&lt;br&gt;
The H1 is supposed to be the name of the site or project, full stop. "# The world's best widget for marketers" is not a name. "# WidgetCo" is. Save the positioning for the blockquote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skipping the blockquote&lt;/strong&gt;&lt;br&gt;
Without the blockquote, models lose the structural cue that tells them "this is the summary." They fall back to inferring it from the link descriptions or the page contents, which is the slow, lossy path you were trying to avoid in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linking to pages that 404 or redirect&lt;/strong&gt;&lt;br&gt;
Sounds obvious; happens constantly. A model that follows a link in your llms.txt and hits a 404 may deprioritize the whole file. Treat llms.txt as production output and check it the same way you check your sitemap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adding headings inside the H2 sections&lt;/strong&gt;&lt;br&gt;
The structure is: H1, optional blockquote, optional free-form Markdown, then H2s with link lists. You can't have an H3 inside an H2 link list. If you need to group further, make more H2s.&lt;/p&gt;
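&lt;p&gt;Several of these mistakes can be caught before deploy. Below is a minimal lint sketch covering the ones that are purely structural (H1 count, missing blockquote, headings below H2); &lt;code&gt;lint_llms_txt&lt;/code&gt; is a hypothetical helper, and judgment calls like a tagline-style H1 still need a human.&lt;/p&gt;

```python
def lint_llms_txt(text: str) -> list[str]:
    """Flag structural mistakes: missing or repeated H1, missing
    blockquote summary, and headings deeper than H2."""
    problems = []
    lines = [line.strip() for line in text.splitlines()]
    h1_count = sum(1 for line in lines if line.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    if not any(line.startswith(">") for line in lines):
        problems.append("no blockquote summary")
    for number, line in enumerate(lines, start=1):
        if line.startswith("###"):
            problems.append(f"line {number}: heading deeper than H2: {line}")
    return problems
```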

&lt;p&gt;&lt;strong&gt;Stuffing keywords into descriptions&lt;/strong&gt;&lt;br&gt;
The traditional SEO instinct is to pack the description with target keywords. In an llms.txt context, this backfires — models are trained to be suspicious of unnatural language and may discount the file. Write descriptions the way you'd explain the page to a colleague.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting to update it&lt;/strong&gt;&lt;br&gt;
An llms.txt that's six months out of date and links to a pricing page that no longer exists is worse than no llms.txt at all. Models will cheerfully repeat your outdated information. Build regeneration into the same workflow you use for the sitemap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hosting, caching, and deployment&lt;/strong&gt;&lt;br&gt;
The file lives at /llms.txt at the root of your domain. Not /static/llms.txt, not /.well-known/llms.txt, not /seo/llms.txt. Crawlers look at the root. If you can serve /robots.txt, you can serve /llms.txt — the deployment story is identical.&lt;/p&gt;

&lt;p&gt;A few practical notes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content-Type:&lt;/strong&gt; serve as text/plain; charset=utf-8 or text/markdown; charset=utf-8. Either works. Don't serve as application/octet-stream — some crawlers will refuse it.&lt;br&gt;
&lt;strong&gt;Caching:&lt;/strong&gt; a Cache-Control: public, max-age=3600 header is fine. The file is small and changes infrequently, so a longer cache would save almost no bandwidth, while a one-hour limit means an update reaches crawlers within the hour instead of a stale copy lingering.&lt;br&gt;
&lt;strong&gt;HTTPS:&lt;/strong&gt; serve over HTTPS. Most crawlers will follow a redirect from HTTP, but some increasingly treat insecure responses as a negative quality signal.&lt;br&gt;
&lt;strong&gt;Subdomains:&lt;/strong&gt; each subdomain gets its own llms.txt. blog.example.com/llms.txt and docs.example.com/llms.txt are independent files. Models do not look "up" to the apex.&lt;br&gt;
&lt;strong&gt;Multilingual sites:&lt;/strong&gt; the convention is still settling. The pragmatic answer for now is one llms.txt per language subdirectory (/en/llms.txt, /fr/llms.txt) plus a default at the root. Don't try to mix languages in one file.&lt;/p&gt;
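&lt;p&gt;If your static host lets you set headers, configure them there. Otherwise, this minimal WSGI sketch shows what the response should look like: text/plain in UTF-8, a one-hour public cache, and a 404 for any other path. &lt;code&gt;make_llms_txt_app&lt;/code&gt; is a hypothetical helper, not part of any framework.&lt;/p&gt;

```python
from pathlib import Path


def make_llms_txt_app(path: str):
    """Return a minimal WSGI app that serves one file at /llms.txt."""
    file_path = Path(path)

    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/llms.txt":
            start_response("404 Not Found",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [b"not found\n"]
        body = file_path.read_bytes()  # re-read per request so updates show up
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Cache-Control", "public, max-age=3600"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

&lt;p&gt;To try it locally with the standard library: &lt;code&gt;wsgiref.simple_server.make_server("", 8000, make_llms_txt_app("public/llms.txt")).serve_forever()&lt;/code&gt;, where the file path is an assumption about your build output.&lt;/p&gt;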

&lt;p&gt;&lt;strong&gt;Measuring whether it works&lt;/strong&gt;&lt;br&gt;
The honest answer here is that GEO measurement is in roughly the same place that SEO measurement was in 2002. There's no Google Search Console for "how often does ChatGPT cite you." There are a few proxies, and that's about it.&lt;/p&gt;

&lt;p&gt;Things you can measure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crawler hits on llms.txt&lt;/strong&gt; in your server logs. Filter by user-agent (GPTBot, ClaudeBot, PerplexityBot, etc.). Frequency is your strongest signal that the file is being consumed.&lt;br&gt;
&lt;strong&gt;Referrer traffic from AI chat interfaces.&lt;/strong&gt; chatgpt.com (formerly chat.openai.com), perplexity.ai, claude.ai, gemini.google.com. Numbers will be small but rising. Tag them in your analytics.&lt;br&gt;
&lt;strong&gt;Direct citation checks.&lt;/strong&gt; Periodically ask each major LLM "what is yoursite.com" and see what they say. Save the answers. Track changes after you update llms.txt. This is manual, annoying, and the most reliable signal you'll get in 2026.&lt;br&gt;
&lt;strong&gt;Brand mention monitoring.&lt;/strong&gt; Tools like Mention, Brandwatch, and newer GEO-focused services (Profound, Goodie, Otterly) scrape LLM responses at scale. The category is young; the tools are improving fast.&lt;/p&gt;
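&lt;p&gt;For the first item, a minimal sketch that counts successful fetches of /llms.txt and /llms-full.txt per AI crawler from an access log. It assumes the "combined" log format (nginx's default); &lt;code&gt;llms_txt_hits&lt;/code&gt; and the bot list are illustrative, and since user-agent strings can be spoofed, treat the counts as an upper bound.&lt;/p&gt;

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)
# Substrings to look for in the user-agent; extend as new crawlers appear.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
TRACKED_PATHS = ("/llms.txt", "/llms-full.txt")


def llms_txt_hits(log_lines) -> Counter:
    """Count 200 responses for the tracked files, per AI crawler."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # not a combined-format request line
        path, status, agent = match.groups()
        if path.split("?")[0] not in TRACKED_PATHS or status != "200":
            continue
        for bot in AI_BOTS:
            if bot in agent:
                hits[bot] += 1
    return hits
```

&lt;p&gt;Pass any iterable of lines, such as an open access-log file, and compare the counts week over week.&lt;/p&gt;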

&lt;p&gt;What you can't reliably measure: ranking, share-of-voice against competitors, or click-through rate from AI answers. The industry hasn't built that infrastructure yet. Anyone who claims they have is selling something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where this is heading in 2026 and beyond&lt;/strong&gt;&lt;br&gt;
A few trends worth watching, beyond the basic adoption curve:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Markdown shadow pages&lt;/strong&gt;&lt;br&gt;
The same proposal that introduced llms.txt also suggests that important pages should be available at their URL with .md appended — so &lt;a href="https://example.com/docs/intro.html" rel="noopener noreferrer"&gt;https://example.com/docs/intro.html&lt;/a&gt; would also be reachable at &lt;a href="https://example.com/docs/intro.html.md" rel="noopener noreferrer"&gt;https://example.com/docs/intro.html.md&lt;/a&gt; as a clean Markdown version. The FastHTML and nbdev ecosystems already do this. Expect more documentation generators to follow.&lt;/p&gt;
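&lt;p&gt;The URL mapping is mechanical. A minimal Python sketch, assuming that only URLs ending in a slash (or with an empty path) count as having no file name, which the proposal says should get index.html.md instead of .md; &lt;code&gt;markdown_url&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(url: str) -> str:
    """Map a page URL to its Markdown twin under the .md convention."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html.md"  # no file name: append index.html.md
    else:
        path += ".md"
    # Keep the query string; fragments never reach the server, so drop them.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```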

&lt;p&gt;&lt;strong&gt;Pay-per-crawl economics&lt;/strong&gt;&lt;br&gt;
Cloudflare's mid-2025 launch of bot-payment infrastructure introduced the idea that AI crawlers might pay micropayments for the content they fetch. By late 2026, expect llms.txt files to start including pricing metadata for their referenced URLs — a 402 Payment Required handshake for the high-value pages. This is speculative but plausible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification and signed manifests&lt;/strong&gt;&lt;br&gt;
As llms.txt adoption grows, so does the incentive for sites to lie in them. A pricing page that claims to be free, an outdated API spec presented as current — the trust problem is obvious. Expect 2027-era extensions that allow signed llms.txt files (probably via DNS TXT records or HTTPS certificate metadata) for sites that want to make verifiable claims about themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Convergence with schema.org&lt;/strong&gt;&lt;br&gt;
There's an obvious overlap between llms.txt's "tell models what this site is" purpose and schema.org's "tell search engines what this entity is" purpose. The two haven't merged; they're solving related problems with different tools. Watch for proposals that reference schema entities from inside llms.txt, or schema.org WebSite objects that point to llms.txt locations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frequently asked questions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need an llms.txt if I already have a sitemap.xml?&lt;/strong&gt;&lt;br&gt;
Yes. They serve different audiences and answer different questions. The sitemap tells indexers what exists. The llms.txt tells models what matters. Most sites should have both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will llms.txt help me rank in Google search?&lt;/strong&gt;&lt;br&gt;
Almost certainly not directly. Google has been clear that Googlebot doesn't use llms.txt as a ranking signal. Where it might help is in Google's AI Overviews, where Gemini does seem to weight clean, structured content sources. The honest answer: small, indirect, and not the main reason to ship one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I block AI crawlers in robots.txt instead of writing an llms.txt?&lt;/strong&gt;&lt;br&gt;
That's a separate decision. If you don't want AI models reading your content, block them in robots.txt and skip llms.txt entirely. If you do want them reading your content, write an llms.txt that makes them efficient at it. These are opposite choices, not two points on a spectrum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I dynamically generate llms.txt server-side?&lt;/strong&gt;&lt;br&gt;
Yes, and many large sites do. The file is just text — there's no requirement that it be a static file. A common pattern is to generate it at build time, cache it for an hour, and regenerate on content changes. Just make sure the response is fast (under 200 ms) and stable (returns the same content for the same URL within a reasonable window).&lt;/p&gt;
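&lt;p&gt;A minimal sketch of that pattern in Python: keep the rendered file in memory and rebuild it when the hour is up or when a content-change hook calls &lt;code&gt;invalidate()&lt;/code&gt;. Serving from memory helps keep responses fast and stable between rebuilds. &lt;code&gt;CachedLlmsTxt&lt;/code&gt; and the &lt;code&gt;build&lt;/code&gt; callback are hypothetical; plug in whatever renders the file from your CMS.&lt;/p&gt;

```python
import time
from typing import Callable


class CachedLlmsTxt:
    """Serve a generated llms.txt from memory, rebuilding after a TTL or on demand."""

    def __init__(self, build: Callable[[], str], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._build = build        # hypothetical hook that renders the file
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the TTL logic is testable
        self._body = None
        self._built_at = 0.0

    def invalidate(self) -> None:
        # Call from a content-changed hook (publish, deploy, webhook).
        self._body = None

    def get(self) -> str:
        now = self._clock()
        if self._body is None or now - self._built_at >= self._ttl:
            self._body = self._build()
            self._built_at = now
        return self._body
```

&lt;p&gt;In a multi-threaded server two requests can occasionally rebuild at the same moment; that is wasteful but harmless, and a lock around the rebuild fixes it if the build is expensive.&lt;/p&gt;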

&lt;p&gt;&lt;strong&gt;What's the difference between llms.txt and AI.txt?&lt;/strong&gt;&lt;br&gt;
ai.txt was a 2023 proposal from Spawning that focused on consent — telling AI training pipelines whether they could use your content. llms.txt, proposed a year later by Answer.AI, focuses on structure — telling AI models how to use your site at inference time. They're orthogonal; some sites have both. The industry's center of gravity is firmly on llms.txt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How often should I regenerate llms.txt?&lt;/strong&gt;&lt;br&gt;
Whenever your site's structure changes meaningfully — new product pages, restructured documentation, retired sections. For active sites, monthly is a reasonable cadence. For mostly-static sites, regenerate when you ship a notable update. Tying it to your existing sitemap-regeneration workflow is the cleanest pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need to submit llms.txt anywhere?&lt;/strong&gt;&lt;br&gt;
No. Unlike sitemap.xml (which you can submit to Google Search Console), there's no "submit to AI" button. Crawlers find /llms.txt by convention, same as they find /robots.txt. Put it at the right URL and the right crawlers will fetch it on their next visit.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Further reading and generating your llms.txt files&lt;/strong&gt;&lt;br&gt;
Lab451 produces a spec-compliant llms.txt — plus llms-full.txt, sitemap.xml, and robots.txt — for any public website. Free for llms.txt and llms-full.txt on sites under 50 pages, and for sitemap.xml on sites of up to 100 pages. No signup, no plugin, no OAuth dance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lab451.org/" rel="noopener noreferrer"&gt;Lab451 - Generate your files →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
