There is a quiet revolution happening in how websites communicate with AI systems. It is called llms.txt, and if you have not heard of it yet, you will soon.
## The Problem llms.txt Solves
AI language models have a context window problem. When a crawler visits your website, it encounters hundreds or thousands of pages: navigation menus, footers, legal pages, marketing copy, documentation, blog posts, changelogs, and everything in between. The signal-to-noise ratio is terrible.
Traditional sitemaps (sitemap.xml) tell crawlers where your pages are, but nothing about what matters. Your API documentation and your cookie policy get equal treatment. An AI system has to figure out on its own which content is authoritative, which is primary, and which is peripheral.
llms.txt fixes this by giving you a way to tell AI systems: "Here is what my site is about, and here are the pages that matter most."
## What Is llms.txt?
The llms.txt specification, proposed by Jeremy Howard (of fast.ai), defines a simple Markdown file placed at your domain root that provides a structured overview of your website for AI consumption.
It lives at https://yourdomain.com/llms.txt and follows a straightforward format:
```markdown
# Site Name

> A brief description of what this site/organization does.

## Section Name

- [Page Title](https://example.com/page): Brief description of this page
- [Another Page](https://example.com/other): What this page covers

## Another Section

- [Resource](https://example.com/resource): Description
```
That is it. Markdown headings, blockquote descriptions, and bullet-pointed links with annotations.
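Because the format is this regular, it is straightforward to read programmatically. Here is a minimal parsing sketch in Python (the function and field names are my own, not part of the specification):

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt document into a title, summary, and link sections.

    A minimal sketch based on the format above; real-world files may vary.
    """
    site = {"title": None, "summary": None, "sections": {}}
    current = None
    # Matches lines like: - [Page Title](https://example.com/page): description
    link_re = re.compile(
        r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
    )
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and site["title"] is None:
            site["title"] = line[2:]
        elif line.startswith("> ") and site["summary"] is None:
            site["summary"] = line[2:]
        elif line.startswith("## "):
            current = line[3:]
            site["sections"][current] = []
        else:
            m = link_re.match(line)
            if m and current is not None:
                site["sections"][current].append(m.groupdict())
    return site
```

This treats the first `#` heading as the site name, the first blockquote as the summary, and each `##` heading as a section of annotated links.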
## A Real-World Example
Here is what an llms.txt might look like for a SaaS company:
```markdown
# Acme Analytics

> Acme Analytics is a real-time web analytics platform focused on privacy-first tracking for businesses of all sizes.

## Core Documentation

- [Getting Started](https://acme.com/docs/start): Installation guide for the tracking script, covering all major frameworks
- [API Reference](https://acme.com/docs/api): Complete REST API documentation with authentication, endpoints, and rate limits
- [Data Model](https://acme.com/docs/data-model): How events, sessions, and users are structured

## Product Pages

- [Features Overview](https://acme.com/features): Full list of platform capabilities
- [Pricing](https://acme.com/pricing): Plans, limits, and enterprise options
- [Integrations](https://acme.com/integrations): Supported platforms and third-party tools

## Blog (Selected)

- [Why We Built Acme](https://acme.com/blog/why-we-built-acme): Founding story and product philosophy
- [Privacy-First Analytics](https://acme.com/blog/privacy-first): Our approach to GDPR-compliant tracking without cookies

## Legal

- [Privacy Policy](https://acme.com/privacy): How we handle user data
- [Terms of Service](https://acme.com/terms): Service agreement
```
Notice how the file curates content. It does not list every blog post -- just the ones that define the company's thinking. It highlights documentation over marketing pages. It provides context that a crawler would otherwise have to infer.
## llms.txt vs llms-full.txt
The specification also defines an optional llms-full.txt file. While llms.txt contains links and summaries, llms-full.txt includes the actual content of your key pages in a single document, pre-formatted for LLM consumption.
This is useful for sites with content behind JavaScript rendering or complex navigation that crawlers might struggle with. It is also helpful when you want to ensure AI systems get the exact text you intend, without parsing artifacts.
## Why Should You Care?
Three practical reasons:
### 1. AI Citation Quality
When ChatGPT, Perplexity, or Gemini cite your website, the quality of that citation depends on how well the AI understood your content. An llms.txt file helps the AI system quickly identify your most authoritative pages, increasing the chance that citations point to the right content rather than a random blog post or your 404 page.
### 2. Context Window Efficiency
LLMs have limited context windows. By providing a curated summary, you help AI systems spend their context budget on your best content rather than your cookie banner.
### 3. Competitive Advantage
Adoption is still early. Most websites do not have an llms.txt file. Adding one now puts you ahead of competitors who have not started thinking about AI discoverability.
## How to Implement llms.txt
### Step 1: Audit Your Content
Before writing anything, identify your most important pages. Ask yourself:
- What pages would I want an AI to cite when someone asks about my product?
- What documentation is essential for understanding what we do?
- Which blog posts represent our core thinking?
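If your site already publishes a sitemap.xml, it can seed this audit. A rough sketch that pulls candidate URLs from a sitemap, filtered by path prefix (the prefixes are assumptions; tune them to your own site's structure):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def candidate_pages(sitemap_xml, include_prefixes=("/docs", "/features", "/pricing")):
    """Return sitemap URLs whose paths start with a priority prefix.

    Sketch for seeding an llms.txt audit; the default prefixes are
    illustrative and should match your own information architecture.
    """
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iterfind(".//sm:loc", ns)]
    # str.startswith accepts a tuple of prefixes.
    return [u for u in urls if urlparse(u).path.startswith(include_prefixes)]
```

The output is a starting list, not the final file: the curation questions above still decide what makes the cut.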
### Step 2: Write the File
Follow the format above. Keep descriptions concise -- one sentence per link. Use sections to organize logically.
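You can write the file by hand, or render it from structured data so it stays easy to regenerate. A sketch (the function and its data shape are my own convention):

```python
def render_llms_txt(name, description, sections):
    """Render an llms.txt string from structured data.

    Sketch only: `sections` maps a section name to a list of
    (title, url, description) tuples.
    """
    lines = [f"# {name}", "", f"> {description}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        for title, url, desc in links:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)
```

Keeping the source data in one place makes the "update llms.txt when the site changes" habit much cheaper.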
### Step 3: Deploy It
Place the file at your domain root so it is accessible at https://yourdomain.com/llms.txt.
For static sites, just drop the file in your public directory. For dynamic frameworks:
**Next.js** -- place the file in `public/llms.txt`.
**Django** -- add a URL route:
```python
# urls.py
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    path('llms.txt', TemplateView.as_view(
        template_name='llms.txt',
        content_type='text/plain'
    )),
]
```
**Nginx** -- serve directly:
```nginx
location = /llms.txt {
    alias /var/www/site/llms.txt;
    default_type text/plain;
}
```
**Express.js** -- add a route:
```javascript
const path = require('path');

app.get('/llms.txt', (req, res) => {
  res.type('text/plain');
  res.sendFile(path.join(__dirname, 'public', 'llms.txt'));
});
```
### Step 4: Verify
After deployment, check that the file is accessible and properly formatted. You can use GEOScore to scan your site -- it checks for llms.txt presence as one of its 11 GEO signals, alongside robots.txt configuration, structured data, and other AI readiness factors. The scan is free and takes about 30 seconds.
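You can also verify the basics yourself with a few lines of Python. A sketch that fetches the file and checks status, content type, and the leading H1 (the function names and checks are my own; it does not validate the Markdown in depth):

```python
from urllib.request import urlopen, Request

def fetch_llms_txt(base_url, timeout=10):
    """Fetch /llms.txt from a site; return (status, content_type, body)."""
    req = Request(base_url.rstrip("/") + "/llms.txt",
                  headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return (resp.status,
                resp.headers.get("Content-Type", ""),
                resp.read().decode("utf-8", errors="replace"))

def validate_llms_txt(status, content_type, body):
    """Return a list of problems with a fetched llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.startswith(("text/plain", "text/markdown")):
        problems.append(f"unexpected Content-Type: {content_type!r}")
    if not body.lstrip().startswith("# "):
        problems.append("file does not start with an H1 title")
    return problems
```

An empty problem list means the file is reachable, served with a sensible content type, and starts with a site title.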
## Common Mistakes
**Listing every page.** The point is curation. If your llms.txt has 500 links, it defeats the purpose. Aim for 10-50 of your most important pages.

**No descriptions.** Bare links without context force the AI to visit each page to understand what it is about. The one-line descriptions are not optional -- they are the most valuable part of the file.

**Stale content.** If you link to pages that no longer exist or have moved, you are worse off than having no llms.txt at all. Add it to your deployment checklist: when you restructure your site, update llms.txt.

**Wrong content type.** Serve it as text/plain or text/markdown, not text/html. Some servers default to HTML, which can cause parsing issues.
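The first two mistakes are easy to lint for automatically. A sketch (the thresholds and messages are my own; catching stale links would additionally require fetching each URL):

```python
import re

def lint_llms_txt(text, max_links=50):
    """Flag common llms.txt mistakes: too many links, missing descriptions.

    Sketch only; covers the curation and description checks above.
    """
    # One capture group per link line; '' means the description was absent.
    links = re.findall(r"^- \[[^\]]+\]\([^)]+\)(:\s*\S.*)?$", text, flags=re.M)
    issues = []
    if len(links) > max_links:
        issues.append(f"{len(links)} links; aim for {max_links} or fewer")
    missing = sum(1 for desc in links if not desc)
    if missing:
        issues.append(f"{missing} link(s) missing a description")
    return issues
```

Wiring a check like this into CI turns the "stale content" deployment-checklist item into something closer to a guarantee.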
## Who Is Adopting It?
As of early 2026, adoption is growing steadily. Several major documentation platforms have added llms.txt support, and static site generators are beginning to include it as a build option. Cloudflare, Vercel, and Netlify have all published guides on implementing it.
The specification is still technically a proposal, not a ratified standard. But given the pace of AI search growth, waiting for formal standardization means falling behind. The file is simple, low-risk, and takes 15 minutes to create.
## Getting Started Today
- List your 10-30 most important pages.
- Write one-sentence descriptions for each.
- Organize them into logical sections.
- Save as llms.txt at your domain root.
- Scan your site at geoscoreai.com to verify it is detected.
The sites that are easiest for AI to understand will be the ones that get cited most. llms.txt is the simplest step you can take to make that happen.