If you've optimized for Google with robots.txt and sitemap.xml, you already understand the idea: give crawlers a structured entry point to your content. The llms.txt file does the same thing — but for large language models.
This guide covers everything you need to implement llms.txt from scratch, validate it, and deploy it on your stack.
## What Is llms.txt?
llms.txt is a proposed specification that provides LLMs with a machine-readable summary of your site's content. While robots.txt tells crawlers what they can access, llms.txt tells AI engines what they should read and how your site is organized.
The file lives at the root of your domain:

```
https://yourdomain.com/llms.txt
```
## File Structure and Syntax
An llms.txt file uses a simple markdown-like format:

```markdown
# Your Site Name

> A one-line description of what your site or product does.

## Docs

- [Getting Started](https://yourdomain.com/docs/getting-started): How to set up the product.
- [API Reference](https://yourdomain.com/docs/api): Complete REST API documentation.

## Blog

- [Launching v2.0](https://yourdomain.com/blog/v2-launch): Announcement of version 2.0.
- [Performance Guide](https://yourdomain.com/blog/performance): Benchmarks and optimization tips.

## Optional

- [Pricing](https://yourdomain.com/pricing): Plans and pricing details.
- [About](https://yourdomain.com/about): Company background and team.
```
### Syntax Rules

- **Title (required)**: A single `#` heading with your site/product name.
- **Description (required)**: A blockquote (`>`) with a one-sentence summary.
- **Sections (required, at least one)**: `##` headings that group your links.
- **Links (required)**: Markdown links in list format: `- [Label](URL): Description.`
- **Optional section**: A `## Optional` section for pages that are useful but not critical.
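These rules are easy to mechanize. As a sketch (the function name and input shape are my own, not part of the spec), a few lines of Python can assemble a conforming file from structured data:

```python
def build_llms_txt(title, description, sections):
    """Assemble an llms.txt string from a title, a one-line description,
    and a {section_name: [(label, url, desc), ...]} mapping."""
    parts = [f"# {title}", "", f"> {description}"]
    for section, links in sections.items():
        # Each section is a ## heading followed by its link list.
        parts += ["", f"## {section}", ""]
        parts += [f"- [{label}]({url}): {desc}" for label, url, desc in links]
    return "\n".join(parts) + "\n"

print(build_llms_txt(
    "Example Site",
    "A demo site.",
    {"Docs": [("Getting Started", "https://example.com/docs/start", "Setup guide.")]},
))
```

Feeding the result to the validator further down is a quick sanity check that your generator and your rules agree.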
## What to Include
Focus on pages with high informational density:
- Documentation and guides
- Technical blog posts
- API references
- Product descriptions
- Key landing pages with substantive content
Avoid thin pages, login screens, terms of service, or paginated list views.
## Building llms.txt for a Next.js Site

### Option A: Static File

Place the file in your `public/` directory:

```
your-nextjs-project/
  public/
    llms.txt
    llms-full.txt
```
### Option B: Dynamic Route (App Router)

```typescript
// app/llms.txt/route.ts
import { getAllPosts } from '@/lib/content';

export async function GET() {
  const posts = await getAllPosts();
  const blogLinks = posts
    .map((post) => `- [${post.title}](https://yourdomain.com/blog/${post.slug}): ${post.description}`)
    .join('\n');

  const content = `# Your Site Name\n\n> One-line description.\n\n## Blog\n\n${blogLinks}\n`;

  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'public, max-age=86400',
    },
  });
}
```
## Validation

A simple Python validator:

```python
import re, sys

def validate_llms_txt(filepath):
    errors = []
    with open(filepath, 'r', encoding='utf-8') as f:
        lines = f.read().strip().split('\n')

    if not lines or not lines[0].startswith('# '):
        errors.append("Missing title heading.")

    has_blockquote = any(l.strip().startswith('> ') for l in lines[:5])
    if not has_blockquote:
        errors.append("Missing description blockquote.")

    sections = [l for l in lines if l.startswith('## ')]
    if not sections:
        errors.append("Missing sections.")

    link_pattern = re.compile(r'^- \[.+\]\(https?://.+\)')
    links = [l for l in lines if link_pattern.match(l.strip())]
    if not links:
        errors.append("No valid links found.")

    return errors

if __name__ == '__main__':
    fp = sys.argv[1] if len(sys.argv) > 1 else 'llms.txt'
    errors = validate_llms_txt(fp)
    if errors:
        print(f"Failed with {len(errors)} error(s):")
        for e in errors:
            print(f"  - {e}")
        sys.exit(1)
    else:
        print("llms.txt is valid.")
```
## Deploying and Linking

Add a reference in `robots.txt`:

```
# llms.txt — structured content map for AI engines
# https://yourdomain.com/llms.txt
```

Add a `<link>` tag in your HTML head:

```html
<link rel="alternate" type="text/plain" href="/llms.txt" title="LLMs Content Map" />
```
## Common Mistakes
- Using relative URLs. Always use full absolute URLs.
- Including hundreds of links. Keep it focused — 20-40 high-value links is ideal.
- Forgetting the description blockquote. LLMs use this to understand your site's purpose.
- Not updating after content changes.
- Blocking AI crawlers in robots.txt while serving llms.txt.
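The first two mistakes lend themselves to automated checks. A minimal sketch (the function name is mine, and the 40-link threshold comes from the guideline above, not from any spec):

```python
import re

def lint_llms_txt(text, max_links=40):
    """Warn about relative link URLs and an oversized link list."""
    warnings = []
    # Capture the URL portion of every markdown link list item.
    links = re.findall(r'^- \[[^\]]+\]\(([^)]+)\)', text, flags=re.MULTILINE)
    for url in links:
        if not url.startswith(('http://', 'https://')):
            warnings.append(f"Relative URL: {url}")
    if len(links) > max_links:
        warnings.append(f"{len(links)} links; consider trimming to {max_links} or fewer.")
    return warnings
```

Running this alongside the validator in CI helps catch regressions when the file is regenerated after content changes.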
## Summary
llms.txt is a lightweight, zero-dependency way to help AI engines understand your site. It takes 15 minutes to implement and can meaningfully improve how LLMs reference your content.
The specification is still evolving. Track the latest at llmstxt.org.
I'm Alexandre Caramaschi, CEO of Brasil GEO. I write about Generative Engine Optimization — the practice of making your content visible to AI search. More at alexandrecaramaschi.com.