If you've optimized for Google with robots.txt and sitemap.xml, you already understand the idea: give crawlers a structured entry point to your content. The llms.txt file does the same thing — but for large language models.
This guide covers everything you need to implement llms.txt from scratch, validate it, and deploy it on your stack.
## What Is llms.txt?
llms.txt is a proposed specification that provides LLMs with a machine-readable summary of your site's content. While robots.txt tells crawlers what they can access, llms.txt tells AI engines what they should read and how your site is organized.
The file lives at the root of your domain:

```
https://yourdomain.com/llms.txt
```
## File Structure and Syntax
An llms.txt file uses a simple markdown-like format:

```markdown
# Your Site Name

> A one-line description of what your site or product does.

## Docs

- [Getting Started](https://yourdomain.com/docs/getting-started): How to set up the product.
- [API Reference](https://yourdomain.com/docs/api): Complete REST API documentation.

## Blog

- [Launching v2.0](https://yourdomain.com/blog/v2-launch): Announcement of version 2.0.
- [Performance Guide](https://yourdomain.com/blog/performance): Benchmarks and optimization tips.

## Optional

- [Pricing](https://yourdomain.com/pricing): Plans and pricing details.
- [About](https://yourdomain.com/about): Company background and team.
```
### Syntax Rules

- **Title (required)**: A single `#` heading with your site/product name.
- **Description (required)**: A blockquote (`>`) with a one-sentence summary.
- **Sections (required, at least one)**: `##` headings that group your links.
- **Links (required)**: Markdown links in list format: `- [Label](URL): Description.`
- **Optional section**: A `## Optional` section for pages that are useful but not critical.
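These rules are easy to mechanize. As a sketch (the function name and input shape are my own, not part of the spec), a few lines of Python can assemble a conforming file from structured data:

```python
def build_llms_txt(title, description, sections):
    """Assemble an llms.txt string from a title, a one-line description,
    and a {section_name: [(label, url, desc), ...]} mapping."""
    parts = [f"# {title}", "", f"> {description}"]
    for section, links in sections.items():
        # Each section is a ## heading followed by its link list.
        parts += ["", f"## {section}", ""]
        parts += [f"- [{label}]({url}): {desc}" for label, url, desc in links]
    return "\n".join(parts) + "\n"

print(build_llms_txt(
    "Example Site",
    "A demo site.",
    {"Docs": [("Getting Started", "https://example.com/docs/start", "Setup guide.")]},
))
```

Feeding the result to the validator further down is a quick sanity check that your generator and your rules agree.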
## What to Include
Focus on pages with high informational density:
- Documentation and guides
- Technical blog posts
- API references
- Product descriptions
- Key landing pages with substantive content
Avoid thin pages, login screens, terms of service, or paginated list views.
## Building llms.txt for a Next.js Site

### Option A: Static File

Place the file in your `public/` directory:

```
your-nextjs-project/
  public/
    llms.txt
    llms-full.txt
```
### Option B: Dynamic Route (App Router)

```typescript
// app/llms.txt/route.ts
import { getAllPosts } from '@/lib/content';

export async function GET() {
  const posts = await getAllPosts();
  const blogLinks = posts
    .map((post) => `- [${post.title}](https://yourdomain.com/blog/${post.slug}): ${post.description}`)
    .join('\n');

  const content = `# Your Site Name\n\n> One-line description.\n\n## Blog\n\n${blogLinks}\n`;

  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'public, max-age=86400',
    },
  });
}
```
## Validation

A simple Python validator:

```python
import re, sys

def validate_llms_txt(filepath):
    errors = []
    with open(filepath, 'r', encoding='utf-8') as f:
        lines = f.read().strip().split('\n')

    if not lines or not lines[0].startswith('# '):
        errors.append("Missing title heading.")

    has_blockquote = any(l.strip().startswith('> ') for l in lines[:5])
    if not has_blockquote:
        errors.append("Missing description blockquote.")

    sections = [l for l in lines if l.startswith('## ')]
    if not sections:
        errors.append("Missing sections.")

    link_pattern = re.compile(r'^- \[.+\]\(https?://.+\)')
    links = [l for l in lines if link_pattern.match(l.strip())]
    if not links:
        errors.append("No valid links found.")

    return errors

if __name__ == '__main__':
    fp = sys.argv[1] if len(sys.argv) > 1 else 'llms.txt'
    errors = validate_llms_txt(fp)
    if errors:
        print(f"Failed with {len(errors)} error(s):")
        for e in errors:
            print(f"  - {e}")
        sys.exit(1)
    else:
        print("llms.txt is valid.")
```
## Deploying and Linking

Add a reference in `robots.txt`:

```
# llms.txt — structured content map for AI engines
# https://yourdomain.com/llms.txt
```

Add a `<link>` tag in your HTML head:

```html
<link rel="alternate" type="text/plain" href="/llms.txt" title="LLMs Content Map" />
```
## Common Mistakes
- Using relative URLs. Always use full absolute URLs.
- Including hundreds of links. Keep it focused — 20-40 high-value links is ideal.
- Forgetting the description blockquote. LLMs use this to understand your site's purpose.
- Not updating after content changes.
- Blocking AI crawlers in robots.txt while serving llms.txt.
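The first two mistakes lend themselves to automated checks. A minimal sketch (the function name is mine, and the 40-link threshold comes from the guideline above, not from any spec):

```python
import re

def lint_llms_txt(text, max_links=40):
    """Warn about relative link URLs and an oversized link list."""
    warnings = []
    # Capture the URL portion of every markdown link list item.
    links = re.findall(r'^- \[[^\]]+\]\(([^)]+)\)', text, flags=re.MULTILINE)
    for url in links:
        if not url.startswith(('http://', 'https://')):
            warnings.append(f"Relative URL: {url}")
    if len(links) > max_links:
        warnings.append(f"{len(links)} links; consider trimming to {max_links} or fewer.")
    return warnings
```

Running this alongside the validator in CI helps catch regressions when the file is regenerated after content changes.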
## Summary
llms.txt is a lightweight, zero-dependency way to help AI engines understand your site. It takes 15 minutes to implement and can meaningfully improve how LLMs reference your content.
The specification is still evolving. Track the latest at llmstxt.org.
I'm Alexandre Caramaschi, CEO of Brasil GEO. I write about Generative Engine Optimization — the practice of making your content visible to AI search. More at alexandrecaramaschi.com.