AI assistants like ChatGPT, Claude, and Perplexity are increasingly crawling the web for context. But most websites aren't optimised for AI readability — they're built for human browsers with complex HTML, JavaScript navigation, and boilerplate-heavy layouts.
The llms.txt standard is changing this. It's a simple convention: place an llms.txt file at your site root that gives AI systems clean, structured content they can actually understand.
I built a tool that generates these files automatically for any website.
What is llms.txt?
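Think of it as robots.txt, but for LLMs. Three outputs are involved here: the first two are files from the llms.txt convention, and the third is the per-page data the generator produces alongside them: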
- llms.txt: A curated summary of your site with key links
- llms-full.txt: Complete site content in clean markdown
- Per-page data: Structured JSON with extracted content per URL
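To make the summary file concrete, here is a minimal sketch of how an llms.txt-style document could be assembled: a title heading, a one-line blockquote summary, then sections of annotated links. The function name `render_llms_txt`, the example project, and the example.com URL are all hypothetical, and this is not necessarily how the generator formats its output.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt-style markdown summary.

    sections maps a section heading to a list of
    (link_text, url, note) tuples. This layout is an illustration,
    not the generator's actual output format.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for text, url, note in links:
            entry = f"- [{text}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, used only to show the shape of the file.
example = render_llms_txt(
    "Example Project",
    "A hypothetical site used only to illustrate the layout.",
    {
        "Docs": [
            ("Quick start", "https://example.com/quickstart.md", "Install and first steps"),
        ],
    },
)
print(example)
```

The key links live under plain section headings, so both a person and a model can skim the file top to bottom.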
The Generator
The llms.txt Generator crawls any website using BFS (Breadth-First Search) and:
- Respects configurable crawl depth and URL filters
- Extracts clean content via trafilatura (not regex — actual text extraction)
- Outputs markdown or plaintext
- Handles JavaScript-rendered pages
- Produces both summary and full-content files
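The crawl strategy above can be sketched roughly as follows. This is an illustration under stated assumptions, not the generator's actual code: `bfs_crawl`, `allowed`, the `max_depth` default, and the use of glob-style patterns via `fnmatch` are all hypothetical choices. Link discovery is passed in as a `get_links` callable so the example runs without network access; a real crawler would fetch each page there and hand the HTML to an extractor such as trafilatura.

```python
from collections import deque
from fnmatch import fnmatch


def allowed(url, include_patterns, exclude_patterns):
    """Apply URL filters. Glob-style matching is an assumption here."""
    if any(fnmatch(url, p) for p in exclude_patterns):
        return False
    # An empty include list means "include everything not excluded".
    return not include_patterns or any(fnmatch(url, p) for p in include_patterns)


def bfs_crawl(start_urls, get_links, max_pages=50, max_depth=2,
              include_patterns=(), exclude_patterns=()):
    """Visit pages breadth-first, honouring page and depth limits."""
    start = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    queue = deque((url, 0) for url in start)
    seen = set(start)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would extract content here
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link in seen or not allowed(link, include_patterns, exclude_patterns):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory link graph standing in for a live site.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/blog"],
    "https://example.com/docs": ["https://example.com/docs/api", "https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
}

pages = bfs_crawl(
    ["https://example.com/"],
    get_links=lambda url: SITE.get(url, []),
    exclude_patterns=["*/blog*"],
)
print(pages)
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, which is why shallow, high-level pages tend to be captured first when the page limit is small.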
Why This Matters for SEO
Traditional SEO targets Google's crawler. But a new category is emerging: SEO for AI.
When a user asks ChatGPT "what is [your product]?", the AI draws on its training data and web results. If your site has a clean llms.txt, the AI can get structured, accurate content instead of parsing your homepage HTML.
Input Parameters
| Parameter | Default | Description |
|---|---|---|
| startUrls | required | Website URLs to crawl |
| maxPages | 50 | Maximum pages to process |
| outputFormat | markdown | Output format (markdown/plaintext) |
| includePatterns | [] | URL patterns to include |
| excludePatterns | [] | URL patterns to exclude |
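As a small sketch of how these parameters fit together, the snippet below merges caller-supplied values over the defaults from the table. The helper `build_input` is hypothetical, and the exact shape of `startUrls` (a plain list of URL strings here) is an assumption; check the tool's input schema before relying on it.

```python
import copy

# Defaults taken from the parameter table; startUrls has no default.
DEFAULTS = {
    "maxPages": 50,
    "outputFormat": "markdown",
    "includePatterns": [],
    "excludePatterns": [],
}


def build_input(**overrides):
    """Return a full input dict, filling in the documented defaults."""
    if not overrides.get("startUrls"):
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULTS) - {"startUrls"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {**copy.deepcopy(DEFAULTS), **overrides}
    if config["outputFormat"] not in ("markdown", "plaintext"):
        raise ValueError("outputFormat must be 'markdown' or 'plaintext'")
    return config


# Assumed shape: startUrls as a list of URL strings.
config = build_input(startUrls=["https://docs.pydantic.dev"], maxPages=10)
print(config)
```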
Example: Documenting a Python Library
I tested it on Pydantic's documentation (docs.pydantic.dev). The crawler:
- Started at the root docs page
- Followed internal links via BFS
- Extracted clean content from each page
- Produced a structured dataset with per-page markdown
Result: 2 pages processed, full content extracted with zero boilerplate.
Try It
Live on the Apify Store: llms.txt Generator
Pricing is $0.01 per page processed. Free tier covers ~50 pages.
The llms.txt standard is still emerging, but early adopters will have an advantage as AI-driven search grows. Is your website AI-readable?