DEV Community

Oaida Adrian

Originally published at apify.com

Make Any Website AI-Readable: Generating llms.txt Files with Python

AI assistants like ChatGPT, Claude, and Perplexity are increasingly crawling the web for context. But most websites aren't optimised for AI readability — they're built for human browsers with complex HTML, JavaScript navigation, and boilerplate-heavy layouts.

The llms.txt standard aims to change this. The convention is simple: place an llms.txt file at your site root that gives AI systems clean, structured content they can actually understand.
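For reference, the llms.txt proposal (llmstxt.org) describes a markdown file with an H1 title, a blockquote summary, and H2 sections that contain lists of links. The sketch below builds such a file from plain Python data and computes its site-root location. The function names and example site are hypothetical. This is not the generator's code.

```python
from urllib.parse import urljoin


def llms_txt_url(page_url: str) -> str:
    """Return the site-root location where llms.txt is expected."""
    # The leading "/" makes urljoin resolve against the root, not the page's directory.
    return urljoin(page_url, "/llms.txt")


def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text: H1 title, blockquote summary, H2 link sections.

    `sections` maps a section name to (link text, URL, short note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines += [f"## {name}", ""]
        for text, url, note in links:
            # Each entry is a markdown link, optionally followed by ": note".
            entry = f"- [{text}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(llms_txt_url("https://example.com/docs/intro"))
    print(build_llms_txt(
        "Example Project",
        "A hypothetical library used to illustrate the format.",
        {"Docs": [("Quickstart", "https://example.com/docs/quickstart",
                   "Install and first steps")]},
    ))
```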

I built a tool that generates these files automatically for any website.

What is llms.txt?

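Think of it as robots.txt, but for LLMs. The core proposal defines llms.txt itself, and llms-full.txt is a widely adopted companion file. Together with per-page data, these are the three outputs this tool works with:

  • llms.txt: a curated summary of your site with key links
  • llms-full.txt: the complete site content in clean markdown
  • Per-page data: structured JSON with the extracted content for each URL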
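The sketch below shows how the full-content file and the per-page data relate. It assumes each crawled page yields a record with `url`, `title`, and `content` fields. Those field names, the JSON Lines layout, and the helper names are hypothetical, and the generator's actual dataset schema may differ.

```python
import json
from pathlib import Path


def build_full_txt(pages: list[dict]) -> str:
    """Concatenate per-page markdown into one llms-full.txt body."""
    parts = []
    for page in pages:
        # Hypothetical record shape: {"url": ..., "title": ..., "content": ...}
        parts.append(
            f"# {page['title']}\n\nSource: {page['url']}\n\n{page['content'].strip()}\n"
        )
    return "\n".join(parts)


def write_outputs(pages: list[dict], out_dir: Path) -> None:
    """Write llms-full.txt plus one JSON record per page (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "llms-full.txt").write_text(build_full_txt(pages), encoding="utf-8")
    with open(out_dir / "pages.jsonl", "w", encoding="utf-8") as fh:
        for page in pages:
            # One JSON object per line keeps per-page records easy to stream.
            fh.write(json.dumps(page, ensure_ascii=False) + "\n")
```

Keeping the per-page records as the source of truth means llms-full.txt can always be regenerated from them.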

The Generator

The llms.txt Generator crawls any website using BFS (Breadth-First Search) and:

  • Respects configurable crawl depth and URL filters
  • Extracts clean content with trafilatura, a dedicated text-extraction library, rather than regex
  • Outputs markdown or plaintext
  • Handles JavaScript-rendered pages
  • Produces both summary and full-content files
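A breadth-first crawl like the one described above can be sketched with a FIFO queue of (URL, depth) pairs. This is not the generator's source. Link discovery is passed in as a hypothetical `get_links` callable so the traversal can be shown without network access. The `max_depth`, `max_pages`, and `allowed` parameters stand in for the crawl-depth limit, `maxPages`, and the URL filters.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def bfs_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 50,
    allowed: Callable[[str], bool] = lambda url: True,
) -> list[str]:
    """Return page URLs in breadth-first order, staying on the start URL's host."""
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    seen = {start}
    queue = deque([(start, 0)])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()  # FIFO order is what makes this breadth-first
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: keep the page but don't expand its links
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != host or link in seen or not allowed(link):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

Marking URLs as seen when they are queued, rather than when they are visited, prevents the same page from entering the queue twice.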

Why This Matters for SEO

Traditional SEO targets Google's crawler. But a new category is emerging: SEO for AI.

When a user asks ChatGPT "What is [your product]?", the model draws on its training data and, when browsing is enabled, on live web results. If your site publishes a clean llms.txt, an AI tool that reads it gets structured, accurate content instead of having to parse your homepage HTML.

Input Parameters

Parameter | Default | Description
startUrls | (required) | Website URLs to crawl
maxPages | 50 | Maximum number of pages to process
outputFormat | markdown | Output format: markdown or plaintext
includePatterns | [] | URL patterns to include in the crawl
excludePatterns | [] | URL patterns to exclude from the crawl
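As an illustration, an input using these parameters might look like the dictionary below. The table does not say whether `includePatterns` and `excludePatterns` are globs or regular expressions. The hypothetical `make_url_filter` helper assumes shell-style globs (via `fnmatchcase`) and lets exclusions win over inclusions. Passing `startUrls` as plain strings is also an assumption. None of this is documented behaviour of the generator.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Illustrative input using the table's parameter names; values are examples only,
# and passing startUrls as plain strings is an assumption.
run_input = {
    "startUrls": ["https://docs.pydantic.dev/"],
    "maxPages": 50,
    "outputFormat": "markdown",
    "includePatterns": ["https://docs.pydantic.dev/latest/*"],
    "excludePatterns": ["*/api/*"],
}


def make_url_filter(include: list[str], exclude: list[str]) -> Callable[[str], bool]:
    """Build a URL predicate assuming glob patterns.

    Exclusions take priority; an empty include list means "include everything".
    """
    def allowed(url: str) -> bool:
        if any(fnmatchcase(url, pat) for pat in exclude):
            return False
        return not include or any(fnmatchcase(url, pat) for pat in include)
    return allowed


allowed = make_url_filter(run_input["includePatterns"], run_input["excludePatterns"])
```

Returning a predicate keeps filtering separate from crawling, so the same check can be applied wherever new links are discovered.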

Example: Documenting a Python Library

I tested it on Pydantic's documentation (docs.pydantic.dev). The crawler:

  1. Started at the root docs page
  2. Followed internal links via BFS
  3. Extracted clean content from each page
  4. Produced a structured dataset with per-page markdown

Result: 2 pages processed, full content extracted with zero boilerplate.

Try It

Live on the Apify Store: llms.txt Generator

Pricing is $0.01 per page processed. The free tier covers roughly 50 pages.
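Because the price is a flat rate per page, estimating a budget is simple multiplication. At the default `maxPages` of 50 a run costs $0.50, and 1,000 pages cost $10.00. The sketch below uses `Decimal` to avoid floating-point rounding. It assumes the flat $0.01 rate stated above and ignores any free-tier allowance or other platform charges.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.01")  # USD per page, from the stated pricing


def estimate_cost(pages: int, price_per_page: Decimal = PRICE_PER_PAGE) -> Decimal:
    """Flat pay-per-page estimate; ignores free-tier allowances and other fees."""
    return pages * price_per_page
```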


The llms.txt standard is still emerging, but early adopters will have an advantage as AI-driven search grows. Is your website AI-readable?
