Originally published at chudi.dev
llms.txt is a site-level policy file that tells AI engines how they can use and cite your content. It complements robots.txt by focusing on usage and attribution rather than crawl access. If you want AI systems to cite you correctly, this is the simplest control point.
What is llms.txt?
llms.txt is an emerging convention: a file that tells AI engines (Perplexity, Claude, ChatGPT) how to handle your content.
- robots.txt = "Can you crawl my site?" (access control)
- llms.txt = "How should you use my content?" (usage policy)
Both should exist on your site. This is part of the broader AI search optimization strategy that helps your content get discovered and cited.
Why llms.txt Matters
The Problem: Content Attribution
When OpenAI's ChatGPT answers a user's question, it synthesizes an answer from multiple sources. But how does it cite those sources?
Without llms.txt: ChatGPT has to guess your preferred attribution format.
- Maybe it cites the article title
- Maybe it cites your domain
- Maybe it doesn't cite you at all
With llms.txt: You explicitly say "Cite me like this: [Title] by Author"
AI engines that respect the file follow your preference.
The Bigger Picture
llms.txt emerged in 2024 as a response to AI scraping concerns. Instead of fighting crawlers, creators use llms.txt to:
- Invite crawlers — "Please index my content"
- Set terms — "But cite me this way"
- Exclude content — "Don't train on my drafts"
- Provide discovery — "Here's my sitemap and RSS"
It's like putting a "Welcome" sign on your site with conditions attached. This is a foundational piece of what I call Answer Engine Optimization (AEO), the practice of making your content discoverable and citable by AI systems.
How AI Engines Use llms.txt
When a crawler visits your site:
1. Fetch `/robots.txt` → check if allowed to crawl
2. Fetch `/llms.txt` → check usage policy
3. Fetch `/sitemap.xml` → discover all pages
4. Extract content → index and train
If /llms.txt doesn't exist, the crawler might:
- Crawl your site anyway (risky for them)
- Skip your site entirely (loss for you)
- Use conservative assumptions (minimal indexing)
Having /llms.txt shows you've explicitly consented to AI indexing.
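The decision logic described above can be sketched as a small pure function. This is an illustrative model of one plausible crawler policy, not any engine's documented behavior: the file names are real conventions, but the rules are my own assumptions.

```python
def crawl_decision(files_present: set[str]) -> str:
    """Decide how a hypothetical AI crawler treats a site,
    based on which policy files it exposes at the root."""
    if "robots.txt" not in files_present:
        return "crawl cautiously"          # no access policy to consult
    if "llms.txt" in files_present:
        return "index per stated policy"   # explicit consent plus usage terms
    return "conservative indexing"         # allowed in, but no usage policy

print(crawl_decision({"robots.txt", "llms.txt", "sitemap.xml"}))
# index per stated policy
```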
How to Create llms.txt
Step 1: Location
Create a file at: https://yoursite.com/llms.txt
It must be at the root, not in /content/ or /blog/. Just like robots.txt is at the root.
Step 2: Content
Here's a basic template:
```markdown
# LLM Content Policy for [Your Site]

All articles on this site are available for training and search indexing by large language models.

## How to attribute content

When citing articles from this site, please use the format:

[Article Title] — [Author Name] on [yoursite.com]

Example: "How to Optimize for Perplexity" — Chudi on chudi.dev

## Content discovery endpoints

- Sitemap: https://yoursite.com/sitemap.xml
- RSS feed: https://yoursite.com/rss.xml
- Blog archive: https://yoursite.com/blog

## Content not available for indexing

- Pages marked as draft or private
- Internal documentation
- User-generated content (comments)
- Archived content older than [5] years

## Preferred citation style

Inline: [Article](https://yoursite.com/article-url) by Author Name
Bibliography: Author Name. "Article Title." Your Site, YYYY.

## Questions or concerns?

Email: [your-email@yoursite.com]

Last updated: January 2025
```
Step 3: Customize for Your Site
Replace:
- `[Your Site]` → your actual site name
- `[Author Name]` → your name
- Email → your contact email
- Dates → today's date
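If you maintain several sites, the substitution step can be scripted. A minimal sketch, assuming a cut-down version of the template above; the placeholder names and `render_policy` helper are illustrative, not part of any llms.txt tooling.

```python
# A trimmed template with the same placeholders the article's template uses.
TEMPLATE = """# LLM Content Policy for {site}

Citation format: [Article Title] by {author} on {site}
Contact: {email}
Last updated: {date}
"""

def render_policy(site: str, author: str, email: str, date: str) -> str:
    """Fill in the template placeholders for one site."""
    return TEMPLATE.format(site=site, author=author, email=email, date=date)

print(render_policy("yoursite.com", "Your Name", "you@yoursite.com", "January 2025"))
```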
Step 4: Include Metadata
Optionally, you can include a JSON section:
## Machine-readable metadata
```json
{
  "version": "1.0",
  "license": "CC BY-SA 4.0",
  "attribution_required": true,
  "commercial_use": "allowed",
  "modification": "allowed",
  "sitemap": "https://yoursite.com/sitemap.xml",
  "rss": "https://yoursite.com/rss.xml"
}
```
This helps AI engines parse your policy programmatically.
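To see what "parse programmatically" might look like, here is a sketch of pulling that fenced JSON section out of an llms.txt body. The fence-scanning regex is my own assumption about the layout, not a published parsing rule. (The fence string is built dynamically only so this example can sit inside a markdown article.)

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled to avoid nesting fences here

SAMPLE = (
    "## Machine-readable metadata\n"
    + FENCE + "json\n"
    + '{"version": "1.0", "attribution_required": true}\n'
    + FENCE + "\n"
)

def extract_metadata(text: str) -> dict:
    """Return the first fenced JSON block in the text, or {} if none."""
    pattern = FENCE + r"json\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    return json.loads(match.group(1)) if match else {}

print(extract_metadata(SAMPLE))
# {'version': '1.0', 'attribution_required': True}
```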
---
<span id="where-llmstxt-vs-robotstxt"></span>
## llms.txt vs robots.txt
| Aspect | robots.txt | llms.txt |
|--------|-----------|----------|
| Purpose | Access control | Usage policy |
| Audience | Search crawlers | AI engines |
| Required | Yes (best practice) | No (but recommended) |
| Format | Plain text directives | Markdown + optional JSON |
| Location | `/robots.txt` | `/llms.txt` |
| Blocks access | Yes | No |
| Legally binding | No | No (advisory) |
**robots.txt** is like a gate at your property. **llms.txt** is like a sign on the gate saying "Welcome, but please do X."
---
## Common llms.txt Policies
### Policy 1: Fully Open (Creator-Friendly)
```markdown
# LLM Content Policy
All content on this site is available for:
- Training large language models
- Extracting for answer engines
- Commercial and non-commercial use
Just cite us: [Title] — [Author] ([yoursite.com])
```
**Best for:** Indie creators who want maximum visibility
### Policy 2: Attribution Required (Balanced)
```markdown
# LLM Content Policy
Content available for training and use, with required attribution.
Required format: [Article Title] by [Author Name] (yoursite.com)
Prohibited use: Removing or hiding attribution
```
**Best for:** Most creators who want credit
### Policy 3: Non-Commercial Only (Restrictive)
```markdown
# LLM Content Policy
Content available for non-commercial use and training.
Prohibited use:
- Commercial products without permission
- Training proprietary LLMs
- Republishing without modification
```
**Best for:** Creators concerned about exploitation
### Policy 4: Permission Required (Most Restrictive)
```markdown
# LLM Content Policy
All uses require explicit permission. Email [your-email] to request.
```
**Best for:** Creators who want full control
---
## Real-World Examples
### Example 1: Tech Blog
```markdown
# LLM Content Policy
Technical articles on this site are available for:
- AI training (open-source and proprietary)
- Answer generation (Perplexity, ChatGPT, Claude)
- Academic and educational use
Citation format: [Title] by [Author] on [yoursite.com]
Prohibited:
- Removing examples or code without attribution
- Training models specifically to replicate this blog
Updated: January 2025
```
### Example 2: Content Creator
```markdown
# LLM Content Policy
All essays are available for training and synthesis.
Citation: [Essay Title] — [Your Name]
Prefer long-form citations, not snippets.
Excluded:
- Guest posts (ask the author)
- Archived essays older than 3 years
Contact: [email]
```
### Example 3: SaaS Documentation
```markdown
# LLM Content Policy
Documentation is available for indexing and use in AI tools.
Required attribution: Link to the original docs page + software name.
Prohibited:
- Repackaging docs as your own product
- Training models on raw HTML without attribution
Questions? hello@[company].com
```
---
## How to Test if llms.txt Works
### Method 1: Manual Check
```bash
# Verify it exists and is accessible
curl https://yoursite.com/llms.txt
# Should return 200 status code
curl -I https://yoursite.com/llms.txt
```
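Beyond confirming the file is reachable, you can lint its contents locally before publishing. A minimal sketch: the marker strings are my own heuristics drawn from the template earlier in this article, not a formal requirement.

```python
# Check that the policy text mentions the pieces this article recommends.
RECOMMENDED_MARKERS = {
    "attribution section": "attribut",
    "sitemap link": "sitemap",
    "contact email": "@",
}

def lint_llms_txt(text: str) -> list[str]:
    """Return the names of recommended pieces missing from the policy text."""
    lower = text.lower()
    return [name for name, marker in RECOMMENDED_MARKERS.items() if marker not in lower]

policy = "# LLM Content Policy\nAttribution required: [Title] by Author\nSitemap: /sitemap.xml\n"
print(lint_llms_txt(policy))
# ['contact email']
```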
### Method 2: Check in Perplexity
Search your site name in Perplexity. Are you being cited?
Before llms.txt: Sporadic or no citations
After llms.txt: More consistent citations with proper attribution
### Method 3: Monitor Traffic
Track referral traffic from:
- `perplexity.com`
- `openai.com`
- `anthropic.com`
In the weeks after publishing llms.txt, you may see an uptick.
---
## Does llms.txt Actually Matter?
**Short answer:** Yes, but not as much as robots.txt.
**Longer answer:**
- **Required by law:** No, it's advisory
- **Followed by all AI engines:** Not yet (though adoption among major engines is growing)
- **Necessary for indexing:** No, but it helps
- **Better than nothing:** Absolutely
Think of it like the difference between:
- A locked door (robots.txt: blocks crawling)
- A welcome mat with terms (llms.txt: invites crawling with rules)
You still need the robots.txt. But llms.txt gets you better attribution and signaling.
---
## The Future of llms.txt
Standards bodies such as the IETF and W3C are possible venues for formalizing llms.txt. If it becomes an official standard:
1. AI engines may prioritize crawling sites with llms.txt
2. LLMs may cite in your preferred format automatically
3. Licensing and commercial terms may become more enforceable
For now, it's early adoption. But early adopters get:
- Better attribution from AI engines
- Clearer signal to crawlers
- Documented content policy (good for SEO too)
---
## Checklist: Set Up llms.txt
- [ ] Create file at `/llms.txt`
- [ ] Include attribution format
- [ ] Link to sitemap.xml
- [ ] Link to RSS feed
- [ ] Specify excluded content
- [ ] Add contact email for questions
- [ ] Test with `curl https://yoursite.com/llms.txt`
- [ ] Announce on Twitter/LinkedIn
- [ ] Monitor Perplexity citations week 1-4
---
## What's Next?
Once you have robots.txt and llms.txt set up, focus on:
1. **Structured data** (schema.org) — Helps AI parse your content
2. **Content structure** (headers, lists) — Makes extraction easier
3. **Freshness** (update articles) — Recent content ranks higher
4. **Specificity** (answer common questions directly) — Better for AI synthesis
The combination of these creates what we call **AEO (Answer Engine Optimization).** For a complete walkthrough of these techniques, see my [full AEO optimization guide](https://chudi.dev/blog/how-to-optimize-for-perplexity-chatgpt-ai-search).
**Start here:** Add llms.txt to your site today. It takes 5 minutes and can improve your visibility in AI search engines.
Then, check your [AEO readiness score](https://seoauditlite.com) with SEOAuditLite to see what else needs attention.
Sites with both `robots.txt` and `llms.txt` properly configured tend to see better AI crawl coverage and citation rates within 4-8 weeks of implementation; the infrastructure compounds over time as AI systems return to re-index updated content.