DEV Community

Emma Watson

How I Automated LLM.txt Generation to Control AI Crawler Access with Python

I recently needed to control how AI crawlers access my documentation site, so I built a small tool to generate LLM.txt files. Here's how you can do it with the SERPSpur API:

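```python
import requests

API_KEY = "your_api_key_here"


def generate_llm_txt(rules):
    """Send the rules to the SERPSpur API and return the generated file as text."""
    response = requests.post(
        "https://api.serpspur.com/v1/llm-txt/generate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"rules": rules},
        timeout=30,  # don't hang forever if the API is unreachable
    )
    # Fail loudly on 4xx/5xx instead of returning an error body as if it were the file
    response.raise_for_status()
    return response.text


# Example rules to allow only certain AI agents
rules = {
    "user_agent": ["GPTBot", "Google-Extended"],
    "disallow": ["/private/", "/api/"],
    "allow": ["/public/", "/docs/"],
}

llm_content = generate_llm_txt(rules)
print(llm_content)
```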
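If you regenerate the file on every deploy, it may be worth writing the result to disk and skipping the API call when the rules haven't changed. Here's a rough sketch of one way to do that. None of this comes from the SERPSpur API: the `write_llm_txt` and `rules_fingerprint` helpers, the `llm.txt` output path, and the `.rules.sha256` sidecar file are names I made up for illustration, and the `generate` argument is any function that turns rules into file content (for example `generate_llm_txt` above).

```python
import hashlib
import json
from pathlib import Path


def rules_fingerprint(rules):
    """Return a stable SHA-256 hash of the rules dict (key order does not matter)."""
    canonical = json.dumps(rules, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def write_llm_txt(rules, generate, out_path="llm.txt"):
    """Regenerate the file only when the rules changed.

    `generate` is any callable that takes the rules and returns the file
    content as a string. Returns True if `generate` was called, False if
    the existing file was kept.
    """
    out = Path(out_path)
    # Hypothetical sidecar file that remembers which rules produced the output
    stamp = out.with_name(out.name + ".rules.sha256")
    fingerprint = rules_fingerprint(rules)

    if out.exists() and stamp.exists() and stamp.read_text() == fingerprint:
        return False  # same rules as last time, keep the existing file

    out.write_text(generate(rules), encoding="utf-8")
    stamp.write_text(fingerprint)
    return True
```

Calling `write_llm_txt(rules, generate_llm_txt)` twice in a row with the same rules should hit the API only once. Passing the generator in as an argument also makes the helper easy to test without a real API key.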

This lets you declare, per user agent, which paths AI crawlers may read and which are off limits. Like robots.txt, the file is advisory, so it only affects crawlers that choose to honor it. Have you implemented any LLM.txt configurations for your projects?

Top comments (3)

Sophia

Interesting approach! I've been manually crafting robots.txt variations for different crawlers, but this seems much more maintainable at scale. Do you handle rate-limiting or caching for repeated generation calls?

Dylan Parker

That's a smart way to handle AI crawlers. I've set up LLM.txt for my blog, but had trouble configuring it for multiple user agents with overlapping rules. Does your tool handle priority conflicts well?

FrishayLTD

Interesting use case. I've been relying on robots.txt for crawler control, but LLM.txt seems like a good supplement for AI-specific instructions. Are there any gotchas when both files are present?