How I Automated LLM.txt Generation to Control AI Crawler Access with Python

I recently needed to control how AI crawlers access my documentation site, so I built a small tool to generate LLM.txt files. Here's how you can do it with the SERPSpur API:

```python
import requests

API_KEY = "your_api_key_here"

def generate_llm_txt(rules):
    """Send the rules to the SERPSpur API and return the generated file body."""
    response = requests.post(
        "https://api.serpspur.com/v1/llm-txt/generate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"rules": rules},
        timeout=30,  # avoid hanging forever if the API is unreachable
    )
    response.raise_for_status()  # raise on HTTP errors instead of returning an error body
    return response.text

# Example rules to allow only certain AI agents
rules = {
    "user_agent": ["GPTBot", "Google-Extended"],
    "disallow": ["/private/", "/api/"],
    "allow": ["/public/", "/docs/"],
}

llm_content = generate_llm_txt(rules)
print(llm_content)
```
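
The API's response format isn't shown here. As a rough local sketch, assuming the generated file resembles robots.txt-style directive groups (my assumption, not confirmed SERPSpur behavior), the same rules dict could be rendered without a network call. `render_rules` is a hypothetical helper, not part of any library:

```python
def render_rules(rules):
    """Render a rules dict as robots.txt-style directive groups.

    The output layout is an assumption for illustration; the real API
    may produce something different.
    """
    lines = []
    # Emit one group per user agent; fall back to "*" if none are given.
    for agent in rules.get("user_agent", ["*"]):
        lines.append(f"User-agent: {agent}")
        lines.extend(f"Allow: {path}" for path in rules.get("allow", []))
        lines.extend(f"Disallow: {path}" for path in rules.get("disallow", []))
        lines.append("")  # blank line separates groups
    return "\n".join(lines).rstrip() + "\n"


example_rules = {
    "user_agent": ["GPTBot", "Google-Extended"],
    "disallow": ["/private/", "/api/"],
    "allow": ["/public/", "/docs/"],
}

print(render_rules(example_rules))
```

Rendering locally like this can be handy for previewing or diffing rule changes before calling the API.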

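This gives you fine-grained control over which AI crawlers you allow and which paths you open or close to them. As with robots.txt, the file is advisory, so it only affects crawlers that choose to honor it. Have you implemented any LLM.txt configurations for your projects?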

Top comments (3)

Sophia • Jun 12

Interesting approach! I've been manually crafting robots.txt variations for different crawlers, but this seems much more maintainable at scale. Do you handle rate-limiting or caching for repeated generation calls?

Dylan Parker • Jun 13

That's a smart way to handle AI crawlers. I've set up LLM.txt for my blog, but had trouble configuring it for multiple user agents with overlapping rules. Does your tool handle priority conflicts well?

FrishayLTD • Jun 12

Interesting use case. I've been relying on robots.txt for crawler control, but LLM.txt seems like a good supplement for AI-specific instructions. Are there any gotchas when both files are present?