DEV Community

Eleanor Brooks

How to Generate and Manage LLM.txt Files for AI Crawlers Using Python

AI crawlers are becoming more common, so controlling how they access your site matters. I've been experimenting with LLM.txt files (the proposal is usually spelled llms.txt, as in the tool's URL) to set access rules for AI bots. Here's a Python script that uses the SERPSpur LLM.txt Generator to create and manage these files:

python
import requests

API_KEY = "your_api_key_here"


def generate_llm_txt(domain, rules):
    """Request an LLM.txt file for `domain` and save it as llm.txt."""
    url = "https://serpspur.com/tool/llms-txt-generator-tool/"
    payload = {
        "domain": domain,
        "rules": rules,  # e.g., {"allow": ["/blog/"], "disallow": ["/admin/"]}
        "api_key": API_KEY,
    }
    # A timeout keeps the script from hanging if the service is unresponsive.
    response = requests.post(url, json=payload, timeout=30)
    if response.status_code == 200:
        with open("llm.txt", "w", encoding="utf-8") as f:
            f.write(response.text)
        print("LLM.txt generated!")
    else:
        print(f"Error: {response.status_code}")


# Example
rules = {"allow": ["/public/"], "disallow": ["/private/", "/api/"]}
generate_llm_txt("mysite.com", rules)

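This lets me define which sections AI crawlers may access. It's a neat way to steer them away from sensitive content while keeping public pages open, though the file is advisory: it only affects crawlers that choose to honor it, so anything truly private still needs real access control. Have you set up LLM.txt for your site yet?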
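If you're wondering how allow/disallow rules like these might be evaluated for a given URL path, here's a minimal sketch. It assumes robots.txt-style prefix matching where the longest matching prefix wins and unmatched paths default to allowed; I don't know that the generator or any crawler interprets the rules this way, and `is_path_allowed` is a hypothetical helper of my own, not part of any library.

```python
def is_path_allowed(path, rules):
    """Return True if `path` is allowed under `rules`.

    Assumption: longest matching prefix wins, and a tie between an
    allow and a disallow prefix of equal length resolves to allow.
    Paths that match no rule are treated as allowed.
    """
    best_len = -1
    allowed = True  # default when nothing matches (assumption)
    # Allow rules are checked first so they win ties on prefix length.
    for verdict, key in ((True, "allow"), (False, "disallow")):
        for prefix in rules.get(key, []):
            if path.startswith(prefix) and len(prefix) > best_len:
                best_len = len(prefix)
                allowed = verdict
    return allowed


rules = {"allow": ["/public/"], "disallow": ["/private/", "/api/"]}
for path in ("/public/index.html", "/private/notes", "/api/v1", "/about"):
    print(path, "allowed" if is_path_allowed(path, rules) else "blocked")
```

Running a few sample paths through a check like this before generating the file is a cheap way to catch a rule set that blocks more (or less) than intended.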

Top comments (1)

Dylan Parker

Nice approach! I've been manually editing robots.txt for AI bots, but LLM.txt sounds like a more structured way to define access rules. Do you find that most major crawlers respect this file, or is it still early days for adoption?