Why I keep sitemap.xml, robots.txt, and llms.txt separate

#seo #ai #webdev #opensource

I have been cleaning up the public structure for a small B2B matching project, and one thing became clearer than I expected: sitemap.xml, robots.txt, and llms.txt should not try to do the same job.

I used to think about this mostly as an indexing problem. Can search engines find the important pages? Are the routes listed? Is anything blocked by mistake?

That still matters, but it is only one layer.

The way I now split it:

sitemap.xml is the page list. It should stay boring and stable.
robots.txt is the crawl boundary. It tells crawlers which paths they may fetch and which to leave alone, and it only binds crawlers that choose to honor it.
llms.txt is closer to a project note. It explains what the site is, which pages matter, and how not to misread the product.
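
To make the split concrete, here is a minimal Python sketch, not the project's actual tooling. The base URL, route lists, and function names are all hypothetical, and the llms.txt layout (a title, a short quoted summary, a list of key links) only loosely follows the public llms.txt proposal. Each builder does one job, and a small check uses the standard library's robots.txt parser to confirm that nothing listed in the sitemap is blocked by mistake.

```python
from urllib.robotparser import RobotFileParser
from xml.etree import ElementTree as ET

# Hypothetical values for illustration only.
BASE_URL = "https://example.com"
PUBLIC_PATHS = ["/", "/buyers", "/suppliers"]
PRIVATE_PREFIXES = ["/admin/", "/api/"]
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base, paths):
    """Page list: one <url><loc> entry per public path, nothing else."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base + path
    return ET.tostring(urlset, encoding="unicode")


def build_robots(base, disallowed_prefixes):
    """Crawl boundary: disallowed prefixes plus a pointer to the sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    lines += ["", f"Sitemap: {base}/sitemap.xml"]
    return "\n".join(lines) + "\n"


def build_llms_txt(title, summary, key_pages, base):
    """Project note: what the site is and which pages matter."""
    lines = [f"# {title}", "", f"> {summary}", "", "## Key pages", ""]
    lines += [f"- [{label}]({base}{path})" for label, path in key_pages.items()]
    return "\n".join(lines) + "\n"


def blocked_sitemap_urls(sitemap_xml, robots_txt, agent="*"):
    """Return sitemap URLs that the robots.txt rules would disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [el.text for el in root.findall("sm:url/sm:loc", {"sm": SITEMAP_NS})]
    return [url for url in locs if not parser.can_fetch(agent, url)]
```

Calling `blocked_sitemap_urls(build_sitemap(BASE_URL, PUBLIC_PATHS), build_robots(BASE_URL, PRIVATE_PREFIXES))` should return an empty list. If a private route such as `/admin/reports` ever slips into the sitemap, it shows up in the result, which is the kind of overlap the three-file split is meant to keep visible.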

For a B2B supplier matching product, that last part matters. The useful context is not just "we have buyer pages and supplier pages." It is what buyer intent means, what a supplier capability record means, and where the matching logic should be treated with caution.

I put the current plain-text version here:
https://maplebridge.io/llms.txt

I do not expect this file to magically create traffic. The practical value for me is simpler: it forces the product structure to be written down in a way that a person, a crawler, or a tool can inspect without guessing.
