Large Language Models have made their way into mainstream technology: beyond code generation and documentation, they now power many kinds of human-computer interaction. How do we give these LLMs real context so that they hallucinate less, and so that these models, whether GPT-4o, Claude 3.7 Sonnet, or any other, become more reliable, trustworthy vessels of information? Meet the llms.txt file format.
What is the LLMs.txt file?
The llms.txt file is a newly proposed standard intended to provide large language models with relevant context and metadata in the form of a simple text file, formatted as plain-text Markdown.
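To make that concrete, here is a minimal illustrative example of the structure the proposal describes: an H1 title, a blockquote summary, and H2 sections containing lists of links. The project name, section names, and URLs below are made up for illustration:

```markdown
# Example Project

> One-line summary of what the project is and who it is for.

## Docs

- [Getting started](https://example.com/docs/start.md): installation and setup
- [API reference](https://example.com/docs/api.md): full endpoint list

## Optional

- [Changelog](https://example.com/changelog.md): release history
```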
What are the use-cases for LLMs.txt file?
Originally, the llms.txt file was intended to let AI-based agents consume data from websites more easily, so that these autonomous agents do not need to deal with HTML parsing, JavaScript rendering, or other web-scraping struggles. Instead, websites can provide a simple llms.txt file that contains the relevant context for each page, and LLMs can quickly digest it without spending further compute on parsing.
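Because the file lives at a well-known path, an agent can skip crawling entirely and fetch it directly. A minimal sketch in Python, assuming the convention that the file sits at the root of the (sub)domain; the function names here are illustrative, not part of any standard library for llms.txt:

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(site: str) -> str:
    """Build the conventional llms.txt URL for a site.

    Using an absolute path ("/llms.txt") means any page URL on the
    (sub)domain resolves to the same well-known location,
    e.g. https://docs.anthropic.com/llms.txt.
    """
    return urljoin(site, "/llms.txt")


def fetch_llms_txt(site: str, timeout: float = 10.0) -> str:
    """Fetch the llms.txt contents as plain text (raises on HTTP errors)."""
    with urlopen(llms_txt_url(site), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The returned Markdown can then be dropped straight into a prompt, no HTML parsing required.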
LLMs.txt for Websites
You can generate and place llms.txt files in the root directory of your website, but they can also live on a documentation subdomain, helping GenAI code assistants embed and build context for code snippets and suggested code examples. Some examples of websites that have adopted it include:
- The Turbo build tool: https://turbo.build/llms.txt
- Anthropic’s documentation: https://docs.anthropic.com/llms.txt
- Dotenvx, a popular Node.js environment variable management tool: https://dotenvx.com/llms.txt
- CrewAI agentic framework docs: https://docs.crewai.com/llms.txt
We’re already seeing emerging llms.txt-related tooling, such as the llmstxt Python project, which compresses a codebase’s files into a single, LLM-friendly text file designed to get codebases ready for analysis by Large Language Models.
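The core idea behind such tooling can be sketched in a few lines. This is not the llmstxt project’s actual API, just a hedged illustration of the technique: walk a directory, concatenate matching files, and prefix each with its path so an LLM can tell them apart in one prompt. The function name and parameters are assumptions made for this example:

```python
from pathlib import Path


def bundle_codebase(root: str, extensions=(".py", ".md"), max_bytes=200_000) -> str:
    """Concatenate source files under `root` into one LLM-friendly text blob.

    Each file is preceded by a header line with its relative path, and a
    crude byte budget keeps the result within a model's context window.
    """
    chunks = []
    total = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            total += len(text)
            if total > max_bytes:  # stop before blowing the context budget
                break
            chunks.append(f"--- {path.relative_to(root)} ---\n{text}")
    return "\n\n".join(chunks)
```

Real tools layer smarter selection on top (ignoring build artifacts, ranking files by relevance), but the output shape, one annotated text file, is the same.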
What’s next for LLMs.txt?
Given that contextual information is at the core of LLM integrations and agentic frameworks, are we going to see llms.txt in different shapes and forms, spreading beyond just websites?
I personally think so. Some ideas that come to mind are to put llms.txt files in the following hubs as a starting point:
- GitHub repositories
- DockerHub images
- The npm registry
LLMs.txt Directory
Alongside the newly proposed llms.txt standard, new directories have been emerging that index llms.txt files and let you search and discover websites that have embraced the format. Some of these are:
- LLMs.txt Hub: https://llmstxthub.com/
- LLMStxt Site: https://llmstxt.site/
Next up
LLMs are more ubiquitous than ever, but if you don’t want to risk privacy or spend, learn how to run a local LLM for inference with an offline-first approach.