Large Language Models have made their way into mainstream technology: beyond code generation and documentation, they now power many kinds of human-computer interaction. How do we give these LLMs real context so that they hallucinate less, and so that these models, whether GPT-4o, Claude 3.7 Sonnet, or any other, become more reliable, trustworthy vessels of information? Meet the llms.txt file format.
What is the LLMs.txt file?
The llms.txt file is a newly proposed standard intended to provide large language models with relevant context and metadata in the form of a simple text file, formatted as plain-text Markdown.
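To make that concrete, here is a minimal illustrative example of the structure the proposal describes: an H1 title, a blockquote summary, and H2 sections containing lists of links. The project name, section names, and URLs below are made up for illustration:

```markdown
# Example Project

> One-line summary of what the project is and who it is for.

## Docs

- [Getting started](https://example.com/docs/start.md): installation and setup
- [API reference](https://example.com/docs/api.md): full endpoint list

## Optional

- [Changelog](https://example.com/changelog.md): release history
```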
What are the use-cases for LLMs.txt file?
Originally, the llms.txt file was intended to let AI-based agents consume data from websites more easily, so that these autonomous agents do not need to deal with HTML parsing, JavaScript rendering, or other web-scraping struggles. Instead, websites can provide a simple llms.txt file that contains the relevant context for each page, and LLMs can quickly digest it without spending further compute on parsing.
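Because the file lives at a well-known path, an agent can skip crawling entirely and fetch it directly. A minimal sketch in Python, assuming the convention that the file sits at the root of the (sub)domain; the function names here are illustrative, not part of any standard library for llms.txt:

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(site: str) -> str:
    """Build the conventional llms.txt URL for a site.

    Using an absolute path ("/llms.txt") means any page URL on the
    (sub)domain resolves to the same well-known location,
    e.g. https://docs.anthropic.com/llms.txt.
    """
    return urljoin(site, "/llms.txt")


def fetch_llms_txt(site: str, timeout: float = 10.0) -> str:
    """Fetch the llms.txt contents as plain text (raises on HTTP errors)."""
    with urlopen(llms_txt_url(site), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The returned Markdown can then be dropped straight into a prompt, no HTML parsing required.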
LLMs.txt for Websites
You can generate and place llms.txt files in the root directory of your website, but they can also live on a documentation subdomain, helping GenAI code assistants embed and build context for code snippets and suggested code examples. Some examples of websites that have adopted it include:
- The Turbo build tool: https://turbo.build/llms.txt
- Anthropic’s documentation: https://docs.anthropic.com/llms.txt
- Dotenvx, a popular Node.js environment variable management tool: https://dotenvx.com/llms.txt
- CrewAI agentic framework docs: https://docs.crewai.com/llms.txt
We’re already seeing emerging llms.txt-related tooling, such as the llmstxt Python project, which compresses a codebase’s files into a single, LLM-friendly text file designed to get codebases ready for analysis by Large Language Models.
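The core idea behind such tooling can be sketched in a few lines. This is not the llmstxt project’s actual API, just a hedged illustration of the technique: walk a directory, concatenate matching files, and prefix each with its path so an LLM can tell them apart in one prompt. The function name and parameters are assumptions made for this example:

```python
from pathlib import Path


def bundle_codebase(root: str, extensions=(".py", ".md"), max_bytes=200_000) -> str:
    """Concatenate source files under `root` into one LLM-friendly text blob.

    Each file is preceded by a header line with its relative path, and a
    crude byte budget keeps the result within a model's context window.
    """
    chunks = []
    total = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            total += len(text)
            if total > max_bytes:  # stop before blowing the context budget
                break
            chunks.append(f"--- {path.relative_to(root)} ---\n{text}")
    return "\n\n".join(chunks)
```

Real tools layer smarter selection on top (ignoring build artifacts, ranking files by relevance), but the output shape, one annotated text file, is the same.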
What’s next for LLMs.txt?
Given that contextual information is at the core of LLM integrations and agentic frameworks, are we going to see llms.txt in different shapes and forms, spreading beyond just websites?
I personally think so. Some ideas that come to mind are to put llms.txt files in the following hubs as a starting point:
- GitHub repositories
- DockerHub images
- The npm registry
LLMs.txt Directory
Alongside the newly proposed llms.txt standard, new directories have been emerging that index llms.txt files and let you search and discover websites that have embraced the format. Some of these are:
- LLMs.txt Hub: https://llmstxthub.com/
- LLMStxt Site: https://llmstxt.site/
Next up
LLMs are more ubiquitous than ever, but if you don’t want to risk privacy or spend, learn how to run a local LLM for inference with an offline-first approach.