hey atlas

Posted on • Originally published at github.com

I built a GitHub Action that fails CI when your llms.txt is broken

If you have added an llms.txt to your site, here is the uncomfortable part: nothing tells you when it breaks. A missing title, a malformed link, a relative URL that an AI fetcher cannot resolve, and your carefully curated file just gets skipped. Silently. So I built a tiny GitHub Action that lints llms.txt on every push and fails the build when it is wrong.

Quick refresher: what is llms.txt?

llms.txt is a small markdown file at the root of your site that hands large language models a curated map of your best pages. It is the AI-search cousin of robots.txt and sitemap.xml: instead of letting a crawler guess, you tell ChatGPT, Perplexity, Claude and Google AI Overviews exactly what to read and cite.

The format is deliberately simple:

# Your Site

> One-line summary a model reads first.

## Section name
- [Page title](https://example.com/page): short note on what it is.

## Optional
- [Lower-priority page](https://example.com/extra): models may skip this to save context.

The catch is that "simple" is not the same as "hard to get wrong". The H1 is the only strictly required element, links must be real markdown link bullets, and an Optional section has special meaning. Those are exactly the things you forget at 1am.
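To make "real markdown link bullets" concrete, here is a minimal sketch of how one such line could be parsed. The pattern and the `parse_link_bullet` name are my own illustration, not code taken from the Action, and the sketch assumes the `: notes` part is optional.

```python
import re

# Hypothetical pattern for "- [name](url): notes"; the real validator may differ.
LINK_BULLET = re.compile(
    r"^\s*[-*]\s+"             # bullet marker
    r"\[(?P<name>[^\]]*)\]"    # [name]
    r"\((?P<url>[^)]*)\)"      # (url)
    r"(?::\s*(?P<notes>.*))?$" # optional ": notes"
)


def parse_link_bullet(line):
    """Return the parts of a link bullet, or None if the line is not one."""
    m = LINK_BULLET.match(line.rstrip())
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "url": m.group("url").strip(),
        "notes": (m.group("notes") or "").strip(),
    }


print(parse_link_bullet("- [Page title](https://example.com/page): short note."))
print(parse_link_bullet("- Page title https://example.com/page"))
```

The second call returns `None`, which is the case a linter would flag: a bullet that looks like a link to a human but is not a markdown link.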

Why a CI check

I treat llms.txt like any other build artifact. If a broken sitemap fails CI, a broken AI-readability file should too. The rules I wanted enforced:

  • Errors (break the build): the file must exist and be non-empty, there must be exactly one H1 title and it must come first, every link bullet must follow the form - [name](url): notes, and no URL may be empty.
  • Warnings (optional break): a blockquote summary sits right under the title, links use absolute https:// URLs, every link has a : description, sections use H2, no empty sections, no duplicate URLs, and an Optional section exists.
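The error-level rules above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the Action's actual code: `find_errors` and the three patterns are hypothetical names, every bullet is assumed to be a link bullet, and the "file exists" rule is left to whatever reads the file.

```python
import re

# Illustrative patterns only; the real validator may use different ones.
H1 = re.compile(r"^#\s+\S")
BULLET = re.compile(r"^\s*[-*]\s+")
LINK = re.compile(r"^\s*[-*]\s+\[[^\]]*\]\((?P<url>[^)]*)\)(?::\s*.*)?$")


def find_errors(text):
    """Return (line_number, message) pairs for the error-level rules."""
    if not text.strip():
        return [(0, "file is empty")]

    errors = []
    lines = text.splitlines()
    h1_lines = [n for n, line in enumerate(lines, 1) if H1.match(line)]
    first_content = next(n for n, line in enumerate(lines, 1) if line.strip())

    # Exactly one H1, and it must be the first non-blank line.
    if len(h1_lines) != 1:
        errors.append((first_content, f"expected exactly one H1, found {len(h1_lines)}"))
    elif h1_lines[0] != first_content:
        errors.append((h1_lines[0], "H1 must be the first non-blank line"))

    # Assumption: every bullet is meant to be a link bullet.
    for n, line in enumerate(lines, 1):
        if not BULLET.match(line):
            continue
        m = LINK.match(line.rstrip())
        if m is None:
            errors.append((n, "malformed link bullet"))
        elif not m.group("url").strip():
            errors.append((n, "empty URL"))
    return errors


sample = "## Docs\n# Title\n- [x]()\n- broken\n"
for line_number, message in find_errors(sample):
    print(f"line {line_number}: {message}")
```

Returning line numbers with each message is what makes per-line reporting possible later on; the warning-level rules would follow the same shape with a second list.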

The Action

Zero dependencies, pure Python standard library, so it runs in about a second on a stock runner with no setup step:

name: Validate llms.txt
on: [push, pull_request]

jobs:
  llms-txt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: atlashey-collab/llms-txt-action@v1
        with:
          target: public/llms.txt   # or a live URL like https://example.com

You can point target at a file path or a deployed URL (a bare site URL gets /llms.txt appended). Flip fail-on-warning: true for strict mode. Every run drops a table of H1 / sections / links / errors / warnings into the job summary, with per-line messages.
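The file-or-URL behaviour of target can be pictured roughly like this. It is a sketch, not the Action's code: `resolve_target` is a hypothetical helper, and it assumes "bare site URL" means a URL whose path is empty or just `/`.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_target(target):
    """Map a target to a local path or a full llms.txt URL."""
    if not target.startswith(("http://", "https://")):
        return target  # treated as a local file path

    parts = urlsplit(target)
    # Assumption: a "bare" URL has no path, or only "/".
    if parts.path in ("", "/"):
        parts = parts._replace(path="/llms.txt")
    return urlunsplit(parts)


print(resolve_target("public/llms.txt"))
print(resolve_target("https://example.com"))
print(resolve_target("https://example.com/docs/llms.txt"))
```

A URL that already points at a specific file is left alone, so a non-root llms.txt can still be checked.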

It is MIT licensed and the full validator is one readable file: github.com/atlashey-collab/llms-txt-action.

Run it locally too

curl -sO https://raw.githubusercontent.com/atlashey-collab/llms-txt-action/v1/validate_llms_txt.py
python3 validate_llms_txt.py llms.txt
python3 validate_llms_txt.py https://example.com --fail-on-warning

Exit codes are CI-friendly: 0 valid, 1 validation failed, 2 usage error.
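As a rough picture of how those codes relate to errors, warnings, and strict mode, here is a small sketch. The `exit_code` function and the constant names are hypothetical, and it assumes that warnings under --fail-on-warning map to the same code 1 as errors.

```python
EXIT_OK = 0       # file is valid
EXIT_INVALID = 1  # validation failed
EXIT_USAGE = 2    # bad arguments; would be returned by argument parsing, not here


def exit_code(errors, warnings, fail_on_warning=False):
    """Pick a process exit code from the validation result."""
    if errors:
        return EXIT_INVALID
    # Assumption: strict mode promotes warnings to a failing exit code.
    if warnings and fail_on_warning:
        return EXIT_INVALID
    return EXIT_OK


print(exit_code(errors=[], warnings=["no Optional section"]))
print(exit_code(errors=[], warnings=["no Optional section"], fail_on_warning=True))
```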

Honest caveat

llms.txt is a young convention. Adoption by the big AI engines is still uneven, and a valid file does not guarantee citations. But the cost of keeping it correct is now close to zero, and a broken file is likely to be skipped outright. That trade is easy.

If you do not have one yet, write a spec-compliant file first, then wire up the check. Either way, stop shipping a broken llms.txt and not knowing.
