DEV Community

Moshe Simantov

Posted on • Originally published at neuledge.com

llms.txt Is Just a Table of Contents. Most AI Tools Stop There.

If you've spent any time in the AI tooling space recently, you've probably seen llms.txt popping up everywhere. React Aria, Anthropic, Svelte, Next.js, MUI — a growing list of projects now ship an llms.txt file at their site root. The idea, proposed by Jeremy Howard and inspired by robots.txt and sitemap.xml, is simple: give AI tools a structured entry point to your documentation.

But there's a catch that most tools miss. An llms.txt file is a discovery index — a table of contents with section headers and links to the actual documentation pages. It is not the documentation itself. And most tools that claim llms.txt support stop at reading the index.

What llms.txt actually is (and isn't)

An llms.txt file is a single Markdown file at a site's root that lists documentation sections and links to detail pages. Think of it as a map — it tells you what exists and where to find it.

The structure is straightforward: headings group topics, and each topic links to one or more documentation pages. Some sites also publish an llms-full.txt that bundles everything inline — but most don't, because their documentation is too large to fit in a single file.
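For illustration, here is what a minimal index might look like, using a hypothetical library and made-up URLs that follow the common llms.txt conventions (an H1 title, a blockquote summary, and H2 sections of links):

```markdown
# ExampleLib

> ExampleLib is a hypothetical UI library, shown here only to illustrate the format.

## Components

- [Button](https://examplelib.dev/docs/button.md): Props and usage for the Button component
- [Dialog](https://examplelib.dev/docs/dialog.md): Accessible modal dialogs

## Guides

- [Styling](https://examplelib.dev/docs/styling.md): Theming and CSS strategies
```

Notice that every entry is a pointer: the file itself contains a sentence or two per topic at most.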

The important distinction: llms.txt is a discovery mechanism, not a documentation format. It points to docs. It doesn't contain them.

This matters because of how tools use it.

The "half the answer" problem

Here's what happens with most llms.txt-aware tools today: they fetch the llms.txt file, feed it to the model, and call it done. Your AI assistant gets a list of section titles and one-line descriptions — a menu, not a meal.

Take a real example. A popular framework's llms.txt is about 84 KB with 8 major sections. That sounds like a lot, but it's almost entirely links and brief descriptions. The actual documentation — the API signatures, code examples, migration guides, edge cases — lives behind those links. Without following them, your AI assistant is working with an outline.

This creates a frustrating failure mode. The model knows the API exists (it saw the link title), but it doesn't have the details. So it does what LLMs do — it fills in the gaps from training data. You get answers that sound right but reference deprecated patterns, wrong parameter names, or APIs from the wrong version.

Cloud-based tools like GitMCP do read llms.txt, but they still bounce queries through a remote service — adding latency, rate limits, and routing your codebase questions through someone else's infrastructure. The local-first approach avoids all of that.

The missing piece is simple: follow the links, fetch the real docs, and store them locally.

context add <website> — the new path

@neuledge/context now supports adding documentation directly from any website that publishes an llms.txt file. No git repo needed, no manual .db file construction — just point it at a URL.

Three usage patterns:

  • Bare domain — auto-discovers llms-full.txt, then falls back to llms.txt:
  context add https://react-aria.adobe.com
  • Direct file URL — skips discovery, uses the specified file:
  context add https://mui.com/material-ui/llms.txt
  • Custom package name — overrides the default hostname-based name:
  context add https://react-aria.adobe.com --name react-aria

Under the hood, any HTTPS URL that isn't a .db file or a git host is treated as a website source. Context tries llms-full.txt first (the complete bundle), then llms.txt (the index). If it finds the full version, you get everything in one fetch. If it finds the index, it does something most tools skip — it follows every link.

Following the links (why the index isn't enough)

When Context detects an llms.txt index (as opposed to llms-full.txt), it doesn't stop at the table of contents. It parses the Markdown links grouped by section header, then fetches each linked document concurrently.
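The parsing step is conceptually simple: walk the file line by line, track the current section header, and collect every Markdown link under it. A minimal sketch (again, my own illustration with a hypothetical index, not the tool's parser):

```python
import re

# Matches "[title](https://...)" Markdown links.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def parse_index(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (title, url) link pairs by the nearest preceding heading."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = "General"
    for line in text.splitlines():
        if line.startswith("#"):
            current = line.lstrip("#").strip() or current
            continue
        for title, url in LINK_RE.findall(line):
            sections.setdefault(current, []).append((title, url))
    return sections

index = """\
# ExampleLib

## Components
- [Button](https://examplelib.dev/docs/button.md): Button props
- [Dialog](https://examplelib.dev/docs/dialog.md): Modal dialogs

## Guides
- [Styling](https://examplelib.dev/docs/styling.md): Theming
"""

parsed = parse_index(index)
```

Every URL collected this way becomes a fetch job in the concurrent download queue.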

The defaults are practical:

  • Concurrency: 5 parallel fetches
  • Timeout: 30 seconds per link
  • Max links: 500 (covers even massive documentation sites)
  • Same-origin only: links to external sites are skipped — you asked for React Aria docs, not random blog posts it happens to link to
  • Per-link failure tolerance: one 404 doesn't kill the whole build
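The same-origin and max-links rules above amount to a small filter over the parsed URLs. A sketch of that filter, assuming the defaults listed (the function name is mine):

```python
from urllib.parse import urlparse

MAX_LINKS = 500  # the default cap described above

def filter_links(base_url: str, links: list[str]) -> list[str]:
    """Keep same-origin links only, deduplicated, capped at MAX_LINKS."""
    base = urlparse(base_url)
    origin = (base.scheme, base.netloc)
    kept: list[str] = []
    seen: set[str] = set()
    for url in links:
        p = urlparse(url)
        if (p.scheme, p.netloc) != origin:
            continue  # external link: skipped entirely
        if url in seen:
            continue  # duplicate: fetched once, indexed once
        seen.add(url)
        kept.append(url)
    return kept[:MAX_LINKS]
```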

The fetched documents get consolidated with the index and passed through the same package builder that handles git repos — deduplication, semantic chunking, FTS5 indexing into a portable SQLite .db file.
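To make the FTS5 step concrete, here is a toy version of that storage layer using Python's built-in sqlite3. The schema and sample rows are mine for illustration — the real .db layout used by @neuledge/context may differ:

```python
import sqlite3

# An FTS5 virtual table turns "search the docs" into a plain SQL query.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
db.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("useCalendar", "Calendar state with locale support and isDateUnavailable."),
        ("Button", "Accessible button component with press events."),
    ],
)

# Full-text query, best match first.
rows = db.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", ("locale",)
).fetchall()
```

Because the result is a single SQLite file, the package is portable: copy the .db anywhere and the same queries work with no server.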

Before (index only): 8 sections, 84 KB — a table of contents.
After (links followed): hundreds of pages of actual documentation, deduped and indexed into a searchable local database.

Same-origin filtering matters for both signal and security. When you run context add https://react-aria.adobe.com, you want React Aria's documentation, not every external resource their docs happen to reference.

A real example, end to end

Let's walk through adding React Aria's documentation. Their site publishes an llms.txt at the root.

$ context add https://react-aria.adobe.com --name react-aria
Fetching https://react-aria.adobe.com/llms-full.txt... not found
Fetching https://react-aria.adobe.com/llms.txt... found
Detected llms.txt index with 147 linked documents
Fetching linked documents...
Fetched 139/147 documents (8 failed)
Building package "react-aria"...
Package built: .context/react-aria.db (139 documents)

Eight links returned 404s — probably outdated references in the llms.txt. That's fine. The 139 that succeeded contain the actual component APIs, hooks documentation, styling guides, and accessibility patterns.

Now wire it into your MCP client. If you're using Claude Code:

claude mcp add context -- npx @neuledge/context mcp

For Cursor or VS Code, add to your settings:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["@neuledge/context", "mcp"]
    }
  }
}

Now ask your AI assistant something specific — not "what is React Aria?" but something that requires the real docs: "How do I implement a custom calendar with React Aria's useCalendar hook, including locale support and disabled date ranges?"

Without the package, your assistant would cobble together an answer from training data — probably mixing up hook names or missing the createCalendar dependency. With the indexed docs, it searches the actual React Aria reference for useCalendar, finds the parameters, the locale configuration, and the isDateUnavailable callback. Grounded answers instead of educated guesses.

You can inspect what got indexed with context browse react-aria to see the full list of documents in the package.

Where this fits in the bigger picture

There are now three ways to get documentation into @neuledge/context:

  1. Community registry — context install npm/react — 116+ pre-built packages ready to download.
  2. Any llms.txt site — context add https://... — the capability covered in this article.
  3. Any git repo with docs — context add https://github.com/... — the original path, with multi-format support for Markdown, reStructuredText, and AsciiDoc.

The llms.txt path closes an important gap. The registry covers popular libraries, but it can't cover everything. If a library publishes an llms.txt — and the list keeps growing — you can grab its docs even if nobody has added it to the registry yet.

For library authors, this creates a clear path: publish an llms.txt, and your users can instantly index your documentation into their AI tooling. No PR to any registry required. Just ship the file and let the tools follow the links.

Try it

Find a library you use that ships an llms.txt. Run context add <url>. Then ask your AI assistant the hardest question about that library — the one it usually gets wrong.

npx @neuledge/context add https://docs.anthropic.com
