Generating a multilingual llms.txt in Astro

#webdev #ai #tutorial #llm

llms.txt is a Markdown index for LLMs, placed at the site root. Where sitemap.xml is a machine-readable list of URLs, llms.txt describes — with one-line notes — what the site is and where to start reading.

In Astro you can generate it from Content Collections as an API route, so the post list never has to be hand-maintained. This post is the minimum setup for a bilingual (EN/JA) site: emit /llms.txt, /ja/llms.txt and /llms-full.txt from one renderer.

Up front: how much llms.txt actually helps AI-search traffic hasn't been settled or measured yet. This post covers only the implementation.

The minimal route

Astro's file-based routing turns a .txt.ts file under src/pages/ into an endpoint that serves plain text. Export a GET handler that returns a text/plain Response.

// src/pages/llms.txt.ts
import type { APIContext } from "astro";
import { renderLlmsTxt } from "../lib/llmsTxt";

export async function GET(_context: APIContext) {
  const body = await renderLlmsTxt({ docLang: "en" });
  return new Response(body, {
    status: 200,
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, max-age=3600",
    },
  });
}

The file src/pages/llms.txt.ts builds to the URL /llms.txt. Keep the assembly logic in src/lib/llmsTxt.ts and leave the route thin, so the per-language endpoints can reuse it.

Build it from Content Collections

Get the post list with getCollection and lay it out on the fly. A hand-kept list goes stale — add a post, forget the index, and llms.txt drifts from the content.

// src/lib/llmsTxt.ts (excerpt)
import { getCollection } from "astro:content";

export async function renderLlmsTxt(opts: LlmsTxtOptions): Promise<string> {
  // Exclude drafts so unpublished URLs never reach llms.txt
  const blog = await getCollection("blog", ({ data }) => !data.draft);
  // Newest posts first
  blog.sort((a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime());
  // ...assemble sections and return join("\n")
}

Don't drop the ({ data }) => !data.draft filter. Skip it and a half-written draft lands in llms.txt, advertising a URL you haven't published. Reuse the same exclusion that the sitemap and RSS feed use.
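The elided assembly step could look roughly like the sketch below. It is not the actual renderer: the PostEntry shape, the /blog/<id>/ URL pattern, the "## Posts" heading, and the renderPostList name are all assumptions made for illustration, and it runs on plain objects so it can be tried outside Astro.

```typescript
// Hypothetical minimal entry shape; in Astro the real type comes from astro:content.
interface PostEntry {
  id: string;
  data: { title: string; description: string; pubDate: Date; draft?: boolean };
}

// Assumed URL pattern (/blog/<id>/). Adjust to the site's real routing.
function postUrl(site: string, post: PostEntry): string {
  return `${site.replace(/\/$/, "")}/blog/${post.id}/`;
}

// Drops drafts, sorts newest first, and renders one Markdown list item per post.
function renderPostList(site: string, posts: PostEntry[]): string {
  const published = posts
    .filter((p) => !p.data.draft) // filter() copies, so the input array is not reordered
    .sort((a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime());
  const lines = published.map(
    (p) => `- [${p.data.title}](${postUrl(site, p)}): ${p.data.description}`,
  );
  return ["## Posts", "", ...lines].join("\n");
}
```

Each post becomes one link plus a one-line note, which is the shape llms.txt is after.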

Split by language on two axes

This is the part that matters for a multilingual site. Give the renderer two axes:

filterLang: which language's posts to include
docLang: which language the headings and notes are written in

Separating them lets one renderer emit all three endpoints. The two llms.txt routes call it like this:

// src/pages/llms.txt.ts        → English headings, posts from all languages
renderLlmsTxt({ docLang: "en" });

// src/pages/ja/llms.txt.ts     → Japanese headings, Japanese posts only
renderLlmsTxt({ filterLang: "ja", docLang: "ja" });

The English /llms.txt leaves filterLang unset on purpose: it's the whole-site entry point, so it surfaces posts in either language. The Japanese /ja/llms.txt restricts itself to Japanese posts with filterLang: "ja".
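Inside the renderer, the two axes might be applied along these lines. This is a sketch under assumptions: the LlmsTxtOptions shape is inferred from the two calls above, and the "ja/" id prefix, the HEADINGS table, and the langOfId and selectPosts names are hypothetical.

```typescript
type Lang = "en" | "ja";

// Assumed option shape, inferred from the calls above; the real type may carry more fields.
interface LlmsTxtOptions {
  filterLang?: Lang; // which posts to include; unset means every language
  docLang: Lang; // language of the headings and notes
}

// Hypothetical convention: Japanese entries have ids that start with "ja/".
function langOfId(id: string): Lang {
  return id.startsWith("ja/") ? "ja" : "en";
}

// Hypothetical heading table keyed by docLang.
const HEADINGS: Record<Lang, { posts: string }> = {
  en: { posts: "## Posts" },
  ja: { posts: "## 記事" },
};

// filterLang decides which ids survive; docLang only decides the heading text.
function selectPosts(
  ids: string[],
  opts: LlmsTxtOptions,
): { heading: string; ids: string[] } {
  const kept = opts.filterLang
    ? ids.filter((id) => langOfId(id) === opts.filterLang)
    : ids;
  return { heading: HEADINGS[opts.docLang].posts, ids: kept };
}
```

Because the two options never read each other, an English-headed index of Japanese-only posts would also work without touching the renderer.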

Narrow only the featured list to docLang

One design call. The English version can surface both languages, but the Featured section alone narrows to docLang.

const featuredSource = filteredBlog.filter(
  (p) => entryLangLocal(p.id) === opts.docLang,
);

Listing both halves of a translation pair in the featured slots spends two slots on one piece of content and halves the unique signal in a bounded list. So a limited list (featured) narrows by language; a full dump with loose size limits (llms-full.txt) carries both.
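To make the slot arithmetic concrete, here is a small sketch. The slot count of 3 and the pickFeatured name are made up for illustration, and entryLangLocal is a hypothetical stand-in that assumes Japanese ids start with "ja/" (the real helper may work differently).

```typescript
type Lang = "en" | "ja";

// Hypothetical stand-in for entryLangLocal: assumes Japanese ids start with "ja/".
function entryLangLocal(id: string): Lang {
  return id.startsWith("ja/") ? "ja" : "en";
}

// Hypothetical slot count; the real limit is not stated in this post.
const FEATURED_SLOTS = 3;

// Narrow to docLang first, then cap, so every slot goes to a distinct piece of content.
function pickFeatured(
  ids: string[],
  docLang: Lang,
  slots: number = FEATURED_SLOTS,
): string[] {
  return ids.filter((id) => entryLangLocal(id) === docLang).slice(0, slots);
}

// Translation pairs sit next to each other in a date-sorted list.
const sortedIds = ["a", "ja/a", "b", "ja/b", "c"];
const narrowed = pickFeatured(sortedIds, "en"); // ["a", "b", "c"]
const unnarrowed = sortedIds.slice(0, FEATURED_SLOTS); // ["a", "ja/a", "b"]
```

Without the narrowing, two of the three slots go to the same article in two languages.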

Gotchas

Put the draft exclusion in the shared renderer: all three endpoints then pass through the same filter. If even one route reads the collection unfiltered, drafts leak.
Don't conflate filterLang and docLang: "which posts" and "what language to write in" are different axes. Leaving filterLang unset on the English version is deliberate, so don't read it later as a bug and add a filter.

Wrapping up

Generate llms.txt from Content Collections as an Astro API route and the manual upkeep goes away. Split on filterLang and docLang and one renderer emits all three files.

The language cross-references in llms-full.txt, how it sits next to robots.txt, and how I word the usage/citation section are on the Aulvem site → Generating llms.txt and llms-full.txt in Astro for a Bilingual Site