- URL: https://isaacfei.com/posts/making-your-blog-ai-readable
- Date: 2026-03-15
- Tags: Astro, AI, llms-txt
- Description: A practical guide to implementing the llms.txt standard in an Astro blog, so AI models can read your content as cleanly as humans do.
I've been noticing something shift over the past year or so. The readers of my blog posts, documentation pages, and personal sites are no longer just humans scrolling through a browser. More and more, the "readers" are AI models 🤖 — Claude, ChatGPT, Gemini — ingesting content on behalf of a developer who asked a question. When someone types "how do I deploy TanStack Start to Cloudflare" into an AI assistant, the answer might come from my blog. But the AI never sees my carefully styled page. It sees whatever raw text it can scrape, often garbled with navigation markup, SVG icons, and JavaScript artifacts.
There's a strange irony in this. I write a blog post, an AI reads it, digests it, and serves it back to another person — who might then ask a different AI to help them write their blog post based on it. The whole loop is increasingly AI-to-AI, with humans as the initiators but not necessarily the readers anymore. Documentation sites are being consumed not by developers reading them top-to-bottom, but by coding assistants pulling in context on-the-fly. The audience has changed, and it changed quietly.
I'd be a hypocrite if I pretended this bothers me in some principled way. Most of the posts on this blog were drafted or polished with AI assistance. 😅 That's just the reality of writing these days — AI helps me organize thoughts, smooth out rough prose, and catch things I'd miss on a second pass. The line between "I wrote this" and "AI wrote this" has become genuinely blurry, and I think most honest writers would admit the same.
What does strike me, though, is the downstream effect. When AI creates more content, and AI consumes more content, the proportion of genuinely human-originated thought in the loop gets smaller. I'm not saying that's catastrophic — the ideas are still human, the intent is still human — but it's worth pausing to notice. We're building an infrastructure where machines write for machines to read, and the humans are somewhere in the margins, prompting and approving.
I don't have a grand conclusion about this. It's just the world we're in now. And if that's the case, the pragmatic thing to do is make sure the content I do put out there — however it was drafted — is as accessible as possible to both audiences.
That's exactly what llms.txt is for. It's a simple, emerging standard — a Markdown file at /llms.txt that acts as a site map for language models, pointing them to clean, structured content they can actually parse. Think of it as robots.txt, but instead of telling crawlers what not to index, it tells AI models what to read.
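For a sense of the shape, a minimal llms.txt follows the spec's structure: an H1 title, a blockquote summary, then H2 sections of links (placeholder names here):

```md
# My Blog

> A short description of what the site covers.

## Blog Posts

- [Post title](https://example.com/posts/some-post.md): One-line description

## Optional

- [Full blog content](https://example.com/llms-full.txt): All posts in one file
```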
So here's what I did: I made my blog speak both languages. A styled, human-friendly site for browsers, and a clean Markdown feed for AI. Below is how I integrated it into my Astro blog.
## The plan
The implementation has three pieces:
- `/llms.txt` — an index file listing every post with a link to its Markdown version.
- `/llms-full.txt` — all posts concatenated into a single file, for when an AI model wants the full context.
- `/posts/{slug}.md` — a per-post clean Markdown endpoint, so individual articles can be fetched without the HTML wrapper.
All three are prerendered as static files at build time, so there's zero runtime cost.
## Step 1: MDX to clean Markdown
My posts are written in MDX — Markdown with JSX components, imports, and expressions sprinkled in. An AI model doesn't need any of that. It needs plain Markdown with the code blocks intact.
I used the unified ecosystem — specifically remark-parse, remark-mdx, remark-stringify, and unist-util-visit — to build a small processing pipeline:
```ts
// src/lib/llms-txt.ts
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkMdx from "remark-mdx";
import remarkStringify from "remark-stringify";
import type { Root } from "mdast";
import { visit, SKIP } from "unist-util-visit";

// Removes every MDX-specific node (imports/exports, JSX elements, expressions)
// so only plain Markdown nodes remain.
function remarkStripMdx() {
  return (tree: Root) => {
    visit(tree, (node, index, parent) => {
      if (index === undefined || !parent) return;
      if (
        node.type === "mdxjsEsm" ||
        node.type === "mdxJsxFlowElement" ||
        node.type === "mdxJsxTextElement" ||
        node.type === "mdxFlowExpression" ||
        node.type === "mdxTextExpression"
      ) {
        parent.children.splice(index, 1);
        // Re-visit the same index, since the next sibling shifted into it.
        return [SKIP, index];
      }
    });
  };
}

const processor = unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(remarkStripMdx)
  .use(remarkStringify, { bullet: "-", fences: true, listItemIndent: "one" });

export async function mdxToMarkdown(mdx: string): Promise<string> {
  // Drop YAML frontmatter up front; this pipeline has no frontmatter plugin,
  // so remark would otherwise mis-parse it as Markdown.
  const stripped = mdx.replace(/^---\n[\s\S]*?\n---\n?/, "");
  return String(await processor.process(stripped)).trim();
}

// ... shared helpers (getSortedPosts, formatPost, etc.) shown below ...
```
The pipeline: remark-parse parses the Markdown, remark-mdx understands the MDX syntax, a custom remarkStripMdx plugin walks the AST with unist-util-visit and removes every MDX-specific node (imports, exports, JSX components, JS expressions), and remark-stringify serializes what's left back to clean Markdown.
This is significantly more robust than regex. The AST approach correctly handles edge cases like JSX components nested inside lists, or code blocks that happen to contain import-like syntax.
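The one place a regex does remain is the frontmatter strip at the top of mdxToMarkdown, which is safe because it is anchored to the start of the file. Pulled out as a standalone copy for illustration:

```typescript
// Standalone copy of the frontmatter-stripping regex from mdxToMarkdown,
// shown in isolation. It only fires when the document starts with "---".
const stripFrontmatter = (src: string): string =>
  src.replace(/^---\n[\s\S]*?\n---\n?/, "");

const mdx = "---\ntitle: Hello\ndate: 2026-03-15\n---\n# Hello\n\nBody text.";
console.log(stripFrontmatter(mdx));
// # Hello
//
// Body text.
```

A `---` thematic break later in the body is left untouched, since the pattern only matches at the very beginning of the string.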
## Step 2: The three endpoints
With the processor in place, the endpoints are minimal. Still in the same src/lib/llms-txt.ts file, I added a few shared helpers using Astro's Content Collections API to keep things DRY:
```ts
// src/lib/llms-txt.ts
// ... imports, remarkStripMdx plugin, processor, mdxToMarkdown shown above ...
import { getCollection } from "astro:content";

export interface PostMeta {
  title: string;
  description: string;
  date: Date;
  tags: string[];
  body: string;
  slug: string;
}

// Renders one post as a Markdown section with a metadata header.
export async function formatPost(post: PostMeta, baseUrl: string): Promise<string> {
  const md = await mdxToMarkdown(post.body);
  const date = post.date.toISOString().split("T")[0];
  const tags = post.tags.length ? `- **Tags:** ${post.tags.join(", ")}\n` : "";
  return `# ${post.title}\n\n- **URL:** ${baseUrl}/posts/${post.slug}\n- **Date:** ${date}\n${tags}- **Description:** ${post.description}\n\n---\n\n${md}`;
}

// All published (non-draft) posts, newest first.
export async function getSortedPosts() {
  const posts = await getCollection("blogPost", ({ data }) => !data.draft);
  return posts.sort((a, b) => b.data.date.getTime() - a.data.date.getTime());
}

// Adapts an Astro content entry to the PostMeta shape formatPost expects.
// The endpoints below import this alongside the other helpers.
export function toPostMeta(post: Awaited<ReturnType<typeof getSortedPosts>>[number]): PostMeta {
  return {
    title: post.data.title,
    description: post.data.description,
    date: post.data.date,
    tags: post.data.tags ?? [],
    body: post.body ?? "",
    slug: post.id,
  };
}

export function resolveBaseUrl(site: URL | undefined): string {
  return site?.toString().replace(/\/$/, "") ?? "https://example.com";
}
```
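resolveBaseUrl is trivial but worth a sanity check, since a trailing slash in Astro's site value would otherwise produce double slashes in every generated URL. A standalone copy for illustration:

```typescript
// Standalone copy of resolveBaseUrl. Astro passes `site` as a URL
// (from astro.config) or undefined when no site is configured.
function resolveBaseUrl(site: URL | undefined): string {
  return site?.toString().replace(/\/$/, "") ?? "https://example.com";
}

console.log(resolveBaseUrl(new URL("https://isaacfei.com/"))); // https://isaacfei.com
console.log(resolveBaseUrl(undefined)); // https://example.com
```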
Then each Astro endpoint becomes a thin wrapper:
### `/llms.txt` — the index
```ts
// src/pages/llms.txt.ts
import type { APIRoute } from "astro";
import { getSortedPosts, resolveBaseUrl } from "@/lib/llms-txt";

export const prerender = true;

export const GET: APIRoute = async ({ site }) => {
  const baseUrl = resolveBaseUrl(site);
  const posts = await getSortedPosts();
  const lines = [
    "# My Blog",
    "",
    "> A short description of your site.",
    "",
    "## Blog Posts",
    "",
    ...posts.map(
      (p) => `- [${p.data.title}](${baseUrl}/posts/${p.id}.md): ${p.data.description}`
    ),
    "",
    "## Optional",
    "",
    `- [Full blog content](${baseUrl}/llms-full.txt): All posts in one file`,
  ];
  return new Response(lines.join("\n"), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
### `/llms-full.txt` — everything in one file
```ts
// src/pages/llms-full.txt.ts
import type { APIRoute } from "astro";
import { getSortedPosts, resolveBaseUrl, toPostMeta, formatPost } from "@/lib/llms-txt";

export const prerender = true;

export const GET: APIRoute = async ({ site }) => {
  const baseUrl = resolveBaseUrl(site);
  const posts = await getSortedPosts();
  const sections = await Promise.all(
    posts.map((p) => formatPost(toPostMeta(p), baseUrl))
  );
  return new Response(
    ["# Full Blog Content", "", sections.join("\n\n---\n\n")].join("\n\n"),
    { headers: { "Content-Type": "text/plain; charset=utf-8" } }
  );
};
```
### `/posts/{slug}.md` — per-post Markdown
```ts
// src/pages/posts/[...slug].md.ts
import type { APIRoute } from "astro";
import { getSortedPosts, resolveBaseUrl, toPostMeta, formatPost } from "@/lib/llms-txt";

export const prerender = true;

export async function getStaticPaths() {
  const posts = await getSortedPosts();
  return posts.map((post) => ({
    params: { slug: post.id },
    props: { post },
  }));
}

export const GET: APIRoute = async ({ site, props }) => {
  const { post } = props as { post: Awaited<ReturnType<typeof getSortedPosts>>[number] };
  return new Response(await formatPost(toPostMeta(post), resolveBaseUrl(site)), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
That's it for the backend. Three files, each under 20 lines of actual logic.
## Step 3: A "Copy for AI" button
To make this discoverable to human visitors too, I added a small React component that fetches the Markdown endpoint and copies it to the clipboard. One component handles two modes: a simple inline button for blog list tiles, and a dropdown menu in the global header (copy this page, copy full blog, view llms.txt). I keep the dropdown only in the header — not duplicated on each post page — to avoid clutter.
```tsx
// src/components/copy-for-ai.tsx
import { useState } from "react";

export function CopyForAI({ postSlug }: { postSlug?: string }) {
  const [status, setStatus] = useState<"idle" | "copying" | "copied">("idle");
  // Single post when a slug is given, otherwise the whole blog.
  const url = postSlug ? `/posts/${postSlug}.md` : "/llms-full.txt";

  const handleCopy = async () => {
    setStatus("copying");
    try {
      const res = await fetch(url);
      await navigator.clipboard.writeText(await res.text());
      setStatus("copied");
      setTimeout(() => setStatus("idle"), 2000);
    } catch {
      setStatus("idle");
    }
  };

  return (
    <button onClick={handleCopy}>
      {status === "copied" ? "Copied!" : "Copy for AI"}
    </button>
  );
}
```
Pass a postSlug and it copies that single post. Omit it and it copies the entire blog. The dropdown lives in the global header so visitors can quickly grab AI-friendly content from anywhere on the site.
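For reference, mounting the island in an Astro template looks roughly like this (a hypothetical usage sketch; it assumes the component lives at `src/components/copy-for-ai.tsx`, and the `client:load` directive is what hydrates the React button in the browser):

```astro
---
// Hypothetical header snippet showing how the island is mounted.
import { CopyForAI } from "@/components/copy-for-ai";
const { slug } = Astro.props;
---
<CopyForAI postSlug={slug} client:load />
```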
## A discoverability gotcha: links must be in the HTML
Here's a subtle but important concern. If the "View llms.txt" link lives only inside a dropdown that's rendered conditionally (e.g. {open && (...)} in React), it won't exist in the DOM until the user clicks. AI crawlers that parse the initial HTML won't find it — they don't execute JavaScript or simulate clicks. So if someone tells an AI "read this person's blog," the AI might crawl the page, extract links from the HTML, and never see the llms.txt URL.
Primary discovery is convention-based: the llms.txt spec puts the file at a well-known path (https://yoursite.com/llms.txt), like robots.txt. AI systems that know the spec can fetch it directly without needing a link. But for crawlers that discover content by following links, you need the URL to appear in the HTML from the start.
What to do:

- **`<link>` in `<head>`** — add a machine-readable hint so parsers that read `link` tags can find llms.txt:

  ```html
  <link rel="alternate" type="text/plain" href="https://yoursite.com/llms.txt" title="llms.txt - AI-readable site index" />
  ```

- **Visible static link** — put a plain anchor somewhere that is always rendered, e.g. in your RSS/feeds section or footer. No JavaScript, no dropdown: the link must be in the initial server-rendered HTML.

- **`robots.txt`** — create one at the site root if you don't have it. You can add a comment pointing to llms.txt for crawlers that look for it.

- **Per-post link** — on each blog post page, add a dedicated link to that post's `.md` version. Don't duplicate the full CopyForAI dropdown (it's already in the header). A minimal Astro component that renders a static `<a href="/posts/{slug}.md">` keeps the link in the HTML at all times, so crawlers can discover each post's Markdown from the article page itself:

  ```astro
  ---
  // src/components/PostMarkdownLink.astro
  interface Props { slug: string; className?: string; }
  const { slug, className = "" } = Astro.props;
  ---
  <a href={`/posts/${slug}.md`} class={className} title="View markdown version for AI">
    Markdown for AI
  </a>
  ```
This keeps the header dropdown for human convenience (copy to clipboard, view full blog) while ensuring every post page has a crawlable link to its AI-readable version.
## Takeaways
The whole implementation is about 200 lines across 6 files. The key dependencies are unified, remark-parse, remark-mdx, remark-stringify, and unist-util-visit — battle-tested libraries that handle the MDX-to-Markdown conversion properly through AST manipulation rather than fragile regex.
The technical part was honestly the easy part. What lingers is the feeling behind it. We spent years perfecting responsive layouts, dark mode, typography, and scroll animations — all for human eyes. Now we're adding a second output format for an audience that doesn't care about any of that. An audience that wants raw text, clear structure, and nothing else.
And maybe that's fine. Maybe the future of a personal blog isn't just a place for humans to read — it's also a node in a larger knowledge graph that AI systems draw from. If my writing helps someone solve a problem, does it matter whether they read it directly or whether an AI summarized it for them? I'm honestly not sure. 🤷
What I am sure of is that this shift is happening whether we like it or not. The pragmatic move is to meet it halfway — keep writing for humans, but make it easy for machines too. That's all llms.txt really is: a small act of acknowledging who your readers actually are right now.