Harshil Agrawal

Posted on Jun 7 • Originally published at harshil.dev

Making My Astro Site Agent-Ready: An Honest Audit of isitagentready.com

#astro #agents #cloudflare

I have built a few SaaS apps, and I spend a lot of time thinking about how people find and use them. I track analytics, optimize for search engines, and write descriptive meta tags. But recently I realized that many of these apps get referral traffic from ChatGPT and other AI chat apps. I had never optimized the sites for this entirely new category of visitor: AI agents.

Agents are increasingly browsing the web on our behalf — reading documentation, comparing products, summarizing content. But most websites, mine included, are built for human eyes. There's no standard way for a site to tell an agent what it can do, what content it offers, or how to access it in a machine-friendly format.

Cloudflare launched isitagentready.com, a scanner that evaluates how "AI-agent-friendly" your website is. I tried it on my personal site built with Astro and deployed on Cloudflare Workers. I didn't expect a perfect score, but I was curious.

I pasted in https://harshil.dev, hit scan, and waited.

Nine issues. All red.

My first reaction was mild panic. My second reaction was more useful: skepticism. Most of these recommendations assume I'm running a SaaS platform with public APIs, authentication flows, and agent-facing tools. I'm not. I'm a developer with a blog and a couple of gated APIs.

This post is what I actually implemented, what I deliberately skipped, and the small decision framework I used to tell the difference between a meaningful improvement and cargo-cult engineering.

The Audit Results

Here's what the scanner found:

#	Check	Result	Assumes you're a...
1	Link response headers (RFC 8288)	🔴 Fail	Any site
2	Markdown content negotiation	🔴 Fail	Any site
3	Content Signals in robots.txt	🔴 Fail	Any site
4	API catalog (RFC 9727)	🔴 403	SaaS with APIs
5	OAuth/OIDC discovery metadata	🔴 Missing	SaaS with auth
6	OAuth Protected Resource Metadata	🔴 Missing	SaaS with auth
7	MCP Server Card	🔴 403	SaaS with tools
8	Agent Skills discovery index	🔴 403	Agent platform
9	WebMCP support	🔴 Missing	SaaS with actions

The first three apply to any website. The last six assume infrastructure I simply don't have. But the scanner doesn't know that; it just checks for files and headers. You can choose which checks to run; I ran all of them out of curiosity.

The Decision Framework

Instead of blindly chasing every red flag, I asked three questions for each recommendation:

Do I actually have this capability? If the scanner wants an MCP Server Card, do I have an MCP server? (No.)
Is it meaningful for a personal blog? Would an agent benefit from discovering a gated API catalog? (Not really.)
Is it a quick metadata win? Can I add meaningful information with a few lines of code or config? (Yes, for the first three.)

This is the framework I'd recommend to anyone running a similar audit. The goal isn't to turn every light green; it's to provide honest, useful signals to agents that visit your site. Publishing empty discovery documents is worse than not publishing them at all. You're wasting an agent's time and giving it false expectations.

Once you answer the above questions, you can copy the prompt provided by the site, update it based on your answers, and let your coding agent handle the improvements.

What my agent implemented

1. Link Headers on the Homepage (RFC 8288)

The scanner found zero Link headers on my homepage. RFC 8288 defines a standard way to advertise related resources via HTTP headers, so agents don't need to parse HTML to find sitemaps, feeds, or author pages.

This was a genuine gap. My site has a sitemap, a writings feed, and an about page — things an agent might want to find. But none of them were advertised in the HTTP response.

The fix was surprisingly simple. Astro's server-side rendering lets you set response headers directly in a page's frontmatter:

---
import Layout from "../layouts/Default.astro";

const title = "Harshil Agrawal";
const description = "...";

// RFC 8288 Link headers for agent discovery
Astro.response.headers.set(
  "Link",
  [
    '</sitemap-index.xml>; rel="describedby"',
    '</writings>; rel="related"',
    '</about>; rel="author"',
  ].join(", ")
);
---

I chose three IANA-registered relation types:

describedby for the sitemap — the best machine-readable overview of the site
related for the writings feed — the primary content stream
author for the about page — semantically correct for a personal site

You could also use service-doc or api-catalog if you have those. I don't, so I didn't add them.
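If the list of advertised resources grows, the header value can be generated from data instead of being written by hand. This is a minimal sketch, not code from my site; `LinkTarget` and `buildLinkHeader` are hypothetical names introduced for illustration:

```typescript
// Hypothetical helper, not part of the original site code.
interface LinkTarget {
  href: string; // URI reference, e.g. "/sitemap-index.xml"
  rel: string; // link relation type, e.g. "describedby"
}

// Serialize targets into a single RFC 8288 Link header value:
// each target becomes <href>; rel="...", and targets are comma-separated.
function buildLinkHeader(targets: LinkTarget[]): string {
  return targets
    .map(({ href, rel }) => `<${href}>; rel="${rel}"`)
    .join(", ");
}

const linkHeader = buildLinkHeader([
  { href: "/sitemap-index.xml", rel: "describedby" },
  { href: "/writings", rel: "related" },
  { href: "/about", rel: "author" },
]);

console.log(linkHeader);
```

In the homepage frontmatter, the result would be passed to the same call shown above: Astro.response.headers.set("Link", linkHeader).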

2. Content Signals in robots.txt

The scanner was looking for Content Signals — a proposed IETF standard that lets sites declare AI usage preferences via robots.txt directives. The idea is straightforward: tell agents whether they can use your content for training, search, and input context.

My robots.txt was minimal:

User-agent: *
Allow: /

Sitemap: https://harshil.dev/sitemap-index.xml

I added one line:

User-agent: *
Allow: /
Content-Signal: ai-train=yes, search=yes, ai-input=yes

Sitemap: https://harshil.dev/sitemap-index.xml

I chose ai-train=yes because this is a public blog — the entire point is to share knowledge as widely as possible. search=yes is obvious, and ai-input=yes means agents can use the content as context when generating responses. If you run a private or paywalled site, you might choose differently.

This is a metadata-only change. It's a single line in a text file, but it gives agents a clear signal about your intent. That's the whole point.

Validation:

I deployed the site with these changes and ran it through the scanner again. This check went from red to green immediately.
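For a quick local check that doesn't depend on the scanner, the directive can be read back out of the robots.txt text. This is a rough sketch under simplifying assumptions: `parseContentSignals` is a hypothetical helper, it ignores User-agent grouping and comments, and it only understands the comma-separated key=value form used above:

```typescript
// Hypothetical helper: collect Content-Signal preferences from robots.txt text.
// Simplification: signals from every User-agent group are merged into one map.
function parseContentSignals(robotsTxt: string): Record<string, string> {
  const signals: Record<string, string> = {};
  for (const line of robotsTxt.split(/\r?\n/)) {
    const match = line.match(/^\s*Content-Signal\s*:\s*(.*)$/i);
    if (!match) continue;
    // Each directive value looks like "ai-train=yes, search=yes".
    for (const pair of match[1].split(",")) {
      const [key, value] = pair.split("=").map((part) => part.trim());
      if (key && value) signals[key] = value;
    }
  }
  return signals;
}

const robotsTxt = [
  "User-agent: *",
  "Allow: /",
  "Content-Signal: ai-train=yes, search=yes, ai-input=yes",
  "",
  "Sitemap: https://harshil.dev/sitemap-index.xml",
].join("\n");

const declared = parseContentSignals(robotsTxt);
console.log(declared);
```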

3. Markdown Content Negotiation

This was the most interesting check, and the one that took the most actual work.

The scanner sends requests with Accept: text/markdown and checks whether your site returns markdown instead of HTML. This matters because markdown is far more token-efficient for LLMs than HTML. A typical HTML page might be 50KB; the same content in markdown might be 8KB. For agents with context window limits, that's a meaningful difference.

The native solution: Cloudflare offers a feature called Markdown for Agents that does this automatically at the edge. When enabled, Cloudflare intercepts requests with Accept: text/markdown, fetches the HTML from your origin, strips navigation/scripts/styles, and converts the body to clean markdown. Agents get structured content; browsers get normal HTML. Zero application code required.

The catch: It's only available on Pro, Business, and Enterprise plans. I'm on the Free plan.

The DIY approach: Since my blog posts are written in MDX, the raw source is already mostly valid markdown. Astro's content collections expose entry.body, which contains the raw MDX before JSX processing. I can serve this directly when an agent requests markdown.

Here's what I changed in my blog post route:

---
import { getEntry, render } from 'astro:content';
import Layout from '../../layouts/Default.astro';

const { id } = Astro.params;
const post = await getEntry('writings', id);

if (!post) {
  return new Response('Not found', { status: 404 });
}

// Content negotiation: return raw markdown when requested
const accept = Astro.request.headers.get('Accept') || '';
if (accept.includes('text/markdown')) {
  return new Response(post.body, {
    status: 200,
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      'Vary': 'Accept',
    },
  });
}

const { Content } = await render(post);
---

<Layout title={post.data.title} description={post.data.description}>
  <article>
    <Content />
  </article>
</Layout>

This worked great for blog posts, but the scanner tests multiple pages — not just blog posts. The homepage, about page, projects, talks, links, and even the 404 page all need to respond to Accept: text/markdown. I needed a site-wide solution.
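One caveat with the accept.includes('text/markdown') check: it also matches a header such as text/markdown;q=0, which in HTTP means the client explicitly does not accept that type. A stricter check could look like the sketch below. `acceptsMarkdown` is a hypothetical helper, not what my site runs. It deliberately ignores wildcards like */* so that browsers keep getting HTML, and it treats a malformed q value as "not acceptable":

```typescript
// Hypothetical helper: true when the Accept header lists text/markdown
// with a quality value greater than zero.
function acceptsMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  for (const range of accept.split(",")) {
    const [type, ...params] = range
      .split(";")
      .map((part) => part.trim().toLowerCase());
    if (type !== "text/markdown") continue;
    // Quality defaults to 1 when no q parameter is present.
    let q = 1;
    for (const param of params) {
      const [name, value] = param.split("=");
      if (name.trim() === "q") q = Number.parseFloat(value ?? "");
    }
    // A malformed q parses to NaN, and NaN > 0 is false.
    return q > 0;
  }
  return false;
}

console.log(acceptsMarkdown("text/markdown"));
console.log(acceptsMarkdown("text/html, text/markdown;q=0"));
```

For the scanner's plain Accept: text/markdown request, both versions behave the same; the difference only shows up for clients that send quality values.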

A Wrinkle: Prerendering vs. Cost

Blog posts are the most frequently accessed pages on a personal blog. Originally, I used server-side rendering for everything — markdown requests ran getEntry() and returned post.body, while HTML requests ran render() to compile MDX into HTML. This worked, but every blog post request hit the Cloudflare Worker, counting against the free daily request limit.

The fix was to prerender blog posts back to static HTML (Astro's default behavior) and move markdown negotiation to a dedicated API endpoint that only runs when explicitly requested:

import type { APIRoute } from 'astro';
import { getEntry } from 'astro:content';

export const GET: APIRoute = async ({ params }) => {
  const post = await getEntry('writings', params.id);
  if (!post) {
    return new Response('Not found', { status: 404 });
  }

  return new Response(post.body, {
    status: 200,
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      'Vary': 'Accept',
    },
  });
};

The blog post page itself is now prerendered:

---
import { getCollection, getEntry, render } from 'astro:content';

export const prerender = true;

export async function getStaticPaths() {
  const posts = await getCollection('writings');
  return posts.map((post) => ({
    params: { id: post.id },
  }));
}

const { id } = Astro.params;
const post = await getEntry('writings', id);
if (!post) {
  return new Response('Not found', { status: 404 });
}

const { Content } = await render(post);
---

To wire it all together, a Cloudflare Transform Rule routes markdown requests to the API. This is the key insight: on the Free plan, static HTML is served from the CDN at no cost, while every Worker invocation counts toward your daily limit. By intercepting markdown requests at the edge before they reach the Worker, normal browser traffic bypasses the Worker entirely:

Browsers hit /writings/my-post — no rule match — static HTML from the CDN (free)
Agents hit /writings/my-post with Accept: text/markdown — rule matches — rewrites to /api/markdown/my-post internally, API returns raw MDX

The rule uses Cloudflare's Rules Language. Note that http.request.headers["accept"] returns an Array<String>, so you must use any() with wildcard index [*]:

Expression: starts_with(http.request.uri.path, "/writings/") and any(http.request.headers["accept"][*] contains "text/markdown")
Rewrite path: concat("/api/markdown/", substring(http.request.uri.path, 10))

This rewrite is internal — the client never sees the /api/markdown/ URL. The agent gets text/markdown, the browser gets text/html, and only markdown-requesting agents pay the Worker cost.
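To sanity-check the expression and the substring offset before deploying the rule, the same logic can be mirrored locally. This is only a sketch of the rule's behavior in TypeScript, with hypothetical names (`PREFIX`, `rewritePath`); the real rule runs at Cloudflare's edge:

```typescript
// Local mirror of the Transform Rule, for reasoning about its behavior.
const PREFIX = "/writings/"; // 10 characters, hence substring(path, 10)

function rewritePath(path: string, acceptValues: string[]): string {
  // Same condition as the rule expression: path prefix AND any Accept
  // header value containing "text/markdown".
  const matches =
    path.startsWith(PREFIX) &&
    acceptValues.some((value) => value.includes("text/markdown"));
  if (!matches) return path;
  // Drop the "/writings/" prefix and prepend the API route.
  return "/api/markdown/" + path.substring(PREFIX.length);
}

console.log(rewritePath("/writings/my-post", ["text/markdown"]));
console.log(rewritePath("/writings/my-post", ["text/html"]));
console.log(rewritePath("/writings/", ["text/markdown"]));
```

The last call shows an edge case worth knowing about: a markdown request for /writings/ itself also matches the expression and is rewritten to /api/markdown/ with an empty id.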

Extending Markdown Negotiation to Every Other Page

Astro's frontmatter syntax makes early returns tricky. A page with <script> tags can't simply return new Response() in frontmatter without confusing Astro's compiler. After some trial and error with the coding agent, I settled on a middleware pattern for the remaining pages:

import { defineMiddleware } from 'astro:middleware';
import { MarkdownResponse } from '../utils/markdown';

export const onRequest = defineMiddleware(async (_context, next) => {
  try {
    return await next();
  } catch (error) {
    if (error instanceof MarkdownResponse) {
      return error.response;
    }
    throw error;
  }
});

// Helper module, imported by the middleware above as '../utils/markdown'
import type { AstroGlobal } from 'astro';

export function wantsMarkdown(astro: AstroGlobal): boolean {
  const accept = astro.request.headers.get('Accept') || '';
  return accept.includes('text/markdown');
}

export class MarkdownResponse extends Error {
  response: Response;
  constructor(body: string) {
    super('MarkdownResponse');
    this.response = new Response(body, {
      status: 200,
      headers: {
        'Content-Type': 'text/markdown; charset=utf-8',
        'Vary': 'Accept',
      },
    });
  }
}

Each page now constructs a markdown string from its data, then throws a MarkdownResponse if the client requested markdown. The middleware catches it and returns the proper HTTP response.

For example, the projects page generates a markdown list from its data:

---
// ... fetch projects data ...

let markdownBody = `# Projects\n\nA collection of things I've built, shipped, and open-sourced.\n\n`;

for (const project of allProjects) {
  const tags = project.tags?.length ? ` (${project.tags.join(', ')})` : '';
  markdownBody += `- [${project.title}](${project.link})${tags} — ${project.description}\n`;
}

if (wantsMarkdown(Astro)) throw new MarkdownResponse(markdownBody);
---

Every page follows this pattern — homepage, about, projects, talks, links, and even the 404 page. Each one constructs a meaningful markdown representation of its content, with proper links and structure, then throws MarkdownResponse when Accept: text/markdown is present.

Trade-offs: The raw MDX body for blog posts is about 95% valid markdown, but it may contain JSX component tags. For other pages, I'm constructing markdown programmatically from the page's data — which is cleaner but requires per-page logic. It's not as clean as Cloudflare's native feature — which strips navigation, footers, and scripts automatically — but it works on the Free plan and provides genuine value to agents on every route.
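The JSX leftovers in the raw MDX body could be reduced with a cleanup pass before the response is sent. The sketch below is a naive, regex-based take on that idea, not something my site currently does. `stripMdxSyntax` is a hypothetical helper, and it assumes components use capitalized names, that each tag and each import/export statement fits on a single line, and that props contain no ">" character. A real MDX parser would be more robust:

```typescript
// Hypothetical helper: naive cleanup of MDX source (not a real MDX parser).
function stripMdxSyntax(mdx: string): string {
  let inFence = false;
  const kept: string[] = [];
  for (const line of mdx.split("\n")) {
    // Toggle on fenced code block delimiters (three backticks or tildes).
    if (/^\s*(`{3}|~{3})/.test(line)) inFence = !inFence;
    if (inFence) {
      kept.push(line); // never touch code inside fences
      continue;
    }
    // Drop single-line ESM import/export statements.
    if (/^(import|export)\s/.test(line)) continue;
    kept.push(
      line
        // Remove self-closing component tags such as <Chart data={x} />.
        .replace(/<[A-Z][\w.]*\b[^>]*\/>/g, "")
        // Unwrap paired component tags, keeping the text between them.
        .replace(/<\/?[A-Z][\w.]*\b[^>]*>/g, ""),
    );
  }
  return kept.join("\n");
}

const sampleMdx = [
  'import Chart from "../components/Chart.astro";',
  "",
  "# Title",
  "",
  "<Callout>Read **this** first.</Callout>",
  "",
  "<Chart data={points} />",
  "",
  "~~~tsx",
  "<Chart />",
  "~~~",
].join("\n");

const cleaned = stripMdxSyntax(sampleMdx);
console.log(cleaned);
```

Lowercase HTML tags are left alone on purpose, since inline HTML is valid in markdown.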

What I Skipped (and Why)

API Catalog (RFC 9727)

The scanner checks for /.well-known/api-catalog returning application/linkset+json. The idea is to advertise your APIs with link relations like service-desc (OpenAPI spec), service-doc (documentation), and status (health endpoint).

My site has no public APIs. Creating a JSON catalog was unnecessary ceremony. If I ever add a proper API surface with multiple public endpoints, OpenAPI specs, and documentation, I'll add a catalog. Until then, an empty catalog or a catalog with one entry is just noise.

OAuth/OIDC Discovery & Protected Resource Metadata

These checks look for /.well-known/openid-configuration and /.well-known/oauth-protected-resource. They're designed for sites with OAuth-protected APIs that agents need to authenticate against.

My only "protected" endpoint is /generate-og, which uses an internal token, not OAuth. There are no public APIs requiring agent authentication, so OAuth discovery metadata would be pure theater.

MCP Server Card, Agent Skills, WebMCP

These assume you're operating infrastructure specifically designed for AI agent consumption:

MCP Server Card: Advertises a running MCP (Model Context Protocol) server
Agent Skills: Publishes a discovery index of agent tools/skills
WebMCP: Exposes browser-side tools via the WebMCP API

I don't have an MCP server. I don't have agent skills. I don't have site actions that need to be exposed to browser-based agents. Publishing empty stub files for all of these would be actively misleading.

The Final Scan

After implementing the three changes, I deployed the site and re-ran the scanner. All scores were now at 100!

Check	Before	After
Link headers (RFC 8288)	🔴 Fail	🟢 Pass
Markdown negotiation	🔴 Fail	🟢 Pass
Content Signals	🔴 Fail	🟢 Pass

What I Learned

Don't cargo-cult scanner recommendations: When I first ran the scanner, I opted for all the checks, even though not all of them were relevant. Only run the scans for the categories that are relevant to your website.
Quick metadata wins have absurdly high ROI: Adding Link headers took about eight lines of code. Adding Content Signals took 1 line in a text file. Both immediately improved how agents discover and reason about the site. These are the changes to prioritize.
Know your platform: Cloudflare's native Markdown for Agents would have saved me from writing custom content negotiation logic, but it's not available on the Free plan. The DIY approach — middleware + per-page markdown generation — works, but it's worth understanding what you're giving up.
Be honest about what your site offers: Publishing empty discovery documents (API catalogs, OAuth metadata, MCP server cards) is worse than not publishing them. Agents that discover these files expect them to be meaningful. Don't waste their time.
Prerender blog posts, but keep an escape hatch for agents: Astro's prerendering generates static HTML that serves straight from the edge — zero Worker invocations, zero cost. For pages that agents might request in markdown, a separate API endpoint plus a Cloudflare Transform Rule gives you the best of both worlds: static HTML for humans, raw markdown for agents, without adding Worker overhead to normal traffic.

Summary

Making your site "agent-ready" doesn't mean implementing every recommendation from every scanner. It means providing honest, useful signals to agents that visit your site, while recognizing the gap between what a scanner demands and what your site actually is.

For a personal blog, that means:

Link headers — advertise your machine-readable resources (sitemaps, feeds, author pages)
Content Signals — declare your AI usage preferences in robots.txt
Markdown negotiation — serve raw markdown when agents request it, or enable Cloudflare's native feature if you're on Pro+
Skip everything else — unless you actually run the infrastructure the scanner is looking for

The goal isn't a perfect score. The goal is to be a good citizen of the agent-accessible web.

If you run your own site through isitagentready.com, I'd love to hear what you decided to implement and what you skipped. Hit me up on X/Twitter to share how you're making your sites agent-ready!
