Last week I wrote about making my site fully agent-readable — an MCP server, an NLWeb /ask endpoint, llms.txt, a .well-known/ discovery tree. Shipping those was the easy half.
Then I started scoring the site with an agent-readiness scanner — a prober that visits like a real agent: no JavaScript execution, spec-shaped requests, no goodwill. Every round it found something that worked in my browser and did not exist as far as an agent could tell.
These are the four that stung. All real code from yonyon.ai, all fixed this round.
1. The "helpful" GET that made my endpoint read as missing
My /ask endpoint (Microsoft's NLWeb shape) answered POST { "query": "..." }. For GET, I did what felt like good API manners — return a machine-readable descriptor telling the caller how to use the endpoint:
// GET /ask: the old version
export function GET(): Response {
  return Response.json({
    _meta: { response_type: "nlws", version: NLWEB_VERSION },
    description: 'POST { "query": "..." } for answers...',
    method: "POST",
  });
}
Polite, self-documenting, useless. The NLWeb reference client — and therefore the scanner — issues GET /ask?query=... and expects an answer. It got my descriptor back, which is not an answer, so the endpoint graded as not implemented. My documentation-instead-of-data response was indistinguishable from a stub.
The fix: GET with a query parameter runs the real pipeline (including SSE via ?streaming=true); a bare GET keeps the descriptor.
export async function GET(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const hasQuery = ["query", "q", "question"].some((k) => url.searchParams.get(k)?.trim());
  if (hasQuery) return handleAsk(req); // same pipeline as POST, SSE supported
  return descriptor(); // bare GET still teaches probing agents how to call it
}
Lesson: agents don't read your descriptor and adapt — they call you the way their reference client calls everyone. Implement the spec's client behavior, not your idea of good manners.
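One way to keep GET and POST on the same pipeline is to normalize both into a single query string before anything else runs. The sketch below is an assumption about what that normalization could look like, not the site's actual handleAsk: extractQuery and its parameters are hypothetical, and only the three parameter names (query, q, question) come from the handler above. Accepting the aliases on POST as well is also an assumption.

```typescript
// Hypothetical helper: normalize GET query strings and POST JSON bodies
// into one trimmed query string, or null when the caller sent none.
const QUERY_KEYS = ["query", "q", "question"] as const;

function extractQuery(method: string, rawUrl: string, body?: unknown): string | null {
  if (method.toUpperCase() === "GET") {
    const params = new URL(rawUrl).searchParams;
    for (const key of QUERY_KEYS) {
      const value = params.get(key)?.trim();
      if (value) return value;
    }
    return null; // bare GET: the caller should receive the descriptor
  }
  // Assumption: POST accepts the same keys from an already-parsed JSON object.
  if (typeof body === "object" && body !== null) {
    const record = body as Record<string, unknown>;
    for (const key of QUERY_KEYS) {
      const value = record[key];
      if (typeof value === "string" && value.trim()) return value.trim();
    }
  }
  return null;
}
```

With a helper like this, the route only has to branch once: a non-null result goes to the answer pipeline, and null falls back to the descriptor (for GET) or a 400 (for POST).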
2. The WebMCP tools nobody could see
I exposed in-browser tools via WebMCP (navigator.modelContext, Chrome early preview): ask_yonatan, browse_projects, book_intro_call. The natural React implementation is a client component:
"use client";
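import { useEffect } from "react";

export function WebMcpTools() {
  useEffect(() => {
    // navigator.modelContext is experimental and untyped, hence the cast.
    const mc = (navigator as any).modelContext;
    if (typeof mc?.provideContext !== "function") return;
    try {
      mc.provideContext({ tools: [/* ... */] });
    } catch {
      // Registration is best-effort; never break the page over it.
    }
  }, []);
  return null;
}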
Feature-detected, try/caught, tidy. Also invisible. Crawlers and scanners detect WebMCP support by scanning the unhydrated document — they don't run your bundle, and even executing browsers without the experimental API skip the registration entirely. A hydration-gated capability has no static trace. As far as any external observer was concerned, the site had no WebMCP support at all.
The fix is almost embarrassing: a server component that emits the registration as an inline <script>, so it's right there in the raw HTML:
// Server component: no hydration involved
const WEBMCP_REGISTRATION = String.raw`(function () {
  var mc = navigator.modelContext;
  if (!mc || typeof mc.provideContext !== "function") return;
  mc.provideContext({ tools: [ /* same tools, plain JS */ ] });
})();`;

export function WebMcpScript() {
  return <script dangerouslySetInnerHTML={{ __html: WEBMCP_REGISTRATION }} />;
}
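(Unless you have set up nonces, a Next.js app already needs 'unsafe-inline' in script-src for the framework's own inline scripts, so this adds no new policy cost. If your CSP is stricter, put the nonce on this script too.)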
Lesson: a runtime-registered agent capability needs a statically visible twin. If it can't be seen in curl output, it doesn't exist.
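The "visible in curl output" test can be approximated in a few lines. The sketch below is a heuristic of my own, not how any particular scanner is known to work: hasStaticWebMcpTrace is a hypothetical name, and it simply looks for an inline script in the raw HTML that mentions navigator.modelContext.

```typescript
// Hypothetical check: does the raw, unhydrated HTML contain an inline
// (non-src) script that references navigator.modelContext?
function hasStaticWebMcpTrace(rawHtml: string): boolean {
  const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptTag.exec(rawHtml)) !== null) {
    const attrs = match[1];
    const body = match[2];
    const isExternal = /\bsrc\s*=/i.test(attrs);
    if (!isExternal && body.includes("navigator.modelContext")) return true;
  }
  return false;
}
```

Fed the output of a plain fetch of the page, this returns false for the client-component version (the registration lives inside an external bundle) and true once the inline script is emitted by the server.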
3. The status page that 404'd in production only
I added /status — a tiny health page with content negotiation: JSON for Accept: application/json, HTML for humans. It worked in dev. In production: 404.
The culprit was next-intl's middleware. My site is internationalized, and the middleware matches every page-ish path and routes it into the [locale] tree. I'd even left myself a comment claiming route handlers escape the middleware. They don't. Bare /status was being locale-routed into /[locale]/status, which doesn't exist.
// src/proxy.ts: the matcher had to exclude /status explicitly
export const config = {
  matcher: "/((?!api|studio|trpc|_next|_vercel|status|.*\\..*).*)",
};
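A quick way to sanity-check a matcher like the one above is to run candidate paths through it before deploying. This is only an approximation: Next.js compiles matchers itself, and here the same pattern is anchored as a plain RegExp just to see which paths it would claim. middlewareRuns is a hypothetical helper name.

```typescript
// Same pattern string as the config above, anchored for standalone testing.
const MATCHER = "/((?!api|studio|trpc|_next|_vercel|status|.*\\..*).*)";
const matcherRegex = new RegExp(`^${MATCHER}$`);

// True when the (approximated) matcher would hand this path to the middleware.
function middlewareRuns(pathname: string): boolean {
  return matcherRegex.test(pathname);
}

for (const path of ["/", "/about", "/status", "/api/health", "/favicon.ico", "/statuses"]) {
  console.log(path, middlewareRuns(path) ? "-> middleware" : "-> bypass");
}
```

Note that the negative lookahead is prefix-based: excluding status also excludes any path that merely starts with it, such as /statuses. That is harmless here, but worth knowing before adding short names to the list.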
And there was a second bug hiding behind the first: I'd marked the route force-static for cheapness. Static means Next.js prerenders one variant — with an empty Accept header — and serves that frozen response to everyone. My own content negotiation could never run. It had to be force-dynamic with Vary: Accept.
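For reference, a negotiated handler along those lines might look like the sketch below. It is a minimal illustration, not the site's actual route: the STATUS payload and the HTML markup are placeholders, and the only parts taken from the text are force-dynamic, the Accept check, and the Vary header.

```typescript
// Route segment config: opt out of prerendering so the handler runs per
// request and can actually read the Accept header.
export const dynamic = "force-dynamic";

// Placeholder payload; the real /status fields are not shown in this post.
const STATUS = { ok: true };

export function GET(req: Request): Response {
  const accept = req.headers.get("accept") ?? "";
  const wantsJson = accept.includes("application/json");
  const body = wantsJson
    ? JSON.stringify(STATUS)
    : `<!doctype html><title>Status</title><p>ok: ${STATUS.ok}</p>`;
  return new Response(body, {
    headers: {
      "Content-Type": wantsJson ? "application/json" : "text/html; charset=utf-8",
      // Tell caches that the representation depends on the Accept header.
      Vary: "Accept",
    },
  });
}
```

The substring check is deliberately crude; it ignores q-values, which is usually fine for a two-representation endpoint where JSON is only served on explicit request.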
Lesson: i18n middleware and agent endpoints fight over the same URL space, and the agent endpoints lose silently. Every bare path you promise agents (/ask, /mcp, /status) needs an explicit carve-out — and content negotiation is incompatible with static prerendering by definition.
4. ai-train=yes, on purpose
Most advice about AI crawlers is defensive: block GPTBot, block CCBot, opt out of training. My robots.txt does the opposite, explicitly:
# Content Signals (https://contentsignals.org). Discovery is a goal for this
# site, so AI training is allowed — being in training corpora is the point.
Content-Signal: search=yes, ai-input=yes, ai-train=yes
This is a portfolio. Its entire job is to be found when someone — increasingly, someone's agent — asks "who can build a production RAG system?" An assistant that learned about my work during training, or retrieves it at inference time, is doing my marketing. Blocking that to protect content whose value is being known would be self-defeating.
That trade-off is deliberate and it isn't right for everyone — if your content is the product, sell it, don't donate it. But make the choice consciously. The default-deny advice assumes your content's value is captured by humans reading it on your domain. For a personal site optimizing for discovery, the math runs the other way. (The same file blocks Bytespider and other bulk scrapers that offer no assistant or search surface — welcome the tier that cites you, refuse the tier that doesn't.)
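To make the signal concrete, here is roughly how a consumer could read that line. This is a simplified sketch under my own assumptions, not a reference parser: parseContentSignals is a hypothetical name, it ignores User-agent grouping, and it lets later lines override earlier ones.

```typescript
type Signal = "yes" | "no";

// Hypothetical parser: collect every Content-Signal line in a robots.txt
// body into one key -> yes/no map. Unknown values are skipped.
function parseContentSignals(robotsTxt: string): Record<string, Signal> {
  const signals: Record<string, Signal> = {};
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const match = /^content-signal\s*:\s*(.*)$/i.exec(line);
    if (!match) continue;
    for (const pair of match[1].split(",")) {
      const [key, value] = pair.split("=").map((part) => part.trim().toLowerCase());
      if (key && (value === "yes" || value === "no")) signals[key] = value;
    }
  }
  return signals;
}
```

Run against the three lines above, it yields yes for search, ai-input, and ai-train, which is the whole point: the policy is stated once, in a form a crawler can read without guessing.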
The pattern under all four
"Agent-readable" turned out to be two separate properties, and I kept confusing them:
- Works when called correctly — the endpoint answers, the tool executes.
- Discoverable by something that won't call you correctly — no JS execution, spec-default request shapes, bare paths, raw HTML.
Each of the three bugs above failed on the second property while passing the first. My browser hydrated the WebMCP component; the scanner read static HTML. My docs explained POST /ask; the prober sent GET. Dev served /status; production's middleware ate it.
The only thing that caught any of this was pointing an external, JS-free, spec-literal scanner at the production domain and treating its score as the truth. Your own browser is the worst possible test client for agent-readability — it's too forgiving.
The full implementation (MCP server, NLWeb endpoint, llms.txt, .well-known tree) is covered in part one. Try the live surfaces: curl -X POST https://yonyon.ai/ask -H 'Content-Type: application/json' -d '{"query":"what does yonatan build?"}' or GET https://yonyon.ai/status with Accept: application/json.