The AI crawl-and-summarize wave has landed. Google's Gemini, OpenAI's GPT, Anthropic's Claude, Perplexity — they're all hitting your site. If you serve them HTML, they waste tokens parsing `<div>` soup. If you serve them markdown, they get clean context immediately.
Cloudflare published a guide to serving markdown for AI agents using Transform Rules and their AI gateway. Michael Wolson then wrote an excellent adaptation for the free plan using Transform Rules to set headers that content-negotiate at the edge.
I run SvelteKit on Cloudflare Workers. My situation is different — and simpler. Here's how I implemented it and why no Transform Rules were necessary.
## The Problem with Transform Rules on Workers
Wolson's approach is clever: use a Cloudflare Transform Rule to inject a custom header based on the Accept header, then check that header in your application to decide the response format. This solves a real problem for static sites: CDN caching. Without differentiated cache keys, a cached HTML response gets served to a markdown-requesting bot, or vice versa.
Workers don't have this problem. Every request executes the Worker — there's no CDN cache layer sitting in front of dynamic routes on Pages Functions. The Worker sees the raw Accept header directly and can branch on it before any rendering happens.
## Architecture
The implementation hooks into SvelteKit's server hooks — the middleware layer that runs before any route handler.
```mermaid
flowchart TD
    A[Incoming Request] --> B{"Accept: text/markdown?<br/>or ?format=md"}
    B -->|Yes| C[handleMarkdownRequest]
    C --> D{Route matched?}
    D -->|Yes| E["Fetch API data via<br/>Service Binding"]
    E --> F[Format as markdown]
    F --> G["Return Response<br/>text/markdown + headers"]
    D -->|No| H[Fall through to SSR]
    B -->|No| H
    H --> I[Normal HTML Response]
    G --> J[Add security headers]
    I --> J
    J --> K[Response to client]
```
The key insight: all my content already lives in the API with raw markdown fields. Posts have `content` (markdown). Articles have `content` (markdown). Pages have `content` (markdown). No HTML-to-markdown conversion needed — I just skip the rendering step entirely.
## Detection
Two signals trigger markdown responses:
- The `Accept: text/markdown` header (per the emerging convention)
- A `?format=md` query parameter (for easy browser testing)
```typescript
function wantsMarkdown(request: Request): boolean {
	if (request.headers.get('accept')?.includes('text/markdown')) return true;
	const url = new URL(request.url);
	return url.searchParams.get('format') === 'md';
}
```
## The Hook
In `hooks.server.ts`, the markdown check runs after legacy redirects but before `resolve(event)` — meaning SvelteKit never renders any Svelte components for markdown requests:
```typescript
if (wantsMarkdown(event.request)) {
	const mdResponse = await handleMarkdownRequest(event);
	if (mdResponse) {
		// Security headers still apply
		mdResponse.headers.set('X-Content-Type-Options', 'nosniff');
		mdResponse.headers.set('X-Frame-Options', 'DENY');
		// ...
		return mdResponse;
	}
}

// Unmatched routes fall through to normal HTML
const response = await resolve(event);
```
Routes without a markdown handler (like `/security` or `/tweet-archive`) fall through to normal SSR. No 406 errors, no broken pages.
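The fall-through behavior can be sketched as follows. This is a minimal illustration, not the actual implementation: the route table, the `markdownResponse` helper, and the handler signatures are all assumptions.

```typescript
// Hypothetical sketch: each route is a regex paired with an async handler.
// Names and signatures are illustrative, not from the real codebase.
type MarkdownHandler = (match: RegExpMatchArray) => Promise<Response>;

// Minimal helper that wraps a markdown body in a Response
function markdownResponse(body: string): Response {
	return new Response(body, {
		headers: { 'content-type': 'text/markdown; charset=utf-8' },
	});
}

const markdownRoutes: Array<[RegExp, MarkdownHandler]> = [
	// One example route; the real table covers /, /now, /posts, /articles, etc.
	[/^\/posts\/([^/]+)$/, async (m) => markdownResponse(`# Post ${m[1]}`)],
];

// Returns a Response for matched routes, or null so the caller
// falls through to normal SSR (no 406, no broken pages).
async function handleMarkdownRequest(event: { url: URL }): Promise<Response | null> {
	for (const [pattern, handler] of markdownRoutes) {
		const match = event.url.pathname.match(pattern);
		if (match) return handler(match);
	}
	return null;
}
```

Returning `null` (rather than a 406) is what keeps unhandled routes like `/security` rendering normally.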
## Route Handlers
Each route is a regex pattern matched against the pathname, paired with a handler that fetches from the API and formats the response:
| Route | Data Source |
|---|---|
| `/` | Recent posts + presence status |
| `/now` | Full /now page data |
| `/posts` | Published micro posts (last 50) |
| `/posts/:slug` | Single post (raw `content` field) |
| `/articles` | Published article list |
| `/articles/:slug` | Single article (raw `content` field) |
| `/pages/:slug` | Single page (raw `content` field) |
Individual content pages return the markdown as-is with minimal front matter:
```markdown
# Article Title
Published: 2026-02-18 · Stream: tech
URL: https://cogley.jp/articles/some-slug

[raw markdown content from API]
```
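A formatter for that shape could look like this. It's a sketch: the `Article` field names are assumptions based on the route table, not the real API schema.

```typescript
// Hypothetical article shape; real API field names may differ.
interface Article {
	title: string;
	published: string;
	stream: string;
	slug: string;
	content: string; // raw markdown from the API
}

// Emit minimal front matter followed by the raw markdown, unmodified.
function formatArticle(a: Article): string {
	return [
		`# ${a.title}`,
		`Published: ${a.published} · Stream: ${a.stream}`,
		`URL: https://cogley.jp/articles/${a.slug}`,
		'',
		a.content,
	].join('\n');
}
```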
List pages provide a structured index with truncated previews and URLs.
## The Full Dump: `/llms-full.txt`
For agents that want everything at once, `/llms-full.txt` returns all articles, all pages, the now page, and the 50 most recent posts stitched into a single markdown document. This endpoint always returns markdown regardless of the `Accept` header — it's explicitly for machine consumption.
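The stitching itself is simple concatenation. A minimal sketch (section headings and separators are assumptions, not the actual output format):

```typescript
// Join pre-formatted markdown sections into one document,
// separated by horizontal rules so agents can split them back apart.
function buildFullDump(sections: { heading: string; body: string }[]): string {
	return sections.map((s) => `## ${s.heading}\n\n${s.body}`).join('\n\n---\n\n');
}
```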
Combined with the existing `/llms.txt` (which describes the site structure and available sections), agents have a complete discovery path:
```mermaid
flowchart LR
    A[Agent discovers site] --> B[GET /llms.txt]
    B --> C[Understands site structure]
    C --> D{Need everything?}
    D -->|Yes| E[GET /llms-full.txt]
    D -->|No| F["GET /articles/specific-slug<br/>Accept: text/markdown"]
```
## Token Estimation
Every markdown response includes an `x-markdown-tokens` header with a rough token count estimate (`content.length / 4`). It's not precise — real tokenization varies by model — but it gives agents a quick way to gauge response size before processing.
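The heuristic in code, with a helper attaching it to a response. The `withTokenHeader` helper is illustrative; only the chars-per-token ratio comes from the post.

```typescript
// ~4 characters per token is a coarse English-text heuristic; real
// tokenizers (tiktoken, SentencePiece, ...) vary by model.
function estimateTokens(content: string): number {
	return Math.ceil(content.length / 4);
}

// Hypothetical helper: attach the estimate as a response header.
function withTokenHeader(body: string): Response {
	const res = new Response(body, {
		headers: { 'content-type': 'text/markdown; charset=utf-8' },
	});
	res.headers.set('x-markdown-tokens', String(estimateTokens(body)));
	return res;
}
```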
## Why This Approach Works for Workers
The Cloudflare blog and Wolson's approach both solve a caching problem that Workers don't have:
| Concern | Static/CDN Sites | Workers/Pages Functions |
|---|---|---|
| CDN cache collision | Real problem: same URL, different `Accept` | Non-issue: Worker always executes |
| Transform Rules needed | Yes, to differentiate cache keys | No |
| `Vary: Accept` | Insufficient alone (CDN ignores it) | Works correctly (no CDN layer) |
| Implementation | Edge rules + origin logic | Just origin logic |
Setting `Vary: Accept` is still good practice for HTTP correctness, but it's not load-bearing the way it would be on a CDN-cached static site.
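Concretely, that means setting the header when building the markdown response. A sketch (the helper name is hypothetical):

```typescript
// Mark the response body as dependent on the request's Accept header, so any
// intermediary cache that does exist keys on it. Harmless where there is no cache.
function negotiatedMarkdown(body: string): Response {
	return new Response(body, {
		headers: {
			'content-type': 'text/markdown; charset=utf-8',
			'vary': 'Accept',
		},
	});
}
```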
## Serving the Profile Site Too
My profile site at rick.cogley.jp uses the same pattern. The profile sections only store `content_html` (not raw markdown), so the handler uses `stripHtml()` to extract clean text. Not as rich as raw markdown, but vastly better than HTML tags for an AI agent trying to understand who I am.
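A naive version of such a helper might look like this. The real `stripHtml()` may well differ; this regex approach is only safe for trusted CMS output, and is not an HTML sanitizer.

```typescript
// Drop tags, decode a few common entities, collapse whitespace.
// Sufficient for extracting readable text from trusted content_html;
// NOT a security measure against untrusted input.
function stripHtml(html: string): string {
	return html
		.replace(/<[^>]*>/g, ' ')
		.replace(/&amp;/g, '&')
		.replace(/&lt;/g, '<')
		.replace(/&gt;/g, '>')
		.replace(/&quot;/g, '"')
		.replace(/\s+/g, ' ')
		.trim();
}
```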
## Testing It
```bash
# Markdown via Accept header
curl -s -H "Accept: text/markdown" https://cogley.jp/now

# Markdown via query param
curl -s "https://cogley.jp/now?format=md"

# Check headers
curl -sI -H "Accept: text/markdown" https://cogley.jp/posts

# Full site dump
curl -s https://cogley.jp/llms-full.txt

# Normal HTML (no markdown signal)
curl -s https://cogley.jp/now
```
## What I'd Do Differently
If I were building this from scratch, I'd store all content as markdown and render HTML on demand — which is essentially what this site already does. The API already has markdown fields everywhere because that's what the editor produces; the HTML rendering happens in SvelteKit route handlers using `marked`.
For sites where content is HTML-first (CMS output, rich text editors), you'd need an HTML-to-markdown conversion step. Libraries like `turndown` handle this, but the output won't be as clean as source markdown. If you're designing a new system, store the markdown.
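To see why conversion is the messier path, here is a toy converter covering only a handful of inline tags. Anything real should use a proper library like `turndown`; this sketch ignores nesting, escaping, and block structure entirely.

```typescript
// Toy HTML-to-markdown conversion for a few tags, to illustrate the idea.
// A real converter must handle nesting, attribute variations, entity
// escaping, lists, tables, and code blocks -- use turndown instead.
function htmlToMarkdown(html: string): string {
	return html
		.replace(/<strong>(.*?)<\/strong>/g, '**$1**')
		.replace(/<em>(.*?)<\/em>/g, '*$1*')
		.replace(/<a href="([^"]*)">(.*?)<\/a>/g, '[$2]($1)')
		.replace(/<p>(.*?)<\/p>/g, '$1\n\n')
		.replace(/<[^>]+>/g, '') // drop anything unrecognized
		.trim();
}
```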
## References
- llms.txt specification
Originally published at cogley.jp
Rick Cogley is CEO of eSolia Inc., providing bilingual IT outsourcing and infrastructure services in Tokyo, Japan.