The AI crawl-and-summarize wave has landed. Google's Gemini, OpenAI's GPT, Anthropic's Claude, Perplexity — they're all hitting your site. If you serve them HTML, they waste tokens parsing `<div>` soup. If you serve them markdown, they get clean context immediately.
Cloudflare published a guide to serving markdown for AI agents using Transform Rules and their AI gateway. Michael Wolson then wrote an excellent adaptation for the free plan using Transform Rules to set headers that content-negotiate at the edge.
I run SvelteKit on Cloudflare Workers. My situation is different — and simpler. Here's how I implemented it and why no Transform Rules were necessary.
## The Problem with Transform Rules on Workers
Wolson's approach is clever: use a Cloudflare Transform Rule to inject a custom header based on the Accept header, then check that header in your application to decide the response format. This solves a real problem for static sites: CDN caching. Without differentiated cache keys, a cached HTML response gets served to a markdown-requesting bot, or vice versa.
Workers don't have this problem. Every request executes the Worker — there's no CDN cache layer sitting in front of dynamic routes on Pages Functions. The Worker sees the raw Accept header directly and can branch on it before any rendering happens.
## Architecture
The implementation hooks into SvelteKit's server hooks — the middleware layer that runs before any route handler.
```mermaid
flowchart TD
    A[Incoming Request] --> B{"Accept: text/markdown?<br/>or ?format=md"}
    B -->|Yes| C[handleMarkdownRequest]
    C --> D{Route matched?}
    D -->|Yes| E["Fetch API data via<br/>Service Binding"]
    E --> F[Format as markdown]
    F --> G["Return Response<br/>text/markdown + headers"]
    D -->|No| H[Fall through to SSR]
    B -->|No| H
    H --> I[Normal HTML Response]
    G --> J[Add security headers]
    I --> J
    J --> K[Response to client]
```
The key insight: all my content already lives in the API with raw markdown fields. Posts have `content` (markdown). Articles have `content` (markdown). Pages have `content` (markdown). No HTML-to-markdown conversion needed — I just skip the rendering step entirely.
## Detection
Two signals trigger markdown responses:
- The `Accept: text/markdown` header (per the emerging convention)
- A `?format=md` query parameter (for easy browser testing)
```typescript
function wantsMarkdown(request: Request): boolean {
	if (request.headers.get('accept')?.includes('text/markdown')) return true;
	const url = new URL(request.url);
	return url.searchParams.get('format') === 'md';
}
```
## The Hook
In `hooks.server.ts`, the markdown check runs after legacy redirects but before `resolve(event)` — meaning SvelteKit never renders any Svelte components for markdown requests:
```typescript
if (wantsMarkdown(event.request)) {
	const mdResponse = await handleMarkdownRequest(event);
	if (mdResponse) {
		// Security headers still apply
		mdResponse.headers.set('X-Content-Type-Options', 'nosniff');
		mdResponse.headers.set('X-Frame-Options', 'DENY');
		// ...
		return mdResponse;
	}
}

// Unmatched routes fall through to normal HTML
const response = await resolve(event);
```
Routes without a markdown handler (like `/security` or `/tweet-archive`) fall through to normal SSR. No 406 errors, no broken pages.
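The fall-through behavior can be sketched as follows. This is a minimal illustration, not the actual implementation: the route table, the `markdownResponse` helper, and the handler signatures are all assumptions.

```typescript
// Hypothetical sketch: each route is a regex paired with an async handler.
// Names and signatures are illustrative, not from the real codebase.
type MarkdownHandler = (match: RegExpMatchArray) => Promise<Response>;

// Minimal helper that wraps a markdown body in a Response
function markdownResponse(body: string): Response {
	return new Response(body, {
		headers: { 'content-type': 'text/markdown; charset=utf-8' },
	});
}

const markdownRoutes: Array<[RegExp, MarkdownHandler]> = [
	// One example route; the real table covers /, /now, /posts, /articles, etc.
	[/^\/posts\/([^/]+)$/, async (m) => markdownResponse(`# Post ${m[1]}`)],
];

// Returns a Response for matched routes, or null so the caller
// falls through to normal SSR (no 406, no broken pages).
async function handleMarkdownRequest(event: { url: URL }): Promise<Response | null> {
	for (const [pattern, handler] of markdownRoutes) {
		const match = event.url.pathname.match(pattern);
		if (match) return handler(match);
	}
	return null;
}
```

Returning `null` (rather than a 406) is what keeps unhandled routes like `/security` rendering normally.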
## Route Handlers
Each route is a regex pattern matched against the pathname, paired with a handler that fetches from the API and formats the response:
| Route | Data Source |
|---|---|
| `/` | Recent posts + presence status |
| `/now` | Full /now page data |
| `/posts` | Published micro posts (last 50) |
| `/posts/:slug` | Single post (raw `content` field) |
| `/articles` | Published article list |
| `/articles/:slug` | Single article (raw `content` field) |
| `/pages/:slug` | Single page (raw `content` field) |
Individual content pages return the markdown as-is with minimal front matter:
```markdown
# Article Title
Published: 2026-02-18 · Stream: tech
URL: https://cogley.jp/articles/some-slug

[raw markdown content from API]
```
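A formatter for that shape could look like this. It's a sketch: the `Article` field names are assumptions based on the route table, not the real API schema.

```typescript
// Hypothetical article shape; real API field names may differ.
interface Article {
	title: string;
	published: string;
	stream: string;
	slug: string;
	content: string; // raw markdown from the API
}

// Emit minimal front matter followed by the raw markdown, unmodified.
function formatArticle(a: Article): string {
	return [
		`# ${a.title}`,
		`Published: ${a.published} · Stream: ${a.stream}`,
		`URL: https://cogley.jp/articles/${a.slug}`,
		'',
		a.content,
	].join('\n');
}
```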
List pages provide a structured index with truncated previews and URLs.
## The Full Dump: `/llms-full.txt`
For agents that want everything at once, `/llms-full.txt` returns all articles, all pages, the now page, and the 50 most recent posts stitched into a single markdown document. This endpoint always returns markdown regardless of the `Accept` header — it's explicitly for machine consumption.
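The stitching itself is simple concatenation. A minimal sketch (section headings and separators are assumptions, not the actual output format):

```typescript
// Join pre-formatted markdown sections into one document,
// separated by horizontal rules so agents can split them back apart.
function buildFullDump(sections: { heading: string; body: string }[]): string {
	return sections.map((s) => `## ${s.heading}\n\n${s.body}`).join('\n\n---\n\n');
}
```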
Combined with the existing `/llms.txt` (which describes the site structure and available sections), agents have a complete discovery path:
```mermaid
flowchart LR
    A[Agent discovers site] --> B[GET /llms.txt]
    B --> C[Understands site structure]
    C --> D{Need everything?}
    D -->|Yes| E[GET /llms-full.txt]
    D -->|No| F["GET /articles/specific-slug<br/>Accept: text/markdown"]
```
## Token Estimation
Every markdown response includes an `x-markdown-tokens` header with a rough token count estimate (`content.length / 4`). It's not precise — real tokenization varies by model — but it gives agents a quick way to gauge response size before processing.
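The heuristic in code, with a helper attaching it to a response. The `withTokenHeader` helper is illustrative; only the chars-per-token ratio comes from the post.

```typescript
// ~4 characters per token is a coarse English-text heuristic; real
// tokenizers (tiktoken, SentencePiece, ...) vary by model.
function estimateTokens(content: string): number {
	return Math.ceil(content.length / 4);
}

// Hypothetical helper: attach the estimate as a response header.
function withTokenHeader(body: string): Response {
	const res = new Response(body, {
		headers: { 'content-type': 'text/markdown; charset=utf-8' },
	});
	res.headers.set('x-markdown-tokens', String(estimateTokens(body)));
	return res;
}
```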
## Why This Approach Works for Workers
The Cloudflare blog and Wolson's approach both solve a caching problem that Workers don't have:
| Concern | Static/CDN Sites | Workers/Pages Functions |
|---|---|---|
| CDN cache collision | Real problem: same URL, different `Accept` | Non-issue: Worker always executes |
| Transform Rules needed | Yes, to differentiate cache keys | No |
| `Vary: Accept` | Insufficient alone (CDN ignores it) | Works correctly (no CDN layer) |
| Implementation | Edge rules + origin logic | Just origin logic |
Setting `Vary: Accept` is still good practice for HTTP correctness, but it's not load-bearing the way it would be on a CDN-cached static site.
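Concretely, that means setting the header when building the markdown response. A sketch (the helper name is hypothetical):

```typescript
// Mark the response body as dependent on the request's Accept header, so any
// intermediary cache that does exist keys on it. Harmless where there is no cache.
function negotiatedMarkdown(body: string): Response {
	return new Response(body, {
		headers: {
			'content-type': 'text/markdown; charset=utf-8',
			'vary': 'Accept',
		},
	});
}
```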
## Serving the Profile Site Too
My profile site at rick.cogley.jp uses the same pattern. The profile sections only store `content_html` (not raw markdown), so the handler uses `stripHtml()` to extract clean text. Not as rich as raw markdown, but vastly better than HTML tags for an AI agent trying to understand who I am.
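A naive version of such a helper might look like this. The real `stripHtml()` may well differ; this regex approach is only safe for trusted CMS output, and is not an HTML sanitizer.

```typescript
// Drop tags, decode a few common entities, collapse whitespace.
// Sufficient for extracting readable text from trusted content_html;
// NOT a security measure against untrusted input.
function stripHtml(html: string): string {
	return html
		.replace(/<[^>]*>/g, ' ')
		.replace(/&amp;/g, '&')
		.replace(/&lt;/g, '<')
		.replace(/&gt;/g, '>')
		.replace(/&quot;/g, '"')
		.replace(/\s+/g, ' ')
		.trim();
}
```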
## Testing It
```bash
# Markdown via Accept header
curl -s -H "Accept: text/markdown" https://cogley.jp/now

# Markdown via query param
curl -s "https://cogley.jp/now?format=md"

# Check headers
curl -sI -H "Accept: text/markdown" https://cogley.jp/posts

# Full site dump
curl -s https://cogley.jp/llms-full.txt

# Normal HTML (no markdown signal)
curl -s https://cogley.jp/now
```
## What I'd Do Differently
If I were building this from scratch, I'd store all content as markdown and render HTML on demand — which is essentially what this site already does. The API already has markdown fields everywhere because that's what the editor produces; the HTML rendering happens in SvelteKit route handlers using `marked`.
For sites where content is HTML-first (CMS output, rich text editors), you'd need an HTML-to-markdown conversion step. Libraries like `turndown` handle this, but the output won't be as clean as source markdown. If you're designing a new system, store the markdown.
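To see why conversion is the messier path, here is a toy converter covering only a handful of inline tags. Anything real should use a proper library like `turndown`; this sketch ignores nesting, escaping, and block structure entirely.

```typescript
// Toy HTML-to-markdown conversion for a few tags, to illustrate the idea.
// A real converter must handle nesting, attribute variations, entity
// escaping, lists, tables, and code blocks -- use turndown instead.
function htmlToMarkdown(html: string): string {
	return html
		.replace(/<strong>(.*?)<\/strong>/g, '**$1**')
		.replace(/<em>(.*?)<\/em>/g, '*$1*')
		.replace(/<a href="([^"]*)">(.*?)<\/a>/g, '[$2]($1)')
		.replace(/<p>(.*?)<\/p>/g, '$1\n\n')
		.replace(/<[^>]+>/g, '') // drop anything unrecognized
		.trim();
}
```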
## References
- llms.txt specification
Originally published at cogley.jp
Rick Cogley is CEO of eSolia Inc., providing bilingual IT outsourcing and infrastructure services in Tokyo, Japan.