Rick Cogley

Posted on • Originally published at cogley.jp

Markdown for Agents on SvelteKit + Cloudflare Workers

The AI crawl-and-summarize wave has landed. Google's Gemini, OpenAI's GPT, Anthropic's Claude, Perplexity — they're all hitting your site. If you serve them HTML, they waste tokens parsing tag soup. If you serve them markdown, they get clean context immediately.

Cloudflare published a guide to serving markdown for AI agents using Transform Rules and their AI gateway. Michael Wolson then wrote an excellent adaptation for the free plan using Transform Rules to set headers that content-negotiate at the edge.

I run SvelteKit on Cloudflare Workers. My situation is different — and simpler. Here's how I implemented it and why no Transform Rules were necessary.

The Problem with Transform Rules on Workers

Wolson's approach is clever: use a Cloudflare Transform Rule to inject a custom header based on the Accept header, then check that header in your application to decide the response format. This solves a real problem for static sites: CDN caching. Without differentiated cache keys, a cached HTML response gets served to a markdown-requesting bot, or vice versa.

Workers don't have this problem. Every request executes the Worker — there's no CDN cache layer sitting in front of dynamic routes on Pages Functions. The Worker sees the raw Accept header directly and can branch on it before any rendering happens.

Architecture

The implementation hooks into SvelteKit's server hooks — the middleware layer that runs before any route handler.

```mermaid
flowchart TD
    A[Incoming Request] --> B{Accept: text/markdown?<br/>or ?format=md}
    B -->|Yes| C[handleMarkdownRequest]
    C --> D{Route matched?}
    D -->|Yes| E[Fetch API data via<br/>Service Binding]
    E --> F[Format as markdown]
    F --> G[Return Response<br/>text/markdown + headers]
    D -->|No| H[Fall through to SSR]
    B -->|No| H
    H --> I[Normal HTML Response]
    G --> J[Add security headers]
    I --> J
    J --> K[Response to client]
```

The key insight: all my content already lives in the API with raw markdown fields. Posts have content (markdown). Articles have content (markdown). Pages have content (markdown). No HTML-to-markdown conversion needed — I just skip the rendering step entirely.

Detection

Two signals trigger markdown responses:

  1. An `Accept: text/markdown` header (per the emerging convention)
  2. A `?format=md` query parameter (for easy browser testing)

```typescript
function wantsMarkdown(request: Request): boolean {
  if (request.headers.get('accept')?.includes('text/markdown')) return true;
  const url = new URL(request.url);
  return url.searchParams.get('format') === 'md';
}
```

The Hook

In `hooks.server.ts`, the markdown check runs after legacy redirects but before `resolve(event)` — meaning SvelteKit never renders any Svelte components for markdown requests:

```typescript
if (wantsMarkdown(event.request)) {
  const mdResponse = await handleMarkdownRequest(event);
  if (mdResponse) {
    // Security headers still apply
    mdResponse.headers.set('X-Content-Type-Options', 'nosniff');
    mdResponse.headers.set('X-Frame-Options', 'DENY');
    // ...
    return mdResponse;
  }
}
// Unmatched routes fall through to normal HTML
const response = await resolve(event);
```

Routes without a markdown handler (like /security or /tweet-archive) fall through to normal SSR. No 406 errors, no broken pages.

Route Handlers

Each route is a regex pattern matched against the pathname, paired with a handler that fetches from the API and formats the response:

| Route | Data Source |
| --- | --- |
| `/` | Recent posts + presence status |
| `/now` | Full /now page data |
| `/posts` | Published micro posts (last 50) |
| `/posts/:slug` | Single post — raw content field |
| `/articles` | Published article list |
| `/articles/:slug` | Single article — raw content field |
| `/pages/:slug` | Single page — raw content field |
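The regex-plus-handler pattern described above can be sketched as follows. This is a minimal illustration, not the post's actual code; the route shapes come from the table, while the handler bodies and names (`MarkdownRoute`, `matchMarkdownRoute`) are hypothetical.

```typescript
// Each route pairs a pathname regex with a handler that produces markdown.
type MarkdownHandler = (match: RegExpMatchArray) => string;

interface MarkdownRoute {
  pattern: RegExp;
  handler: MarkdownHandler;
}

// Illustrative handlers; the real ones fetch from the API via a Service Binding.
const routes: MarkdownRoute[] = [
  { pattern: /^\/posts\/([^/]+)$/, handler: (m) => `post: ${m[1]}` },
  { pattern: /^\/articles\/([^/]+)$/, handler: (m) => `article: ${m[1]}` },
];

// Returns formatted markdown for a matched route, or null to fall through to SSR.
function matchMarkdownRoute(pathname: string): string | null {
  for (const route of routes) {
    const match = pathname.match(route.pattern);
    if (match) return route.handler(match);
  }
  return null;
}
```

The null return is what makes the fall-through behavior work: the hook only short-circuits when a handler actually matched.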

Individual content pages return the markdown as-is with minimal front matter:

```markdown
# Article Title

Published: 2026-02-18 · Stream: tech
URL: https://cogley.jp/articles/some-slug

[raw markdown content from API]
```

List pages provide a structured index with truncated previews and URLs.
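A preview truncator for those list pages might look like this. The `preview` helper is a hypothetical sketch (the post doesn't show its implementation): cut at a word boundary and mark the truncation.

```typescript
// Truncate raw markdown for a list-page preview, breaking at a word
// boundary when possible and appending an ellipsis marker.
function preview(markdown: string, maxLength = 200): string {
  if (markdown.length <= maxLength) return markdown;
  const cut = markdown.slice(0, maxLength);
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```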

The Full Dump: /llms-full.txt

For agents that want everything at once, /llms-full.txt returns all articles, all pages, the now page, and the 50 most recent posts stitched into a single markdown document. This endpoint always returns markdown regardless of the Accept header — it's explicitly for machine consumption.
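The stitching step can be sketched like this, assuming each section (articles, pages, now, recent posts) has already been fetched and formatted; the `Section` shape and `buildFullDump` name are illustrative, not the post's actual code.

```typescript
// Join pre-formatted sections into one markdown document with
// horizontal rules between them, as served at /llms-full.txt.
interface Section {
  heading: string;
  body: string;
}

function buildFullDump(sections: Section[]): string {
  return sections
    .map((s) => `## ${s.heading}\n\n${s.body}`)
    .join('\n\n---\n\n');
}
```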

Combined with the existing /llms.txt (which describes the site structure and available sections), agents have a complete discovery path:

`mermaid
flowchart LR
A[Agent discovers site] --> B[GET /llms.txt]
B --> C[Understands site structure]
C --> D{Need everything?}
D -->|Yes| E[GET /llms-full.txt]
D -->|No| F[GET /articles/specific-slug
Accept: text/markdown]
`

Token Estimation

Every markdown response includes an x-markdown-tokens header with a rough token count estimate (content.length / 4). It's not precise — real tokenization varies by model — but it gives agents a quick way to gauge response size before processing.
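The length/4 heuristic from the paragraph above is a one-liner; attaching it as a header might look like this (the helper name is illustrative, the header name is from the post):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Real tokenization varies by model; this is only a size hint.
function estimateTokens(content: string): number {
  return Math.ceil(content.length / 4);
}

// Attached to the response, e.g.:
// headers.set('x-markdown-tokens', String(estimateTokens(markdown)));
```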

Why This Approach Works for Workers

The Cloudflare blog and Wolson's approach both solve a caching problem that Workers don't have:

| Concern | Static/CDN Sites | Workers/Pages Functions |
| --- | --- | --- |
| CDN cache collision | Real problem — same URL, different Accept | Non-issue — Worker always executes |
| Transform Rules needed | Yes, to differentiate cache keys | No |
| Vary: Accept | Insufficient alone (CDN ignores it) | Works correctly (no CDN layer) |
| Implementation | Edge rules + origin logic | Just origin logic |

Setting Vary: Accept is still good practice for HTTP correctness, but it's not load-bearing the way it would be on a CDN-cached static site.
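Adding the header is cheap either way. A sketch using the standard `Headers` and `Response` classes available in the Workers runtime (the wrapper function itself is illustrative):

```typescript
// Copy the response with Vary: Accept appended, signalling to any
// intermediary cache that the body depends on the Accept header.
function withVaryAccept(response: Response): Response {
  const headers = new Headers(response.headers);
  headers.append('Vary', 'Accept');
  return new Response(response.body, {
    status: response.status,
    headers,
  });
}
```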

Serving the Profile Site Too

My profile site at rick.cogley.jp uses the same pattern. The profile sections only store content_html (not raw markdown), so the handler uses stripHtml() to extract clean text. Not as rich as raw markdown, but vastly better than HTML tags for an AI agent trying to understand who I am.
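A minimal `stripHtml` along those lines might look like this. The post doesn't show its implementation, so this is an assumed sketch: drop tags with a regex, decode a few common entities, and collapse whitespace. (A regex is fine for trusted CMS output; a real HTML parser is safer for arbitrary input.)

```typescript
// Reduce stored content_html to plain text for markdown responses.
function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, ' ')   // drop tags
    .replace(/&amp;/g, '&')     // decode common entities
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&nbsp;/g, ' ')
    .replace(/\s+/g, ' ')       // collapse whitespace
    .trim();
}
```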

Testing It

```bash
# Markdown via Accept header
curl -s -H "Accept: text/markdown" https://cogley.jp/now

# Markdown via query param
curl -s "https://cogley.jp/now?format=md"

# Check headers
curl -sI -H "Accept: text/markdown" https://cogley.jp/posts

# Full site dump
curl -s https://cogley.jp/llms-full.txt

# Normal HTML (no markdown signal)
curl -s https://cogley.jp/now
```

What I'd Do Differently

If I were building this from scratch, I'd store all content as markdown and render HTML on demand — which is essentially what this site already does. The /api already has markdown fields everywhere because that's what the editor produces. The HTML rendering happens in SvelteKit route handlers using marked.

For sites where content is HTML-first (CMS output, rich text editors), you'd need an HTML-to-markdown conversion step. Libraries like turndown handle this, but the output won't be as clean as source markdown. If you're designing a new system, store the markdown.

References

- llms.txt specification


Rick Cogley is CEO of eSolia Inc., providing bilingual IT outsourcing and infrastructure services in Tokyo, Japan.
