Cloudflare recently shipped Markdown for Agents: if a client sends Accept: text/markdown, Cloudflare can fetch your HTML and return a Markdown variant.
That idea stuck with me, so I did some vibe coding over a few evenings and built a self-hostable NGINX dynamic module that does something similar on your own infra.
This is a very early / starter prototype — mainly meant to help people try the workflow and share feedback.
Repo: https://github.com/cnkang/nginx-markdown-for-agents
What it does
Only when the client explicitly asks for Markdown:
- Browser (Accept: text/html) → original HTML
- Agent (Accept: text/markdown) → NGINX converts upstream HTML to Markdown and returns text/markdown
No application changes. This sits entirely at the reverse-proxy layer.
Best for: docs / blogs / news / KB pages
Not for: APIs, streaming responses, or authenticated pages (unless you really know what you’re doing with caching).
Why you might care
Agents and LLM tools often fetch full HTML and end up spending tokens on:
- navigation / footer / cookie banners
- layout markup
- scripts and noisy attributes
A Markdown variant can make downstream parsing cheaper and more predictable.
Quick try (2 minutes)
1) Install (prebuilt module)
curl -sSL https://raw.githubusercontent.com/cnkang/nginx-markdown-for-agents/main/tools/install.sh | sudo bash
sudo nginx -t && sudo nginx -s reload
Note: dynamic modules must match your exact NGINX patch version (nginx -v). If there isn’t a matching build, you may need to compile.
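To see which version a prebuilt module has to match, a small sketch like this pulls the version number out of the `nginx -v` output. The helper names are hypothetical (they are not part of the repo's tooling), and the parsing assumes the stock `nginx version: nginx/X.Y.Z` format; forks that print a different product name will yield an empty result.

```shell
#!/usr/bin/env bash
# Sketch: extract the X.Y.Z version from `nginx -v` style output.
# parse_nginx_version / installed_nginx_version are hypothetical helpers.

parse_nginx_version() {
  # $1 = one line such as "nginx version: nginx/1.24.0 (Ubuntu)"
  printf '%s\n' "$1" | sed -n 's|.*nginx/\([0-9][0-9.]*\).*|\1|p'
}

installed_nginx_version() {
  # nginx -v writes to stderr, so redirect it before capturing.
  parse_nginx_version "$(nginx -v 2>&1)"
}

# Example: print the version a prebuilt module has to match.
# installed_nginx_version
```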
2) Verify content negotiation
# Markdown variant
curl -sD - -o /dev/null -H "Accept: text/markdown" http://localhost:8080/ | grep -iE 'content-type|vary'
# expect:
# content-type: text/markdown; charset=utf-8
# vary: Accept
# HTML variant
curl -sD - -o /dev/null -H "Accept: text/html" http://localhost:8080/ | grep -i 'content-type'
3) See the body
curl -s -H "Accept: text/markdown" http://localhost:8080/ | head -40
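To get a rough feel for how much smaller the Markdown variant is on your own pages, something like the sketch below compares the body size of both variants. The function names are made up for illustration, and byte counts are only a loose stand-in for token counts.

```shell
#!/usr/bin/env bash
# Sketch: compare the body size of the HTML and Markdown variants of one URL.
# All function names here are hypothetical helpers.

variant_bytes() {
  # $1 = URL, $2 = Accept header value; prints the response body size in bytes.
  curl -s -H "Accept: $2" "$1" | wc -c | tr -d ' '
}

savings_pct() {
  # $1 = HTML bytes, $2 = Markdown bytes; prints the integer percentage saved.
  if [ "$1" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( ($1 - $2) * 100 / $1 ))
}

compare_variants() {
  # $1 = URL; prints both sizes and the relative saving.
  local url="$1" html md
  html=$(variant_bytes "$url" "text/html")
  md=$(variant_bytes "$url" "text/markdown")
  echo "html=${html}B markdown=${md}B saved=$(savings_pct "$html" "$md")%"
}

# Example (assumes the module is serving on localhost:8080 as in the steps above):
# compare_variants http://localhost:8080/
```

A negative percentage means the Markdown body came out larger than the HTML, which is worth reporting as an edge case.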
Minimal NGINX config
Start small — enable it on one route first.
load_module modules/ngx_http_markdown_filter_module.so;

http {
    markdown_filter off;

    server {
        listen 8080;

        location /docs/ {
            markdown_filter on;
            # Recommended: avoid upstream compression for clean conversion
            proxy_set_header Accept-Encoding "";
            proxy_pass http://backend;
        }
    }
}
A few knobs (optional)
- Fail open (recommended for trials): if conversion fails, return original HTML
markdown_on_error pass;
- Limit work (avoid huge pages)
markdown_max_size 10m;
markdown_timeout 5s;
- Metrics endpoint (localhost only)
location /markdown-metrics {
    markdown_metrics;
    # Standard NGINX access rules, so the endpoint really is localhost only
    allow 127.0.0.1;
    deny all;
}
- If you cache at NGINX/CDN: make sure variants split by Accept
proxy_cache_key "$scheme$request_method$host$request_uri$http_accept";
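If there is a cache in front, one way to sanity-check variant splitting is to confirm that responses carry `Vary: Accept` and that back-to-back requests with different Accept values keep getting different content types. This is a sketch with hypothetical helper names, not a test suite shipped with the module.

```shell
#!/usr/bin/env bash
# Sketch: check that a cache keeps the HTML and Markdown variants apart.
# All function names here are hypothetical helpers.

has_vary_accept() {
  # Reads response headers on stdin; succeeds if a Vary header lists Accept.
  tr -d '\r' | grep -i '^vary:' | cut -d: -f2- | tr ',' '\n' | tr -d ' \t' | grep -qix 'accept'
}

content_type_for() {
  # $1 = URL, $2 = Accept value; prints the Content-Type of the response.
  curl -s -o /dev/null -D - -H "Accept: $2" "$1" | tr -d '\r' |
    grep -i '^content-type:' | head -n 1 | cut -d: -f2- | sed 's/^ *//'
}

check_cache_variants() {
  # $1 = URL; the second pass should hit the cache, if one is present.
  local url="$1" i
  for i in 1 2; do
    echo "pass $i: html     -> $(content_type_for "$url" text/html)"
    echo "pass $i: markdown -> $(content_type_for "$url" text/markdown)"
  done
}

# Examples (assume the quick-try setup on localhost:8080):
# curl -s -o /dev/null -D - -H "Accept: text/markdown" http://localhost:8080/docs/ | has_vary_accept && echo "Vary ok"
# check_cache_variants http://localhost:8080/docs/
```

If the second pass returns text/html for the Markdown request (or the other way around), the cache key is probably not splitting on Accept.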
Things that are rough / WIP
- It’s early: edge cases will exist (weird HTML, giant pages, odd encodings).
- It’s focused on HTML → Markdown only (not PDFs, not arbitrary binaries).
- Caching needs care (variant keys + auth-aware behavior).
If you try it… feedback welcome 🙏
If you run into a broken page, a really slow page, or a caching gotcha, I’d appreciate an issue/report with:
- a sample URL (or anonymized HTML)
- your nginx -v
- whether upstream is compressed
- any cache/CDN in front
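If you are not sure whether your upstream compresses responses, a sketch like the following reads the Content-Encoding header from a direct request to it. The helper name is hypothetical, and `http://backend/docs/` is only a placeholder for your real upstream address.

```shell
#!/usr/bin/env bash
# Sketch: report which Content-Encoding an upstream response uses.
# content_encoding is a hypothetical helper.

content_encoding() {
  # Reads response headers on stdin; prints the Content-Encoding value,
  # or "identity" when the header is absent (uncompressed).
  local enc
  enc=$(tr -d '\r' | grep -i '^content-encoding:' | head -n 1 | cut -d: -f2- | tr -d ' \t') || true
  echo "${enc:-identity}"
}

# Example: ask the upstream directly (placeholder URL) while offering gzip.
# curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://backend/docs/ | content_encoding
```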
Repo: https://github.com/cnkang/nginx-markdown-for-agents
Cloudflare inspiration: