Kang

I tried Cloudflare’s “Markdown for Agents” idea in NGINX (Rust module) — early prototype

Cloudflare recently shipped Markdown for Agents: if a client sends Accept: text/markdown, Cloudflare can fetch your HTML and return a Markdown variant.

That idea stuck with me, so I did some vibe coding over a few evenings and built a self-hostable NGINX dynamic module that does something similar on your own infra.

This is a very early / starter prototype — mainly meant to help people try the workflow and share feedback.

Repo: https://github.com/cnkang/nginx-markdown-for-agents


What it does

The module converts only when the client explicitly asks for Markdown:

  • Browser (Accept: text/html) → original HTML
  • Agent (Accept: text/markdown) → NGINX converts upstream HTML to Markdown and returns text/markdown

No application changes. This sits entirely at the reverse-proxy layer.
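To make the negotiation rule concrete, here is a minimal shell sketch of the decision "did the client explicitly ask for Markdown?". It is not the module's actual code: the function name `wants_markdown` is hypothetical, and it assumes that wildcards such as `*/*` do not count as a Markdown request and that `q=0` means "not acceptable".

```shell
# Hypothetical helper, not the module's actual code: succeeds (exit 0) only if
# the Accept value names text/markdown explicitly with a non-zero q-value.
# Assumption: wildcards such as */* or text/* do not count as asking for Markdown.
wants_markdown() {
  local rest="$1," entry clean q
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"   # next comma-separated media range
    rest="${rest#*,}"     # remainder of the header value
    # normalise: drop whitespace, lowercase
    clean="$(printf '%s' "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')"
    [ "${clean%%;*}" = "text/markdown" ] || continue
    # q=0 (in any spelling) means "explicitly not acceptable"
    q="$(printf '%s' "$clean" | sed -n 's/.*;q=\([0-9.]*\).*/\1/p')"
    case "$q" in
      0|0.|0.0|0.00|0.000) return 1 ;;
    esac
    return 0
  done
  return 1
}
```

For example, `wants_markdown "text/html, text/markdown;q=0.9" && echo markdown` prints `markdown`, while a typical browser value such as `text/html,*/*;q=0.8` falls through to the HTML path.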

Best for: docs / blogs / news / KB pages

Not for: APIs, streaming responses, or authenticated pages (unless you really know what you’re doing with caching).


Why you might care

Agents and LLM tools often fetch full HTML and end up spending tokens on:

  • navigation / footer / cookie banners
  • layout markup
  • scripts and noisy attributes

A Markdown variant can make downstream parsing cheaper and more predictable.
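If you want a rough number for your own pages, the sketch below compares the body size of the two variants. The helper names `variant_bytes` and `savings_pct` are hypothetical, and bytes are only a loose proxy for tokens, so treat the result as an estimate.

```shell
# Size in bytes of the response body for a given URL ($1) and Accept value ($2).
# Needs curl; whitespace is stripped because some wc builds pad their output.
variant_bytes() {
  curl -s -H "Accept: $2" "$1" | wc -c | tr -d ' '
}

# Integer percentage saved when going from $1 bytes (HTML) to $2 bytes (Markdown).
savings_pct() {
  [ "$1" -gt 0 ] || { echo 0; return; }
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage, assuming the module is already serving on the localhost URL used in this post:
#   html=$(variant_bytes http://localhost:8080/ text/html)
#   md=$(variant_bytes http://localhost:8080/ text/markdown)
#   echo "Markdown is $(savings_pct "$html" "$md")% smaller"
```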


Quick try (2 minutes)

1) Install (prebuilt module)

curl -sSL https://raw.githubusercontent.com/cnkang/nginx-markdown-for-agents/main/tools/install.sh | sudo bash
sudo nginx -t && sudo nginx -s reload

Note: a dynamic module must be built for the exact NGINX version you run, down to the patch level (check with nginx -v). If there isn’t a prebuilt module for your version, you may need to compile it yourself.
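A quick way to check the version match before loading the module is sketched below. The helper names are hypothetical, and the version the module was built for is something you supply yourself; how the install script reports it is not covered here.

```shell
# Extract the bare version (e.g. 1.18.0) from the line printed by `nginx -v`.
nginx_version_from() {
  printf '%s\n' "$1" | sed -n 's|^nginx version: [^/]*/\([0-9][0-9.]*\).*|\1|p'
}

# Succeeds only if the running version equals the version the module was built for.
# $1 = output of `nginx -v`, $2 = module build version (hypothetical input).
module_matches_nginx() {
  [ "$(nginx_version_from "$1")" = "$2" ]
}

# Usage (nginx -v writes to stderr, hence 2>&1); 1.26.2 is only a placeholder:
#   module_matches_nginx "$(nginx -v 2>&1)" "1.26.2" || echo "version mismatch: rebuild the module"
```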

2) Verify content negotiation

# Markdown variant
curl -sD - -o /dev/null -H "Accept: text/markdown" http://localhost:8080/ | grep -iE 'content-type|vary'
# expect:
# content-type: text/markdown; charset=utf-8
# vary: Accept

# HTML variant
curl -sD - -o /dev/null -H "Accept: text/html" http://localhost:8080/ | grep -i 'content-type'
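If you want to script the check rather than eyeball the grep output, a small helper can read the response headers from stdin and fail unless both expectations hold. The function name `is_markdown_variant` is hypothetical; it only encodes the two headers listed above.

```shell
# Reads raw response headers on stdin; succeeds only if the response is
# text/markdown AND declares Vary: Accept (Accept-Encoding alone does not count).
is_markdown_variant() {
  local headers
  # strip carriage returns and lowercase so matching is case-insensitive
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  printf '%s\n' "$headers" | grep -q '^content-type:[[:space:]]*text/markdown' &&
  printf '%s\n' "$headers" | grep -Eq '^vary:(.*[ ,])?accept([ ,]|$)'
}

# Usage:
#   curl -sD - -o /dev/null -H "Accept: text/markdown" http://localhost:8080/ \
#     | is_markdown_variant && echo "negotiation OK"
```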

3) See the body

curl -s -H "Accept: text/markdown" http://localhost:8080/ | head -40

Minimal NGINX config

Start small — enable it on one route first.

load_module modules/ngx_http_markdown_filter_module.so;

http {
  markdown_filter off;

  server {
    listen 8080;

    location /docs/ {
      markdown_filter on;

      # Recommended: avoid upstream compression for clean conversion
      proxy_set_header Accept-Encoding "";

      proxy_pass http://backend;
    }
  }
}

A few knobs (optional)

  • Fail open (recommended for trials): if conversion fails, return original HTML
  markdown_on_error pass;
  • Limit work (avoid huge pages)
  markdown_max_size 10m;
  markdown_timeout 5s;
  • Metrics endpoint (localhost only)
  location /markdown-metrics {
    markdown_metrics;
    allow 127.0.0.1;  # assumption: restrict explicitly with NGINX's standard access rules
    deny all;
  }
  • If you cache at NGINX or a CDN: make sure the cache key separates variants by Accept, otherwise an agent can be served cached HTML or a browser cached Markdown
  proxy_cache_key "$scheme$request_method$host$request_uri$http_accept";

Things that are rough / WIP

  • It’s early: edge cases will exist (weird HTML, giant pages, odd encodings).
  • It’s focused on HTML → Markdown only (not PDFs, not arbitrary binaries).
  • Caching needs care (variant keys + auth-aware behavior).

If you try it… feedback welcome 🙏

If you run into a broken page, a really slow page, or a caching gotcha, I’d appreciate an issue/report with:

  • a sample URL (or anonymized HTML)
  • your nginx -v
  • whether upstream is compressed
  • any cache/CDN in front

Repo: https://github.com/cnkang/nginx-markdown-for-agents

Cloudflare inspiration: Markdown for Agents
