DEV Community

SEN LLC

A 10 MB Markdown-to-HTML Service in Rust with pulldown-cmark


A tiny axum service that renders Markdown to HTML through pulldown-cmark. Same request and response contract as my earlier PHP markdown-api (entry #133), but the whole pipeline runs through a pull-parser, ships in a single 10 MB Alpine image, and answers the usual request in a handful of microseconds instead of hundreds.

πŸ”— GitHub: https://github.com/sen-ltd/markdown-render

Every non-trivial backend I have worked on in the last decade ended up rendering Markdown somewhere. Issue bodies. Changelogs. Release notes. User-submitted documentation pages. Inline help. Email templates. The surface area keeps growing and the implementation keeps being "pull in a markdown library, call it inline, hope it is fast enough". Each service does the same job a slightly different way, each service carries its own dependency, and nobody can answer how long a render actually takes because it is buried in a larger request handler.

Entry #133 in this 100-project sweep was the PHP version of a tiny HTTP wrapper around that surface area β€” markdown-api, built on Slim 4 + league/commonmark. It is a nice piece of PHP. league/commonmark is serious CommonMark 0.31, thoroughly maintained, and exposing it as an HTTP service takes about 200 lines of glue. But "PHP over FPM that parses a few KB of Markdown per request" has a floor latency that is perfectly fine for most documentation sites and completely wrong for a service that gets hit in the critical path of every page load on a big app.

So here is the other end of the same contract. Same JSON-in / JSON-out shape. Same safe-by-default posture. Same error codes. But the parser is pulldown-cmark β€” the fastest CommonMark parser I know of on any runtime β€” and the binary is 10 MB of statically-linked musl Alpine. That is not a boast; we will look at why it is small and fast rather than just claiming that it is.

The contract

POST /render
Content-Type: application/json

{ "markdown": "# Hi\n\n**hey**", "flavor": "commonmark", "safe": true }
200 OK
Content-Type: application/json

{
  "html": "<h1>Hi</h1>\n<p><strong>hey</strong></p>\n",
  "word_count": 2,
  "headings": [{ "level": 1, "text": "Hi", "anchor": "hi" }]
}

A second endpoint, POST /render/html, takes the same JSON and returns the raw HTML body with Content-Type: text/html. A third, GET /render?text=..., is a URL shortcut for the cases where you are composing from a browser console or a shell one-liner. A fourth, GET /health, reports the service version and the pulldown-cmark crate version so clients can tell at a glance what is running.

The important field in the request is safe, which defaults to true. We will get to it in a minute; first, the parser.

Why pulldown-cmark is the fast option

Markdown parsers on most languages fall into two families:

  1. Tree-builders. Parse the source into an AST, walk the AST, render HTML from the AST. This is what Marked, markdown-it, python-markdown, league/commonmark, and remark all do. It is the friendliest model to work with because an AST is easy to inspect and transform, but it pays for convenience by allocating a node for every inline run, every emphasis span, every list item.
  2. Pull parsers. Emit a stream of events (Start(Heading), Text, End(Heading), Start(Emphasis), …) and let the caller consume them on the fly. There is no intermediate tree. The renderer is a state machine that writes HTML bytes directly to an output buffer as events arrive. pulldown-cmark is the canonical example; it is the parser used by cargo doc, mdBook, and pretty much every Rust tool that touches Markdown.

For the Markdown-to-HTML case the pull model wins decisively. You are not going to inspect the tree β€” you just want HTML out the other side. The parser allocates no tree nodes, the renderer allocates nothing beyond its output String, and the whole pipeline fits in L1 cache for typical page-sized inputs.
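
The shape of that renderer can be sketched in a few lines of plain Rust. The event names here are illustrative stand-ins, not pulldown-cmark's real API, but the mechanics are the same: a match over events that appends HTML directly to the output buffer.

```rust
// Illustrative stand-in for the pull model; not pulldown-cmark's real API.
enum Ev<'a> {
    StartHeading(u8),
    EndHeading(u8),
    StartParagraph,
    EndParagraph,
    Text(&'a str),
}

// The renderer is a state machine over the event stream: no AST, just
// bytes appended to the output buffer as events arrive.
fn render_events<'a>(events: impl Iterator<Item = Ev<'a>>) -> String {
    let mut out = String::new();
    for ev in events {
        match ev {
            Ev::StartHeading(l) => out.push_str(&format!("<h{l}>")),
            Ev::EndHeading(l) => out.push_str(&format!("</h{l}>\n")),
            Ev::StartParagraph => out.push_str("<p>"),
            Ev::EndParagraph => out.push_str("</p>\n"),
            Ev::Text(t) => out.push_str(t), // a real renderer HTML-escapes here
        }
    }
    out
}

fn main() {
    let events = vec![Ev::StartHeading(1), Ev::Text("Hi"), Ev::EndHeading(1)];
    println!("{}", render_events(events.into_iter()));
}
```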

Here is what the single-pass render looks like in our service:

use pulldown_cmark::{Event, HeadingLevel, Options, Parser, Tag, TagEnd};

pub fn render(markdown: &str, opts: RenderOptions) -> Rendered {
    let source = if opts.safe {
        escape_html_angle_brackets(markdown)
    } else {
        markdown.to_string()
    };

    let mut cmark_opts = Options::empty();
    if matches!(opts.flavor, Flavor::Gfm) {
        cmark_opts.insert(Options::ENABLE_TABLES);
        cmark_opts.insert(Options::ENABLE_TASKLISTS);
        cmark_opts.insert(Options::ENABLE_STRIKETHROUGH);
        cmark_opts.insert(Options::ENABLE_FOOTNOTES);
    }

    let parser = Parser::new_ext(&source, cmark_opts);

    let mut events: Vec<Event> = Vec::new();
    let mut headings: Vec<Heading> = Vec::new();
    let mut word_count: usize = 0;
    let mut in_heading: Option<u8> = None;
    let mut heading_buf = String::new();

    for event in parser {
        match &event {
            Event::Start(Tag::Heading { level, .. }) => {
                in_heading = Some(level_to_u8(*level));
                heading_buf.clear();
            }
            Event::End(TagEnd::Heading(_)) => {
                if let Some(level) = in_heading.take() {
                    let text = heading_buf.trim().to_string();
                    headings.push(Heading {
                        level,
                        anchor: slugify(&text),
                        text,
                    });
                }
            }
            Event::Text(t) | Event::Code(t) => {
                word_count += t.split_whitespace().count();
                if in_heading.is_some() {
                    heading_buf.push_str(t);
                }
            }
            _ => {}
        }
        events.push(event);
    }

    let mut html = String::new();
    pulldown_cmark::html::push_html(&mut html, events.into_iter());

    Rendered { html, word_count, headings }
}

Notice what is not there: there is no AST, no visitor, no intermediate structure that exists specifically to be walked. The parser emits events; we tee them into a Vec so we can hand them to the HTML writer at the end; along the way we harvest everything a consumer might care about. Headings, word count, HTML β€” three outputs from one pass over the source.

(You can do this even more cheaply by streaming events straight into push_html with an adapter that intercepts them inline, but the two-pass version with a vector of events is so much easier to read that I picked it as the house style. The allocation is a single Vec with a known upper bound on size; the optimization is premature.)
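
The streaming shape, reduced to its essence, is just an `inspect` adapter on the iterator: stats are harvested in-flight as items pass through to the consumer, and nothing is collected. This sketch uses plain string chunks instead of real parser events.

```rust
// Sketch of the streaming alternative: harvest stats with `inspect`
// instead of collecting events into a Vec first. The chunks stand in
// for text events; real code would wrap pulldown-cmark's Parser.
fn count_and_render(chunks: &[&str]) -> (String, usize) {
    let mut word_count = 0usize;
    let html: String = chunks
        .iter()
        .inspect(|t| word_count += t.split_whitespace().count())
        .map(|t| format!("<p>{t}</p>\n"))
        .collect();
    (html, word_count)
}

fn main() {
    let (html, words) = count_and_render(&["Hello world", "from a pull parser"]);
    println!("{words} words\n{html}");
}
```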

The heading slug algorithm

headings[] is one of the features that makes this service useful as a renderer for docs and blog CMSes. To produce anchor links you need a slug for each heading that is stable across runs, readable to humans, and matches what GitHub's web UI does so that copy-pasted anchors from a README.md keep working.

GitHub's algorithm is not documented in one place, but it is de facto:

  1. Lowercase ASCII letters. Leave non-ASCII (CJK, accented Latin, Greek) alone.
  2. Drop everything that is not alphanumeric, a hyphen, or a space.
  3. Replace runs of whitespace with a single hyphen.
  4. Trim leading and trailing hyphens.

I could pull a crate for this, but a readable 40-line implementation in the service is worth more than saving 40 lines and owning a dependency I cannot explain. Here is the real version from src/slugify.rs:

pub fn slugify(heading: &str) -> String {
    let mut out = String::with_capacity(heading.len());
    let mut prev_dash = true;

    for ch in heading.chars() {
        if ch.is_alphanumeric() {
            if ch.is_ascii_uppercase() {
                out.push(ch.to_ascii_lowercase());
            } else {
                out.push(ch);
            }
            prev_dash = false;
        } else if ch == '_' {
            out.push('_');
            prev_dash = false;
        } else if ch.is_whitespace() || ch == '-' {
            if !prev_dash {
                out.push('-');
                prev_dash = true;
            }
        }
    }
    while out.ends_with('-') { out.pop(); }
    out
}

Two non-obvious decisions:

  • ASCII lowercase only. char::to_lowercase does the right thing for Turkish Δ° by producing i + combining dot above, which is technically correct and completely wrong as an anchor. Anchors need to be stable, ASCII-friendly where possible, and not secretly two characters where the user typed one. So the fold is scoped to ASCII letters β€” Δ° stays Δ°.
  • Underscores survive. GitHub keeps underscores as-is (foo_bar becomes #foo_bar). A dozen Markdown parsers I have read over the years do not, and produce foo-bar instead. Testing against the real GitHub behaviour caught this; a unit test now pins it.

The table of cases in the module docs doubles as executable documentation:

| Heading text | Slug |
| --- | --- |
| Hello World | hello-world |
| Hello, World! | hello-world |
| Section 1.2: Foo | section-12-foo |
| C++ vs. Rust | c-vs-rust |
| こんにけは δΈ–η•Œ | こんにけは-δΈ–η•Œ |

Safe mode without ammonia

The safe field defaults to true. In safe mode, raw HTML in the Markdown source is escaped before parsing so that a user submission like <script>alert(1)</script> renders as literal text instead of executing in whoever views the page.

The "correct" way to sanitize rendered HTML in the Rust ecosystem is ammonia, which runs the output through an allowlist parser that knows which tags and attributes are safe. ammonia is excellent. It also pulls in html5ever, which is a full HTML5 parser β€” several megabytes of code whose job is to parse HTML again after we just rendered it from Markdown. For a service whose policy is "no raw HTML at all, ever", that is buying a lot of generality to use 1% of it.

So markdown-render does the simpler thing: pre-escape < and > in the Markdown source. That is two characters replaced with a four-character entity each (&lt; and &gt;), in a single pass:

fn escape_html_angle_brackets(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}

This approach is strictly more aggressive than ammonia. It has zero opinions about tags; nothing HTML-shaped ever reaches the parser. It is also less surgical. In particular, CommonMark angle-bracket autolinks (<https://example.com>) stop working in safe mode because the angle brackets get escaped before the parser sees them. I think that is a reasonable tradeoff for a 10 MB image, and the README documents it in one sentence so nobody is surprised. Use [text](url) explicit links if you want links in safe mode, or set safe: false and accept that you now own XSS.
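
You can see the tradeoff directly: the same pre-escape that neutralizes a script tag also neutralizes autolink delimiters. A sketch, using str::replace rather than the service's single-pass function:

```rust
fn main() {
    // Safe mode's view of an angle-bracket autolink, before parsing:
    let escaped = "<https://example.com>".replace('<', "&lt;").replace('>', "&gt;");
    assert_eq!(escaped, "&lt;https://example.com&gt;");
    // The parser never sees the autolink delimiters, so the URL renders
    // as literal text instead of an <a> tag.
    println!("{escaped}");
}
```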

The thing I keep reminding myself is that "safe" is a label, not a guarantee. Even with this pre-escape, you still want a strict Content-Security-Policy on the page that eventually renders this HTML. Defense in depth.

Tests

46 total: 27 unit + 14 integration + 5 slugify. All integration tests run against the full router via tower::ServiceExt::oneshot, which drives axum in-process β€” no sockets, no ports, no live HTTP β€” so the entire suite finishes in milliseconds. A typical assertion looks like:

#[tokio::test]
async fn render_safe_mode_escapes_script_tag() {
    let app = app::build_app();
    let res = app.oneshot(post_json(
        "/render",
        json!({ "markdown": "<script>alert(1)</script>" }),
    )).await.unwrap();
    assert_eq!(res.status(), StatusCode::OK);
    let json = body_json(res.into_body()).await;
    let html = json["html"].as_str().unwrap();
    assert!(html.contains("&lt;script&gt;"));
    assert!(!html.contains("<script>"));
}

The pattern I keep landing on for axum services in this portfolio: the app factory is called build_app() in its own module, main.rs only knows how to bind a port and wire signals, and every test goes through build_app(). Swapping env vars between tests is a footgun because cargo runs tests in parallel and std::env is process-global; where a test needs to exercise an edge-case config, it does so through the request itself, for example by sending a genuinely large body, instead of mutating env.

Tradeoffs

  • Angle-bracket autolinks do not work in safe mode, as discussed above. Use [text](url).
  • pulldown-cmark 0.12 does not expose bare-URL GFM autolinks as a first-class option. Newer releases bundle them under ENABLE_GFM, and a future bump to 0.13 will pick them up. Until then, again, use explicit link syntax.
  • Heading IDs are not deduplicated. Two # Foo headings produce two anchors both called foo. The slugifier is stateless on purpose; deduplication is the consumer's job, because the right -1 / -2 / -foo-2 scheme depends on the ToC format.
  • Unusual HTML entities in raw blocks (&#x3C;script&#x3E;) will survive safe mode, because pre-escaping < and > does not touch numeric entities. The parser then hands them to the HTML writer which emits them as text, so they still do not execute, but a strict reviewer would want either ammonia or a longer escape table. The README calls this out.

Try it in 30 seconds

git clone https://github.com/sen-ltd/markdown-render
cd markdown-render
docker build -t markdown-render .
docker run --rm -p 8000:8000 markdown-render
curl -sS -X POST http://localhost:8000/render \
  -H 'Content-Type: application/json' \
  -d '{"markdown": "# Hello\n\n**Bold** and *italic*.\n\n- a\n- b"}' | jq

If you already have markdown-api deployed and want to compare, the two services accept identical request bodies, so pointing a client at one or the other is a one-line config change. I am keeping both in the portfolio because the entire point of the exercise is to show the same contract filled in different languages and measure what each one costs.

Closing

Entry #183 in a 100+ project portfolio series by SEN LLC. The PHP sibling is entry #133, markdown-api; the Rust house-style reference is the entry that introduced feed-parser. Feedback welcome.
