From O(n²) to O(n): Building a Streaming Markdown Renderer for the AI Era

If you've built an AI chat application, you've probably noticed something frustrating: the longer the conversation gets, the slower the rendering becomes.

The reason is simple — every time the AI outputs a new token, traditional markdown parsers re-parse the entire document from scratch. This is a fundamental architectural problem, and it only gets worse as AI outputs get longer.
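
Concretely, the usual streaming loop looks something like this (a sketch; `stream` and `render` are hypothetical stand-ins, and marked is just one example of a full re-parse library):

```js
// Naive streaming render: every new chunk triggers a full re-parse of everything so far.
import { marked } from 'marked'

let source = ''
for await (const chunk of stream) {   // `stream` yields AI tokens as they arrive
  source += chunk
  const html = marked.parse(source)   // re-parses ALL accumulated text, every time
  render(html)                        // hypothetical: swap the rendered output into the page
}
```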

We built Incremark to fix this.

The Uncomfortable Truth About AI in 2025

If you've been following AI trends, you know the numbers are getting crazy:

  • 2022: GPT-3.5 responses? A few hundred words, no big deal
  • 2023: GPT-4 cranks it up to 2,000-4,000 words
  • 2024-2025: Reasoning models (o1, DeepSeek R1) are outputting 10,000+ word "thinking processes"

We're moving from 4K token conversations to 32K, even 128K. And here's the thing nobody talks about: rendering 500 words and rendering 50,000 words of Markdown are completely different engineering problems.

Most markdown libraries? They were built for blog posts. Not for AI that thinks out loud.

Why Your Markdown Parser is Lying to You

Here's what happens under the hood when you stream AI output through a traditional parser:

```
Chunk 1: Parse 100 chars ✓
Chunk 2: Parse 200 chars (100 old + 100 new)
Chunk 3: Parse 300 chars (200 old + 100 new)
...
Chunk 100: Parse 10,000 chars 😰
```

Total work: 100 + 200 + 300 + ... + 10,000 = 505,000 character operations.

That's O(n²). The cost doesn't just grow — it explodes.
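
A quick back-of-the-envelope check of that number, assuming 100 chunks of 100 characters each:

```js
// Chunk N forces a re-parse of all N × 100 characters accumulated so far.
let total = 0
for (let chunk = 1; chunk <= 100; chunk++) {
  total += chunk * 100
}
console.log(total) // 505000 character operations for a 10,000-character document
```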

For a 20KB AI response, this means:

  • ant-design-x: 1,657 ms parsing time
  • markstream-vue: 5,755 ms (almost 6 seconds of parsing!)

And these are popular, well-maintained libraries. The problem isn't bad code — it's the wrong architecture.

The Key Insight

Here's the thing:

Once a markdown block is "complete", it will never change.

Think about it. When the AI outputs:

```markdown
# Heading

This is a paragraph.
```

After that second blank line, the paragraph is done. Locked in. No matter what comes next — code blocks, lists, more paragraphs — that paragraph will never be touched again.

So why are we re-parsing it 500 times?

How Incremark Actually Works

We built Incremark around this insight. The core algorithm:

  1. Detect stable boundaries — blank lines, new headings, fence closings
  2. Cache completed blocks — never touch them again
  3. Only re-parse the pending block — the one still receiving input

```
Chunk 1: Parse 100 chars → cache stable blocks
Chunk 2: Parse only ~100 new chars
Chunk 3: Parse only ~100 new chars
...
Chunk 100: Parse only ~100 new chars
```

Total work: 100 × 100 = 10,000 character operations.

That's roughly 50x less work on this 10,000-character example. Each character is parsed at most once. That's O(n).
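
Here is a deliberately simplified sketch of the idea (not Incremark's actual implementation; real boundary detection also handles code fences, lists, and nested structures). It treats a blank line as a stable boundary and uses marked as the block parser:

```js
import { marked } from 'marked'

// Minimal illustration of block caching: completed blocks are parsed exactly once,
// and only the trailing "pending" block is re-parsed as new chunks arrive.
class IncrementalRenderer {
  constructor() {
    this.stableHtml = ''   // HTML for blocks that can never change again
    this.pending = ''      // text of the block still receiving input
  }

  push(chunk) {
    this.pending += chunk

    // The last blank line marks a stable boundary: everything before it is final.
    const boundary = this.pending.lastIndexOf('\n\n')
    if (boundary !== -1) {
      const completed = this.pending.slice(0, boundary)
      this.stableHtml += marked.parse(completed)   // parsed once, then cached forever
      this.pending = this.pending.slice(boundary + 2)
    }

    // Only the small pending tail gets re-parsed on every chunk.
    return this.stableHtml + marked.parse(this.pending)
  }
}
```

On each chunk, `push()` only does work proportional to the pending block rather than the whole document, which is where the linear total comes from.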

Complete Benchmark Data

We benchmarked 38 real markdown files — AI conversations, docs, code analysis reports. Not synthetic test data. Total: 6,484 lines, 128.55 KB.

Here are the highlights (the total row covers all 38 files):

| File | Lines | Size | Incremark | Streamdown | markstream-vue | ant-design-x |
|------|-------|------|-----------|------------|----------------|--------------|
| test-footnotes-simple.md | 15 | 0.09 KB | 0.3 ms | 0.0 ms | 1.4 ms | 0.2 ms |
| simple-paragraphs.md | 16 | 0.41 KB | 0.9 ms | 0.9 ms | 5.9 ms | 1.0 ms |
| introduction.md | 34 | 1.57 KB | 5.6 ms | 12.6 ms | 75.6 ms | 12.8 ms |
| footnotes.md | 52 | 0.94 KB | 1.7 ms | 0.2 ms | 10.6 ms | 1.9 ms |
| concepts.md | 91 | 4.29 KB | 12.0 ms | 50.5 ms | 381.9 ms | 53.6 ms |
| comparison.md | 109 | 5.39 KB | 20.5 ms | 74.0 ms | 552.2 ms | 85.2 ms |
| complex-html-examples.md | 147 | 3.99 KB | 9.0 ms | 58.8 ms | 279.3 ms | 57.2 ms |
| FOOTNOTE_FIX_SUMMARY.md | 236 | 3.93 KB | 22.7 ms | 0.5 ms | 535.0 ms | 120.8 ms |
| OPTIMIZATION_SUMMARY.md | 391 | 6.24 KB | 19.1 ms | 208.4 ms | 980.6 ms | 217.8 ms |
| BLOCK_TRANSFORMER_ANALYSIS.md | 489 | 9.24 KB | 75.7 ms | 574.3 ms | 1984.1 ms | 619.9 ms |
| test-md-01.md | 916 | 17.67 KB | 87.7 ms | 1441.1 ms | 5754.7 ms | 1656.9 ms |
| Total (38 files) | 6484 | 128.55 KB | 519.4 ms | 3190.3 ms | 14683.9 ms | 3728.6 ms |

Being Honest: Where We're Slower

You'll notice something weird in the data. For footnotes.md and FOOTNOTE_FIX_SUMMARY.md, Streamdown appears much faster:

| File | Incremark | Streamdown | Why? |
|------|-----------|------------|------|
| footnotes.md | 1.7 ms | 0.2 ms | Streamdown doesn't support footnotes |
| FOOTNOTE_FIX_SUMMARY.md | 22.7 ms | 0.5 ms | Same — it just skips them |

This isn't a performance issue — it's a feature difference.

When Streamdown encounters [^1] footnote syntax, it simply ignores it. Incremark fully implements footnotes — and we had to solve a tricky streaming-specific problem:

In streaming scenarios, references often arrive before definitions:

Chunk 1: "See footnote[^1] for details..."  // reference arrives first
Chunk 2: "More content..."
Chunk 3: "[^1]: This is the definition"     // definition arrives later
Enter fullscreen mode Exit fullscreen mode

Traditional parsers assume you have the complete document. We built "optimistic references" that gracefully handle incomplete links/images during streaming, then resolve them when definitions arrive.
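
The bookkeeping behind that looks roughly like this (a sketch of the idea, not Incremark's actual API; `onReference` and `onDefinition` are hypothetical hooks):

```js
// Track footnote references that arrive before their definitions:
// render them optimistically, then patch them once the definition shows up.
const definitions = new Map()   // id -> definition text
const unresolved = new Map()    // id -> reference nodes still waiting for a definition

function onReference(id, node) {
  if (definitions.has(id)) {
    node.definition = definitions.get(id)   // definition already streamed in
  } else {
    unresolved.set(id, node)                // show a placeholder for now
  }
}

function onDefinition(id, text) {
  definitions.set(id, text)
  const node = unresolved.get(id)
  if (node) {
    node.definition = text                  // late definition resolves the reference
    unresolved.delete(id)
  }
}
```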

We chose to fully implement footnotes, math blocks ($...$), and custom containers (:::tip) because that's what real AI content needs.

Where We Actually Shine

Excluding footnote files, look at standard markdown performance:

| File | Lines | Incremark | Streamdown | Advantage |
|------|-------|-----------|------------|-----------|
| concepts.md | 91 | 12.0 ms | 50.5 ms | 4.2x |
| comparison.md | 109 | 20.5 ms | 74.0 ms | 3.6x |
| complex-html-examples.md | 147 | 9.0 ms | 58.8 ms | 6.6x |
| OPTIMIZATION_SUMMARY.md | 391 | 19.1 ms | 208.4 ms | 10.9x |
| test-md-01.md | 916 | 87.7 ms | 1441.1 ms | 16.4x |

The pattern is clear: the larger the document, the bigger our advantage.

For the largest file (17.67 KB):

  • Incremark: 88 ms
  • ant-design-x: 1,657 ms (18.9x slower)
  • markstream-vue: 5,755 ms (65.6x slower)

Why Such a Huge Gap?

This is O(n) vs O(n²) in action.

Traditional parsers re-parse the entire document on every chunk:

```
Chunk 1: Parse 100 chars
Chunk 2: Parse 200 chars (100 old + 100 new)
Chunk 3: Parse 300 chars (200 old + 100 new)
...
Chunk 100: Parse 10,000 chars
```

Total work: 100 + 200 + ... + 10,000 = 505,000 character operations.

Incremark only processes new content:

```
Chunk 1: Parse 100 chars → cache stable blocks
Chunk 2: Parse only ~100 new chars
Chunk 3: Parse only ~100 new chars
...
Chunk 100: Parse only ~100 new chars
```

Total work: 100 × 100 = 10,000 character operations.

That's roughly a 50x difference on a 10,000-character document. And because the naive cost grows quadratically, the gap only widens as documents grow.

When to Use Incremark

Use Incremark for:

  • AI chat with streaming output (Claude, ChatGPT, etc.)
  • Long-form AI content (reasoning models, code generation)
  • Real-time markdown editors
  • Content requiring footnotes, math, or custom containers
  • 100K+ token conversations

⚠️ Consider alternatives for:

  • One-time static markdown rendering (just use marked directly)
  • Very small files (<500 characters) — the overhead isn't worth it

Two Engines, One Goal

Marked or Micromark? Both have tradeoffs.

Marked is blazing fast but lacks advanced features. Micromark is spec-perfect but heavier.

Our answer: support both.

| Engine | Speed | Best For |
|--------|-------|----------|
| Marked (default) | ⚡⚡⚡⚡⚡ | Real-time streaming, AI chat |
| Micromark | ⚡⚡⚡ | Complex docs, strict CommonMark |

We extended Marked with custom tokenizers for footnotes, math, and containers. If you hit edge cases Marked can't handle, switch to Micromark with one config change.

Both engines produce identical mdast output. Your rendering code doesn't care which one is running.
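
That's the payoff of standardizing on mdast: rendering code switches on node types, never on the engine. A (very incomplete) sketch:

```js
// Render a standard mdast tree to HTML; works regardless of which parser produced it.
function renderNode(node) {
  const children = () => (node.children || []).map(renderNode).join('')
  switch (node.type) {
    case 'root':      return children()
    case 'heading':   return `<h${node.depth}>${children()}</h${node.depth}>`
    case 'paragraph': return `<p>${children()}</p>`
    case 'emphasis':  return `<em>${children()}</em>`
    case 'strong':    return `<strong>${children()}</strong>`
    case 'code':      return `<pre><code>${node.value}</code></pre>`
    case 'text':      return node.value
    default:          return children()
  }
}
```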

The Typewriter Problem Nobody Talks About

You know that smooth "typing" effect ChatGPT has? Most implementations do this:

```js
displayText = fullText.slice(0, currentIndex)
```

This breaks markdown constantly. You get half-rendered **bold** tags, flickering code blocks, syntax that looks drunk.

We moved the animation to the AST level. Our BlockTransformer knows the structure — it animates within nodes, never across them. Result: buttery smooth typing that respects markdown semantics.
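
The principle, sketched (this is not the real BlockTransformer; it only shows the idea of revealing text inside a finished tree instead of slicing raw markdown):

```js
// Reveal at most `budget.left` characters of text while walking a finished mdast tree.
// The tree's structure (headings, bold, code fences) stays intact;
// only the text inside leaf nodes grows as the budget increases.
function revealText(node, budget) {
  if (node.type === 'text') {
    const take = Math.min(budget.left, node.value.length)
    budget.left -= take
    return { ...node, value: node.value.slice(0, take) }
  }
  if (node.children) {
    return { ...node, children: node.children.map(child => revealText(child, budget)) }
  }
  return node
}

// Usage: increase `visible` on a timer and re-render the partially revealed tree.
// const partialTree = revealText(fullTree, { left: visible })
```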

Try It Yourself

```bash
npm install @incremark/vue  # or react, or svelte
```

```vue
<script setup>
import { ref } from 'vue'
import { IncremarkContent } from '@incremark/vue'

const content = ref('')
const isFinished = ref(false)

async function handleStream(stream) {
  for await (const chunk of stream) {
    content.value += chunk
  }
  isFinished.value = true
}
</script>

<template>
  <IncremarkContent
    :content="content"
    :is-finished="isFinished"
    :incremark-options="{ gfm: true, math: true }"
  />
</template>
```

We support Vue 3, React 18, and Svelte 5 with identical APIs. One core, three frameworks, zero behavior differences.

What's Next

This is version 0.3.0. We're just getting started.

The AI world is moving toward longer outputs, more complex reasoning traces, and richer formatting. Traditional parsers can't keep up — their O(n²) architecture guarantees it.

We built Incremark because we needed it. Hopefully you find it useful too.


📚 Docs: incremark.com
💻 GitHub: kingshuaishuai/incremark
🎮 Live Demos: Vue | React | Svelte

If this saved you debugging time, a ⭐️ on GitHub would mean a lot. Questions? Open an issue or drop a comment below.
