JAI

Posted on • Originally published at altrusian.com

I Made Streaming Markdown 300x Faster — Here's the Architecture

Every AI chat app has the same hidden performance bug.

Go open ChatGPT. Stream a long response. Open DevTools → Performance tab → Record.

Watch the flame chart. Every single token triggers a full re-parse of the entire accumulated markdown string. Every heading re-detected. Every code block re-highlighted. Every table re-measured.

After 500 tokens on a 2KB response, your app has re-parsed roughly 500,000 characters in total. The work scales quadratically.

I built StreamMD to make this structurally impossible. Here's how.


🔴 The O(n²) Trap

Here's the code every AI app uses:

function Chat({ streamingText }) {
  // Re-parses ALL markdown, re-renders ALL components — per token
  return <ReactMarkdown>{streamingText}</ReactMarkdown>;
}

This looks innocent. But here's what actually happens on every token:

Token arrives
  → Concat to string (now 2,847 chars)
  → Re-parse ENTIRE string from char 0
  → Rebuild AST (unified/remark/rehype)
  → Diff entire virtual DOM tree
  → Reconcile all changed nodes
  → Re-highlight all code blocks
  → Re-measure all tables

At token 1, you parse 4 characters. At token 100, you parse 400 characters. At token 500, you parse 2,000 characters. Every. Single. Time.

The total characters processed:

4 + 8 + 12 + ... + 2,000 = ~500,000 characters
Enter fullscreen mode Exit fullscreen mode

That's O(n²). And it gets worse the longer the response.
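The arithmetic is easy to verify. Here's a hypothetical helper (not part of any library) that sums the work done when every token forces a full re-parse:

```typescript
// Total characters re-parsed when each token appends `tokenLen` characters
// and the renderer re-parses the full accumulated string every time.
function totalCharsReparsed(tokens: number, tokenLen: number): number {
  let total = 0;
  for (let i = 1; i <= tokens; i++) {
    total += i * tokenLen; // parse pass i sees i * tokenLen characters
  }
  return total;
}
```

At 500 tokens of ~4 characters each, that's roughly 500,000 characters of parsing for a 2,000-character response.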

Why nobody notices

At 15 tok/s (GPT-3.5 speed), the browser can keep up. You burn CPU, but it's fast enough.

At 50+ tok/s (modern models), frames start dropping. Code blocks flicker as they're re-highlighted. Tables visibly rebuild. The scrollbar jitters.

At 100+ tok/s (where we're headed), it falls apart entirely.


🟢 The Fix: Incremental Block Parsing

I asked one question: what if the parser only processed new characters?

import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

function Chat({ streamingText }) {
  return <StreamMD text={streamingText} theme="dark" />;
}

Same API. Same output. Completely different internals.

How StreamMD's parser works

The StreamParser class accepts the full accumulated text on each call. But internally, it tracks prevLength and only processes the delta:

push(fullText: string): ParseResult {
  if (fullText.length <= this.prevLength) return this.result;

  // Only process NEW characters
  const newContent = fullText.slice(this.prevLength);
  this.prevLength = fullText.length;

  // Parse new lines into blocks
  this.buffer += newContent;
  const lines = this.buffer.split('\n');
  // ... classify each line into block types
}
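The delta-tracking idea can be sketched in isolation. This is a minimal illustration of the contract described above, not StreamMD's actual internals:

```typescript
// Tracks how much of the accumulated text has already been seen
// and returns only the newly appended characters on each call.
class DeltaTracker {
  private prevLength = 0;

  push(fullText: string): string {
    if (fullText.length <= this.prevLength) return '';
    const delta = fullText.slice(this.prevLength);
    this.prevLength = fullText.length;
    return delta;
  }
}
```

However long the accumulated string grows, each `push` does work proportional only to the new characters.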

Each line is classified into a block type:

  • Heading — starts with #
  • Code fence — starts with ```
  • Table — contains | pipes
  • List — starts with -, *, 1.
  • Blockquote — starts with >
  • Paragraph — everything else

When a block is complete (the parser encounters a blank line, a new heading, or a closing code fence), it's marked closed: true.
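A line classifier along those lines might look like this. It's a sketch: the block-type names match the list above, but the patterns are my own approximation of the rules:

```typescript
type BlockType =
  | 'heading' | 'code' | 'table' | 'list' | 'blockquote' | 'paragraph';

// Classify a single complete line into a block type (first match wins).
function classifyLine(line: string): BlockType {
  if (/^#{1,6}\s/.test(line)) return 'heading';
  if (line.startsWith('```')) return 'code';
  if (line.includes('|')) return 'table';
  if (/^\s*(?:[-*]|\d+\.)\s/.test(line)) return 'list';
  if (line.startsWith('>')) return 'blockquote';
  return 'paragraph';
}
```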

The React layer

Here's where it gets good. Each block is rendered by a React.memo component:

const BlockContent = React.memo(function BlockContent({ block }) {
  switch (block.type) {
    case 'heading': return <HeadingBlock block={block} />;
    case 'code': return <CodeBlock block={block} />;
    case 'table': return <TableBlock block={block} />;
    // ...
  }
});

Closed blocks never re-render. Their props don't change, so React.memo skips them entirely.

On each token, only one component re-renders — the active (last, unclosed) block. Everything above it is frozen.
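The reason this works is React.memo's shallow prop comparison: a closed block keeps the same object identity across parses, so the comparison is trivially true and the component bails out. A minimal sketch of that invariant (illustrative names, not StreamMD code):

```typescript
// React.memo skips re-rendering when props are shallow-equal. Keeping the
// same object reference for closed blocks makes that check a pointer compare.
interface Block { type: string; content: string; closed: boolean }

function sameBlockProp(prev: { block: Block }, next: { block: Block }): boolean {
  return prev.block === next.block; // reference equality, like shallow compare
}
```

The corollary: the parser must mutate only the active block's object (or replace it), never rebuild the whole block array with fresh objects.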


🧠 The Hard Part: Incomplete Lines

Here's the bug that took the longest to fix.

When tokens arrive mid-line, you get partial content:

Token 1: "## He"    ← Not a complete heading yet
Token 2: "ading\n"  ← NOW it's complete

The naive approach commits "## He" to the active block. When "ading\n" arrives, the parser sees the full line "## Heading" and processes it again. Duplicate text.

StreamMD's fix: incomplete lines live in a separate buffer.

push(fullText: string) {
  // ...
  const lines = this.buffer.split('\n');

  // Last segment has no trailing \n — it's incomplete
  const incompleteLine = this.buffer.endsWith('\n')
    ? ''
    : lines.pop()!;

  // Only process COMPLETE lines
  for (const line of lines) {
    this.processLine(line);
  }

  // Store incomplete line separately
  this._incompleteLine = incompleteLine;
  this.buffer = incompleteLine;
}

The incomplete line is never committed to block content. Instead, it's virtually appended at render time:

// In the React component
if (incompleteLine && activeBlock) {
  // Display block = real content + pending text (read-only view)
  const displayContent = activeBlock.content + '\n' + incompleteLine;
  return <BlockContent block={{ ...activeBlock, content: displayContent }} />;
}

This means the parser state is always clean. No duplication. No corruption. The incomplete text is a temporary visual overlay that gets replaced by the real content when the line completes.
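The buffer split can be demonstrated standalone. This is a hypothetical helper mirroring the technique above, not part of stream-md's API; note that `split('\n')` always yields at least one segment, and the segment after the last newline is exactly the incomplete tail:

```typescript
// Split a streaming buffer into complete lines and the incomplete tail.
// If the buffer ends with '\n', the tail is '' — every line is complete.
function splitIncomplete(buffer: string): { complete: string[]; incomplete: string } {
  const lines = buffer.split('\n');
  const incomplete = lines.pop()!; // split always yields >= 1 segment
  return { complete: lines, incomplete };
}
```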


📊 The Numbers

I built a live demo with a side-by-side comparison. Here's what it shows for a typical LLM response (~1,300 characters, 15 blocks):

| Metric | react-markdown | StreamMD |
| --- | --- | --- |
| Chars parsed | ~400,000 | ~1,300 |
| Per-token complexity | O(n) — full re-parse | O(1) — delta only |
| Block re-renders per token | All blocks | 1 (active only) |
| Bundle size | 45kB + remark + rehype | 30kB total |
| Runtime dependencies | unified + remark + rehype + ... | 0 (React peer only) |
| Syntax highlighting | BYO (Prism 40kB / Shiki 200kB) | Built-in (3kB, 15 langs) |

300x fewer characters processed. Same formatted output.


💻 Usage

Drop-in replacement

npm install stream-md

import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

// That's it. One component.

With Vercel AI SDK

'use client';
import { useChat } from '@ai-sdk/react';
import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role === 'assistant' ? (
            <StreamMD text={m.content} />
          ) : (
            <p>{m.content}</p>
          )}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Hook API (advanced)

import { useEffect } from 'react';
import { useStreamMD } from 'stream-md';

function CustomRenderer() {
  const { blocks, activeIndex, incompleteLine, push, reset } = useStreamMD();

  useEffect(() => {
    const sse = new EventSource('/api/chat');
    let text = '';
    sse.onmessage = (e) => {
      text += e.data;
      push(text);
    };
    return () => sse.close();
  }, [push]);

  return (
    <div>
      {/* Frozen blocks will never re-render thanks to React.memo */}
      {blocks.map((block, i) => (
        <BlockContent key={i} block={block} />
      ))}
    </div>
  );
}

Component overrides

Full control — swap any element with your own component:

<StreamMD
  text={text}
  components={{
    pre: ({ code, language }) => <MyCodeBlock code={code} lang={language} />,
    a: ({ href, children }) => <MyLink href={href}>{children}</MyLink>,
    table: ({ headers, rows }) => <MyTable headers={headers} rows={rows} />,
  }}
/>


🎨 What's Included

Markdown support:
Headings, paragraphs, code blocks (fenced), inline code, bold, italic, links, images, ordered/unordered/task lists, tables with alignment, blockquotes, horizontal rules, strikethrough.

Syntax highlighting:
Built-in lightweight highlighter (~3kB) for JavaScript, TypeScript, Python, Rust, Go, Java, C/C++, Bash, JSON, HTML, CSS, SQL, YAML, Diff, Markdown. No Prism. No Shiki. No extra bundle.

Theming:
Dark and light presets via CSS custom properties. Or bring your own — set theme="none" and override --smd-* variables.


🔗 The Stack: ZeroJitter + StreamMD

StreamMD has a companion library: ZeroJitter.

zero-jitter → plain text streaming (canvas rendering, zero DOM reflows)
stream-md   → markdown streaming (incremental parsing, block memoization)

ZeroJitter eliminates layout thrashing by rendering text to <canvas> via a Web Worker. It's for raw text streams where you don't need markdown formatting.

StreamMD eliminates redundant parsing by incrementally tracking blocks. It's for full markdown rendering with headings, code blocks, tables, and inline formatting.

Together, they cover both ends of streaming LLM display. Use the right tool for the job.


The Takeaway

The performance problem in AI chat apps isn't React. It isn't the DOM. It's re-parsing content that hasn't changed.

StreamMD doesn't make React faster. It makes React do less work. Completed blocks are frozen. Only the active block updates. The parser only sees new characters.

The fastest code is the code that never runs.


📦 npm install stream-md

GitHub

🎮 Live Demo


Built by Jai. Feedback and contributions welcome.
