Every AI chat app has the same hidden performance bug.
Go open ChatGPT. Stream a long response. Open DevTools → Performance tab → Record.
Watch the flame chart. Every single token triggers a full re-parse of the entire accumulated markdown string. Every heading re-detected. Every code block re-highlighted. Every table re-measured.
After 500 tokens on a 2KB response, your app has re-parsed roughly 500,000 characters in total. The work scales quadratically.
I built StreamMD to make this structurally impossible. Here's how.
## 🔴 The O(n²) Trap
Here's the code every AI app uses:
```tsx
function Chat({ streamingText }) {
  // Re-parses ALL markdown, re-renders ALL components — per token
  return <ReactMarkdown>{streamingText}</ReactMarkdown>;
}
```
This looks innocent. But here's what actually happens on every token:
```plaintext
Token arrives
  → Concat to string (now 2,847 chars)
  → Re-parse ENTIRE string from char 0
  → Rebuild AST (unified/remark/rehype)
  → Diff entire virtual DOM tree
  → Reconcile all changed nodes
  → Re-highlight all code blocks
  → Re-measure all tables
```
At token 1, you parse 4 characters. At token 100, you parse 400 characters. At token 500, you parse 2,000 characters. Every. Single. Time.

The total characters processed:

4 + 8 + 12 + ... + 2,000 ≈ 500,000 characters

That's O(n²). And it gets worse the longer the response.
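To make the blow-up concrete, here's a toy cost model. The numbers are my own assumption: ~4 characters per token and 500 tokens, matching the ~2 KB figure above.

```typescript
// Toy cost model: total characters a markdown parser touches over a stream.
// Assumption (mine): ~4 chars per token, 500 tokens, i.e. a ~2 KB response.
const CHARS_PER_TOKEN = 4;
const TOKENS = 500;

let naiveTotal = 0;       // re-parse the whole accumulated string every token
let incrementalTotal = 0; // parse only the new characters

for (let t = 1; t <= TOKENS; t++) {
  naiveTotal += t * CHARS_PER_TOKEN;   // full accumulated length so far
  incrementalTotal += CHARS_PER_TOKEN; // just the delta
}

console.log(naiveTotal);       // 501000, quadratic in response length
console.log(incrementalTotal); // 2000, linear in response length
```

Same stream, same final text; a ~250x difference in characters touched, and the gap widens as responses get longer.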
### Why nobody notices
At 15 tok/s (GPT-3.5 speed), the browser can keep up. You burn CPU, but it's fast enough.
At 50+ tok/s (modern models), frames start dropping. Code blocks flicker as they're re-highlighted. Tables visibly rebuild. The scrollbar jitters.
At 100+ tok/s (where we're headed), it falls apart entirely.
## 🟢 The Fix: Incremental Block Parsing
I asked one question: what if the parser only processed new characters?
```tsx
import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

function Chat({ streamingText }) {
  return <StreamMD text={streamingText} theme="dark" />;
}
```
Same API. Same output. Completely different internals.
### How StreamMD's parser works
The StreamParser class accepts the full accumulated text on each call. But internally, it tracks prevLength and only processes the delta:
```typescript
push(fullText: string): ParseResult {
  if (fullText.length <= this.prevLength) return this.result;

  // Only process NEW characters
  const newContent = fullText.slice(this.prevLength);
  this.prevLength = fullText.length;

  // Parse new lines into blocks
  this.buffer += newContent;
  const lines = this.buffer.split('\n');
  // ... classify each line into block types
}
```
Each line is classified into a block type:
- Heading — starts with `#`
- Code fence — starts with `` ``` ``
- Table — contains `|` pipes
- List — starts with `-`, `*`, or `1.`
- Blockquote — starts with `>`
- Paragraph — everything else
When a block is complete (the parser encounters a blank line, a new heading, or a closing code fence), it's marked closed: true.
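As a rough sketch of that classification step (my own simplified version, not StreamMD's actual source; real CommonMark rules are fussier about indentation, setext headings, and table delimiters):

```typescript
type BlockType = 'heading' | 'code' | 'table' | 'list' | 'blockquote' | 'paragraph';

// Simplified line classifier mirroring the rules listed above.
// Order matters: a heading check must run before the table check, since
// a naive "contains a pipe" rule would otherwise swallow other lines.
function classifyLine(line: string): BlockType {
  const trimmed = line.trimStart();
  if (/^#{1,6}\s/.test(trimmed)) return 'heading';
  if (trimmed.startsWith('```')) return 'code';
  if (trimmed.includes('|')) return 'table';
  if (/^([-*+]\s|\d+\.\s)/.test(trimmed)) return 'list';
  if (trimmed.startsWith('>')) return 'blockquote';
  return 'paragraph';
}

console.log(classifyLine('## Hello'));  // 'heading'
console.log(classifyLine('| a | b |')); // 'table'
console.log(classifyLine('1. first'));  // 'list'
```

Because each line is classified exactly once, the classifier's cost stays proportional to the new text, never the accumulated text.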
### The React layer
Here's where it gets good. Each block is rendered by a React.memo component:
```tsx
const BlockContent = React.memo(function BlockContent({ block }) {
  switch (block.type) {
    case 'heading': return <HeadingBlock block={block} />;
    case 'code': return <CodeBlock block={block} />;
    case 'table': return <TableBlock block={block} />;
    // ...
  }
});
```
Closed blocks never re-render. Their props don't change, so React.memo skips them entirely.
On each token, only one component re-renders — the active (last, unclosed) block. Everything above it is frozen.
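One detail this depends on (an inference on my part, not something verified against StreamMD's source): closed blocks must keep the same object identity across parser updates, otherwise `React.memo`'s shallow prop comparison sees a "new" prop every token and re-renders anyway. A minimal sketch of that invariant:

```typescript
interface Block {
  type: string;
  content: string;
  closed: boolean;
}

// Closed blocks are reused as-is; only the active block is a fresh object.
// Referential equality is what lets React.memo skip the closed ones.
function nextBlocks(prev: Block[], active: Block): Block[] {
  return [...prev.filter((b) => b.closed), active];
}

const closed: Block = { type: 'heading', content: '# Hi', closed: true };
const oldActive: Block = { type: 'paragraph', content: 'strea', closed: false };
const newActive: Block = { type: 'paragraph', content: 'stream', closed: false };

const next = nextBlocks([closed, oldActive], newActive);
console.log(next[0] === closed); // true: same reference, memo can skip it
```

If the parser instead rebuilt every block object on every push, the memoization would be silently defeated even though the content was identical.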
## 🧠 The Hard Part: Incomplete Lines
Here's the bug that took the longest to fix.
When tokens arrive mid-line, you get partial content:
```plaintext
Token 1: "## He"    ← Not a complete heading yet
Token 2: "ading\n"  ← NOW it's complete
```
The naive approach commits "## He" to the active block. When "ading\n" arrives, the parser sees the full line "## Heading" and processes it again. Duplicate text.
StreamMD's fix: incomplete lines live in a separate buffer.
```typescript
push(fullText: string) {
  // ...
  const lines = this.buffer.split('\n');

  // Last segment has no trailing \n — it's incomplete
  const incompleteLine = this.buffer.endsWith('\n')
    ? ''
    : lines.pop()!;

  // Only process COMPLETE lines
  for (const line of lines) {
    this.processLine(line);
  }

  // Store incomplete line separately
  this._incompleteLine = incompleteLine;
  this.buffer = incompleteLine;
}
```
The incomplete line is never committed to block content. Instead, it's virtually appended at render time:
```tsx
// In the React component
if (incompleteLine && activeBlock) {
  // Display block = real content + pending text (read-only view)
  const displayContent = activeBlock.content + '\n' + incompleteLine;
  return <BlockContent block={{ ...activeBlock, content: displayContent }} />;
}
```
This means the parser state is always clean. No duplication. No corruption. The incomplete text is a temporary visual overlay that gets replaced by the real content when the line completes.
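To see why the buffering works, here's a standalone trace of the two-token example above. This is my own minimal re-implementation of the technique, not StreamMD's parser:

```typescript
// Minimal incomplete-line buffer, traced over the "## He" / "ading\n" example.
class LineBuffer {
  private prevLength = 0;
  private buffer = '';
  completeLines: string[] = [];
  incompleteLine = '';

  push(fullText: string): void {
    if (fullText.length <= this.prevLength) return;
    this.buffer += fullText.slice(this.prevLength);
    this.prevLength = fullText.length;

    const lines = this.buffer.split('\n');
    // The last segment is the incomplete tail; if the buffer ends with
    // '\n', split() leaves a trailing '' there instead.
    this.incompleteLine = lines.pop()!;
    this.completeLines.push(...lines); // only complete lines are committed
    this.buffer = this.incompleteLine;
  }
}

const buf = new LineBuffer();
buf.push('## He');        // nothing committed yet; pending text is "## He"
buf.push('## Heading\n'); // the line completes and is committed exactly once
console.log(buf.completeLines);  // ['## Heading']
console.log(buf.incompleteLine); // ''
```

The line lands in `completeLines` once, never twice, because "## He" was only ever held in the pending buffer, not committed.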
## 📊 The Numbers
I built a live demo with a side-by-side comparison. Here's what it shows for a typical LLM response (~1,300 characters, 15 blocks):
| Metric | react-markdown | StreamMD |
|---|---|---|
| Chars parsed | ~400,000 | ~1,300 |
| Per-token complexity | O(n) — full re-parse | O(1) — delta only |
| Block re-renders per token | All blocks | 1 (active only) |
| Bundle size | 45kB + remark + rehype | 30kB total |
| Runtime dependencies | unified + remark + rehype + ... | 0 (React peer only) |
| Syntax highlighting | BYO (Prism 40kB / Shiki 200kB) | Built-in (3kB, 15 langs) |
300x fewer characters processed. Same formatted output.
## 💻 Usage
### Drop-in replacement
```bash
npm install stream-md
```

```tsx
import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

// That's it. One component.
```
### With Vercel AI SDK
```tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role === 'assistant' ? (
            <StreamMD text={m.content} theme="dark" />
          ) : (
            <p>{m.content}</p>
          )}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
```
### Hook API (advanced)
```tsx
import { useEffect } from 'react';
import { useStreamMD } from 'stream-md';

function CustomRenderer() {
  const { blocks, activeIndex, incompleteLine, push, reset } = useStreamMD();

  useEffect(() => {
    const sse = new EventSource('/api/chat');
    let text = '';
    sse.onmessage = (e) => {
      text += e.data;
      push(text);
    };
    return () => sse.close();
  }, [push]);

  return (
    <div>
      {blocks.map((block, i) => (
        // Frozen blocks will never re-render thanks to React.memo.
        // Render each block with your own component here.
        <MyBlock key={i} block={block} />
      ))}
    </div>
  );
}
```
### Component overrides
Full control — swap any element with your own component:
```tsx
<StreamMD
  text={text}
  components={{
    pre: ({ code, language }) => <MyCodeBlock code={code} lang={language} />,
    a: ({ href, children }) => <MyLink href={href}>{children}</MyLink>,
    table: ({ headers, rows }) => <MyTable headers={headers} rows={rows} />,
  }}
/>
```
## 🎨 What's Included
Markdown support:
Headings, paragraphs, code blocks (fenced), inline code, bold, italic, links, images, ordered/unordered/task lists, tables with alignment, blockquotes, horizontal rules, strikethrough.
Syntax highlighting:
Built-in lightweight highlighter (~3kB) for JavaScript, TypeScript, Python, Rust, Go, Java, C/C++, Bash, JSON, HTML, CSS, SQL, YAML, Diff, Markdown. No Prism. No Shiki. No extra bundle.
Theming:
Dark and light presets via CSS custom properties. Or bring your own — set theme="none" and override --smd-* variables.
## 🔗 The Stack: ZeroJitter + StreamMD
StreamMD has a companion library: ZeroJitter.
```plaintext
zero-jitter → plain text streaming (canvas rendering, zero DOM reflows)
stream-md   → markdown streaming (incremental parsing, block memoization)
```
ZeroJitter eliminates layout thrashing by rendering text to <canvas> via a Web Worker. It's for raw text streams where you don't need markdown formatting.
StreamMD eliminates redundant parsing by incrementally tracking blocks. It's for full markdown rendering with headings, code blocks, tables, and inline formatting.
Together, they own the "streaming LLM display" category. Use the right tool for the job.
## The Takeaway
The performance problem in AI chat apps isn't React. It isn't the DOM. It's re-parsing content that hasn't changed.
StreamMD doesn't make React faster. It makes React do less work. Completed blocks are frozen. Only the active block updates. The parser only sees new characters.
The fastest code is the code that never runs.
📦 npm install stream-md
⭐ GitHub
Built by Jai. Feedback and contributions welcome.