<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: JAI</title>
    <description>The latest articles on DEV Community by JAI (@jvoltci).</description>
    <link>https://dev.to/jvoltci</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F123548%2Fd0c902b0-0eed-4a44-b4d2-e2dd30cee4f9.jpg</url>
      <title>DEV Community: JAI</title>
      <link>https://dev.to/jvoltci</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jvoltci"/>
    <language>en</language>
    <item>
      <title>I Made Streaming Markdown 300x Faster — Here's the Architecture</title>
      <dc:creator>JAI</dc:creator>
      <pubDate>Mon, 30 Mar 2026 12:17:14 +0000</pubDate>
      <link>https://dev.to/jvoltci/i-made-streaming-markdown-300x-faster-heres-the-architecture-4a9f</link>
      <guid>https://dev.to/jvoltci/i-made-streaming-markdown-300x-faster-heres-the-architecture-4a9f</guid>
      <description>&lt;p&gt;Every AI chat app has the same hidden performance bug.&lt;/p&gt;

&lt;p&gt;Go open ChatGPT. Stream a long response. Open DevTools → Performance tab → Record.&lt;/p&gt;

&lt;p&gt;Watch the flame chart. Every single token triggers a &lt;strong&gt;full re-parse&lt;/strong&gt; of the entire accumulated markdown string. Every heading re-detected. Every code block re-highlighted. Every table re-measured.&lt;/p&gt;

&lt;p&gt;After 500 tokens on a 2KB response, your app has re-parsed &lt;strong&gt;~500,000 characters&lt;/strong&gt;. The work scales quadratically.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/jvoltci/stream-md" rel="noopener noreferrer"&gt;StreamMD&lt;/a&gt; to make this structurally impossible. Here's how.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔴 The O(n²) Trap
&lt;/h2&gt;

&lt;p&gt;Here's the code every AI app uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;streamingText&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Re-parses ALL markdown, re-renders ALL components — per token&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ReactMarkdown&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;streamingText&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;ReactMarkdown&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks innocent. But here's what actually happens on &lt;strong&gt;every token&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token arrives
  → Concat to string (now 2,847 chars)
  → Re-parse ENTIRE string from char 0
  → Rebuild AST (unified/remark/rehype)
  → Diff entire virtual DOM tree
  → Reconcile all changed nodes
  → Re-highlight all code blocks
  → Re-measure all tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At token 1, you parse 4 characters. At token 100, you parse 400 characters. At token 500, you parse 2,000 characters. &lt;strong&gt;Every. Single. Time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The total characters processed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 + 10 + 15 + ... + 2,000 = ~500,000 characters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's O(n²). And it gets worse the longer the response.&lt;/p&gt;
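&lt;p&gt;To make the arithmetic concrete, here is a quick back-of-the-envelope sketch. The ~4 characters per token is an assumption for illustration, not a measured value:&lt;/p&gt;

```typescript
// Total characters a full re-parse renderer touches over a whole stream,
// versus an incremental parser that sees each character exactly once.
// Assumes ~4 characters per token (an approximation, not a measurement).
function charsFullReparse(tokens: number, charsPerToken = 4): number {
  // After token i the accumulated string is i * charsPerToken chars long,
  // and the naive renderer re-parses all of it: an arithmetic series.
  return (charsPerToken * tokens * (tokens + 1)) / 2;
}

function charsIncremental(tokens: number, charsPerToken = 4): number {
  // Each character is parsed once, ever.
  return tokens * charsPerToken;
}

console.log(charsFullReparse(500));  // 501000 (roughly half a million)
console.log(charsIncremental(500)); // 2000
```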

&lt;h3&gt;
  
  
  Why nobody notices
&lt;/h3&gt;

&lt;p&gt;At 15 tok/s (GPT-3.5 speed), the browser can keep up. You burn CPU, but it's fast enough.&lt;/p&gt;

&lt;p&gt;At 50+ tok/s (modern models), frames start dropping. Code blocks flicker as they're re-highlighted. Tables visibly rebuild. The scrollbar jitters.&lt;/p&gt;

&lt;p&gt;At 100+ tok/s (where we're headed), it falls apart entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  🟢 The Fix: Incremental Block Parsing
&lt;/h2&gt;

&lt;p&gt;I asked one question: &lt;strong&gt;what if the parser only processed new characters?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StreamMD&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream-md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream-md/styles.css&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;streamingText&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMD&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;streamingText&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"dark"&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same API. Same output. Completely different internals.&lt;/p&gt;

&lt;h3&gt;
  
  
  How StreamMD's parser works
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;StreamParser&lt;/code&gt; class accepts the full accumulated text on each call. But internally, it tracks &lt;code&gt;prevLength&lt;/code&gt; and &lt;strong&gt;only processes the delta&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fullText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;ParseResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fullText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prevLength&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Only process NEW characters&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fullText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prevLength&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prevLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fullText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Parse new lines into blocks&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// ... classify each line into block types&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line is classified into a block type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heading&lt;/strong&gt; — starts with &lt;code&gt;#&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code fence&lt;/strong&gt; — starts with &lt;code&gt;```&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table&lt;/strong&gt; — contains &lt;code&gt;|&lt;/code&gt; pipes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;List&lt;/strong&gt; — starts with &lt;code&gt;-&lt;/code&gt;, &lt;code&gt;*&lt;/code&gt;, &lt;code&gt;1.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blockquote&lt;/strong&gt; — starts with &lt;code&gt;&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paragraph&lt;/strong&gt; — everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a block is complete (the parser encounters a blank line, a new heading, or a closing code fence), it's marked &lt;code&gt;closed: true&lt;/code&gt;.&lt;/p&gt;
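&lt;p&gt;As a rough sketch (not StreamMD's actual source), the classification above boils down to a handful of cheap prefix checks per line:&lt;/p&gt;

```typescript
// Illustrative line classifier matching the rules listed above. A real
// classifier also tracks state (e.g. "currently inside an open code fence").
type BlockType =
  | 'heading' | 'code' | 'table' | 'list' | 'blockquote' | 'paragraph';

function classifyLine(line: string): BlockType {
  if (/^#{1,6}\s/.test(line)) return 'heading';
  if (line.startsWith('```')) return 'code';
  if (line.includes('|')) return 'table';
  if (/^\s*([-*]|\d+\.)\s/.test(line)) return 'list';
  if (line.startsWith('>')) return 'blockquote';
  return 'paragraph';
}

console.log(classifyLine('## Heading'));  // heading
console.log(classifyLine('| a | b |'));   // table
console.log(classifyLine('- item'));      // list
console.log(classifyLine('plain text'));  // paragraph
```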

&lt;h3&gt;
  
  
  The React layer
&lt;/h3&gt;

&lt;p&gt;Here's where it gets good. Each block is rendered by a &lt;code&gt;React.memo&lt;/code&gt; component:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;const BlockContent = React.memo(function BlockContent({ block }) {
  switch (block.type) {
    case 'heading': return &amp;lt;HeadingBlock block={block} /&amp;gt;;
    case 'code':    return &amp;lt;CodeBlock block={block} /&amp;gt;;
    case 'table':   return &amp;lt;TableBlock block={block} /&amp;gt;;
    // ...
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Closed blocks never re-render.&lt;/strong&gt; Their props don't change, so &lt;code&gt;React.memo&lt;/code&gt; skips them entirely.&lt;/p&gt;

&lt;p&gt;On each token, only &lt;strong&gt;one component re-renders&lt;/strong&gt; — the active (last, unclosed) block. Everything above it is frozen.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The Hard Part: Incomplete Lines
&lt;/h2&gt;

&lt;p&gt;Here's the bug that took the longest to fix.&lt;/p&gt;

&lt;p&gt;When tokens arrive mid-line, you get partial content:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token 1: "## He"     ← Not a complete heading yet
Token 2: "ading\n"   ← NOW it's complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The naive approach commits &lt;code&gt;"## He"&lt;/code&gt; to the active block. When &lt;code&gt;"ading\n"&lt;/code&gt; arrives, the parser sees the full line &lt;code&gt;"## Heading"&lt;/code&gt; and processes it again. &lt;strong&gt;Duplicate text.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;StreamMD's fix: &lt;strong&gt;incomplete lines live in a separate buffer.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;push(fullText: string) {
  // ...
  const lines = this.buffer.split('\n');

  // Last segment has no trailing \n — it's incomplete
  const incompleteLine = this.buffer.endsWith('\n')
    ? ''
    : lines.pop()!;

  // Only process COMPLETE lines
  for (const line of lines) {
    this.processLine(line);
  }

  // Store incomplete line separately
  this._incompleteLine = incompleteLine;
  this.buffer = incompleteLine;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The incomplete line is &lt;strong&gt;never committed to block content&lt;/strong&gt;. Instead, it's virtually appended at render time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;// In the React component
if (incompleteLine &amp;amp;&amp;amp; activeBlock) {
  // Display block = real content + pending text (read-only view)
  const displayContent = activeBlock.content + '\n' + incompleteLine;
  return &amp;lt;BlockContent block={{ ...activeBlock, content: displayContent }} /&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This means the parser state is always clean. No duplication. No corruption. The incomplete text is a temporary visual overlay that gets replaced by the real content when the line completes.&lt;/p&gt;
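&lt;p&gt;A minimal standalone sketch of this buffering (illustrative only, not StreamMD's internals) shows why no text is ever committed twice:&lt;/p&gt;

```typescript
// Holds back the trailing partial line until its newline arrives, so every
// complete line is committed exactly once.
class LineBuffer {
  private buffer = '';
  private prevLength = 0;
  readonly committed: string[] = [];
  incompleteLine = '';

  push(fullText: string): void {
    const delta = fullText.slice(this.prevLength); // only NEW characters
    if (delta === '') return;
    this.prevLength = fullText.length;

    this.buffer += delta;
    const lines = this.buffer.split('\n');
    // The last segment has no trailing newline: it is the incomplete line
    // (empty string when the buffer happens to end with a newline).
    this.incompleteLine = lines.pop()!;
    for (const line of lines) this.committed.push(line);
    this.buffer = this.incompleteLine;
  }
}

const buf = new LineBuffer();
buf.push('## He');        // incompleteLine: "## He", nothing committed
buf.push('## Heading\n'); // committed: ["## Heading"], no duplication
console.log(buf.committed, JSON.stringify(buf.incompleteLine));
```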




&lt;h2&gt;
  
  
  📊 The Numbers
&lt;/h2&gt;

&lt;p&gt;I built a &lt;a href="https://altrusian.com/stream-md" rel="noopener noreferrer"&gt;live demo&lt;/a&gt; with a side-by-side comparison. Here's what it shows for a typical LLM response (~1,300 characters, 15 blocks):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;react-markdown&lt;/th&gt;
&lt;th&gt;StreamMD&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chars parsed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~400,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~1,300&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-token complexity&lt;/td&gt;
&lt;td&gt;O(n) — full re-parse&lt;/td&gt;
&lt;td&gt;O(1) — delta only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block re-renders per token&lt;/td&gt;
&lt;td&gt;All blocks&lt;/td&gt;
&lt;td&gt;1 (active only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundle size&lt;/td&gt;
&lt;td&gt;45kB + remark + rehype&lt;/td&gt;
&lt;td&gt;30kB total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime dependencies&lt;/td&gt;
&lt;td&gt;unified + remark + rehype + ...&lt;/td&gt;
&lt;td&gt;0 (React peer only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Syntax highlighting&lt;/td&gt;
&lt;td&gt;BYO (Prism 40kB / Shiki 200kB)&lt;/td&gt;
&lt;td&gt;Built-in (3kB, 15 langs)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;300x fewer characters processed.&lt;/strong&gt; Same formatted output.&lt;/p&gt;




&lt;h2&gt;
  
  
  💻 Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Drop-in replacement
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install stream-md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

// That's it. One component.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  With Vercel AI SDK
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;'use client';
import { useChat } from '@ai-sdk/react';
import { StreamMD } from 'stream-md';
import 'stream-md/styles.css';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    &amp;lt;div&amp;gt;
      {messages.map((m) =&amp;gt; (
        &amp;lt;div key={m.id}&amp;gt;
          {m.role === 'assistant' ? (
            &amp;lt;StreamMD text={m.content} theme="dark" /&amp;gt;
          ) : (
            &amp;lt;p&amp;gt;{m.content}&amp;lt;/p&amp;gt;
          )}
        &amp;lt;/div&amp;gt;
      ))}
      &amp;lt;form onSubmit={handleSubmit}&amp;gt;
        &amp;lt;input value={input} onChange={handleInputChange} /&amp;gt;
      &amp;lt;/form&amp;gt;
    &amp;lt;/div&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Hook API (advanced)
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;import { useEffect } from 'react';
import { useStreamMD } from 'stream-md';

function CustomRenderer() {
  const { blocks, activeIndex, incompleteLine, push, reset } = useStreamMD();

  useEffect(() =&amp;gt; {
    const sse = new EventSource('/api/chat');
    let text = '';
    sse.onmessage = (e) =&amp;gt; {
      text += e.data;
      push(text);
    };
    return () =&amp;gt; sse.close();
  }, [push]);

  return (
    &amp;lt;div&amp;gt;
      {/* Frozen blocks will never re-render thanks to React.memo */}
      {blocks.map((block, i) =&amp;gt; (
        &amp;lt;BlockContent key={i} block={block} /&amp;gt;
      ))}
    &amp;lt;/div&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Component overrides
&lt;/h3&gt;

&lt;p&gt;Full control — swap any element with your own component:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&amp;lt;StreamMD
  text={text}
  components={{
    pre: ({ code, language }) =&amp;gt; &amp;lt;MyCodeBlock code={code} lang={language} /&amp;gt;,
    a: ({ href, children }) =&amp;gt; &amp;lt;MyLink href={href}&amp;gt;{children}&amp;lt;/MyLink&amp;gt;,
    table: ({ headers, rows }) =&amp;gt; &amp;lt;MyTable headers={headers} rows={rows} /&amp;gt;,
  }}
/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  🎨 What's Included
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Markdown support:&lt;/strong&gt;&lt;br&gt;
Headings, paragraphs, code blocks (fenced), inline code, bold, italic, links, images, ordered/unordered/task lists, tables with alignment, blockquotes, horizontal rules, strikethrough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax highlighting:&lt;/strong&gt;&lt;br&gt;
Built-in lightweight highlighter (~3kB) for JavaScript, TypeScript, Python, Rust, Go, Java, C/C++, Bash, JSON, HTML, CSS, SQL, YAML, Diff, Markdown. No Prism. No Shiki. No extra bundle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Theming:&lt;/strong&gt;&lt;br&gt;
Dark and light presets via CSS custom properties. Or bring your own — set &lt;code&gt;theme="none"&lt;/code&gt; and override &lt;code&gt;--smd-*&lt;/code&gt; variables.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 The Stack: ZeroJitter + StreamMD
&lt;/h2&gt;

&lt;p&gt;StreamMD has a companion library: &lt;a href="https://github.com/jvoltci/zero-jitter" rel="noopener noreferrer"&gt;ZeroJitter&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zero-jitter   → plain text streaming (canvas rendering, zero DOM reflows)
stream-md     → markdown streaming (incremental parsing, block memoization)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;ZeroJitter&lt;/strong&gt; eliminates layout thrashing by rendering text to &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt; via a Web Worker. It's for raw text streams where you don't need markdown formatting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;StreamMD&lt;/strong&gt; eliminates redundant parsing by incrementally tracking blocks. It's for full markdown rendering with headings, code blocks, tables, and inline formatting.&lt;/p&gt;

&lt;p&gt;Together, they own the "streaming LLM display" category. Use the right tool for the job.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;The performance problem in AI chat apps isn't React. It isn't the DOM. It's &lt;strong&gt;re-parsing content that hasn't changed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;StreamMD doesn't make React faster. It makes React do less work. Completed blocks are frozen. Only the active block updates. The parser only sees new characters.&lt;/p&gt;

&lt;p&gt;The fastest code is the code that never runs.&lt;/p&gt;




&lt;p&gt;📦 &lt;code&gt;npm install stream-md&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;⭐ &lt;a href="https://github.com/jvoltci/stream-md" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🎮 &lt;a href="https://altrusian.com/stream-md" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://altrusian.com" rel="noopener noreferrer"&gt;Jai&lt;/a&gt;. Feedback and contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>react</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Eliminated Layout Jitter From LLM Streaming — Here's How</title>
      <dc:creator>JAI</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:15:13 +0000</pubDate>
      <link>https://dev.to/jvoltci/zerojitter-stop-layout-thrashing-stream-llm-tokens-without-jitter-36ef</link>
      <guid>https://dev.to/jvoltci/zerojitter-stop-layout-thrashing-stream-llm-tokens-without-jitter-36ef</guid>
      <description>&lt;p&gt;&lt;strong&gt;Every AI chat app has the same bug. You've felt it. That stuttering scrollbar, the content jumping, the dropped frames when tokens stream in. I spent weeks building a library that makes it physically impossible.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Open ChatGPT. Claude. Gemini. Any LLM-powered chat interface.&lt;/p&gt;

&lt;p&gt;Now watch the scrollbar while the model streams a response.&lt;/p&gt;

&lt;p&gt;See it? That micro-stutter. The scrollbar jumps. The content reflows. If you're on a slower device, you'll see actual frame drops. It's subtle on short responses, but stream 500+ tokens and it becomes infuriating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every single token that arrives triggers the same cascade:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token arrives → DOM mutation → Style recalculation → Layout reflow → Paint → Composite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 50 tokens/second, that's &lt;strong&gt;50 full layout reflows per second&lt;/strong&gt;. Each one forces the browser to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Recalculate every CSS property that could be affected&lt;/li&gt;
&lt;li&gt;Recompute the geometry of every element in the render tree&lt;/li&gt;
&lt;li&gt;Determine what pixels need repainting&lt;/li&gt;
&lt;li&gt;Composite the final frame&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On a page with 200 DOM elements, each reflow touches dozens of nodes. The browser's layout engine was never designed for this kind of write-heavy, real-time workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; Scrollbar jitter. Content jumping. Dropped frames. A "janky" feeling that makes expensive AI products feel cheap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Nuclear Option: Bypass the DOM Entirely
&lt;/h2&gt;

&lt;p&gt;I asked a simple question: &lt;strong&gt;What if we never trigger a single layout reflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer was &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Canvas rendering uses &lt;code&gt;fillText()&lt;/code&gt; — a direct draw call that bypasses the layout engine entirely. No DOM nodes to measure. No CSS to recalculate. No layout to reflow. Just math → pixels.&lt;/p&gt;

&lt;p&gt;But "just use canvas" is like saying "just rewrite everything in Assembly." You lose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text selection&lt;/li&gt;
&lt;li&gt;Accessibility (screen readers)&lt;/li&gt;
&lt;li&gt;Responsive reflow on resize&lt;/li&gt;
&lt;li&gt;Line breaking&lt;/li&gt;
&lt;li&gt;International text support (CJK, BiDi, Thai)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built &lt;strong&gt;ZeroJitter&lt;/strong&gt; — a React component that gives you all of those back while keeping the canvas performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: How ZeroJitter Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ Main Thread ──────────────────────────────────┐
│                                                │
│  LLM tokens → useZeroJitter hook               │
│                    │                            │
│              postMessage()                      │
│                    ▼                            │
│  ┌─ Web Worker ────────────────────────┐       │
│  │ Intl.Segmenter → measureText()      │       │
│  │ Line breaking • CJK • BiDi • Emoji  │       │
│  │ Returns: lines[], height, widths     │       │
│  └─────────────────────────────────────┘       │
│                    │                            │
│              onmessage()                        │
│                    ▼                            │
│  CanvasRenderer.paint() → &amp;lt;canvas&amp;gt;              │
│  AccessibilityMirror  → &amp;lt;div aria-live&amp;gt;         │
│                                                │
└────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Key Insight: Measurement ≠ Rendering
&lt;/h3&gt;

&lt;p&gt;The expensive part of text layout isn't painting pixels — it's &lt;em&gt;measuring text&lt;/em&gt;. Every time you add a word, the browser needs to figure out: Does this word fit on the current line? Where does the next line start? How tall is the container now?&lt;/p&gt;

&lt;p&gt;ZeroJitter moves ALL of this math to a Web Worker using &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/measureText" rel="noopener noreferrer"&gt;&lt;code&gt;CanvasRenderingContext2D.measureText()&lt;/code&gt;&lt;/a&gt;. The worker:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Segments text&lt;/strong&gt; via &lt;code&gt;Intl.Segmenter&lt;/code&gt; (handles CJK per-character breaking, Thai word boundaries, Arabic/Hebrew BiDi)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measures each segment&lt;/strong&gt; via an OffscreenCanvas &lt;code&gt;measureText()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caches measurements&lt;/strong&gt; — the word "the" at 16px Inter always has the same width&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performs line breaking&lt;/strong&gt; with pure arithmetic (~0.0002ms per text block)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Returns line data&lt;/strong&gt; to the main thread&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The main thread then just &lt;code&gt;fillText()&lt;/code&gt;s each line at its computed position. Zero layout involvement. Zero reflows. Locked 60fps.&lt;/p&gt;
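&lt;p&gt;The greedy line-breaking step really is pure arithmetic once widths are cached. Here is a standalone sketch of that idea; the fixed 8px-per-character measure is a stand-in for the worker's real &lt;code&gt;measureText()&lt;/code&gt; calls, not an actual font metric:&lt;/p&gt;

```typescript
// Greedy line breaking over cached word widths. The cache means a repeated
// word is measured once, ever; after that, layout is addition and comparison.
const widthCache = new Map();
function measure(word: string): number {
  let w = widthCache.get(word);
  if (w === undefined) {
    w = word.length * 8; // stand-in metric: 8px per character
    widthCache.set(word, w);
  }
  return w;
}

function breakLines(words: string[], maxWidth: number, spaceWidth = 4): string[] {
  const lines: string[] = [];
  let current: string[] = [];
  let width = 0;
  for (const word of words) {
    const w = measure(word);
    const needed = current.length === 0 ? w : width + spaceWidth + w;
    if (needed > maxWidth) {
      // Word doesn't fit: close the current line, start a new one with it.
      if (current.length > 0) lines.push(current.join(' '));
      current = [word];
      width = w;
    } else {
      current.push(word);
      width = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(' '));
  return lines;
}

console.log(breakLines(['the', 'quick', 'brown', 'fox'], 80));
// two lines: "the quick" / "brown fox"
```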

&lt;h3&gt;
  
  
  The Numbers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;DOM Rendering&lt;/th&gt;
&lt;th&gt;ZeroJitter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reflows per token&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layout time&lt;/td&gt;
&lt;td&gt;0.3-2ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;0.01ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frame drops (@ 100 tok/s)&lt;/td&gt;
&lt;td&gt;12-30&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FPS&lt;/td&gt;
&lt;td&gt;45-58&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;60&lt;/strong&gt; (locked)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scrollbar stability&lt;/td&gt;
&lt;td&gt;Jittery&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Rock solid&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;zero-jitter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ZeroJitter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useZeroJitter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zero-jitter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;StreamingChat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HTMLDivElement&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;append&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;layout&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useZeroJitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;sse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;append&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ZeroJitter&lt;/span&gt;
      &lt;span class="na"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="na"&gt;font&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"16px Inter"&lt;/span&gt;
      &lt;span class="na"&gt;maxHeight&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"#e2e8f0"&lt;/span&gt;
    &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Drop-in replacement. Your streaming goes from janky to buttery.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Not "just a canvas text renderer"
&lt;/h3&gt;

&lt;p&gt;There are canvas text libraries. ZeroJitter is specifically engineered for &lt;strong&gt;streaming&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token coalescing&lt;/strong&gt;: Multiple tokens arriving in the same frame are batched into one worker message via &lt;code&gt;requestAnimationFrame&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale response discarding&lt;/strong&gt;: Monotonic request IDs ensure out-of-order worker responses don't cause glitches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental layout&lt;/strong&gt;: Only remeasures changed text, not the entire document&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Viewport culling&lt;/strong&gt;: O(log n) binary search — only visible lines are painted, even for 10,000-line documents&lt;/li&gt;
&lt;/ul&gt;
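&lt;p&gt;To make the first two bullets concrete, here's a minimal sketch (hypothetical names, not ZeroJitter's actual internals) of frame-level token coalescing plus monotonic-ID stale-response discarding. &lt;code&gt;schedule&lt;/code&gt; stands in for &lt;code&gt;requestAnimationFrame&lt;/code&gt; so the sketch also runs outside a browser:&lt;/p&gt;

```javascript
// 1) Token coalescing: tokens arriving within one frame are buffered and
// sent to the layout worker as a single tagged message.
function createCoalescer(send, schedule = (cb) => setTimeout(cb, 16)) {
  let buffer = [];
  let scheduled = false;
  let nextRequestId = 0;
  return function append(token) {
    buffer.push(token);
    if (scheduled) return;            // a flush is already queued for this frame
    scheduled = true;
    schedule(() => {
      scheduled = false;
      send({ id: nextRequestId++, text: buffer.join("") }); // one message per frame
      buffer = [];
    });
  };
}

// 2) Stale-response discarding: the worker echoes the request id back, and
// any response older than the newest one already applied is dropped.
function createResponseGate(apply) {
  let newestSeen = -1;
  return function onWorkerMessage({ id, layout }) {
    if (id <= newestSeen) return;     // out-of-order worker response: ignore
    newestSeen = id;
    apply(layout);
  };
}
```

&lt;p&gt;The id check is what keeps a slow layout result for token 40 from overwriting the already-painted result for token 45.&lt;/p&gt;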

&lt;h3&gt;
  
  
  Full accessibility
&lt;/h3&gt;

&lt;p&gt;A visually-hidden &lt;code&gt;&amp;lt;div aria-live="polite"&amp;gt;&lt;/code&gt; mirrors the canvas text with a 300ms debounce during streaming. Screen readers announce updates without being flooded by individual tokens.&lt;/p&gt;
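&lt;p&gt;The mirroring pattern described above can be sketched like this (assumed names, not the library's actual code): the hidden node's &lt;code&gt;textContent&lt;/code&gt; is rewritten at most once per window, so assistive tech hears sentence-sized chunks instead of every token. The scheduler is injectable; in the real component it would be a 300ms &lt;code&gt;setTimeout&lt;/code&gt;:&lt;/p&gt;

```javascript
// Batches rapid-fire text updates into one aria-live announcement per window.
function createLiveMirror(node, schedule = (cb) => setTimeout(cb, 300)) {
  let timer = null;
  let latest = "";
  return function update(fullText) {
    latest = fullText;                // always remember the newest text
    if (timer !== null) return;       // a flush is already scheduled
    timer = schedule(() => {
      timer = null;
      node.textContent = latest;      // one announcement, not one per token
    });
  };
}
```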

&lt;h3&gt;
  
  
  Zero dependencies
&lt;/h3&gt;

&lt;p&gt;The entire text layout engine (based on &lt;a href="https://github.com/chenglou/pretext" rel="noopener noreferrer"&gt;pretext&lt;/a&gt;) is vendored into the library. No external runtime dependencies. Just React as a peer dep.&lt;/p&gt;

&lt;h3&gt;
  
  
  International text
&lt;/h3&gt;

&lt;p&gt;Built on &lt;code&gt;Intl.Segmenter&lt;/code&gt; with full support for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CJK (Chinese, Japanese, Korean) — per-character line breaking with kinsoku rules&lt;/li&gt;
&lt;li&gt;Arabic/Hebrew — BiDi text with correct segment ordering&lt;/li&gt;
&lt;li&gt;Thai — proper word segmentation (Thai has no spaces!)&lt;/li&gt;
&lt;li&gt;Emoji — corrects Chrome/Firefox canvas emoji width inflation&lt;/li&gt;
&lt;/ul&gt;
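&lt;p&gt;A quick illustration (not ZeroJitter code, just the platform API it builds on) of why &lt;code&gt;Intl.Segmenter&lt;/code&gt; is the right foundation here. Word-granularity segmentation finds Thai word boundaries despite the script using no spaces, and grapheme-granularity segmentation keeps a ZWJ emoji family together as one measurable unit:&lt;/p&gt;

```javascript
// Thai word segmentation: no spaces in the input, yet boundaries are found.
const thaiWords = [...new Intl.Segmenter("th", { granularity: "word" })
  .segment("สวัสดีครับ")].map((s) => s.segment);

// Grapheme segmentation: the ZWJ family emoji is one user-perceived character,
// so it must be line-broken and measured as a single unit.
const graphemes = [...new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment("👩‍👩‍👧 ok")].map((s) => s.segment);
```

&lt;p&gt;Requires Node 16+ or any modern browser; segmentation quality for dictionary-based scripts like Thai depends on the runtime's ICU data.&lt;/p&gt;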

&lt;h2&gt;
  
  
  Live Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;See it for yourself:&lt;/strong&gt; &lt;a href="https://altrusian.com/zero-jitter" rel="noopener noreferrer"&gt;altrusian.com/zero-jitter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demo streams the same text into both a standard DOM element and a ZeroJitter canvas side-by-side, with real-time metrics. Crank the speed to 150 tok/s and watch the DOM panel fall apart while the canvas stays rock solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deeper Problem
&lt;/h2&gt;

&lt;p&gt;Layout thrashing isn't a "nice to fix" — it's a &lt;strong&gt;trust destroyer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When users interact with an AI chat app, the streaming response is the primary interface. If that interface stutters, users subconsciously associate the jank with the AI itself. "Is it thinking? Did it freeze? Is something wrong?"&lt;/p&gt;

&lt;p&gt;Smooth streaming = perceived intelligence.&lt;/p&gt;

&lt;p&gt;Every major AI company is going to need to solve this as models get faster. GPT-4o streams at ~100 tokens/second. The next generation will be 200+. DOM rendering will break completely at those speeds.&lt;/p&gt;

&lt;p&gt;ZeroJitter is open source, MIT licensed, and ready for production.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📦 npm: &lt;a href="https://www.npmjs.com/package/zero-jitter" rel="noopener noreferrer"&gt;&lt;code&gt;npm install zero-jitter&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/jvoltci/zero-jitter" rel="noopener noreferrer"&gt;github.com/jvoltci/zero-jitter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎮 Live Demo: &lt;a href="https://altrusian.com/zero-jitter" rel="noopener noreferrer"&gt;altrusian.com/zero-jitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a React library that renders streaming LLM text on &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt; instead of the DOM. Zero layout reflows, locked 60fps, full accessibility, zero dependencies. The scrollbar will never jitter again.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>performance</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
