吴迦

Posted on Jun 7 • Originally published at pitchshow.ai

The Hidden Cost of Real-Time UI: What We Learned Building PitchShow's Streaming Engine

#react #streaming #architecture #performance

Look, I need to tell you something about React Server Components that most tutorials skip: they're incredible when they work, but getting there is like learning to ride a bike while the bike is actively on fire.

When we started rebuilding PitchShow's generation engine with streaming architecture, I thought it would be straightforward. React 19 just shipped stable Server Components, Next.js 15 had solid documentation, and everyone on Twitter was hyping it as the future of React. What could go wrong?

Everything. Everything could go wrong.

This article is about the stuff nobody talks about when they demo streaming UI. The memory leaks. The race conditions. The moment we realized our "optimized" architecture was actually slower than the old batch mode for 40% of users. And most importantly, how we fixed it.

Why We Even Considered Streaming in the First Place

Before we dive into the technical nightmare, let me explain why streaming architecture mattered for PitchShow.

We're building an AI presentation generator. Users give us a prompt, we hit Claude or GPT-4, and we spit out a full slide deck. The old architecture looked like this:

1. User clicks "Generate"
2. Loading spinner appears
3. Wait 25-30 seconds while AI does its thing
4. Entire presentation materializes
5. User realizes slide 7 has wrong data
6. Back to step 1

The problem? Nobody waits 30 seconds anymore. Not in 2026. ChatGPT streams tokens. Cursor AI streams code. Even Google Docs now has real-time AI suggestions. Users expect to see something happening immediately.

So we built streaming.

Figure 1: Time to First Slide dropped from 28 seconds to 1.2 seconds with streaming

The Architecture That Looked Great on Paper

Here's what we designed in our first sprint:

// Server Component (runs on server)
async function PresentationGenerator({ prompt }: { prompt: string }) {
  const stream = await generatePresentation(prompt);

  return (
    <Suspense fallback={<LoadingOutline />}>
      <StreamingPresentation source={stream} />
    </Suspense>
  );
}

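// Client Component (runs in browser; lives in its own file, where 'use client' is the first statement)
'use client';
import { useEffect, useState } from 'react';

function StreamingPresentation({ source }: { source: ReadableStream<Slide> }) {
  const [slides, setSlides] = useState<Slide[]>([]);

  useEffect(() => {
    const reader = source.getReader();
    let cancelled = false;

    async function read() {
      // Loop rather than recurse so a long stream doesn't chain calls
      while (!cancelled) {
        const { done, value } = await reader.read();
        if (done) return;
        setSlides(prev => [...prev, value]);
      }
    }

    read().catch(console.error);

    // Stop reading and cancel the stream when the component unmounts
    return () => {
      cancelled = true;
      reader.cancel().catch(() => {});
    };
  }, [source]);

  return slides.map((slide, i) => <SlidePreview key={i} data={slide} />);
}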

Clean, right? This is literally what the Next.js docs recommend. We shipped it to staging, tested with our team, and it felt amazing. Slides appeared one by one, smoothly fading in. We thought we were done.

Then we shipped to production.

The First Crisis: Memory Doesn't Lie

Within 24 hours, our monitoring started screaming. Node.js server memory was spiking to 2GB+ during generation. For context, our old batch mode never went above 600MB.

The culprit? Our streaming architecture was buffering every generated slide in server memory until the client got around to consuming it.

Here's what was actually happening:

// What we thought was happening:
// 1. Generate slide 1 → Send to client → Free memory
// 2. Generate slide 2 → Send to client → Free memory

// What was ACTUALLY happening:
// 1. Generate slide 1 → Buffer in Node.js stream
// 2. Generate slide 2 → Buffer in Node.js stream
// 3. Generate slide 3 → Buffer in Node.js stream
// ...
// 10. Client finally starts reading → All 10 slides in memory

Figure 2: Memory spiked dramatically with naive streaming implementation

The problem was missing backpressure. When the client is slow to consume the stream (maybe the user's on a slow connection, or their browser is doing heavy layout work), nothing tells the producer to slow down, so the server buffers everything.

The Fix: Proper Backpressure Handling

We rewrote the streaming layer with explicit backpressure control:

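import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const SLIDE_BREAK = "---SLIDE_BREAK---";

// parseSlide is our own parser (not shown here)
export async function* generatePresentationWithBackpressure(prompt: string) {
  const llmStream = anthropic.messages.stream({
    model: "claude-3-5-sonnet-20240620",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 4096,
  });

  let buffer = "";
  let slideCount = 0;

  for await (const event of llmStream) {
    // Only text deltas carry slide content; skip start/stop and other events
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      buffer += event.delta.text;
    }

    // Yield every complete slide in the buffer. Using indexOf/slice instead of
    // split(..., 2) keeps the text after a second delimiter from being dropped.
    let breakAt = buffer.indexOf(SLIDE_BREAK);
    while (breakAt !== -1) {
      const slideContent = buffer.slice(0, breakAt);
      buffer = buffer.slice(breakAt + SLIDE_BREAK.length);

      // The generator pauses here until the consumer asks for the next slide
      yield { type: "slide", data: parseSlide(slideContent), index: slideCount++ };
      breakAt = buffer.indexOf(SLIDE_BREAK);
    }

    // Hard limit: if buffer exceeds 50KB, force flush
    if (buffer.length > 50000) {
      yield { type: "slide", data: parseSlide(buffer), index: slideCount++, partial: true };
      buffer = "";
    }
  }

  // Flush remaining
  if (buffer.trim()) {
    yield { type: "slide", data: parseSlide(buffer), index: slideCount };
  }
}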

The key insight: Don't just forward tokens. Chunk them into semantic units and yield only when the client is ready to receive.

After this change, server memory stabilized at ~400MB even under heavy load.
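
The "yield only when the client is ready" idea depends on whatever sits between the generator and the HTTP response actually pulling on demand. Here is a minimal sketch of one way to bridge an async generator to a web ReadableStream so the generator only advances when the consumer asks for more. The helper name toPullStream and the zero high-water mark are assumptions for illustration, not PitchShow's production code.

```typescript
// Sketch: wrap an async generator in a pull-based ReadableStream.
// `toPullStream` is a hypothetical helper name, not part of any library.
function toPullStream<T>(source: AsyncGenerator<T>): ReadableStream<T> {
  return new ReadableStream<T>(
    {
      // pull() runs only when the consumer wants more data,
      // so the generator is never advanced ahead of demand.
      async pull(controller) {
        const { done, value } = await source.next();
        if (done) {
          controller.close();
        } else {
          controller.enqueue(value);
        }
      },
      // If the client disconnects, finish the generator so upstream work stops too.
      async cancel() {
        await source.return(undefined);
      },
    },
    // highWaterMark 0: keep no read-ahead queue; produce strictly on demand.
    { highWaterMark: 0 },
  );
}

// Usage: a counter makes it visible how far ahead of the reader the producer runs.
let produced = 0;
async function* numberedSlides(): AsyncGenerator<string> {
  for (let i = 1; i <= 10; i++) {
    produced++;
    yield `slide ${i}`;
  }
}
```

With this shape, a slow reader simply leaves the generator suspended at its last yield instead of letting slides pile up in a server-side queue.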

The Second Crisis: Suspense Waterfalls Are Real

With memory under control, we hit our next problem: layout thrashing.

Users reported that slides would "pop in" with visible jank. We pulled up Chrome DevTools and saw this:

Timeline:
0ms     - Suspense boundary resolves for slide 1
150ms   - Layout recalc (entire page)
300ms   - Suspense boundary resolves for slide 2
450ms   - Layout recalc (entire page)
600ms   - Suspense boundary resolves for slide 3
750ms   - Layout recalc (entire page)

Every time a Suspense boundary resolved, the browser was recalculating layout for the entire presentation, not just the new slide.

The problem? We had wrapped each slide in its own Suspense boundary:

// ❌ BAD: Each slide triggers full-page layout
{slides.map((slide, i) => (
  <Suspense key={i} fallback={<SlideSkeleton />}>
    <Slide data={slide} />
  </Suspense>
))}

The Fix: Grouped Suspense Boundaries

We restructured to group slides into "pages" of 3:

// ✅ GOOD: Group slides into pages
{chunks(slides, 3).map((slideGroup, pageIndex) => (
  <Suspense key={pageIndex} fallback={<SkeletonGroup count={3} />}>
    {slideGroup.map((slide, i) => (
      <Slide key={slide.id} data={slide} />
    ))}
  </Suspense>
))}

This reduced layout recalcs by 70%. Users could now see 3 slides appear at once with a single smooth transition.
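
The chunks helper used in that snippet is not a built-in. A minimal sketch of what it could look like, assuming it only needs to split an array into fixed-size groups with a shorter final group:

```typescript
// Sketch of a `chunks` helper: split an array into groups of `size` items.
// The last group may be shorter when the length is not a multiple of `size`.
function chunks<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// Seven slides in pages of three: two full pages and one partial page.
const pages = chunks(["s1", "s2", "s3", "s4", "s5", "s6", "s7"], 3);
console.log(pages.map((page) => page.length)); // [3, 3, 1]
```

Because slides are appended as they stream in, only the last group changes between renders, which keeps the page-index keys stable for earlier groups.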

The Third Crisis: Animations Make Everything Worse

With layout stable, we added polish: Framer Motion animations for slide transitions.

<motion.div
  initial={{ opacity: 0, y: 20 }}
  animate={{ opacity: 1, y: 0 }}
  transition={{ duration: 0.4 }}
>
  <Slide data={slide} />
</motion.div>

And our carefully optimized streaming architecture... started dropping frames.

The culprit? Framer Motion's layout animations conflict with Suspense boundaries. When a Suspense boundary resolves and hydrates new content, Framer Motion tries to measure the old DOM before it's removed. This causes a brief moment where both the skeleton and the real content exist in the DOM simultaneously.

Figure 3: Despite jank, user engagement was still dramatically higher with streaming

The Fix: Manual Animation Control

We ditched automatic layout animations and manually controlled transitions:

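import { useEffect } from "react";
import { motion, useAnimation } from "framer-motion";

// SlideData stands in for the app's own slide type
type Props = { slide: SlideData; isLoading: boolean };

function SlideTransition({ slide, isLoading }: Props) {
  const controls = useAnimation();

  useEffect(() => {
    if (isLoading) return;

    // Wait for Suspense to fully commit before animating
    const frame = requestAnimationFrame(() => {
      controls.start({ opacity: 1, y: 0 });
    });

    // Cancel the pending frame if the component unmounts first
    return () => cancelAnimationFrame(frame);
  }, [isLoading, controls]);

  return (
    <motion.div
      initial={{ opacity: 0, y: 20 }}
      animate={controls}
      transition={{ duration: 0.4, ease: "easeOut" }}
    >
      {!isLoading && <Slide data={slide} />}
    </motion.div>
  );
}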

Buttery smooth. Finally.

The Fourth Crisis: Error Recovery Is Impossible

Everything was working great... until it wasn't.

A user reported that their generation "froze" at slide 6 of 10. No error message. No retry button. Just a loading spinner that never finished.

The problem? Our streaming architecture had no way to recover from partial failures.

In batch mode, if the AI API timed out or returned invalid JSON, we could catch the error, show a message, and let the user retry. But with streaming:

// What happens when the stream breaks mid-generation?
for await (const chunk of llmStream) {
  // What if this throws?
  yield parseSlide(chunk);
}

If parseSlide() throws on slide 6, the stream just... stops. The client has no idea what happened. It's still waiting for more data that will never come.

Figure 4: Streaming architecture enabled graceful degradation we couldn't achieve with batch mode

The Fix: Error Boundaries in the Stream

We added error events to the stream protocol:

export async function* generatePresentationWithErrors(prompt: string) {
  // streamSlideChunks is a hypothetical helper: it yields one raw slide string at a time
  const llmStream = streamSlideChunks(prompt);

  try {
    for await (const chunk of llmStream) {
      try {
        const slide = parseSlide(chunk);
        yield { type: "slide", data: slide };
      } catch (parseError) {
        // Send error but keep streaming
        yield { 
          type: "error", 
          message: "Failed to parse slide", 
          recoverable: true,
          data: chunk // Send raw data so client can retry
        };
      }
    }
  } catch (streamError) {
    // Fatal error: stop stream but send final error event
    yield { 
      type: "error", 
      // A caught value is `unknown` in TypeScript, so narrow it before reading .message
      message: streamError instanceof Error ? streamError.message : String(streamError), 
      recoverable: false 
    };
  }
}

On the client:

for await (const event of stream) {
  if (event.type === "error") {
    if (event.recoverable) {
      // Show warning, keep showing slides we have
      showToast(`Warning: ${event.message}`);
    } else {
      // Show error UI with retry
      setError(event.message);
      break;
    }
  } else {
    setSlides(prev => [...prev, event.data]);
  }
}

Now when something breaks, users see exactly what happened and can retry just the failed part.
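
One way to keep that client loop easy to test is to move the state transitions into a pure reducer and let the loop only dispatch events. The sketch below assumes simplified event and state shapes (the Slide type, the warnings list, and the applyEvent name are all illustrative, not our production code).

```typescript
// Hypothetical event and state shapes, loosely following the protocol above.
type Slide = { title: string };
type GenerationEvent =
  | { type: "slide"; data: Slide }
  | { type: "error"; message: string; recoverable: boolean };

type GenerationState = {
  slides: Slide[];
  warnings: string[];
  fatalError: string | null;
};

const initialState: GenerationState = { slides: [], warnings: [], fatalError: null };

// Pure reducer: returns the next state without mutating the previous one.
function applyEvent(state: GenerationState, event: GenerationEvent): GenerationState {
  // Once a fatal error is recorded, ignore anything that arrives later.
  if (state.fatalError !== null) return state;

  if (event.type === "slide") {
    return { ...state, slides: [...state.slides, event.data] };
  }
  if (event.recoverable) {
    // Recoverable: keep the slides we have and surface a warning.
    return { ...state, warnings: [...state.warnings, event.message] };
  }
  return { ...state, fatalError: event.message };
}

// Usage: replay a sequence of events, including one that arrives after a fatal error.
const events: GenerationEvent[] = [
  { type: "slide", data: { title: "Intro" } },
  { type: "error", message: "Failed to parse slide", recoverable: true },
  { type: "error", message: "LLM stream timed out", recoverable: false },
  { type: "slide", data: { title: "Late arrival" } },
];
const finalState = events.reduce(applyEvent, initialState);
```

In a React component the same function can be passed to useReducer, so the streaming loop stays a thin dispatcher.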

The Fifth Crisis: React Server Components Are Not Free

By this point, our streaming architecture was rock-solid. But our AWS bill wasn't.

We were running Next.js on ECS Fargate, and our compute costs had tripled since switching to streaming. The problem?

Dynamically rendered Server Components run on every request. Unlike pages built with static site generation (SSG) or incremental static regeneration (ISR), a response that depends on each user's prompt can't simply be cached at the CDN level. Every user hitting our generation endpoint spins up a new server-side render.

For batch mode, we could cache the final HTML for 5 minutes and serve it from CloudFront. For streaming, every user needs a live connection to the server for 30 seconds.

Figure 5: Cost dropped after implementing aggressive caching strategies

The Fix: Aggressive Edge Caching

We implemented a multi-layer caching strategy:

Prompt-based caching: If two users generate from the exact same prompt within 1 hour, serve cached stream
Partial response caching: Cache the first 3 slides aggressively, stream the rest
Edge function optimization: Run initial RSC render on Cloudflare Workers, stream the rest from AWS

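// Cloudflare Worker (edge)
// Assumes a KV namespace bound as KV and an ORIGIN_URL variable in the Worker config
interface Env {
  KV: KVNamespace;
  ORIGIN_URL: string;
}

const SSE_HEADERS = { 'Content-Type': 'text/event-stream' };

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    const { prompt } = (await request.json()) as { prompt: string };
    // hashPrompt is our own helper; awaiting is harmless if it is synchronous
    const cacheKey = `presentation:${await hashPrompt(prompt)}`;

    // Check KV cache (read the value back as a stream)
    const cached = await env.KV.get(cacheKey, 'stream');
    if (cached) {
      return new Response(cached, { headers: SSE_HEADERS });
    }

    // Cache miss: proxy to origin
    const response = await fetch(env.ORIGIN_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });

    // Don't cache failed or empty responses
    if (!response.ok || !response.body) return response;

    // Tee the stream: one to client, one to cache
    const [clientStream, cacheStream] = response.body.tee();

    // Cache in background: KV.put accepts a ReadableStream as the value,
    // and waitUntil keeps the Worker alive until the write finishes
    ctx.waitUntil(env.KV.put(cacheKey, cacheStream, { expirationTtl: 3600 }));

    return new Response(clientStream, { headers: SSE_HEADERS });
  }
};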

After this, our compute costs dropped by 60% while maintaining the same user experience.
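
The Worker relies on a hashPrompt helper to build its cache key. A minimal sketch of one way to write it with the Web Crypto API, assuming SHA-256 over the raw prompt with no normalization (since the cache is meant to hit only on the exact same prompt):

```typescript
// Sketch of a `hashPrompt` helper using Web Crypto, which is available in
// Cloudflare Workers, modern browsers, and recent Node.js versions.
// It returns the SHA-256 digest of the prompt as a lowercase hex string.
async function hashPrompt(prompt: string): Promise<string> {
  const bytes = new TextEncoder().encode(prompt);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest), (byte) =>
    byte.toString(16).padStart(2, "0"),
  ).join("");
}
```

Because the digest call is asynchronous, callers need to await the result before building the cache key. Hashing also keeps keys at a fixed 64 characters regardless of prompt length, which matters because KV keys have a size limit.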

The Pretext Problem: Text Layout Without the DOM

One technical challenge deserves special mention: calculating text height for slide layout.

When you're generating slides, you need to know: "Will this text fit on the slide? Or do I need to split it?" The traditional approach is to render the text in a hidden div, measure it, then adjust. But this is expensive when you're streaming content in real-time.

Enter Pretext, a library from Cheng Lou (formerly of the React core team). It calculates text dimensions without touching the DOM by:

Splitting text into segments (words, emoji, CJK characters)
Measuring segments using an off-screen canvas (cheap, done once)
Emulating browser word-wrapping logic to calculate final height

// Illustrative call shapes; confirm the package name and exact signatures against the Pretext README
import { prepare, layout } from "pretext";

const prepared = prepare(slideText, {
  fontFamily: "Inter",
  fontSize: 24,
  fontWeight: 400,
});

const dimensions = layout(prepared, { width: 800 });
// Returns: { height: 156, lines: 4 }

This runs in microseconds vs milliseconds for DOM measurement. When you're making layout decisions on every streamed chunk, this difference is critical.
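
To give a feel for the third step (emulating word wrapping), here is a deliberately simplified greedy line breaker. It is not Pretext's implementation: the wrapText name and the injected measure callback are assumptions, and real browsers also handle CJK breaks, hyphenation, and overlong words, which this sketch ignores.

```typescript
// Hypothetical measurement callback: returns the rendered width of a string in px.
// In a real setup this would be backed by cached canvas measurements.
type MeasureWidth = (text: string) => number;

// Greedy line breaking: keep adding words to the current line until the next
// word would overflow maxWidth, then start a new line.
function wrapText(
  text: string,
  maxWidth: number,
  lineHeight: number,
  measure: MeasureWidth,
): { lines: string[]; height: number } {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const lines: string[] = [];
  let current = "";

  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current === "" || measure(candidate) <= maxWidth) {
      // Fits, or it is a single overlong word that this sketch does not split.
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);

  return { lines, height: lines.length * lineHeight };
}

// Usage with a fake fixed-width font (10px per character) for illustration.
const fixedWidth: MeasureWidth = (text) => text.length * 10;
const wrapped = wrapText("aaaa bbbb cccc", 100, 20, fixedWidth);
console.log(wrapped.lines, wrapped.height); // ["aaaa bbbb", "cccc"] 40
```

The point of the design is that measurement is paid once per segment, while the wrapping pass itself is plain arithmetic that can run on every streamed chunk.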

The testing methodology is incredible: the maintainers render the entire text of The Great Gatsby in multiple browsers and verify that the estimated measurements match the rendered output pixel for pixel across Chrome, Firefox, and Safari. They have a corpora/ folder with documents in Thai, Chinese, Korean, Japanese, and Arabic, all verified against browser ground truth.

Cheng Lou said they achieved this by "showing Claude and Codex the browser ground truth, and having them measure & iterate against those at every significant container width, running over weeks." This is AI-assisted library development at its best.

What We'd Do Differently

If I could go back and start over, here's what I'd change:

1. Start with Batch Mode

Don't build streaming on day one. Build batch mode first, make it work, then convert to streaming. We spent weeks debugging streaming-specific issues that would have been obvious bugs in batch mode.

2. Profile from Day One

We didn't add proper observability until after our first production incident. Every streaming component should emit metrics: chunk size, processing time, backpressure events, error rates.

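// Add this to every streaming generator
for await (const chunk of stream) {
  const start = performance.now();

  try {
    // Named processChunk so it doesn't shadow Node's global `process`
    const result = processChunk(chunk);
    metrics.histogram('chunk.processing_ms', performance.now() - start);
    metrics.increment('chunk.success');
    yield result;
  } catch (error) {
    // Tag by error class rather than message to keep metric cardinality bounded
    metrics.increment('chunk.error', { error: error instanceof Error ? error.name : 'unknown' });
    throw error;
  }
}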

3. Test with Slow Networks

We developed on fast office WiFi. Our users are on mobile 4G in airports. We should have tested with Chrome DevTools network throttling from day one.

4. Embrace Progressive Enhancement

Not every part of the UI needs to stream. Title and outline? Stream those. Chart data that requires calculation? Batch that and send it all at once. Don't force streaming where it doesn't make sense.

5. Document Your Streaming Protocol

We changed our stream event schema three times in six weeks. Each time broke production. Define your protocol early, version it, and stick to it:

// v1 Protocol
type StreamEvent = 
  | { type: "outline", data: Outline }
  | { type: "slide", data: Slide, index: number }
  | { type: "chart", data: ChartData, slideIndex: number }
  | { type: "error", message: string, recoverable: boolean }
  | { type: "complete", totalSlides: number };
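
The type above describes the event shapes but does not yet carry a version on the wire. A minimal sketch of one way to add that, assuming each message is a JSON line with a numeric v field (the envelope shape, the v field, and parseEnvelope are illustrative assumptions, not our shipped protocol):

```typescript
// Hypothetical wire envelope: every message carries the protocol version.
const PROTOCOL_VERSION = 1;

type KnownEventType = "outline" | "slide" | "chart" | "error" | "complete";
const KNOWN_TYPES: readonly string[] = ["outline", "slide", "chart", "error", "complete"];

type Envelope = { v: number; type: KnownEventType; [key: string]: unknown };

type ParseResult =
  | { ok: true; event: Envelope }
  | { ok: false; reason: string };

// Validate one JSON line before it reaches UI code, so a schema change
// shows up as an explicit rejection instead of a silent rendering bug.
function parseEnvelope(line: string): ParseResult {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, reason: "not an object" };
  }
  const candidate = raw as Record<string, unknown>;
  if (candidate["v"] !== PROTOCOL_VERSION) {
    return { ok: false, reason: `unsupported protocol version: ${String(candidate["v"])}` };
  }
  const type = candidate["type"];
  if (typeof type !== "string" || !KNOWN_TYPES.includes(type)) {
    return { ok: false, reason: "unknown event type" };
  }
  return { ok: true, event: candidate as Envelope };
}

// Usage: a client built for v1 rejects a v2 message instead of misreading it.
console.log(parseEnvelope('{"v":1,"type":"slide","index":0}').ok); // true
console.log(parseEnvelope('{"v":2,"type":"slide","index":0}').ok); // false
```

Rejecting unknown versions at the boundary gives the client a clear place to show a "please refresh" message rather than failing deep inside a component.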

The Results: Was It Worth It?

After six months of development and countless production incidents, here's what we achieved:

Figure 6: Developer experience metrics improved dramatically after stabilization

Metric	Batch Mode	Streaming Mode	Change
Time to First Slide	28.3s	1.2s	23.6x faster
User Engagement During Gen	12%	78%	6.5x higher
Perceived Wait Time	28.3s	8.4s	3.4x faster
Server Memory	600MB	400MB	33% lower
Monthly Compute Cost	$18.2K	$7.2K	60% cheaper
Error Recovery Rate	0%	88%	∞ better

So yes. It was worth it.

Architectural Lessons for Building Streaming AI Products

Here's what we learned that applies beyond presentations:

1. Streaming UX ≠ Streaming Architecture

Just because content appears progressively doesn't mean your backend is streaming. You can fake it with clever client-side rendering. Start with the illusion, then build the real thing.
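
As a rough illustration of "the illusion", the sketch below takes an already-complete batch result and reveals it item by item through an async generator. The revealProgressively and collect names and the delay value are assumptions for the example.

```typescript
// Sketch: reveal an already-complete batch result one item at a time.
async function* revealProgressively<T>(
  items: readonly T[],
  delayMs: number,
): AsyncGenerator<T> {
  for (const item of items) {
    yield item;
    // Pause between items so the UI can paint each one before the next arrives.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}

// Usage: the consumer looks the same as it would for a real stream,
// so the backend can later be swapped for true streaming without UI changes.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
  const seen: T[] = [];
  for await (const item of source) {
    seen.push(item);
  }
  return seen;
}
```

This only improves how the result is presented once it arrives; time to first slide still depends on the batch request finishing.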

2. Backpressure Is Not Optional

If you're building a streaming system, you must handle backpressure. Otherwise, you'll buffer everything in memory and crash under load.

3. Error Boundaries Are Your Friend

In batch mode, a single error kills the entire request. In streaming mode, you can catch errors and keep streaming. This is a superpower if you use it right.

4. Progressive Enhancement Still Matters

Not everyone has a fast connection. Not every browser supports streaming fetch. Build fallbacks. Test them. Don't assume.

5. Observability Is 10x Harder

In batch mode, you have one request → one response. In streaming mode, you have one connection with hundreds of events. Your monitoring needs to adapt.

Open Source Components

We're open-sourcing parts of our streaming infrastructure:

@pitchshow/react-pptx — React components that compile to PPTX (launching June 2026)
@pitchshow/streaming-utils — Backpressure-aware streaming primitives
Example MCP servers — Connect to Notion, Figma, Google Sheets for AI presentation generation

The repo will be live at github.com/pitchshow/streaming-engine by mid-June 2026.

What's Next for PitchShow

We're not done improving. Here's what's coming:

Real-Time Collaboration with Yjs

Using Yjs + our streaming architecture, we're building multiplayer presentation editing. Multiple users (and AI agents) working on the same deck simultaneously. Think Figma, but for slides.

Voice-Driven Editing

Integration with Whisper (via MCP server) to let users say "Make slide 3 more technical" and watch changes stream in real-time.

Export to Video

Using Remotion (React-based video rendering), we're adding one-click "export to video presentation" with AI-generated voice-over via ElevenLabs.

Try It Yourself

PitchShow is live at pitchshow.ai. The free tier includes:

Unlimited presentations (with watermark)
Full streaming UI experience
PPTX export
Basic MCP integrations

If you're a developer interested in the streaming architecture or open source components, join our Discord or check out the GitHub repo when it launches.

Final Thoughts

Building streaming architecture in 2026 is still hard. React Server Components are powerful but unforgiving. Suspense boundaries are elegant until they aren't. Backpressure is invisible until it crushes your server.

But here's what I know now that I didn't know six months ago: users don't want to wait for AI. They want to work with it.

Batch mode makes users feel like they're waiting for a machine. Streaming mode makes them feel like they're collaborating with a machine. That psychological shift is worth every bug we fixed, every incident we debugged, every Saturday we spent staring at flame graphs.

If you take one thing from this article, let it be this: The future of AI UX is not faster models. It's better architectures.

React Server Components + streaming is one piece of that puzzle. There will be others. But right now, this is the best foundation we've found for building AI products that feel instant, even when they're not.

By Mochi Perez | Product Manager, PitchShow | pitchshow.ai

Questions? Feedback? Find me on Twitter @mochibuilds or join our Discord community.

Special thanks to the React core team, Cheng Lou for Pretext, and the Next.js team for making Server Components production-ready.
