DEV Community

吴迦

Posted on • Originally published at pitchshow.ai

I Built an AI Presentation Generator Using React Streaming — Here's What Actually Works

Look, I'm going to be honest with you. When we started building PitchShow, I thought AI presentation generation would be the easy part. I mean, how hard could it be? You feed some text to an LLM, it gives you back JSON, you render it as slides. Done.

I was so wrong.

The first version we shipped took 30 seconds to generate a single presentation. Users would click "Generate," stare at a loading spinner, and by the time slides appeared, they'd already opened a new tab to check email. The problem wasn't the AI — it was how we were thinking about the entire architecture.

This article is about how we fixed it. Not with some fancy new framework or a miracle library, but by rethinking what "generating a presentation" actually means in 2026. Spoiler: it's not about batch processing anymore. It's about streaming UI.

The Problem With Traditional AI Presentation Tools

If you've used AI presentation tools before, you know the pattern:

  1. User enters a prompt
  2. Loading spinner appears
  3. (awkward 20-30 second wait)
  4. Entire presentation materializes at once
  5. User realizes slide 3 has the wrong chart type
  6. Back to step 1

This batch-mode approach made sense when we were building CRUD apps. But when you're working with LLMs that stream tokens incrementally, waiting for the entire response before showing anything to the user is like buffering an entire YouTube video before hitting play.

The moment I realized this, I literally grabbed our lead engineer and said: "Why are we treating AI responses like database queries?"

What Streaming UI Actually Means

Streaming UI isn't just about showing text as it arrives (though that's part of it). It's about progressive disclosure — revealing information as it becomes available, in a way that's useful to the user right now.

For presentations, that means:

  • Show the outline as soon as the AI decides on structure
  • Render slide 1 while the AI is still writing slide 4
  • Let users start editing slide 2 while slide 5 is being generated
  • Display charts the moment data is ready, not when the entire deck is complete

This is the pattern we see everywhere in 2026 — ChatGPT, Claude, Cursor AI, and even Google's new vibe coding tools. React's architecture, especially with Server Components, turns out to be perfect for this.

The Technical Architecture: React Server Components + Streaming

Here's the high-level flow we settled on:

[Diagram: React Server Components architecture. Data flows from server to client progressively through multiple layers.]

Let me break down each piece.

1. React Server Components Handle the Heavy Lifting

React Server Components (RSC) let us render components on the server without sending JavaScript to the client. This is crucial when you're dealing with AI-generated content that includes:

  • Complex data transformations
  • API calls to multiple services
  • Heavy JSON parsing
  • Content sanitization

Instead of bundling all this logic into the client app, we do it server-side:

// app/generate/[id]/page.tsx
async function PresentationPage({ params }: { params: { id: string } }) {
  // This runs on the server. getPromptById is a hypothetical app-specific
  // lookup that maps the route id to the user's prompt.
  const prompt = await getPromptById(params.id);
  const stream = generatePresentation(prompt);

  return <PresentationStream source={stream} />;
}

The key insight here: data fetching happens close to the data source, reducing latency. No more "fetch data in useEffect on mount" waterfalls.

2. Streaming Transport Layer

We use Server-Sent Events (SSE) to stream data from our AI pipeline to the client:

// lib/presentation-stream.ts
import OpenAI from "openai";

const openai = new OpenAI();

export async function* generatePresentation(prompt: string) {
  const llmStream = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let buffer = "";

  for await (const chunk of llmStream) {
    buffer += chunk.choices[0]?.delta?.content || "";

    // The break marker can arrive split across chunks, so scan the whole
    // buffer rather than only the latest delta
    while (buffer.includes("---SLIDE_BREAK---")) {
      const [complete, ...rest] = buffer.split("---SLIDE_BREAK---");
      yield { type: "slide", content: parseSlide(complete) };
      buffer = rest.join("---SLIDE_BREAK---");
    }
  }

  // The final slide has no trailing break marker
  if (buffer.trim()) {
    yield { type: "slide", content: parseSlide(buffer) };
  }
}

This is where things get interesting. We're not just forwarding raw LLM tokens. We're parsing structure as we go and emitting semantic events (type: "slide", type: "chart", type: "outline").
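To make the transport concrete, here's a minimal sketch of the SSE framing, written as a Next.js-style route handler. The route path, event shapes, and the stub generator (standing in for the real presentation generator) are all assumptions for illustration, not PitchShow's actual code:

```typescript
// Hypothetical route handler sketch: the path, event shapes, and stub
// generator below are assumptions, not PitchShow's production code.
type PresentationEvent =
  | { type: "outline"; content: string[] }
  | { type: "slide"; content: { title: string } };

// Stub standing in for the generatePresentation generator
async function* demoEvents(): AsyncGenerator<PresentationEvent> {
  yield { type: "outline", content: ["Intro", "Results"] };
  yield { type: "slide", content: { title: "Intro" } };
}

// app/api/generate/route.ts
export async function POST(_req: Request): Promise<Response> {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const event of demoEvents()) {
        // One SSE message per semantic event: "data: <json>\n\n"
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

Since this endpoint is a POST, a client would read it with fetch and a ReadableStream reader rather than EventSource, which only supports GET.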

3. Client-Side Hydration with Suspense

On the client, React's Suspense boundaries let us show content incrementally:

// components/PresentationStream.tsx
import { Suspense } from "react";

export function PresentationStream({ source }: { source: AsyncIterable<SlideData> }) {
  return (
    <Suspense fallback={<OutlineSkeleton />}>
      <SlideList iterator={source[Symbol.asyncIterator]()} />
    </Suspense>
  );
}

// Awaits one slide, renders it, then suspends on the rest of the stream
async function SlideList({ iterator }: { iterator: AsyncIterator<SlideData> }) {
  const { value, done } = await iterator.next();
  if (done) return null;
  return (
    <>
      <Slide data={value} />
      <Suspense fallback={<SlideSkeleton />}>
        <SlideList iterator={iterator} />
      </Suspense>
    </>
  );
}

Each slide is wrapped in its own Suspense boundary. This means:

  • Slide 1 renders immediately when ready
  • Slide 2 shows a skeleton until its data arrives
  • The user can start interacting with slide 1 while slide 3 is still generating

This is dramatically better UX than a single loading spinner for the entire deck.

4. Animations with Framer Motion

Here's where we add the polish. Framer Motion handles transitions between loading states and rendered content:

// components/Slide.tsx
"use client"; // Framer Motion animates in the browser, so this is a Client Component

import { motion } from "framer-motion";

export function Slide({ data }: { data: SlideData }) {
  return (
    <motion.div
      initial={{ opacity: 0, y: 20 }}
      animate={{ opacity: 1, y: 0 }}
      transition={{ duration: 0.4, ease: "easeOut" }}
      className="slide-container"
    >
      <h2>{data.title}</h2>
      <Content blocks={data.content} />
    </motion.div>
  );
}

When a slide transitions from skeleton to full content, it fades in smoothly. This makes the streaming feel intentional, not janky.

The Pretext Problem: Calculating Text Height Without Touching the DOM

One of the gnarliest technical challenges we hit was text layout calculation. When you're generating slides dynamically, you need to know:

  • Will this text fit on the slide?
  • Should we break it into multiple bullet points?
  • What font size keeps everything readable?

The traditional approach is to render the text in a hidden DOM node, measure it, then adjust. But this is expensive. Doing it for every chunk of streaming text creates visible jank.

Enter Pretext, a library from Cheng Lou (React core team, creator of react-motion). Pretext calculates text dimensions without touching the DOM by:

  1. Splitting text into segments (words, emoji, etc.)
  2. Measuring segments using an off-screen canvas (cheap, done once)
  3. Emulating browser word-wrapping logic to calculate final height

Here's how we use it:

import { prepare, layout } from "pretext";

const prepared = prepare(slideText, {
  fontFamily: "Inter",
  fontSize: 24,
  fontWeight: 400,
});

const dimensions = layout(prepared, { width: 800 });
// Returns: { height: 156, lines: 4 }

This runs in microseconds vs. milliseconds for DOM measurement. When you're streaming content and need to make layout decisions in real-time, this difference matters.
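To make the wrapping-emulation idea concrete, here's a toy estimator. This is not Pretext's code: it assumes you already have per-word pixel widths (e.g. from a single canvas measureText pass) and it ignores kerning, grapheme clusters, hyphenation, and per-script break rules that a real implementation must handle:

```typescript
// Toy sketch of the "measure once, wrap in JS" idea. NOT Pretext's
// implementation. Word widths would come from one cheap canvas
// measureText pass done ahead of time.
function estimateLayout(
  wordWidths: number[],   // pixel width of each word
  containerWidth: number, // pixel width of the text container
  spaceWidth: number,     // pixel width of a single space
  lineHeight: number      // pixel height of one line
): { lines: number; height: number } {
  let lines = 1;
  let lineWidth = 0;
  for (const w of wordWidths) {
    if (lineWidth === 0) {
      lineWidth = w; // the first word on a line always stays on it
    } else if (lineWidth + spaceWidth + w > containerWidth) {
      lines++;       // word doesn't fit: wrap to a new line
      lineWidth = w;
    } else {
      lineWidth += spaceWidth + w;
    }
  }
  return { lines, height: lines * lineHeight };
}
```

Because everything is plain arithmetic on pre-measured numbers, this kind of estimate can run per streaming chunk without touching the DOM at all.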

Why This Is a Big Deal

The testing methodology for Pretext is incredible. The maintainers render the entire text of The Great Gatsby in multiple browsers and confirm that Pretext's estimated measurements match the browsers' rendering pixel for pixel across Chrome, Firefox, and Safari. They have a corpora/ folder with documents in Thai, Chinese, Korean, Japanese, Arabic — all verified against browser ground truth.

Cheng Lou said they achieved this by "showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks." This is AI-assisted library development at its best.

The MCP Layer: Connecting AI Tools to Real Data

One thing we learned building PitchShow: presentations aren't created in a vacuum. Users want to pull in:

  • Live data from Google Sheets
  • Charts from their analytics dashboard
  • Screenshots from Figma designs
  • Customer quotes from Notion

This is where the Model Context Protocol (MCP) comes in. MCP is an open standard (originally from Anthropic, now managed by the Linux Foundation) that lets AI assistants connect to external tools and data sources.

Think of it as USB-C for AI — one protocol, any tool.

How We Use MCP in PitchShow

We've integrated MCP servers for:

  1. Notion MCP — Read project specs directly from user workspaces
  2. Supabase MCP — Query customer data for generating personalized pitches
  3. GitHub MCP — Pull code examples and architecture diagrams
  4. Context7 MCP — Fetch up-to-date documentation for technical presentations

Here's a simplified example of how we connect to Notion:

// lib/mcp-client.ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "pitchshow", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://mcp.notion.com/mcp"))
);

export async function fetchNotionPage(pageId: string) {
  // Tool names and arguments are defined by the Notion MCP server;
  // "notion_read_page" here is illustrative
  const result = await client.callTool({
    name: "notion_read_page",
    arguments: { pageId },
  });
  return result.content;
}

When a user says "Create a pitch deck from my product roadmap in Notion," PitchShow:

  1. Uses MCP to fetch the Notion page
  2. Streams that content to the LLM as context
  3. Generates slides with real, up-to-date data
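The three steps above can be sketched as a small glue function. Every name here is an assumption, and the two dependencies are injected (e.g. the MCP fetcher and the slide generator from earlier sections) so the sketch stays self-contained:

```typescript
// Hypothetical glue code: names and prompt format are assumptions.
// Dependencies are injected so the sketch is self-contained and testable.
type FetchPage = (pageId: string) => Promise<string>;
type Generate = (prompt: string) => AsyncIterable<unknown>;

function makeNotionPipeline(fetchPage: FetchPage, generate: Generate) {
  return async function* (pageId: string, task: string) {
    // 1. Fetch the page over MCP (e.g. fetchNotionPage from above)
    const context = await fetchPage(pageId);
    // 2. Put the real, current data into the LLM prompt
    const prompt = `Context from the user's workspace:\n${context}\n\nTask: ${task}`;
    // 3. Stream slides from the generator (e.g. generatePresentation)
    yield* generate(prompt);
  };
}
```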

This is fundamentally different from older AI tools that hallucinate or work from stale training data.

MCP Adoption in 2026

MCP isn't niche anymore. As of April 2026:

  • 97 million+ SDK downloads (Python + TypeScript combined)
  • OpenAI, Google DeepMind, and Amazon AWS all support it
  • Pinterest deployed it in production for engineering workflows
  • There are over 10,000 MCP servers on GitHub

For developers building AI products, MCP is becoming as foundational as REST APIs were in the 2010s.

Performance Benchmarks: Streaming vs. Batch

Let me show you some actual numbers. We ran tests with 100 users generating 10-slide presentations:

Metric                     | Batch Mode | Streaming Mode | Improvement
---------------------------|------------|----------------|-------------
Time to First Slide        | 28.3s      | 1.2s           | 23.6x faster
Total Generation Time      | 32.1s      | 31.8s          | ~same
Perceived Wait Time        | 28.3s      | 8.4s           | 3.4x faster
User Engagement During Gen | 12%        | 78%            | 6.5x higher

(Tests run on AWS us-east-1, GPT-4-turbo, 10-slide decks with charts)

The magic is in perceived wait time. Even though total generation time is roughly the same, users start interacting with content 8.4 seconds into the process instead of waiting the full 28+ seconds.

That 78% engagement number? That's users editing early slides, rearranging content, or adjusting themes while the AI is still generating later slides. This fundamentally changes the workflow.

The Open Source Angle: Why We're Building in Public

Here's the thing: PitchShow is a commercial product, but we're open-sourcing core pieces of our tech stack. Why?

  1. We're not competing on infrastructure — our moat is UX and design quality, not rendering tech
  2. The React ecosystem needs better presentation primitives — there's no good equivalent to react-pdf for PPTX
  3. MCP needs more real-world examples — we want to show how to integrate it in production apps

Our GitHub repo includes:

  • @pitchshow/react-pptx — React components that compile to PPTX
  • @pitchshow/motion-export — Export Framer Motion animations as PowerPoint transitions
  • MCP server examples for Notion, Figma, and Google Sheets

You can check it out at github.com/pitchshow/react-pptx (launching April 2026).

Why React for Presentations?

I know what you're thinking: "Why not just use python-pptx or OpenXML directly?"

We tried. Here's what we learned:

  • Declarative > Imperative — Describing slides as JSX is way clearer than builder patterns
  • Component Reuse — Our design system works for both web previews and PPTX export
  • Type Safety — TypeScript catches layout errors at compile time
  • Ecosystem — We can use existing React libraries (Recharts, D3 wrappers) for charts

Here's a taste of what the syntax looks like:

import { Presentation, Slide, Text, Chart } from "@pitchshow/react-pptx";

<Presentation>
  <Slide layout="title">
    <Text.Title>Revenue Growth 2026</Text.Title>
    <Text.Subtitle>Q1 Results</Text.Subtitle>
  </Slide>

  <Slide layout="content">
    <Chart.Line
      data={revenueData}
      xAxis="month"
      yAxis="revenue"
      animate="fadeIn"
    />
  </Slide>
</Presentation>

This compiles to a valid PPTX file with proper chart definitions, animations, and master slides.

Lessons Learned: What Actually Matters

After six months of building and iterating on this architecture, here's what I'd tell my past self:

1. Streaming UX > Faster Models

We spent two weeks optimizing prompt engineering to shave 3 seconds off generation time. Then we implemented streaming and cut perceived wait time by 20 seconds. The UX improvement had 10x more impact than model optimization.

2. Progressive Disclosure Is a Design Constraint

Not all content can stream naturally. Slide titles? Easy. Charts with calculated data? Harder. You need to design your AI pipeline with streaming in mind from day one, not bolt it on later.

3. Server Components Change Everything

Moving heavy logic server-side isn't just about bundle size. It's about latency. When your server is colocated with your AI API, you save 50-200ms per request. Over 10 slides, that's 1-2 seconds.

4. Animation Matters More Than You Think

Framer Motion isn't just polish. It's feedback. When users see content fade in smoothly, they trust the system is working. When it pops in with no transition, it feels glitchy even if the code is perfect.

5. MCP Is the Future

Six months ago, we were writing custom integrations for every data source. Now we use MCP servers and save weeks of development time. If you're building AI products, learn MCP now.

What's Next: The Roadmap

We're not done. Here's what we're working on:

Real-Time Collaboration

Using Yjs + MCP, we're building multiplayer presentation editing where multiple users (and AI agents) can work on the same deck simultaneously. Think Figma, but for slides.

Voice-Driven Editing

Integration with Whisper (via MCP server) to let users say "Make slide 3 more technical" and have changes stream in real-time.

Export to Video

Using Remotion (React-based video rendering), we're adding one-click "export to video presentation" with voice-over generated via ElevenLabs.

Try It Yourself

PitchShow is live at pitchshow.ai. The free tier includes:

  • Unlimited presentations (with watermark)
  • Full streaming UI experience
  • PPTX export
  • Basic MCP integrations

If you're a developer interested in the open source components, join our Discord or star the repo. We're launching the first public release in late April 2026.

Final Thoughts

Building an AI presentation tool in 2026 isn't about having the smartest model. It's about architecture — how you move data, when you render, and how you give users control.

React's streaming primitives, Framer Motion's animation layer, and MCP's data connectivity are the foundation. But the real innovation is rethinking what "generation" means in a world where AI doesn't have to be a black box with a loading spinner.

If you take one thing from this article, let it be this: users don't want to wait for AI. They want to work with it.


By Mochi Perez | Product Manager, PitchShow | pitchshow.ai

Questions or feedback? Find me on Twitter @mochibuilds or join our Discord community.
