Look, I'm going to be honest with you. When we started building PitchShow, I thought AI presentation generation would be the easy part. I mean, how hard could it be? You feed some text to an LLM, it gives you back JSON, you render it as slides. Done.
I was so wrong.
The first version we shipped took 30 seconds to generate a single presentation. Users would click "Generate," stare at a loading spinner, and by the time slides appeared, they'd already opened a new tab to check email. The problem wasn't the AI — it was how we were thinking about the entire architecture.
This article is about how we fixed it. Not with some fancy new framework or a miracle library, but by rethinking what "generating a presentation" actually means in 2026. Spoiler: it's not about batch processing anymore. It's about streaming UI.
## The Problem With Traditional AI Presentation Tools
If you've used AI presentation tools before, you know the pattern:
- User enters a prompt
- Loading spinner appears
- (awkward 20-30 second wait)
- Entire presentation materializes at once
- User realizes slide 3 has the wrong chart type
- Back to step 1
This batch-mode approach made sense when we were building CRUD apps. But when you're working with LLMs that stream tokens incrementally, waiting for the entire response before showing anything to the user is like buffering an entire YouTube video before hitting play.
The moment I realized this, I literally grabbed our lead engineer and said: "Why are we treating AI responses like database queries?"
## What Streaming UI Actually Means
Streaming UI isn't just about showing text as it arrives (though that's part of it). It's about progressive disclosure — revealing information as it becomes available, in a way that's useful to the user right now.
For presentations, that means:
- Show the outline as soon as the AI decides on structure
- Render slide 1 while the AI is still writing slide 4
- Let users start editing slide 2 while slide 5 is being generated
- Display charts the moment data is ready, not when the entire deck is complete
This is the pattern we see everywhere in 2026 — ChatGPT, Claude, Cursor AI, and even Google's new vibe coding tools. React's architecture, especially with Server Components, turns out to be perfect for this.
## The Technical Architecture: React Server Components + Streaming
Here's the high-level flow we settled on:

*Diagram: data flows from server to client progressively through multiple layers.*
Let me break down each piece.
### 1. React Server Components Handle the Heavy Lifting
React Server Components (RSC) let us render components on the server without sending JavaScript to the client. This is crucial when you're dealing with AI-generated content that includes:
- Complex data transformations
- API calls to multiple services
- Heavy JSON parsing
- Content sanitization
Instead of bundling all this logic into the client app, we do it server-side:
```tsx
// app/generate/[id]/page.tsx
export default async function PresentationPage({
  params,
}: {
  params: { id: string };
}) {
  // This runs on the server
  const stream = await generatePresentation(params.id);
  return <PresentationStream source={stream} />;
}
```
The key insight here: data fetching happens close to the data source, reducing latency. No more "fetch data in useEffect on mount" waterfalls.
### 2. Streaming Transport Layer
We use Server-Sent Events (SSE) to stream data from our AI pipeline to the client:
```typescript
// lib/presentation-stream.ts
import OpenAI from "openai";

const openai = new OpenAI();

export async function* generatePresentation(prompt: string) {
  const llmStream = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let currentSlide = "";
  for await (const chunk of llmStream) {
    currentSlide += chunk.choices[0]?.delta?.content ?? "";

    // Detect slide boundaries (check the buffer, not the chunk —
    // the marker can be split across streamed tokens)
    if (currentSlide.includes("---SLIDE_BREAK---")) {
      const [finished, rest = ""] = currentSlide.split("---SLIDE_BREAK---");
      yield { type: "slide", content: parseSlide(finished) };
      currentSlide = rest;
    }
  }

  // Flush the final slide, which has no trailing break marker
  if (currentSlide.trim()) {
    yield { type: "slide", content: parseSlide(currentSlide) };
  }
}
```
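To push those events to the browser over SSE, each one becomes a `data:` frame followed by a blank line. Here is a minimal sketch of just the framing; the helper names (`toSSEFrame`, `eventsToSSEBody`) and the demo generator are illustrative, not PitchShow's code:

```typescript
// Serialize one semantic event as a Server-Sent Events frame
function toSSEFrame(event: unknown): string {
  // SSE framing: a "data:" line terminated by a blank line
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Drain an async iterable of events into a single SSE body. A real
// route handler would instead enqueue each frame on a ReadableStream
// and return it with Content-Type: text/event-stream.
async function eventsToSSEBody(events: AsyncIterable<unknown>): Promise<string> {
  let body = "";
  for await (const event of events) {
    body += toSSEFrame(event);
  }
  return body;
}
```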
This is where things get interesting. We're not just forwarding raw LLM tokens. We're parsing structure as we go and emitting semantic events (type: "slide", type: "chart", type: "outline").
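As a sketch, those semantic events could be modeled as a discriminated union that the client dispatches on the moment each one arrives. The exact shapes below are assumptions for illustration, not PitchShow's wire format:

```typescript
// Hypothetical event shapes for the streaming pipeline
type PresentationEvent =
  | { type: "outline"; sections: string[] }
  | { type: "slide"; index: number; title: string }
  | { type: "chart"; slideIndex: number };

// The client can act on each event as soon as it arrives,
// without waiting for the rest of the deck
function describeEvent(event: PresentationEvent): string {
  switch (event.type) {
    case "outline":
      return `outline: ${event.sections.length} sections`;
    case "slide":
      return `slide ${event.index}: ${event.title}`;
    case "chart":
      return `chart for slide ${event.slideIndex}`;
  }
}
```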
### 3. Client-Side Hydration with Suspense
On the client, React's Suspense boundaries let us show content incrementally:
```tsx
// components/PresentationStream.tsx
import { Suspense, use } from "react";

// Simplified: `source` is an array of promises, one per slide, each
// resolved as the server parses that slide out of the LLM stream.
export function PresentationStream({ source }: { source: Promise<SlideData>[] }) {
  return (
    <Suspense fallback={<OutlineSkeleton />}>
      {source.map((slidePromise, index) => (
        <Suspense key={index} fallback={<SlideSkeleton />}>
          <AwaitedSlide promise={slidePromise} />
        </Suspense>
      ))}
    </Suspense>
  );
}

function AwaitedSlide({ promise }: { promise: Promise<SlideData> }) {
  const data = use(promise); // suspends this boundary until the slide arrives
  return <Slide data={data} />;
}
```
Each slide is wrapped in its own Suspense boundary. This means:
- Slide 1 renders immediately when ready
- Slide 2 shows a skeleton until its data arrives
- The user can start interacting with slide 1 while slide 3 is still generating
This is dramatically better UX than a single loading spinner for the entire deck.
### 4. Animations with Framer Motion
Here's where we add the polish. Framer Motion handles transitions between loading states and rendered content:
```tsx
// components/Slide.tsx
import { motion } from "framer-motion";

export function Slide({ data }: { data: SlideData }) {
  return (
    <motion.div
      initial={{ opacity: 0, y: 20 }}
      animate={{ opacity: 1, y: 0 }}
      transition={{ duration: 0.4, ease: "easeOut" }}
      className="slide-container"
    >
      <h2>{data.title}</h2>
      <Content blocks={data.content} />
    </motion.div>
  );
}
```
When a slide transitions from skeleton to full content, it fades in smoothly. This makes the streaming feel intentional, not janky.
## The Pretext Problem: Calculating Text Height Without Touching the DOM
One of the gnarliest technical challenges we hit was text layout calculation. When you're generating slides dynamically, you need to know:
- Will this text fit on the slide?
- Should we break it into multiple bullet points?
- What font size keeps everything readable?
The traditional approach is to render the text in a hidden DOM node, measure it, then adjust. But this is expensive. Doing it for every chunk of streaming text creates visible jank.
Enter Pretext, a library from Cheng Lou (React core team, creator of react-motion). Pretext calculates text dimensions without touching the DOM by:
- Splitting text into segments (words, emoji, etc.)
- Measuring segments using an off-screen canvas (cheap, done once)
- Emulating browser word-wrapping logic to calculate final height
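That third step is the clever part. As a toy illustration of the idea, here is greedy word-wrapping over pre-measured word widths; this is my sketch of the technique, not Pretext's actual algorithm, and `wrapLines` is a hypothetical name:

```typescript
// Given pre-measured word widths (e.g. from canvas measureText),
// count how many lines a greedy wrap produces at a given container width.
// Real engines (and Pretext) also handle kerning, break opportunities,
// bidi text, etc. — this only shows the core idea.
function wrapLines(wordWidths: number[], spaceWidth: number, maxWidth: number): number {
  let lines = 1;
  let lineWidth = 0;
  for (const w of wordWidths) {
    const needed = lineWidth === 0 ? w : lineWidth + spaceWidth + w;
    if (needed > maxWidth && lineWidth > 0) {
      lines += 1; // wrap: start a new line with this word
      lineWidth = w;
    } else {
      lineWidth = needed;
    }
  }
  return lines;
}
```

Multiply the line count by the line height and you have the text block's height, with no DOM in sight.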
Here's how we use it:
```typescript
import { prepare, layout } from "pretext";

const prepared = prepare(slideText, {
  fontFamily: "Inter",
  fontSize: 24,
  fontWeight: 400,
});

const dimensions = layout(prepared, { width: 800 });
// Returns: { height: 156, lines: 4 }
```
This runs in microseconds vs. milliseconds for DOM measurement. When you're streaming content and need to make layout decisions in real-time, this difference matters.
### Why This Is a Big Deal
The testing methodology for Pretext is incredible. The maintainers render the entire text of The Great Gatsby in multiple browsers and confirm that the estimated measurements match the browsers' ground truth pixel-for-pixel across Chrome, Firefox, and Safari. They have a corpora/ folder with documents in Thai, Chinese, Korean, Japanese, Arabic — all verified against browser ground truth.
Cheng Lou said they achieved this by "showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks." This is AI-assisted library development at its best.
## The MCP Layer: Connecting AI Tools to Real Data
One thing we learned building PitchShow: presentations aren't created in a vacuum. Users want to pull in:
- Live data from Google Sheets
- Charts from their analytics dashboard
- Screenshots from Figma designs
- Customer quotes from Notion
This is where the Model Context Protocol (MCP) comes in. MCP is an open standard (originally from Anthropic, now managed by the Linux Foundation) that lets AI assistants connect to external tools and data sources.
Think of it as USB-C for AI — one protocol, any tool.
### How We Use MCP in PitchShow
We've integrated MCP servers for:
- Notion MCP — Read project specs directly from user workspaces
- Supabase MCP — Query customer data for generating personalized pitches
- GitHub MCP — Pull code examples and architecture diagrams
- Context7 MCP — Fetch up-to-date documentation for technical presentations
Here's a simplified example of how we connect to Notion:
```typescript
// lib/mcp-client.ts — uses the official MCP TypeScript SDK
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "pitchshow", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://mcp.notion.com/mcp"))
);

export async function fetchNotionPage(pageId: string) {
  const response = await client.callTool({
    name: "notion_read_page",
    arguments: { pageId },
  });
  return response.content;
}
```
When a user says "Create a pitch deck from my product roadmap in Notion," PitchShow:
- Uses MCP to fetch the Notion page
- Streams that content to the LLM as context
- Generates slides with real, up-to-date data
This is fundamentally different from older AI tools that hallucinate or work from stale training data.
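That three-step flow can be sketched end to end. Here the MCP fetch and the LLM pipeline are stubbed out so only the wiring shows; every name below is illustrative, not PitchShow's actual code:

```typescript
// Stub for step 1: the real version calls the Notion MCP server
async function fetchRoadmap(pageId: string): Promise<string> {
  return `Roadmap for page ${pageId}`;
}

// Stub for step 3: the real version streams slides from the LLM
async function* generateSlides(prompt: string) {
  yield { type: "slide", content: `Built from: ${prompt.slice(0, 40)}` };
}

// 1. fetch via MCP → 2. feed as LLM context → 3. stream slides out
async function* deckFromNotion(pageId: string) {
  const context = await fetchRoadmap(pageId);
  const prompt = `Create a pitch deck from:\n${context}`;
  yield* generateSlides(prompt);
}
```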
### MCP Adoption in 2026
MCP isn't niche anymore. As of April 2026:
- 97 million+ SDK downloads (Python + TypeScript combined)
- OpenAI, Google DeepMind, and Amazon AWS all support it
- Pinterest deployed it in production for engineering workflows
- There are over 10,000 MCP servers on GitHub
For developers building AI products, MCP is becoming as foundational as REST APIs were in the 2010s.
## Performance Benchmarks: Streaming vs. Batch
Let me show you some actual numbers. We ran tests with 100 users generating 10-slide presentations:
| Metric | Batch Mode | Streaming Mode | Improvement |
|---|---|---|---|
| Time to First Slide | 28.3s | 1.2s | 23.6x faster |
| Total Generation Time | 32.1s | 31.8s | ~same |
| Perceived Wait Time | 28.3s | 8.4s | 3.4x faster |
| User Engagement During Gen | 12% | 78% | 6.5x higher |
(Tests run on AWS us-east-1, GPT-4-turbo, 10-slide decks with charts)
The magic is in perceived wait time. Even though total generation time is roughly the same, the first slide appears 1.2 seconds in, and the average perceived wait drops to 8.4 seconds instead of the full 28+ seconds of staring at a spinner.
That 78% engagement number? That's users editing early slides, rearranging content, or adjusting themes while the AI is still generating later slides. This fundamentally changes the workflow.
## The Open Source Angle: Why We're Building in Public
Here's the thing: PitchShow is a commercial product, but we're open-sourcing core pieces of our tech stack. Why?
- We're not competing on infrastructure — our moat is UX and design quality, not rendering tech
- The React ecosystem needs better presentation primitives — there's no good equivalent to `react-pdf` for PPTX
- MCP needs more real-world examples — we want to show how to integrate it in production apps
Our GitHub repo includes:
- `@pitchshow/react-pptx` — React components that compile to PPTX
- `@pitchshow/motion-export` — Export Framer Motion animations as PowerPoint transitions
- MCP server examples for Notion, Figma, and Google Sheets
You can check it out at github.com/pitchshow/react-pptx (launching April 2026).
## Why React for Presentations?
I know what you're thinking: "Why not just use python-pptx or OpenXML directly?"
We tried. Here's what we learned:
- Declarative > Imperative — Describing slides as JSX is way clearer than builder patterns
- Component Reuse — Our design system works for both web previews and PPTX export
- Type Safety — TypeScript catches layout errors at compile time
- Ecosystem — We can use existing React libraries (Recharts, D3 wrappers) for charts
Here's a taste of what the syntax looks like:
```tsx
import { Presentation, Slide, Text, Chart } from "@pitchshow/react-pptx";

<Presentation>
  <Slide layout="title">
    <Text.Title>Revenue Growth 2026</Text.Title>
    <Text.Subtitle>Q1 Results</Text.Subtitle>
  </Slide>
  <Slide layout="content">
    <Chart.Line
      data={revenueData}
      xAxis="month"
      yAxis="revenue"
      animate="fadeIn"
    />
  </Slide>
</Presentation>;
```
This compiles to a valid PPTX file with proper chart definitions, animations, and master slides.
## Lessons Learned: What Actually Matters
After six months of building and iterating on this architecture, here's what I'd tell my past self:
### 1. Streaming UX > Faster Models
We spent two weeks optimizing prompt engineering to shave 3 seconds off generation time. Then we implemented streaming and cut perceived wait time by 20 seconds. The UX improvement had 10x more impact than model optimization.
### 2. Progressive Disclosure Is a Design Constraint
Not all content can stream naturally. Slide titles? Easy. Charts with calculated data? Harder. You need to design your AI pipeline with streaming in mind from day one, not bolt it on later.
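One way to encode that distinction is to tag each streamed event with whether it can render incrementally or must be buffered until its data is complete. A sketch (the event shapes and `renderableNow` are my illustration, not PitchShow's code):

```typescript
// Text can stream token-by-token; a chart only makes sense once
// its calculated data is final.
type StreamEvent =
  | { kind: "text"; slide: number; delta: string }
  | { kind: "chartPartial"; slide: number; rows: number[] }
  | { kind: "chartDone"; slide: number };

function renderableNow(e: StreamEvent): boolean {
  // Gate rendering: incremental text passes, partial charts wait
  return e.kind === "text" || e.kind === "chartDone";
}
```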
### 3. Server Components Change Everything
Moving heavy logic server-side isn't just about bundle size. It's about latency. When your server is colocated with your AI API, you save 50-200ms per request. Over 10 slides, that's 1-2 seconds.
### 4. Animation Matters More Than You Think
Framer Motion isn't just polish. It's feedback. When users see content fade in smoothly, they trust the system is working. When it pops in with no transition, it feels glitchy even if the code is perfect.
### 5. MCP Is the Future
Six months ago, we were writing custom integrations for every data source. Now we use MCP servers and save weeks of development time. If you're building AI products, learn MCP now.
## What's Next: The Roadmap
We're not done. Here's what we're working on:
### Real-Time Collaboration
Using Yjs + MCP, we're building multiplayer presentation editing where multiple users (and AI agents) can work on the same deck simultaneously. Think Figma, but for slides.
### Voice-Driven Editing
Integration with Whisper (via MCP server) to let users say "Make slide 3 more technical" and have changes stream in real-time.
### Export to Video
Using Remotion (React-based video rendering), we're adding one-click "export to video presentation" with voice-over generated via ElevenLabs.
## Try It Yourself
PitchShow is live at pitchshow.ai. The free tier includes:
- Unlimited presentations (with watermark)
- Full streaming UI experience
- PPTX export
- Basic MCP integrations
If you're a developer interested in the open source components, join our Discord or star the repo. We're launching the first public release in late April 2026.
## Final Thoughts
Building an AI presentation tool in 2026 isn't about having the smartest model. It's about architecture — how you move data, when you render, and how you give users control.
React's streaming primitives, Framer Motion's animation layer, and MCP's data connectivity are the foundation. But the real innovation is rethinking what "generation" means in a world where AI doesn't have to be a black box with a loading spinner.
If you take one thing from this article, let it be this: users don't want to wait for AI. They want to work with it.
*By Mochi Perez | Product Manager, PitchShow | [pitchshow.ai](https://pitchshow.ai)*

*Questions or feedback? Find me on Twitter [@mochibuilds](https://twitter.com/mochibuilds) or join our Discord community.*