
DEV Community

Bibhu Pradhan

Posted on Mar 1

AI Pitch Deck Generator: A multimodal AI agent that generates complete startup pitch decks

#devchallenge #geminireflections #gemini #googlecloud

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

Founders and entrepreneurs often spend countless hours agonizing over the formatting, narrative structure, and visual design of their pitch decks instead of focusing on building their actual product.

I built the AI Pitch Deck Generator to remove this friction. It is a multimodal web application that takes a simple startup idea and turns it into a cohesive, investor-ready pitch package (slides, charts, product mockups, a promo video clip, and social media captions) in under a minute.

Google Gemini's Role:
Google's Generative AI ecosystem is the core engine of this project. The application uses a multi-agent architecture powered by the new google-genai SDK:

Gemini 2.0 Flash (gemini-2.0-flash): Acts as the master orchestrator. It processes the user's idea and generates a highly structured JSON response containing the full narrative (8 slides, speaker notes, social media captions), specifications for data charts, and detailed prompts for the image and video models.
Imagen 3 (imagen-3.0-generate-002): Consumes the prompts written by Gemini to generate high-quality, photorealistic product mockups and thematic scene visuals.
Veo 2.0 (veo-2.0-generate-001): Creates a dynamic, 5-second cinematic promotional video clip for the startup based on Gemini's prompt.

The backend (FastAPI) then programmatically renders styled charts with matplotlib and assembles everything into a downloadable PowerPoint (.pptx) file using python-pptx.
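The post describes Gemini's output as a single JSON object holding eight slides with speaker notes, social media captions, chart specifications, and prompts for Imagen and Veo, but it does not publish the schema. The sketch below shows one way a backend might parse and sanity-check such a response before handing pieces to the image, video, and chart steps. Every field name (`slides`, `speaker_notes`, `charts`, `image_prompts`, `video_prompt`, `social_captions`) is a hypothetical stand-in, and only the standard library is used.

```python
import json
from dataclasses import dataclass, field

# Hypothetical data shapes: the project's real JSON schema is not shown in the post.
@dataclass
class Slide:
    title: str
    bullets: list[str]
    speaker_notes: str

@dataclass
class ChartSpec:
    chart_type: str          # read from the hypothetical "type" key, e.g. "bar"
    labels: list[str]
    values: list[float]

@dataclass
class DeckPlan:
    slides: list[Slide]
    charts: list[ChartSpec]
    image_prompts: list[str]
    video_prompt: str
    social_captions: list[str] = field(default_factory=list)

EXPECTED_SLIDES = 8  # the post says the narrative has 8 slides

def parse_deck_plan(raw: str) -> DeckPlan:
    """Parse the orchestrator's JSON text and enforce a minimal contract."""
    data = json.loads(raw)
    slides = [
        Slide(s["title"], list(s.get("bullets", [])), s.get("speaker_notes", ""))
        for s in data["slides"]
    ]
    if len(slides) != EXPECTED_SLIDES:
        raise ValueError(f"expected {EXPECTED_SLIDES} slides, got {len(slides)}")
    charts = []
    for c in data.get("charts", []):
        # A chart with mismatched axes would fail later inside matplotlib.
        if len(c["labels"]) != len(c["values"]):
            raise ValueError("chart labels and values must have the same length")
        charts.append(ChartSpec(c["type"], list(c["labels"]), [float(v) for v in c["values"]]))
    return DeckPlan(
        slides=slides,
        charts=charts,
        image_prompts=list(data.get("image_prompts", [])),
        video_prompt=data.get("video_prompt", ""),
        social_captions=list(data.get("social_captions", [])),
    )
```

Failing fast at this point, before any Imagen or Veo request is made, avoids spending media-generation time on a malformed plan.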

Demo

bibhupradhanofficial / AI-Pitch-Deck-Generator

A multimodal AI agent that generates complete startup pitch decks including slides, charts, product mockup images, voiceover scripts, promo video clips, and social media captions from a single text prompt.

AI Pitch Deck Generator

Project Overview

AI Pitch Deck Generator is a powerful tool that leverages Google's Generative AI to automatically create and assemble pitch decks. It handles everything from drafting content to generating visual assets and charts, providing a seamless generation experience with real-time streaming feedback to the user.

Architecture

+------+      +----------+      +-----------------------+      +--------------+      +-----------------------+
|      |      |          |      |                       |      |              |      |                       |
| User | ---> | Frontend | ---> | FastAPI / Cloud Run   | ---> | Gemini Agent | ---> | [Imagen, Veo, Charts] |
|      |      |          |      |                       |      |              |      |                       |
+------+      +----------+      +-----------------------+      +--------------+      +-----------------------+
   ^                                                                                             |
   |                                                                                             |
   |                                                                                             v
   |                                                                                         +-------+
   +--------------------------------------- Response stream -------------------------------- |  GCS  |
                                                                                             +-------+
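The diagram shows the Gemini agent fanning out to Imagen, Veo, and the chart renderer, with results streamed back to the user. A minimal sketch of that fan-out with `asyncio`, assuming each asset job can be written as a coroutine: the three `fake_*` functions are hypothetical stand-ins that only sleep to simulate latency, and `fan_out` / `run_pipeline` are illustrative names, not the project's code. Results are pushed onto a queue in completion order, so a fast chart is never held back by a slow video.

```python
import asyncio

# Hypothetical stand-ins for the Imagen, Veo and matplotlib steps. They only
# sleep to simulate latency; a real implementation would call the models here.
async def fake_image(prompt: str) -> dict:
    await asyncio.sleep(0.02)
    return {"kind": "image", "prompt": prompt}

async def fake_video(prompt: str) -> dict:
    await asyncio.sleep(0.05)
    return {"kind": "video", "prompt": prompt}

async def fake_chart(spec: str) -> dict:
    await asyncio.sleep(0.01)
    return {"kind": "chart", "spec": spec}

async def fan_out(image_prompts: list[str], video_prompt: str,
                  chart_specs: list[str], queue: asyncio.Queue) -> None:
    """Start every asset job concurrently and enqueue each result as it finishes."""
    tasks = [asyncio.create_task(fake_image(p)) for p in image_prompts]
    tasks.append(asyncio.create_task(fake_video(video_prompt)))
    tasks += [asyncio.create_task(fake_chart(s)) for s in chart_specs]
    try:
        for next_done in asyncio.as_completed(tasks):
            await queue.put(await next_done)
    finally:
        await queue.put(None)  # always signal the end, even if a job failed

async def run_pipeline(image_prompts: list[str], video_prompt: str,
                       chart_specs: list[str]) -> list[dict]:
    """Drain the queue in completion order; a server would stream each item instead."""
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(fan_out(image_prompts, video_prompt, chart_specs, queue))
    received = []
    while (item := await queue.get()) is not None:
        received.append(item)
    await producer  # re-raises any exception from a failed asset job
    return received
```

Running `asyncio.run(run_pipeline(["logo", "app screen"], "promo", ["revenue"]))` returns the chart first and the video last, mirroring how the UI would fill in.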

Prerequisites

Before you begin, ensure you have the following requirements met:

Python: 3.11 or higher
GCP Account: A Google Cloud project with an active billing account
Google Cloud…

What I Learned

Building this application pushed me to learn a lot about orchestrating complex AI workflows and building reactive user interfaces:

Real-Time Streaming (SSE): Because generating images, videos, and complex charts takes time, I learned how to implement Server-Sent Events (SSE) using FastAPI. This allowed the backend to stream text, status updates, and individual assets to the vanilla JavaScript frontend as soon as they were ready, creating a magical, progressively revealing UI instead of a boring loading spinner.
Agentic Orchestration: I learned prompt-engineering techniques for getting Gemini to return strict, deeply nested JSON reliably. Getting the model to act as a "director" that writes prompts for other models (Imagen and Veo) was a fascinating exercise in AI-to-AI communication.
Programmatic Asset Generation: I deepened my Python skills by using python-pptx to calculate layouts dynamically and build native PowerPoint files, and by configuring matplotlib to render polished, dark-themed data visualizations.
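Server-Sent Events are plain text sent over a long-lived HTTP response: each event is an optional `event:` line, one or more `data:` lines, and a terminating blank line. The sketch below formats such frames and turns a queue of `(event_name, payload)` tuples into a stream. The tuple shape, the event names, and the final `done` event are assumptions for illustration, not the project's actual protocol.

```python
import asyncio
import json
from typing import AsyncIterator, Optional

def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps (no indent) escapes newlines, so this is normally one line; the
    # split is defensive because every payload line needs its own "data:" prefix.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"

async def sse_stream(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield SSE frames for (event_name, payload) tuples until a None sentinel arrives."""
    while (item := await queue.get()) is not None:
        event, payload = item
        yield format_sse(payload, event)
    yield format_sse({"status": "complete"}, "done")
```

In FastAPI, an async generator like `sse_stream` can be returned as `StreamingResponse(sse_stream(queue), media_type="text/event-stream")`, and a vanilla JavaScript frontend can consume it with `EventSource` (GET requests) or by reading a `fetch` response body; the post does not say which of these the project uses.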

Google Gemini Feedback

What worked well:

The new google-genai SDK is incredibly clean and intuitive. Being able to access text, image, and video generation models from a single unified client made the backend architecture much simpler.
Gemini 2.0 Flash is phenomenal. Its speed and ability to consistently adhere to a complex JSON schema (containing arrays of slides, chart data, and nested dictionaries) made it the perfect orchestration agent.
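Reliable schema adherence still leaves room for the occasional stray response, and the post also mentions needing fallback handling. A minimal, hypothetical defensive parser: try the raw text, then the contents of a Markdown code fence, then the span from the first `{` to the last `}`, and return a caller-supplied fallback if nothing parses. This is a generic technique, not the project's own code.

```python
import json
import re
from typing import Any

# A Markdown code fence (three backticks, optional "json" tag) around the payload.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)

def extract_json(text: str, fallback: Any = None) -> Any:
    """Best-effort parse of model output; returns `fallback` if no candidate is valid JSON."""
    candidates = [text.strip()]
    fenced = _FENCE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    # Last resort: everything from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return fallback
```

Asking the SDK for JSON output directly (google-genai's `GenerateContentConfig` accepts `response_mime_type="application/json"` and a `response_schema`) should make the fallback path rare; the post does not say whether the project uses that option.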

Where I ran into friction:

Video Generation Polling: Integrating Veo 2.0 required handling long-running operations. Since video generation isn't instant, I had to implement an asynchronous polling mechanism to check the operation status (client.operations.get(operation)) and eventually extract the video bytes. Figuring out how to do this smoothly without blocking the FastAPI event loop took some trial and error.
Cross-Model Prompting: Getting Gemini to write good prompts for Imagen was sometimes tricky. I had to inject strict system instructions and formatting rules (like appending specific style keywords) to ensure the generated images matched the overall dark-mode aesthetic of the application.
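The Veo polling described above follows google-genai's pattern of checking `operation.done` and refreshing with `client.operations.get(operation)`. Written with `time.sleep`, that loop would stall the FastAPI event loop. One way around it, sketched below under the assumption that the refresh call is blocking: run each refresh in a worker thread with `asyncio.to_thread` and wait between polls with `asyncio.sleep`. The function name `poll_until_done`, the interval, and the timeout are hypothetical; the test-style usage passes a fake operation object rather than a real SDK client.

```python
import asyncio
from typing import Any, Callable

async def poll_until_done(
    operation: Any,
    refresh: Callable[[Any], Any],
    interval: float = 10.0,
    timeout: float = 600.0,
) -> Any:
    """Poll a long-running operation until `operation.done` is true, without blocking the loop.

    `refresh` is assumed to be a blocking callable that returns an updated
    operation (with google-genai, `client.operations.get`). It runs in a worker
    thread so other requests, such as open SSE streams, keep being served.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not operation.done:
        if loop.time() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        await asyncio.sleep(interval)  # yields to the event loop, unlike time.sleep
        operation = await asyncio.to_thread(refresh, operation)
    return operation
```

With a real client this would be called as `operation = await poll_until_done(operation, client.operations.get)`, after which the generated video can be read from the finished operation.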

Challenges we ran into

Multimodal Orchestration: Coordinating asynchronous calls to three different AI models (Gemini, Imagen, and Veo) while ensuring the narrative, visual aesthetics, and generated data remained cohesive was complex.
Structured Output Formatting: Ensuring that the LLM consistently returned highly structured, valid JSON containing slide data, exact chart configurations, and specific image/video prompts required meticulous prompt engineering and fallback handling.
Real-Time User Experience: Generating heavy media assets like videos and images takes time. Keeping the user engaged required implementing an SSE (Server-Sent Events) pipeline to stream text, status updates, and individual assets to the frontend as soon as they were ready, rather than forcing the user to wait at a blank loading screen.
Programmatic PPTX Generation: Calculating layouts, scaling images, and ensuring the programmatically generated PowerPoint file looked professional and properly aligned required extensive fine-tuning using python-pptx.
Google Cloud Billing Requirements: We hit a significant roadblock when enabling the Google Cloud Storage (buckets) service: Google Cloud requires an active billing account on the project before the service can be enabled.
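Much of the layout work in the PPTX challenge above comes down to fitting generated images into fixed slide regions without stretching them. python-pptx measures positions and sizes in English Metric Units (914,400 EMU per inch), so an aspect-preserving, centred fit is plain arithmetic. `Box` and `fit_image` are hypothetical helpers, not part of python-pptx.

```python
from typing import NamedTuple

EMU_PER_INCH = 914_400  # python-pptx lengths are in English Metric Units

class Box(NamedTuple):
    """A rectangle on the slide; all values in EMU (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int

def fit_image(img_w_px: int, img_h_px: int, region: Box) -> Box:
    """Scale an image to fit inside `region` without distortion and centre it there."""
    if img_w_px <= 0 or img_h_px <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller ratio is the largest scale that keeps both sides inside the region.
    scale = min(region.width / img_w_px, region.height / img_h_px)
    width = round(img_w_px * scale)
    height = round(img_h_px * scale)
    return Box(
        left=region.left + (region.width - width) // 2,
        top=region.top + (region.height - height) // 2,
        width=width,
        height=height,
    )
```

The result maps directly onto python-pptx's `slide.shapes.add_picture(path, box.left, box.top, width=box.width, height=box.height)`. For a 1600 x 900 image in a 6 in x 4 in region, it yields a full-width picture 3,086,100 EMU tall, offset 285,750 EMU from the top of the region.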
