DEV Community

ANIRUDDHA ADAK


AI Slide Generator

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

We've all been there: staring at a blank slide, the pressure mounting to create a compelling presentation. The AI Slide Generator is my answer to that universal challenge. It's a smart creative partner designed to transform a single text prompt into a complete, professional, and visually stunning slide deck in minutes.

This tool goes far beyond simple text generation. It acts as a director, designer, and content creator rolled into one. You provide the core idea, such as "a pitch for sustainable urban farming," and the AI orchestrates the entire presentation: it crafts a narrative, designs an appropriate layout for each slide, generates striking, context-aware images with Imagen, and even produces short, cinematic video clips with Veo. The result is a rich, multi-page presentation ready to be downloaded, shared, and presented.

Demo

You can try the applet live here: Link of Deployed Applet

Here’s a quick look at the generator in action:

1. The Initial Prompt: Simply describe the presentation you want to create.

2. AI-Generated Slide with Image: The app generates both the slide content and a relevant image based on that content.

3. AI-Generated Video Slide: For key moments, the app can create a full-slide video to capture the audience's attention.

Full Video Walkthrough:

Since video generation with Veo is powerful but time-intensive, here is a complete video showing the entire workflow, from the initial prompt to the final, downloadable presentation.
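Part of why Veo is slow to demo live is that, in the Gemini API, a video request runs as a long-running operation that the client has to poll until it finishes. Below is a minimal, SDK-agnostic sketch of such a polling loop. The `check` callable is a hypothetical stand-in for "re-fetch the video job and return its result, or `None` while it is still pending"; the default interval and timeout are illustrative assumptions, not values from the applet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    check: Callable[[], Optional[T]],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `check` until it returns a non-None result or the timeout elapses.

    `check` is hypothetical: it should re-fetch the long-running video job
    and return the finished result, or None while the job is still running.
    """
    waited = 0.0  # approximate: sums the requested sleep intervals
    while True:
        result = check()
        if result is not None:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"video job still pending after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Fake job that finishes on the third check; no-op sleep keeps the demo instant.
    states = iter([None, None, "video.mp4"])
    print(wait_for_result(lambda: next(states), sleep=lambda s: None))  # prints video.mp4
```

Injecting `sleep` keeps the loop testable without real waiting. In a browser applet the same idea would typically run asynchronously so the UI stays responsive while the clip renders.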

How I Used Google AI Studio

Google AI Studio was the creative engine and command center for this entire project. My goal was to build a seamless pipeline where different AI models collaborate to build the final presentation, and AI Studio was indispensable for this.

I leveraged a suite of models to handle the distinct multimodal tasks:

  • gemini-2.5-flash: This model served as the "brain" of the operation. I used Google AI Studio to carefully craft a system prompt and a detailed JSON schema that instruct the model not just to write content but to think like a presentation designer, deciding the layout, title, bullet points, and visual-element prompts for each slide.

  • imagen-4.0-generate-001: To bring the slides to life, I integrated this model to handle all image generation. It takes the text prompts generated by gemini-2.5-flash and creates beautiful, high-quality images that match the slide's topic and tone.

  • veo-2.0-generate-001: For maximum impact, I used this model to introduce motion. It generates short, dynamic video clips from text prompts, perfect for title slides or key message slides that need to be more engaging.

  • gemini-2.5-flash-image-preview (Nano Banana): To give users creative control, I used this model for the image editing feature. It offers an intuitive, conversational way to tweak visuals (e.g., "add a sun in the background"), demonstrating an image-plus-text-to-image workflow.
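To make the "brain" step more concrete, here is a minimal sketch of what a slide-plan schema and its parsing could look like. The field names (`layout`, `title`, `bullets`, `visual_prompt`), the three layout values, and the rule that only `full_video` slides are sent to Veo are illustrative assumptions, not the applet's actual schema. The schema dict is written in the OpenAPI-style form that Gemini's structured output accepts as a response schema; the parser checks the model's JSON against it before any image or video requests are made.

```python
import json
from dataclasses import dataclass, field

IMAGE_MODEL = "imagen-4.0-generate-001"
VIDEO_MODEL = "veo-2.0-generate-001"

# Hypothetical slide-plan schema; field names and layouts are assumptions.
SLIDE_DECK_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "layout": {"type": "STRING", "enum": ["title", "image_left", "full_video"]},
            "title": {"type": "STRING"},
            "bullets": {"type": "ARRAY", "items": {"type": "STRING"}},
            "visual_prompt": {"type": "STRING"},
        },
        "required": ["layout", "title", "visual_prompt"],
    },
}


@dataclass
class Slide:
    layout: str
    title: str
    visual_prompt: str
    bullets: list[str] = field(default_factory=list)

    @property
    def visual_model(self) -> str:
        # Assumed routing rule: full-video slides go to Veo, all others to Imagen.
        return VIDEO_MODEL if self.layout == "full_video" else IMAGE_MODEL


def parse_deck(raw_json: str) -> list[Slide]:
    """Validate the model's JSON output against the schema's required keys and layouts."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of slides")
    item_schema = SLIDE_DECK_SCHEMA["items"]
    allowed_layouts = set(item_schema["properties"]["layout"]["enum"])
    slides = []
    for i, item in enumerate(data):
        missing = [key for key in item_schema["required"] if key not in item]
        if missing:
            raise ValueError(f"slide {i} is missing {missing}")
        if item["layout"] not in allowed_layouts:
            raise ValueError(f"slide {i} has unknown layout {item['layout']!r}")
        slides.append(
            Slide(
                layout=item["layout"],
                title=item["title"],
                visual_prompt=item["visual_prompt"],
                bullets=list(item.get("bullets", [])),
            )
        )
    return slides


if __name__ == "__main__":
    raw = json.dumps([
        {"layout": "full_video", "title": "Farms in the Sky",
         "visual_prompt": "drone shot of rooftop gardens at sunrise"},
        {"layout": "image_left", "title": "Why Now", "bullets": ["Food miles", "Heat islands"],
         "visual_prompt": "hydroponic towers inside a city warehouse"},
    ])
    for slide in parse_deck(raw):
        print(slide.title, "->", slide.visual_model)
    # prints:
    # Farms in the Sky -> veo-2.0-generate-001
    # Why Now -> imagen-4.0-generate-001
```

Validating locally, even when the model is constrained by a schema, means a malformed plan fails fast instead of after several expensive image or video calls.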

Multimodal Features

The core of this applet is its ability to understand a single text request and expand it into a rich, multimodal experience.

  1. From Text to a Complete Visual Story: The primary multimodal feature is the transformation of a text prompt into a complete presentation containing structured text, generated images, and generated videos. The user provides one type of input, and the AI seamlessly weaves together multiple output types to tell a cohesive story.

  2. Context-Aware Image Generation: Instead of forcing users to search for stock photos, the app generates them on the fly. Because the image prompts are derived from the slide's content, the visuals are always contextually relevant, which dramatically enhances the narrative and the overall quality of the presentation.

  3. Interactive Image Editing with Nano Banana: This is where the multimodality becomes a two-way conversation. A user can select a generated image and provide a text prompt to modify it, receiving a new image in return. This closes the creative loop, making the AI not just a generator but a collaborative partner. It’s an incredibly intuitive way to fine-tune the visuals without needing complex editing software.

  4. Cinematic Video Integration: By incorporating video, the app elevates a standard slideshow into a more dynamic and engaging medium. The ability to generate short video clips from a simple text description adds a layer of professionalism and emotional impact that's difficult to achieve with static slides alone.
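The conversational editing loop in feature 3 can be pictured as a small session object that keeps every version of an image, so each instruction builds on the latest result and any step can be undone. This is a hypothetical sketch: `edit_fn` stands in for a call to gemini-2.5-flash-image-preview that takes the current image bytes plus a text instruction and returns new image bytes, and the undo history is an assumed design, not a confirmed feature of the applet.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a gemini-2.5-flash-image-preview call:
# (current image bytes, text instruction) -> edited image bytes.
EditFn = Callable[[bytes, str], bytes]


@dataclass
class EditSession:
    """Tracks every version of one slide image so edits can be undone."""

    original: bytes
    edit_fn: EditFn
    history: list[tuple[str, bytes]] = field(default_factory=list)

    @property
    def current(self) -> bytes:
        # Latest edited version, or the original if nothing has been applied.
        return self.history[-1][1] if self.history else self.original

    def apply(self, instruction: str) -> bytes:
        # Each edit is applied to the latest version, not the original.
        new_image = self.edit_fn(self.current, instruction)
        self.history.append((instruction, new_image))
        return new_image

    def undo(self) -> bytes:
        # Drop the most recent edit (no-op when there is nothing to undo).
        if self.history:
            self.history.pop()
        return self.current


if __name__ == "__main__":
    # Fake editor that appends the instruction text, to show the chaining.
    session = EditSession(b"farm", lambda img, instr: img + b"|" + instr.encode())
    session.apply("add a sun in the background")
    session.apply("make it dusk")
    print(session.current)  # b'farm|add a sun in the background|make it dusk'
    print(session.undo())   # b'farm|add a sun in the background'
```

Chaining on `current` is what makes the exchange feel like a conversation: "make it dusk" refers to the image that already has the sun in it.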

Great to participate!
