This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built TubeForge, a multimodal applet that transforms any YouTube video into structured, interactive content.
With just a URL, the app automatically generates either:
Study Notes → Professionally formatted, textbook-style markdown notes with hierarchical headings, flowcharts, diagrams, tables, and highlighted key terms. It also allows learners to generate quizzes for self-testing.
Blog Posts → Polished, ready-to-publish articles with a strong hook, clear structure, and SEO-friendly formatting. The app automatically generates cover images, inline visuals, and shareable social media posts (LinkedIn + Twitter) to promote the blog.
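The two output modes differ mainly in the formatting rules the model is given. Below is a minimal sketch of how those rules might be kept per mode and joined into one instruction string. The `Mode` enum, the `RULES` table, and `build_instructions` are hypothetical names, and the rule wording is paraphrased from the descriptions above rather than taken from TubeForge's actual prompts.

```python
from enum import Enum


class Mode(Enum):
    STUDY_NOTES = "study_notes"
    BLOG_POST = "blog_post"


# Hypothetical per-mode formatting rules, paraphrased from the two output types.
RULES = {
    Mode.STUDY_NOTES: [
        "Write textbook-style markdown notes.",
        "Use hierarchical headings (#, ##, ###).",
        "Include tables, flowcharts, and diagrams where they help.",
        "Highlight key terms in **bold**.",
    ],
    Mode.BLOG_POST: [
        "Open with a strong hook.",
        "Use a clear structure with descriptive subheadings.",
        "Apply SEO-friendly formatting: one H1 title and short paragraphs.",
    ],
}


def build_instructions(mode: Mode) -> str:
    """Join the rules for one mode into a single instruction string."""
    header = f"You are generating {mode.value.replace('_', ' ')} from a YouTube video."
    bullets = "\n".join(f"- {rule}" for rule in RULES[mode])
    return f"{header}\nFollow these rules:\n{bullets}"


print(build_instructions(Mode.BLOG_POST))
```

Keeping the rules as data rather than one long prompt string makes it easy to add a third mode or tweak a single rule without touching the rest.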
This solves the challenge of turning raw video content into usable knowledge products, whether for studying, content creation, or social media amplification.
Demo
Link To Demo
Watch the TubeForge demo on YouTube
How I Used Google AI Studio
1. 💬 Chat Module First
I started by explaining my entire idea in Google AI Studio’s chat. I described the design, the number of inputs, the type of outputs, the formatting rules, and even what to avoid. The chat module crafted a complete project document with suggestions and API integration details.
2. 🏗️ Build Feature Magic
I fed that document into the build feature, and it spun up a first version of my application — already styled and functional. It was surprisingly close to perfect from the start.
3. 🎨 Tweaks & Styling
All I really changed were some colors and tiny UI details. The content flow was already set up by the design document from the chat, so only minimal adjustments were needed.
4. 🛠️ Coding Assistant on Demand
Whenever I wanted a tweak (“move this button,” “adjust this logic”), the coding assistant instantly made the changes. No manual wiring, no headaches.
5. 🚀 Deployment in One Click
Finally, I hit the Deploy to Cloud Run button. Within minutes my app was live, with no Docker setup and no configs. Google AI Studio handled everything.
Multimodal Features
TubeForge combines multiple modalities and APIs into one cohesive experience:
- Video Understanding: Gemini 2.5 Flash processes YouTube videos directly and extracts structured knowledge.
- Image Generation: Gemini and Imagen turn generated prompts into flowcharts, diagrams, and infographics for study notes. Blog posts get a stunning header image plus relevant inline visuals via Imagen.
- Text + Media Fusion: Markdown output with embedded base64 images for seamless visuals in notes and blogs.
- Quiz Generation: Gemini 2.5 Flash automatically creates multiple-choice and short-answer quizzes for self-assessment.
- Social Media Posts: Gemini generates ready-to-use LinkedIn and Twitter posts with hashtags and hooks.
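For the video-understanding step, the Gemini API accepts a public YouTube URL as file data alongside a text instruction. The sketch below only builds the JSON body for a REST `generateContent` call; it does not send it. The helper name `build_video_request` and the placeholder `VIDEO_ID` are hypothetical, and this is not necessarily how the generated applet makes its calls (AI Studio apps usually go through an SDK rather than raw REST).

```python
import json

# Gemini API REST endpoint for the model named in the post.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.5-flash:generateContent"
)


def build_video_request(youtube_url: str, instruction: str) -> dict:
    """Build a generateContent body that passes a YouTube URL as file data."""
    return {
        "contents": [
            {
                "parts": [
                    # The video part comes first, followed by the text prompt.
                    {"file_data": {"file_uri": youtube_url}},
                    {"text": instruction},
                ]
            }
        ]
    }


body = build_video_request(
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder video ID
    "Produce structured study notes in markdown.",
)
print(json.dumps(body, indent=2))
```

The body would then be POSTed to `API_URL` with the API key in an `x-goog-api-key` header.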
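Embedding images as base64 means the finished notes or blog post is a single self-contained markdown string with no separate image hosting. A minimal sketch of that embedding step, assuming the image model has already returned raw PNG bytes; `embed_image` is a hypothetical helper, not part of any SDK.

```python
import base64


def embed_image(markdown: str, image_bytes: bytes, alt: str, mime: str = "image/png") -> str:
    """Append an image to markdown as an inline base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"{markdown}\n\n![{alt}](data:{mime};base64,{encoded})\n"


notes = embed_image("# Photosynthesis", b"...png bytes...", "Light reaction flowchart")
```

The trade-off is size: base64 inflates the image data by about a third, and some markdown renderers and sanitizers block `data:` URIs, so a publishing target may still need the images uploaded separately.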
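Quizzes are easiest to render and grade when the model returns them as JSON; the Gemini API can be asked for JSON output through its structured-output settings. The sketch below assumes a hypothetical schema with `prompt`, `options`, and `answer_index` fields, covers only the multiple-choice case, and discards malformed questions before scoring. The schema and function names are illustrative, not TubeForge's actual format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    options: list[str]
    answer_index: int


def parse_quiz(raw: str) -> list[Question]:
    """Parse a JSON quiz and drop questions whose answer index is out of range."""
    questions = []
    for item in json.loads(raw):
        q = Question(item["prompt"], list(item["options"]), int(item["answer_index"]))
        if 0 <= q.answer_index < len(q.options):
            questions.append(q)
    return questions


def score(questions: list[Question], picks: list[int]) -> int:
    """Count how many picked option indices match the correct answers."""
    return sum(1 for q, pick in zip(questions, picks) if q.answer_index == pick)


raw = '[{"prompt": "2 + 2?", "options": ["3", "4"], "answer_index": 1}]'
quiz = parse_quiz(raw)
print(score(quiz, [1]))
```

Validating the answer index matters because a generated quiz can occasionally point at an option that does not exist, and silently grading against it would mislead the learner.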
By combining video, text, and images, the app creates a richer learning and publishing experience.
✨Building this was surprisingly smooth—the AI Studio assistant + Cloud Run integration made it feel like I had a full dev team working with me.