This is a submission for the Google AI Studio Multimodal Challenge
Social Creative Coach — Multimodal content & planning in one click
What I Built
Social Creative Coach is a small, production-ready app that turns a brief, an image, or a short audio/video clip into:
- channel-tailored post variants (LinkedIn, Instagram, X, Facebook, TikTok),
- a 7-day publication schedule,
- a precise image prompt (and optional image generations),
- one-click exports (ZIP “kit”, CSV, ICS calendar, Markdown),
- a quick A/B test with a scorecard and CTA recommendation.
It’s designed for non-technical users: a clean UI with a cards view for posts (plus a JSON view for power users). Uploading media is optional; the app can transcribe the first 60 seconds of audio/video to seed the brief.
Demo
- 🎥 Video walkthrough (1 min): https://youtu.be/4wix9K0JK3w
- 🚀 Live app (Cloud Run): https://social-coach-153575963272.us-central1.run.app/
- 🚀 GitHub repository: https://github.com/Medogo/gemini-challenge.git
If Gemini 2.5 Flash Image isn’t available in your quota, the app gracefully falls back to branded placeholders, and the video shows the full flow.
How I Used Google AI Studio
- Text generation: `gemini-2.5-flash` via the Gemini API (Google AI Studio) for:
  - multi-channel post variants (with channel rules & soft limits),
  - 7-day schedule suggestions (ISO times + a brief “why”),
  - a detailed 1080×1080 image prompt tailored to the brand color/name.
  (A minimal call is sketched at the end of this section.)
- (Optional) Image generation: a configurable `GEMINI_IMAGE_MODEL` (e.g., `gemini-2.5-flash-image-preview`) for producing image variants per post. If the model isn’t available, the app returns placeholder PNGs to preserve the UX.
- Multimodal input:
  - Images provide context and can be auto-analyzed to bootstrap a brief when the text is empty.
  - Audio/video is trimmed to 60 s with `ffmpeg`, converted to mono 16 kHz WAV, then transcribed (used to enrich or replace the brief); a minimal sketch follows this list.
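As an illustration, here is a minimal sketch of that trim-and-transcribe step, assuming `pydub` with `ffmpeg` on the PATH and the `speech_recognition` package; the function and path names are illustrative, not the app's actual code.

```python
# Sketch of the 60-second trim + transcription step (illustrative names,
# not the app's actual module layout).
import speech_recognition as sr
from pydub import AudioSegment  # requires ffmpeg on the PATH


def transcribe_first_minute(media_path: str, wav_path: str = "/tmp/clip.wav") -> str:
    # pydub shells out to ffmpeg to decode audio or video containers.
    audio = AudioSegment.from_file(media_path)

    # Keep only the first 60 seconds (pydub slices are in milliseconds),
    # then normalize to mono 16 kHz WAV, which speech_recognition expects.
    clip = audio[:60_000].set_channels(1).set_frame_rate(16_000)
    clip.export(wav_path, format="wav")

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        data = recognizer.record(source)
    try:
        # Any recognizer backend works; the Google Web Speech API is the demo default.
        return recognizer.recognize_google(data)
    except sr.UnknownValueError:
        return ""  # fall back to an empty brief if nothing was recognized
```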
The app runs on Cloud Run; secrets (Gemini API key) are stored in Secret Manager.
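For reference, the core text-generation step is a single `generate_content` request. Below is a hedged sketch using the `google-genai` SDK, with the key read from an environment variable injected from Secret Manager; the variable name, prompt wording, and example brief are assumptions, not the app's exact code.

```python
# Hedged sketch of the multi-channel post-variant request. The env var name,
# prompt wording and example brief are assumptions, not the app's exact code.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])  # injected from Secret Manager

brief = "Launch of our eco-friendly water bottle, brand color #0B7285"  # example input
prompt = (
    "You are a social media copywriter. From the brief below, write one post each for "
    "LinkedIn, Instagram, X, Facebook and TikTok, respecting each channel's tone and "
    "soft length limits. Return strict JSON with title, body, hashtags and cta per channel.\n\n"
    f"Brief: {brief}"
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
)
print(response.text)  # JSON payload the app parses into the cards view
```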
Multimodal Features
- Input: text brief, image, or short audio/video (60s excerpt for speed & cost).
- Image analysis: caption, objects, colors, style, product, mood.
- Brand kit hints: `brand_name` & `brand_color` are injected into post and image prompts.
- Image variants per post: `/images/zip` can generate multiple images for each post variant; supports an optional style-reference upload.
- A/B test: generates variants A & B for a chosen channel and returns a scorecard + recommended CTA.
- Exports (a minimal ZIP sketch follows this list):
  - ZIP kit with variants/schedule/image prompt (+ README.md),
  - ICS calendar events,
  - CSV for post ops,
  - Markdown snapshot.
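To make the ZIP kit concrete, here is a minimal in-memory sketch using only the standard library; the archive file names are assumptions rather than the app's exact layout.

```python
# Minimal in-memory sketch of the ZIP "kit" export using the standard library.
# File names inside the archive are assumptions, not the app's exact layout.
import io
import json
import zipfile


def build_kit_zip(variants: dict, schedule: list, image_prompt: str) -> bytes:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as kit:
        kit.writestr("variants.json", json.dumps(variants, indent=2, ensure_ascii=False))
        kit.writestr("schedule.json", json.dumps(schedule, indent=2, ensure_ascii=False))
        kit.writestr("image_prompt.txt", image_prompt)
        kit.writestr("README.md", "# Social Creative Coach kit\n\nGenerated assets and schedule.\n")
    return buffer.getvalue()  # bytes ready to stream back as application/zip
```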
UX Highlights
- Cards view for human-readable posts (titles, body, hashtags, CTA) with copy-to-clipboard.
- JSON view for raw output + copy button.
- “Quick Example” button: instantly pre-fills the form; optional sample image/audio files are auto-loaded.
- Clear status messages, file-size limits (20 MB for images / 100 MB for audio/video), and a 60-second request timeout on the frontend to avoid hangs.
Architecture (High-Level)
- Frontend: static HTML/CSS/JS (no framework) served by FastAPI’s `StaticFiles` (see the sketch below).
- Backend: FastAPI + Gemini SDK (text & optional image models), `pydub` + `ffmpeg` for media trimming, `speech_recognition` for transcription.
- Deployment: Docker → Cloud Build → Cloud Run (with a minimum instance for warm starts); Secret Manager for the API key; CORS open for the demo; favicon + sample files to keep logs clean and the demo smooth.
Why Multimodal Matters Here
Marketing teams rarely start from a clean text brief. They have assets: a product photo, a CEO voice note, a teaser clip. Letting users drop any of those in and still get coherent text, a schedule, and images removes friction and showcases the power of Google AI Studio’s multimodal stack in a practical, demo-able way.
Try it
- 📺 Video: https://youtu.be/4wix9K0JK3w
- 🌐 Live app: https://social-coach-153575963272.us-central1.run.app/
- 🚀 GitHub repository: https://github.com/Medogo/gemini-challenge.git
(If the image model quota is unavailable, the demo still runs end-to-end with branded placeholders so judges can see the full flow.)