Social Creative Coach — Multimodal content & planning in one click

#devchallenge #googleaichallenge #ai #gemini

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

Social Creative Coach is a small, production-ready app that turns a brief, an image, or a short audio/video into:

- channel-tailored post variants (LinkedIn, Instagram, X, Facebook, TikTok),
- a 7-day publication schedule,
- a precise image prompt (and optional image generations),
- one-click exports (ZIP “kit”, CSV, ICS calendar, Markdown),
- a quick A/B test with a scorecard and CTA recommendation.

It’s designed for non-technical users: a clean UI with a cards view for posts (plus a JSON view for power users). Uploading media is optional; the app can transcribe the first 60 seconds of audio/video to seed the brief.

Demo

🎥 Video walkthrough (1 min): https://youtu.be/4wix9K0JK3w
🚀 Live app (Cloud Run): https://social-coach-153575963272.us-central1.run.app/
🚀 GitHub repository: https://github.com/Medogo/gemini-challenge.git

If Gemini 2.5 Flash Image isn’t available in your quota, the app gracefully falls back to branded placeholders, and the video shows the full flow.
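To illustrate the fallback, here is a minimal sketch of how a solid-color placeholder PNG could be produced with only the Python standard library. It is not the app's actual code: the function names, the default color, and the 64-pixel size are assumptions made for the example, and a real "branded" placeholder would likely also draw the brand name.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def placeholder_png(brand_color: str = "#4285F4", size: int = 64) -> bytes:
    """Return a solid-color square PNG (8-bit RGB) for a #RRGGBB color.

    The default color and size are arbitrary values chosen for this sketch.
    """
    hex_value = brand_color.lstrip("#")
    if len(hex_value) != 6:
        raise ValueError(f"expected #RRGGBB, got {brand_color!r}")
    rgb = bytes.fromhex(hex_value)
    # Each scanline is one filter byte (0 = no filter) followed by RGB pixels.
    row = b"\x00" + rgb * size
    raw = row * size
    # IHDR: width, height, bit depth 8, color type 2 (truecolor), no interlace.
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

A handler could return these bytes with an `image/png` content type whenever the image model call fails, so the UI still has something to render.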

How I Used Google AI Studio

- Text generation: gemini-2.5-flash via the Gemini API (Google AI Studio) for:
  - multi-channel post variants (with channel rules & soft limits),
  - 7-day schedule suggestions (ISO times + brief “why”),
  - a detailed 1080×1080 image prompt tailored to brand color/name.
- (Optional) Image generation: configurable GEMINI_IMAGE_MODEL (e.g., gemini-2.5-flash-image-preview) for producing image variants per post. If not available, the app returns placeholder PNGs to preserve the UX.
- Multimodal input:
  - Images provide context and can be auto-analyzed to bootstrap a brief when the text is empty.
  - Audio/Video get trimmed to 60s with ffmpeg, converted to mono 16 kHz WAV, then transcribed (used to enrich or replace the brief).
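The audio/video preparation step can be sketched as a single ffmpeg invocation. This is an illustrative version only: the app itself uses pydub on top of ffmpeg, whereas this sketch calls the ffmpeg CLI directly through `subprocess`, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_trim_command(src: Path, dst: Path, max_seconds: int = 60) -> list[str]:
    """Build the ffmpeg argv that keeps the first seconds as mono 16 kHz audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input audio or video file
        "-t", str(max_seconds),  # stop writing output after max_seconds
        "-vn",                   # drop any video stream
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        str(dst),                # a .wav extension selects the WAV muxer
    ]


def trim_to_wav(src: Path, dst: Path, max_seconds: int = 60) -> Path:
    """Run ffmpeg (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(
        build_trim_command(src, dst, max_seconds),
        check=True,
        capture_output=True,
    )
    return dst
```

Keeping the command builder separate from the `subprocess` call makes the flags easy to check without having ffmpeg installed.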

The app runs on Cloud Run; secrets (Gemini API key) are stored in Secret Manager.
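Cloud Run can expose a Secret Manager secret to the container as an environment variable, so the backend only has to read its configuration from the environment. The sketch below assumes that setup; `GEMINI_API_KEY` and `GEMINI_TEXT_MODEL` are assumed variable names (only `GEMINI_IMAGE_MODEL` is named in this post), and `Settings` and `load_settings` are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    text_model: str
    image_model: Optional[str]  # None means "use placeholder images"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read Gemini settings from environment variables.

    GEMINI_API_KEY and GEMINI_TEXT_MODEL are assumed names for this sketch.
    """
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    # An empty or missing image model disables image generation.
    image_model = env.get("GEMINI_IMAGE_MODEL", "").strip() or None
    text_model = env.get("GEMINI_TEXT_MODEL", "gemini-2.5-flash")
    return Settings(api_key=api_key, text_model=text_model, image_model=image_model)
```

Failing fast on a missing key at startup surfaces a misconfigured secret in the deploy logs rather than on the first user request.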

Multimodal Features

- Input: text brief, image, or short audio/video (60s excerpt for speed & cost).
- Image analysis: caption, objects, colors, style, product, mood.
- Brand kit hints: brand_name & brand_color injected into post and image prompts.
- Image variants per post: /images/zip can generate multiple images for each post variant; supports an optional style reference upload.
- A/B test: generates A & B for a chosen channel and returns a scorecard + recommended CTA.
- Exports:
  - ZIP Kit with variants/schedule/image prompt (+ README.md),
  - ICS calendar events,
  - CSV for post ops,
  - Markdown snapshot.
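As an example of the export step, here is a rough sketch of turning schedule items into ICS calendar events. The item shape (`time`, `channel`, `why` keys), the 15-minute event length, and the UID/PRODID strings are assumptions for illustration, not the app's actual schema. Times without an offset are treated as UTC, and long lines are not folded.

```python
from datetime import datetime, timedelta, timezone

ICS_TIME = "%Y%m%dT%H%M%SZ"


def _ics_escape(text: str) -> str:
    """Escape backslash, semicolon, comma and newline for ICS TEXT values."""
    return (
        text.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def schedule_to_ics(items: list[dict], duration_minutes: int = 15) -> str:
    """Render schedule items (hypothetical keys: time, channel, why) as ICS."""
    stamp = datetime.now(timezone.utc).strftime(ICS_TIME)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//social-creative-coach//sketch//EN",
    ]
    for index, item in enumerate(items):
        start = datetime.fromisoformat(item["time"])
        if start.tzinfo is None:
            # Assumption: naive timestamps are already in UTC.
            start = start.replace(tzinfo=timezone.utc)
        start = start.astimezone(timezone.utc)
        end = start + timedelta(minutes=duration_minutes)
        lines += [
            "BEGIN:VEVENT",
            f"UID:post-{index}-{start.strftime(ICS_TIME)}@social-creative-coach",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{start.strftime(ICS_TIME)}",
            f"DTEND:{end.strftime(ICS_TIME)}",
            f"SUMMARY:{_ics_escape('Post on ' + item['channel'])}",
            f"DESCRIPTION:{_ics_escape(item['why'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # ICS content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Normalizing every start time to UTC keeps the file free of VTIMEZONE definitions, at the cost of losing the original local offset.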

UX Highlights

- Cards view for human-readable posts (titles, body, hashtags, CTA) with copy-to-clipboard.
- JSON view for raw output + copy button.
- “Quick Example” button: instantly pre-fills the form; optional sample image/audio files are auto-loaded.
- Clear status messages, file size limits (20 MB image / 100 MB media), and 60s request timeout on the frontend to avoid hangs.
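The size limits above could also be mirrored on the server so that oversized uploads are rejected even if the browser check is bypassed. The following is a hypothetical sketch in Python, not the app's code: the `image`/`media` keys, the helper name, and the choice of binary megabytes (1024 × 1024 bytes) are all assumptions.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: limits are counted in binary megabytes

# Limits quoted in the post; the "image"/"media" keys are hypothetical names.
UPLOAD_LIMITS = {"image": 20 * MB, "media": 100 * MB}


def check_upload(kind: str, size_bytes: int) -> Optional[str]:
    """Return a user-facing error message, or None when the upload is fine."""
    limit = UPLOAD_LIMITS.get(kind)
    if limit is None:
        return f"Unsupported upload type: {kind}"
    if size_bytes > limit:
        return f"{kind} file is too large (max {limit // MB} MB)"
    return None
```

Returning a message instead of raising lets the caller decide how to surface it, for example as one of the status messages shown in the UI.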

Architecture (High-Level)

- Frontend: Static HTML/CSS/JS (no framework) served by FastAPI’s StaticFiles.
- Backend: FastAPI + Gemini SDK (text & optional image models), pydub + ffmpeg for media trimming, speech_recognition for transcription.
- Deployment: Docker → Cloud Build → Cloud Run (with a minimum instance kept warm to avoid cold starts); Secret Manager for the API key; CORS open for the demo; favicon + sample files to keep logs clean and make the demo smooth.

Why Multimodal Matters Here

Marketing teams rarely start from a clean text brief. They have assets: a product photo, a CEO voice note, a teaser clip. Letting users drop any of those in and still get coherent text, a schedule, and images removes friction and showcases the power of Google AI Studio’s multimodal stack in a practical, demo-able way.