DEV Community

Dominique Megnidro


Social Creative Coach — Multimodal content & planning in one click

This is a submission for the Google AI Studio Multimodal Challenge


What I Built

Social Creative Coach is a small, production-ready app that turns a brief, an image, or a short audio/video into:

  • channel-tailored post variants (LinkedIn, Instagram, X, Facebook, TikTok),
  • a 7-day publication schedule,
  • a precise image prompt (and optional image generations),
  • one-click exports (ZIP “kit”, CSV, ICS calendar, Markdown),
  • a quick A/B test with a scorecard and CTA recommendation.

It’s designed for non-technical users: a clean UI with a cards view for posts (plus a JSON view for power users). Uploading media is optional; the app can transcribe the first 60 seconds of audio/video to seed the brief.

Demo

If Gemini 2.5 Flash Image isn’t available in your quota, the app gracefully falls back to branded placeholders, and the video shows the full flow.

How I Used Google AI Studio

  • Text generation: gemini-2.5-flash via the Gemini API (Google AI Studio) for:

    • multi-channel post variants (with channel rules & soft limits),
    • 7-day schedule suggestions (ISO times + brief “why”),
    • a detailed 1080×1080 image prompt tailored to brand color/name.
  • (Optional) Image generation: configurable GEMINI_IMAGE_MODEL (e.g., gemini-2.5-flash-image-preview) for producing image variants per post. If not available, the app returns placeholder PNGs to preserve the UX.

  • Multimodal input:

    • Images provide context and can be auto-analyzed to bootstrap a brief when the text is empty.
    • Audio and video are trimmed to 60 s with ffmpeg, converted to mono 16 kHz WAV, then transcribed; the transcript enriches or replaces the brief.
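
The audio/video preprocessing step can be sketched as a single ffmpeg invocation. This is only an illustration under stated assumptions: the function name `build_trim_command` and the file names are hypothetical, and it calls the ffmpeg CLI directly, which may differ from how the app actually drives ffmpeg.

```python
from typing import List


def build_trim_command(src: str, dst: str, max_seconds: int = 60) -> List[str]:
    """Build an ffmpeg argv that keeps the first `max_seconds` of `src`
    and writes it as mono 16 kHz audio to `dst` (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input audio or video file
        "-t", str(max_seconds),  # keep only the first N seconds
        "-vn",                   # drop any video stream
        "-ac", "1",              # downmix to one channel (mono)
        "-ar", "16000",          # resample to 16 kHz
        dst,                     # a .wav extension selects the WAV container
    ]


if __name__ == "__main__":
    # Print the command; pass it to subprocess.run(..., check=True) to execute.
    print(" ".join(build_trim_command("voice_note.mp4", "excerpt.wav")))
```

Building the argument list separately from running it keeps the step easy to test without ffmpeg installed.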

The app runs on Cloud Run; secrets (Gemini API key) are stored in Secret Manager.
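
For the image fallback described in this section, a placeholder PNG can be produced with the standard library alone. The sketch below is an assumption about what a minimal "branded" placeholder might look like (a solid square in the brand color); the helper names `hex_to_rgb` and `placeholder_png` and the default color are hypothetical, and the real app may render something richer.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def hex_to_rgb(color: str) -> tuple:
    """Convert '#RRGGBB' (or 'RRGGBB') to an (r, g, b) tuple."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError("expected a color like '#RRGGBB'")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def placeholder_png(brand_color: str = "#4285F4", size: int = 1080) -> bytes:
    """Return a size x size solid-color PNG (8-bit RGB, no interlacing)."""
    r, g, b = hex_to_rgb(brand_color)
    # Each scanline starts with filter byte 0, followed by RGB pixels.
    row = b"\x00" + bytes((r, g, b)) * size
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(row * size))
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    with open("placeholder.png", "wb") as fh:
        fh.write(placeholder_png("#4285F4"))
```

Returning bytes rather than writing a file means the same helper could feed an HTTP response or a ZIP entry.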

Multimodal Features

  • Input: text brief, image, or short audio/video (60s excerpt for speed & cost).
  • Image analysis: caption, objects, colors, style, product, mood.
  • Brand kit hints: brand_name & brand_color injected into post and image prompts.
  • Image variants per post: /images/zip can generate multiple images for each post variant; supports an optional style reference upload.
  • A/B test: generates A & B for a chosen channel and returns a scorecard + recommended CTA.
  • Exports:

    • ZIP Kit with variants/schedule/image prompt (+ README.md),
    • ICS calendar events,
    • CSV for post ops,
    • Markdown snapshot.
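
As an illustration of the ICS export, here is a minimal sketch that turns schedule entries into an iCalendar file using only the standard library. The entry fields (`time`, `channel`, `why`), the 15-minute event length, and the `PRODID`/`UID` values are assumptions made for the example, not the app's actual schema. Each `time` is expected to be an ISO 8601 string with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone


def _escape(text: str) -> str:
    """Escape characters that are reserved in iCalendar TEXT values."""
    return (
        text.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def _fmt(dt: datetime) -> str:
    """Format a datetime as a UTC iCalendar timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def schedule_to_ics(items, now=None) -> str:
    """Render schedule entries as one VCALENDAR with a VEVENT per entry."""
    stamp = _fmt(now or datetime.now(timezone.utc))
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//schedule-sketch//EN",  # hypothetical product id
    ]
    for i, item in enumerate(items):
        start = datetime.fromisoformat(item["time"])
        lines += [
            "BEGIN:VEVENT",
            f"UID:{i}-{stamp}@example.invalid",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{_fmt(start)}",
            f"DTEND:{_fmt(start + timedelta(minutes=15))}",
            f"SUMMARY:{_escape(item['channel'] + ' post')}",
            f"DESCRIPTION:{_escape(item['why'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


if __name__ == "__main__":
    demo = [{"time": "2025-09-15T09:00:00+02:00",
             "channel": "LinkedIn",
             "why": "Morning slot, high reach"}]
    print(schedule_to_ics(demo))
```

Normalizing every time to UTC avoids having to emit `VTIMEZONE` blocks, at the cost of losing the original local offset.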

UX Highlights

  • Cards view for human-readable posts (titles, body, hashtags, CTA) with copy-to-clipboard.
  • JSON view for raw output + copy button.
  • “Quick Example” button: instantly pre-fills the form; optional sample image/audio files are auto-loaded.
  • Clear status messages, file size limits (20 MB for images, 100 MB for media), and a 60 s request timeout on the frontend to avoid hangs.
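
The size limits above can be expressed as a small validation helper that also yields a status message. This is a sketch under assumptions: the names `LIMITS` and `check_upload` are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and the post does not say whether the check runs in the browser, on the server, or both.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Limits quoted in the post: 20 MB for images, 100 MB for audio/video media.
LIMITS = {"image": 20 * MB, "media": 100 * MB}


def check_upload(kind: str, size_bytes: int):
    """Return (ok, message) for an upload of the given kind and size."""
    limit = LIMITS.get(kind)
    if limit is None:
        return False, f"Unknown upload type: {kind}"
    if size_bytes > limit:
        return False, (
            f"{kind.capitalize()} is {size_bytes / MB:.1f} MB; "
            f"the limit is {limit // MB} MB."
        )
    return True, f"{kind.capitalize()} accepted ({size_bytes / MB:.1f} MB)."


if __name__ == "__main__":
    print(check_upload("image", 5 * MB))
    print(check_upload("media", 150 * MB))
```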

Architecture (High-Level)

  • Frontend: Static HTML/CSS/JS (no framework) served by FastAPI’s StaticFiles.
  • Backend: FastAPI + Gemini SDK (text & optional image models), pydub + ffmpeg for media trimming, speech_recognition for transcription.
  • Deployment: Docker → Cloud Build → Cloud Run (with a minimum instance kept warm to avoid cold starts); Secret Manager for the API key; CORS left open for the demo; a favicon and bundled sample files keep the logs clean and the demo smooth.

Why Multimodal Matters Here

Marketing teams rarely start from a clean text brief. They have assets: a product photo, a CEO voice note, a teaser clip. Letting users drop any of those in and still get coherent text, a schedule, and images removes friction and showcases the power of Google AI Studio’s multimodal stack in a practical, demo-able way.


Try it

(If the image model quota is unavailable, the demo still runs end-to-end with branded placeholders so judges can see the full flow.)
