TL;DR
I built AikitPros — a creative hub that takes a single brief and orchestrates Midjourney, Flux, Suno, Sora, Luma, GPT and Claude into one campaign output (script + images + music + video). Total cost in production: $0.35 per campaign. Live demo: https://x.com/aikitpros/status/2046596943023890780
This post is the architectural walk-through.
The problem
Most "all-in-one AI" tools are thin wrappers that just expose 7 buttons. The real value is coordination: the music tempo has to match the video cut, the on-screen copy has to fit the image composition, the voice-over script has to land in 8 seconds. Calling 7 APIs in parallel and concatenating the outputs gets you ~40% usable rate. Not good enough.
Architecture
A brief goes to the Planner (Claude), which produces the script, shot list, audio direction, and brand constraints. A fan-out then routes the plan to ImageRouter (Midjourney V8 / Flux / GPT Image 2), MusicRouter (Suno), VideoRouter (Sora / Luma / Hailuo), and CopyRouter (GPT + Claude). A judge model (GPT-4o-mini) reviews the assembled output and either delivers it or re-runs only the failing modality.
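Here is a minimal sketch of that flow in Python. Everything in it is illustrative: `plan_campaign`, `gen_image`, `gen_music`, `gen_video` and `gen_copy` are hypothetical wrappers around each provider's API (stubbed so the sketch runs), not the hub's actual client code.

```python
import asyncio

# Hypothetical async wrappers around each provider's API, stubbed so the sketch runs.
async def plan_campaign(brief: str) -> dict:
    return {"script": "...", "shot_list": ["..."], "audio_direction": "...", "brand": {}}

async def gen_image(shots, brand):   return "image.png"  # Midjourney / Flux / GPT Image
async def gen_music(direction):      return "track.mp3"  # Suno
async def gen_video(shots, script):  return "cut.mp4"    # Sora / Luma / Hailuo
async def gen_copy(script, brand):   return "copy.md"    # GPT + Claude

async def run_campaign(brief: str) -> dict:
    # Step 1: the planner decomposes the brief into a structured plan.
    plan = await plan_campaign(brief)

    # Step 2: fan out to one router per modality, in parallel.
    image, music, video, copy = await asyncio.gather(
        gen_image(plan["shot_list"], plan["brand"]),
        gen_music(plan["audio_direction"]),
        gen_video(plan["shot_list"], plan["script"]),
        gen_copy(plan["script"], plan["brand"]),
    )
    return {"plan": plan, "image": image, "music": music, "video": video, "copy": copy}
```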
The judge step is the single decision that took the usable rate from ~40% to 90%+.
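The judge loop, continuing the sketch above. `judge_output` here is a hypothetical stand-in for the GPT-4o-mini review call; the property that matters is that it returns a per-modality verdict, so only the failing modalities get regenerated while passing assets are kept.

```python
# Hypothetical judge call: GPT-4o-mini reviews the assembled output and returns
# a per-modality verdict, e.g. {"image": "pass", "video": "fail", ...}.
async def judge_output(plan: dict, assets: dict) -> dict:
    return {m: "pass" for m in ("image", "music", "video", "copy")}

# Map each modality to its regeneration call (the wrappers from the first sketch).
REGEN = {
    "image": lambda plan: gen_image(plan["shot_list"], plan["brand"]),
    "music": lambda plan: gen_music(plan["audio_direction"]),
    "video": lambda plan: gen_video(plan["shot_list"], plan["script"]),
    "copy":  lambda plan: gen_copy(plan["script"], plan["brand"]),
}

MAX_ROUNDS = 2  # assumption: cap retries so one stubborn modality can't loop forever

async def judge_and_repair(plan: dict, assets: dict) -> dict:
    for _ in range(MAX_ROUNDS):
        verdict = await judge_output(plan, assets)
        failing = [m for m, v in verdict.items() if v == "fail"]
        if not failing:
            break  # everything passed: deliver
        # Re-run only the failing modalities; passing assets stay untouched.
        redone = await asyncio.gather(*(REGEN[m](plan) for m in failing))
        assets.update(dict(zip(failing, redone)))
    return assets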
Three things I would do differently next time
- Cache the planner output. Same brief = same plan, and I burned a lot of Claude tokens regenerating identical decompositions (first sketch below).
- Stream partial results. Users tolerate 60s of waiting if they see the script appear at 5s, the image at 20s, and the music at 35s (second sketch below).
- Pre-warm the video model. Cold starts on Sora/Luma can add 30s; a keep-alive ping fired during the image step parallelizes that latency away (third sketch below).
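A minimal version of the planner cache, keyed on a hash of the normalized brief. `plan_campaign` is the same hypothetical wrapper as in the first sketch, and the in-memory dict is just for illustration.

```python
import hashlib

_plan_cache: dict[str, dict] = {}  # in-memory for the sketch; Redis or disk in production

async def plan_cached(brief: str) -> dict:
    # Same brief = same plan, so key on a hash of the normalized brief text.
    key = hashlib.sha256(brief.strip().lower().encode()).hexdigest()
    if key not in _plan_cache:
        _plan_cache[key] = await plan_campaign(brief)  # only pay Claude on a miss
    return _plan_cache[key]
```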
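Streaming partials, sketched as an async generator that yields each modality the moment its provider returns instead of awaiting the full gather; a server would forward these as SSE or WebSocket events. Same hypothetical wrappers as above.

```python
async def _tagged(name: str, coro):
    return name, await coro  # label each result with its modality

async def stream_campaign(brief: str):
    plan = await plan_cached(brief)
    yield "plan", plan  # the script is visible within seconds

    pending = [
        _tagged("image", gen_image(plan["shot_list"], plan["brand"])),
        _tagged("music", gen_music(plan["audio_direction"])),
        _tagged("video", gen_video(plan["shot_list"], plan["script"])),
        _tagged("copy",  gen_copy(plan["script"], plan["brand"])),
    ]
    # Yield each asset the moment its provider returns, fastest first.
    for fut in asyncio.as_completed(pending):
        name, asset = await fut
        yield name, asset

# Usage: async for name, asset in stream_campaign(brief): push(name, asset)
```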
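And the pre-warm trick: fire a throwaway keep-alive at the video backend as a background task while the image step runs, so the cold start overlaps work you are doing anyway. `ping_video_backend` is a hypothetical stand-in for whatever cheap request wakes the model.

```python
async def ping_video_backend() -> None:
    # Hypothetical keep-alive: a cheap request that forces the video model to spin up.
    await asyncio.sleep(0)

async def run_video_step(plan: dict) -> dict:
    # Fire the warm-up as a background task while images render.
    warmup = asyncio.create_task(ping_video_backend())

    image = await gen_image(plan["shot_list"], plan["brand"])
    await warmup  # usually already done by now; the cold start overlapped the image step
    video = await gen_video(plan["shot_list"], plan["script"])
    return {"image": image, "video": video}
```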
Try it
The hub is live at https://aikitpros.com. First credits are on me; if you build something with it, I would love to see it.
If you are building something similar and want to compare notes on judge models or routing heuristics, drop a comment.