William

Posted on Mar 13 • Edited on Mar 20

Building Choragi: How We Orchestrated a 6-Agent Concert Planning System with Gemini Live and Google Cloud

#geminiliveagentchallenge #googlecloud #ai #java

*We created this piece of content for the purposes of entering the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge*

Music has a universal power to heal and bring communities together. But behind every magical live concert is a grueling logistical nightmare. Independent artists and event planners spend weeks scouting venues, making stressful phone calls to negotiate rates, designing promotional materials, and manually wrestling with ad campaigns.

We thought: What if we could build an engine that takes away the logistical burden, turning a month of planning into a 5-minute automated sequence?

Enter Choragi: an autonomous, multi-agent event orchestration system triggered entirely by a natural voice conversation. In this post, I’ll break down how we built it using Java Spring Boot, the Gemini Live API, Google Cloud Run, and Google's latest generative AI models.

The Architecture: A Serverless Symphony

Choragi is not a single monolith; it is a set of specialized microservices. We built six independent Spring Boot applications and deployed all of them on Google Cloud Run.

With Cloud Run, we got a serverless backend where each agent scales independently based on its own workload. Calls to Google Cloud services authenticate through Application Default Credentials (ADC), which removed the need for hardcoded service account keys in our codebase.

Here is the breakdown of the fleet:

UI Dashboard (ui-client): The real-time WebSockets command center.
Venue Scout (venue-finder): Discovers potential concert spaces.
Live Negotiator (live-negotiator): Telephony agent connecting Twilio to Gemini.
Creative Director (creative-director): Generates 8K posters and cinematic video trailers.
Site Builder (site-builder): Autonomously deploys a live promotional website.
Digital Promoter (digital-promoter): Navigates Google Ads to launch campaigns.

The Brains: Integrating Google AI Models

To make this pipeline truly autonomous, we had to fuse multiple modalities of AI.

1. Real-Time Telephony with Gemini 2.5 Flash Native Audio

The most ambitious part of Choragi was the live-negotiator. We wanted the AI to actually call a venue owner over the phone and negotiate a booking.

We bridged a Twilio WebSocket stream directly to the Gemini BidiGenerateContent API. Because Twilio streams audio as 8 kHz μ-law (MuLaw) and Gemini expects 16-bit PCM input at 16 kHz, we had to build an on-the-fly byte transcoder in Java. We used the ["AUDIO"] response modality so the model would answer in natural speech, and we implemented a custom Voice Activity Detection (VAD) algorithm based on RMS thresholding to keep line noise from interrupting the AI.
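As a rough illustration of the transcoding and VAD steps, here is a minimal sketch, not our production code. The μ-law decoding follows the standard G.711 expansion. The linear-interpolation upsampler, the class name PhoneAudio, and the threshold passed to isSpeech are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: names are illustrative, not taken from the Choragi codebase.
final class PhoneAudio {
    private PhoneAudio() {}

    /** Decodes one G.711 mu-law byte into a signed 16-bit linear PCM sample. */
    static short decodeMuLaw(byte encoded) {
        int u = ~encoded & 0xFF;                 // mu-law bytes are stored bit-inverted
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
        return (short) ((u & 0x80) != 0 ? -magnitude : magnitude);
    }

    /** Doubles the sample rate (8 kHz to 16 kHz) with linear interpolation (an assumption). */
    static short[] upsample2x(short[] in) {
        short[] out = new short[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            int next = (i + 1 < in.length) ? in[i + 1] : in[i];
            out[2 * i] = in[i];
            out[2 * i + 1] = (short) ((in[i] + next) / 2);
        }
        return out;
    }

    /** 8 kHz mu-law bytes in, 16 kHz 16-bit little-endian PCM bytes out. */
    static byte[] transcode(byte[] muLaw) {
        short[] pcm8k = new short[muLaw.length];
        for (int i = 0; i < muLaw.length; i++) {
            pcm8k[i] = decodeMuLaw(muLaw[i]);
        }
        short[] pcm16k = upsample2x(pcm8k);
        ByteBuffer out = ByteBuffer.allocate(pcm16k.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (short s : pcm16k) {
            out.putShort(s);
        }
        return out.array();
    }

    /** Root-mean-square energy of one frame of PCM samples. */
    static double rms(short[] frame) {
        if (frame.length == 0) return 0.0;
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return Math.sqrt(sum / frame.length);
    }

    /** Treats a frame as speech only when its RMS reaches the caller-supplied threshold. */
    static boolean isSpeech(short[] frame, double threshold) {
        return rms(frame) >= threshold;
    }
}
```

In a bridge like ours, each incoming Twilio frame would go through transcode before being forwarded to Gemini, and isSpeech would decide whether the frame counts as the caller talking. The right threshold depends on the line and has to be tuned by ear.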

2. Multi-Modal Creative Generation

Once the venue is secured, the creative-director agent takes over to generate promotional assets, directly uploading the results to a public Google Cloud Storage bucket.

Tour Posters: We utilized Gemini 2.5 Flash Image to generate highly realistic, professional concert posters that strictly adhered to text prompts for the artist's name and location.
Cinematic Video Trailers: We integrated Vertex AI Veo 3.0 Fast via its Long-Running Operations REST API to generate photorealistic concert stage visuals.

3. Robotic Web Automation with Gemini Vision

For the digital-promoter, we didn't want to use standard APIs. We wanted the agent to navigate the web like a human. Using Microsoft Playwright in a headless Chromium container, the agent literally "looks" at the Google Ads dashboard using Gemini 2.5 Flash. The model analyzes a screenshot of the rendered page and outputs visual coordinates and text commands (e.g., CLICK_TEXT: Page views, FILL_FIELD: businessName) that the agent executes to launch the campaign autonomously.
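To give a feel for how text commands like these can be turned into actions safely, here is a small parsing sketch. It assumes one command per line in an ACTION: argument shape. The class name AgentCommandParser, the Command record, and the whitelist are hypothetical, and only the two action names mentioned above are included.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical parser: the format and names are assumptions based on the examples in this post.
final class AgentCommandParser {
    private AgentCommandParser() {}

    /** One parsed instruction, e.g. action "CLICK_TEXT" with argument "Page views". */
    record Command(String action, String argument) {}

    // Only actions on this list are accepted; anything else the model emits is ignored.
    private static final Set<String> ALLOWED_ACTIONS = Set.of("CLICK_TEXT", "FILL_FIELD");

    /** Parses a single "ACTION: argument" line; returns empty for anything malformed or unknown. */
    static Optional<Command> parse(String modelLine) {
        if (modelLine == null) {
            return Optional.empty();
        }
        int colon = modelLine.indexOf(':');
        if (colon <= 0) {
            return Optional.empty();
        }
        String action = modelLine.substring(0, colon).trim().toUpperCase(Locale.ROOT);
        String argument = modelLine.substring(colon + 1).trim();
        if (!ALLOWED_ACTIONS.contains(action) || argument.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Command(action, argument));
    }
}
```

Rejecting unknown actions instead of guessing matters here, because the parsed command is what ends up driving Playwright against a real ad account.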

The Hardest Technical Challenge: Vertex AI REST Routing

When building hackathon projects, you quickly find the bleeding edge of new APIs. When integrating Veo 3.0 Fast, we bypassed the standard SDK and hit the raw Vertex AI predictLongRunning REST endpoints.
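For readers who have not called this endpoint directly, the sketch below shows roughly what such a request looks like using only the JDK's HTTP client. It is an approximation, not our service code: the class name VeoRequests, the hand-rolled JSON escaping, and the choice of sampleCount are assumptions, and the bearer token is assumed to come from ADC elsewhere.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical request builder for the Vertex AI predictLongRunning REST call.
final class VeoRequests {
    private VeoRequests() {}

    /** Regional endpoint for a Google publisher model, e.g. model "veo-3.0-fast-generate-001". */
    static URI endpoint(String project, String location, String model) {
        return URI.create("https://" + location + "-aiplatform.googleapis.com/v1/projects/" + project
                + "/locations/" + location + "/publishers/google/models/" + model
                + ":predictLongRunning");
    }

    /** Minimal JSON string escaping; a real service would use a JSON library instead. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Request body with a text prompt and the Cloud Storage prefix for the output video. */
    static String body(String prompt, String storageUri) {
        return "{\"instances\":[{\"prompt\":" + jsonString(prompt) + "}],"
                + "\"parameters\":{\"storageUri\":" + jsonString(storageUri) + ",\"sampleCount\":1}}";
    }

    /** Builds (but does not send) the POST request; accessToken is assumed to come from ADC. */
    static HttpRequest build(URI endpoint, String accessToken, String jsonBody) {
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The response to a request like this carries the operation name that the rest of this section is about.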

We successfully authenticated and triggered the video generation, receiving an Operation ID (a UUID) to poll for the video's completion. However, we discovered a routing quirk: polling the standard v1 API with a UUID resulted in a 400 Bad Request: The Operation ID must be a Long. The v1 endpoint was strictly expecting numeric legacy IDs!

The Solution: We engineered a fallback mechanism. We stripped the publisher path from the operation name and routed the polling request to the v1beta1 endpoint instead. We also wrapped the polling loop in a "God Mode" safety net: if the polling call still failed, our Spring Boot service caught the HttpStatusCodeException, waited 40 seconds for the backend Veo rendering to finish, and returned the Cloud Storage URL anyway. The pipeline kept running.
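The two pieces of that workaround can be sketched as follows. This is a simplified reconstruction under stated assumptions, not our exact code: the operation name is assumed to have the shape projects/{project}/locations/{location}/publishers/{publisher}/models/{model}/operations/{id}, a plain RuntimeException stands in for Spring's HttpStatusCodeException, and the class and parameter names (VeoPolling, expectedUri, and so on) are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the polling fallback described above.
final class VeoPolling {
    private VeoPolling() {}

    // Assumed name shape: projects/{p}/locations/{l}/publishers/{pub}/models/{m}/operations/{id}
    private static final Pattern PUBLISHER_OPERATION = Pattern.compile(
            "^(projects/[^/]+/locations/[^/]+)/publishers/[^/]+/models/[^/]+/(operations/[^/]+)$");

    /** Drops the publishers/.../models/... segment; returns the input unchanged if it does not match. */
    static String stripPublisherPath(String operationName) {
        Matcher m = PUBLISHER_OPERATION.matcher(operationName);
        return m.matches() ? m.group(1) + "/" + m.group(2) : operationName;
    }

    /**
     * Polls until pollOnce yields a video URI. If a poll throws (standing in for
     * HttpStatusCodeException), waits fallbackWait once and returns expectedUri instead.
     */
    static String pollWithFallback(Supplier<Optional<String>> pollOnce, int maxAttempts,
                                   Duration interval, Duration fallbackWait, String expectedUri) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                Optional<String> uri = pollOnce.get();
                if (uri.isPresent()) {
                    return uri.get();
                }
            } catch (RuntimeException routingError) {
                sleep(fallbackWait);   // give the render time to finish, e.g. 40 seconds
                return expectedUri;    // assumes the output location is known up front
            }
            sleep(interval);
        }
        throw new IllegalStateException("operation not done after " + maxAttempts + " attempts");
    }

    private static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

The fixed wait is a hackathon-grade trade-off: it keeps the pipeline moving, but it returns a URL without confirming the render actually finished, so a longer-lived service would want to check that the object exists in the bucket before handing the URL on.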

Conclusion

Building Choragi pushed us to the limits of real-time streaming, asynchronous microservices, and multi-modal AI orchestration. By combining the infrastructure of Google Cloud with the intelligence of Gemini and Vertex AI, we successfully transformed the logistical chaos of event planning into an elegant, autonomous engine.

Artists should spend their time creating music that heals, not navigating ad campaigns and making cold calls. With AI orchestration, we can finally let them get back to the music.

Check out our full repository and project submission for the #GeminiLiveAgentChallenge!
