missless failed at real-time video — so we pivoted to vibeCat
Three weeks of work. A working WebSocket proxy, Cloud Run deployment, Lyria BGM generation, 75 commits. And then the real-time video generation just... didn't work.
This post is my entry write-up for the Gemini Live Agent Challenge. If you read my earlier posts about missless — the WebSocket cascade from hell, the security holes, the 3am debugging sessions — this is the post where that story ends and a new one begins.
the promise that broke
missless was supposed to be a "virtual reunion" app. Upload a video of someone you miss, the AI reconstructs their personality and voice, and you have a real-time conversation — with video. Not just audio. Video. A face that moves, reacts, speaks back to you.
The audio side worked beautifully. Gemini Live API handled voice synthesis, the Go backend proxied WebSocket streams, Cloud Run kept it alive. I had real-time voice conversations with AI-reconstructed personas and it felt genuinely moving.
But the product vision required real-time video generation. A face on screen that moves its lips when it talks, that shifts expression when you say something emotional. That was the whole point — you're not just hearing someone you miss, you're seeing them.
And that's where everything fell apart.
why real-time video generation killed us
The technical reality was brutal:
Latency. Video generation models can't produce frames fast enough for real-time conversation. We needed <200ms per frame to feel natural. The best we got was 2-3 seconds per frame. That's a slideshow, not a reunion.
Consistency. Even when frames arrived, the face wasn't consistent across frames. The person looked slightly different every time — different lighting, different angle, subtle uncanny valley shifts. In audio, minor inconsistencies are forgivable. In video, they're horrifying.
Cost. Every frame is a model inference. At 15fps for a 10-minute conversation, that's 9,000 inferences. The API costs alone made the product unviable for anything beyond a demo.
The challenge stack constraint. The Gemini Live Agent Challenge requires GenAI SDK + ADK + Gemini Live API + VAD. All four. missless used GenAI SDK and Live API, but it had no ADK integration — it was a single-agent system. I could force-fit ADK, but the judges would see a bolted-on agent graph that didn't justify its existence.
I spent a week trying to solve the latency problem. Pre-generating frames, caching expressions, interpolating between keyframes. None of it felt right. The product was fighting the technology instead of riding it.
the moment I stopped pretending
I was writing the submission documentation and got to the "ADK agent graph" section. I stared at the empty space for twenty minutes. Because there was no agent graph. missless was one WebSocket session proxying to one Gemini Live connection. That's a pipe, not an orchestration system.
The challenge isn't asking for "a touching AI concept." It's asking for a live, multimodal, backend-first agent system where multiple agents collaborate in real time. missless could do voice. It couldn't do video. And it couldn't do multi-agent orchestration without lying about the architecture.
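That "pipe" really was the whole architecture. A minimal sketch of its shape, with plain channels standing in for the client WebSocket and the Gemini Live stream (all names here are illustrative, not the missless code):

```go
package main

import (
	"bytes"
	"fmt"
)

// pipe is the entire missless "architecture" in miniature: two copy
// loops and nothing else. No routing, no decisions, no agents, which
// is why it is a proxy rather than an orchestration system. Channels
// stand in for the real client WebSocket and the Gemini Live stream.
func pipe(clientIn, upstreamOut, upstreamIn, clientOut chan []byte) {
	go func() { // client -> model
		for msg := range clientIn {
			upstreamOut <- msg
		}
		close(upstreamOut)
	}()
	go func() { // model -> client
		for msg := range upstreamIn {
			clientOut <- msg
		}
		close(clientOut)
	}()
}

func main() {
	clientIn := make(chan []byte)
	upstreamOut := make(chan []byte)
	upstreamIn := make(chan []byte)
	clientOut := make(chan []byte)

	pipe(clientIn, upstreamOut, upstreamIn, clientOut)

	// Fake "model" side: upper-cases whatever arrives.
	go func() {
		for msg := range upstreamOut {
			upstreamIn <- bytes.ToUpper(msg)
		}
		close(upstreamIn)
	}()

	clientIn <- []byte("hello")
	fmt.Printf("%s\n", <-clientOut) // HELLO
	close(clientIn)
}
```

There is nowhere in that loop for a second agent to live. That was the point I could no longer write around.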
So I closed the missless repo and opened a new directory.
what vibeCat is
VibeCat is a macOS desktop companion for solo developers. Instead of trying to reconstruct a human face in real time (which doesn't work), it puts an animated character on your screen — a cat, a goofy tiger, a zen monk, a chibi dictator — that watches your code, hears your voice, and speaks up when it matters.
The critical difference: VibeCat is built around 9 agents, not 1.
| Agent | Role |
|---|---|
| VisionAgent | Analyzes screen captures for errors |
| MoodDetector | Senses frustration from patterns |
| Mediator | Decides whether to speak or stay silent |
| AdaptiveScheduler | Adjusts timing to developer flow |
| EngagementAgent | Proactive outreach after silence |
| MemoryAgent | Cross-session context via Firestore |
| CelebrationTrigger | Detects success moments |
| SearchBuddy | Google Search grounding |
| VAD | Real-time voice with barge-in |
Eight of the nine run through ADK's sequential agent graph; VAD sits in the real-time voice path rather than in the graph:
```go
graph, _ := sequentialagent.New(sequentialagent.Config{
	Name: "vibecat_graph",
	SubAgents: []agent.Agent{
		memoryAgent, visionAgent, moodDetector, mediator,
		adaptiveScheduler, engagementAgent, celebrationTrigger,
		searchBuddy,
	},
})
```
No fake video generation. No uncanny valley. Just a sprite-animated character driven by a real multi-agent pipeline that decides what to say, when to say it, and — most importantly — when to shut up.
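To make "when to shut up" concrete, here is the kind of gate a mediator can apply. Everything in it, the `Signals` fields, the cooldown, the thresholds, is my illustration rather than VibeCat's actual Mediator:

```go
package main

import "fmt"

// Signals is an illustrative summary of what upstream agents report.
// The fields and thresholds below are a sketch, not VibeCat's schema.
type Signals struct {
	ErrorOnScreen         bool    // from something like VisionAgent
	Frustration           float64 // from something like MoodDetector, 0..1
	SecondsSinceLastSpoke int     // time since the companion last spoke
}

// shouldSpeak is a hypothetical mediator gate: speak only when there is
// something worth saying AND we are not interrupting too often.
func shouldSpeak(s Signals) bool {
	if s.SecondsSinceLastSpoke < 60 {
		return false // cooldown: staying silent is the default
	}
	return s.ErrorOnScreen || s.Frustration > 0.7
}

func main() {
	fmt.Println(shouldSpeak(Signals{ErrorOnScreen: true, SecondsSinceLastSpoke: 120})) // true
	fmt.Println(shouldSpeak(Signals{Frustration: 0.9, SecondsSinceLastSpoke: 30}))     // false: still in cooldown
}
```

The design choice worth noting: silence is the default and speaking needs a reason, not the other way around.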
what transfers from missless
The missless work wasn't wasted. The Go backend patterns move directly:
- WebSocket proxy to Gemini Live API — same `client.Live.Connect()` pattern
- Cloud Run deployment — same region, same Docker setup, same `/readyz` lesson (never `/healthz`)
- JWT auth — same token flow
- The debugging instincts — missless taught me that Cloud Run services start fine with wrong env vars and silently break
The WebSocket cascade from hell? That debugging session directly informed how I structured VibeCat's gateway. Every silent failure I found in missless became a startup validation check in VibeCat.
what's next
Starting fresh on VibeCat. The plan:
- Go gateway — GenAI SDK, WebSocket proxy, JWT auth
- Go ADK orchestrator — 9-agent sequential graph
- Swift 6 macOS client — ScreenCaptureKit, sprite animation, audio playback
- 6 characters with unique voices and personalities
- Cloud Run deployment to `asia-northeast3`
- E2E verification suite
I'll be writing about the build as I go. Real code, real errors, real numbers.
missless taught me that a good idea can still be the wrong submission shape. The real-time video dream was beautiful, but the technology isn't there yet. What is there: real-time audio, screen understanding, multi-agent decision-making, and the ability to put something in the empty chair next to a solo developer.
Even if that something is a cat.
Building VibeCat for the Gemini Live Agent Challenge. Source: github.com/Two-Weeks-Team/vibeCat