Hamzeh Alsarabi
I Built a Real-Time AI Voice Agent That Automates Your Business Workflows in 2 days - From Bahrain, During a War

I'm writing this from Bahrain, where the ongoing conflict in the Middle East has turned everything upside down. School went fully online, plans got canceled, and life became unpredictable. I joined the Gemini Live Agent Challenge hackathon late, and when I finally had time to sit down and build, the war escalated all of a sudden. I had two spare days before the deadline. That's it.
This is what I built in those two days: Perks Live - a real-time voice AI that watches your screen, analyzes your business workflows, and automatically dispatches an n8n automation blueprint to your inbox, complete with an importable JSON workflow file.
I'm genuinely proud of what I've been able to make in 2 days during high-tension moments and conflict in the region. Here's how it works and what broke along the way.

The Problem

Small business operators and automation agencies spend hours manually analyzing workflows before they can even begin designing automations. What if an AI could watch your screen, understand your operational context, and architect solutions while you talk through the problem out loud?

What It Does

Sees your screen - JPEG frames at 1 FPS sent to Gemini
Hears you - continuous 16-bit PCM audio at 16kHz streamed in real time
Responds with voice - Gemini's native audio output (Aoede voice)
Fires automation blueprints - tool call → n8n webhook → AI Agent → email with downloadable .json workflow attachment
Shows AI reasoning live - separate feed in the UI displaying the model's thought process

The full pipeline runs in under 60 seconds from voice request to email in your inbox.
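The post doesn't include the screen-capture code, so here is a minimal sketch of how a 1 FPS JPEG frame loop might look. The browser-API calls (`getDisplayMedia`, canvas drawing) are shown as illustrative comments; only the throttle logic is written as an executable function, and `sendFrame` is a hypothetical name for whatever pushes the payload over the WebSocket.

```javascript
// Pure throttle check: has enough time passed to send the next frame at `fps`?
function shouldSendFrame(lastSentMs, nowMs, fps = 1) {
  return nowMs - lastSentMs >= 1000 / fps;
}

// Browser-side usage (illustrative sketch, not runnable outside a browser):
// const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
// const video = document.createElement("video");
// video.srcObject = stream;
// await video.play();
// setInterval(() => {
//   const canvas = document.createElement("canvas");
//   canvas.width = video.videoWidth;
//   canvas.height = video.videoHeight;
//   canvas.getContext("2d").drawImage(video, 0, 0);
//   const jpegDataUrl = canvas.toDataURL("image/jpeg", 0.7);
//   sendFrame(jpegDataUrl.split(",")[1]); // strip prefix, keep base64 JPEG
// }, 1000); // 1 FPS
```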

Architecture

React frontend (Firebase Hosting) → streams audio + screen frames via WebSocket → Gemini Live API → tool call fires → n8n Webhook → AI Agent node generates valid n8n workflow JSON → Convert to File → SMTP email with .json attachment

Tech Stack

Gemini Live API — gemini-2.5-flash-native-audio-preview-12-2025 on v1alpha
Google GenAI SDK — @google/genai for the Live API connection
Firebase Hosting — Google Cloud deployment
Vite + React, Web Audio API, raw WebGL GLSL shader for the background
n8n — webhook, AI Agent node, email dispatch

The Hardest Problems I Hit

Wrong model + wrong API version. My API key didn't have access to gemini-2.0-flash-live-001, so every connection attempt closed with a 1008 error. The fix was switching to gemini-2.5-flash-native-audio-preview-12-2025 on the v1alpha API version.
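A sketch of the connection settings that resolved this. The exact option names are an assumption here (the @google/genai SDK's shape varies between versions), so the settings are split out as plain data and the SDK call itself is shown as an illustrative comment.

```javascript
// Assumed settings for the working connection; the key point is pinning
// the API version to v1alpha alongside the preview model name.
function liveConnectOptions() {
  return {
    clientOptions: {
      apiKey: "YOUR_GEMINI_API_KEY",          // placeholder, not a real key
      httpOptions: { apiVersion: "v1alpha" }, // preview model lives on v1alpha
    },
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    config: { responseModalities: ["AUDIO"] },
  };
}

// Usage (illustrative):
// import { GoogleGenAI } from "@google/genai";
// const { clientOptions, model, config } = liveConnectOptions();
// const ai = new GoogleGenAI(clientOptions);
// const session = await ai.live.connect({ model, config, callbacks: { /* ... */ } });
```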
clientContent vs realtime_input. I was sending audio with turnComplete: true on every 256ms chunk, effectively telling Gemini the conversation was over several times a second. The fix:

```javascript
wsRef.current.send(JSON.stringify({
  realtime_input: {
    media_chunks: [{ mime_type: "audio/pcm;rate=16000", data: base64PcmString }]
  }
}));
```
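For context, here is one way the base64PcmString might be produced. This is an assumed implementation, not code from the project: the Web Audio graph hands you Float32 samples in [-1, 1], and Gemini expects little-endian 16-bit PCM at 16 kHz.

```javascript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPcm(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;               // scale to int16
  }
  return out;
}

// Base64-encode the raw PCM bytes for the JSON payload.
function pcmToBase64(int16Samples) {
  const bytes = new Uint8Array(int16Samples.buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary); // btoa is available in browsers and modern Node
}
```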
Browser Audio Autoplay Policy. The AudioContext must be created synchronously, before any await: the moment the call stack yields, the browser stops treating the code as part of the user gesture and playback gets blocked.
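The ordering fix can be sketched like this. The constructor and connect function are passed in as parameters (a hypothetical shape, chosen so the ordering is explicit and testable); the essential point is step 1 running synchronously inside the click handler's call stack.

```javascript
// Create the AudioContext synchronously, then do async work.
function startSession(AudioCtxCtor, connectFn) {
  // 1) Synchronous: construct while the user-gesture token is still valid.
  const audioCtx = new AudioCtxCtor({ sampleRate: 16000 });
  // 2) Only now is it safe to yield to async work (WebSocket setup, etc.).
  return connectFn().then((session) => ({ audioCtx, session }));
}

// Usage in a click handler (illustrative; openLiveSession is hypothetical):
// <button onClick={() => startSession(window.AudioContext, openLiveSession)}>
```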
AI-generated JSON was always broken. Gemini kept producing malformed n8n workflow JSON: extra braces, missing position arrays, wrong connection formats. I solved this by removing JSON generation from Gemini entirely and delegating it to a dedicated n8n AI Agent node with a schema-aware system prompt. It now produces clean, importable workflows every time.
Audio glitch on tool calls. The AI would say "blueprint has been fent" instead of "sent" because the previous audio stream was still playing when the new response started. Fix: flush the AudioContext immediately when a tool call fires.
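The flush can be sketched as below, assuming a playback model where each incoming Gemini audio chunk is scheduled as its own AudioBufferSourceNode and tracked in a queue (the queue name and message shape are assumptions, not the project's actual code).

```javascript
// Stop every scheduled audio source and empty the queue in place.
function flushAudioQueue(queue) {
  for (const src of queue) {
    try {
      src.stop(); // AudioBufferSourceNode.stop(); throws if never started
    } catch (_) {
      // source already stopped or not started — safe to ignore
    }
  }
  queue.length = 0; // drop all pending sources so the old reply can't resume
  return queue;
}

// Called the moment a tool call arrives, before scheduling the new reply:
// if (msg.toolCall) { flushAudioQueue(scheduledSources); handleToolCall(msg); }
```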

What I'd Do Differently

With more time, the vision goes much further:

Multiple specialized agents working in parallel - a vision analyst, a workflow architect, a code generator - each callable by the others based on context
Full two-way conversation transcription so every session is logged and searchable
Optional text input alongside voice for hybrid interaction
Persistent session memory so the AI remembers your workflows, clients, and preferences across sessions
A dashboard where generated blueprints accumulate over time, searchable and editable
Deeper n8n integration where the AI doesn't just generate workflows but can query your existing ones, identify redundancies, and suggest consolidations

The two-day prototype proved the core loop works - the real product is just getting started.

Final Thought

Building something real under pressure - two days, online school, a war outside - taught me more about shipping fast than any tutorial ever could. The Gemini Live API is genuinely impressive. Talking to an AI that can see your screen and respond intelligently in real time still feels like the future, even after two days of debugging it.
I created this project for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

Links:

GitHub Repo
Demo Video
Live App
