This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge
The Problem Nobody Has Solved
306,000 people starred Open Claw on GitHub. They all want the same thing: a personal AI that actually does things. Sends emails. Manages calendars. Runs code. Browses the web. Learns new skills.
But every solution looks the same: clone the repo, install Docker, configure API keys, run terminal commands, manage a cloud bill. The technology is amazing. The accessibility is terrible.
8 billion people want a personal AI computer. 99% of them will never run a Docker container.
So I built Elora.
What Elora Is
Elora is not a chatbot. She's a personal AI computer that lives on your phone.
She has her own sandbox (a persistent cloud VM where she installs packages, runs code, and saves files - isolated per user). She has her own skill system (she can search for skills, install them, or write new ones from scratch). And she has a security layer that protects everything she does.
You download the app and talk to her. That's it. No setup. No API keys. No Docker.
Elora Live Voice Architecture
Elora Wake Word
The Tech Stack
| Layer | Technology |
|---|---|
| Mobile | Expo / React Native (TypeScript) |
| Voice | Gemini Live API (real-time bidirectional audio) |
| Agent | Google ADK (multi-agent orchestration) |
| LLM | Gemini 2.0 Flash / 2.5 Flash |
| Browser | Playwright + Gemini 2.5 Flash (computer use) |
| Code Sandbox | E2B (per-user persistent VMs) |
| Skills | Custom skill engine (search, install, create, execute) |
| Security | Agntor trust protocol |
| Memory | MemU + Firestore + text-embedding-004 |
| Backend | FastAPI on Google Cloud Run |
| IaC | Terraform + GitHub Actions CI/CD |
Let me walk through how I built the pieces that matter.
1. Voice That Feels Alive - Gemini Live API
The Gemini Live API is what makes Elora feel real. It's full duplex audio - she talks while you talk, you can interrupt her mid-sentence, and she handles it naturally.
Here's the architecture for voice:
Phone (mic) → PCM audio chunks via WebSocket → Cloud Run
→ Gemini Live API session (bidirectional)
→ Audio response chunks → WebSocket → Phone (speaker)
The mobile app maintains three simultaneous WebSocket connections:
- Text chat - ADK agent with full tool calling
- Live audio - Gemini Live API with real-time audio streaming
- Wake word - Always-on "Hey Elora" detection
The wake word detector is its own Gemini Live session configured to only respond with "WAKE" when it hears the trigger phrase. Minimal tokens, always listening.
The hardest part: Gemini Live API doesn't support ADK's tool-calling protocol natively. So I built a parallel system - manual JSON schemas for every tool declaration, a dispatch function that maps tool names to the same Python functions the ADK agent uses, and a response handler that streams tool results back into the Live session. Every tool works in both text mode (ADK) and voice mode (Live API).
2. Vision - She Sees Your World
During a live call, Elora watches through your camera. The mobile app captures frames and sends them as base64 JPEG over the WebSocket. On the backend, a proactive vision loop runs every 3 seconds:
# Simplified proactive vision logic
if camera_active and user_quiet_for_8s and last_proactive_25s_ago:
frame = latest_camera_frame
faces = await recognize_faces(frame, user_id)
prompt = f"[VISION CHECK] You see: {faces}. Comment if relevant."
session.send(prompt + frame)
# If Elora responds with <silent>, swallow the response
She doesn't just respond when asked - she speaks up when she sees something worth mentioning. Point the camera at a friend she's seen before, and she'll say their name. That's face recognition using Gemini Vision with two-pass comparison against stored reference images in Cloud Storage.
3. The Skill System - Why Elora Is a Computer, Not a Chatbot
This is the feature I'm most proud of. Every AI assistant has a fixed set of tools. Elora can learn new ones.
The skill system works in four modes:
Search: Query the skill registry (bundled + community) by keyword.
Install: Download a skill definition (YAML metadata + Python code) into the user's Firestore profile and deploy it to their sandbox.
Execute: Load the skill code, fill in template parameters, and run it in the user's E2B sandbox. Real code, real output.
Create: This is the magic. Tell Elora "create a skill that checks if a website is up" and she:
- Writes the Python code
- Creates a YAML skill definition with parameters
- Tests the code in your sandbox with a dry run
- Validates the output
- Saves it permanently to your library
The skill you asked for now exists forever. You can run it tomorrow, next week, next year.
# Bundled skills ship with Elora
BUNDLED_SKILLS = {
"weather": {
"name": "weather",
"description": "Get current weather for any city",
"code": "import requests\nurl = f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}¤t_weather=true'\n..."
},
"crypto_prices": { ... },
"hackernews": { ... },
"exchange_rates": { ... },
"wikipedia": { ... },
"rss_reader": { ... },
}
Six skills ship bundled. Users can create unlimited custom ones. And there's a community registry where you can publish skills for others to use.
This is what transforms Elora from "assistant" to "computer." A computer isn't defined by what it ships with - it's defined by what you can make it do.
4. Per-User Sandbox - Your Computer in the Cloud
Every Elora user gets their own isolated cloud VM via E2B. This isn't shared compute - it's YOUR machine.
def get_or_create_sandbox(user_id: str):
# Check in-memory cache
if user_id in _active_sandboxes:
return _active_sandboxes[user_id]
# Check Firestore for paused sandbox ID
doc = _get_sandbox_doc(user_id).get()
if doc.exists:
sandbox_id = doc.to_dict().get("sandbox_id")
# Reconnect to existing sandbox
sandbox = Sandbox.connect(sandbox_id)
else:
# Create new sandbox with pre-installed packages
sandbox = Sandbox(timeout=3600, metadata={"user_id": user_id})
sandbox.commands.run("pip install requests beautifulsoup4 feedparser pyyaml")
sandbox.commands.run("mkdir -p /home/user/skills /home/user/workspace /home/user/data")
# Persist sandbox ID
_get_sandbox_doc(user_id).set({"sandbox_id": sandbox.sandbox_id})
_active_sandboxes[user_id] = sandbox
return sandbox
Sandboxes auto-pause when idle and reconnect when needed. Packages you install persist. Files you create persist. The sandbox ID is stored in Firestore so it survives server restarts.
When Elora runs code for you - whether it's a skill, a script you asked for, or a data analysis
it runs in YOUR sandbox. Nobody else can see it or touch it.
5. Security - The Agntor Trust Protocol
When your AI agent has access to your email, calendar, files, and code execution, security isn't optional.
The Agntor trust protocol runs as middleware on every incoming message:
Prompt injection guard - 12 regex patterns + 3 heuristic checks + structural analysis. Catches "ignore previous instructions" and its 50 variants.
PII/secret redaction - Detects and masks API keys, tokens, credit card numbers, and SSNs before they reach the model.
Tool guardrails - Blocklist (shell.exec, eval) and confirmation list (send_email, delete_file). Dangerous tools are blocked. Sensitive tools require explicit confirmation.
SSRF protection - Validates all URLs against private IP ranges with DNS resolution. Prevents the model from being tricked into accessing internal services.
Agent identity - A verifiable identity endpoint that exposes Elora's capabilities and security posture.
$ curl https://elora-backend-453139277365.us-central1.run.app/agent/identity
{
"agent_name": "Elora",
"version": "0.5.0",
"security": {
"prompt_guard": true,
"pii_redaction": true,
"tool_guardrails": true,
"ssrf_protection": true
}
}
The entire security layer is pure Python - no external dependencies. It's fast enough to run on every message without noticeable latency.
6. Multi-Agent Architecture - Google ADK
Elora uses Google's Agent Development Kit (ADK) with a hierarchical multi-agent architecture:
elora_root (orchestrator)
├── web_researcher → web_search + fetch_webpage
├── browser_worker → Playwright + Gemini computer-use
├── email_calendar → Gmail + Google Calendar (full CRUD)
├── file_memory → Cloud Storage + Firestore memory
└── research_loop → LoopAgent with self-verification
The root agent decides which sub-agent to delegate to based on the user's intent. "Send an email" goes to email_calendar. "What's on Hacker News" goes to browser_worker. "Remember that I prefer morning meetings" goes to file_memory.
ADK's constraint of one parent per agent forced clean separation of concerns. Each sub-agent has exactly the tools it needs and nothing more.
7. 40+ Real Tools
These aren't mock tools. They execute real actions:
- Gmail - Send, read, archive, trash, label, batch manage
- Google Calendar - Create, update, delete, list, search
- Browser - Playwright opens real pages, takes screenshots, Gemini reasons about what it sees
- Code execution - Python and JavaScript in your personal sandbox
- SMS - Twilio (with deep-link fallback)
- Google Slides & Docs - Programmatic creation with shareable links
- Face recognition - Two-pass Gemini Vision comparison against stored references
- File management - Upload, read, list, delete in per-user Cloud Storage
- Reminders - Natural language time parsing, push notification delivery
- People memory - Names, relationships, birthdays, contact info, appearance
- Proactive engine - Meeting alerts, birthday nudges, stale contact check-ins
8. Memory - She Remembers Everything
Elora has a 3-layer memory system:
Layer 1: Raw facts. After every conversation, a background task extracts key facts and stores them as vector embeddings (text-embedding-004) in Firestore. Semantic search retrieves relevant memories on every new conversation.
Layer 2: Compacted profile. Periodically, Gemini Flash merges and deduplicates raw facts into a structured user profile - preferences, relationships, work info, goals.
Layer 3: Session summaries. After every call, a summary is generated. The last 3 summaries are injected into the next session for continuity.
This is powered by MemU, which achieves 92% accuracy on the Locomo memory benchmark at 10x lower always-on cost compared to traditional RAG approaches.
Deployment
The entire backend deploys to Google Cloud Run with a single git push:
- GitHub Actions builds the Docker image
- Pushes to Artifact Registry
- Deploys to Cloud Run with all environment variables
- Creates Firestore indexes
Infrastructure is managed with Terraform:
resource "google_cloud_run_service" "elora_backend" {
name = "elora-backend"
location = "us-central1"
template {
spec {
containers {
image = "us-central1-docker.pkg.dev/${var.project_id}/elora/backend:latest"
resources {
limits = { cpu = "2", memory = "2Gi" }
}
}
timeout_seconds = 3600 # Long-running WebSocket connections
}
}
}
40+ Cloud Run revisions tell the development story. The backend has been continuously deployed and iterated throughout the build.
What I Learned
The gap between "chatbot" and "computer" is isolation, persistence, and extensibility. It's not about making the LLM smarter. It's about giving it a sandbox that persists, a skill system that grows, and security that you can trust. That's what makes it a computer.
Security can't be an afterthought. When your agent can read your email, send texts, and execute code, prompt injection isn't theoretical - it's an attack vector. Build the guard first.
ADK multi-agent is production-ready. The one-parent-per-agent constraint feels limiting at first, but it forces clean architecture. Each agent has exactly the tools and context it needs.
Try It
The backend is live:
https://elora-backend-453139277365.us-central1.run.app
The code is open:
https://github.com/Garinmckayl/elora
To run the mobile app:
git clone https://github.com/Garinmckayl/elora.git
cd elora/app
npm install
npx expo start --tunnel
Scan the QR code with Expo Go. Talk to Elora.
Built by a solo developer in Addis Ababa, Ethiopia for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge
GitHub: github.com/Garinmckayl/elora


Top comments (0)